I played around a bit with the meliae memory profiler for Python.
It's quite basic and not really documented. This seems to be the best available
information on it:
http://jam-bazaar.blogspot.de/2009/11/memory-debugging-with-meliae.html. I
mainly chose it because it is available in the standard Ubuntu repository,
though only for Python 2 (python-meliae).
Here is the memory summary for PSR-Large #50, our 2nd worst example, with a
memory snapshot taken right after the "Translating task" block in translate.py.
(I tried snapshots taken at various places within translate.main and
translate.pddl_to_sas. This place was where I saw the highest memory usage for
this task, probably because the next step is simplification, which for this task
is able to throw away lots and lots of things.)
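For reference, the snapshot was produced with meliae's usual dump/load workflow,
roughly like this (a sketch; the dump filename is arbitrary, and the summary can
be produced in a separate interpreter):

# In translate.py, right after the "Translating task" block:
from meliae import scanner
scanner.dump_all_objects("translate-snapshot.json")

# Then, e.g. in a separate Python 2 session:
from meliae import loader
om = loader.load("translate-snapshot.json")
om.summarize()  # prints the per-type summary table shown below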
Total 22045420 objects, 75 types, Total size = 2977.5MiB (3122177101 bytes)
Index Count % Size % Cum Max Kind
0 3634140 16 1250144160 40 40 344 Atom
1 4734669 21 531867936 17 57 7051064 list
2 5605588 25 423745744 13 70 1792 tuple
3 783439 3 275770528 8 79 352 PropositionalAxiom
4 544209 2 191561568 6 85 352 SASMutexGroup
5 798018 3 136621644 4 89 184 unicode
6 4724225 21 113381400 3 93 24 int
7 1095692 4 88312329 2 96 4871 str
8 468 0 51042144 1 98 25166104 dict
9 113884 0 40087168 1 99 352 SASAxiom
10 110 0 16806320 0 99 16777448 set
11 1541 0 530104 0 99 344 NegatedAtom
12 1122 0 394944 0 99 352 SASOperator
13 1122 0 394944 0 99 352 PropositionalAction
14 952 0 327488 0 99 344 TypedObject
15 103 0 294192 0 99 12624 module
16 272 0 245888 0 99 904 type
17 1402 0 179456 0 99 128 code
18 1449 0 173880 0 99 120 function
19 1057 0 84560 0 99 80 wrapper_descriptor
(I've slightly reformatted the table from meliae's output to avoid columns
running into each other.)
The top row shows us that 16% of objects in the snapshot are of class Atom, and
they add up to a size of 1250144160 bytes. So more than 1 GiB is locked up in
these atoms alone, and this is likely an underestimate of how much space they
use because the total size for everything reported (2977.5 MiB) is about 25%
lower than the actual memory usage, which is likely due to interpreter-internal
allocations and other overhead that meliae cannot track.
Dividing Size by Count in the first row, we see that (at least according to
meliae) each Atom takes 344 bytes. This is quite a lot because atoms only have
three attributes, and the size of an atom does not include the recursive size of
its attributes, just the equivalent of three references. To see if we can reduce
this, I added a __slots__ declaration to pddl.conditions.Literal (the base class
of Atom); a rough sketch of the change follows the numbers below. At least on my
computer it leads to very large memory savings. I tested Python 2.7 and Python 3.5:
psr-large:p50-s219-n100-l3-f30.pddl:
current code, Python 2.7: peak mem 4114668 KiB, runtime 210.68s
with __slots__, Python 2.7: peak mem 2946448 KiB, runtime 208.69s
current code, Python 3.5: peak mem 2685192 KiB, runtime 199.65s
with __slots__, Python 3.5: peak mem 2505324 KiB, runtime 198.54s
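For concreteness, the change is essentially of the following form (a sketch, not
the exact translator code; the attribute names are from memory, and every class
in the hierarchy needs a __slots__ declaration, because any class without one
reintroduces a per-instance __dict__):

class Literal(object):
    # __slots__ replaces the per-instance __dict__ with fixed storage for
    # exactly these three references.
    __slots__ = ["predicate", "args", "hash"]

    def __init__(self, predicate, args):
        self.predicate = predicate
        self.args = tuple(args)
        self.hash = hash((self.__class__, self.predicate, self.args))

class Atom(Literal):
    # Subclasses must declare (empty) __slots__ as well, or they get a
    # __dict__ again and the savings are lost.
    __slots__ = []

class NegatedAtom(Literal):
    __slots__ = []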
So we see that for this instance, __slots__ solves our memory problem, at least
in the sense that we should be able to translate it in a "normal" task on our grid.
We also see that Python 3.5 does much better than Python 2.7 here and that
__slots__ has much less impact there (although it still saves roughly 180 MiB
of memory). I think this is because Python 3 has recently seen great
improvements in the dict implementation for the case of object attribute dicts.
In particular, key-sharing dicts probably reduce the object overhead hugely,
which makes __slots__ much less useful. Here is a very nice talk on the topic
that I watched recently: https://www.youtube.com/watch?v=p33CVV29OG8
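A rough way to see this effect directly is to compare the shallow per-instance
sizes under both interpreters (a toy comparison of my own, independent of the
translator code; it only counts the instance and its __dict__, so it ignores
shared key tables and allocator overhead):

from __future__ import print_function
import sys

class PlainAtom(object):
    def __init__(self, predicate, args):
        self.predicate = predicate
        self.args = args
        self.hash = hash((predicate, args))

class SlottedAtom(object):
    __slots__ = ["predicate", "args", "hash"]

    def __init__(self, predicate, args):
        self.predicate = predicate
        self.args = args
        self.hash = hash((predicate, args))

def shallow_size(obj):
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size

for cls in (PlainAtom, SlottedAtom):
    atom = cls("at", ("obj1", "loc1"))
    print(cls.__name__, shallow_size(atom), "bytes")

If I understand the implementation correctly, the key-sharing dict shows up here
as a much smaller __dict__ size under Python 3, which fits the smaller benefit
of __slots__ in the numbers above.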
The data also indicates that we might want to investigate using Python 3 on the
grid in the future. I'm not sure how easy this would be and whether suitable
modules are available out of the box. For Python 3, the minor version number makes a
huge difference in (time and memory) performance, so we would probably want a
fairly recent version, such as Python 3.5. With Python 3.5, we wouldn't have had
a memory issue for this task in the first place, so it's probably worth an
experiment.
[Addendum]
I have now also checked the worst Satellite and Scanalyzer tasks according to
Jendrik's data. Together with the PSR-Large task above, these are the top three
memory hogs. All other tasks in the top 10 come from the same domains, so
checking these three is hopefully sufficiently representative.
Here, I didn't repeat the memory profiling with meliae, though it might be a
good idea to do this and see if there are other classes that have a huge number
of instances, as PSR is in many ways special.
What I did observe without looking too closely is that in Satellite peak memory
usage is reached slightly later, I think somewhere close to the end of
simplify.filter_unreachable_propositions. This makes sense because in Satellite
there is very little to prune here, and the filtering code itself needs memory
for its own computations.
satellite:p33-HC-pfile13.pddl:
current code, Python 2.7: peak mem 4165552 KiB, runtime 189.63s
with __slots__, Python 2.7: peak mem 3373232 KiB, runtime 176.42s
current code, Python 3.5: peak mem 3196556 KiB, runtime 199.98s
with __slots__, Python 3.5: peak mem 2936464 KiB, runtime 195.09s
So in this Satellite task, Python 2.7 is a bit faster than Python 3.5, but I
think Python 3.5 would still be a decent choice due to the lower memory usage.
Finally, the Scanalyzer results. Here, it looks like peak memory usage is
reached at a similar place as with Satellite, which makes sense to me.
scanalyzer-08-strips:p28.pddl:
current code, Python 2.7: peak mem 3936680 KiB, runtime 162.37s
with __slots__, Python 2.7: peak mem 2745500 KiB, runtime 156.30s
current code, Python 3.5: peak mem 2832972 KiB, runtime 177.85s
with __slots__, Python 3.5: peak mem 2440588 KiB, runtime 164.14s
Again, Python 3.5 is slightly slower, but more memory-efficient.