Issue410

Title: iPDB rejects candidate patterns due to their size although they are small enough
Priority: bug
Status: resolved
Superseder:
Nosy List: malte, silvan
Assigned To: silvan
Keywords:
Optional summary:

Created on 2014-01-08.11:12:16 by silvan, last changed by silvan.

Messages
msg2971 (view) Author: silvan Date: 2014-02-15.17:33:04
Closed.
msg2969 (view) Author: malte Date: 2014-02-15.15:13:17
OK, the newest results are quite close to each other, so I don't think we need
to change the current parameter settings.

I agree that this one can be merged. Thanks!
msg2968 (view) Author: silvan Date: 2014-02-15.11:58:15
Interesting findings.

I'll just paste the most recent experiment in, where I used smaller values for
pdb_max_size (1500000, 1000000, 500000) and also for collection_max_size
(20000000 (default) and 15000000).

http://ai.cs.unibas.ch/_tmp_files/sieverss/2014-02-14-issue410-v1-conf.html

I'll have a closer look at it later. But as far as I understand your
comment, you think that issue408 should deal with running out of time and
possibly also find new, better default configs.

Maybe we can even merge this one already? Issue408 would need the merged code
anyway to find useful default configs.
msg2967 (view) Author: malte Date: 2014-02-14.22:00:53
I had a closer look at the data, and I think the main problem isn't that the
PDBs we generate use too much memory. Here's what I got from the data:

- The new version is better than the previous one w.r.t. expansions (which
confirms that it indeed finds better patterns, presumably because it permits
larger ones).

- On the two tasks we lose compared to the old version, we run out of time
during heuristic construction.

- Comparing the "search_returncode" attribute of the two versions, the new
version has fewer cases of running out of memory (exit code 6) and more cases of
running out of time (exit code 152) than the old one. In detail:
  - There are 2 cases where the old version succeeds and the new one runs out of
time.
  - There are 10 cases where the old version runs out of memory and the new one
runs out of time.
  - There is 1 case where the old version runs out of time and the new version
runs out of memory.

So I think our default max limits might still be reasonable w.r.t. memory for
PDBs in absolute terms, but sometimes we spend too much time on the heuristic
precomputation. Changing the max states parameters might of course fix that, but
it's an indirect fix: max states measure memory, not time.

I think a more reliable fix might be to introduce a time limit parameter for the
PDB generation phase. Florian already implemented this once, I think, and I
guess it would be a good idea to merge this soon.

In summary: it will be interesting to see the results for different parameter
settings, but in the long run we should really integrate the max time setting,
and at this point, I think we need to run experiments again to see what the good
parameters are.
msg2965 (view) Author: malte Date: 2014-02-14.15:04:23
Interesting. I'll have another look at the diff to remind myself which things
have changed, but in the meantime it might be useful to run experiments with
different parameter settings.
msg2964 (view) Author: silvan Date: 2014-02-14.15:01:38
Here are some experimental findings:
http://ai.cs.unibas.ch/_tmp_files/sieverss/2014-02-13-issue410-v1-conf.html

I am afraid that the default config becomes worse with the fix to the
pdb_max_size computation. I interpret it as follows:
- We now allow much larger pattern candidates (previously, we compared the
collection's size against pdb_max_size rather than the pattern's size).
- This leads to larger PDB collections and thus much higher memory usage, as
can be seen from the experiments.
- We run out of memory.

I would try to reduce the parameters pdb_max_size and maybe also
collection_max_size to check their influence.

Do you have further suggestions?
msg2908 (view) Author: silvan Date: 2014-01-08.15:29:19
Malte, I sent you a code review invitation.
msg2904 (view) Author: silvan Date: 2014-01-08.11:22:46
Another error: num_improvements was not computed correctly, as it was set to 0
only in hill_climbing() rather than at construction time. This matters because
initialize() already calls generate_candidate_patterns(), where patterns can be
rejected for violating the size constraint.
msg2902 (view) Author: silvan Date: 2014-01-08.11:12:16
In the iPDB implementation, generate_candidate_patterns uses the size of the
current canonical heuristic function, rather than the size of the candidate
itself, to test whether a given candidate can be extended.
History
Date                 User    Action  Args
2014-02-15 17:33:04  silvan  set     status: in-progress -> resolved; messages: + msg2971
2014-02-15 15:13:17  malte   set     messages: + msg2969
2014-02-15 11:58:15  silvan  set     messages: + msg2968
2014-02-14 22:00:53  malte   set     messages: + msg2967
2014-02-14 15:04:23  malte   set     messages: + msg2965
2014-02-14 15:01:38  silvan  set     messages: + msg2964
2014-01-08 15:29:19  silvan  set     nosy: + malte; messages: + msg2908
2014-01-08 11:22:46  silvan  set     messages: + msg2904
2014-01-08 11:12:16  silvan  create