I had a closer look at the data, and I think the main problem isn't that the
PDBs we generate use too much memory. Here's what I found:
- The new version is better than the previous one w.r.t. expansions (which
confirms that it indeed finds better patterns, presumably because it permits
larger ones).
- On the two tasks where we lose compared to the old version, we run out of
time during heuristic construction.
- Comparing the "search_returncode" attribute of the two versions, the new
version has fewer cases of running out of memory (exit code 6) and more cases of
running out of time (exit code 152) than the old one. In detail:
  - There are 2 cases where the old version succeeds and the new one runs out
    of time.
  - There are 10 cases where the old version runs out of memory and the new
    one runs out of time.
  - There is 1 case where the old version runs out of time and the new version
    runs out of memory.
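For reference, such a pairwise tally can be computed mechanically from the
per-task return codes. Here is a rough Python sketch; the example data is a
made-up stand-in (the real codes would come from the "search_returncode"
attribute of the experiment reports):

    from collections import Counter

    # Per-task exit codes for both versions. The data below is a made-up
    # stand-in; in the real experiments the codes come from the
    # "search_returncode" attribute.
    results = {
        "task-01": {"old": 0, "new": 152},
        "task-02": {"old": 6, "new": 152},
        "task-03": {"old": 152, "new": 6},
    }

    # Count (old code, new code) pairs; 6 = out of memory, 152 = out of time.
    transitions = Counter((r["old"], r["new"]) for r in results.values())

    print(transitions[(0, 152)])   # old succeeds, new runs out of time
    print(transitions[(6, 152)])   # old out of memory, new out of time
    print(transitions[(152, 6)])   # old out of time, new out of memory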
So I think our default max limits might still be reasonable w.r.t. memory for
PDBs in absolute terms, but sometimes we spend too much time on the heuristic
precomputation. Changing the max states parameters might of course fix that,
but it's an indirect fix: the max states parameters bound memory, not time.
I think a more reliable fix would be to introduce a time limit parameter for
the PDB generation phase. If I remember correctly, Florian already implemented
this once, and it would be a good idea to merge it soon.
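To illustrate what I have in mind, here is a toy Python sketch of a
time-limited generation loop (not Florian's actual implementation, which would
live in the planner's C++ code; all names here are placeholders, and
"evaluate" stands in for the heuristic-quality estimate):

    import itertools
    import time

    def generate_patterns(variables, evaluate, max_time):
        # Enumerate candidate patterns (variable subsets) and keep the
        # best one found before max_time seconds elapse.
        deadline = time.monotonic() + max_time
        best, best_score = None, float("-inf")
        for size in range(1, len(variables) + 1):
            for pattern in itertools.combinations(variables, size):
                if time.monotonic() >= deadline:
                    return best  # limit hit: return best pattern so far
                score = evaluate(pattern)
                if score > best_score:
                    best, best_score = pattern, score
        return best

    # Toy usage: score a pattern by its size, capped at three variables.
    print(generate_patterns(list(range(10)), lambda p: min(len(p), 3), 1.0))

The point is just that the loop degrades gracefully: when time runs out, we
still get a usable (if weaker) pattern instead of failing outright.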
In summary: it will be interesting to see the results for different parameter
settings, but in the long run we should really integrate the max time setting.
At that point, I think we need to run experiments again to see what the good
parameters are.