Issue416

Title Get rid of global state registry
Priority feature Status resolved
Superseder Nosy List florian, jendrik, malte
Assigned To florian Keywords
Optional summary

Created on 2014-02-07.20:08:33 by florian, last changed by florian.

Messages
msg5449 (view) Author: malte Date: 2016-06-11.10:02:59
Please do.
msg5448 (view) Author: florian Date: 2016-06-10.19:54:24
I implemented your suggestions. Should I merge?
msg5445 (view) Author: malte Date: 2016-06-10.16:48:30
I made a few comments.
msg5444 (view) Author: malte Date: 2016-06-10.16:35:17
I'll have a quick look.
msg5443 (view) Author: florian Date: 2016-06-10.16:33:31
I agree. Should we merge this now or did you want to have a look at the code first?
msg5442 (view) Author: malte Date: 2016-06-10.16:01:17
I'm not sure what the full LAMA plots measure. As far as I know, "total time"
and "memory" do not meaningfully measure anything for anytime configurations.
I'd focus on coverage and quality for such configs, which look good.
msg5439 (view) Author: florian Date: 2016-06-09.20:51:34
The other experiment just finished.

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416-v2-lama-issue416-v2-base-issue416-v2-compare.html

Again, almost no effect on memory:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_memory_ehc_lm_zhu.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_memory_lama_first.png

Total time looks similar to the optimal case but LAMA-first is more noisy:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_total_time_ehc_lm_zhu.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_total_time_lama_first.png

I also have the plots for the full LAMA configuration, but I don't know how
useful they are. There seem to be trends for some domains:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_memory_seq_sat_lama_2011.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_total_time_seq_sat_lama_2011.png
msg5438 (view) Author: malte Date: 2016-06-09.20:14:39
Performance looks good to me. (I only looked at the blind search plots.)
msg5437 (view) Author: florian Date: 2016-06-09.17:36:06
Thanks, Jendrik. The first set of experiments is already done. A second set with
landmark configs is still running.

For the optimal configs, results look fine to me: almost no effect in memory and
about 1-2% performance drop in total time.

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416-v2-issue416-v2-base-issue416-v2-compare.html

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_memory_astar-blind.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_memory_astar-ipdb.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_memory_astar-lmcut.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_memory_astar-seq_opt_bjolp.png

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_total_time_astar-blind.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_total_time_astar-ipdb.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_total_time_astar-lmcut.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v2_total_time_astar-seq_opt_bjolp.png
msg5436 (view) Author: jendrik Date: 2016-06-09.13:27:34
I made some comments. The code looks good to me.
msg5435 (view) Author: jendrik Date: 2016-06-09.12:26:13
I'll have a look. Here is the updated URL:
https://bitbucket.org/FlorianPommerening/downward-issues/pull-requests/17
msg5434 (view) Author: florian Date: 2016-06-09.12:06:42
Now that issue592 is merged, I think this is ready for another round of review.

There still is a temporary hack for the landmark code (issue551) and a TODO for
handling the initial state (see issue509).

https://bitbucket.org/flogo/downward-issues/pull-requests/17/issue416/diff

I'll re-run the experiments for the new version.
msg5422 (view) Author: florian Date: 2016-06-03.20:22:42
issue627 is done now, so we can continue here. The change from reach_state to
notify_initial_state and notify_state_transition (see msg5120) is already done
in the pull request. However, it looks like issue592 will be merged soon, so
I'll wait with experiments until it is.
msg5141 (view) Author: malte Date: 2016-01-22.12:30:30
> * In ehc_lm_zhu the cost of psr-middle/p31 changes even though the task is
> solved well within the time limit. Is there something non-deterministic in this
> code?

Nothing that I'd know of.
msg5140 (view) Author: florian Date: 2016-01-22.12:18:06
I reported the VAL issue at https://github.com/KCL-Planning/VAL/issues/10
msg5137 (view) Author: malte Date: 2016-01-22.12:02:07
Ah, I found it: issue481. In particular, check msg4867 starting from "One note:"
and msg4916.
msg5136 (view) Author: malte Date: 2016-01-22.12:00:15
This is not new. I observed this in another issue recently and asked if someone
would like to report it to the VAL authors. I don't know if someone has done it,
but we should check to make sure we don't create duplicates there. Unfortunately
I don't recall the issue, or who was cc'ed on it. But if you ask Jendrik or
Silvan, perhaps one of them remembers.
msg5135 (view) Author: florian Date: 2016-01-22.11:58:46
I verified that the unexplained error was caused by VAL. Checking only the first
plan takes more memory than we have in the 32-bit version. Checking only the
second plan works locally (but just barely within the 4GB limit) and reports the
plan as valid. When checking both plans in one call, VAL reports the mem-out for
the first, but then reports the second plan as invalid. I'll report this to the
VAL developers.

I guess there is not much we can do about this. Switching to the 64-bit version
might help, but this would require more memory per run, so we couldn't run one
task per core. Since this only happens in one task, I suggest we ignore it.
msg5133 (view) Author: florian Date: 2016-01-22.08:20:28
A similar picture for satisficing configs:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416-v1-lama-issue416-base-issue416-v1-compare.html

Two things stand out to me:
* There is an unexplained error in psr-large/p41. This seems to be a bug in the
base version already. The log shows that the search finds two plans and then
runs out of memory during the third search. The validator then runs out of
memory checking the first plan and reports the second plan as invalid (Goal not
satisfied). This might be an issue with VAL, such as a bad recovery after the
mem-out. I can look into it and open another issue for it.
* In ehc_lm_zhu the cost of psr-middle/p31 changes even though the task is
solved well within the time limit. Is there something non-deterministic in this
code?

Memory:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_memory_ehc_lm_zhu.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_memory_lama_first.png

Time:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_total_time_ehc_lm_zhu.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_total_time_lama_first.png



I also generated plots for Lama, but since this is a portfolio, I don't think
the numbers will help that much:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_memory_seq_sat_lama_2011.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_total_time_seq_sat_lama_2011.png
msg5132 (view) Author: florian Date: 2016-01-21.22:34:35
I'm still waiting on the change from issue627, but this affects only the CEGAR
code, so I already ran experiments for other configs:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416-v1-issue416-base-issue416-v1-compare.html

Memory is almost completely unaffected:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_memory_astar-blind.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_memory_astar-ipdb.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_memory_astar-lmcut.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_memory_astar-seq_opt_bjolp.png

Total time is varying a lot mostly between -10% and +10%, and there is a trend
towards slower times:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_total_time_astar-blind.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_total_time_astar-ipdb.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_total_time_astar-lmcut.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue416_base_v1_total_time_astar-seq_opt_bjolp.png
msg5125 (view) Author: florian Date: 2016-01-20.16:03:17
Referenced msg5120 in summary.
msg5108 (view) Author: florian Date: 2016-01-19.10:10:04
Good thing you mentioned that. The test run already had an unexplained error in
seq_opt_bjolp. It turns out that the hack I introduced to work around issue551
doesn't work. The landmark heuristic need the registered initial state from the
same registry that the search uses because this registry is later used to access
a per state information. Later states are registered by the search and passed to
the landmark code and the landmark code uses the id of the state to access
reached landmarks.

I think this brings up an interesting question: who should be responsible for
registering the states that the landmark heuristic uses to access its data? I
suggest we discuss this in issue551.

For now, I would put this issue on ice until issue551 and issue627 are merged.
msg5107 (view) Author: malte Date: 2016-01-18.23:36:38
Sounds good. If it affects per-state information, perhaps also seq_opt_bjolp.
msg5106 (view) Author: florian Date: 2016-01-18.23:21:48
I updated the pull request by making the axiom evaluator and int packer global
again.

What kind of configurations should I run? How about astar with blind, lmcut, and
ipdb heuristic on the optimal suite?
msg5103 (view) Author: malte Date: 2016-01-16.13:29:38
Experiments would be good, but also I have one comment regarding the code (added
to bitbucket).
msg5102 (view) Author: florian Date: 2016-01-16.13:13:03
Thanks, I'll do that. However, I wanted to do a smaller step in this issue.
Instead of switching to the task interface in one step, I wanted to first avoid
having the global variables g_state_registry and g_initial_state, and avoid
using global variables in the StateRegistry class.

I think it looks ok in most places. There are two temporary hacks for the
landmark (issue551) and CEGAR (issue627) code and a TODO for handling the
initial state (see issue509).

https://bitbucket.org/flogo/downward-issues/pull-requests/17/issue416/diff

I can run some experiments to see if we lose any performance.
msg5098 (view) Author: malte Date: 2016-01-16.09:52:31
I wouldn't be too worried about not passing in an operator at this point. It
would be nice to be able to avoid having the operator here, but it's not
critical right now, and not that closely related to this issue.

What I would try is to move the details of the operator semantics (with
conditional effects etc.) out of this method and and make them a function of the
task classes (e.g. of OperatorProxy); something like
OperatorProxy::apply_to_buffer(...).
msg5097 (view) Author: florian Date: 2016-01-16.09:10:50
I started working on this and got rid of the global int packer first. The next
step would be to move the successor state generation out of the state registry
and I'm not sure how to do this efficiently.

Currently, the state registry copies the parent into the place where the
successor should end up if it is a new state. It then changes values according
to the operator (and axioms) and checks if the state is new. If it's new, it
does not have to be touched again; if it's old, it is popped from the vector. So
there is just one copy of the state values.

We wanted to move the successor state generation into the search, passing just
the finished buffer to the registry. Is there a way to avoid copying the buffer
twice (once in the search and once in the registry)?

I thought about using a callback, but this would still need the operator in its
signature.

A two-step process might also work: ask the state registry to create an in-place
copy, then modify it and ask for a registered state of the last copy. But the
interface would be fragile, e.g., you are not allowed to ask for a second copy
before registering the first one.

Any suggestions?
msg2936 (view) Author: florian Date: 2014-02-07.20:08:33
We should move the global state registry inside the search engine class to allow
having multiple search engines in parallel in the long run.
History
Date User Action Args
2016-06-11 12:07:37floriansetstatus: reviewing -> resolved
2016-06-11 10:02:59maltesetmessages: + msg5449
2016-06-10 19:54:24floriansetmessages: + msg5448
2016-06-10 16:48:30maltesetmessages: + msg5445
2016-06-10 16:35:17maltesetmessages: + msg5444
2016-06-10 16:33:31floriansetmessages: + msg5443
2016-06-10 16:01:17maltesetmessages: + msg5442
2016-06-09 20:51:34floriansetmessages: + msg5439
2016-06-09 20:14:39maltesetmessages: + msg5438
2016-06-09 17:36:06floriansetmessages: + msg5437
2016-06-09 13:27:34jendriksetmessages: + msg5436
2016-06-09 12:26:13jendriksetstatus: chatting -> reviewing
messages: + msg5435
2016-06-09 12:06:42floriansetmessages: + msg5434
summary: * Wait for issue592. ->
2016-06-03 20:22:42floriansetmessages: + msg5422
summary: * Wait for issue627. * Add methods notify_initial_state and notify_state_transition to Heuristic (see msg5120) to get rid of g_initial_state. -> * Wait for issue592.
2016-01-22 12:30:30maltesetmessages: + msg5141
2016-01-22 12:18:06floriansetmessages: + msg5140
2016-01-22 12:02:07maltesetmessages: + msg5137
2016-01-22 12:00:15maltesetmessages: + msg5136
2016-01-22 11:58:46floriansetmessages: + msg5135
2016-01-22 08:20:28floriansetmessages: + msg5133
2016-01-21 22:34:35floriansetmessages: + msg5132
2016-01-20 16:03:17floriansetmessages: + msg5125
summary: Wait for issue551 and issue627. -> * Wait for issue627. * Add methods notify_initial_state and notify_state_transition to Heuristic (see msg5120) to get rid of g_initial_state.
2016-01-19 10:10:04floriansetmessages: + msg5108
summary: Wait for issue551 and issue627.
2016-01-18 23:36:38maltesetmessages: + msg5107
2016-01-18 23:21:49floriansetmessages: + msg5106
2016-01-16 13:29:38maltesetmessages: + msg5103
2016-01-16 13:13:03floriansetmessages: + msg5102
2016-01-16 09:52:31maltesetmessages: + msg5098
2016-01-16 09:10:50floriansetstatus: unread -> chatting
assignedto: florian
messages: + msg5097
2014-02-08 13:25:03jendriksetnosy: + jendrik
2014-02-07 21:36:21maltesetnosy: + malte
2014-02-07 20:08:33floriancreate