Issue 348: Rethink how and when to unpack states

Title	Rethink how and when to unpack states
Priority	wish	Status	resolved
Superseder		Nosy List	DominikDrexler, andrew.coles, cedric, florian, gabi, guillem, jendrik, johannes, malte, silvan
Assigned To	florian	Keywords
Optional summary	This is part of issue401 TODOs for prototype implementation: 1. ~~Return EvaluationContext instead of registered state in search engines.~~ 2. ~~Check for additional copies added by the change so far.~~ 3. ~~Unpack state data on demand.~~ 4. ~~Investigate the typical "life cycle" of a state and when it is unpacked/copied.~~ 5. ~~Decide when to unpack states.~~ 6. ~~Unpack states in successor generator only in the presence of axioms and otherwise on demand.~~

Created on 2012-08-30.13:28:12 by florian, last changed by silvan.

This is part of issue401

TODOs for prototype implementation:
1. Return EvaluationContext instead of registered state in search engines.
2. Check for additional copies added by the change so far.
3. Unpack state data on demand.
4. Investigate the typical "life cycle" of a state and when it is unpacked/copied.
5. Decide when to unpack states.
6. Unpack states in successor generator only in the presence of axioms and otherwise on demand.

Files
File name	Uploaded	Type	Edit	Remove
benchmark.cc	malte, 2021-01-21.13:29:04	text/x-c++src
branch.callgrind.out	florian, 2019-06-05.10:22:07	application/octet-stream
callgrind.airport.p08.base	florian, 2021-01-25.12:45:19	application/octet-stream
callgrind.airport.p08.version2-v3	florian, 2021-01-25.12:43:46	application/octet-stream
callgrind.airport.p09.default	florian, 2014-10-08.17:34:34	application/octet-stream
callgrind.airport.p09.patched	florian, 2014-10-08.17:34:50	application/octet-stream
callgrind.miconic.s8-4.base	florian, 2021-01-22.13:35:53	application/octet-stream
callgrind.miconic.s8-4.version1-v2	florian, 2021-01-22.13:36:12	application/octet-stream
default.callgrind.out	florian, 2019-06-05.10:22:20	application/octet-stream

Messages
msg9964 (view)	Author: silvan	Date: 2021-01-30.18:32:59
Regarding the Github actions: I don't understand the source of the error on the last commit of the branch, but the actions went through on the merge commit, if I see this right. Congrats for merging this old issue!
msg9962 (view)	Author: jendrik	Date: 2021-01-30.17:57:52
More than eight years after its creation, the issue is finally done. Congrats, that took quite some stamina!
msg9961 (view)	Author: malte	Date: 2021-01-30.17:45:25
Hooray! :-)
msg9958 (view)	Author: florian	Date: 2021-01-30.17:38:21
It took a few more iterations but I got rid of the warnings and finally merged the issue. Thanks everyone.
msg9957 (view)	Author: malte	Date: 2021-01-30.15:57:58
Feel free to merge, but we should also seek to make the Github actions green. One of them looks related to CPLEX setup and probably needs input from the Github actions experts, but at least one of them seems to be just a basic code change required (missing "override").
msg9956 (view)	Author: jendrik	Date: 2021-01-30.08:56:42
I approved the pull request now.
msg9955 (view)	Author: florian	Date: 2021-01-30.00:18:52
I added a change log entry, merged the last changes from default into the issue branch, fixed some comments that still mentioned GlobalState, and did some quick local tests on {lazy_greedy, EHC, A*} x {release, debug}. The new Github actions still complain and the review status of Jendrik is still "requested changes" but I still think this is ready to merge now.
msg9950 (view)	Author: malte	Date: 2021-01-29.14:12:47
Florian pushed some more changes after the code review. They look good to me, but probably require at least some local tests with lazy and eager search with and without debug mode to see that the EvaluationContext changes don't crash.
msg9948 (view)	Author: malte	Date: 2021-01-29.13:51:39
Experimental results look good, no reason not to merge. Unexplained errors: 1) We have the usual segmentation faults that we attribute to memory-related crashes in the translator. (Would be good to deal with this in some way at some point, but not here and now.) 2) We have too much output in parcprinter and in a few other cases for iPDB, but not exceeding the hard limit. Interpretation of results: Overall, we're faster than base and Dominik's previous version. Compared to base, our score_total_time differences are: +1.65 for lama-first +2.13 for EHC-FF -0.51 for iPDB +0.39 for LM-Cut +0.90 for blind +0.47 for lazy-greedy with h^FF and h^cea and preferred ops The improvement of blind search is unexpected, but we'll take it. We can swallow the slight penalty for iPDB. No substantial changes in memory usage, as expected. Overall, we lose a bit of coverage: +1 for lama-first -2 for EHC-FF But looking at the details, this may very well be noise. We lose the following tasks with EHC-FF: - depot-p09, solved by the baseline in 1768.27 seconds - mprime-p13, solved by the baseline in 1390.25 seconds - pipesworld-tankage-p40, solved by the baseline in 1782.17 seconds We gain the following task with EHC-FF: - mprime-p14, solved by v24 in 1554.73 seconds We observed recently that in some cases we see more dramatic noise than we're used to, and we don't know why, but it may very well be a grid issue. The mprime tasks are hard to explain otherwise, we don't see that much variation in this domain for the other tasks. The other two tasks would be explainable by "normal" noise.
msg9945 (view)	Author: malte	Date: 2021-01-29.13:20:10
Florian, Patrick, Silvan and I also reviewed the pull request, leading to various smaller local change suggestions, some for now and some as TODOs for later. Apart from the ones that should be done now (like making fewer attributes of State mutable), this looks fine.
msg9944 (view)	Author: florian	Date: 2021-01-29.13:18:44
Results for the experiment of v24 (the code we reviewed) are in. I also re-ran the code Dominik ran on the Freiburg Grid on our cluster so it can be compared to our results (version2-v3 in the data). https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue348-v24-eval.zip
msg9938 (view)	Author: jendrik	Date: 2021-01-29.10:54:07
I looked at the code and made some suggestions.
msg9886 (view)	Author: florian	Date: 2021-01-28.15:52:53
Point 6 in the summary is done and we decided that the remaining TODO in the summary will not be handled here. The performance of the goalcount heuristic should improve again when we complete our plan for the component interaction problem. I moved that point to issue559.
msg9884 (view)	Author: florian	Date: 2021-01-28.15:47:52
We discussed the results and found them acceptable. I cleaned up the commit history and the code a bit and added documentation. The changes I made are small compared to the revision that we ran the tests on. I would suggest that we review the code under the assumption that the results of v23 also apply to this version and I run another experiment to confirm this in parallel.
msg9876 (view)	Author: florian	Date: 2021-01-28.12:05:35
I ran experiments for the version of the code that uses a vector<int> in pdb::get_value and in the successor generator. For the former, we call State::unpack outside of loops when evaluating multiple PDBs, for the latter, we unpack within the successor generator (the public interface still takes a State). There are only small (positive) differences in coverage and memory is unaffected as expected. The search becomes faster for LAMA (the total time score summed over all instances increases by +1.99), EHC with h^FF (+2.62), and A* with LM-cut (+0.5). It becomes slower for blind search (-0.51), iPDB (-1.76), and lazy greedy search with h^FF and h^CEA (-0.4). Full Report: https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue348-v23-issue348-base-issue348-v23-compare.html Relative scatter plots for total time: https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue348-v23-issue348-base-issue348-v23-total_time-blind.png https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue348-v23-issue348-base-issue348-v23-total_time-ehc-ff.png https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue348-v23-issue348-base-issue348-v23-total_time-ipdb.png https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue348-v23-issue348-base-issue348-v23-total_time-lama.png https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue348-v23-issue348-base-issue348-v23-total_time-lazy.png https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue348-v23-issue348-base-issue348-v23-total_time-lmcut.png
msg9872 (view)	Author: florian	Date: 2021-01-27.23:29:39
I think changing the interface in a way that the subscript operator of State returns an int is a good idea regardless of performance. I tried this out and found that most users immediately call get_value() and there is only one slightly dodgy use of the actual FactProxy. I created issue997 for this in case it doesn't fit here. On the performance side of things, there is no big difference. I implemented this on top of version2-v3 and the profile of our airport instance improved a little bit but not by much (18330 vs. 18734 in v3 and 11979 in base). You can see the patch here (the iterator is a bit hacky but not used in this configuration): https://github.com/FlorianPommerening/downward/compare/issue348-version2...issue348-debug-no-fact-proxy
msg9870 (view)	Author: florian	Date: 2021-01-27.21:21:12
The use-vector-and-unpack version of the code required 12003 * 10^6 simulation steps in valgrind, very close to the base version. This was expected for iPDB. The real test is to see how it performs in other configs.
msg9869 (view)	Author: florian	Date: 2021-01-27.21:00:25
We discussed the performance loss in iPDB and saw the fundamental problem that we have classes (SuccessorGenerator and PDB in this case) where we ideally want better performance than what we would get from an general interface covering both packed and unpacked states. We discussed some alternatives to a polymorphic class: A) Using templates for the classes to support anything that has a state-like interface. This could cover vector<int> and State for example. Because State::operator[] currently returns a FactProxy, we would have to change this to int and (if needed) offer a different way of getting FactProxies. The templating would then introduce the usual problems that templates come with, e.g., that we have to implement the classes in the header, etc. B) Have a simple class UnpackedState and have the classes that require unpacked values for efficiency be explicit about this. This is similar to the situation in the main branch right now but we would try to make the choice of registered/unregistered independent of the choice of packed/unpacked. The downside of this would be that it may require duplicate implementations again of the kind we got rid of in this issue. C) In the longer run, a simple class containing unpacked data directly might be all we need. To make this efficient, we would have to take an even closer look at where states are copied and avoid copies more or less completely. This would require some work on the EvaluationContexts. D) Some classes that currently work on states, could also work on a lower-level representation (like vector<int>). Switching the SuccessorGenerator to this would mean that state always have to be unpacked before successors are generated. We want to evaluate if option D) is viable and also test what Jendrik suggested in msg9866.
msg9866 (view)	Author: jendrik	Date: 2021-01-27.19:55:59
Apart from the if-statement, the version pairs you mention also differ in whether they return a FactProxy or an int. Maybe it would be good to test a variant of v3 that returns an int?
msg9856 (view)	Author: florian	Date: 2021-01-27.13:15:52
I ran more profiles on different ways of accessing the state. The numbers I report here are from the profile of Airport/p08 with iPDB and measure the simulation steps in PatterCollectionGenerator::hillclimbing where the bulk of the time is spent and where the difference between the versions shows. All numbers are divided by 10^6 to make things more readable. The difference in the code can be seen in the commits on the debug branch (https://github.com/FlorianPommerening/downward/commits/issue348-debug) but I also have some mock code at the end of this message. base: 11979 int : 11779 (base - 200) vector: 12095 (base + 116) shared_ptr: 12405 (base + 426) int in State: 18344 (base + 6365) v3: 18734 (base + 6755) The main difference between versions "int " and "int in State" is that the if statement checks within operator[] whether unpacked data is available (and that the access happens inside the State class). The same is true for the difference between versions "shared_ptr" and "v3". So I assume that the if statement blocks some compiler optimization. ### base class State { vector<int> values; FactProxy operator[](size_t var_id) { return FactProxy(..., values[var_id]); } ... } pdb::get_values(const State &) {...} ### int * class State { shared_ptr<vector<int>> values; const int get_data() { return values->data(); } ... } pdb::get_values(const int ) {...} ### vector class State { shared_ptr<vector<int>> values; const vector<int> &get_values() { return values; } ... } pdb::get_values(const vector<int> &) {...} ### shared_ptr class State { shared_ptr<vector<int>> values; const shared_ptr<vector<int>> &get_values() { return values; } ... } pdb::get_values(const shared_ptr<vector<int>> &) {...} ### int in State class State { shared_ptr<vector<int>> values; const int data; // initialized to values->data() once FactProxy operator[](size_t var_id) { if (data) { return FactProxy(..., data[var_id]); } else {...} } ... } pdb::get_values(const State &) {...} ### v3 class State { shared_ptr<vector<int>> values; FactProxy operator[](size_t var_id) { if (values) { return FactProxy(..., (values)[var_id]); } else {...} } ... } pdb::get_values(const State &) {...}
msg9814 (view)	Author: malte	Date: 2021-01-26.18:42:02
> Do you have some time to look at this together? Not today, possibly tomorrow after the exam. > I don't really understand this: the old code had one indirection, this code has > none and the version of the new code that had one indirection (msg9807) still > had the overhead. I would have expected the code from msg9807 to have the same > performance as the base and this code to be faster. I think it also matters where the indirection happens, but I cannot articulate this very clearly. In some cases the compiler will be better able to determine that the pointer cannot be changed from the outside. This may also be related to inlining. > I also don't know where to go from here: the efficient code requires all the code > duplication again which we probably do not want to reintroduce. Understanding what is going on is a good starting point. We don't necessary need to get maximum performance everywhere immediately, but I think now is a good time to look at things carefully.
msg9813 (view)	Author: florian	Date: 2021-01-26.16:27:36
I implemented the version that Malte suggested in 993b9ed6 (https://github.com/FlorianPommerening/downward/commit/993b9ed657e5f2e4944078dbb0a823e1b27ebe20). This re-introduces the code duplication that we got rid off during the issue but I think this is OK since we only plan to use the cod for debugging. The profile for this was clear again and showed the same performance as the base code. I don't really understand this: the old code had one indirection, this code has none and the version of the new code that had one indirection (msg9807) still had the overhead. I would have expected the code from msg9807 to have the same performance as the base and this code to be faster. I also don't know where to go from here: the efficient code requires all the code duplication again which we probably do not want to reintroduce.
msg9807 (view)	Author: florian	Date: 2021-01-26.14:47:50
> If I understand this correctly, it only saves one indirection and makes the > State class more expensive. This is true but it also makes the code more comparable to the base implementation because that also has one indirection for accessing the vector<int>. I would have expected the difference to go away but today I cannot even replicate the 1.9 seconds, I measured yesterday. I undid the change yesterday and redid it today and got different results. I looked closely at the diff and am convinced it is the same (but I cannot check). Profiling the patched code had results that look almost identical to the unpatched version, still showing a similar difference in the number of profiling samples overall and in the individual methods. I double-checked that I profiled the new code and could see the patched source code in the profiler. If the results are true, the indirection should not be the problem but then I do not see what else could be the cause. Do you have some time to look at this together?
msg9805 (view)	Author: malte	Date: 2021-01-25.14:37:14
If I understand this correctly, it only saves one indirection and makes the State class more expensive. I would be interested in trying out the alternative solution to see what its impact is. A quick-and-dirty test that disables everything that isn't necessary for iPDB would be good enough for this. I'm happy to do this myself it it helps, but not today.
msg9804 (view)	Author: florian	Date: 2021-01-25.14:23:35
> For example, you could add a temporary get_data() method that returns an int . > It would make sure that the vector of state values exists and then return > values.data() (assuming the vector is called "values"). Then use this int > instead of the state reference to access the state contents. I did something similar locally. Instead of a method get_data, I added a mutable const int unpacked_data; to the State class and initialized it to values->data() at the time values is initialized. In operator[] I then swapped out (values) for unpacked_data. The issue with your suggestion is that it would require a lot of changes in all PDB classes and sampling methods if we want to save the overhead across more than one PDB. My change reduced the overhead in airport/p08 somewhat but not completely (eyeball average over 5 runs each): base ~1.75s version2-v3 ~2.00s unpacked_data ~1.90s I don't have time to profile this and look into more details right now.
msg9803 (view)	Author: malte	Date: 2021-01-25.13:33:22
I doubt this is related to the cache, but it could certainly be related to the additional indirection, especially if the compiler cannot prove that the pointers involved don't remain constant throughout the execution of the critical loop. If we are accessing the state by reference, I count three levels of indirection to get at the data: from the state reference to the state, which is a shared_ptr; from the shared_ptr to its contents, which include a vector; from the data pointer of this vector to the actual data. I can think of a number of ways to make "bulk access" cheaper by reducing this to one level of indirection. The question is what the beauty/speed trade-off is here. One things we could do is experiment with something potentially ugly to see if it has a performance impact, so that we can make a more informed decision. For example, you could add a temporary get_data() method that returns an int . It would make sure that the vector of state values exists and then return values.data() (assuming the vector is called "values"). Then use this int instead of the state reference to access the state contents.
msg9802 (view)	Author: florian	Date: 2021-01-25.12:41:03
Patrick and I profiled iPDB on Airport/p08 (I'll attach the files, don't confuse them with Airport/p09 which is already there from a previous code version). The results clearly showed that the only significant difference was in the pattern collection generator. Within that code, the difference was distributed over two methods: SuccessorGenerator::generate_applicable_ops and PatternDatabase::get_value. Both of them take a const reference to a state and access its values. In the contexts in which they are called, the state is always unpacked and unregistered, i.e., it consists of a shared pointer to a vector<int> and a bunch of nullptrs for registry, buffer, ID, state_packer. No part related to creating or copying states showed a difference in the profile, only accessing the unpacked data showed up. In the main branch, the State class has a vector<int> and the operator[] is implemented like this: FactProxy operator[](std::size_t var_id) const { assert(var_id < size()); return FactProxy(task, var_id, values[var_id]); } In the new code (version 2), the state has a shared_ptr<vector<int>> and is implemented like this: inline FactProxy State::operator[](std::size_t var_id) const { assert(var_id < size()); if (values) { return FactProxy(task, var_id, (*values)[var_id]); } else { // access to packed state. Not used in this profile. } } Both methods are inlined (one in-place, the other with the inline keyword further down in the header). We don't suspect the additional if statement to be the problem, especially since it will have the same result for a lot of calls in a row, which should be good for branch prediction. That leaves the additional indirection when accessing values. Could it be that this uses up more space in the cache because the shared pointer has to be cached in addition to the actual data? If this is the issue, we don't know an easy way around this. One complicated way would be to write a class analogous to the StateRegistry to store values of unpacked states. This way, we could store a pointer to the data directly in the state, rather than behind a shared_ptr.
msg9801 (view)	Author: florian	Date: 2021-01-24.13:24:21
The results for version 2 look good in blind search. There is still a negative effect of the patch but it is now very small and I think we could live with this. The results for LAMA and EHC still look OK (the effect size is even smaller than for blind search). But the iPDB configuration still shows a stronger negative effect. Airport/p08 or p09 look like good tasks for profiling this.
msg9800 (view)	Author: DominikDrexler	Date: 2021-01-23.10:34:43
Here are the experiments where version1-v3 and version2-v3 only unpack a state in the successor generation if axioms are used. https://mega.nz/file/0eZlgCaQ#0QmOc74ifkk4cX-MgE6AIhsYXQ6LbPGM96OQXX5zixg In comparison to the previous experiment (v2) where we always unpacked in the successor generation, the search time now gets much closer to the baseline.
msg9799 (view)	Author: malte	Date: 2021-01-22.15:33:56
> The next step we discussed is to change the code of get_successor_state > in a way that it will operate only on the packed data in tasks with no > axioms, while unpacking the data in tasks with axioms. Let me add that this different behavior with and without axioms is of course ugly, and perhaps it can eventually go away. As discussed elsewhere, it is wasteful in terms of memory that we store the values of derived variables in our permanent state storage. Florian's numbers shed light on another aspect of this: we currently need to evaluate axioms for all generated states because we need all variable values to look the state up in the hash table. But more than 90% of states are duplicates at least in the task we looked at here (duplicates: (#generated - #evaluated) / #generated). For the duplicate test, evaluating the axioms is not logically necessary. If two state agree on the non-derived variables, they must be the same. If the 90% duplicate ratio carries over to other tasks (a big if: I think this has a lot to do with the use of blind search and with the domain), this suggests that more than 90% of the calls to the axiom evaluator could be removed. This is somewhat orthogonal to the question whether or not to store derived variables in the state storage. We currently store and hash all variables. One option we previously discussed is to store and hash only the non-derived variables. But there is also the option of storing all variables but only using the non-derived ones for the hash. The same analysis that Florian did in msg9794 regarding the number of memory allocations could also be done regarding the number of calls to the axiom evaluator. If we did this currently, I assume we would see that we have #generated calls (at least). I think this could be reduced to #evaluated if we continue to store the derived variables but don't use them for lookup, and to #evaluated + #expanded if we don't store them. (#evaluated because the heuristics may need to look at the derived variables and #expanded because the successor generator might need to look, too -- plus the heuristic again in some configurations, for example with eager search using preferred operators). In tasks like this one, both of these alternatives look much better than the current behaviour. (This discussion should probably be eventually moved to an issue that deals with the axioms, but the analysis fits well with what we looked at for msg9794.)
msg9796 (view)	Author: florian	Date: 2021-01-22.13:40:11
I updated the TODO list for the summary. What my previous message didn't mention was that we expect to see a benefit mostly in version 2 (the one uses shared_ptr<vector<int>> instead of shared_ptr<StateData>) because in this case, slim states that do not have unpacked data do not require dynamic allocation. We still plan to implement the change in both versions to confirm this.
msg9795 (view)	Author: florian	Date: 2021-01-22.13:35:53
I'm attaching the profiles of the miconic task discussed in the previous message.
msg9794 (view)	Author: florian	Date: 2021-01-22.13:32:57
We looked closely at the results and saw that memory is unaffected as expected. For the running time, results varied with the configuration and domain. For configurations other than blind search, we saw no large difference. In those cases, the heuristic evaluation may outweigh and hide the overhead of states. In blind search, we saw a positive impact for domains with axioms and a ~20% decrease in other domains. We profiled miconic/s8-4 in the base version and in version 1 (using shared_ptr<StateData>). The profile showed that roughly half of the performance difference was caused by dynamic memory allocations (malloc/free), so we investigated where the calls for new come from. We could clearly see that some calls happen once per evaluated state, some once per generated state and so on. In most cases the number calls matched (exactly) between the base and the issue version: ~6 * #evaluated: reallocate size for evaluation cache ~5 * #expanded: reallocate vector for applicable operators #evaluated + #expanded: CombiningEvaluator::compute_result (allocating vector for results?) #evaluated + #expanded: Hashtable::rehash (adding elements to the cache of an evaluation context) #evaluated: vector<>::reserve (vector<int> key from TieBreakingOpenList) #evaluated: SearchProgress::process_evaluator_value (adding entries to a hash map) #evaluated: Heuristic::convert_global_state (copying the state for the task transformation) All of the above have the same number of calls in both version. While they cannot cause the performance difference we saw, they can be targets for other issues to save some of the allocations. For example, reusing the same vector for applicable operators sounds like low-hanging fruit. The only differences between the versions was in allocations caused by the state unpacking and generating successor states. The base version unpacks global states before converting it along a task transformation. As we only used one heuristic, this resulted in #evaluated dynamic allocations. On the other hand, the new code, unpacks in the state registry and creates a new vector<int> for the unpacked copy of the data. This leads to #generated dynamic allocations. (Actually the number is higher. We assume this is caused by delayed duplicate detection where we generate more states in the registry than the ones that are considered "generated states".) The new code also has to construct states which needs one dynamic allocation for the shared pointer. This happens once per generated state, and some additional times (for example when looking up states during deferred duplicate detection). So in the end, the new code has at least (2*#generated - #evaluated) dynamic allocations more than the old one. This is significant since #generated is much higher than #evaluated (factor of 15 in the task we looked at). The next step we discussed is to change the code of get_successor_state in a way that it will operate only on the packed data in tasks with no axioms, while unpacking the data in tasks with axioms.
msg9788 (view)	Author: DominikDrexler	Date: 2021-01-22.09:39:43
experimental results for a comparison of version1, version2, and base: https://mega.nz/file/kCgUAL6A#WGkdBm_MHLDlaAqNnIIsOzDLz5qDWn2xmePHwJw6yJM In summary: base performs better than version1 and version2 on most benchmarks. The difference between version1 and version2 is rather small.
msg9786 (view)	Author: malte	Date: 2021-01-21.16:03:05
Great link, thanks! The first thing the article discusses (constant-sized vectors) is something that has bothered me many times, and there are a few places in the code where we avoid vector because of it. (I think one of them is the abstraction representation for Cartesian abstractions? Another is inside the delete relaxation. And of course the way we store things in state registries and PerStateInformations.) Also interesting that they point to Compiler Explorer. Before I did my benchmark, I started to look there at the assembly of the two variants too, but quickly decided that it was too much. :-)
msg9785 (view)	Author: jendrik	Date: 2021-01-21.15:40:33
Your microbenchmark results are in line with a recent blog article: https://wolchok.org/posts/cxx-trap-1-constant-size-vector/
msg9783 (view)	Author: malte	Date: 2021-01-21.13:29:04
In my microbenchmark (attached, easy to run for yourself; check the first lines), resize-then-assign is much faster than reserve-then-push on my two machines, at least for the scenario I tested. It seems that the compiler cannot fully optimize away all the "test against capacity, increment the size" shenanigans that push_back involves, which is fair enough. For completeness, I also tested whether something like "vector<int> v; v.resize(100);" behaves significantly differently from "vector<int> v(100)" (it doesn't) and whether push_back behaves significantly differently from emplace_back (it doesn't). So the conclusion seems to be to favor the resize-then-assign version, and I would suggest using that in both cases. I would say it is slightly less in the spirit of what counts as clean C++ these days (because we first assign a wrong value and then the correct one), so I would add a comment stating that this was faster than reserve-then-push_back experimentally. Of course, even better would be to actually measure this, rather than rely on a microbenchmark. But for the time being I think it's enough to use a consistent method in both cases. (In case it's not obvious, we are bothering with these low-level things here because this can be one of the critical paths based on some things we've seen in the past. Perhaps it isn't, but these state-copying or state-converting loops were among the perfomance-critical bits of code in the past. We clearly shouldn't worry about this kind of low-level performance in 99% of the code.)
msg9780 (view)	Author: malte	Date: 2021-01-21.13:08:13
I'm running a benchmark, and perhaps we should change it the other way around, so perhaps wait a bit with experiments.
msg9779 (view)	Author: malte	Date: 2021-01-21.12:51:23
> In one case the vector already exists (but is empty) in the other case, we > create it there. I understand, but I think the reserve then push version might be preferable in both cases because it only has to write to every vector element once. If you create a vector of size N (rather than reserving capacity), the default constructor has to be invoked and assigned to each element, only to be immediately overwritten. (A good compiler may or may not be able to compile this away.) > Isn't there a default capacity for a vector if it is created or are they > created without reserving space as long as they are unused? I'd be surprised if any specific behavior was mandated by the standard. But I'd also be surprised for a common implementation to reserve nonzero capacity because vectors can frequently remain empty, and then you never need to do a dynamic memory allocation if you begin with capacity 0. > Should we switch to creating an empty vector and using reserve/push_back > in the second case? I think this would be better in terms of reducing spurious performance differences between the two versions. I don't want to claim that either version will be better than the other without benchmarking. :-) But I'm leaning towards reserve/push_back being no worse at least, and then I think it's best to make both the same. Here is a small test program to show the capacity of an empty vector. It is 0 for me, but like I said I doubt this is a language guarantee: =========================================================================== #include <iostream> #include <vector> int main(int, char **) { std::cout << std::vector<int>().capacity() << std::endl; }
msg9778 (view)	Author: florian	Date: 2021-01-21.12:36:00
> Isn't there a default capacity for a vector if it is created Answering my own question: the default seems to be 0 for all common compilers. Makes sense. So we'll keep the reserve/push_back strategy in both cases. I will update the code (the links will stay the same).
msg9777 (view)	Author: florian	Date: 2021-01-21.12:21:17
In one case the vector already exists (but is empty) in the other case, we create it there. Isn't there a default capacity for a vector if it is created or are they created without reserving space as long as they are unused? Should we switch to creating an empty vector and using reserve/push_back in the second case?
msg9775 (view)	Author: malte	Date: 2021-01-21.12:00:00
I didn't find a way on github to comment directly on the diff, so one comment here: Why the two different strategies (presize vector and assign vs. reserve and push_back) in State/StateData::unpack()? These can have different performance characteristics. For example, the resize-then-assign version has to touch every element twice (once to assign 0, then again to assign the final value).
msg9774 (view)	Author: florian	Date: 2021-01-21.11:58:45
I updated the summary: - mentioned the meta issue (issue401) - point 3 is done in the current prototypes - added points 4 and 5 as next steps
msg9773 (view)	Author: florian	Date: 2021-01-21.11:53:19
Dominik and I implemented the two variants mentioned in msg9771. You can see them on Github Variant 1 (State has shared_ptr<StateData>): https://github.com/FlorianPommerening/downward/compare/main...issue348-v21#files_bucket Variant 2 (State has shared_ptr<vector<int>>): https://github.com/FlorianPommerening/downward/compare/main...issue348-v22#files_bucket Difference between the versions: https://github.com/FlorianPommerening/downward/compare/issue348-v21...issue348-v22 Dominik is now setting up experiments for both variants.
msg9772 (view)	Author: DominikDrexler	Date: 2021-01-20.19:09:16
Aspects to discuss about the current prototype for version 1: - How to implement StateData in the Header file? - How to access StatePacker inside State class? Will be good if someone can review the code.
msg9771 (view)	Author: florian	Date: 2021-01-20.14:25:04
We discussed the patch and some of the more ugly parts of it this morning. The result was that we are going to try an additional iteration of the prototype where we try a different approach: This iteration will use a state class that can have packed and unpacked, registered and unregistered data inside. It will offer methods to access the data, and create missing representations (e.g. unpack packed data). We will use indirection with shared pointers to keep the actual state class light-weight and easy to copy. The state will remain immutable in the sense that the data inside the state can only change by adding and equivalent representation (unpacking, registering) or by the assignment operator. One advantage is that in the future, we can try things like not storing derived variables in tha packed data and treat evaluated axioms as another (optional) piece of the state that is created on demand. However, this is not part of this issue. We also discussed the way states are currently copied when transforming them from the task used in the search to the task used in the heuristic (both in the current main branch and the issue branch). This will hopefully become obsolete with our planned solution to the component interaction problem, so we decided to leave it like it is in this issue. As a next step, we want to try two variants of the idea described above. Here is some mock code ### Variant 1: class StateData { mutable const PackedStateBin buffer; mutable StateID id; mutable vector<int> unpacked_data; mutable const StateRegistry registry; public: void unpack() const; operator []; } class State { shared_ptr<StateData> data; public: void unpack() const; operator []; } ### Variant 2: class State { mutable const PackedStateBin buffer; mutable StateID id; mutable shared_ptr<vector<int>> unpacked_data; mutable const StateRegistry registry; public: void unpack() const; operator []; };
msg9764 (view)	Author: DominikDrexler	Date: 2021-01-19.11:07:47
Here is a comparison of the branch tip denoted by issue348-v20 with issue348- base using blind search: ComparativeReport: https://mega.nz/file/xDwEFJqb#QTGQyjN4wiz69Rk4fi5jpQikn31-ABCH5FsqcMXV5ZM Scatterplot adl: https://mega.nz/file/lXo1zSxR#AbALakrN3hB1Uy8arM0jsQvVUaqpASJuJ0o0BVQJE3I Scatterplot strips: https://mega.nz/file/hLhFEa6C#d_KX1vpU2NORLPhcEJleCfpezU1CLGW9mdAe5vy0qCs The branch tip yields better performance on adl and worse on strips.
msg9762 (view)	Author: florian	Date: 2021-01-18.16:33:37
To bring this issue back up to date, I re-recreated the pull request on GitHub and rebased it to the current main branch. https://github.com/aibasel/downward/pull/17 Since the patch is fairly large, I created a summary of the most important changes: * States (objects of the class State) are always unpacked and can be registered or unregistered. The only place packed state values occur is inside the StateRegistry. * States internally use a shared_ptr<vector<int>> for the data, so they remain cheap to move around and copy. This could also be used to unpack the values on demand (there is a second branch that tries that here: https://github.com/FlorianPommerening/downward/compare/issue348...FlorianPommerening:issue348-experimental). * Axioms are always evaluated on unpacked data. * State creation is still in the StateRegistry (we once planned to separate it but couldn't find a good solution here). * Successor state creation copies the packed parent state, then unpacks it and evaluates effects and axioms on the unpacked data, then repacks everything again for storage. The unpacked data is returned (without unpacking it again). * Unregistered successors of unpacked states can also be created through the task interface. This copies the unpacked data and then evaluates effects and axioms on it. * SearchNode now stores a mutable unique_ptr<State> instead of a GlobalState. * (Unpacked) state is now stored in EvaluationContext rather than EvaluationCache (was only accessed through EvaluationContext). * We reintroduced the StateHandle class to represent a state (StateHandle = state ID + reference to registry). * PerStateInformation is now indexed with StateHandles. * Duplicated code in the successor generator was removes (overloads for States and GlobalStates not necessary anymore). * Heuristics no longer have two overloads for compute_heuristic. * The AxiomEvaluator no longer needs to overloads to evlauate axioms on packed and unpacked data.
msg8920 (view)	Author: jendrik	Date: 2019-06-19.09:18:50
Here are the results. base (05d98c9): start of the branch v13 (2f7831a): "rename interface for unpacked states" v14 (cba70832): "keep unpacked and packed data synchronized instead of unpacking at the end of successor generation" v19 (540e85c): branch tip ("Allow to generate unregistered successor states on tasks with axioms") v13 vs. 14: https://ai.dmi.unibas.ch/_tmp_files/seipp/issue348-v14-blind-issue348-v13-issue348-v14-compare.html https://ai.dmi.unibas.ch/_tmp_files/seipp/issue348-v14-blind-issue348-v13-issue348-v14-total_time-blind-strips.png https://ai.dmi.unibas.ch/_tmp_files/seipp/issue348-v14-blind-issue348-v13-issue348-v14-total_time-blind-adl.png base vs. v19: https://ai.dmi.unibas.ch/_tmp_files/seipp/issue348-v19-blind-issue348-base-issue348-v19-compare.html https://ai.dmi.unibas.ch/_tmp_files/seipp/issue348-v19-blind-issue348-base-issue348-v19-total_time-blind-strips.png https://ai.dmi.unibas.ch/_tmp_files/seipp/issue348-v19-blind-issue348-base-issue348-v19-total_time-blind-adl.png
msg8919 (view)	Author: jendrik	Date: 2019-06-18.15:10:28
Experiments are running (see msg8911).
msg8915 (view)	Author: florian	Date: 2019-06-17.11:14:54
The pull request at https://bitbucket.org/FlorianPommerening/downward-issues/pull-requests/59 is the one that we made for review. It contains the changes from the prototype branches that we want to keep. The other pull requests were of the prototype branches and we only used them while working on them to keep track of the changes. There is also pull request 60 that was just to see if we missed anything important in the pull request above (see msg8904).
msg8914 (view)	Author: malte	Date: 2019-06-17.10:59:20
Which pull request have you been reviewing? There seem to be multiple open ones. I'm not entirely sure which one the comments refer to. > * Check how "lightweight" classes such as StateHandle are passed around as > function parameters. Didn't see any reference to that in the coding guidelines, > but StateHandle is ATM being passed in most (all?) of the new code by value, not > by const ref. I don't think we have a written-down policy for this. In principle, very small objects can be more efficiently passed by value than by const reference, both because there is less to pass, but more importantly because indirection can be reduced and more optimization assumptions are possible. For example, we wouldn't want to pass an int by const reference. But it is not easy to come up with general rules that are easy to understand and avoid code churn. > * The SearchNode class now holds a unique_ptr to the unpacked State (which acts > as a cache to the calls to SearchNode::get_state). We should perhaps evaluate > the memory impact of this (I would expect 8Bytes times the number of nodes, which > could be maneageable, but still) I didn't see such a change (which is one reason why I'm asking which pull request you are referring to), but for what it's worth: SearchNode is a short-lived temporary class. I don't think we have more than five or so SearchNode instances alive at the same time. An overhead of 8 bytes per state is not something we would introduce without a very strong need.
msg8913 (view)	Author: guillem	Date: 2019-06-17.10:37:44
Cedric and me went through the proposed changes in the prototype and wrote down a few notes that might (or might not :-)) be of some use: * The State class now contains some redundancy: StateHandle contains a pointer to the registry, and the AbstractTask pointer contains that as well. * Florian was not too happy about the introduction of the StateHandle class - perhaps this could be avoided. * The State class seems to have too many constructors, some could probably be refactored in terms of others. * The comments about design decisions related to the State class in task_proxy.h should be revised / updated. * The modifications in StateRegistry::get_successor_state might be introducing some performance overhead, this should be checked. In particular, both the parallel update of the packed and unpacked state, and the way axioms are evaluated. * Check that the class EvaluatorCache is still needed - i.e., seems like a very thin wrapper around EvaluationResults that provides very little extra functionality. * Check how "lightweight" classes such as StateHandle are passed around as function parameters. Didn't see any reference to that in the coding guidelines, but StateHandle is ATM being passed in most (all?) of the new code by value, not by const ref. * The SearchNode class now holds a unique_ptr to the unpacked State (which acts as a cache to the calls to SearchNode::get_state). We should perhaps evaluate the memory impact of this (I would expect 8Bytes times the number of nodes, which could be maneageable, but still)
msg8911 (view)	Author: florian	Date: 2019-06-14.21:55:35
The first two points are about the performance and surprised me a lot because they seem to directly contradict our previous experiments. I ran blind search on gripper/prob07 locally and it took 27 seconds, roughly 10% slower than on the default branch. Jendrik, could you run an experiment comparing revision cba70832 to its parent (2f7831a) and create relative scatter plots for tasks with axioms and tasks without axioms separately? This way we could have a look at the change for axioms in isolation. Could you also run an experiment for the tip of the issue (540e85c) vs. the start of the branch (05d98c9) to see the overall impact? I'll start another experimental branch off of the issue branch to try out the unpacking on demand.
msg8910 (view)	Author: florian	Date: 2019-06-14.21:50:27
I went through the changes again and tried to collect the rough edges and open questions. I came up with the following list: 1) Tasks with axioms now do an additional copy of the unpacked state data. msg7500 concluded that unpacking the state for the axiom evaluation was not worth it. The new code doesn't unpack but it copies the full state data, evaluates the axioms on it, then packs it again. From my understanding of packing and unpacking, this should take longer than unpacking. I don't see how we save time with this change. 2) msg8880 showed a performance drop when unpacking every generated state but we currently do that. Why is this not a problem anymore? 3) Successor state creation is implemented twice, once for packed and once for unpacked data. Can we avoid this duplication? 4) State::operator[] returns a FactProxy where GlobalState::operator[] returned an int, do we want to change this? 5) There are now two ways to get to the ID of a state: s.get_id() and s.get_handle().get_id(). This smells of bad design but using the second one everywhere also seems bad. 6) We use unique_ptr<T> for data that is created on demand (SearchNode::state and State::values). Should we use optional<T> instead?
msg8906 (view)	Author: jendrik	Date: 2019-06-14.15:07:55
I reviewed the individual commits and only had a few minor comments. I think unpacking the state data on demand could be deferred to a follow-up issue.
msg8904 (view)	Author: florian	Date: 2019-06-14.13:58:46
I turned the prototype branch into a pull request for this issue by splitting it into 20 commits that are hopefully easier to review. I left out some stuff that I think we don't need any more. Here is the pull request from the issue branch to default. I recommend looking at the individual commits instead of the full diff. https://bitbucket.org/FlorianPommerening/downward-issues/pull-requests/59 Here is a pull request from the prototype to the issue branch. It shows the left-overs, i.e., the parts I didn't include in the new issue branch. Guillem and Cedric are currently looking over it to see if I missed anything important. https://bitbucket.org/FlorianPommerening/downward-issues/pull-requests/60 I think there is still quite a number of open points to do before we can consider merging them. the main things for me are: * We currently have two functions that compute successor states, with very similar code. We should try to get rid of one of them. * The StateHandle class is not that nice, especially if we keep getters for both handle and ID in the state. * I'm not convinced that copying the full state, then evaluating the axioms on the unpacked data, and packing everything again is worth it (over evaluating axioms on packed data and then unpacking). We should test that. We might be able to remove the StateHandle again if we switch the state class to contain only the handle as required data and treat the buffer lookup and the state unpacking as optional cached data.
msg8891 (view)	Author: jendrik	Date: 2019-06-12.18:18:38
For completeness, here are some additional results: https://ai.dmi.unibas.ch/_tmp_files/seipp/issue348-all-unpacked-exp1.html v1: store state data packed in registry and unpacked in State v2: remove copy and assignment constructors for State v3: store state in SearchNode to avoid unpacking it again v4: compute successors in registry v5: set effects in packed and unpacked state data simultaneously total search_time scores: v1: 568.65 v2: 568.38 v3: 570.45 v4: 571.56 v5: 577.54
msg8888 (view)	Author: jendrik	Date: 2019-06-12.17:25:30
I'm done making the changes we discussed offline. The new pull request is at https://bitbucket.org/FlorianPommerening/downward-issues/pull-requests/57 .
msg8880 (view)	Author: jendrik	Date: 2019-06-12.13:53:23
We suspect that the problem slowing down blind search is that all generated states have to be unpacked. The following experiment confirms this hypothesis: https://ai.dmi.unibas.ch/_tmp_files/seipp/issue348-base-exp1.html v1: default v2: make dummy copy of each evaluated State v3: add dummy unpacking for each expanded GlobalState v4: add dummy unpacking for each generated GlobalState total search_time scores: v1: 578.82 v2: 578.01 v3: 576.58 v4: 568.57
msg8822 (view)	Author: jendrik	Date: 2019-06-05.21:26:08
I ran an experiment evaluating the changes we made today, but the changes have no noticeable effect: https://ai.dmi.unibas.ch/_tmp_files/seipp/issue348-v3-opt.html
msg8809 (view)	Author: jendrik	Date: 2019-06-05.12:37:42
Update summary.
msg8806 (view)	Author: florian	Date: 2019-06-05.10:22:07
I'm attaching two profiles of gripper/prob06 in the default branch and in our prototype. It looks like the main place where we lose time is in get_successor_state. In the default branch, this was handled by the state registry and it created the successor state in the packed buffer. The packed buffer of the predecessor was copied and then modified. In the new code, we move the successor generation out of the registry and now copy the unpacked state, modify the copy, then pack it (copies it again) and register it (copies the packed buffer again). So instead of one copy we now have three.
msg8805 (view)	Author: jendrik	Date: 2019-06-04.23:40:37
We made some progress on the prototype and ran some experiments comparing the prototype to the default branch. Here are the results: https://ai.dmi.unibas.ch/_tmp_files/seipp/issue348-v1-opt-issue348-v1-base-issue348-v1-compare.html https://ai.dmi.unibas.ch/_tmp_files/seipp/issue348-v1-sat-issue348-v1-base-issue348-v1-compare.html The search time score decreases for most configurations, so we should profile the runtime to see where the slowdown comes from.
msg8791 (view)	Author: cedric	Date: 2019-06-03.17:46:14
The current plan is to store the state in the state registry in its packed form and unpack it everywhere else. This means that we will merge the functionality of the classes State and GlobalState into one class (called State). State objects can be registered or unregistered but they are always unpacked. This has some wide-ranging implications because copying a state is now an expensive operation. We thus have to think carefully about how to pass states to where they are used. The current plan is to store them inside an EvaluationContext and move them where possible. We started working on a separate branch to test the effects of this change. Once we are happy with it, we can decide which parts of it we want to use. https://bitbucket.org/FlorianPommerening/downward-issues/pull-requests/51
msg7500 (view)	Author: florian	Date: 2018-09-17.22:00:21
In issue794 we tested if its worth unpacking the state just for the axiom evaluation. The answer apparently is "no" (msg7498, msg7499). While the axiom evaluator is much faster on unpacked data, the unpacking and packing offsets this benefit. Since we already unpack the state in most heuristics, it would probably be worth it if we can unpack every state when it is taken out of storage and pack it when it is stored.
msg4337 (view)	Author: florian	Date: 2015-07-07.22:20:58
@Gabi: I'm adding you to the nosy list because you are working on issue545 and this might be relevant. Feel free to undo this. This issue has been lying around for a while again. In the meantime the situation has changed once again. We currently have two state classes: There is GlobalState which has an id, is used in the search, by StateRegistry, and PerStateInformation but does not know anything about the new task interface. GlobalState objects are immutable and hold a pointer to their data (stored in a StateRegistry). Each access to variable values is packed/unpacked on-the-fly individually. In the long run, we want the global task to disappear, which means we also hav to get rid of GlobalStates. Then there is the class State which contains unpacked state data. State objects know about the task interface and can generate FactProxy objects for it. Access to variable values is a direct array access, but copying States is expensive. States can currently be created from GlobalStates by the intermediary helper method convert_global_state. States are currently used within heuristics (those that already are updated to the new task interface). Operator application is implemented for them, but does not support axioms. The problem of packing/unpacking currently exists for both state classes: GlobalState is always packed, which makes evaluating axioms slow when generating successors in the search algorithms. State is always unpacked which is a problem for heuristics like the blind heuristic or PDBs that only access very few facts. For example, if we currently generate a successor and then evaluate a pdb heuristic for it, the following happens: 1. StateRegistry::get_successor_state reserves space for packed state data and copies the the current packed state data there. 2. It then loops through the effects of the operator and checks for each one, if it fires. To do that, it unpacks each value individually. The effects are then set in the new packed state individually as well. 3. The AxiomEvaluator is called on the new packed state data and again accesses each value individually (read and write access). 4. A GlobalState object is created that points to the new packed state data. 5. A State object is created by accessing/unpacking each value individually and storing the result in an int array. 6. The PDB accesses the values for variables in its pattern. This is inefficient mostly in step 3. (evaluating axioms would be faster on unpacked states) and step 5./6. (The PDB only needs a few values and should not unpack the state). Malte mentioned in msg4326: > One possibility is to replace the current concrete State class with > an abstract interface with unpacking and non-unpacking implementations. Gabi currently also works on issue545 which will change the state class(es) once again.
msg3741 (view)	Author: malte	Date: 2014-10-09.10:48:52
Sure, sounds good.
msg3728 (view)	Author: florian	Date: 2014-10-08.20:54:46
What do you suggest as a next step? You pointed out a problem in the code review which accounts for some of the lost runtime, but it cannot account for the whole difference. Should we have a more detailed look at the profile together?
msg3721 (view)	Author: malte	Date: 2014-10-08.17:40:59
It makes a certain amount of sense that this causes overhead, but it's surprising that it's so large and that it varies so much from task to task (little difference for some tasks, more than five times the runtime for others). Perhaps there's something special about these airport tasks, such as a high rate of duplicates.
msg3720 (view)	Author: florian	Date: 2014-10-08.17:34:34
I ran callgrind on airport/p09 but did not have a close look at the results. On first glance it looks like pack/unpack just takes too much time. With the blind heuristic, we unpack every value even though we do not look it during heuristic calculation.
msg3717 (view)	Author: florian	Date: 2014-10-08.13:04:42
It might take me a while to get to this but I'll look at the profile for the Airport tasks. The code is in my bitbucket repository. Merging in the new default should have updated the existing pull request, so it now shows the diff of the versions we compared: https://bitbucket.org/flogo/downward-issues-issue348/pull-request/1/always-unpack-state/diff
msg3713 (view)	Author: malte	Date: 2014-10-08.12:02:03
The slowdown in, for example, some Airport tasks with blind search is quite large. Perhaps you can profile one of these? I'd be interested in having a look at the code, ideally a comparison of the two versions compared in the experiment. Can you set up a code review for this delta?
msg3700 (view)	Author: florian	Date: 2014-10-07.21:25:08
Results for the optimal suite are in (40MB): http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-v3-base-issue348-v3-compare.html Coverage and memory look good, but everything seems to run a little slower, and in some cases a lot slower. In the last reports we also compared to the code before issue214 (packed states), this is not done here. What do think would be a good next step (more reports, experiments for sat, profiling, ...)?
msg3670 (view)	Author: florian	Date: 2014-10-06.12:30:49
I will merge in the new code and re-run the experiments.
msg3638 (view)	Author: malte	Date: 2014-10-04.19:49:35
It would be good to make progress on this issue -- I think it's an important one. Can I do anything to help get this one moving again? Also, perhaps we should run some new experiments that give before/after numbers against the newest code version? The original experiments are now 4-5 months old, we've had many changes since then.
msg3407 (view)	Author: johannes	Date: 2014-09-03.13:23:02
This issue has implications for my numeric fast downward extension too. I currently tag variables in preprocess with "constant" (contains the value of a primitive numeric expression such as road length) "derived" (result of a numeric axiom) "instrumentation" (variable never occurs in preconditions -> only relevant to compute metrics) and "state-relevant". The state-relevant variables have to be packed, while states that only differ in the instrumentation variables can be considered the same state. Constants can be stored globally once (they never change) and the handling of derived variables has multiple possible solutions. Following florian the current solution would be to evaluate all axioms when unpacking the state and not store them in the packed buffer (and also do not evaluate them on the fly).
msg3188 (view)	Author: florian	Date: 2014-05-23.16:55:50
We still have an open issue in the code, maybe you could have a look. Its the TODO in state_registry.cc: the question is whether we can avoid to copy the state data by reserving uninitialized memory in the pool and then pack the state into this memory. (https://bitbucket.org/flogo/downward-issues-issue348/pull-request/1) Apart from that, we wanted to have a closer look (i.e. profile) at iPDB. It seems to be the config where the state packing has the largest negative effect.
msg3187 (view)	Author: malte	Date: 2014-05-23.15:55:06
Thanks! What's the next step?
msg3185 (view)	Author: florian	Date: 2014-05-23.15:22:07
I ran full experiments after Jendriks review for optimal domains: http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-v2-opt.html The full results for satisficing domains are for the code version before the review, but this should be fine, because we mostly changed minor things in the review http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-v1full-sat.html Both reports show a lot of failed merge and shrink configs because the configuration file in lab was out of date. Finally, I tried to create a report that only shows runs with a significant change in total time. It compares the code before issue214 to the code after this issue. It excludes a run if it wasn't solved in both configs, or it was solved in both configs in under 0.5 seconds. The remaining runs are only included if their runtime difference is at least 20%. http://ai.cs.unibas.ch/_tmp_files/pommeren/notablechangesbyrevisionreport.html#total_time
msg3180 (view)	Author: florian	Date: 2014-05-21.17:53:47
Argh, yes! The reports had the same name and the second one replaced the first. Looked ok in my browser because I checked the first before uploading the second. Let me try again: http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-v1-opt.html http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-v1-sat.html
msg3179 (view)	Author: malte	Date: 2014-05-21.17:49:49
Numbers look good! Did you want to give two different experiment links?
msg3178 (view)	Author: florian	Date: 2014-05-21.17:42:44
I did a first round of experiments for unpacking in every state and invited Jendrik to the following pull request: https://bitbucket.org/flogo/downward-issues-issue348/pull-request/1 If anyone else wants to have a look, let me know. The experiments only test the domains and configs Malte suggested in msg3049. We should run a larger set of benchmarks before the final merge. http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-base-issue348-v1-compare.html http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-base-issue348-v1-compare.html
msg3092 (view)	Author: florian	Date: 2014-03-28.12:00:38
Some notes from our last meeting: As a first step, we want to completely unpack the state everywhere outside the state registry, i.e., only pack it for storage. As I said offline, I'll finish some other things first and will then work on this issue if it is still open. I'll leave the issue unassigned until I actually have time to work on it in case anyone else wants to work on it in the meantime.
msg3082 (view)	Author: malte	Date: 2014-03-24.13:32:26
Any volunteers to lead this? I'm happy to give advice regarding the design etc.
msg3065 (view)	Author: malte	Date: 2014-03-15.16:31:52
Andrew reports (in msg3061 and msg3063 of issue214) a nice speed improvement for blind search in the tidybot domain by inlining the get/set methods for the packed states. See msg3061 for details. We should consider the question of inlining or not inlining in the context of this issue, since 1) inlining speeds up access to packed states, making unpacking less necessary 2) inlining might be less beneficial if we unpack states in one go, since there will be fewer calls to get/set. My suspicion regarding 1) is that we will want to unpack even if we inline calls to get/set. Regarding 2), we should measure and look at particularly bad domains/configurations.
msg3049 (view)	Author: malte	Date: 2014-03-14.01:33:48
For these and the related issues on the state space, I suggest we identify a set of domains and configurations that are particularly worth watching, and limit our initial experiments to this smaller subset. (Or, if you prefer to run all experiments on all benchmarks and configs, that's also fine, but I'd like to additionally have result tables that only talk about the "interesting subset".) I find it very hard to find problem cases in the 100 MB tables, since the summaries don't nearly tell the whole story. For example, we know that for this issue we mainly need to watch total time, and the total time summaries are rather useless with a large set of configs because the set of commonly solved instances becomes tiny. I would never have found h^CG + PSR-Large as a problem case if I hadn't been looking for it specifically in the detailed tables. Here are my candidates for configs to watch: - A* + blind - A* + h^2 - A* + ipdb - eager_greedy + h^CG (presumably with preferred ops? whatever the config in our satisficing experiment for issue214 was) - LM-Cut - BJOLP (The last two are not really problem cases, but it's good to have something not expected to be problematic to compare against.) Here are my candidates for domains: - tidybot (e.g. the opt variant) - pegsol (e.g. opt-11, but I expect pegsol-2008 should work equally well) - psr-large There may be other critical cases I am missing. Ideally, I'd be interested in a list of cases (config, instance) where total time is at least 0.5 seconds for the old code and increases by at least 20% in the new code in the experiments of issue214. Can someone prepare this, or if not, can someone make the properties files for the two main experiments (opt and sat) from this issue available? (Actually, it'd be nice to have the properties files either way.)
msg3043 (view)	Author: jendrik	Date: 2014-03-13.17:02:57
Rephrased title to make this issue more general. Originally this was only about the axiom evaluator. Quoting Malte from issue214: """ There is one outstanding point ...: the question if, when and where to unpack the state information. We already have issue348 related to the axiom evaluator that discusses this point, but this should not be limited to the axiom evaluator, as our results for heuristics that access state information many times show. I suggest we [...] deal with it soon, either straight away or after issue420, which should be quick. We can take the information from msg2994 and msg3004 to identify the most interesting configurations and benchmarks for this, and msg3001 has some additional considerations regarding IDA* etc. that might help guide the code design for the new issue. """
msg3041 (view)	Author: jendrik	Date: 2014-03-13.01:10:08
Other related things we should take care of: 1) The initial state representation contains unevaluated axiom values. 2) Update the comment above the declaration of g_initial_state_data in globals.h once 1) is done.
msg3033 (view)	Author: florian	Date: 2014-03-06.16:10:08
After we finished issu214 and issue420, we should try to refactor the axiom evaluator. States should be unpacked before axioms are evaluated. This would mean that the axiom evaluator works on unpacked state data and we could move the typedef for PackedStateEntry into the state class and make it private. We should then also make sure that axioms are evaluated on the global variable holding the initial state data. Currently this is done only when the initial state object is created by the state registry, which is confusing because the global variable name is not what it seems.
msg2327 (view)	Author: florian	Date: 2012-08-30.13:28:12
From Malte's comment in Rietveld: > The way that the axiom evaluator accesses the state at the moment will probably need some additional thought at some point. At the moment the evaluator accesses the State class directly and changes the state variables of it. In the patch for issue344 this is changed to a solution where the evaluation works directly on the state_var_t array and is called in the constructor for successor states and in the named constructor for the initial state.

History
Date	User	Action	Args
2021-01-30 18:32:59	silvan	set	messages: + msg9964
2021-01-30 17:57:52	jendrik	set	messages: + msg9962
2021-01-30 17:45:25	malte	set	messages: + msg9961
2021-01-30 17:38:21	florian	set	status: in-progress -> resolved messages: + msg9958
2021-01-30 15:57:58	malte	set	messages: + msg9957
2021-01-30 08:56:42	jendrik	set	messages: + msg9956
2021-01-30 00:18:52	florian	set	messages: + msg9955
2021-01-29 14:12:47	malte	set	messages: + msg9950
2021-01-29 13:51:39	malte	set	messages: + msg9948
2021-01-29 13:20:10	malte	set	messages: + msg9945
2021-01-29 13:18:44	florian	set	messages: + msg9944
2021-01-29 10:54:07	jendrik	set	messages: + msg9938
2021-01-28 15:52:53	florian	set	messages: + msg9886 summary: This is part of issue401 TODOs for prototype implementation: 1. ~~Return EvaluationContext instead of registered state in search engines.~~ 2. ~~Check for additional copies added by the change so far.~~ 3. ~~Unpack state data on demand.~~ 4. ~~Investigate the typical "life cycle" of a state and when it is unpacked/copied.~~ 5. ~~Decide when to unpack states.~~ 6. Unpack states in successor generator only in the presence of axioms and otherwise on demand. TODOs for final implementation: 1. Check that goalcount heuristic benefits from the changes in this issue (converting GlobalStates to States made it slower, see issue554). -> This is part of issue401 TODOs for prototype implementation: 1. ~~Return EvaluationContext instead of registered state in search engines.~~ 2. ~~Check for additional copies added by the change so far.~~ 3. ~~Unpack state data on demand.~~ 4. ~~Investigate the typical "life cycle" of a state and when it is unpacked/copied.~~ 5. ~~Decide when to unpack states.~~ 6. ~~Unpack states in successor generator only in the presence of axioms and otherwise on demand.~~
2021-01-28 15:47:52	florian	set	messages: + msg9884
2021-01-28 12:05:35	florian	set	messages: + msg9876
2021-01-27 23:29:39	florian	set	messages: + msg9872
2021-01-27 21:21:12	florian	set	messages: + msg9870
2021-01-27 21:00:25	florian	set	messages: + msg9869
2021-01-27 19:55:59	jendrik	set	messages: + msg9866
2021-01-27 13:15:52	florian	set	messages: + msg9856
2021-01-26 18:42:02	malte	set	messages: + msg9814
2021-01-26 16:27:36	florian	set	messages: + msg9813
2021-01-26 14:47:50	florian	set	messages: + msg9807
2021-01-26 12:12:18	silvan	set	nosy: + silvan
2021-01-25 14:37:14	malte	set	messages: + msg9805
2021-01-25 14:23:35	florian	set	messages: + msg9804
2021-01-25 13:33:22	malte	set	messages: + msg9803
2021-01-25 12:45:19	florian	set	files: + callgrind.airport.p08.base
2021-01-25 12:43:46	florian	set	files: + callgrind.airport.p08.version2-v3
2021-01-25 12:41:03	florian	set	messages: + msg9802
2021-01-24 13:24:22	florian	set	messages: + msg9801
2021-01-23 10:34:43	DominikDrexler	set	messages: + msg9800
2021-01-22 15:33:56	malte	set	messages: + msg9799
2021-01-22 13:40:11	florian	set	messages: + msg9796 summary: This is part of issue401 TODOs for prototype implementation: 1. ~~Return EvaluationContext instead of registered state in search engines.~~ 2. ~~Check for additional copies added by the change so far.~~ 3. ~~Unpack state data on demand.~~ 4. Investigate the typical "life cycle" of a state and when it is unpacked/copied. 5. Decide when to unpack states. TODOs for final implementation: 1. Check that goalcount heuristic benefits from the changes in this issue (converting GlobalStates to States made it slower, see issue554). -> This is part of issue401 TODOs for prototype implementation: 1. ~~Return EvaluationContext instead of registered state in search engines.~~ 2. ~~Check for additional copies added by the change so far.~~ 3. ~~Unpack state data on demand.~~ 4. ~~Investigate the typical "life cycle" of a state and when it is unpacked/copied.~~ 5. ~~Decide when to unpack states.~~ 6. Unpack states in successor generator only in the presence of axioms and otherwise on demand. TODOs for final implementation: 1. Check that goalcount heuristic benefits from the changes in this issue (converting GlobalStates to States made it slower, see issue554).
2021-01-22 13:36:12	florian	set	files: + callgrind.miconic.s8-4.version1-v2
2021-01-22 13:35:53	florian	set	files: + callgrind.miconic.s8-4.base messages: + msg9795
2021-01-22 13:32:57	florian	set	messages: + msg9794
2021-01-22 09:39:43	DominikDrexler	set	messages: + msg9788
2021-01-21 16:03:05	malte	set	messages: + msg9786
2021-01-21 15:40:33	jendrik	set	messages: + msg9785
2021-01-21 13:29:05	malte	set	files: + benchmark.cc messages: + msg9783
2021-01-21 13:08:13	malte	set	messages: + msg9780
2021-01-21 12:51:23	malte	set	messages: + msg9779
2021-01-21 12:36:00	florian	set	messages: + msg9778
2021-01-21 12:21:17	florian	set	messages: + msg9777
2021-01-21 12:00:00	malte	set	messages: + msg9775
2021-01-21 11:58:45	florian	set	messages: + msg9774 summary: TODOs for prototype implementation: 1. ~~Return EvaluationContext instead of registered state in search engines.~~ 2. ~~Check for additional copies added by the change so far.~~ 3. Unpack state data on demand. TODOs for final implementation: 1. Check that goalcount heuristic benefits from the changes in this issue (converting GlobalStates to States made it slower, see issue554). -> This is part of issue401 TODOs for prototype implementation: 1. ~~Return EvaluationContext instead of registered state in search engines.~~ 2. ~~Check for additional copies added by the change so far.~~ 3. ~~Unpack state data on demand.~~ 4. Investigate the typical "life cycle" of a state and when it is unpacked/copied. 5. Decide when to unpack states. TODOs for final implementation: 1. Check that goalcount heuristic benefits from the changes in this issue (converting GlobalStates to States made it slower, see issue554).
2021-01-21 11:53:19	florian	set	messages: + msg9773
2021-01-20 19:09:16	DominikDrexler	set	messages: + msg9772
2021-01-20 14:25:05	florian	set	messages: + msg9771
2021-01-19 11:07:47	DominikDrexler	set	messages: + msg9764
2021-01-19 11:07:07	DominikDrexler	set	messages: - msg9763
2021-01-19 11:05:57	DominikDrexler	set	nosy: + DominikDrexler messages: + msg9763
2021-01-18 16:33:38	florian	set	messages: + msg9762
2019-06-19 09:18:50	jendrik	set	messages: + msg8920
2019-06-18 15:10:28	jendrik	set	messages: + msg8919
2019-06-17 11:14:54	florian	set	messages: + msg8915
2019-06-17 10:59:20	malte	set	messages: + msg8914
2019-06-17 10:37:44	guillem	set	nosy: + guillem messages: + msg8913
2019-06-14 21:55:35	florian	set	messages: + msg8911
2019-06-14 21:50:27	florian	set	messages: + msg8910
2019-06-14 15:07:55	jendrik	set	messages: + msg8906
2019-06-14 13:58:46	florian	set	messages: + msg8904
2019-06-12 18:18:38	jendrik	set	messages: + msg8891
2019-06-12 17:25:30	jendrik	set	status: chatting -> in-progress messages: + msg8888
2019-06-12 13:53:23	jendrik	set	messages: + msg8880
2019-06-05 21:26:08	jendrik	set	messages: + msg8822
2019-06-05 12:37:42	jendrik	set	messages: + msg8809 summary: TODOs for prototype implementation: 1. Return EvaluationContext instead of registered state in search engines. 2. Check for additional copies added by the change so far. TODOs for final implementation: 1. Check that goalcount heuristic benefits from the changes in this issue (converting GlobalStates to States made it slower, see issue554). -> TODOs for prototype implementation: 1. ~~Return EvaluationContext instead of registered state in search engines.~~ 2. ~~Check for additional copies added by the change so far.~~ 3. Unpack state data on demand. TODOs for final implementation: 1. Check that goalcount heuristic benefits from the changes in this issue (converting GlobalStates to States made it slower, see issue554).
2019-06-05 10:22:20	florian	set	files: + default.callgrind.out
2019-06-05 10:22:07	florian	set	files: + branch.callgrind.out messages: + msg8806
2019-06-04 23:40:37	jendrik	set	messages: + msg8805
2019-06-03 17:46:14	cedric	set	messages: + msg8791 summary: TODOs (incomplete): 1. Check that goalcount heuristic benefits from the changes in this issue (converting GlobalStates to States made it slower, see issue554). -> TODOs for prototype implementation: 1. Return EvaluationContext instead of registered state in search engines. 2. Check for additional copies added by the change so far. TODOs for final implementation: 1. Check that goalcount heuristic benefits from the changes in this issue (converting GlobalStates to States made it slower, see issue554).
2019-06-03 14:05:41	cedric	set	nosy: + cedric
2018-09-17 22:00:21	florian	set	messages: + msg7500
2015-10-29 14:02:12	jendrik	set	summary: TODOs (incomplete): 1. Check that goalcount heuristic benefits from the changes in this issue (converting GlobalStates to States made it slower, see issue554).
2015-07-07 22:20:58	florian	set	nosy: + gabi messages: + msg4337
2014-10-09 10:48:52	malte	set	messages: + msg3741
2014-10-08 20:54:46	florian	set	messages: + msg3728
2014-10-08 17:40:59	malte	set	messages: + msg3721
2014-10-08 17:34:50	florian	set	files: + callgrind.airport.p09.patched
2014-10-08 17:34:34	florian	set	files: + callgrind.airport.p09.default messages: + msg3720
2014-10-08 13:04:42	florian	set	messages: + msg3717
2014-10-08 12:02:03	malte	set	messages: + msg3713
2014-10-07 21:25:09	florian	set	messages: + msg3700
2014-10-06 12:30:49	florian	set	messages: + msg3670
2014-10-04 19:49:35	malte	set	messages: + msg3638
2014-09-03 13:23:02	johannes	set	nosy: + johannes messages: + msg3407
2014-05-23 16:55:50	florian	set	messages: + msg3188
2014-05-23 15:55:06	malte	set	messages: + msg3187
2014-05-23 15:22:07	florian	set	messages: + msg3185
2014-05-21 17:53:47	florian	set	messages: + msg3180
2014-05-21 17:49:49	malte	set	messages: + msg3179
2014-05-21 17:42:44	florian	set	messages: + msg3178
2014-05-21 14:46:09	florian	set	assignedto: florian
2014-03-28 12:00:38	florian	set	messages: + msg3092
2014-03-24 13:32:26	malte	set	messages: + msg3082
2014-03-15 16:31:52	malte	set	messages: + msg3065
2014-03-15 16:27:05	andrew.coles	set	nosy: + andrew.coles
2014-03-14 01:33:48	malte	set	messages: + msg3049
2014-03-13 17:02:57	jendrik	set	messages: + msg3043 title: Rethink the way that the axiom evaluator accesses the state -> Rethink how and when to unpack states
2014-03-13 01:10:08	jendrik	set	nosy: + jendrik messages: + msg3041
2014-03-06 16:10:08	florian	set	status: unread -> chatting messages: + msg3033
2012-08-30 13:28:12	florian	create

Issue348