Issue385

Title Remove StateRepresentation class
Priority wish Status resolved
Superseder Nosy List erez, florian, gabi, jendrik, malte, mkatz, silvan
Assigned To florian Keywords
Optional summary

Created on 2013-08-01.02:17:50 by florian, last changed by florian.

Messages
msg2759 (view) Author: malte Date: 2013-12-05.14:08:06
Yes, please merge.
msg2758 (view) Author: florian Date: 2013-12-05.11:41:40
Experiments for satisficing configs (WARNING: 77 MB)

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue385-exp3-v3-sat.html

From my side, this looks ready to merge.
msg2739 (view) Author: florian Date: 2013-12-02.11:29:53
New experiments for optimal configs (WARNING: 50MB each)

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue385-exp2-v3.html

The results look good so far. I'll start running experiments for satisficing
configs.
msg2728 (view) Author: florian Date: 2013-11-27.12:42:46
The activity tab shows messages grouped by thread and sorted by the date of the
last message. If the code file, where the thread started, changed in the
meantime, a yellow/gray box in the frame shows that it is "outdated".
I like this view compared to the one in rietveld because it keeps the threads
that are still being discussed at the top, even if the code changes.

I fixed some minor things (it looks like you are already reviewing them, but its
mostly comments and refactoring). I still have some open questions, but I guess
they are easier to discuss offline, if you have some free time tomorrow.
msg2727 (view) Author: malte Date: 2013-11-26.23:31:30
All the better.

I was looking at some code shown to me in the "Activity" tab on Bitbucket after
clicking at the email notifications I got from Bitbucket. (I don't fully
understand how these messages are tied to changesets, but it doesn't really
matter. I can't find the discussions there anymore.)
msg2726 (view) Author: florian Date: 2013-11-26.17:40:27
Thanks for the comments, I'll go over them now.

I didn't find the borrowed_buffer flag in the code. Which version were you
looking at? In the pull request the flag is remove at this position:

https://bitbucket.org/flogo/downward-issues-issue385/pull-request/1/remove-staterepresentation-class/diff#Lsrc/search/state.hF16
msg2725 (view) Author: malte Date: 2013-11-25.14:18:13
I think I replied to all comments on bitbucket. If there are no further open
questions here, please prepare another diff and run another experiment. (Sorry
about that, but I think that's always needed for changes at this level. Hope
it's not too much work.)

Going over the discussion, I found this comment in msg2664:

> The result of this would be that every state is registered and has borrowed
> state data. The state class would be simplified, because borrowed_buffer is
> not needed any more. We also would not need to check for invalid ids anymore,
> because only valid ids are constructed.

The code I looked at still has the borrowed_buffer flag. Am I missing something?
msg2673 (view) Author: florian Date: 2013-09-25.11:55:29
Both versions mentioned in msg2664 show almost no changes in coverage, memory or
time. I suggest we review and merge version 2 which has in-place state creation
and the simplified state class.

Here are the results (WARNING: 50MB each)

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue385-exp1-compare-issue385-v1.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue385-exp1-compare-issue385-v2.html

And the diff is here:
https://bitbucket.org/flogo/downward-issues-issue385/compare/issue385-v2..issue385-base#diff

I'll give Malte access in a minute. If anyone else is interested, tell me.
msg2668 (view) Author: malte Date: 2013-09-20.14:32:36
Sounds good as a temporary measure, but can we introduce a g_initial_state()
helper function for this? In the medium term, we should get rid of the global
registry, of course.

Regarding invalid IDs, it might make sense to have an ID that represents
"null/no state". I wouldn't call it "invalid", though -- that has the wrong
connotations for me. It's not that the initial node has an invalid parent, but
rather that it has no parent.

> The second place is in SearchNodeInfo where parent state ids are invalid for
> new nodes and the initial state node. In this case I don't see an obvious way
> to avoid the invalid id.

It would make logical sense that new nodes must be created with their parent
already set, so there's no invalid parent at any time.

Regarding the initial state, a common solution for structures of this kind where
"null" values are impossible for some reason is to let it be its own parent.
(I'm not saying I prefer this solution over allowing invalid IDs, just that this
is a common way to handle this.)


I'd prefer to review after the experiments.
msg2667 (view) Author: florian Date: 2013-09-20.10:54:50
Making every state registered means that the global variable for the initial
state would also have to be registered. I think this would be confusing if
searches start using their own registries. My suggestion is to remove it and use
"g_state_registry->get_initial_state()" for now. If we change the code so
different registries are used we have to check all occurrences of
g_state_registry anyway. I also added caching in state registry to avoid
recreating the initial state on every call.

About the state ids: I thought it would be possible to avoid having a value for
"invalid id", but I'm not sure anymore. Aside from assertions there are only two
places where invalid state ids are used. The first is in lazy search and is used
to distinguish the first iteration from all other iterations. This could be
avoided by using a flag. The second place is in SearchNodeInfo where parent
state ids are invalid for new nodes and the initial state node. In this case I
don't see an obvious way to avoid the invalid id.

Apart from the invalid state ids I now have both version ready. Malte, do you
want to review them before the experiments or after?
msg2665 (view) Author: malte Date: 2013-09-19.17:49:36
All sounds good to me.
msg2664 (view) Author: florian Date: 2013-09-19.11:17:24
I think it would make sense to do this now, because it would reduce code complexity.

The main change would be that only the StateRegistry class is allowed to
construct new states. It would get two factory methods "get_initial_state()" and
"get_successor(s, op)". Both these methods would construct the state data
in-place in the segmented vector (state_data_pool), try to add a new id for it
and remove the state data again if they find an existing id.

The result of this would be that every state is registered and has borrowed
state data. The state class would be simplified, because borrowed_buffer is not
needed any more. We also would not need to check for invalid ids anymore,
because only valid ids are constructed.

This would also help with temporary states because we could easily add a
reference or pointer to the creating state registry to each state.
PerStateInformation could then use a hash_map to map from registries to vectors,
as Malte suggested in issue386.


I currently have a version of the code ready for review that uses StateIDs
instead of StateHandles but sill copies the state data. If there are no
objections, I'll implement the idea above on top of that and then evaluate both
versions.
msg2654 (view) Author: malte Date: 2013-09-16.17:40:31
I doubt this will make a real difference, as there is no memory management
involved in either case. The "delete" part in particular is essentially free.

The copying of the data is a bit unfortunate; later on, we might try to set it
up so that when the state is packed in the first place, it is constructed right
where we want it to live (if it turns out to be new). But I wouldn't worry about
this until we can see a measurable drop in speed.
msg2650 (view) Author: florian Date: 2013-09-16.17:34:32
Is there a way we can avoid copying the state in option 1)? Otherwise every
lookup of a created state leads to this state being copied once and then deleted
again if it already was registered.
msg2649 (view) Author: malte Date: 2013-09-16.17:30:42
I prefer option 1) -- I think it fits the overall logic of the code quite well
and leads to no special cases in the hash code.

I was thinking before that StateHandles in their current form would be
redundant, yes.


> Also:
> Is there an advantage of
>
> struct StateId {
>   int value;
> }
> 
> over
> 
> typedef int StateId;

Yes, type-safety and making it possible to find out where state IDs are used if
we want to do anything that affects state IDs in general. BTW, the class should
be called StateID, as I think we don't lower-case upper-case words/abbreviations
in class names (e.g. PDB, not Pdb).
msg2648 (view) Author: florian Date: 2013-09-16.17:25:39
The hash function in StateRegistry has to be able to deal with unregistered
states. The get_handle method currently creates a new handle for a state,
inserts it into the hash_set and makes the state permanent if the insertion was
successful (otherwise it is a known state). What do you prefer for StateIds in
this method:

1) Adding the new state to the segmented vector (= copy it), insert its id in
the hash_set and remove it from the segmented vector if it was known already.
2) Using a field like current_state in the hash and equality methods and use it
whenever the StateId is equal to the size of the container. Then insert the
StateId "container.size()" and make the State permanent if this was successful.
3) Something else?

Another thing:
If StateHandles are not stored in containers, do we really need them?
Currently most lines that use StateHandles call search_space.get_node(handle) or
the constructor State(handle) shortly after getting the handle. Could we remove
the class StateHandle entirely and just use StateId for containers and
(references to) State everywhere else?


Also:
Is there an advantage of

struct StateId {
  int value;
}

over

typedef int StateId;
msg2601 (view) Author: malte Date: 2013-08-01.10:59:37
I think the StateRepresentation can indeed go.

We should have a StateID class, which stores just an int, and which can be used
e.g. inside SearchNodeInfo or OpenLists. StateRegistry could then use a hash of
such StateIDs.

Regarding StateHandle, I think the best way to proceed is to have it contain an
ID, a reference to a state registry, and possibly (if this is helpful for
performance) cached state data. State handles would not be stored inside
containers (as they currently are in the search space), but rather only be used
to actually work with states.
msg2585 (view) Author: florian Date: 2013-08-01.02:17:50
We split issue344 into more manageable chunks. The main issue branch will soon
be merged and this is a possible follow-up work. I nosied everyone from the
original issue, but feel free to erase yourself from nosy, if you are not
interested.

The StateRepresentation class is currently stored in the StateRegistry to
associate ids with state data and find duplicates in a hash set. Now that the
state data is stored in a SegmentedArrayVector we can use an integer in the hash
set. The hash and equality functions would then need to know the
SegmentedArrayVector and look up the index there for a semantic hash/comparison
of the state.

The StateHandle class would then only need to store the index as well (maybe we
can even replace the class by a typedef or just use ints directly). This means
that constructing a state from a state handle needs to know the StateRegistry
where the state is stored. We should remove the constructor from the State class
and use a factory method in StateRegistry instead.
History
Date User Action Args
2013-12-05 15:07:07floriansetstatus: reviewing -> resolved
2013-12-05 14:08:06maltesetmessages: + msg2759
2013-12-05 11:41:40floriansetmessages: + msg2758
2013-12-02 11:29:53floriansetmessages: + msg2739
2013-11-27 12:42:46floriansetmessages: + msg2728
2013-11-26 23:31:30maltesetmessages: + msg2727
2013-11-26 17:40:27floriansetmessages: + msg2726
2013-11-25 14:18:13maltesetassignedto: florian
messages: + msg2725
2013-09-25 11:56:40floriansetstatus: chatting -> reviewing
2013-09-25 11:55:29floriansetmessages: + msg2673
2013-09-20 14:32:36maltesetmessages: + msg2668
2013-09-20 10:54:50floriansetmessages: + msg2667
2013-09-19 17:49:36maltesetmessages: + msg2665
2013-09-19 11:17:24floriansetmessages: + msg2664
2013-09-16 17:40:31maltesetmessages: + msg2654
2013-09-16 17:34:32floriansetmessages: + msg2650
2013-09-16 17:30:42maltesetmessages: + msg2649
2013-09-16 17:25:39floriansetmessages: + msg2648
2013-08-01 10:59:37maltesetmessages: + msg2601
2013-08-01 02:17:50floriancreate