Issue849

Title M&S refactoring part 2: split off merge-and-shrink algorithm from heuristic
Priority feature Status resolved
Superseder Nosy List jendrik, malte, silvan
Assigned To silvan Keywords
Optional summary

Created on 2018-10-02.18:53:07 by silvan, last changed by silvan.

Files
File name Uploaded Type Edit Remove
mas-files.tar.gz silvan, 2018-10-07.09:30:13 application/gzip
Messages
msg7997 (view) Author: silvan Date: 2018-10-19.11:09:32
Merged.
msg7996 (view) Author: silvan Date: 2018-10-19.10:37:48
You are right, I changed the default in Lab (don't remember why, though). I
reverted that and you can now reload the report with the correct summed
aggregation:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue849-v1-issue849-base-issue849-v1-compare.html
msg7986 (view) Author: malte Date: 2018-10-19.01:13:15
Experimental results look good. However, I wonder why things like
"score_total_time" show the arithmetic mean rather than the sum. I think I'm
used to seeing the sum in other experiments. Is that because you use a different
experiment script compared to, e.g., Jendrik and Florian?

The issue with the arithmetic mean is that we only get two significant digits,
which is not a whole lot. With the sum we get something like five significant
digits because we have three digits in front of the decimal point.

Not a reason to rerun things, of course. Apart from the comments I left on the
commits, I think this is ready to merge.
msg7985 (view) Author: malte Date: 2018-10-19.01:10:00
I've looked at the individual commits and left a few small polishing comments.
Apologies if any of the things I'm asking should be obvious from previous
discussion; it's hard to see previous comments on the pull request when looking
at individual commits.

None of the commits since the experiments warrant rerunning the experiments, but
I haven't had a look at the experiments in the first place. :-) (When I did the
code review, the tracker was down.) So let me have a a quick look...
msg7982 (view) Author: silvan Date: 2018-10-18.19:46:22
Issue tracker is back online :-) For the sake of completeness: Malte left some
comments at bitbucket, I addressed them. Do you want to go through the small
commits or should I merge? Or even re-run experiments?
msg7921 (view) Author: silvan Date: 2018-10-08.12:40:17
This is how I implemented it so far, but still keeping everything related to
Options in the algorithm class to allow easier use of that class. Of course, if
we argue that only the M&S heuristic is and can use it, then we could leave
everything related to options there.
msg7919 (view) Author: malte Date: 2018-10-08.12:16:10
I haven't looked at the code yet, so without the benefit that the code gives: my
first intuition is that I would not expose MergeAndShrinkAlgorithm as a
command-line plug-in until it can be used in more than one context.
msg7918 (view) Author: silvan Date: 2018-10-08.11:34:53
Thanks, I addressed the minor ones.

Regarding your comment "MergeAndShrinkAlgorithm should be constructed by
passing in the options explicitly rather than an OptionParser object", I think
that this is linked to the design discussion below.

If you suggest to go down the route to not treat the algorithm "plugin-like",
then we also shouldn't have functions for it to add options to parser (why
would a class provide such functions but then not accept an Options object for
its construction?), to dump options, to verify options etc. Then all this
should remain in the heuristic.

Another alternative of course is to turn the algorithm into a plugin as well,
possibly adding a short-hand alias like
"merge_and_shrink(...)=mas_heuristic(mas_algorithm(...))" to avoid additional
nesting. Malte and I weren't really sure at which point it would be justified
to make the algorithm pluggable; right-away or only later when we might have
other users than the M&S heuristic.
msg7912 (view) Author: jendrik Date: 2018-10-07.21:10:19
I left a few comments on Bitbucket.
msg7910 (view) Author: silvan Date: 2018-10-07.18:24:22
(I ran some experiments to confirm that nothing went wrong:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue849-v1-issue849-base-issue849-v1-compare.html)
msg7909 (view) Author: malte Date: 2018-10-07.15:21:31
I will also try to find time for this ASAP. But as I'm travelling today and
tomorrow and Tuesday will be my first day meeting everyone in Toronto, this
likely won't happen before Wednesday Toronto time, i.e., very late on Wednesday
or early on Thursday in Europe.
msg7906 (view) Author: silvan Date: 2018-10-07.09:30:13
Since this issue will cause many merge conflicts with my other repositories, I'd
like to get it merged very soon. As it further doesn't change any of the
existing functionality, maybe a heavy-weight review is not necessary besides
discussing the design questions that I raised below, thus answering where which
code should end up.

@Jendrik: could you maybe read through the discussion and also have a look at
the code? To actually get a diff and not just a new file, it could be helpful to
compare the old merge_and_shrink_heuristic.* against the new
merge_and_shrink_algorithm.*. I attached these four files.
msg7855 (view) Author: malte Date: 2018-10-03.16:42:50
> Do you think you can review this soon, or should I add it to the queue?

Hard to tell. One option is to just add it to the queue, which doesn't
immediately imply that it will take a long time to review. ;-) Or wait a day or
two, and if nothing happens, add it to the queue then.
msg7846 (view) Author: silvan Date: 2018-10-03.15:21:36
I prepared a pull-request at
https://bitbucket.org/SilvanS/fd-dev/pull-requests/41/issue849/diff

There are certainly several possible ways of splitting the responsibilities
between heuristic and algorithm, and I for now decided to move almost everything
into the algorithm. That means that the only thing the heuristic has to do is to
use the algorithm to compute an FTS and to then determine which factor to use
(there are several if one is unsolvable, and it uses an unsolvable in this case;
otherwise, there is exactly one). 

This also means that while all options related to M&S are of course still being
added by the heuristic (which is the object created from the command line), the
options are added (via public static methods), extracted, and dumped only in the
algorithm. This allows creating the algorithm with a single options object. This
could certainly be changed towards a constructor accepting a list of individual
options, and I wouldn't mind changing this if you found this the better design.

Another design question is whether the algorithm should store all the things it
needs when being constructed and be called only with the task object to compute
the FTS, or whether it should be a one-shot factory that should not store
anything (we could use an external function in that case and internally use a
class to avoid passing around many arguments). My current solution uses the
former design. (Also note that the algorithm could not be directly reused to
compute another FTS for another task, at least not without modifying it to keep
the merge strategy factory around and to reset various other things such as
label reduction.)

Do you think you can review this soon, or should I add it to the queue?
msg7831 (view) Author: silvan Date: 2018-10-02.18:53:07
This is part of meta issue567. In order to more easily integrate a time limit
(issue802), we think it will be very useful to separate the computation of the
merge-and-shrink algorithm from its user, the heuristic.
History
Date User Action Args
2018-10-19 11:09:32silvansetstatus: reviewing -> resolved
messages: + msg7997
2018-10-19 10:37:48silvansetmessages: + msg7996
2018-10-19 01:13:15maltesetmessages: + msg7986
2018-10-19 01:10:00maltesetmessages: + msg7985
2018-10-18 19:46:22silvansetmessages: + msg7982
2018-10-08 12:40:17silvansetmessages: + msg7921
2018-10-08 12:16:10maltesetmessages: + msg7919
2018-10-08 11:34:53silvansetmessages: + msg7918
2018-10-07 21:10:19jendriksetmessages: + msg7912
2018-10-07 18:24:22silvansetmessages: + msg7910
2018-10-07 15:21:31maltesetmessages: + msg7909
2018-10-07 09:30:14silvansetfiles: + mas-files.tar.gz
nosy: + jendrik
messages: + msg7906
2018-10-03 16:42:50maltesetmessages: + msg7855
2018-10-03 15:21:36silvansetstatus: in-progress -> reviewing
messages: + msg7846
2018-10-02 18:53:07silvancreate