Issue1040

Title Rewrite option parser to support the component interaction problem
Priority feature Status chatting
Superseder Nosy List florian, jendrik, malte, silvan
Assigned To Keywords
Optional summary
TODO:
* ~~cast and transform the return values~~
* handle errors
* ~~generate help from the information in the registry~~
* ~~add the option to access (enriched) ASTNode~~
* ~~allow validation and rewriting options~~
* ~~add let syntax and variables~~
* ~~rewrite command line string in the presence of predefinitions~~
* use the new option parser instead of the old one for all existing plugins
* remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md)

Created on 2022-02-01.10:01:26 by florian, last changed by florian.

Summary
TODO:
* cast and transform the return values
* handle errors
* generate help from the information in the registry
* add the option to access (enriched) ASTNode
* allow validation and rewriting options
* add let syntax and variables
* rewrite command line string in the presence of predefinitions
* use the new option parser instead of the old one for all existing plugins
* remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md)
Messages
msg10664 (view) Author: florian Date: 2022-03-11.09:58:35
Help is now generated from the new registry. We will only be able to check if it does the same thing as the old code once we switch over all plugins, though.
msg10647 (view) Author: malte Date: 2022-03-08.13:36:03
A split into two directories sounds good to me. The more the two parts can be decoupled, the more sense it makes. From what I know so far about the code, I can only comment on this in general terms.
msg10644 (view) Author: florian Date: 2022-03-08.12:05:49
Looking at this again, options.* and any.h belong to "plugins", not "parser", so that would also mean that most classes only need to include headers from "plugins".
msg10643 (view) Author: florian Date: 2022-03-08.11:50:20
One thing Patrick and I discussed was that we could probably split the directory "parser" into two directories (= CMake plugins):

parser:
  lexical_analyzer.*
  syntax_analyzer.*
  abstract_syntax_tree.*
  enriched_abstract_syntax_tree.*
  options.*
  any.h
  errors.*

plugins:
  plugin_info.*
  plugin.*
  bounds.*
  raw_registry.*
  registry_types.h 
  registry.*

If we are not overlooking something "parser" would depend on "plugin" but not the other way around. To define a plugin, you'd probably still need includes from both modules (plugins/plugin.h for the plugin class, parser/options.h for constructing the object).

Should we still split the code to decouple it a bit more?
msg10641 (view) Author: florian Date: 2022-03-08.11:30:09
We are done with the let/var syntax and rewriting the old-style predefinitions.

Error-handling and help generation is still missing before we can start switching classes to the new parser.

Malte if you have time for a review, this would be a good time to review the main part of the changes (the missing parts are relatively independent and if we want to do some major changes it would be good to know this before we touch the existing plugins). The "reviewing guide" in msg10580 is still up to date. By now we removed the unused semantic_analyzer files that are mentioned at the end of that message.
msg10615 (view) Author: florian Date: 2022-02-21.10:33:47
I added a TODO to the summary: once we have all the functionality in place, we have to use the new code for all plugins.
msg10614 (view) Author: florian Date: 2022-02-21.10:32:05
Yes, the plan is to get rid of this again once we no longer need it for the iterated search. I also put comments about this in the code. So let's leave it in this state for now.
msg10613 (view) Author: malte Date: 2022-02-20.16:36:13
The virtual clone method is one of the oldest design patterns. Makes sense to me. I understand correctly that the intention is to remove this again later? In that case, I'd do it this way and move on. Otherwise I'd spend more time on the design. (I haven't looked at the diff.)
msg10612 (view) Author: florian Date: 2022-02-20.14:02:02
I added a hack to support the lazy parsing we use for the iterated search. In theory, this is not that complicated, but in practice, we have to make our parse tree copyable.

This is complicated by the fact that the parse tree is not a single class but a tree of objects that are from different derived classes (ListNode, FunctionCallNode, etc. all derived from EnrichedASTNode). When copying, we only have a polymorphic pointer to the base class, so we don't know what class to construct. This page (https://www.fluentcpp.com/2017/09/08/make-polymorphic-copy-modern-cpp/) mentioned the "classical solution" to this problem is a virtual "clone" method, that is overridden in the derived classes, so they construct the correct type. Like this:

unique_ptr<EnrichedASTNode> ListNode::clone() {
  return make_unique_ptr<ListNode>(*this);
}

The clone methods work for creating a copy of the parse tree as a polymorphic unique_ptr<EnrichedASTNode> but we then also have to store it in an Options object, which nests it inside an Any object. Any requires its content to be copyable, so the unique_ptr doesn't work and we want a shared pointer instead. To do so, I added a second set of clone methods to all classes that return a shared pointer instead of a unique pointer. I don't see a way around this, because internally unique pointers are used, so we cannot just clone everything with shared pointers.

Anyway, my point with this rant is that the hack requires a lot of code changes that we only need for the hack. I would have liked to keep the hack more local so it doesn't infect too many other parts while we are still working on the component interaction problem (it can disappear after that). But I could not think of a way that would lead to a smaller diff. 

If you want to see the diff, it is in this commit:
https://github.com/aibasel/downward/pull/95/commits/2a5799b4f28f8d8869db0532e9c165602a187564
msg10609 (view) Author: florian Date: 2022-02-15.16:46:34
We added a way to preprocess the options. This can be used for validation (like opts.verify_list_not_empty() in the old code) or for creating a new options object based on the old one (as ipdb and astar do).
msg10608 (view) Author: malte Date: 2022-02-15.14:00:14
I've added that removing ext also requires updating README.md.
msg10607 (view) Author: florian Date: 2022-02-15.11:26:52
One more TODO: once we are done, we should remove code that is only used in the old parser. The directory src/search/ext/* can probably go completely and is easy to forget.
msg10604 (view) Author: florian Date: 2022-02-11.13:37:57
Casting and transforming the return values from an options object is now implemented.
msg10595 (view) Author: florian Date: 2022-02-11.09:06:30
I agree that once we switch to the builders, both of the points you mentioned should not be necessary anymore on the parser side. If we want to do this in two steps, we need them as a temporary solution.

I'll ask around if anyone else wants to do a review. I would be good if more people are familiar with the design anyway.
msg10586 (view) Author: malte Date: 2022-02-10.19:20:29
Thanks, Florian! I had a look at the pull request, but I won't find enough time to look at it in detail during the sprint, it's too much for that. If you want, we can arrange a meeting to talk about things.

I looked at the pull request a little bit and have no further comment from this, but this might be different with more time. Perhaps someone else can review it?

Regarding the summary:

"* add the option to access (enriched) ASTNode"

The way this is currently done in the iterated search is absolutely terrible. Once all pieces are in place, there should be no need for something like this; this is one of the things we want to fix while addressing the component interaction problem. In the meantime of course it's fine to design something temporary to address the need we have, but this shouldn't influence the final design.

"* allow validation and rewriting options"

Many of the things mentioned here sound to me like things that would happen on the builder side and are not relevant to the option parser. Perhaps this is also something where the design will need to evolve over time, but my instinct would be to be minimalistic with these kinds of things on the option parser side.
msg10581 (view) Author: florian Date: 2022-02-10.16:48:48
I added the major TODOs that are still left. 

* cast and transform the return values
  A vector<vector<int>> is currently generated as a vector<Any>, where the Any holds
  another vector<Any>, where this Any holds an int. We have find a way to turn this
  into a vector<vector<int>> inside options.get<vector<vector<int>>>().

* handle errors
  We completely ignored this so far and have only placeholder error messages.

* generate help from the information in the registry
  We also completely ignored this so far.

* add the option to access (enriched) ASTNode.
  The plan is to allow not parsing part of the config. The iterated search current
  needs this, so the (potentially expensive) components of the second search are not
  constructed before the first search finished. This is probably no longer needed
  once we handle the component interaction problem, so for now, we could live with a
  hacky solution.

* allow validation and rewriting options
  Our plan here is to call a virtual function in Plugin<>::construct() after the
  options are constructed and before the object itself is constructed. This function
  should work like a filter in lab, i.e., it can validate the options and fail if they
  are not acceptable (e.g., empty list of heuristics). But it can also change things
  in the options (e.g., set an argument that functionally depends on a other
  arguments) or even construct some objects, set them into a new Options object and
  return the new one. This way, we could handle almost all of the special cases we
  have so far.

* add let syntax and variables
  See msg10476. We have not worked on this so far.

* rewrite command line string in the presence of predefinitions
  See msg10476. We have not worked on this so far.
msg10580 (view) Author: florian Date: 2022-02-10.16:28:36
Most of what I wrote in msg10510 and 10509 is still up to date but we had to change the details a bit. For example, it is not possible to call a virtual method in the constructor of plugin as we planned, so we now handle the code in the constructor. This means that the plugin is registered before the meta-data is filled in but this is OK, as the raw registry stores the plugin as a pointer, so the order of filling in the data and registering doesn't matter.

We also added an additional step in the parsing process that creates an enriched version of the abstract syntax tree from the original AST. This way, we do not have to modify the objects as the information about function signatures is added from the registry.

We prepared a pull request for the current state at:
https://github.com/aibasel/downward/pull/95
it adds the new code in a new CMake plugin and namespace, so it doesn't conflict with the old code, while we are working on it. In command_line.cc and parser_tests.h we hacked in a test case that should not stay like this, but it shows how the external interface should change (we might hide some of the individual steps used in command_line.cc behind a nicer interface).

For reviewing the changes, I would recommend following the order of the parser:

1) lexical_analyzer.* parses the string into a stream of tokens.
2) syntax_analyzer.* parser the token stream into an abstract syntax tree
3) abstract_syntax_tree.* represents this tree and has methods to create an "enriched" copy of itself.
4) enriched_abstract_syntax_tree.* represents this enriched data and has methods to construct the objects. In this step we use the classes Options and Any (from options.*, any.h). So far, these are copied from the old module and probably need some more work.

Steps 3 and 4 rely on the plugin data which is collected in the registry, which has its own life cycle and can be reviewed independently:

1) plugin_info.*, plugin.*, bounds.* define the classes we create and store in the registry. There are three kinds of plugins: TypePlugins define the types of plugins that can be created (SearchEngine, Evaluator, ...), Plugins define the actual things we want to parse (EagerSearch, BlindHeuristic, ...), GroupPlugins define groups that we can sort plugins into for structuring in the wiki (PDB heuristics, Potential heuristics, ...)

2) In the plugin constructor, the plugin registers itself in the raw registry (raw_registry.*) that collects all of this data without error checking. It uses types from registry_types.h 

3) The raw registry then can create a Registry (registry.*) that contains the curated information of the raw registry. (We need this two-step process because we don't know in which orders plugins are created).

There are two more files that I didn't mention so far:
- errors.* should do error handling for things going wrong in the parser. We didn't work on this yet and everything related to errors is still inconsistent (we have to think this trough once the other parts are in place).
- semantic_analyzer.* is an old version of the third level of parsing. We probably do not need it anymore.
msg10510 (view) Author: florian Date: 2022-02-04.16:14:29
On the parser side, we plan to do two passes through our parsed data with the registry. The first one checks that all parameters are well-typed, transforms positional arguments into keyword arguments and  sets default values for unspecified arguments. At this point, we also check that no arguments were given to the function that the function did not expect.

In the second pass, we create the actual objects (or the builders once we introduce those). This will look roughly like this.

Any parse(parser::Function f, Registry registy, type_index type_id) {
    unique_ptr<PluginBase> plugin = registry.get(type_id, f.name);
    Options opts;
    for (auto parameter : plugin->get_paremeters()) {
        opts.set(parameter.name) = parse(f.get_parameter(parameter.name), registry, parameter.type);
    }
    return plugin.construct(opts);
}

We use Any to pass parameters from `plugin.construct` to the options object because they are stored as Any within options anyway and we could not figure out how to do this with templates. Since the type safety is checked in the first pass, we think this is acceptable.
msg10509 (view) Author: florian Date: 2022-02-04.15:59:22
We started writing a lexical and syntactical parse to parse the command line text first into a stream of tokens and then into a recursive data structure that consists of literals (a single token, e.g. "infinity"), lists (e.g., [1,2,3]), and "functions" (e.g., ff()). Functions have a list of posititional and keyword arguments which are themselves literals, lists or functions.

As a next step, we want to change the way plugins are defined, so we can extract the necessary meta data
to this abstract data structure into something more concrete. Currently, a plugin definition mixes parsing and this meta data definition so we will have to change this in all plugins. But we think we found a way that is not that disruptive and also works with our future plans for the component interaction problem.

We plan to use the actual plugin objects but with a new Plugin class (currently these static objects are just created so their constructor is called and registers the plugin). The new class will look roughly as follows and will be stored in the registry. We have the same issue as the current code that we don't know the creation order of the static objects and will use the same solution of first storing everything in a provisional list and then transform this into the real registry before parsing.

class PluginBase:
    PluginInfo metadata;
public:
    PluginBase() {
        initialize_metadata();
        RawRegistry::instance()->register_plugin(this);
    }
    virtual type_index get_type() = 0;
    virtual Any construct(const Options &opts) = 0;
    virtual void initialize_metadata() = 0;

    add_document_synopsis(string, string);
    add_document_language_support(string, string);
    add_document_property(string, string);
    add_document_note(string, string);    
    add_option<T>(string key, string help, string default, ...);
}

The `add_*` methods are equivalent to the ones currently used in the OptionParser so the diffs should remain readable. In contrast to the current implementation, they should only store the data in the plugin instance. The new method `get_type` is used to communicate the `return type` of the plugin (e.g., SearchEngine for `astar()`). The method `initialize_metadata` should be overridden in derived classes and replaces our current `_parse` methods. In it the `add_*` methods are used to set up all documentation and options. It is called in the constructor before registering the plugin. The `construct` methods currently should just construct the actual parsed object (and possibly do error checking). Later, they will return the builder objects instead. To avoid some code duplication with the most common case, we can implement a derived class like this: 

template<typename Base, typename Constructed>
class Plugin<T> : PluginBase
    Any construct(const Options &opts) override {
        return Any(make_shared<Constructed>(opts));
    }

    type_index get_type() {
        return typeid(Base);
    }
}

The FF heuristic for example would then be defined like this:

class FFPlugin: Plugin<Evaluator, FFHeuristic> {
    void initialize_metadata() override {
        /*
          Use add_* methods from Plugin base class to set
          documentation and options. This should be a small
          diff to the current _parse method.
        */ 
    }
}

static FFPlugin _plugin;
msg10477 (view) Author: florian Date: 2022-02-01.10:16:43
We had an initial meeting about this and discussed the general parser design. In a first step, we have to do a lexical analysis, splitting the string into tokens.

We support the following tokens: ( ) [ ] = , NUMBER WORD
- NUMBER can be any integer or float and we also support numbers like "1e8", "1M", "0.1k", "infinity"
- WORD can be any identifier starting with an underscore or letter and containing only underscores letters and numbers ("infinity" is a reserved word, and not allowed as a WORD).

We plan to use regular expressions for the lexical analysis.

After the lexical analysis, we have to parse the string recursively into a parse tree and then transform the parse tree by adding more information. To do so, we have to adapt our plugin mechanism to make the information about names and types of parameters of each plugin available to the parser. Previously, this was integrated into the actual parsing but for this issue, we aim for a stronger separation of parser and parsed content.
msg10476 (view) Author: florian Date: 2022-02-01.10:01:26
In issue559 we discussed a new design for our option syntax that we will eventually need to support our solution to the component interaction problem.

The new parser should support the following syntax
  let(h1, ff(), let(h2, cg(), eager_greedy([var(h1), var(h2)], preferred=[var(h1),var(h2)])))
and otherwise support all the old features.

Predefinitions (--evaluator h=ff()) should be rewritten to the new syntax internally before parsing.
History
Date User Action Args
2022-03-11 09:58:35floriansetmessages: + msg10664
summary: TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * ~~add the option to access (enriched) ASTNode~~ * ~~allow validation and rewriting options~~ * ~~add let syntax and variables~~ * ~~rewrite command line string in the presence of predefinitions~~ * use the new option parser instead of the old one for all existing plugins * remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md) -> TODO: * ~~cast and transform the return values~~ * handle errors * ~~generate help from the information in the registry~~ * ~~add the option to access (enriched) ASTNode~~ * ~~allow validation and rewriting options~~ * ~~add let syntax and variables~~ * ~~rewrite command line string in the presence of predefinitions~~ * use the new option parser instead of the old one for all existing plugins * remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md)
2022-03-08 13:36:03maltesetmessages: + msg10647
2022-03-08 12:05:49floriansetmessages: + msg10644
2022-03-08 11:50:20floriansetmessages: + msg10643
2022-03-08 11:30:09floriansetmessages: + msg10641
summary: TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * ~~add the option to access (enriched) ASTNode~~ * ~~allow validation and rewriting options~~ * add let syntax and variables * rewrite command line string in the presence of predefinitions * use the new option parser instead of the old one for all existing plugins * remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md) -> TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * ~~add the option to access (enriched) ASTNode~~ * ~~allow validation and rewriting options~~ * ~~add let syntax and variables~~ * ~~rewrite command line string in the presence of predefinitions~~ * use the new option parser instead of the old one for all existing plugins * remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md)
2022-02-21 10:33:47floriansetmessages: + msg10615
summary: TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * ~~add the option to access (enriched) ASTNode~~ * ~~allow validation and rewriting options~~ * add let syntax and variables * rewrite command line string in the presence of predefinitions * remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md) -> TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * ~~add the option to access (enriched) ASTNode~~ * ~~allow validation and rewriting options~~ * add let syntax and variables * rewrite command line string in the presence of predefinitions * use the new option parser instead of the old one for all existing plugins * remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md)
2022-02-21 10:32:05floriansetmessages: + msg10614
2022-02-20 16:36:13maltesetmessages: + msg10613
2022-02-20 14:02:02floriansetmessages: + msg10612
summary: TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * add the option to access (enriched) ASTNode. * ~~allow validation and rewriting options~~ * add let syntax and variables * rewrite command line string in the presence of predefinitions * remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md) -> TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * ~~add the option to access (enriched) ASTNode~~ * ~~allow validation and rewriting options~~ * add let syntax and variables * rewrite command line string in the presence of predefinitions * remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md)
2022-02-15 16:46:34floriansetmessages: + msg10609
summary: TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * add the option to access (enriched) ASTNode. * allow validation and rewriting options * add let syntax and variables * rewrite command line string in the presence of predefinitions * remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md) -> TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * add the option to access (enriched) ASTNode. * ~~allow validation and rewriting options~~ * add let syntax and variables * rewrite command line string in the presence of predefinitions * remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md)
2022-02-15 14:00:14maltesetmessages: + msg10608
summary: TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * add the option to access (enriched) ASTNode. * allow validation and rewriting options * add let syntax and variables * rewrite command line string in the presence of predefinitions * remove code that is no longer used (in particular, src/search/ext/*) -> TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * add the option to access (enriched) ASTNode. * allow validation and rewriting options * add let syntax and variables * rewrite command line string in the presence of predefinitions * remove code that is no longer used (in particular, src/search/ext/*; also requires update of License in README.md)
2022-02-15 11:26:53floriansetmessages: + msg10607
summary: TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * add the option to access (enriched) ASTNode. * allow validation and rewriting options * add let syntax and variables * rewrite command line string in the presence of predefinitions -> TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * add the option to access (enriched) ASTNode. * allow validation and rewriting options * add let syntax and variables * rewrite command line string in the presence of predefinitions * remove code that is no longer used (in particular, src/search/ext/*)
2022-02-11 13:37:57floriansetmessages: + msg10604
summary: TODO: * cast and transform the return values * handle errors * generate help from the information in the registry * add the option to access (enriched) ASTNode. * allow validation and rewriting options * add let syntax and variables * rewrite command line string in the presence of predefinitions -> TODO: * ~~cast and transform the return values~~ * handle errors * generate help from the information in the registry * add the option to access (enriched) ASTNode. * allow validation and rewriting options * add let syntax and variables * rewrite command line string in the presence of predefinitions
2022-02-11 09:06:30floriansetmessages: + msg10595
2022-02-10 19:20:29maltesetmessages: + msg10586
2022-02-10 16:48:48floriansetmessages: + msg10581
summary: TODO: * cast and transform the return values * handle errors * generate help from the information in the registry * add the option to access (enriched) ASTNode. * allow validation and rewriting options * add let syntax and variables * rewrite command line string in the presence of predefinitions
2022-02-10 16:28:36floriansetmessages: + msg10580
2022-02-04 16:14:29floriansetmessages: + msg10510
2022-02-04 15:59:22floriansetmessages: + msg10509
2022-02-01 10:16:43floriansetstatus: unread -> chatting
messages: + msg10477
2022-02-01 10:06:26silvansetnosy: + silvan
2022-02-01 10:01:26floriancreate