Issue589

Title Make statistics output easier to parse
Priority feature Status reviewing
Superseder Nosy List erez, florian, jendrik, malte, silvan
Assigned To jendrik Keywords
Optional summary

Created on 2015-11-09.17:21:14 by florian, last changed by jendrik.

Messages
msg5947 (view) Author: jendrik Date: 2016-12-20.20:02:39
Florian also added comments on bitbucket. Thanks for that! 

I have addressed all of Silvan's and Florian's comments. Now we'll have to wait 
until Malte's FIFO gets to this issue :-)
msg5945 (view) Author: jendrik Date: 2016-12-20.18:13:22
> Sure, but then we shouldn't change the relaxation heuristic here.

I think it's good to see how the code using the logging API will look. If you want 
we can remove it before merging.

The use case for the "JSON" file is to let lab use it exclusively (after a 
transition period), i.e. lab won't parse the output written to stdout. There may be 
some exceptions though, since for example the peak memory can't be written to the 
statistics file when the planner doesn't exit "normally".
msg5904 (view) Author: silvan Date: 2016-12-19.12:56:04
> Thanks for the comments! As a general note: Currently, this pull request is 
> concerned with getting the logging design right, not with using it in all places. 
> We can discuss if we want to do the second step in this issue once we're happy
with 
> the design.

Sure, but then we shouldn't change the relaxation heuristic here.

> I think both outputs are human-readable, while the latter is easier to parse for 
> programs, because it uses JSON.

The question of why we support both still remains. What is the use case for
specifying a file for the JSON output? If the default is still to use lab's
parsers to parse the regular std::out log files, then we may not need the JSON
output. Or we change the root logger to also stream its output into a (default)
file, which then can be parsed by lab.
msg5901 (view) Author: jendrik Date: 2016-12-19.09:04:58
Thanks for the comments! As a general note: Currently, this pull request is 
concerned with getting the logging design right, not with using it in all places. 
We can discuss if we want to do the second step in this issue once we're happy with 
the design.

Regarding your question: Currently, the output is as follows:

stdout: [t=4.6s, 22628 KB] downward.merge_and_shrink.init_time: 4.1s
file: {"downward.merge_and_shrink.init_time": 4.1}

I think both outputs are human-readable, while the latter is easier to parse for 
programs, because it uses JSON.
msg5900 (view) Author: silvan Date: 2016-12-18.12:45:52
The code looks good I think. I left a few comments at bitbucket.

The default scenario for parsing statistics with lab would then still be to
parse the regular log file, right, and that's why the default std::out output is
now machine-readable rather than human-readable. For which use case do we
support the possibility to specify a statistics file to write statistics in, if
not as the default to parse statistics with lab?
msg5857 (view) Author: jendrik Date: 2016-12-09.21:27:31
I have started working on this and made a pull request at
https://bitbucket.org/jendrikseipp/downward/pull-requests/64

It would be great if you guys could have a look at the code and give me some 
feedback.
msg5500 (view) Author: malte Date: 2016-08-07.16:04:31
I'd do that for the translator, but not for the search component, as we might
eventually want to fold everything into the search component. (Folding the
preprocessor into the search component is already essentially done.)
msg5496 (view) Author: jendrik Date: 2016-08-07.11:25:50
You're right, we shouldn't enforce uniqueness from the start.

Also, we should maybe consider prepending the component, e.g. 
"downward.translate.variables", "downward.search.total_time".
msg5493 (view) Author: malte Date: 2016-08-06.13:49:04
I suggest "downward" instead of "fd". It's longer, but we never use the
abbreviation "FD".

> The code should check that logger and attribute names don't contain
> dots.

Not good enough. What about space, newlines, colons, ...?

I suggest we should only allow regular ASCII identifiers as used by C-like
languages, i.e.:

- must not be empty
- must start with [a-Za-z_]
- after first character, allowed characters are [a-Za-z0-9_]

> It is an error if an existing attribute is overwritten.

Will we really be able to enforce this from the start? Then we need to find a
way to specify information that needs to be logged multiple times.
msg5491 (view) Author: jendrik Date: 2016-08-05.21:01:20
We discussed this offline and here are the results:

--statistics myfile.stat enables logging JSON lines to myfile.stat

Add code to logging.h that supports the following API:

Log g_log("fd");
g_log.statistics("variables", 18);
# cout: [t=0.6s, 22628 KB] fd.variables: 18
# file: {"fd": {"variables": 18}}
Log eager_search_log(g_log, "eager_search");
eager_search_log.statistics("evaluations", 1000);
# cout: [t=10.6s, 22628 KB] fd.eager_search.evaluations: 1000
# file: {"fd": {"eager_search": {"evaluations": 1000}}}
Log mas_log(g_log, "mas");
mas_log.statistics("init_time", mas_timer);
# cout: [t=4.6s, 22628 KB] fd.mas.init_time: 4.1s
# file: {"fd": {"mas": {"init_time": 4.1}}}

Log::statistics() will be overloaded  or templated for our common
types (including vectors).

The code should check that logger and attribute names don't contain
dots.

The parsing code collects the JSON lines and updates the properties
file with the new attributes. It is an error if an existing
attribute is overwritten.

Currently lab only supports flat attributes. We should discuss if
instead of {"fd": {"mas": {"init_time": 4.1}}} we want to write
{"fd.mas.init_time": 4.1} directly or let lab flatten the attribute
name.

Since lab supports globbing, you'll be able to select all "mas"
attributes by adding the attribute "fd.mas.*" to the report.
msg4739 (view) Author: erez Date: 2015-11-09.18:10:12
That sounds like a good idea. However, you 
might want even more structure than just 
key/value. For example, when you output search 
progress statistics, you have attribute 
(expanded/generated/time/...), and the value.  
But to connect two reports from the same time 
(for example, after 5 minutes we had 5000 
expanded, 10000 generated), you might want to 
add some sort of ID for each report. There 
might be even more complicated cases which I 
haven't thought of. I would suggest some sort 
of way to output a record.
msg4736 (view) Author: florian Date: 2015-11-09.17:21:14
We want to output statistics on a different channel (with a public utility
function). This function should then write key-value pairs in some easy to parse
format. The statistics could be written to a separate file and the standard
output, in a way that the file can be directly used by lab. For example, the
format for the file could be json, we then just have to take care of opening and
closing brackets.
History
Date User Action Args
2016-12-20 20:02:39jendriksetmessages: + msg5947
2016-12-20 18:13:22jendriksetmessages: + msg5945
2016-12-19 12:56:04silvansetmessages: + msg5904
2016-12-19 09:04:58jendriksetmessages: + msg5901
2016-12-18 12:45:52silvansetmessages: + msg5900
2016-12-09 21:27:31jendriksetstatus: chatting -> reviewing
assignedto: jendrik
messages: + msg5857
2016-08-07 16:04:31maltesetmessages: + msg5500
2016-08-07 11:25:50jendriksetmessages: + msg5496
2016-08-06 13:49:04maltesetmessages: + msg5493
2016-08-05 21:01:20jendriksetmessages: + msg5491
2015-11-10 10:51:06silvansetnosy: + silvan
2015-11-09 18:10:12erezsetstatus: unread -> chatting
nosy: + erez
messages: + msg4739
2015-11-09 17:21:14floriancreate