Issue589

Title Make statistics output easier to parse
Priority feature Status resolved
Superseder Nosy List erez, florian, jendrik, malte, silvan
Assigned To Keywords
Optional summary
Optional: consider using a Verbosity enum like implemented in M&S.

Created on 2015-11-09.17:21:14 by florian, last changed by malte.

Summary
Optional: consider using a Verbosity enum like implemented in M&S.
Messages
msg10655 (view) Author: malte Date: 2022-03-09.20:40:50
This is now superseded by issue1051.
msg9218 (view) Author: jendrik Date: 2020-04-04.09:49:22
I just saw that there's a "standard" for the one-JSON-record-per-line format: http://jsonlines.org/
msg8023 (view) Author: jendrik Date: 2018-10-22.15:38:23
I have updated the pull request by merging the latest master revision into the 
issue branch. Since I wasn't planning to work on this in the near future, I'll 
unassign myself. If somebody else would like to work on this, feel free to assign 
the issue to you and maybe even add it to the review queue.
msg8021 (view) Author: malte Date: 2018-10-21.23:56:49
Also see this:
https://stackoverflow.com/questions/27907633/multiple-json-objects-in-one-file-extract-by-python
for using raw_decode to handle multiple JSON objects in one file.

They mention that the second argument to raw_decode is not documented. It is
documented in the "simplejson" module that is the source of Python's json module
in the stdlib (https://simplejson.readthedocs.io/en/latest/). If we don't want
to use undocumented stuff, I suppose we can pass in something like
"json_text[next_start_index:]", but we may want to be careful to avoid O(N^2)
behaviour because log files can be large, and O(N^2) where N is potentially many
megabytes is a problem.

On
https://stackoverflow.com/questions/18042042/pythons-immutable-strings-and-their-slices
someone suggests using buffer for things like these, and that may be a good
solution depending on whether or not it is sufficiently string-like for the json
module.
msg8020 (view) Author: malte Date: 2018-10-21.23:41:55
BTW, the reason I was looking for this issue now is to point out this related
talk from CppCon 2018 on "Source Instrumentation for Monitoring C++ in
Production". It goes much further than what we are likely to want to do, but it
does talk about structured logging among other things and may be useful to have
a look at. I only listened to it with one ear, so I'm not entirely sure how
relevant it is:

https://www.youtube.com/watch?v=0WgC5jnrRx8
msg8019 (view) Author: malte Date: 2018-10-21.23:34:57
Hi Jendrik, the previous comment suggests this should be in the review queue,
but it currently isn't. Do you want to add it?
msg5947 (view) Author: jendrik Date: 2016-12-20.20:02:39
Florian also added comments on bitbucket. Thanks for that! 

I have addressed all of Silvan's and Florian's comments. Now we'll have to wait 
until Malte's FIFO gets to this issue :-)
msg5945 (view) Author: jendrik Date: 2016-12-20.18:13:22
> Sure, but then we shouldn't change the relaxation heuristic here.

I think it's good to see how the code using the logging API will look. If you want 
we can remove it before merging.

The use case for the "JSON" file is to let lab use it exclusively (after a 
transition period), i.e. lab won't parse the output written to stdout. There may be 
some exceptions though, since for example the peak memory can't be written to the 
statistics file when the planner doesn't exit "normally".
msg5904 (view) Author: silvan Date: 2016-12-19.12:56:04
> Thanks for the comments! As a general note: Currently, this pull request is 
> concerned with getting the logging design right, not with using it in all places. 
> We can discuss if we want to do the second step in this issue once we're happy
with 
> the design.

Sure, but then we shouldn't change the relaxation heuristic here.

> I think both outputs are human-readable, while the latter is easier to parse for 
> programs, because it uses JSON.

The question of why we support both still remains. What is the use case for
specifying a file for the JSON output? If the default is still to use lab's
parsers to parse the regular std::out log files, then we may not need the JSON
output. Or we change the root logger to also stream its output into a (default)
file, which then can be parsed by lab.
msg5901 (view) Author: jendrik Date: 2016-12-19.09:04:58
Thanks for the comments! As a general note: Currently, this pull request is 
concerned with getting the logging design right, not with using it in all places. 
We can discuss if we want to do the second step in this issue once we're happy with 
the design.

Regarding your question: Currently, the output is as follows:

stdout: [t=4.6s, 22628 KB] downward.merge_and_shrink.init_time: 4.1s
file: {"downward.merge_and_shrink.init_time": 4.1}

I think both outputs are human-readable, while the latter is easier to parse for 
programs, because it uses JSON.
msg5900 (view) Author: silvan Date: 2016-12-18.12:45:52
The code looks good I think. I left a few comments at bitbucket.

The default scenario for parsing statistics with lab would then still be to
parse the regular log file, right, and that's why the default std::out output is
now machine-readable rather than human-readable. For which use case do we
support the possibility to specify a statistics file to write statistics in, if
not as the default to parse statistics with lab?
msg5857 (view) Author: jendrik Date: 2016-12-09.21:27:31
I have started working on this and made a pull request at
https://bitbucket.org/jendrikseipp/downward/pull-requests/64

It would be great if you guys could have a look at the code and give me some 
feedback.
msg5500 (view) Author: malte Date: 2016-08-07.16:04:31
I'd do that for the translator, but not for the search component, as we might
eventually want to fold everything into the search component. (Folding the
preprocessor into the search component is already essentially done.)
msg5496 (view) Author: jendrik Date: 2016-08-07.11:25:50
You're right, we shouldn't enforce uniqueness from the start.

Also, we should maybe consider prepending the component, e.g. 
"downward.translate.variables", "downward.search.total_time".
msg5493 (view) Author: malte Date: 2016-08-06.13:49:04
I suggest "downward" instead of "fd". It's longer, but we never use the
abbreviation "FD".

> The code should check that logger and attribute names don't contain
> dots.

Not good enough. What about space, newlines, colons, ...?

I suggest we should only allow regular ASCII identifiers as used by C-like
languages, i.e.:

- must not be empty
- must start with [a-Za-z_]
- after first character, allowed characters are [a-Za-z0-9_]

> It is an error if an existing attribute is overwritten.

Will we really be able to enforce this from the start? Then we need to find a
way to specify information that needs to be logged multiple times.
msg5491 (view) Author: jendrik Date: 2016-08-05.21:01:20
We discussed this offline and here are the results:

--statistics myfile.stat enables logging JSON lines to myfile.stat

Add code to logging.h that supports the following API:

Log g_log("fd");
g_log.statistics("variables", 18);
# cout: [t=0.6s, 22628 KB] fd.variables: 18
# file: {"fd": {"variables": 18}}
Log eager_search_log(g_log, "eager_search");
eager_search_log.statistics("evaluations", 1000);
# cout: [t=10.6s, 22628 KB] fd.eager_search.evaluations: 1000
# file: {"fd": {"eager_search": {"evaluations": 1000}}}
Log mas_log(g_log, "mas");
mas_log.statistics("init_time", mas_timer);
# cout: [t=4.6s, 22628 KB] fd.mas.init_time: 4.1s
# file: {"fd": {"mas": {"init_time": 4.1}}}

Log::statistics() will be overloaded  or templated for our common
types (including vectors).

The code should check that logger and attribute names don't contain
dots.

The parsing code collects the JSON lines and updates the properties
file with the new attributes. It is an error if an existing
attribute is overwritten.

Currently lab only supports flat attributes. We should discuss if
instead of {"fd": {"mas": {"init_time": 4.1}}} we want to write
{"fd.mas.init_time": 4.1} directly or let lab flatten the attribute
name.

Since lab supports globbing, you'll be able to select all "mas"
attributes by adding the attribute "fd.mas.*" to the report.
msg4739 (view) Author: erez Date: 2015-11-09.18:10:12
That sounds like a good idea. However, you 
might want even more structure than just 
key/value. For example, when you output search 
progress statistics, you have attribute 
(expanded/generated/time/...), and the value.  
But to connect two reports from the same time 
(for example, after 5 minutes we had 5000 
expanded, 10000 generated), you might want to 
add some sort of ID for each report. There 
might be even more complicated cases which I 
haven't thought of. I would suggest some sort 
of way to output a record.
msg4736 (view) Author: florian Date: 2015-11-09.17:21:14
We want to output statistics on a different channel (with a public utility
function). This function should then write key-value pairs in some easy to parse
format. The statistics could be written to a separate file and the standard
output, in a way that the file can be directly used by lab. For example, the
format for the file could be json, we then just have to take care of opening and
closing brackets.
History
Date User Action Args
2022-03-09 20:40:50maltesetstatus: chatting -> resolved
messages: + msg10655
2020-04-04 09:49:22jendriksetmessages: + msg9218
2018-12-14 17:04:30silvansetsummary: Optional: consider using a Verbosity enum like implemented in M&S.
2018-10-22 15:38:23jendriksetstatus: reviewing -> chatting
assignedto: jendrik ->
messages: + msg8023
2018-10-21 23:56:49maltesetmessages: + msg8021
2018-10-21 23:41:55maltesetmessages: + msg8020
2018-10-21 23:34:57maltesetmessages: + msg8019
2016-12-20 20:02:39jendriksetmessages: + msg5947
2016-12-20 18:13:22jendriksetmessages: + msg5945
2016-12-19 12:56:04silvansetmessages: + msg5904
2016-12-19 09:04:58jendriksetmessages: + msg5901
2016-12-18 12:45:52silvansetmessages: + msg5900
2016-12-09 21:27:31jendriksetstatus: chatting -> reviewing
assignedto: jendrik
messages: + msg5857
2016-08-07 16:04:31maltesetmessages: + msg5500
2016-08-07 11:25:50jendriksetmessages: + msg5496
2016-08-06 13:49:04maltesetmessages: + msg5493
2016-08-05 21:01:20jendriksetmessages: + msg5491
2015-11-10 10:51:06silvansetnosy: + silvan
2015-11-09 18:10:12erezsetstatus: unread -> chatting
nosy: + erez
messages: + msg4739
2015-11-09 17:21:14floriancreate