Issue 589: Make statistics output easier to parse

Title	Make statistics output easier to parse
Priority	feature	Status	resolved
Superseder		Nosy List	erez, florian, jendrik, malte, silvan
Assigned To		Keywords
Optional summary	Optional: consider using a Verbosity enum like implemented in M&S.

Created on 2015-11-09.17:21:14 by florian, last changed by malte.

Summary
Optional: consider using a Verbosity enum like implemented in M&S.

Messages
msg10655 (view)	Author: malte	Date: 2022-03-09.20:40:50
This is now superseded by issue1051.
msg9218 (view)	Author: jendrik	Date: 2020-04-04.09:49:22
I just saw that there's a "standard" for the one-JSON-record-per-line format: http://jsonlines.org/
msg8023 (view)	Author: jendrik	Date: 2018-10-22.15:38:23
I have updated the pull request by merging the latest master revision into the issue branch. Since I wasn't planning to work on this in the near future, I'll unassign myself. If somebody else would like to work on this, feel free to assign the issue to you and maybe even add it to the review queue.
msg8021 (view)	Author: malte	Date: 2018-10-21.23:56:49
Also see this: https://stackoverflow.com/questions/27907633/multiple-json-objects-in-one-file-extract-by-python for using raw_decode to handle multiple JSON objects in one file. They mention that the second argument to raw_decode is not documented. It is documented in the "simplejson" module that is the source of Python's json module in the stdlib (https://simplejson.readthedocs.io/en/latest/). If we don't want to use undocumented stuff, I suppose we can pass in something like "json_text[next_start_index:]", but we may want to be careful to avoid O(N^2) behaviour because log files can be large, and O(N^2) where N is potentially many megabytes is a problem. On https://stackoverflow.com/questions/18042042/pythons-immutable-strings-and-their-slices someone suggests using buffer for things like these, and that may be a good solution depending on whether or not it is sufficiently string-like for the json module.
msg8020 (view)	Author: malte	Date: 2018-10-21.23:41:55
BTW, the reason I was looking for this issue now is to point out this related talk from CppCon 2018 on "Source Instrumentation for Monitoring C++ in Production". It goes much further than what we are likely to want to do, but it does talk about structured logging among other things and may be useful to have a look at. I only listened to it with one ear, so I'm not entirely sure how relevant it is: https://www.youtube.com/watch?v=0WgC5jnrRx8
msg8019 (view)	Author: malte	Date: 2018-10-21.23:34:57
Hi Jendrik, the previous comment suggests this should be in the review queue, but it currently isn't. Do you want to add it?
msg5947 (view)	Author: jendrik	Date: 2016-12-20.20:02:39
Florian also added comments on bitbucket. Thanks for that! I have addressed all of Silvan's and Florian's comments. Now we'll have to wait until Malte's FIFO gets to this issue :-)
msg5945 (view)	Author: jendrik	Date: 2016-12-20.18:13:22
> Sure, but then we shouldn't change the relaxation heuristic here. I think it's good to see how the code using the logging API will look. If you want we can remove it before merging. The use case for the "JSON" file is to let lab use it exclusively (after a transition period), i.e. lab won't parse the output written to stdout. There may be some exceptions though, since for example the peak memory can't be written to the statistics file when the planner doesn't exit "normally".
msg5904 (view)	Author: silvan	Date: 2016-12-19.12:56:04
> Thanks for the comments! As a general note: Currently, this pull request is > concerned with getting the logging design right, not with using it in all places. > We can discuss if we want to do the second step in this issue once we're happy with > the design. Sure, but then we shouldn't change the relaxation heuristic here. > I think both outputs are human-readable, while the latter is easier to parse for > programs, because it uses JSON. The question of why we support both still remains. What is the use case for specifying a file for the JSON output? If the default is still to use lab's parsers to parse the regular std::out log files, then we may not need the JSON output. Or we change the root logger to also stream its output into a (default) file, which then can be parsed by lab.
msg5901 (view)	Author: jendrik	Date: 2016-12-19.09:04:58
Thanks for the comments! As a general note: Currently, this pull request is concerned with getting the logging design right, not with using it in all places. We can discuss if we want to do the second step in this issue once we're happy with the design. Regarding your question: Currently, the output is as follows: stdout: [t=4.6s, 22628 KB] downward.merge_and_shrink.init_time: 4.1s file: {"downward.merge_and_shrink.init_time": 4.1} I think both outputs are human-readable, while the latter is easier to parse for programs, because it uses JSON.
msg5900 (view)	Author: silvan	Date: 2016-12-18.12:45:52
The code looks good I think. I left a few comments at bitbucket. The default scenario for parsing statistics with lab would then still be to parse the regular log file, right, and that's why the default std::out output is now machine-readable rather than human-readable. For which use case do we support the possibility to specify a statistics file to write statistics in, if not as the default to parse statistics with lab?
msg5857 (view)	Author: jendrik	Date: 2016-12-09.21:27:31
I have started working on this and made a pull request at https://bitbucket.org/jendrikseipp/downward/pull-requests/64 It would be great if you guys could have a look at the code and give me some feedback.
msg5500 (view)	Author: malte	Date: 2016-08-07.16:04:31
I'd do that for the translator, but not for the search component, as we might eventually want to fold everything into the search component. (Folding the preprocessor into the search component is already essentially done.)
msg5496 (view)	Author: jendrik	Date: 2016-08-07.11:25:50
You're right, we shouldn't enforce uniqueness from the start. Also, we should maybe consider prepending the component, e.g. "downward.translate.variables", "downward.search.total_time".
msg5493 (view)	Author: malte	Date: 2016-08-06.13:49:04
I suggest "downward" instead of "fd". It's longer, but we never use the abbreviation "FD". > The code should check that logger and attribute names don't contain > dots. Not good enough. What about space, newlines, colons, ...? I suggest we should only allow regular ASCII identifiers as used by C-like languages, i.e.: - must not be empty - must start with [a-Za-z_] - after first character, allowed characters are [a-Za-z0-9_] > It is an error if an existing attribute is overwritten. Will we really be able to enforce this from the start? Then we need to find a way to specify information that needs to be logged multiple times.
msg5491 (view)	Author: jendrik	Date: 2016-08-05.21:01:20
We discussed this offline and here are the results: --statistics myfile.stat enables logging JSON lines to myfile.stat Add code to logging.h that supports the following API: Log g_log("fd"); g_log.statistics("variables", 18); # cout: [t=0.6s, 22628 KB] fd.variables: 18 # file: {"fd": {"variables": 18}} Log eager_search_log(g_log, "eager_search"); eager_search_log.statistics("evaluations", 1000); # cout: [t=10.6s, 22628 KB] fd.eager_search.evaluations: 1000 # file: {"fd": {"eager_search": {"evaluations": 1000}}} Log mas_log(g_log, "mas"); mas_log.statistics("init_time", mas_timer); # cout: [t=4.6s, 22628 KB] fd.mas.init_time: 4.1s # file: {"fd": {"mas": {"init_time": 4.1}}} Log::statistics() will be overloaded or templated for our common types (including vectors). The code should check that logger and attribute names don't contain dots. The parsing code collects the JSON lines and updates the properties file with the new attributes. It is an error if an existing attribute is overwritten. Currently lab only supports flat attributes. We should discuss if instead of {"fd": {"mas": {"init_time": 4.1}}} we want to write {"fd.mas.init_time": 4.1} directly or let lab flatten the attribute name. Since lab supports globbing, you'll be able to select all "mas" attributes by adding the attribute "fd.mas.*" to the report.
msg4739 (view)	Author: erez	Date: 2015-11-09.18:10:12
That sounds like a good idea. However, you might want even more structure than just key/value. For example, when you output search progress statistics, you have attribute (expanded/generated/time/...), and the value. But to connect two reports from the same time (for example, after 5 minutes we had 5000 expanded, 10000 generated), you might want to add some sort of ID for each report. There might be even more complicated cases which I haven't thought of. I would suggest some sort of way to output a record.
msg4736 (view)	Author: florian	Date: 2015-11-09.17:21:14
We want to output statistics on a different channel (with a public utility function). This function should then write key-value pairs in some easy to parse format. The statistics could be written to a separate file and the standard output, in a way that the file can be directly used by lab. For example, the format for the file could be json, we then just have to take care of opening and closing brackets.

History
Date	User	Action	Args
2022-03-09 20:40:50	malte	set	status: chatting -> resolved messages: + msg10655
2020-04-04 09:49:22	jendrik	set	messages: + msg9218
2018-12-14 17:04:30	silvan	set	summary: Optional: consider using a Verbosity enum like implemented in M&S.
2018-10-22 15:38:23	jendrik	set	status: reviewing -> chatting assignedto: jendrik -> messages: + msg8023
2018-10-21 23:56:49	malte	set	messages: + msg8021
2018-10-21 23:41:55	malte	set	messages: + msg8020
2018-10-21 23:34:57	malte	set	messages: + msg8019
2016-12-20 20:02:39	jendrik	set	messages: + msg5947
2016-12-20 18:13:22	jendrik	set	messages: + msg5945
2016-12-19 12:56:04	silvan	set	messages: + msg5904
2016-12-19 09:04:58	jendrik	set	messages: + msg5901
2016-12-18 12:45:52	silvan	set	messages: + msg5900
2016-12-09 21:27:31	jendrik	set	status: chatting -> reviewing assignedto: jendrik messages: + msg5857
2016-08-07 16:04:31	malte	set	messages: + msg5500
2016-08-07 11:25:50	jendrik	set	messages: + msg5496
2016-08-06 13:49:04	malte	set	messages: + msg5493
2016-08-05 21:01:20	jendrik	set	messages: + msg5491
2015-11-10 10:51:06	silvan	set	nosy: + silvan
2015-11-09 18:10:12	erez	set	status: unread -> chatting nosy: + erez messages: + msg4739
2015-11-09 17:21:14	florian	create

Issue589