Issue740

Title make translator output file configurable
Priority wish Status reviewing
Superseder Nosy List guillem, jendrik, malte, patrick.luehne
Assigned To guillem Keywords easy, translator
Optional summary

Created on 2017-10-25.16:10:43 by patrick.luehne, last changed by jendrik.

Messages
msg6834 (view) Author: jendrik Date: 2018-03-14.17:47:43
I made the aforementioned change (rm -f output.sas) in Lab.
msg6811 (view) Author: jendrik Date: 2018-02-19.14:33:32
No need for that, I think. I'll add it once this issue is merged.
msg6810 (view) Author: guillem Date: 2018-02-19.14:28:38
Makes sense, yes. Do you want me to open a PR for lab with that (the simple 
addition of the -f flag)?
msg6809 (view) Author: jendrik Date: 2018-02-19.13:07:10
The old comments are preserved in the "Activity" tab on Bitbucket. I left some 
minor comments over there.

Regarding Lab, I think it's best to just use "rm -f output.sas". When a user 
passes "--sas-task myfile" the SAS file is probably needed and shouldn't be 
removed.
msg6808 (view) Author: guillem Date: 2018-02-19.12:29:57
Thanks Jendrik - I've updated the PR accordingly. Not sure if updating a PR in bitbucket removes the old comments? Hope 
not.

BTW this change affects LAB as well. Namely, the constructor of FastDownwardExperiment (lab/experiment.py:205) adds a 
hardcoded "rm output.sas" which should be modified whenever the --output-sas option is used. Do you want me to open an 
issue on that? Or we can have a look at it together this week.
msg6806 (view) Author: jendrik Date: 2018-02-16.16:27:16
I left some comments on Bitbucket.
msg6795 (view) Author: malte Date: 2018-02-09.10:39:21
I can have a look next week, after I'm done with my grant proposal. If I don't
write again before Friday, feel free to remind me.
msg6794 (view) Author: guillem Date: 2018-02-09.10:33:17
The diff for this issue is here:

https://bitbucket.org/gfrances/downward-issue740/pull-requests/1/issue740/diff

I've tested this with a few configurations, including invoking only the translator 
component, invoking different aliases, standard search invokation, plus a 
portfolio configuration, running those on a few instances and searching for all 
*.sas in the workspace. Just for the record, I temporarily disabled the removal of 
"output.sas" in LAB's code, so that I could check the exact filenames being used.
Everything works as expected. Malte / Jendrik, let me know if you'd test any other 
particular thing, or if this looks good to you to be merged into the default 
branch. Patrick, perhaps you'd like to test this as well once we merge it into the 
main repo.

G.
msg6793 (view) Author: patrick.luehne Date: 2018-02-08.10:56:28
@malte: Thanks for your thorough explanations!

Please know that my points weren’t meant as criticism; I just wanted my feedback
to be heard :). I totally understand that the FastDownward team have many
priorities and that you all had to put lots of thought into how to make things
efficient and effective research-wise. As a “user” of the translator component,
I had been wondering about some of the decisions, and your explanations
clarified these a lot.

@guillem: And thanks also for your current work on the --sas-file option :).
msg6792 (view) Author: malte Date: 2018-02-07.17:34:46
Looks good to me! Make sure that it also works when using portfolio
configurations. I'm not sure they use the same code path to call the search
component.
msg6791 (view) Author: guillem Date: 2018-02-07.17:27:48
I'm having a look at this. Basically, after the few changes I'm implementing, the behavior should be as follows:

* The translate.py script has an option "--sas-file FILENAME" that allows to specify the filename where the translated SAS task will be placed. 
This can of course be a relative or an absolute path. This allows the translate output to be customized even if only the translator is needed.

* The fast-downward.py main script has an option with the same name and semantics; if specified, it passes it down to the translator AND uses 
it to stdin-feed it to the search component.

Malte/Jendrik, is there any particular or non-standard use case that you'd like to test?
msg6789 (view) Author: malte Date: 2018-02-06.17:53:19
> @malte: I didn’t suggest using exit codes exclusively for error reporting.
> What I intended to say was that only the exit code could be used to check
> whether the program finished its job successfully. If not, the error
> output is still available in addition to the exit code to figure out what
> went wrong.

We often have gigabytes of stdout output in an experiment -- up to several
megabytes per planner run, multiplied by perhaps 10000 planner runs. But the
stderr output is short and sweet, almost always nothing and rarely more than 100
bytes, short enough to be included in the overall experiment report.

> In this way, noncritical output printed to stderr could simply be ignored
> if the exit code was 0, in which case the (well-formed) result can be used
> by the next tool in the chain.

Sure, that's what a modern logging system with explicit log levels like debug,
warning, error, critical would enable. That would be nice to have, and that's
part of what I meant about modernizing the logging system in the first paragraph
of msg6785. But at the current moment, it would make our lives much more
difficult if the critical stuff we need to see is buried in gigabytes of error
output. Even if we had a log facility where errors and other output can be
distinguished at a glance, "grep '^error:' **/*.log" can take ages to sift
through gigabytes of data, where "ls **/*.err" will be very fast.

Roughly speaking, we have three types of output:

1) Actual results, like the plans that are produced.
2) Information about the progress of the planning process, which are mainly used
for automatic consumption by scripts that run planner experiments.
3) Information about fatal errors that occurred during the planner process.

For you as a user, separation of 1+2 is important, which is why you like to be
able to see only 1 on stdout. (Currently, 1 can only be separated by going
through a file.) This makes a lot of sense. But for us who are mainly interested
in the planning process itself, separation of 2+3 is much more important because
we don't really care about the plans; our main "outcomes" are the things that
the planner writes to stdout. So for us it would be quite undesirable to combine
2+3 in the same stream. At the moment, the main people who run the planner are
other planning researchers who are interested in the planning process itself,
not "real users", so this is the audience we cater for primarily. But of course
it would be better to cater to both audiences well.
msg6788 (view) Author: patrick.luehne Date: 2018-02-06.17:27:10
@malte: I didn’t suggest using exit codes exclusively for error reporting. What
I intended to say was that only the exit code could be used to check whether the
program finished its job successfully. If not, the error output is still
available in addition to the exit code to figure out what went wrong.

In this way, noncritical output printed to stderr could simply be ignored if the
exit code was 0, in which case the (well-formed) result can be used by the next
tool in the chain.

However, I believe that you have reasoned about error reporting a lot already. I
was just curious to know how you handle this, because I believe that existing
output on stderr isn’t considered an error by itself with most existing tools :).
msg6787 (view) Author: malte Date: 2018-02-06.17:19:07
> Just out of curiosity, why is output to stderr interpreted as actual errors?
> Couldn’t this be handled by looking at whether the exit code was 0 (success)?
> I believe that’s a more common convention :). Couldn’t this be handled by
looking at whether the exit code was 0 (success)?
> I believe that’s a more common convention :).

Setting the exit code to non-zero on error is not mutually exclusive with
reporting errors on stderr. :-) All basic unix tools like ls, cp, cat etc. write
something to stderr iff they return a non-zero exit code. Of course we use exit
codes, too.

But it's hard to communicate exactly what went wrong in a single number when
errors can occur in the driver, in the translator, in the search component, in
different parts of a portfolio etc. In automated experiments, we want to be able
to tell the experimenter what went wrong if there is something that needs their
attention.
msg6786 (view) Author: patrick.luehne Date: 2018-02-06.16:48:48
@malte: Thanks for the explanations :). Making more use of libraries indeed
sounds like a good long-term plan!

Just out of curiosity, why is output to stderr interpreted as actual errors?
Couldn’t this be handled by looking at whether the exit code was 0 (success)? I
believe that’s a more common convention :).
msg6785 (view) Author: malte Date: 2018-02-06.16:45:09
Hi Patrick, I think several common conventions for stdout vs. stderr exist, and
unfortunately we already follow a different one (not just in the translator but
in the planner as a whole): in our planner runs, every output to stderr is
considered an actual error or failure that requires investigation. One thing we
might be able to do in the future to support your request is make logging more
configurable in general, but for now logging improvements are more likely to
happen in the search component than in the translator component.

We have "librarification" of the planner as a longer-term goal that we started
actively working on recently, but so far it's not really clear when or even if
it will happen.
msg6784 (view) Author: patrick.luehne Date: 2018-02-06.16:36:10
@guillem, @malte: Thanks for tackling this issue! I’m sorry that I didn’t find
the time to come back to this issue, but I really appreciate that this is being
worked on :).

I agree that making the output filename configurable solves part of the problem,
and it would be a desirable option generally. I still think that using “-” as an
output filename instead of /dev/stdout would be convenient.

Considering @malte’s argument A, I think that all output other than the actually
translated file should go to stderr. This is how many tools dinstinguish the
actual output (on stdout) from additional runtime information, warnings, errors,
etc. (on stderr). Also, I don’t believe that it will be problematic to have this
kind of output directed to stderr, as I don’t expect it to scale with the actual
output.

What concerns using pipes, I’d say that the fact that the translated
representation exists multiple times in memory is to be somewhat expected in
cases where tools reside in separate binaries. Generally, I think that it’d be
best to directly access the original representation from tools using it by
providing a library. In this way, there would be no need for pipes, no necessary
duplicates of the translated representation, or alternative “hacks” required.
However, I believe that the current architecture doesn’t allow for this, and
that this would require lots of work.

Still, I think that the output filename option would be a cool thing to have for
the time being :).
msg6783 (view) Author: guillem Date: 2018-02-06.16:21:28
I'll be tackling this issue in the next days, just to avoid any redundant work 
:-)
msg6765 (view) Author: malte Date: 2018-01-03.20:40:51
I have spun off the part regarding changing the behaviour of the driver script
to not produce a temporary file into issue758, copying the nosy list from this
issue.

This issue is now only about making the translator output file name
configurable. I have renamed it accordingly.

I think neither of the two issues are an enormous amount of work, but I won't be
able to look into them myself in the near future.
msg6573 (view) Author: malte Date: 2017-10-26.10:42:56
> The possible solutions I currently have in mind involve being able to
> configure the output file name in the translator and using either named pipes
> or the /dev/fd... pseudo-files inside the driver script to avoid using actual
> temporary files.

Addendum: after some more reading, the /dev/fd... solution is probably better
because file descriptors cannot be externally interfered with and don't need to
be cleaned up manually. I also expect it to be more portable in the Unix world,
but that needs further investigation.
msg6572 (view) Author: malte Date: 2017-10-26.10:29:50
I moved the original issue report from "optional summary" to "change note".

"Change note" is meant for messages, including the original message to open an
issue. "Optional summary" is an editable field that is intended for TODO lists
and similar things while we're working on the issue. Sorry for the unintuitive
interface! One of these days we should improve it, but so much to do, so little
time. :-)
msg6571 (view) Author: malte Date: 2017-10-26.10:28:15
[original comment by Patrick Lühne:]

Dear FastDownward maintainers,

At our research group at the University of Potsdam, we are using FastDownward’s
translator (with --translate) successfully as a very useful preprocessing step
in our planning projects :).

At the time being, FastDownward’s translator produces an output.sas file, which
we later parse in our applications for further processing.

However, printing the translator output to files has multiple drawbacks.

1. We can’t simply pipe FastDownward’s output within our toolchain like this:

   fast-downward --translate [<files>] | planning-tool

2. When we run two FastDownward processes simultaneously in the same working
directory, the same file is read and written to by the two processes without
synchronization, resulting in all kinds of issues.

3. The working directory is polluted with files that need to be cleaned up later. 

4. FastDownward’s translator fails when run from a directory without write
permissions for the current user.

For this reason, I would like to know whether it would make sense to have
FastDownward print all output to stdout by default instead to files?

I know that there are cases where multiple output files are produced. Perhaps,
it would, thus, be best to have options to redirect output as intended by the user.

For instance, --output=output.sas would print the output to a file, while
--output=- would print to stdout (using a dash is a convention used by many
programs [1]). And --output-<alternative> would configure the output of some
other stream, for instance, the output after postprocessing.

As explained above, our group could really make use of such a feature. Do you
have plans to implement something along these lines in the future? If you
currently don’t have the resources to implement this, I could certainly lend a
hand :).

I haven’t seen any issue like this on the issue tracker before, so I apologize
if a similar idea has been proposed before.

Many thanks,
Patrick




[1] https://unix.stackexchange.com/a/16364
msg6570 (view) Author: malte Date: 2017-10-26.10:09:27
Dear Patrick, thanks for the very thoughtfully formulated feature request!

We have thought about this general problem in the past, for all the reasons you
mention. We want to implement something along those lines, but we are not yet
clear on the details.

Some considerations:

A. If we use stdout for output, the translated problem file will be mixed with
the regular output of the translator, so whichever script (or user) is calling
the translator will have to take them apart again. In some use cases, a --silent
setting to suppress log output could address this, but in others we would need
to do something more complex, such as making log output configurable, too.

B. One drawback of using pipes and related approaches that are based on the
output and input process running concurrently is that at some point near the end
of execution of the translator, the whole translated task will still be in
memory for the translator (which is just about to finish writing it), and it
will also be in memory for the search component or other tool following the
translator (which is just about to finish reading it). Instead of using pipes
directly, one can also capture and then feed forward the output, but again we
would have a duplicate copy of the data. In some planning tasks, for example
from the Planning and Learning competitions, this can be a substantial issue.

To be clear, these are not meant as arguments against the proposed feature, just
as considerations in selecting an implementation strategy.


The possible solutions I currently have in mind involve being able to configure
the output file name in the translator and using either named pipes or the
/dev/fd... pseudo-files inside the driver script to avoid using actual temporary
files. (In a bash script, I think the cleanest solution would be to use process
substitution, and from the bash documentation it looks like process substitution
is internally implemented with named pipes and/or /dev/fd..., depending on what
the operating system supports.) This addresses point A., but not B. I don't know
a good way to address B.

I'm not sure if we want to implement the "-" convention because at the moment I
don't want to encourage using stdout for the output task due to point A above.
But note that even if "-" is not supported, you can use /dev/stdout with any
program instead. (We usually use Linux. After reading a bit, I'm not sure if
this is supported on a Mac, but there may be more portable alternatives. If so,
I'd be very interested in hearing more.  For Windows, unfortunately I don't know
if there is an easy way of doing this kind of thing.)


So in terms of a practical TODO list, there would be two steps to implement:

1. Make the translator output file name configurable.

2. Change the plumbing inside the driver script so that a full planner run by
default does not write intermediate files to disk, does not need write access to
the current directory, and does not rely on exclusive use of a given file name.

Step 1. is probably quite a bit easier than step 2., and it may already be
sufficient to support your use case. So we could try to do that one soon, and
work on 2. if/when someone has a bit of time to spare.

Would that work for you?
History
Date User Action Args
2018-03-14 17:47:43jendriksetmessages: + msg6834
2018-02-19 14:33:32jendriksetmessages: + msg6811
2018-02-19 14:28:38guillemsetmessages: + msg6810
2018-02-19 13:07:10jendriksetmessages: + msg6809
2018-02-19 12:29:57guillemsetmessages: + msg6808
2018-02-16 16:27:16jendriksetmessages: + msg6806
2018-02-09 10:39:21maltesetmessages: + msg6795
2018-02-09 10:33:17guillemsetstatus: in-progress -> reviewing
messages: + msg6794
2018-02-08 10:56:28patrick.luehnesetmessages: + msg6793
2018-02-07 17:34:46maltesetmessages: + msg6792
2018-02-07 17:27:48guillemsetstatus: chatting -> in-progress
messages: + msg6791
2018-02-06 17:53:19maltesetmessages: + msg6789
2018-02-06 17:27:10patrick.luehnesetmessages: + msg6788
2018-02-06 17:19:07maltesetmessages: + msg6787
2018-02-06 16:48:48patrick.luehnesetmessages: + msg6786
2018-02-06 16:45:09maltesetmessages: + msg6785
2018-02-06 16:36:10patrick.luehnesetmessages: + msg6784
2018-02-06 16:21:28guillemsetassignedto: guillem
messages: + msg6783
nosy: + guillem
2018-01-03 20:40:52maltesetmessages: + msg6765
keyword: + easy, translator
title: Print translator output to stdout and not to file -> make translator output file configurable
2017-10-26 11:49:52jendriksetnosy: + jendrik
2017-10-26 10:42:56maltesetmessages: + msg6573
2017-10-26 10:29:50maltesetmessages: + msg6572
2017-10-26 10:28:15maltesetmessages: + msg6571
summary: Dear FastDownward maintainers, At our research group at the University of Potsdam, we are using FastDownward’s translator (with --translate) successfully as a very useful preprocessing step in our planning projects :). At the time being, FastDownward’s translator produces an output.sas file, which we later parse in our applications for further processing. However, printing the translator output to files has multiple drawbacks. 1. We can’t simply pipe FastDownward’s output within our toolchain like this: fast-downward --translate [<files>] | planning-tool 2. When we run two FastDownward processes simultaneously in the same working directory, the same file is read and written to by the two processes without synchronization, resulting in all kinds of issues. 3. The working directory is polluted with files that need to be cleaned up later. 4. FastDownward’s translator fails when run from a directory without write permissions for the current user. For this reason, I would like to know whether it would make sense to have FastDownward print all output to stdout by default instead to files? I know that there are cases where multiple output files are produced. Perhaps, it would, thus, be best to have options to redirect output as intended by the user. For instance, --output=output.sas would print the output to a file, while --output=- would print to stdout (using a dash is a convention used by many programs [1]). And --output-<alternative> would configure the output of some other stream, for instance, the output after postprocessing. As explained above, our group could really make use of such a feature. Do you have plans to implement something along these lines in the future? If you currently don’t have the resources to implement this, I could certainly lend a hand :). I haven’t seen any issue like this on the issue tracker before, so I apologize if a similar idea has been proposed before. Many thanks, Patrick [1] https://unix.stackexchange.com/a/16364 ->
2017-10-26 10:09:27maltesetstatus: unread -> chatting
nosy: + malte, patrick.luehne
messages: + msg6570
2017-10-25 16:10:43patrick.luehnecreate