Issue197

Title: run validator on experiments
Priority: feature
Status: resolved
Superseder:
Nosy List: erez, gabi, jendrik, malte
Assigned To: jendrik
Keywords:
Optional summary:

Created on 2011-01-07.00:01:20 by malte, last changed by jendrik.

Messages
msg1214 (view) Author: jendrik Date: 2011-01-22.19:12:43
The cbb part is exactly what you mention: Making invalid plans or other faults 
more prominent. I guess this is what issue137 talks about and we should track the 
status there. We should probably talk about that in person to find a way that is 
feasible.
msg1210 (view) Author: malte Date: 2011-01-22.08:05:26
Hi Jendrik, what is the cbb part in done-cbb here? Should we keep this open, or
should we open another issue for possible improvements?

One possible improvement that I would like is to make certain kinds of errors
such as validation failures very noisy so that they are impossible to miss. (The
way I understand it, one would have to actively look at the plan_valid entries
currently to see that there are problems, right?) But that is better left to a
new issue since it's not just about validation. For example, segmentation faults
in the planner come to mind, or translator/preprocessor errors, or suboptimal
plans in optimal configurations.
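
A minimal sketch of what "noisy" reporting could look like, assuming the fetched results are available as a dict that maps a run name to its properties. The function name `report_invalid_plans` and the shape of `runs` are hypothetical; only the "plan_valid" key is taken from this thread.

```python
import sys


def report_invalid_plans(runs, stream=sys.stderr):
    """Print a prominent banner listing every run whose plan_valid is not 1.

    `runs` is assumed to map a run name to its properties dict.
    Returns the sorted list of offending run names.
    """
    bad = sorted(
        name for name, props in runs.items() if props.get("plan_valid") != 1
    )
    if bad:
        banner = "!" * 60
        print(banner, file=stream)
        print("WARNING: %d run(s) without a validated plan:" % len(bad), file=stream)
        for name in bad:
            print("  " + name, file=stream)
        print(banner, file=stream)
    return bad
```

The same pattern would extend to other fault types (segmentation faults, translator/preprocessor errors) if they were recorded as properties.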
msg1209 (view) Author: jendrik Date: 2011-01-21.18:17:06
The validator is now called by the new-scripts in the run's postprocess command. 
The resultfetcher parses the returncode of the postprocess command and sets 
"plan_valid" to 1 only if the returncode is 0. If the returncode is missing 
or has any other value, "plan_valid" is set to 0.
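
The rule described here can be sketched roughly as follows. This is not the actual resultfetcher code; the function name and the property key "postprocess_returncode" are assumptions made for illustration.

```python
def plan_valid(properties, key="postprocess_returncode"):
    """Return 1 only if the postprocess returncode is present and equal to 0.

    `key` is a hypothetical property name; a missing or non-zero
    returncode yields 0.
    """
    returncode = properties.get(key)
    return 1 if returncode == 0 else 0
```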
msg1098 (view) Author: malte Date: 2011-01-07.01:38:43
> So probably the way to go is to abandon the subprocess approach in favor 
> of getting inspired by the ipc-scripts and using os.fork, right?

It doesn't have to be either/or -- we could stick to the current code and just
put in an additional fork call. Shouldn't be more than a few lines of code.
IIRC, we were planning to discuss the code some time soon anyway; we can discuss
the best architecture then.
msg1097 (view) Author: jendrik Date: 2011-01-07.01:30:03
So probably the way to go is to abandon the subprocess approach in favor 
of getting inspired by the ipc-scripts and using os.fork, right? Or 
should we keep this simpler approach and try to add timeouts and 
memory limits to the subprocess approach manually (or by searching the 
web for similar modules)?
msg1096 (view) Author: malte Date: 2011-01-07.01:08:19
> At the moment the timeout is set for the main and postprocessing command 
> combined. This hasn't been a problem yet, because there has never been 
> any postprocessing. I don't know how to set the timeouts for the 
> different parts separately with the resource module, because e.g. limits 
> can only be decreased, if I remember correctly.

Whenever something is run with a timeout, it should be run in a subprocess
(using os.fork()), and the timeout should be set after the fork. That way,

1) there's no problem with setting different timeouts in different subprocesses, and
2) the main process can always clean up properly since it is never accidentally
killed by a timeout

So that's what I would suggest.
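
A minimal sketch of the fork-then-limit pattern described above, assuming a POSIX system. The helper name `run_with_cpu_limit` is hypothetical, and RLIMIT_CPU limits CPU time rather than wall-clock time; the existing scripts may well do this differently.

```python
import os
import resource
import sys


def run_with_cpu_limit(args, cpu_seconds):
    """Run `args` in a forked child that has its own CPU time limit.

    Returns the child's exit code, or the negated signal number if the
    child was killed by a signal (e.g. when it exceeds the limit).
    """
    pid = os.fork()
    if pid == 0:
        # Child: the limit is set after the fork, so it applies only here.
        try:
            _, hard = resource.getrlimit(resource.RLIMIT_CPU)
            # Lower only the soft limit; the hard limit stays untouched.
            resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, hard))
            os.execvp(args[0], args)
        finally:
            # Only reached if setrlimit or exec failed.
            os._exit(127)
    # Parent: never subject to the child's limit, so it can always clean up.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return os.WEXITSTATUS(status)
```

Because each child sets its own limit, the planner and the validator could be started with different values, e.g. `run_with_cpu_limit(planner_cmd, 1800)` followed by `run_with_cpu_limit(validator_cmd, 300)` (both command lists and both numbers are placeholders).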
msg1094 (view) Author: jendrik Date: 2011-01-07.00:50:02
Yes.
> If yes, we need to make sure that the grid timeout is large enough to encompass
> both the planner run and the validator run.
At the moment the timeout is set for the main and postprocessing command 
combined. This hasn't been a problem yet, because there has never been 
any postprocessing. I don't know how to set the timeouts for the 
different parts separately with the resource module, because e.g. limits 
can only be decreased, if I remember correctly.
msg1091 (view) Author: malte Date: 2011-01-07.00:39:00
Is the postprocessing command run on the server?
If yes, we need to make sure that the grid timeout is large enough to encompass
both the planner run and the validator run.

I think validation should be the default, not something enabled by an option. On
second thought, I don't even like the ability to turn it off, but we could set a
separate validation timeout, with the reports distinguishing between validation
succeeded, validation failed, or validation timed out.
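
The three outcomes could be distinguished along these lines. This is only a sketch: it assumes the validator is an external command that exits with 0 for a valid plan, it uses a wall-clock timeout via `subprocess.run` (Python 3), and the names `run_validation`, `VALID`, `INVALID` and `TIMED_OUT` are made up for illustration.

```python
import subprocess
import sys

# Hypothetical labels for the three outcomes a report could show.
VALID, INVALID, TIMED_OUT = "valid", "invalid", "timeout"


def run_validation(cmd, timeout_seconds):
    """Run the validator command with its own timeout and classify the result."""
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,  # wall-clock seconds; child is killed on expiry
        )
    except subprocess.TimeoutExpired:
        return TIMED_OUT
    # Assumption: exit code 0 means the plan was accepted.
    return VALID if completed.returncode == 0 else INVALID
```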
msg1090 (view) Author: jendrik Date: 2011-01-07.00:33:41
I would suggest adding the validator call to the postprocessing command if the 
option --validate is given. An optional parsing could then be done for the 
validator output in the resultfetcher.
msg1086 (view) Author: malte Date: 2011-01-07.00:01:20
It would be good if we could run the validator after an experiment has
completed. We recently had a case where the planner generated invalid plans, and
it is very easy to overlook this.

I'm not sure where and how best to implement this -- the problem is that on the
one hand, the validator can occasionally take a very long time, and on the other
hand, it's really a good idea to do validation always. Suggestions welcome.
Maybe the validator could be run at a particular point by default, with an
option to skip it.
History
Date                 User     Action  Args
2011-01-22 19:12:43  jendrik  set     status: chatting -> resolved
                                      messages: + msg1214
2011-01-22 08:05:27  malte    set     status: done-cbb -> chatting
                                      messages: + msg1210
2011-01-22 04:57:39  jendrik  set     assignedto: jendrik
2011-01-21 18:17:07  jendrik  set     status: chatting -> done-cbb
                                      messages: + msg1209
2011-01-07 01:38:44  malte    set     messages: + msg1098
2011-01-07 01:30:03  jendrik  set     messages: + msg1097
2011-01-07 01:08:19  malte    set     messages: + msg1096
2011-01-07 00:50:02  jendrik  set     messages: + msg1094
2011-01-07 00:39:01  malte    set     messages: + msg1091
2011-01-07 00:33:41  jendrik  set     status: unread -> chatting
                                      messages: + msg1090
2011-01-07 00:01:20  malte    create