Issue245

Title: scripts: reduce experiment size
Priority: feature
Status: resolved
Superseder:
Nosy List: jendrik, malte
Assigned To: jendrik
Keywords: RETIRED-scripts
Optional summary:

Created on 2011-06-22.11:22:26 by jendrik, last changed by jendrik.

Messages
msg1385 (view) Author: jendrik Date: 2011-06-22.13:07:48
I think absolute linking should be sufficient for now, since the --compact
option should only be used in the rare cases where a very large number of
configs are compared and the preprocess files are not expected to change. The
default will be to copy all files.
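
A minimal sketch of the copy-versus-link choice described in this message. The
function name provide_file and its compact parameter are hypothetical and only
stand in for whatever the --compact option toggles in the actual scripts.

```python
import os
import shutil


def provide_file(source, run_dir, compact=False):
    """Make source available inside run_dir and return the destination path.

    compact=False copies the file (the default behaviour described above).
    compact=True creates a symbolic link to the absolute source path instead.
    """
    os.makedirs(run_dir, exist_ok=True)
    dest = os.path.join(run_dir, os.path.basename(source))
    if compact:
        # Absolute link: saves space, but breaks if the source moves or changes.
        os.symlink(os.path.abspath(source), dest)
    else:
        shutil.copy2(source, dest)
    return dest
```

With compact=True the run directory only holds a link, so the space saving
depends on the preprocess files staying in place and unchanged.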
msg1376 (view) Author: malte Date: 2011-06-22.11:32:12
PS: Compression could be handled transparently by the "fetch" command.
msg1375 (view) Author: malte Date: 2011-06-22.11:31:56
For further space reduction, it might be worth considering compressing certain
files. Care would need to be taken that decompression is not counted against
actual runtime. We had a setup like this at IPC 2008.
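
A minimal sketch of keeping decompression out of the measured runtime, assuming
gzip-compressed input files. The names decompress and timed_run are
hypothetical and not taken from the existing scripts or the IPC 2008 setup.

```python
import gzip
import shutil
import subprocess
import time


def decompress(compressed_path, plain_path):
    """Write the gzip-compressed file at compressed_path to plain_path."""
    with gzip.open(compressed_path, "rb") as fin, open(plain_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def timed_run(command, compressed_path, plain_path):
    """Decompress the input, then run command and time only the command."""
    decompress(compressed_path, plain_path)  # happens before the clock starts
    start = time.perf_counter()
    returncode = subprocess.call(command)
    elapsed = time.perf_counter() - start
    return returncode, elapsed
```

This measures wall-clock time only; a real setup would also have to decide how
decompression interacts with any CPU-time limits it enforces.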
msg1374 (view) Author: malte Date: 2011-06-22.11:30:59
Sounds good. The critical thing that needs to be ensured is that the correct
version of each file is always used and that nothing bad happens if someone
performs e.g. a new preprocessor run while another experiment is running. I
would suggest using a hash signature mechanism for this: have a cache directory
somewhere where files can be indexed by md5 or sha1 hashes, and use these hash
values to refer to the correct file.

I would avoid using symbolic links explicitly in the experiment directory since
that hurts relocatability. Rather, I'd have a command like "fetch
aasg2525afasfawtn2j output.sas" in the experiment running script where the first
argument is the hash key and the second argument is the destination file we want
to have, and then the link is only generated when the experiment is run and
removed afterwards.

Am I making sense?
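
A minimal sketch of the hash-indexed cache and "fetch" step suggested above,
assuming SHA-1 keys and a POSIX system with symbolic links. The function names
store, fetch and release and the cache layout are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of the file's contents."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store(cache_dir, path):
    """Copy path into the cache under its content hash and return the key."""
    key = file_hash(path)
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, key)
    if not os.path.exists(target):
        # Copy to a temporary name first so readers never see a partial file.
        fd, tmp = tempfile.mkstemp(dir=cache_dir)
        os.close(fd)
        shutil.copyfile(path, tmp)
        os.replace(tmp, target)
    return key


def fetch(cache_dir, key, dest):
    """Link the cached file for key to dest just before a run."""
    source = os.path.abspath(os.path.join(cache_dir, key))
    if not os.path.isfile(source):
        raise FileNotFoundError("no cached file for key %s" % key)
    if os.path.lexists(dest):
        os.remove(dest)
    os.symlink(source, dest)


def release(dest):
    """Remove the link created by fetch once the run has finished."""
    if os.path.islink(dest):
        os.remove(dest)
```

Because the key is derived from the file contents, a new preprocessor run
produces a new key and leaves the entry used by a running experiment untouched.
The experiment directory stores only keys, so it stays relocatable as long as
the cache can be found.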
msg1373 (view) Author: jendrik Date: 2011-06-22.11:30:02
Additionally, we will only mention the domain and problem files in the
properties file instead of copying them.
msg1372 (view) Author: jendrik Date: 2011-06-22.11:22:26
Experiments currently take up a lot of disk space because all the
preprocessing files are copied into the respective directories for every
configuration. For example, an experiment with 20 different configurations over
all domains can easily write 80 GB to disk before the experiment is even run.
This copying also takes a lot of time.

The proposed solution is an option for search experiments to only reference the 
preprocessing files instead of copying them.
History
Date                 User     Action  Args
2011-06-22 13:07:48  jendrik  set     status: chatting -> resolved
                                      messages: + msg1385
2011-06-22 11:32:12  malte    set     messages: + msg1376
2011-06-22 11:31:56  malte    set     messages: + msg1375
                                      title: Scripts: Reduce experiment size -> scripts: reduce experiment size
2011-06-22 11:30:59  malte    set     messages: + msg1374
2011-06-22 11:30:03  jendrik  set     status: unread -> chatting
                                      messages: + msg1373
2011-06-22 11:22:26  jendrik  create