Issue680

Title: Update OSI to a new version
Priority: wish
Status: resolved
Superseder:
Nosy List: erez, florian, jendrik, malte, salome
Assigned To: florian
Keywords:
Optional summary:

Created on 2016-10-12.13:51:35 by florian, last changed by florian.

Messages
msg5836 (view) Author: florian Date: 2016-11-29.14:21:06
Merged
msg5815 (view) Author: malte Date: 2016-11-14.22:30:15
I think we shouldn't recommend a version that doesn't compile with modern
compilers unless we have a really good reason to recommend it. So if nobody is
worried about the new performance numbers (please speak up if you are), we
should recommend the new version. And merge.
msg5814 (view) Author: florian Date: 2016-11-14.21:40:41
I'm in favor of merging and indifferent about the new recommendation. If no one
else has a strong opinion on this, let's take the new version!?
msg5813 (view) Author: malte Date: 2016-11-14.21:25:08
What do you recommend? Should we merge the change and then recommend the new
versions?
msg5806 (view) Author: florian Date: 2016-11-07.08:33:53
We still lose some coverage in some configurations (7 with operator counting) but 
also gain some with others (10 with diverse potentials). The search times are still 
all over the map but the most extreme outliers are gone. I think we should merge the 
change (forcing cplex to a single thread) either way, but with these results, we 
could also recommend the new versions.

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue680-v2.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue680-v2-potential.html

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue680_v2_plots.tgz
msg5805 (view) Author: salome Date: 2016-11-04.16:49:41
We found out that the new OSI interface causes CPLEX to use multiple threads. We
did not figure out which change in OSI causes this, but we implemented a change
in Fast Downward which forces CPLEX to use only 1 thread. Florian is starting
new experiments to see if the performance gets better again (for
miconic:s28-0.pddl the total time went down from 49s to 9s on my computer after
the change in Fast Downward).
msg5804 (view) Author: florian Date: 2016-11-03.10:26:05
The tasks before s28 also didn't show much variance on the grid. I think we should focus 
on the larger ones. We can try to figure out what the difference for s28-0 is and (if we 
can do something about it) see if that generalizes to other tasks.

I guess a profile will just show that all the time is used up in some library method of 
CPLEX that we cannot access, but maybe it's still worth a shot. Could you run profiles 
for s28-0 with osi107-cplex1263-32 and osi103-cplex1263-32?

This should work something like this:

./fast-downward.py --preprocess <benchmarks>/miconic/s28-0.pddl
valgrind --tool=callgrind builds/<buildname>/bin/downward --search 'astar(...)' < output


If that doesn't work, maybe we can have a look at the OSI source code and trace which 
calls to CPLEX it makes. We don't call that many OSI functions, so this might be viable.
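
The two profiling runs requested above could be scripted roughly as follows. This is only a sketch: the output file names and the DRY_RUN switch are assumptions added for illustration, the search configuration stays elided as in the message, and it expects the preprocessed "output" file in the current directory. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: profile the same task with both builds named in the message above.
# DRY_RUN defaults to 1, so the commands are only printed, not executed.
# The search configuration is left elided exactly as in the message.
SEARCH='astar(...)'
DRY_RUN="${DRY_RUN:-1}"

profile_build() {
    build="$1"
    out="callgrind.out.$build"   # hypothetical output file name
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "valgrind --tool=callgrind --callgrind-out-file=$out builds/$build/bin/downward --search '$SEARCH' < output"
        printf '%s\n' "callgrind_annotate $out"
    else
        # Run the planner under callgrind, then summarize the profile.
        valgrind --tool=callgrind --callgrind-out-file="$out" \
            "builds/$build/bin/downward" --search "$SEARCH" < output
        callgrind_annotate "$out"
    fi
}

for build in osi107-cplex1263-32 osi103-cplex1263-32; do
    profile_build "$build"
done
```

Setting DRY_RUN=0 would actually run valgrind; comparing the two callgrind_annotate summaries side by side is then one way to see where the extra time goes.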
msg5803 (view) Author: salome Date: 2016-11-03.09:57:04
I ran local experiments with the miconic domain with all 6 possible
configurations (old and new OSI, old and new CPLEX, 32 and 64 bit; the
combination new OSI and old CPLEX does not work). Here are the results:
http://ai.cs.unibas.ch/_tmp_files/simon/issue680-v1.html

In general, the new OSI performs worse everywhere in the 32-bit build, while the
64-bit build actually performs quite OK except for the last tasks (s28* - s30*).
What I also find interesting is that the new OSI mostly requires less memory,
but for s28* - s30* it suddenly requires a lot more memory.
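
For reference, the six working combinations can be enumerated with a small loop. This is a sketch, not the experiment script: the labels follow the "osi103-cplex1263-32" naming used elsewhere in this issue, and "cplex1251" is an assumed label for CPLEX 12.5.1 built by analogy.

```shell
#!/bin/sh
# Sketch: list the build combinations that are expected to work.
# Version numbers come from this issue; the "cplex1251" label is an assumed
# name following the pattern of "osi103-cplex1263-32".
list_configs() {
    for osi in 103 107; do
        for cplex in 1251 1263; do
            # OSI 0.107.8 is not compatible with CPLEX 12.5.1, so skip it.
            if [ "$osi" = 107 ] && [ "$cplex" = 1251 ]; then
                continue
            fi
            for bits in 32 64; do
                echo "osi$osi-cplex$cplex-$bits"
            done
        done
    done
}

list_configs
```

Of the 2 x 2 x 2 = 8 combinations, the two with new OSI and old CPLEX are skipped, which leaves the 6 that were run.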
msg5772 (view) Author: florian Date: 2016-10-24.11:53:31
We discussed this offline and decided to look into the performance loss a bit more.

I tried to reproduce the behavior of the operator counting heuristic locally and used 
some of the larger miconic tasks (they stuck out in the plots). On the grid, their 
total time increased by a factor of 4 to 10; locally, I didn't see a huge difference. 
I had to use a 64-bit build locally because my 32-bit CPLEX is not working anymore; 
maybe this was the difference. 

Salome, could you set up a new CPLEX and both OSI versions for it, and run some of 
the miconic tasks (s28* - s30*) in the 32-bit build?
msg5757 (view) Author: malte Date: 2016-10-19.20:05:21
The build worked for me, and with this OSI version I no longer need to enable
the workaround.
msg5756 (view) Author: florian Date: 2016-10-19.16:10:03
Apart from adding the option there is nothing else I'm aware of (besides the common problems with OSI, which you probably 
know already from installing 0.103.0). Here are the configurations I used on my computer:

# 32 bit
sudo ./configure CC="gcc"  CFLAGS="-m32 -pthread -Wno-long-long" CXX="g++" CXXFLAGS="-m32 -pthread -Wno-long-long" \
LDFLAGS="-L$DOWNWARD_CPLEX_ROOT/lib/x86_linux/static_pic" --without-lapack --enable-static=yes \
--prefix="/opt/coin-0.107.8-32" --with-cplex-incdir=$DOWNWARD_CPLEX_ROOT/include/ilcplex --with-cplex-lib="-lcplex -lm" \
--disable-zlib

# 64 bit
sudo ./configure CC="gcc"  CFLAGS="-m64 -pthread -Wno-long-long" CXX="g++" CXXFLAGS="-m64 -pthread -Wno-long-long" \
LDFLAGS="-L$DOWNWARD_CPLEX_ROOT/lib/x86-64_linux/static_pic" --without-lapack --enable-static=yes \
--prefix="/opt/coin-0.107.8-64" --with-cplex-incdir=$DOWNWARD_CPLEX_ROOT/include/ilcplex --with-cplex-lib="-lcplex -lm" \
--disable-zlib
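
The two calls above differ only in the word size, the CPLEX library directory and the install prefix, so they could be generated from one helper. The following is a sketch under that assumption; the function name osi_configure_cmd is made up for illustration, and it only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: print the OSI configure call for a given word size (32 or 64).
# All flags are copied from the two commands above; the function name and
# the print-only behavior are additions for illustration.
osi_configure_cmd() {
    bits="$1"
    case "$bits" in
        32) libdir='x86_linux' ;;
        64) libdir='x86-64_linux' ;;
        *)  echo "usage: osi_configure_cmd 32|64" >&2; return 1 ;;
    esac
    flags="-m$bits -pthread -Wno-long-long"
    # \$ keeps DOWNWARD_CPLEX_ROOT unexpanded in the printed command.
    cat <<EOF
sudo ./configure CC="gcc" CFLAGS="$flags" CXX="g++" CXXFLAGS="$flags" \\
LDFLAGS="-L\$DOWNWARD_CPLEX_ROOT/lib/$libdir/static_pic" --without-lapack --enable-static=yes \\
--prefix="/opt/coin-0.107.8-$bits" --with-cplex-incdir=\$DOWNWARD_CPLEX_ROOT/include/ilcplex --with-cplex-lib="-lcplex -lm" \\
--disable-zlib
EOF
}

osi_configure_cmd 32
osi_configure_cmd 64
```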

CPLEX 12.6.3 should be compatible with both the old (0.103.0) and new (0.107.8) version of OSI. It's the version that I 
also used in the experiments.

I used build configs instead of environment variables to set up the correct paths:
https://bitbucket.org/FlorianPommerening/downward-issues/branch/issue680#chg-issue680_build_configs.py

Let me know if you run into any problems in case I forgot to mention something here.
msg5755 (view) Author: malte Date: 2016-10-19.15:51:21
I'll give it a try later. Is there anything I need to know other than adding
--disable-zlib to the configure call for OSI? I have CPLEX version 12.6.3, is
that what you have been using?
msg5752 (view) Author: florian Date: 2016-10-18.23:51:59
I ran the default configurations of our remaining potential heuristics and got a 
similar drop in coverage when optimizing all syntactic states or the initial state.

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue680-v1-potential.html

About the gcc issue: I cannot test if the new OSI fixes the problem, because I didn't 
have the issue with the old OSI.
msg5750 (view) Author: malte Date: 2016-10-17.18:20:12
I think it would be interesting to see what the effect is in any case.

Regarding whether or not to upgrade: I think at some point we will have to
upgrade. We're already in a situation where current OSI + current gcc won't
compile. I'm happy to treat the LP solver essentially as a black box so that I'm
not too worried if the performance of the black box we use in the future isn't
quite as good as the performance of the black box we used in the past.

But of course a drastic drop isn't ideal, and maybe we can look into what can be
done to improve the performance again. If I recall correctly, we've been
tweaking things in the past to get to the level we're currently at, e.g.
regarding a "row-major" or "column-major" representation of LP constraints.
Perhaps they need to be tweaked in a different direction for the newer OSI/CPLEX
version, and perhaps there is even something we can do to significantly speed
things up, such as parameter-tuning CPLEX for our purposes.
msg5749 (view) Author: florian Date: 2016-10-17.18:12:32
I can run more experiments for those configurations as well, but that won't change 
the existing results. Would we change our recommended OSI version without getting 
better results in those configs?
msg5748 (view) Author: malte Date: 2016-10-17.16:23:29
Or better. :-) OK, I think this means that we should also evaluate
configurations that may behave very differently with different LP solutions,
such as a potential heuristic optimized only for the initial state.
msg5747 (view) Author: florian Date: 2016-10-17.16:20:50
I don't know what to suggest here.

As for the evaluations, I didn't expect any change, but in hindsight it makes sense 
for potential heuristics. We know that there are lots of optimal solutions of the 
LP. If the new version uses a different configuration that yields a different weight 
function, the heuristic can be completely different. The diversification should 
limit the differences somewhat, so for other potential heuristics the differences 
could be even worse.
msg5746 (view) Author: malte Date: 2016-10-17.13:40:34
What do you advocate?

Also, for the first configuration, the number of evaluations changes in quite a
few tasks. Is this expected?
msg5744 (view) Author: florian Date: 2016-10-17.01:43:44
Unfortunately the experiments show a drop in performance when switching to the new 
version of OSI. This also involves using a more up-to-date CPLEX version, as OSI 
0.107.8 is not compatible with CPLEX 12.5.1. The experiments do this in two steps 
(first upgrading CPLEX for a fixed OSI version, then upgrading OSI for a fixed 
CPLEX version). Coverage drops by some tasks, especially in the second step 
(upgrading OSI).

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue680-v1.html

Total time is very evenly distributed between factors of 1/3 and 3 in most cases, 
but tasks of some domains take significantly longer:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue680_plots.tgz
msg5736 (view) Author: malte Date: 2016-10-14.14:17:00
When this is done, we are also ready to close issue681.
msg5723 (view) Author: malte Date: 2016-10-13.00:10:53
I suggest some experiments with a few of our most LP-heavy configurations to
make sure nothing funny happens with performance with the old vs. new OSI.
msg5722 (view) Author: florian Date: 2016-10-12.23:51:26
Thanks, Erez. After trying for some time with "--without-zlib" I found out that the 
option is "--disable-zlib". For me, this makes it work without any changes to the 
code. Should I just update the wiki and close this issue?
msg5721 (view) Author: erez Date: 2016-10-12.19:52:54
There's a flag to tell OSI not to use zlib.
msg5717 (view) Author: florian Date: 2016-10-12.13:51:35
In the wiki we recommend installing version 0.103.0 of COIN/OSI, which is 6 years 
old now. In this issue, we should check what changes we need in the build system 
so that we can compile with the newest version (0.107.8). A first test showed 
that version 0.107.8 can be installed in the same way as 0.103.0, but it has an 
additional dependency on zlib. Adding this to the build system led to a
32/64-bit issue (http://stackoverflow.com/questions/39994097).
History
Date                 User     Action  Args
2016-11-29 14:21:06  florian  set     status: chatting -> resolved; messages: + msg5836
2016-11-14 22:30:15  malte    set     messages: + msg5815
2016-11-14 21:40:41  florian  set     messages: + msg5814
2016-11-14 21:25:08  malte    set     messages: + msg5813
2016-11-07 08:33:53  florian  set     messages: + msg5806
2016-11-04 16:49:41  salome   set     messages: + msg5805
2016-11-03 10:26:06  florian  set     messages: + msg5804
2016-11-03 09:57:04  salome   set     messages: + msg5803
2016-10-24 11:53:31  florian  set     nosy: + salome; messages: + msg5772
2016-10-19 20:05:21  malte    set     messages: + msg5757
2016-10-19 16:10:03  florian  set     messages: + msg5756
2016-10-19 15:51:21  malte    set     messages: + msg5755
2016-10-18 23:51:59  florian  set     messages: + msg5752
2016-10-17 18:20:12  malte    set     messages: + msg5750
2016-10-17 18:12:32  florian  set     messages: + msg5749
2016-10-17 16:23:29  malte    set     messages: + msg5748
2016-10-17 16:20:50  florian  set     messages: + msg5747
2016-10-17 13:40:34  malte    set     messages: + msg5746
2016-10-17 01:43:44  florian  set     messages: + msg5744
2016-10-14 14:17:00  malte    set     messages: + msg5736
2016-10-13 00:10:53  malte    set     messages: + msg5723
2016-10-12 23:51:26  florian  set     messages: + msg5722
2016-10-12 19:52:54  erez     set     nosy: + erez; messages: + msg5721
2016-10-12 13:51:35  florian  create