Issue859

Title: Analyze performance with GCC 4.8, 5.4 and 8.2
Priority: feature
Status: resolved
Superseder:
Nosy List: augusto, florian, guillem, jendrik, malte
Assigned To:
Keywords:
Optional summary
This is required for issue852.

Created on 2018-10-26.17:44:13 by augusto, last changed by augusto.

Messages
msg8072 Author: malte Date: 2018-11-08.02:39:16
Thanks, Augusto! You mentioned worse scores for gcc 8.2 than for gcc 4.8, but I
don't see that. The way I look at these numbers, gcc 8.2 looks best, followed by
gcc 4.8, and gcc 5.4 is worst. Did you mean to say gcc 5.4 instead of gcc 8.2?

Overall, I'm not concerned by the numbers. The differences are small, and the
reason we want to move to newer compilers is for code clarity, not primarily for
performance. It is normal that satisficing configurations fluctuate a lot more
than optimal ones because one can solve a satisficing problem just by lucky
tie-breaking, whereas for optimal planning luck can only help you on the last f
layer. (Simplifying a bit -- of course you can also get lucky in how exactly
LM-Cut resolves ties in the landmark selection etc.)

Schedule is one of the domains that has proven more susceptible than most to
tie-breaking in the past. There are some actions there that look promising in
the delete relaxation but will screw you up if you apply them, and whether or
not a heuristic like h^FF will fall into this trap is essentially determined by
arbitrary tie breaking.

The relevant parts of the code don't have explicit randomness, but some parts of
the code do things like break ties based on memory addresses etc., and these can
move around arbitrarily with compiler changes. Perhaps it would be good to
improve the robustness of the code further in the future because these
fluctuations are somewhat annoying. But this is of course not what this issue is
about.
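
To illustrate the address-based tie-breaking point, here is a minimal Python
sketch. It is not Fast Downward code: the `Candidate` class, the operator names
and the addresses are all hypothetical, and the `address` field only stands in
for wherever a particular compiler and allocator happen to place an object. It
shows that two builds with identical heuristic values can pick different
successors when ties are broken by address, and the same one when ties are
broken by a stable index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str      # hypothetical operator name
    h_value: int   # heuristic estimate, lower is better
    address: int   # stand-in for a memory address (varies between builds)
    index: int     # stable creation order (same in every build)


def pick_by_address(candidates):
    """Break ties on h_value by the (build-dependent) address."""
    return min(candidates, key=lambda c: (c.h_value, c.address))


def pick_by_index(candidates):
    """Break ties on h_value by the stable creation index."""
    return min(candidates, key=lambda c: (c.h_value, c.index))


def make_candidates(addresses):
    """Build the same three candidates, placed at the given addresses."""
    names = ["op-a", "op-b", "op-c"]
    h_values = [3, 3, 5]  # op-a and op-b are tied
    return [
        Candidate(name, h, address, index)
        for index, (name, h, address) in enumerate(zip(names, h_values, addresses))
    ]


# Two hypothetical builds that differ only in object placement.
build_one = make_candidates([0x2000, 0x1000, 0x3000])
build_two = make_candidates([0x1000, 0x2000, 0x3000])

if __name__ == "__main__":
    for label, build in (("build one", build_one), ("build two", build_two)):
        print(label, pick_by_address(build).name, pick_by_index(build).name)
```

With address-based ties the chosen operator flips between the two builds,
while the index-based variant is unaffected by object placement.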

So from my perspective, this is ready to be closed.

Augusto, if you agree, feel free to set this to "resolved".
msg8071 Author: augusto Date: 2018-11-07.17:43:13
Results for the first iteration of LAMA can be found here:
http://inf.ufrgs.br/~abcorrea/_issues/issue859/lama/

I used only 10 minutes for each run.

The results seem a lot noisier than the ones using LM-cut and blind. We have
some cases where there is a difference of 99% in memory and time. All the
extreme cases happened in the schedule domain, but I have no explanation for
that. Also, with LAMA it is no longer so clear that the new compilers boost
performance: GCC 8.2 had a worse score for total time and memory usage
than GCC 4.8.

There were also differences in coverage in at least 7 domains between GCC 4.8
and GCC 8.2. The cost and quality of the plans also changed. But I guess all
these changes in coverage and plan quality can be attributed to some randomness
influenced by the compiler (i.e., the compiler uses a different random seed or
something like that). Or am I missing something here?
msg8031 Author: malte Date: 2018-11-01.00:34:01
Thanks, Augusto! Some of the differences in score_total_time are small enough
that in other circumstances I'd double-check if they are more than noise, but
looking at the per-domain results shows that they are fairly consistent, which
suggests an effect of high significance (there is definitely something positive
going on) but low strength (it is not changing things by much).
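
One way to make "consistent across domains" concrete is a sign test over the
per-domain results. The Python sketch below is only an illustration and not
what was used for this analysis: the function name is made up, and the win/loss
counts in the usage example are hypothetical rather than taken from the
experiment tables. It counts in how many domains one compiler scores better
than the other and asks how likely such a split would be if the direction were
a coin flip.

```python
from math import comb


def sign_test_p_value(wins, losses):
    """Two-sided exact sign test p-value (ties are assumed to be dropped).

    Under the null hypothesis, each domain is equally likely to favor
    either compiler, so the number of wins is Binomial(n, 0.5).
    """
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of a split at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double it for the two-sided test and cap at 1.
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical counts: better in 9 domains, worse in 1.
    print(sign_test_p_value(9, 1))
```

A small p-value together with small per-domain differences corresponds to the
"high significance, low strength" reading: the direction is reliable even
though the magnitude is minor.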

Regarding the additional memory, I think it is likely that it is not additional
allocations as such but just the size of the compiled code. It's not unusual for
this to grow with newer compiler versions.

Regarding the volatility of some of the results for blind search, I don't have a
good explanation. The configuration may be more affected by grid noise or by
small "random" changes in the compiler. Based on some past experiences, I would
say it's probably not worth investing a lot of time trying to understand what is
going on there.

I would be interested in seeing the lama-first results (and perhaps others would
like to see other results), but I don't think they are absolutely needed.
msg8030 Author: augusto Date: 2018-10-31.17:11:12
We finally have the first experiments testing the three different compilers.
We tested blind and LM-cut using limits of ~3.6GB and 10 minutes. We produced
one comparative table for each case and also relative scatter plots comparing
memory and total time.


Results for blind: http://inf.ufrgs.br/~abcorrea/_issues/issue859/blind/
Results for LM-cut: http://inf.ufrgs.br/~abcorrea/_issues/issue859/lmcut/


We can observe the following:
* Memory:
  - GCC 5.4 and 8.2 increase memory usage by at most 2.5%. This happens in the
smallest instances. As the instances grow, the increase shrinks, and it is
almost 0% for the largest ones. This suggests that the newer compilers incur a
constant additional memory allocation compared to GCC 4.8, which would explain
why the increase is more visible in the smallest instances. Also, this constant
seems to be larger in GCC 8.2 than in GCC 5.4 (but the gap between these two is
not as large as the gap between GCC 4.8 and the newer compilers).

* Time:
  - Fast Downward with the new compilers is faster than with GCC 4.8 on average.
In particular, GCC 8.2 produces the best results w.r.t. time. This performance
boost is easier to identify in the results using LM-cut. If we look at the
plots comparing GCC 4.8 to GCC 8.2, the larger instances show (on average) a
more significant improvement in time than the smaller ones. The comparison
between GCC 5.4 and GCC 8.2 is not so clear to me. (But, ok, the scores show
that GCC 8.2 produces better results w.r.t. time.) Also, the plots for the
experiment using blind search suggest that the new compilers produce better
results (which is confirmed by the total time scores in the tables). However,
there are many more outliers in this case than in the experiment using LM-cut.
Do we have any easy explanation for that?
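
The constant-overhead reading of the memory numbers above can be checked with a
little arithmetic. In the Python sketch below, the overhead of 100 KB and the
baseline sizes are hypothetical values, chosen only so that the smallest case
gives 2.5%; they are not measurements from these experiments. The point is that
a fixed overhead c on top of a baseline m gives a relative increase of c / m,
which vanishes as m grows.

```python
def relative_overhead(base_memory_kb, constant_overhead_kb):
    """Relative memory increase caused by a fixed additional allocation."""
    if base_memory_kb <= 0:
        raise ValueError("base memory must be positive")
    return constant_overhead_kb / base_memory_kb


# Hypothetical fixed overhead of the newer compiler, in KB.
HYPOTHETICAL_OVERHEAD_KB = 100

if __name__ == "__main__":
    # Hypothetical baseline peak memory values, from small to large instances.
    for base_kb in (4_000, 40_000, 400_000, 3_600_000):
        increase = relative_overhead(base_kb, HYPOTHETICAL_OVERHEAD_KB)
        print(f"{base_kb:>9} KB baseline: +{increase:.4%}")
```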


Question: should we run the first iteration of LAMA?
msg8026 Author: augusto Date: 2018-10-26.19:35:23
Just for the record: GCC 5.4 and GCC 8.2 produce many warnings on the grid
because of the ld.gold linker. Right now, we are changing it manually in the
'build_config.py' options. We should be able to solve this linker problem by
changing the '.bashrc' file (or any similar local method) without modifying the
Fast Downward build configurations.
msg8024 Author: augusto Date: 2018-10-26.17:44:13
This sub-issue is just for the experiments and performance analysis of the
different compilations using different GCC versions.
History
Date                 User     Action  Args
2018-11-08 16:28:52  augusto  set     status: testing -> resolved
2018-11-08 02:39:17  malte    set     messages: + msg8072
2018-11-07 17:43:13  augusto  set     messages: + msg8071
2018-11-05 11:24:54  jendrik  set     nosy: + jendrik
2018-11-01 00:34:01  malte    set     messages: + msg8031
2018-10-31 17:11:12  augusto  set     messages: + msg8030
2018-10-26 19:35:23  augusto  set     messages: + msg8026
2018-10-26 18:31:59  guillem  set     nosy: + guillem
2018-10-26 17:44:13  augusto  create