Issue 859: Analyze performance with GCC 4.8, 5.4 and 8.2 - Fast Downward issue tracker

Title	Analyze performance with GCC 4.8, 5.4 and 8.2
Priority	feature	Status	resolved
Superseder		Nosy List	augusto, florian, guillem, jendrik, malte
Assigned To		Keywords
Optional summary	This is required for issue852.

Created on 2018-10-26.17:44:13 by augusto, last changed by augusto.

Messages
msg8072	Author: malte	Date: 2018-11-08.02:39:16
Thanks, Augusto! You mentioned worse scores for gcc 8.2 than for gcc 4.8, but I don't see that. The way I look at these numbers, gcc 8.2 looks best, followed by gcc 4.8, and gcc 5.4 is worst. Did you mean to say gcc 5.4 instead of gcc 8.2?

Overall, I'm not concerned by the numbers. The differences are small, and the reason we want to move to newer compilers is for code clarity, not primarily for performance.

It is normal that satisficing configurations fluctuate a lot more than optimal ones because one can solve a satisficing problem just by lucky tie-breaking, whereas for optimal planning luck can only help you on the last f layer. (Simplifying a bit -- of course you can also get lucky in how exactly LM-Cut resolves ties in the landmark selection etc.)

Schedule is one of the domains that has proven more susceptible than most to tie-breaking in the past. There are some actions there that look promising in the delete relaxation but will screw you up if you apply them, and whether or not a heuristic like h^FF will fall into this trap is essentially determined by arbitrary tie-breaking. The relevant parts of the code don't have explicit randomness, but some parts of the code do things like break ties based on memory addresses etc., and these can move around arbitrarily with compiler changes. Perhaps it would be good to improve the robustness of the code further in the future because these fluctuations are somewhat annoying. But this is of course not what this issue is about.

So from my perspective, this is ready to be closed. Augusto, if you agree, feel free to set this to "resolved".
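To illustrate the point about tie-breaking, here is a minimal Python sketch of an open list whose order does not depend on memory addresses. It is not Fast Downward code (which is C++); the `Node` class and `expansion_order` function are hypothetical names made up for this example. The idea is that ties in the heuristic value are broken by an insertion counter, so the result depends only on the order in which nodes are generated, not on where the allocator happens to place them.

```python
import heapq
from itertools import count


class Node:
    """Hypothetical search node; only the heuristic value h matters here."""

    def __init__(self, name, h):
        self.name = name
        self.h = h


def expansion_order(nodes):
    """Return node names in expansion order: lowest h first, ties broken FIFO."""
    counter = count()
    heap = []
    for node in nodes:
        # The running counter is the tie-breaker. It depends only on the
        # insertion order, never on id(node) or any other address-like value.
        heapq.heappush(heap, (node.h, next(counter), node))
    order = []
    while heap:
        _, _, node = heapq.heappop(heap)
        order.append(node.name)
    return order


nodes = [Node("a", 2), Node("b", 1), Node("c", 2), Node("d", 1)]
print(expansion_order(nodes))
```

By contrast, using something like `(node.h, id(node))` as the key would make the order among equal-h nodes depend on object addresses, which is the kind of behavior that can shift between compilers or runs.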
msg8071	Author: augusto	Date: 2018-11-07.17:43:13
Results for the first iteration of LAMA can be found here: http://inf.ufrgs.br/~abcorrea/_issues/issue859/lama/ I used only 10 minutes for each run. The results seem a lot noisier than the ones using LM-cut and blind. We have some cases where there is a difference of 99% in memory and time. All the extreme cases happened in the schedule domain, but I have no explanation for that. Also, with LAMA it is no longer so clear that the new compilers boost performance: GCC 8.2 had a worse score for total time and memory usage compared to GCC 4.8. There were also differences in coverage in at least 7 domains between GCC 4.8 and GCC 8.2. The cost and quality of the plans also changed. But I guess all these changes in coverage and plan quality can be attributed to some randomness influenced by the compiler (i.e., the compiler uses a different random seed or something like that). Or am I missing something here?
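This thread does not say how the total time score is computed. As a rough illustration only, the sketch below assumes a logarithmic score: 1 for runs that finish within a minimum time, 0 for unsolved runs or runs at the time limit, and log-scale interpolation in between. The function name `log_time_score`, the 1-second lower bound, and the use of the 10-minute limit as the upper bound are assumptions for this example, not something confirmed here.

```python
import math


def log_time_score(runtime, time_limit=600.0, min_time=1.0):
    """Map a runtime in seconds to a score in [0, 1] on a logarithmic scale.

    runtime is None for unsolved tasks. The bounds are assumptions made
    for this sketch.
    """
    if runtime is None or runtime >= time_limit:
        return 0.0
    runtime = max(runtime, min_time)
    return 1.0 - math.log(runtime / min_time) / math.log(time_limit / min_time)


for seconds in (0.5, 10.0, 100.0, None):
    print(seconds, round(log_time_score(seconds), 3))
```

Under a score of this shape, a single task that flips between solved and unsolved moves the total far more than a small constant speedup does, which is one reason coverage changes caused by tie-breaking can dominate the totals.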
msg8031	Author: malte	Date: 2018-11-01.00:34:01
Thanks, Augusto! Some of the differences in score_total_time are small enough that in other circumstances I'd double-check if they are more than noise, but looking at the per-domain results shows that they are fairly consistent, which suggests an effect of high significance (there is definitely something positive going on) but low strength (it is not changing things by much).

Regarding the additional memory, I think it is likely that it is not additional allocations as such but just the size of the compiled code. It's not unusual for this to grow with newer compiler versions.

Regarding the volatility of some of the results for blind search, I don't have a good explanation. The configuration may be more affected by grid noise or by small "random" changes in the compiler. Based on some past experiences, I would say it's probably not worth investing a lot of time trying to understand what is going on there.

I would be interested in seeing the lama-first results (and perhaps others would like to see other results), but I don't think they are absolutely needed.
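One way to make the "consistent across domains" argument concrete is a sign test: if the compiler had no effect, each domain would be about equally likely to get better or worse. The sketch below is only an illustration of that reasoning, not the analysis that was actually done here. The function name `sign_test_p_value` is hypothetical and the per-domain differences are made-up numbers, not values from the result tables.

```python
from math import comb


def sign_test_p_value(differences):
    """Two-sided sign test: are positive and negative differences equally likely?"""
    positives = sum(1 for d in differences if d > 0)
    negatives = sum(1 for d in differences if d < 0)
    n = positives + negatives  # ties (d == 0) carry no sign information
    if n == 0:
        return 1.0
    k = max(positives, negatives)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-domain score differences (new compiler minus old compiler).
diffs = [0.02, 0.01, 0.03, 0.01, 0.02, -0.01, 0.01, 0.02, 0.01, 0.02]
p = sign_test_p_value(diffs)
mean_diff = sum(diffs) / len(diffs)
print(f"p = {p:.4f}, mean difference = {mean_diff:.3f}")
```

With these made-up numbers, nine of ten domains improve, so the test gives p of about 0.02 even though the average difference is only about 0.014: high significance, low strength.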
msg8030	Author: augusto	Date: 2018-10-31.17:11:12
We finally have the first experiments testing the three different compilers. We tested blind and LM-cut using limits of ~3.6GB and 10 minutes. We produced one comparative table for each case and also relative scatter plots comparing memory and total time.

Results for blind: http://inf.ufrgs.br/~abcorrea/_issues/issue859/blind/
Results for LM-cut: http://inf.ufrgs.br/~abcorrea/_issues/issue859/lmcut/

We can observe the following:

* Memory:
  - GCC 5.4 and 8.2 increase memory usage by at most 2.5%. This happens in the smallest instances. As the instances get larger, the increase shrinks, and it is almost 0% for the largest ones. This indicates that the newer compilers might have some constant additional memory allocation compared to GCC 4.8, which would explain why the increase is more visible in the smallest instances. Also, this constant seems to be larger in GCC 8.2 than in GCC 5.4 (but the difference is not as large as when comparing GCC 4.8 to the newer ones).

* Time:
  - Fast Downward with the new compilers is faster than with GCC 4.8 on average. In particular, GCC 8.2 produces the best results w.r.t. time. This performance boost is easier to identify in the results using LM-cut. If we look at the plots comparing GCC 4.8 to GCC 8.2, the larger instances show (on average) a more significant improvement in time than the smallest ones. The comparison between GCC 5.4 and GCC 8.2 is not so clear to me. (But, ok, the scores show that GCC 8.2 produces better results w.r.t. time.) Also, the plots for the experiment using blind search suggest that the new compilers produce better results (which is confirmed by the total time score in the tables). However, there are many more outliers in this case than in the experiment using LM-cut. Do we have any easy explanation for that?

Question: should we run the first iteration of LAMA?
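The reasoning about a constant memory overhead can be checked with simple arithmetic: a fixed additive cost is a large fraction of a small baseline and a negligible fraction of a large one. The sketch below uses hypothetical numbers chosen only to show the shape of the effect; the 100 KB overhead and the baselines are not taken from the result tables, and `relative_increase` is a name made up for this example.

```python
def relative_increase(baseline_kb, constant_overhead_kb):
    """Relative memory increase (in percent) caused by a fixed additive overhead."""
    return 100.0 * constant_overhead_kb / baseline_kb


# Hypothetical numbers, not taken from the result tables.
OVERHEAD_KB = 100
for baseline_kb in (4_000, 40_000, 400_000, 3_600_000):
    pct = relative_increase(baseline_kb, OVERHEAD_KB)
    print(f"{baseline_kb:>9} KB baseline -> +{pct:.3f}%")
```

A pattern like this in the scatter plots (a few percent for tiny instances, close to zero near the memory limit) fits a constant overhead, whether that constant comes from extra allocations or from a larger binary.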
msg8026	Author: augusto	Date: 2018-10-26.19:35:23
Just for the record: GCC 5.4 and GCC 8.2 produce many warnings on the grid because of the ld.gold linker. Right now, we are changing it manually in the 'build_config.py' options. We should be able to solve this linker problem by changing the '.bashrc' file (or using a similar local method) without modifying the Fast Downward build configurations.
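The actual change to the build options is not shown in this thread. As a rough sketch only, assuming that a build configuration is a plain list of CMake command-line arguments, a per-compiler configuration could be derived as below. The helper `with_compiler_and_linker`, the configuration name `release_gcc8`, and the compiler binary name `g++-8` are hypothetical. `CMAKE_CXX_COMPILER` and `CMAKE_EXE_LINKER_FLAGS` are standard CMake variables, and `-fuse-ld=bfd` is the GCC driver option that selects the BFD linker instead of gold.

```python
def with_compiler_and_linker(base_args, cxx_compiler, linker="bfd"):
    """Return a new list of CMake arguments that pins the compiler and linker.

    base_args is left unmodified.
    """
    return base_args + [
        f"-DCMAKE_CXX_COMPILER={cxx_compiler}",
        f"-DCMAKE_EXE_LINKER_FLAGS=-fuse-ld={linker}",
    ]


# Assumed shape of a build configuration: a list of CMake arguments.
release = ["-DCMAKE_BUILD_TYPE=Release"]
release_gcc8 = with_compiler_and_linker(release, "g++-8")
print(release_gcc8)
```

The local alternative mentioned above would avoid touching the build configurations: CMake picks up the `CXX` and `LDFLAGS` environment variables when it first configures a build directory, so exporting them in a shell startup file is one possible way to get the same effect.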
msg8024	Author: augusto	Date: 2018-10-26.17:44:13
This sub-issue is just for the experiments and performance analysis of the different compilations using different GCC versions.

History
Date	User	Action	Args
2018-11-08 16:28:52	augusto	set	status: testing -> resolved
2018-11-08 02:39:17	malte	set	messages: + msg8072
2018-11-07 17:43:13	augusto	set	messages: + msg8071
2018-11-05 11:24:54	jendrik	set	nosy: + jendrik
2018-11-01 00:34:01	malte	set	messages: + msg8031
2018-10-31 17:11:12	augusto	set	messages: + msg8030
2018-10-26 19:35:23	augusto	set	messages: + msg8026
2018-10-26 18:31:59	guillem	set	nosy: + guillem
2018-10-26 17:44:13	augusto	create