As with other landmark issues, I think the experiments should also report attributes relevant to the landmark code, such as landmark generation time and the numbers of landmarks and orderings. This can hopefully be copied from the experiments for some of the other landmark issues we worked on recently.
No need to create a new report for this experiment, but perhaps in the next experiment we can also include this data for base and v1.
I agree that the runtime data looks good enough to proceed. Unlike in other landmark issues we have had recently, the data looks largely like noise if you look into the per-domain results. Previously we often had small differences that were quite consistent across domains.