Thanks! Code looks good. Regarding the results, I'm surprised by the change in
the evaluation score -- it seems an additional task is solved, which improves
the score, but coverage remains the same. Is this because the task is proved
unsolvable? (That's in the mystery domain.)
I had a quick look at the diff; looks fine. I think the h^m issue was the only
thing bothering us, so it looks ready to merge.