I also ran an eval of entropy decoding on GSM8K with the Qwen2.5-0.5B-Instruct model in a zero-shot setting. I found improvements over the base model, but they are no better than what we get with the much simpler CoT decoding.
You can try them both in this free Google Colab: https://colab.research.google.com/drive/1SpuUb8d9xAoTh32M-9wJsB50AOH54EaH?usp=sharing