Cursor: 1.5x Faster MoE Training on Blackwell with MXFP8 Kernels | Heykuki News