The cool thing is that once the song has been analyzed on the server, the results can be recomputed and previewed entirely client-side, using Web Workers and WebAssembly. The audio previewing uses Tone.js. I'm thinking of writing up more details about the implementation in the future.
I'm still working on a simple way to explain this, but I like the idea of carrying the concept of content-aware fill over from images to audio.
Please let me know if you have any comments or questions!