Since we can only see one output at a time, it's hard to compare outputs across different models, prompt adjustments, and sampler settings.
Even when you keep the generation parameters the same (e.g., to test how reliable the output is) and just run multiple passes, there is no easy way to compare the outputs from the different "rounds" side by side.
I really like Excalidraw and thinking visually, which gave me an idea: why not build a sort of "open-world" playground where you place nodes representing system, user, and assistant messages on one big canvas? Since they all fit on the screen, you get an easy side-by-side comparison with basically no limit on the number of parallel threads at any level. One user input could lead to two assistant outputs, which in turn start new sub-branches that can all be tracked visually.
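To make the branching idea a bit more concrete, here is a minimal sketch of how such a canvas could be modeled as a tree of message nodes. Everything in it (`MessageNode`, `add_child`, `thread`, `leaf_threads`) is a hypothetical name made up for illustration, not an existing API, and canvas positions are left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical node type: one box on the canvas.
# eq=False keeps identity-based comparison, so parent/child cycles are harmless.
@dataclass(eq=False)
class MessageNode:
    role: str      # "system", "user", or "assistant"
    content: str
    parent: Optional["MessageNode"] = field(default=None, repr=False)
    children: List["MessageNode"] = field(default_factory=list, repr=False)

    def add_child(self, role: str, content: str) -> "MessageNode":
        """Attach a new message below this one and return it."""
        child = MessageNode(role, content, parent=self)
        self.children.append(child)
        return child

    def thread(self) -> List["MessageNode"]:
        """Return the messages from the root down to this node."""
        path = []
        node: Optional["MessageNode"] = self
        while node is not None:
            path.append(node)
            node = node.parent
        return list(reversed(path))


def leaf_threads(root: MessageNode) -> List[List[MessageNode]]:
    """Collect every root-to-leaf thread, i.e. each branch to show side by side."""
    if not root.children:
        return [root.thread()]
    threads: List[List[MessageNode]] = []
    for child in root.children:
        threads.extend(leaf_threads(child))
    return threads


# One user input leading to two assistant outputs, one of which branches further.
system = MessageNode("system", "You are a helpful assistant.")
user = system.add_child("user", "Name a color.")
first = user.add_child("assistant", "Blue.")
second = user.add_child("assistant", "Green.")
follow_up = first.add_child("user", "Why blue?")

threads = leaf_threads(system)
for t in threads:
    print(" -> ".join(node.role for node in t))
```

Each list returned by `leaf_threads` is one complete conversation, which is what you would send to the model when continuing that branch. A real canvas would also need coordinates and some way to store the model and sampler settings per node, which this sketch leaves out.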