My D&D group typically casts music to our TV during sessions, and very occasionally, the default background image lines up nicely with what the party is doing (e.g. a picture of a mountain while we’re exploring a mountain pass). Using Whisper speech-to-text, GPT 3.5, and DALL-E 3, I rigged up something to make that happen automatically. The results are certainly interesting. The images generated are typically much closer to what’s actually going on than the previous example, but get many of the small details wrong.
Show HN: Live-ish illustration for TTRPG campaigns | Heykuki News