Multimodal Maestro is launching with an implementation of Microsoft's Set-of-Mark (SoM) prompting technique, described in the paper "Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V" [1]. With Multimodal Maestro, you can generate SoM masks with ease.
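To give a rough feel for the idea behind Set-of-Mark prompting, the sketch below assigns a numeric mark to each segmentation mask and places it at the mask's centroid. This is not Multimodal Maestro's API: the function name `compute_marks` and the nested-list mask format are assumptions made purely for illustration.

```python
from typing import List, Tuple

# Hypothetical mask format: mask[row][col] is True inside the region.
Mask = List[List[bool]]


def compute_marks(masks: List[Mask]) -> List[Tuple[int, float, float]]:
    """Return (label, x, y) for each non-empty mask; labels follow mask order, starting at 1."""
    marks = []
    for label, mask in enumerate(masks, start=1):
        # Collect the (x, y) coordinates of every pixel inside the region.
        pixels = [
            (x, y)
            for y, row in enumerate(mask)
            for x, inside in enumerate(row)
            if inside
        ]
        if not pixels:
            continue  # an empty mask has nowhere to place a mark
        cx = sum(x for x, _ in pixels) / len(pixels)
        cy = sum(y for _, y in pixels) / len(pixels)
        marks.append((label, cx, cy))
    return marks


# Two toy 3x3 masks: a 2x2 block in the top-left and a single pixel in the bottom-right.
masks = [
    [[True, True, False], [True, True, False], [False, False, False]],
    [[False, False, False], [False, False, False], [False, False, True]],
]
print(compute_marks(masks))  # [(1, 0.5, 0.5), (2, 2.0, 2.0)]
```

Each numbered mark would then be drawn onto the image so that a prompt can refer to regions by number. A centroid can fall outside a concave region, so a real implementation would likely need a more careful placement rule.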
We are actively looking to add more utilities for LMM prompts, and we would love to hear your ideas!