Tool-use APIs make it hard to interleave text and commands. But if LLMs can output XML and text seamlessly, it's simpler to go back to basics and use an online (streaming) parser that handles commands on the fly as the response streams in.
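To make the idea concrete, here is a minimal sketch of an online parser in Python. It is not SaxaMLL's actual API: the class name `StreamingTagParser`, the `feed` method, and the event tuples are all hypothetical and chosen only for illustration. It assumes commands are plain `<name>` / `</name>` tags without attributes, and that any `<` in the stream begins a tag.

```python
import re
from typing import List, Tuple

# Hypothetical event shape for this sketch: (kind, payload),
# where kind is "text", "open", or "close".
Event = Tuple[str, str]

# Matches simple tags such as <search> or </search> (no attributes).
_TAG = re.compile(r"</?([A-Za-z_][\w-]*)>")


class StreamingTagParser:
    """Illustrative online parser; not the SaxaMLL implementation."""

    def __init__(self) -> None:
        self._buf = ""  # holds an incomplete tag between chunks

    def feed(self, chunk: str) -> List[Event]:
        """Consume one streamed chunk and return the events it completes."""
        events: List[Event] = []
        self._buf += chunk
        while self._buf:
            lt = self._buf.find("<")
            if lt == -1:
                # No tag start in the buffer: everything is plain text.
                events.append(("text", self._buf))
                self._buf = ""
                break
            if lt > 0:
                # Emit the text before the tag right away.
                events.append(("text", self._buf[:lt]))
                self._buf = self._buf[lt:]
            gt = self._buf.find(">")
            if gt == -1:
                # The tag is split across chunks: wait for more input.
                break
            raw, self._buf = self._buf[: gt + 1], self._buf[gt + 1:]
            match = _TAG.fullmatch(raw)
            if match is None:
                events.append(("text", raw))  # not a tag we recognize
            elif raw.startswith("</"):
                events.append(("close", match.group(1)))
            else:
                events.append(("open", match.group(1)))
        return events


if __name__ == "__main__":
    parser = StreamingTagParser()
    # Chunks deliberately split the tags mid-name, as a token stream might.
    for chunk in ["Let me check. <sea", "rch>weather in Paris</search", "> Done."]:
        for event in parser.feed(chunk):
            print(event)
```

Each call to `feed` returns only the events that are complete so far, so a caller can show text to the user and dispatch a command as soon as its closing tag arrives, without waiting for the full response. A real parser would also need to handle attributes, stray `<` characters in ordinary text, and a final flush at end of stream, all of which this sketch leaves out.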
This project is still in development, so please leave comments and feedback! Let me know what problems you're running into and how SaxaMLL can help. Thanks!