I’m the maintainer of OpenLIT, and like many of you, I’ve always struggled with debugging LLM responses on Ollama and GPT4All—it's possible but with significant effort tbh. So, taking advantage of my role as maintainer, I tackled the problem head-on to make it simpler.
I've developed an OpenTelemetry-native integration within OpenLIT aimed at boosting performance tracking for any LLM hosted on Ollama or GPT4All. Integration takes just one line of code. This integration has notably streamlined my own workflow, allowing monitoring of prompts and responses, including the parameters used, like temperature settings, and how these affect responses, plus tracking token counts.
If you’re encountering similar issues, this could be a really useful: https://docs.openlit.io/latest/integrations/ollama
I'd love to hear how it works for you or discuss any other strategies you've found effective in managing LLMs!