The latest Traces & Spans release of Phoenix offers:
- Out-of-the-box tracing for LlamaIndex and LangChain
- Fully local execution: no data is sent anywhere apart from your own LLM calls
- A common dataframe format across frameworks that you can pull back into a notebook for Evals
- A code-based LLM Eval harness: light, simple, and fast
- Benchmarking scripts for retrieval setup: chunk size, K, and retrieval approach
https://github.com/Arize-ai/phoenix
We'd love to hear more from the community about what kinds of LLM applications you are building, whether you are using a framework or building from scratch, and how you are running and measuring LLM Evals today. We're thinking a lot about what “non-framework” integrations look like.
We're also genuinely interested in people's opinions on LLM spans and traces versus OTEL. Is the divergence due to something intrinsic to LLM applications, or are we as a community reinventing the wheel?
Colab if you want to test it out: https://colab.research.google.com/github/Arize-ai/phoenix/bl...