Show HN: Phoenix OSS – Applying LLM Spans, Traces, and Evals for AI Insights

23 points

3 years ago

We are the developers of Phoenix, which we released in April of this year with the goal of bringing LLM observability to the notebook. Since then, the growth of LLM frameworks and complex agent workflows has led us to add support for LLM spans and traces and to introduce a simple Eval harness for testing the data from those spans.

The latest Traces & Spans release of Phoenix offers:
- Out-of-the-box tracing for LlamaIndex and LangChain
- Fully local execution: no data is sent anywhere other than your own LLM calls
- A common dataframe format across frameworks that you can pull back into a notebook for Evals
- A code-based LLM Eval harness: light, simple, and fast
- Benchmarking scripts for retrieval setup: chunk size, K, and retrieval approach
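
To make the "common format plus code-based Eval" idea concrete, here is a minimal sketch in plain Python. It is not Phoenix's API: the `Span` fields, `spans_to_rows`, and `eval_retrieval_hit` are hypothetical names made up for illustration, and the sample data is invented. It only shows the general shape of flattening spans from one trace into rows and running a simple check over them.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Span:
    # Hypothetical span schema for illustration only (not Phoenix's schema).
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of a trace
    kind: str                 # e.g. "chain", "retriever", "llm"
    input: str
    output: str
    latency_ms: float


def spans_to_rows(spans):
    """Flatten spans into plain dict rows, one row per span."""
    return [asdict(s) for s in spans]


def eval_retrieval_hit(rows, expected_by_trace):
    """Toy eval: did each trace's retriever span return the expected text?"""
    results = {}
    for row in rows:
        if row["kind"] != "retriever":
            continue
        expected = expected_by_trace.get(row["trace_id"])
        if expected is not None:
            results[row["trace_id"]] = expected in row["output"]
    return results


# Invented sample data: one trace with a root chain span and two children.
spans = [
    Span("t1", "s1", None, "chain", "What is Phoenix?", "An observability tool.", 820.0),
    Span("t1", "s2", "s1", "retriever", "What is Phoenix?",
         "doc: Phoenix is an observability tool", 35.0),
    Span("t1", "s3", "s1", "llm", "prompt text", "An observability tool.", 760.0),
]

rows = spans_to_rows(spans)
hits = eval_retrieval_hit(rows, {"t1": "observability"})
```

Because the rows are plain dicts, they could be loaded into a dataframe in a notebook, and the same eval function would work regardless of which framework produced the spans.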

https://github.com/Arize-ai/phoenix

We would love to hear more from the community about what kinds of LLM applications you are building, whether you are using a framework or building from scratch, and how you are running and measuring LLM Evals today. We are thinking a lot about what “non-framework” integrations look like.

We are also genuinely interested in people's opinions of LLM spans and traces versus OTEL. Is the divergence due to something intrinsic to the application, or are we as a community reinventing the wheel?

Colab if you want to test it out: https://colab.research.google.com/github/Arize-ai/phoenix/bl...

3 comments