Show HN: Dokimos – LLM Evaluation Framework for Java

5 months ago

I tried building LLM applications in Java and wanted the evaluation experience I know from the Python ecosystem: datasets, experiments, built-in metrics, and tracking results over time.

Dokimos brings that to Java. It's a framework for evaluating LLM outputs with:

- Built-in evaluators for both LLM-based and traditional metrics
- Dataset support (JSON, CSV, or programmatic)
- JUnit and CI/CD integration so evaluations run as parameterized tests alongside your existing test suite
- Experiment tracking with aggregated metrics and export to multiple formats
- Optional server for viewing results over time
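
To give a rough idea of what "LLM-based and traditional" evaluators means, here is a minimal plain-Java sketch. It is not Dokimos's actual API: `Example`, `Result`, `Evaluator`, `exactMatch`, and `judged` are hypothetical names made up for illustration, and the "judge" is a stub function standing in for a real LLM call.

```java
import java.util.List;
import java.util.function.BiFunction;

public class EvaluatorSketch {

    // Hypothetical types for illustration only; not taken from Dokimos.
    record Example(String input, String expected) {}
    record Result(String evaluator, double score, boolean passed) {}

    interface Evaluator {
        Result evaluate(Example example, String actualOutput);
    }

    // Traditional metric: case-insensitive exact match after trimming whitespace.
    static Evaluator exactMatch() {
        return (example, actual) -> {
            boolean ok = example.expected().trim().equalsIgnoreCase(actual.trim());
            return new Result("exact-match", ok ? 1.0 : 0.0, ok);
        };
    }

    // LLM-based metric: a judge function returns a score, which is clamped
    // to [0, 1] and compared against a pass threshold.
    static Evaluator judged(String name,
                            BiFunction<Example, String, Double> judge,
                            double threshold) {
        return (example, actual) -> {
            double score = Math.max(0.0, Math.min(1.0, judge.apply(example, actual)));
            return new Result(name, score, score >= threshold);
        };
    }

    public static void main(String[] args) {
        Example ex = new Example("What is the capital of France?", "Paris");

        // Stub judge (assumption): a real one would prompt an LLM to grade the answer.
        BiFunction<Example, String, Double> stubJudge =
                (e, out) -> out.contains(e.expected()) ? 0.9 : 0.1;

        List<Evaluator> evaluators = List.of(exactMatch(), judged("judge", stubJudge, 0.7));
        for (Evaluator ev : evaluators) {
            System.out.println(ev.evaluate(ex, "The capital is Paris."));
        }
    }
}
```

The same output can fail a strict traditional metric and still pass a more lenient judged one, which is why having both kinds side by side is useful.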

It integrates with LangChain4j and Spring AI, but works with any LLM client on your local machine.

The goal is to make evaluating LLM applications feel like a natural part of Java development. Define your test cases, create or generate datasets, pick your evaluators, run in CI, catch regressions.
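
The workflow above (dataset, evaluator, run, catch regressions) can be sketched in plain Java roughly like this. Again, this is only an illustration under stated assumptions and not how Dokimos itself is used: `TestCase`, `passRate`, and `assertNoRegression` are hypothetical names, the task is a stub instead of a real LLM call, and the baseline and tolerance values are arbitrary.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class RegressionGateSketch {

    // Hypothetical test case type for illustration only.
    record TestCase(String input, String expected) {}

    // Fraction of cases whose output contains the expected text (case-insensitive).
    static double passRate(List<TestCase> dataset, Function<String, String> task) {
        if (dataset.isEmpty()) {
            throw new IllegalArgumentException("dataset must not be empty");
        }
        long passed = dataset.stream()
                .filter(tc -> task.apply(tc.input()).toLowerCase(Locale.ROOT)
                        .contains(tc.expected().toLowerCase(Locale.ROOT)))
                .count();
        return (double) passed / dataset.size();
    }

    // Throws if the pass rate dropped below the baseline by more than the tolerance,
    // which is what would make a CI run fail.
    static void assertNoRegression(double passRate, double baseline, double tolerance) {
        if (passRate < baseline - tolerance) {
            throw new AssertionError(String.format(
                    Locale.ROOT, "pass rate %.2f is below baseline %.2f (tolerance %.2f)",
                    passRate, baseline, tolerance));
        }
    }

    public static void main(String[] args) {
        List<TestCase> dataset = List.of(
                new TestCase("What is the capital of France?", "Paris"),
                new TestCase("What is 2 + 2?", "4"));

        // Stub task (assumption): a real one would call your LLM application.
        Function<String, String> task =
                in -> in.contains("France") ? "Paris is the capital." : "The answer is 4.";

        double rate = passRate(dataset, task);
        assertNoRegression(rate, 0.9, 0.05); // baseline and tolerance are made-up values
        System.out.printf(Locale.ROOT, "pass rate: %.2f%n", rate);
    }
}
```

Throwing on a drop below the baseline is what lets an ordinary test runner or CI job flag the regression without any extra tooling.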

GitHub: https://github.com/dokimos-dev/dokimos

Docs: https://dokimos.dev/overview