Show HN: Smell – A framework for aligning LLM evaluators to human feedback

5 points

2 years ago

We've built SMELL (Subject-Matter Expert Language Liaison), a new framework that combines human expertise with LLMs to create feedback-informed, domain-specific LLM evaluators. One of the biggest issues with current evaluation methods (heuristics, assertions, LLM-as-a-judge, etc.) is that they struggle to capture human preferences.

SMELL addresses this by putting human feedback at the core of the evaluation process. It scales up a small set of human-provided feedback into evaluators that reflect the standards and nuances of specific industries or use-cases. Instead of a one-size-fits-all approach, you get evaluations that actually align with human judgment in those areas.
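To make the idea concrete: SMELL's prompts aren't published yet, so the snippet below is not SMELL's actual method. It's only a minimal sketch of one generic way a few human judgments could be turned into a domain-specific judge, by folding them into a few-shot prompt. The `Feedback` schema, the `build_judge_prompt` helper, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """One human-reviewed example (hypothetical schema, not SMELL's)."""
    output: str   # the LLM output a human reviewed
    verdict: str  # the human's label, e.g. "pass" or "fail"
    comment: str  # the reviewer's short rationale


def build_judge_prompt(domain: str, feedback: list[Feedback], candidate: str) -> str:
    """Fold a handful of human judgments into a few-shot prompt for an LLM judge."""
    lines = [
        f"You are evaluating LLM outputs for the {domain} domain.",
        "Judge the way the human reviewers did in these examples:",
        "",
    ]
    for i, fb in enumerate(feedback, start=1):
        lines += [
            f"Example {i}:",
            f"Output: {fb.output}",
            f"Verdict: {fb.verdict}",
            f"Reason: {fb.comment}",
            "",
        ]
    # Leave the final verdict open for the judge model to fill in.
    lines += ["Now evaluate this output:", f"Output: {candidate}", "Verdict:"]
    return "\n".join(lines)
```

You'd send the returned string to whichever LLM you use as the judge and compare its verdicts with held-out human labels to see how well the two agree.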

If you're curious to try it out, we've made it easy by offering both a notebook and a hosted API so you can test SMELL with your own LLMs and datasets:

- Notebook: https://colab.research.google.com/drive/1wCRwU5KQvnRSDxkubU9...
- Hosted API: https://smell.quotientai.co/

Check out the blog post for more details: https://www.quotientai.co/post/subject-matter-expert-languag...

We are in the process of writing up the findings into a paper and plan to provide the full details on SMELL (including prompts).

If you’re interested in building a custom judge tailored to your specific use case, or if you'd like to contribute to our research, we'd love to collaborate! You can share your datasets with us at [email protected]. We'll publish results based on the data you provide, with full attribution and recognition of your contributions.

In the meantime, we'd love to hear your feedback and see what you think!