SMELL addresses this by putting human feedback at the core of the evaluation process. It scales a small set of human-provided feedback into evaluators that reflect the standards and nuances of a specific industry or use case. Instead of a one-size-fits-all approach, you get evaluations that actually align with human judgment in that domain.
If you're curious to try it out, we've made it easy by offering both a notebook and a hosted API so you can test SMELL with your own LLMs and datasets:
- Notebook: https://colab.research.google.com/drive/1wCRwU5KQvnRSDxkubU9...
- Hosted API: https://smell.quotientai.co/
Check out the blog post for more details: https://www.quotientai.co/post/subject-matter-expert-languag...
We are in the process of writing up the findings as a paper, and we plan to publish the full details of SMELL (including the prompts).
If you’re interested in building a custom judge tailored to your specific use case, or if you'd like to contribute to our research, we'd love to collaborate! You can share your datasets with us at [email protected]. We'll publish results based on the data you provide, with full attribution and recognition of your contributions.
In the meantime, we'd love to hear your feedback and see what you think!