Key findings:
Query-Temperature Mapping:
- Factual queries (math, logic) → 0.0-0.1
- Technical writing → 0.2-0.5
- Conversational → 0.6-1.0
- Creative tasks → 1.1-1.5
- Exploratory thinking → 1.6-2.0
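For anyone who wants to try the mapping without the classifier, here is a minimal sketch of it as a lookup table. The ranges are the ones listed above. The category keys, the `temperature_for` helper, and the choice to pick the midpoint of each range are assumptions made for illustration, not necessarily how the repo does it.

```python
from typing import Dict, Tuple

# Ranges copied from the mapping above. The keys are hypothetical labels
# chosen for this sketch, not names taken from the adaptive-classifier repo.
TEMPERATURE_RANGES: Dict[str, Tuple[float, float]] = {
    "factual": (0.0, 0.1),
    "technical": (0.2, 0.5),
    "conversational": (0.6, 1.0),
    "creative": (1.1, 1.5),
    "exploratory": (1.6, 2.0),
}


def temperature_for(category: str, default: str = "conversational") -> float:
    """Return a single temperature for a query category.

    Using the midpoint of the range is an assumption; any value inside
    the range would be consistent with the mapping.
    Unknown categories fall back to the `default` category.
    """
    low, high = TEMPERATURE_RANGES.get(category, TEMPERATURE_RANGES[default])
    return round((low + high) / 2, 2)


if __name__ == "__main__":
    for name in TEMPERATURE_RANGES:
        print(name, temperature_for(name))
```

In practice the category would come from the classifier's prediction for the incoming query, and the returned value would be passed as the sampling temperature of the generation call.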
Distribution Analysis: Roughly three quarters of queries perform best outside the standard 0.6-1.0 range:
- 26.4% perform best with technical precision (0.2-0.5)
- 23.5% with standard balanced settings (0.6-1.0)
- 18.6% with strict deterministic settings (0.0-0.1)
- 31.6% with higher creative settings (>1.0)
Implementation: We use Round-Trip Consistency (RTC) testing: generate a response, create an alternate query targeting similar content, generate a second response, then measure the semantic similarity between the two responses. This gives us an automated way to evaluate how well a temperature works, with no human rating involved.
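The RTC loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the repo: `generate` and `make_alternate_query` are hypothetical callables you would back with your own model calls, and the bag-of-words cosine is a crude stand-in for a real semantic similarity measure such as embedding similarity.

```python
import math
import re
from collections import Counter
from typing import Callable


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for semantic similarity)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rtc_score(
    query: str,
    temperature: float,
    generate: Callable[[str, float], str],
    make_alternate_query: Callable[[str, str], str],
    similarity: Callable[[str, str], float] = cosine_similarity,
) -> float:
    """Score one temperature for one query with a round trip.

    `generate(query, temperature)` and `make_alternate_query(query, response)`
    are hypothetical hooks. Passing both the query and the first response to
    the second hook is an assumption about how the alternate query is built.
    """
    first = generate(query, temperature)            # 1. first response
    alternate = make_alternate_query(query, first)  # 2. alternate query
    second = generate(alternate, temperature)       # 3. second response
    return similarity(first, second)                # 4. consistency score


if __name__ == "__main__":
    # Toy stubs so the sketch runs without a model.
    def fake_generate(q: str, t: float) -> str:
        return "Paris is the capital of France"

    def fake_alternate(q: str, r: str) -> str:
        return "Which city is the capital of France?"

    print(rtc_score("What is the capital of France?", 0.0,
                    fake_generate, fake_alternate))
```

Running this over a grid of temperatures per query and keeping the highest-scoring one would give the kind of label the classifier could be trained on, though that last step is our reading of the description, not something the sketch shows.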
The classifier learns continuously from query-response patterns. In production, this reduced "hallucinations" by 42% for factual queries while improving creativity scores by 35% for open-ended tasks.
Technical details and implementation: https://github.com/codelion/adaptive-classifier
We're particularly interested in feedback from others who've dealt with temperature optimization at scale.