I see mentions in a lot of places of Cohen’s kappa, Krippendorff’s alpha, Fleiss’ kappa, comparing against a predefined ground truth, etc.
If you’re managing an annotation process in your organization, how do you evaluate your annotators, and what challenges have you faced in the process?
As a side note, is anyone using programmatic labeling on a real dataset? Thoughts?