Capability
Consensus-Based Annotation Workflows with Quality Scoring
20 artifacts provide this capability.
Top Matches
via “human quality rating aggregation with inter-annotator agreement metrics”
161K human-written messages in 35 languages with quality ratings.
Unique: Provides raw per-annotator ratings alongside aggregates, enabling downstream systems to compute custom agreement metrics and weight examples by confidence rather than using fixed aggregation. Most datasets only expose final scores.
vs others: Richer annotation metadata than single-rater datasets (e.g., Alpaca) or datasets with binary labels, allowing nuanced quality-based filtering and confidence-weighted training.
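The per-annotator ratings this dataset exposes make confidence-weighted training straightforward. As a minimal sketch (field names like `ratings` and the 1-5 scale are assumptions, not the dataset's actual schema), one simple custom agreement metric is the spread of raw ratings: low spread among annotators yields a weight near 1, maximal disagreement a weight near 0.

```python
from statistics import mean, pstdev

def confidence_weight(ratings, scale_min=1, scale_max=5):
    """Confidence weight in [0, 1] from raw per-annotator ratings.

    Low spread (annotators agree) -> weight near 1;
    maximal spread (e.g. ratings at both scale ends) -> weight near 0.
    """
    if len(ratings) < 2:
        return 0.5  # single rater: neutral confidence (a design choice, not from the dataset)
    half_range = (scale_max - scale_min) / 2
    spread = pstdev(ratings)  # population standard deviation of the raw ratings
    return max(0.0, 1.0 - spread / half_range)

# Hypothetical records mirroring a "raw per-annotator ratings" layout
examples = [
    {"text": "...", "ratings": [4, 4, 5]},  # strong agreement
    {"text": "...", "ratings": [1, 5, 3]},  # heavy disagreement
]

for ex in examples:
    ex["score"] = mean(ex["ratings"])            # custom aggregate instead of a fixed one
    ex["weight"] = confidence_weight(ex["ratings"])  # example weight for training
```

Spread-based weighting is only one option; with the raw ratings available, downstream systems can just as well compute chance-corrected agreement statistics (e.g. Krippendorff's alpha) across the full corpus.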