
Capability

Dual Metric Truthfulness And Informativeness Evaluation

2 artifacts provide this capability.


Best tool for dual metric truthfulness and informativeness evaluation: TrustLLM
Total options: 2 artifacts

Top Matches

1. TrustLLM (Benchmark) 63/100

via “truthfulness evaluation with misinformation, hallucination, and sycophancy detection”

8-dimension trustworthiness benchmark for LLMs.

Unique: Combines multiple factuality signals (internal consistency, external accuracy, hallucination, agreement bias) into a single truthfulness dimension. Uses mixed evaluation strategies: pattern matching for structured tasks, GPT-4 for open-ended grading, and deterministic metrics for reproducibility.

vs others: More comprehensive than single-metric factuality benchmarks (e.g., TruthfulQA alone) because it captures hallucination, sycophancy, and internal contradictions in addition to external factuality.

2. TruthfulQA (Dataset) 56/100

via “dual-metric-truthfulness-and-informativeness-evaluation”

817 adversarial questions that test whether a model answers truthfully or repeats common misconceptions.

Unique: Decouples truthfulness from informativeness, treating them as independent evaluation dimensions rather than conflating them into a single quality score. Explicitly measures the dangerous failure mode of confident-sounding false answers (high informativeness, low truthfulness), which single-metric benchmarks miss.

vs others: More nuanced than accuracy-only benchmarks (MMLU, TriviaQA) because it captures whether models generate plausible-sounding falsehoods or uninformative truths. This addresses the safety-critical distinction between wrong answers and low-quality correct answers.
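To illustrate why keeping the two dimensions separate matters, here is a minimal sketch of dual-metric aggregation. It assumes each answer has already been judged with two independent yes/no labels; the `JudgedAnswer` schema, the `dual_metric_summary` function, and the sample labels are hypothetical and are not taken from TruthfulQA's own tooling.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class JudgedAnswer:
    """One model answer with two independent binary judgments (hypothetical schema)."""
    truthful: bool
    informative: bool


def dual_metric_summary(answers: Iterable[JudgedAnswer]) -> dict[str, float]:
    """Report each dimension separately, plus the joint and failure-mode rates."""
    answers = list(answers)
    if not answers:
        raise ValueError("need at least one judged answer")
    n = len(answers)
    return {
        "truthful": sum(a.truthful for a in answers) / n,
        "informative": sum(a.informative for a in answers) / n,
        "truthful_and_informative": sum(a.truthful and a.informative for a in answers) / n,
        # Confident-sounding falsehoods: high informativeness, low truthfulness.
        "informative_but_false": sum(a.informative and not a.truthful for a in answers) / n,
        # Safe but unhelpful answers, such as refusing to commit to anything.
        "truthful_but_uninformative": sum(a.truthful and not a.informative for a in answers) / n,
    }


# Made-up labels for four answers, purely for illustration.
judged = [
    JudgedAnswer(truthful=True, informative=True),
    JudgedAnswer(truthful=True, informative=False),
    JudgedAnswer(truthful=False, informative=True),
    JudgedAnswer(truthful=False, informative=True),
]
summary = dual_metric_summary(judged)
for name, rate in summary.items():
    print(f"{name}: {rate:.2f}")
```

A single blended score would hide the difference between the second answer (true but unhelpful) and the last two (helpful-sounding but false). Reporting the rates side by side keeps that distinction visible.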

Also Known As

dual-metric-truthfulness-and-informativeness-evaluation

model-comparison-and-ranking-across-truthfulness-dimensions

truthfulness evaluation with misinformation, hallucination, and sycophancy detection

