
Dimension 5 — Safety and bias evaluations (15 points)

Assesses whether the model has undergone documented evaluation against named benchmarks. The presence of evaluations does not verify their quality — it only signals that the author considered structured evaluation a relevant concern.

Signal | Points
Evaluations documented (model-index OR .eval_results/*.yaml with at least one numeric metric) | 6
Community-provided evaluation bonus (per eval, capped) | +1 each, +2 cap
Future sub-signals (bias evaluations, external audit references) | shipping in subsequent recalibration

Source — Dokima reads two formats in parallel (v0.4 onwards). Evaluations on Hugging Face appear in two places:

  1. The legacy model-index block, either at the top level of the model metadata or inside cardData.model-index. Every model that documents evaluations today uses this format.
  2. The newer .eval_results/*.yaml files committed to the repo's .eval_results/ directory. Hugging Face documents this format in the Hub docs (linked at the bottom of this section) but explicitly marks it as a "work in progress feature". Adoption is around 0.4 percent of the recon corpus and growing slowly.

Dokima reads both formats and merges them. A metric that appears in both formats is counted once for each format. Dokima does not deduplicate at v0.4 because the two formats serve different purposes: the legacy format is written by the model author, while the YAML format admits community-provided entries with a verifyToken.
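The merge described above can be sketched as a plain concatenation of two flattened lists. This is a minimal illustration, not Dokima's actual code: the function names are hypothetical, the inputs are assumed to be already-parsed Python structures, and the nested model-index layout (results[].metrics[].type and .value) follows the Hugging Face model card metadata format.

```python
from numbers import Real


def from_model_index(model_index):
    """Flatten a legacy model-index block into (origin, metric_name, value) tuples."""
    out = []
    for entry in model_index or []:
        for result in entry.get("results", []):
            for metric in result.get("metrics", []):
                out.append(("model-index", metric.get("type"), metric.get("value")))
    return out


def from_eval_yaml(parsed_files):
    """Flatten already-parsed .eval_results/*.yaml documents the same way."""
    out = []
    for doc in parsed_files or []:
        for result in doc.get("results", []):
            out.append((".eval_results", result.get("metric"), result.get("value")))
    return out


def merge_metrics(model_index, parsed_files):
    # Plain concatenation: a metric present in both formats yields two entries,
    # because no deduplication is applied.
    return from_model_index(model_index) + from_eval_yaml(parsed_files)


def is_numeric(value):
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def evaluations_documented(metrics):
    """Presence check: at least one numeric metric in either format."""
    return any(is_numeric(value) for _, _, value in metrics)
```

Because the presence check only asks whether any numeric metric exists, keeping duplicate entries does not change the 6-point credit.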

Defensive YAML parsing. Hugging Face's .eval_results/*.yaml schema is documented but partial: the benchmark allow-list itself is not publicly enumerated (the docs page describes it as "Beta — Get in touch"). Dokima parses the documented fields (model, results[].task_id, results[].dataset_id, results[].metric, results[].value, optional verifyToken, optional source) and emits an EvalResultsUnknownShape drift flag whenever a YAML file carries top-level keys it does not recognise or fails to parse outright. The flag tracks schema drift as Hugging Face firms up the format; the audit still consumes the recognised fields without crashing the scan. This methodology page will be updated when Hugging Face publishes a more complete schema.
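A minimal sketch of this defensive pattern follows. It rests on several assumptions: the function name and the set of recognised top-level keys are illustrative (verifyToken and source are treated as top-level keys here), a document that parses to something other than a mapping is treated as a parse failure, and the parser is passed in as a callable so the sketch stays within the standard library.

```python
import json

# Assumption: top-level keys taken from the documented field list.
KNOWN_TOP_LEVEL_KEYS = {"model", "results", "verifyToken", "source"}
DRIFT_FLAG = "EvalResultsUnknownShape"


def read_eval_file(text, load):
    """Return (recognised_results, drift_flags) for one eval file.

    `load` is any callable that turns text into Python objects and raises
    on malformed input.
    """
    try:
        doc = load(text)
    except Exception:
        # Parse failure: flag it and carry on with the rest of the scan.
        return [], [DRIFT_FLAG]
    if not isinstance(doc, dict):
        # Assumption: a non-mapping document counts as a parse failure.
        return [], [DRIFT_FLAG]

    flags = []
    if set(doc) - KNOWN_TOP_LEVEL_KEYS:
        # Unknown top-level keys: flag drift but keep reading known fields.
        flags.append(DRIFT_FLAG)

    results = doc.get("results")
    if not isinstance(results, list):
        results = []
    recognised = [r for r in results if isinstance(r, dict)]
    return recognised, flags


# json.loads stands in for a YAML loader in this demo.
demo_results, demo_flags = read_eval_file(
    '{"model": "org/m", "results": [{"metric": "accuracy", "value": 0.9}], "extra": 1}',
    json.loads,
)
```

With PyYAML installed, yaml.safe_load could be passed as the load argument instead of json.loads. In the demo, the unrecognised "extra" key raises the drift flag while the single recognised result is still returned.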

Community-provided evaluations are a separate sub-signal (v0.4 onwards). When a YAML eval file carries source: community, the evaluation was run by a third party and submitted via PR, not by the model author. This is structurally analogous to a third-party security audit: the evaluator has skin in the game (the PR is public; bad-faith evaluations get pushed back) and the model author cannot write the result about their own model. Dokima awards a one-point bonus per community-provided evaluation, capped at two points total, on top of the existing 6-point presence credit. The cap prevents gaming via "claim 50 community evals for an extra 50 points"; the per-eval mechanic rewards models that have multiple independent community evaluators rather than a single one.
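The arithmetic behind the presence credit and the capped bonus can be sketched as below. The function and constant names are hypothetical, and the sketch assumes that the bonus applies only when the presence credit has been earned, which the text above implies but does not state outright.

```python
PRESENCE_POINTS = 6            # evaluations documented
COMMUNITY_BONUS_PER_EVAL = 1   # +1 per community-provided evaluation
COMMUNITY_BONUS_CAP = 2        # bonus never exceeds +2


def count_community_evals(parsed_files):
    """Count parsed YAML eval documents that declare source: community."""
    return sum(1 for doc in parsed_files if doc.get("source") == "community")


def dim5_points(evaluations_documented, community_eval_count):
    """Dim 5 points under the v0.4 rules sketched here."""
    if not evaluations_documented:
        # Assumption: no presence credit means no bonus either.
        return 0
    bonus = min(COMMUNITY_BONUS_CAP, COMMUNITY_BONUS_PER_EVAL * community_eval_count)
    return PRESENCE_POINTS + bonus
```

Under these assumptions a model with one community evaluation scores 7, and a model claiming 50 community evaluations still scores 8.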

Hugging Face's own documentation notes that community PRs can be opened and closed by the model author at will, which means an author can dispute and remove a community evaluation. Dokima reads the merged state: closed PRs are not visible to the API and therefore do not contribute. The community-provided badge is the canonical signal; Dokima scores what HF publishes.

Anti-gameability — the suspicious-metric flag. When a published evaluation reports a metric value above 0.99 on a known benchmark name AND that value is in the probability range (0.0 to 1.0 inclusive), Dokima records a suspicious_metric flag on the score (visible in the score JSON's drift_flags array). The list of monitored benchmark names lives in the private detection layer per the open-core split, so the flag fires only on production builds.

The April 2026 recon found that 92.7 percent of model-index metrics across 37 thousand entries were above 0.99 — far higher than any reasonable distribution of real benchmark scores. On inspection, this was a category error: many model-index entries hold counts (download numbers, parameter counts), durations, or unbounded percentages, NOT probabilities. The earlier rule that flagged any value above 0.99 was firing on essentially every metric. The v0.4 fix scopes the flag to two conditions that must both hold: the metric name appears in the published probability vocabulary AND the value is at most 1.0 (above 1.0 means the metric is structurally not a probability; the suspicious-metric heuristic does not apply).
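The two-condition rule can be illustrated with a short predicate. This is a sketch under stated assumptions: the vocabulary below is a hypothetical placeholder (the monitored names are not enumerated on this page), matching is done on the metric name as described in the paragraph above, and the function name is invented for illustration.

```python
# Hypothetical placeholder vocabulary; the real list is not published here.
PROBABILITY_METRICS = frozenset({"accuracy", "f1", "precision", "recall"})
SUSPICIOUS_THRESHOLD = 0.99


def is_suspicious(metric_name, value, vocabulary=PROBABILITY_METRICS):
    """True only when the metric is probability-like AND 0.99 < value <= 1.0."""
    # Non-numeric values (and booleans) can never trip the flag.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    # Condition 1: the metric name is in the probability vocabulary.
    if metric_name not in vocabulary:
        return False
    # Condition 2: above the threshold but still inside the probability range.
    # A value above 1.0 is structurally not a probability, so it is skipped.
    return SUSPICIOUS_THRESHOLD < value <= 1.0
```

With this scoping, a count or percentage such as 99.5 no longer raises the flag, while an accuracy of 0.995 still does.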

Population baseline disclosures. See the dedicated Baselines page for the recon-derived rates (the 93 percent zero-model-index rate, the 28 percent missing-library_name rate, etc.). Each base rate shapes what a Dim 5 score actually means in context.

Future sub-signals (planned for the next recalibration). Two pieces are planned but not shipping in v0.4:

  1. Library-hint and task-hint fallback chains. When library_name is missing, Dokima would fall back to tags[] containing library hints (vllm, transformers, peft, sentence-transformers, diffusers, etc.); when pipeline_tag is missing, fall back to task-hint tags (text-generation, text-classification, etc.). The chains land when there is a Dim 5 sub-signal that needs the fallback values; for v0.4 the existing presence check does not require either signal so the fallback chains are deferred.
  2. Bias-evaluation and external-audit-reference sub-signals. Nine of the 15 dimension points are reserved for these. The presence-check sub-signal alone awards 6 of 15 today; the remaining 9 ship in a subsequent recalibration once the recon corpus produces enough population data to lock the per-sub-signal weights without overfitting.
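The fallback chains in the first planned item could look roughly like the sketch below. Since the feature is deferred, everything here is an assumption: the function name, the hint tuples (limited to the examples named in the list above), and the choice to resolve ties by the order of the hint tuple.

```python
# Only the example hints named in this section; the real lists may be longer.
LIBRARY_HINT_TAGS = ("vllm", "transformers", "peft", "sentence-transformers", "diffusers")
TASK_HINT_TAGS = ("text-generation", "text-classification")


def resolve_with_fallback(declared, tags, hints):
    """Prefer the declared field; otherwise fall back to the first matching hint tag."""
    if declared:
        return declared
    tag_set = set(tags or [])
    # Assumption: when several hints match, the order of `hints` decides.
    for hint in hints:
        if hint in tag_set:
            return hint
    return None


# library_name missing, so the tags[] list supplies the library hint.
library = resolve_with_fallback(None, ["en", "peft", "text-generation"], LIBRARY_HINT_TAGS)
# pipeline_tag missing, so the same tags[] list supplies the task hint.
task = resolve_with_fallback(None, ["en", "peft", "text-generation"], TASK_HINT_TAGS)
```

A declared library_name or pipeline_tag always wins over tag hints in this sketch, so the fallback only affects models where the field is missing.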

Documentation reference. Hugging Face's eval-results format documentation: https://huggingface.co/docs/hub/en/eval-results.