Scoring Methodology

Seven dimensions, 100 points total. Scores are cached for 24 hours.

v1.0, updated April 2026

Dokima publishes its scoring dimensions and weights in full. We keep implementation-level heuristics (exact string matching, edge-case handling) as internal detail; these change frequently as we improve detection quality. Disputed scores can be appealed via the appeals process.

Grade mapping

Grade   Score    Label
A+      85–100   Exemplary
A       70–84    Strong
B       55–69    Good
C       40–54    Fair
D       25–39    Weak
F       0–24     Poor
Hard fail override

An active safety flag from the Hugging Face platform scanner results in an automatic F regardless of all other dimension scores. This cannot be overridden and must be resolved at the platform level before a passing score is achievable.
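
The grade table and the hard-fail rule together amount to a small lookup. The sketch below is illustrative rather than Dokima's implementation. It assumes totals are whole numbers from 0 to 100, as the table's integer ranges suggest; with fractional totals, a value such as 84.5 would fall into A under these thresholds. The function name `grade_for` and the boolean `platform_safety_flag` are hypothetical names used only for illustration.

```python
# Grade bands from the grade-mapping table: (minimum score, grade, label),
# ordered highest first so the first matching band wins.
GRADE_BANDS = [
    (85, "A+", "Exemplary"),
    (70, "A", "Strong"),
    (55, "B", "Good"),
    (40, "C", "Fair"),
    (25, "D", "Weak"),
    (0, "F", "Poor"),
]


def grade_for(score: int, platform_safety_flag: bool = False) -> tuple[str, str]:
    """Hypothetical helper: map a 0-100 total to (grade, label).

    An active Hugging Face platform safety flag forces an F,
    whatever the dimension scores add up to.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0..100, got {score}")
    if platform_safety_flag:
        # Hard fail override: checked before the bands so it cannot be outscored.
        return ("F", "Poor")
    for minimum, grade, label in GRADE_BANDS:
        if score >= minimum:
            return (grade, label)
    raise AssertionError("unreachable: bands cover 0..100")
```

Checking the override before the band lookup mirrors the rule that a flagged model cannot reach a passing grade, however strong its other dimensions are.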

Calibration cadence

At minimum, a quarterly recalibration runs against the three dimensions most open to gaming: Serialisation safety, Namespace provenance, and Regulatory alignment. Per-dimension distribution-drift alerts trigger a manual review of the affected dimension regardless of cadence, and any drift alert also triggers a full sweep of all seven dimensions. Weight changes are versioned, announced with at least 30 days' notice, and do not retroactively alter historical score records.
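
The cadence rules can be read as a scope decision: drift escalates to a full sweep, and without drift the quarterly run covers only the high-gameability set. The sketch below is an illustrative interpretation, not Dokima's scheduler. Only three of the seven dimension names appear on this page, so the caller supplies the full set. Deciding whether a quarter is due is left to the caller, and `plan_recalibration`, `RecalibrationPlan`, and `earliest_effective_date` are hypothetical names.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import date, timedelta

# High-gameability dimensions covered by the quarterly minimum run.
QUARTERLY_MINIMUM = frozenset({
    "Serialisation safety",
    "Namespace provenance",
    "Regulatory alignment",
})

NOTICE_PERIOD = timedelta(days=30)  # minimum notice before a weight change


@dataclass(frozen=True)
class RecalibrationPlan:
    recalibrate: frozenset = frozenset()    # dimensions to recalibrate now
    manual_review: frozenset = frozenset()  # dimensions raised by drift alerts
    full_sweep: bool = False                # True when all seven are swept


def plan_recalibration(all_dimensions: Iterable[str], quarter_due: bool,
                       drift_alerts: Iterable[str]) -> RecalibrationPlan:
    """Hypothetical helper: choose recalibration scope from the cadence rules."""
    dims = frozenset(all_dimensions)
    if len(dims) != 7 or not QUARTERLY_MINIMUM <= dims:
        raise ValueError("expected the seven scored dimensions")
    alerts = frozenset(drift_alerts)
    if not alerts <= dims:
        raise ValueError(f"unknown dimensions in drift alerts: {sorted(alerts - dims)}")
    if alerts:
        # Any drift alert: manually review each alerting dimension and
        # sweep all seven, regardless of where the quarter stands.
        return RecalibrationPlan(recalibrate=dims, manual_review=alerts, full_sweep=True)
    if quarter_due:
        # No drift: the quarterly minimum covers only the high-gameability set.
        return RecalibrationPlan(recalibrate=QUARTERLY_MINIMUM)
    return RecalibrationPlan()


def earliest_effective_date(announced: date) -> date:
    """A weight change takes effect no sooner than 30 days after announcement."""
    return announced + NOTICE_PERIOD
```

When a drift alert and a due quarter coincide, the full sweep already includes the quarterly set, so the sketch does not schedule the quarterly run separately.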

Disputing a score

Model authors or users who believe a score is incorrect may raise a dispute through the support page. Include the model identifier, the affected dimension, and supporting evidence. Disputes are reviewed within 5 business days. Confirmed errors are corrected, logged in the changelog, and trigger an automatic rescan.
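
The five-business-day review window is a simple date calculation. The sketch below assumes that the day a dispute is received does not count, that business days are Monday to Friday, and that public holidays are ignored. None of these details is stated above, and `review_deadline` is a hypothetical name.

```python
from datetime import date, timedelta


def review_deadline(received: date, business_days: int = 5) -> date:
    """Hypothetical helper: latest review date for a dispute.

    Counts Monday-Friday only, starting the day after receipt;
    public holidays are not modelled.
    """
    current = received
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current
```

For example, a dispute received on Friday 10 April 2026 would be due by Friday 17 April 2026, because the intervening weekend does not count.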

Foundations

The cross-cutting policies and discipline that the per-dimension rubric depends on.

Baselines: population baselines that shape what each dimension means in context.
Calibration: quarterly recalibration cadence, drift alerts, and minimum scope.
Disputes: appeals process, customer contact SLAs, and erasure.
Auditability: reproducibility invariants that make every Dokima score independently re-derivable.
Known-malicious registry: the curated list of flagged model identifiers and weight fingerprints, how seed entries are added, and how disputes are handled.
Coverage disclosure: what the public test coverage badge measures and what it does not.
Per-tier coverage: what each tier gets per scan, including priority lane, freshness policy, and scan depth.
Competitive positioning: how Dokima compares to peer services, and what Hugging Face itself does not score.
Methodology changelog

Every weight change, every grade-boundary tweak, every published rationale.

Disclaimer. A high Dokima score is not a warranty of safety, security, or fitness for purpose. Dokima scores metadata and documentation signals available through the Hugging Face public API. It does not download model weights, run inference, or perform dynamic analysis. Users remain responsible for their own due diligence before deploying any model in production. Scores reflect conditions at the time of the last scan and may not reflect subsequent repository changes.