Methodology — Dokima

Scoring Methodology

Seven dimensions, 100 points total. Scores are cached for 24 hours.

v1.0 · Updated April 2026

Dokima publishes its scoring dimensions and weights in full. We keep implementation-level heuristics (exact string matching, edge-case handling) as internal detail; these change frequently as we improve detection quality. Disputed scores can be appealed via the appeals process.

The file format determines whether loading a model can run code on the host machine. Safer formats contain only numbers; riskier formats can carry instructions that execute when the file is loaded.

Signal | Points | Notes
SafeTensors | | Numbers only; no code path at load
ONNX | | Compiled graph; no arbitrary execution
GGUF | | Read-only binary; widely audited
H5 / Keras (safe layers) | | No executable layers detected
H5 / Keras (with code layers) | | Custom code layers present
PyTorch .pt / .bin | | Pickle-based; code runs on load
Raw Pickle | | Highest-risk format
Platform safety flag active | 0 (hard fail) | Overrides all other dimensions
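
As a rough illustration of the format categories above, the sketch below sorts repository filenames into risk categories by extension. The category names, the extension lists, and the `format_risk` function are assumptions made for this example; Dokima's actual detection heuristics are not published and will differ. An extension alone, for instance, cannot tell whether an H5 file contains custom code layers.

```python
from pathlib import PurePosixPath

# Hypothetical extension lists; the grouping follows the signals listed above,
# but the exact extensions and category names are assumptions for illustration.
NO_CODE_AT_LOAD = {".safetensors", ".onnx", ".gguf"}
NEEDS_INSPECTION = {".h5", ".keras"}   # safe only if no custom code layers are present
PICKLE_BASED = {".pt", ".bin", ".pkl", ".pickle"}


def format_risk(filename: str) -> str:
    """Return a coarse risk category for a model file, based on its extension only."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in NO_CODE_AT_LOAD:
        return "no-code-at-load"
    if suffix in NEEDS_INSPECTION:
        return "needs-inspection"
    if suffix in PICKLE_BASED:
        return "pickle-based"
    return "unknown"


for name in ["model.safetensors", "pytorch_model.bin", "config.json"]:
    print(name, "->", format_risk(name))
```

A real scanner would inspect file contents as well, since extensions can be renamed freely.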

The model card is the publisher's primary documentation surface and is referenced directly in EU AI Act Article 13. Each section is checked for real content, not placeholder headings.

Signal | Points | Notes
Model card present | | A documentation file exists
Intended use stated | | Real content, not placeholder
Known limitations stated | | Failure modes and edge cases
Training data described | | Sources and date range disclosed
Evaluation results published | | Documented outcomes, not assertions
Bias and ethical considerations | | Fairness and risk addressed
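
To make "real content, not placeholder headings" concrete, here is a minimal sketch of one way such a check could work on a Markdown model card. The helper names, the placeholder list, and the ten-word threshold are all assumptions chosen for illustration, not Dokima's published rules; the only borrowed detail is "[More Information Needed]", the filler text used by the Hugging Face model card template.

```python
import re

# Assumed placeholder markers (lowercase). "[more information needed]" is the
# filler text from the Hugging Face model card template.
PLACEHOLDERS = ("[more information needed]", "todo", "tbd")
MIN_WORDS = 10  # hypothetical threshold for "real content"


def split_sections(markdown: str) -> dict[str, str]:
    """Map each lowercased Markdown heading to the text that follows it."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            current = match.group(1).strip().lower()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def has_real_content(body: str) -> bool:
    """True if the section still has enough words once placeholders are removed."""
    text = body.lower()
    for marker in PLACEHOLDERS:
        text = text.replace(marker, " ")
    return len(re.findall(r"[a-z0-9]+", text)) >= MIN_WORDS


card = (
    "# Intended use\n"
    "This model classifies English product reviews into positive, neutral, "
    "or negative sentiment for retail analytics.\n\n"
    "# Limitations\n"
    "[More Information Needed]\n"
)
sections = split_sections(card)
print(has_real_content(sections["intended use"]))  # True
print(has_real_content(sections["limitations"]))   # False
```

A word count is a blunt instrument; it is used here only because it keeps the example short.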

An unclear or missing licence creates legal risk for anyone using the model commercially. We score whether a licence is declared and whether its full terms are accessible.

Signal | Points | Notes
Permissive open licence (MIT, Apache 2.0, BSD) | | Free for commercial use
Proprietary, clearly stated | | Terms known and accessible
Copyleft with restrictions (CC BY-NC-SA) | | Restricts commercial use
Custom licence, full text provided | | Accessible but non-standard
Licence referenced, text inaccessible | | Terms cannot be reviewed
Identifier present, unrecognised | | Does not match a known licence
No licence specified | | Use at legal risk

Account history and verification status indicate whether the publisher is who they claim to be. A reused or freshly created namespace is a documented route for distributing tampered models.

Signal | Points | Notes
Verified organisation | | Platform-verified identity
Established account, substantial history | | Long activity record
Newer account, established history | | Active with several models
Newer account, limited history | | Created within the last year
Very recently created account | | Created within the last few weeks
Hijack pattern matched | | Matches a known hijacking signature

The EU AI Act requires AI systems offered in Europe to publish specific transparency information. We check for the five categories Article 13 names.

Signal | Points | Notes
Intended purpose clearly stated | | What the model is designed to do
Performance limitations documented | | Where the model is expected to fail
Human oversight guidance provided | | How a person should review outputs
Technical specification accessible | | Architecture, inputs, outputs disclosed
Contact point documented | | Compliance enquiries can reach the publisher

A trustworthy model usually leaves a trail beyond the model registry. We look for community activity, external references such as GitHub or research papers, and any third-party security flags.

Signal | Points | Notes
Community engagement | | Active maintainer and user conversation
External provenance | | Source repo and research backing
Cross-platform attestation | | No external security flags raised

Detailed documentation for this dimension is in development.

Grade mapping

85–100: Exemplary (A+)
70–84: Strong
55–69: Good
40–54: Fair
25–39: Weak
0–24: Poor

Hard fail override

An active safety flag from the Hugging Face platform scanner results in an automatic F regardless of all other dimension scores. This override cannot be bypassed; the flag must be resolved at the platform level before a passing score is achievable.
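
The grade bands and the hard fail override can be expressed as a small lookup. This is a minimal sketch, not Dokima's implementation: the `grade` function and its parameter names are hypothetical, while the band boundaries, labels, and the automatic F come from the text above.

```python
# Bands copied from the grade mapping above: (lower bound, label), highest first.
GRADE_BANDS = [
    (85, "Exemplary"),
    (70, "Strong"),
    (55, "Good"),
    (40, "Fair"),
    (25, "Weak"),
    (0, "Poor"),
]


def grade(score: int, platform_flag_active: bool = False) -> str:
    """Map a 0-100 score to its band label; an active platform flag forces an F."""
    if platform_flag_active:
        return "F"  # hard fail override: other dimension scores are ignored
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for lower_bound, label in GRADE_BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the lowest band starts at 0")


print(grade(84))                              # Strong
print(grade(92, platform_flag_active=True))   # F
```

Checking the flag before the score is what makes the override unconditional: a flagged model receives an F whatever its total.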

Calibration cadence

A minimum viable recalibration runs quarterly against the dimensions most open to gaming: Serialisation safety, Namespace provenance, and Regulatory alignment. A distribution drift alert on any dimension triggers manual review of that dimension regardless of cadence, and any drift alert also triggers a full seven-dimension sweep. Weight changes are versioned, announced with a minimum 30-day notice period, and do not retroactively alter historical score records.
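
The scoping rule and the notice period can be sketched as follows. The function names are hypothetical, the caller supplies the full list of dimensions (this page does not enumerate all seven by name), and the 30-day notice is assumed to mean calendar days.

```python
from datetime import date, timedelta

# Named in the calibration policy above as the dimensions recalibrated every quarter.
QUARTERLY_DIMENSIONS = ("Serialisation safety", "Namespace provenance", "Regulatory alignment")
NOTICE_PERIOD = timedelta(days=30)  # assumption: calendar days, not business days


def recalibration_scope(all_dimensions, drift_alerts):
    """Return the dimensions to recalibrate in this run.

    Any drift alert widens the scope to every dimension; otherwise only the
    quarterly subset is covered.
    """
    if drift_alerts:
        return list(all_dimensions)
    return [d for d in all_dimensions if d in QUARTERLY_DIMENSIONS]


def earliest_effective_date(announced: date) -> date:
    """Earliest date a weight change may take effect after it is announced."""
    return announced + NOTICE_PERIOD


print(earliest_effective_date(date(2026, 4, 1)))  # 2026-05-01
```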

Disputing a score

Model authors or users who believe a score is incorrect may raise a dispute through the support page. Include the model identifier, the affected dimension, and supporting evidence. Disputes are reviewed within 5 business days. Confirmed errors are corrected, logged in the changelog, and trigger an automatic rescan.

Foundations

The cross-cutting policies and discipline that the per-dimension rubric depends on.

Baselines

Population baselines that shape what each dimension means in context.

Calibration

Quarterly recalibration cadence, drift alerts, and minimum scope.

Disputes

Appeals process, customer contact SLAs, and erasure.

Auditability

Reproducibility invariants that make every Dokima score independently re-derivable.

Known-malicious registry

Curated list of flagged model identifiers and weight fingerprints; how seed entries are added and how disputes are handled.

Coverage disclosure

What the public test coverage badge measures and does not measure.

Per-tier coverage

What each tier gets per scan: priority lane, freshness policy, and scan depth.

Competitive positioning

How Dokima compares to peer services, and what Hugging Face itself does not score.

Methodology changelog

Every weight change, every grade-boundary tweak, every published rationale.

Disclaimer. A high Dokima score is not a warranty of safety, security, or fitness for purpose. Dokima scores metadata and documentation signals available through the Hugging Face public API. It does not download model weights, run inference, or perform dynamic analysis. Users remain responsible for their own due diligence before deploying any model in production. Scores reflect conditions at the time of the last scan and may not reflect subsequent repository changes.