Every Dokima score is auditable
Dokima is built so that every score it produces is independently reproducible: given the same Hugging Face metadata, the same methodology version, and the same weights configuration, two scans of the same model produce byte-identical Verdict JSON (apart from the scanned_at timestamp, which records when the scan ran). This is a load-bearing property for regulated AI deployers (banks, insurance, defence, healthcare) who need to audit how a third-party scoring tool reached a verdict on a model they intend to deploy.
This page documents the invariants that make the property true and the path a third party can take to verify it.
What "auditable" means
For each score Dokima publishes, the following are independently reproducible without any access to Dokima's internal infrastructure:
- The score itself. Re-running the open-source dokima CLI against the same model at the same Hugging Face commit SHA produces the same score_total, the same per-dimension breakdown, the same drift flags, and the same grade.
- The methodology applied. Each Verdict is stamped with methodology_version (the v0.X version published in this documentation) so the rubric in effect at scan time is unambiguous.
- The weights applied. Each Verdict is stamped with weights_sha256, the SHA-256 of the weights.toml configuration file consumed by the engine. Two scans with an identical SHA were scored under identical weights.
- The serialisation order. Every map in the Verdict JSON is keyed in canonical order (the engine uses BTreeMap at every serialisation boundary), so re-scoring on a different machine, or on a different day, does not produce JSON that differs in field order.
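
The serialisation-order point can be illustrated with a minimal sketch using only the Rust standard library. It is not Dokima's serialiser: the function names and the dimension names (provenance, licence, format_safety) are hypothetical, and the hand-rolled JSON rendering assumes keys that need no escaping. It only shows why collecting into a BTreeMap before serialising removes insertion-order and hash-order effects.

```rust
use std::collections::{BTreeMap, HashMap};

/// Render a string-to-integer map as a compact JSON object.
/// BTreeMap iterates in sorted key order, so the output does not
/// depend on the order in which entries were inserted.
/// Assumption: keys contain no characters that need JSON escaping.
fn to_canonical_json(scores: &BTreeMap<String, u32>) -> String {
    let fields: Vec<String> = scores
        .iter()
        .map(|(key, value)| format!("\"{}\":{}", key, value))
        .collect();
    format!("{{{}}}", fields.join(","))
}

/// Move entries out of a HashMap (unspecified iteration order)
/// into a BTreeMap before anything is serialised.
fn canonicalise(raw: HashMap<String, u32>) -> BTreeMap<String, u32> {
    raw.into_iter().collect()
}

fn main() {
    // Hypothetical per-dimension scores, inserted in two different orders.
    let mut first = HashMap::new();
    first.insert("provenance".to_string(), 18);
    first.insert("licence".to_string(), 12);
    first.insert("format_safety".to_string(), 20);

    let mut second = HashMap::new();
    second.insert("format_safety".to_string(), 20);
    second.insert("provenance".to_string(), 18);
    second.insert("licence".to_string(), 12);

    let a = to_canonical_json(&canonicalise(first));
    let b = to_canonical_json(&canonicalise(second));
    assert_eq!(a, b);
    println!("{}", a);
}
```

Iterating the HashMap directly could emit the fields in a different order from one process to the next, which is exactly the difference a byte-level diff would flag.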
The invariants
These are the load-bearing rules from the project's robustness discipline. Each is enforced in code:
| Invariant | What it guarantees | Where it's enforced |
|---|---|---|
| Wall-clock timestamps captured once per scan and reused | A multi-step audit pipeline does not see two timestamps for the same scan | dokima-scoring/src/audit.rs AuditRegistry::run captures Utc::now() once and reuses it |
| HashMap iteration order must not leak into stored output | Two runs of the same scan produce byte-identical JSON | Every serialisation path uses BTreeMap, not HashMap |
| Idempotent score persistence | Two writes of the same (model_id, commit_sha) are a no-op | ScoreStore::put contract; SurrealDB upsert |
| Cache key includes the commit SHA | Cached content is immutable per SHA | ScoreKey::as_cache_key returns score::{author}/{model}@{commit_sha} |
| Methodology version stamp on every Verdict | The rubric in effect at scan time is unambiguous | Verdict::Scored.methodology_version |
| Weights SHA-256 stamp on every Verdict | The weights in effect at scan time are unambiguous | Verdict::Scored.weights_sha256 |
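
Two of the rows above, the commit-scoped cache key and idempotent persistence, can be sketched together. This is an illustration under stated assumptions, not the engine's code: only the key format score::{author}/{model}@{commit_sha} comes from the table. The struct fields, the InMemoryStore type, and the boolean return value of put are hypothetical, and an in-memory map stands in for the SurrealDB upsert.

```rust
use std::collections::BTreeMap;

/// Hypothetical key type. Field names are assumptions; only the
/// rendered key format is taken from the invariants table.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ScoreKey {
    author: String,
    model: String,
    commit_sha: String,
}

impl ScoreKey {
    /// Because the commit SHA is part of the key, an entry can never
    /// be silently reused for a different revision of the model.
    fn as_cache_key(&self) -> String {
        format!("score::{}/{}@{}", self.author, self.model, self.commit_sha)
    }
}

/// In-memory stand-in for a score store (assumption, for illustration).
#[derive(Default)]
struct InMemoryStore {
    entries: BTreeMap<String, String>,
}

impl InMemoryStore {
    /// Upsert semantics: returns true only when stored content changed,
    /// so writing the same verdict twice is a no-op.
    fn put(&mut self, key: &ScoreKey, verdict_json: &str) -> bool {
        let cache_key = key.as_cache_key();
        if self.entries.get(&cache_key).map(String::as_str) == Some(verdict_json) {
            return false;
        }
        self.entries.insert(cache_key, verdict_json.to_string());
        true
    }
}

fn main() {
    // Placeholder identifiers, not a real model or commit.
    let key = ScoreKey {
        author: "example-org".to_string(),
        model: "example-model".to_string(),
        commit_sha: "0123abc".to_string(),
    };
    assert_eq!(key.as_cache_key(), "score::example-org/example-model@0123abc");

    let mut store = InMemoryStore::default();
    assert!(store.put(&key, "{\"grade\":\"B\"}")); // first write stores the verdict
    assert!(!store.put(&key, "{\"grade\":\"B\"}")); // identical second write changes nothing
    assert_eq!(store.entries.len(), 1);
    println!("{}", key.as_cache_key());
}
```
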
An explicit canary test runs the in-process scan 100 times against a fixture model and asserts the JSON is byte-identical across every run (with scanned_at masked because that field is by-design the time of the scan, not a function of the data). The test lives at crates/dokima-cli/tests/scan_reproducibility.rs and runs as part of the standard cargo test workspace gate.
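
The shape of such a canary can be sketched as follows. This is not the test in scan_reproducibility.rs: fake_scan is a hypothetical stand-in for the in-process scan, the field values are invented, and the string-based masking assumes compact JSON with a top-level string-valued scanned_at field. A real test would more likely mask the field through a JSON parser.

```rust
/// Hypothetical stand-in for an in-process scan: every field is
/// deterministic except scanned_at, which varies from run to run.
fn fake_scan(run: u64) -> String {
    format!(
        "{{\"grade\":\"B\",\"scanned_at\":\"2024-01-01T00:00:{:02}Z\",\"score_total\":71}}",
        run % 60
    )
}

/// Replace the value of the "scanned_at" string field with a fixed
/// placeholder. Assumes compact JSON (no space after the colon).
/// Input without a matching field is returned unchanged.
fn mask_scanned_at(json: &str) -> String {
    let marker = "\"scanned_at\":\"";
    let Some(start) = json.find(marker) else {
        return json.to_string();
    };
    let value_start = start + marker.len();
    match json[value_start..].find('"') {
        Some(len) => format!(
            "{}<masked>{}",
            &json[..value_start],
            &json[value_start + len..]
        ),
        None => json.to_string(),
    }
}

fn main() {
    // Compare every later run against the first one, byte for byte.
    let reference = mask_scanned_at(&fake_scan(0));
    for run in 1..100 {
        assert_eq!(
            mask_scanned_at(&fake_scan(run)),
            reference,
            "run {} diverged",
            run
        );
    }
    println!("{}", reference);
}
```

Masking only the one field that is time-dependent by design keeps the comparison strict for everything else, including field order.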
How to verify a score independently
The verification path uses the open-source dokima CLI and the Hugging Face fixture corpus that ships with the repo:
# 1. Clone the public Dokima repo (engine + stubs are AGPL-3.0-or-later).
git clone https://github.com/The-Malware-Files/dokima
cd dokima
# 2. Build the CLI.
cargo build --release -p dokima-cli
# 3. Score a model offline against the bundled fixture corpus.
./target/release/dokima scan fixtures/safetensors-only-model \
--replay crates/dokima-hf-client/fixtures \
--json > my-verdict.json
# 4. Compare against the same fixture's reference verdict.
diff my-verdict.json crates/dokima-cli/fixtures/expected-verdicts/safetensors-only.json
For a live model rather than the fixture corpus, set HUGGING_FACE_READ_KEY and run:
./target/release/dokima scan {author}/{model} --json
Two scans of the same {author}/{model}@{commit_sha} will produce the same Verdict (modulo scanned_at).
What is NOT auditable
Honesty about the open-core split (see also coverage-disclosure.md):
- The hijacking-signature implementations in the private model-security-core crate are NOT publicly auditable. Their categories are documented on the Dim 4 page, but the specific signature definitions stay private to prevent adversarial route-around-testing.
- The curated benchmark vocabulary for Dim 5's suspicious-metric flag is similarly private.
- The disclosed-malicious-model denylist (the mitigation for the malware-tag coverage gap) is reviewed monthly but not published; its input sources (JFrog, ProtectAI, ReversingLabs, etc.) are public.
Public-engine-only OSS builds run with stub implementations of these private surfaces. The Verdict JSON from a public OSS build is structurally identical to a production Verdict; the difference is that the production binary may emit additional drift flags from the private detection layer, which the OSS build's stubs cannot produce. Both are auditable per the invariants above; the production build's additional flags trace back to the private crate's own internal review process.
Why this matters
Lighthouse-style scoring tools that produce a single number without a reproducibility trail are useful for product feedback but cannot survive an audit conversation. When a regulated AI deployer says "we ran this scoring tool and our model scored X", they need to be able to answer "show us how X was computed" without depending on the scoring vendor's continued availability or goodwill. The invariants above are what make that answer possible.
This is the Dokima moat for regulated-AI-deployer prospects: open methodology + reproducible-by-design + content-addressable cache + offline replay + version stamps. No competitor scoring tool currently ships all five.