Methodology changelog
Every weight change, every grade-boundary tweak, and every recalibration cycle is logged below with a dated rationale. Scores produced before a methodology update remain valid as historical opinions under the methodology then in force; the methodology version applied to each score is recorded in the score report and verifiable against this rubric.
Material rubric changes are announced with a minimum 30-day notice period before the version label flips. Entries are added as recalibration cycles complete.
Versions
| Version | Date | Headline change | Rationale |
|---|---|---|---|
| v1 | (initial) | All weights are hand-picked initial hypotheses. | Pre-launch baseline; first recalibration scheduled at month 3 post-launch. See Calibration. |
| v0.5 | 2026-04-27 | Split the single-page methodology into per-dimension and foundations sub-pages (no rubric changes); added the Auditability foundations page documenting the reproducibility invariant that backs every Dokima score. | Documentation split plus public claim of the auditability invariant; no scoring behaviour changes. |
| v0.4 | 2026-04-27 | First substantive rubric expansion: Dim 1 gains four new format tiers (Wheel 3pt, Msgpack 8pt, OpenVINO 12pt, OpenNMT .ot 12pt) plus the long-tenured-namespace charity multiplier and the Picklescan disclaimer; Dim 2 ships as a real audit module with a section detector and a datasets-resolution tier ladder; Dim 3 gains the three-tier "license: other" distinction and the named "Open with vendor restrictions" allow-list; Dim 4 is promoted to actual scoring; Dim 5 gains a community-evaluation bonus; Dim 6 ships with the Annex IV CO2 sub-rule and the Art. 13 Model Card Contact sub-rule. | Consolidates findings from the April 2026 1000-model recon into rule changes across every dimension. |
| v0.3 | 2026-04-26 | Published the malware-tag coverage gap for Dim 1; refined the hard-fail wording to describe the collapsed "Shown for transparency only" disclosure that renders sub-dimensions when a hard-fail fires. | Honest disclosure of an empirically measured gap (250-model recon) plus precise wording on how rejected reports render. |
How to read this
The version label on a score report tells you which row of this table the score was computed under. If a scan produced a v0.3 score, that score remains the canonical historical record of the model's trust profile at the time of the scan. A re-scan under a later methodology version may produce a different number; both records are preserved.
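One way to picture the "both records are preserved" rule is an append-only history keyed by methodology version. The sketch below is illustrative only and does not describe how Dokima stores scores: `ScoreRecord`, `ScoreHistory`, the model id `example/model`, and the score values are all hypothetical placeholders, while the version labels and dates are copied from the Versions table.

```python
from dataclasses import dataclass
from datetime import date

# Version labels and dates copied from the Versions table.
# v1 has no calendar date in the table, so it is stored as None.
CHANGELOG = {
    "v0.3": date(2026, 4, 26),
    "v0.4": date(2026, 4, 27),
    "v0.5": date(2026, 4, 27),
    "v1": None,
}


@dataclass(frozen=True)
class ScoreRecord:
    """Hypothetical shape of one score report (not the real schema)."""
    model_id: str
    methodology_version: str
    score: float
    scanned_on: date


class ScoreHistory:
    """Append-only store: a re-scan adds a record and never overwrites one."""

    def __init__(self):
        self._records = []

    def add(self, record):
        # Reject labels that do not match a row of the changelog table.
        if record.methodology_version not in CHANGELOG:
            raise ValueError(
                f"unknown methodology version: {record.methodology_version}"
            )
        self._records.append(record)

    def for_model(self, model_id):
        # Return every record for the model, oldest first.
        return [r for r in self._records if r.model_id == model_id]


# Placeholder scores, purely for illustration.
history = ScoreHistory()
history.add(ScoreRecord("example/model", "v0.3", 61.0, date(2026, 4, 26)))
history.add(ScoreRecord("example/model", "v0.4", 58.0, date(2026, 4, 27)))

for record in history.for_model("example/model"):
    print(record.methodology_version, record.score)
```

Because records are frozen and only ever appended, the earlier score stays readable under its own version label after the re-scan, which is the behaviour the paragraph above describes.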
Disputing a score
If you believe a score is incorrect under the methodology version it was produced with, the appeals route lives at Disputes. Confirmed errors are corrected, logged here, and trigger an automatic rescan.