Methodology changelog

Every weight change, every grade-boundary tweak, and every recalibration cycle is logged below with a dated rationale. Scores produced before a methodology update remain valid as historical opinions under the methodology then in force; the methodology version applied to each score is recorded in the score report and verifiable against this rubric.

Material rubric changes are announced with a minimum 30-day notice period before the version label flips. Entries are added as recalibration cycles complete.
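The notice rule above comes down to simple date arithmetic. The sketch below is illustrative only: the names `NOTICE_DAYS`, `earliest_flip_date`, and `notice_satisfied` are hypothetical, and it assumes the 30 days are counted as calendar days from the announcement date, which this changelog does not specify.

```python
from datetime import date, timedelta

# Minimum notice period stated in this changelog. Counting it in calendar
# days from the announcement date is an assumption of this sketch.
NOTICE_DAYS = 30


def earliest_flip_date(announced_on: date) -> date:
    """Return the first date on which the version label may flip."""
    return announced_on + timedelta(days=NOTICE_DAYS)


def notice_satisfied(announced_on: date, flipped_on: date) -> bool:
    """Check that a flip date respects the minimum notice period."""
    return flipped_on >= earliest_flip_date(announced_on)
```

Under these assumptions, a material change announced on 2026-05-01 could take effect no earlier than 2026-05-31.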

Versions

Version	Date	Headline change	Rationale
v1	(initial)	All weights are hand-picked initial hypotheses.	Pre-launch baseline; first recalibration scheduled at month 3 post-launch. See Calibration.
v0.5	2026-04-27	Split the single-page methodology into per-dimension and foundations sub-pages (no rubric changes); added the Auditability foundations page documenting the reproducibility invariant that backs every Dokima score.	Documentation split plus public claim of the auditability invariant; no scoring behaviour changes.
v0.4	2026-04-27	First substantive rubric expansion: Dim 1 gains four new format tiers (Wheel 3pt, Msgpack 8pt, OpenVINO 12pt, OpenNMT `.ot` 12pt) plus the long-tenured-namespace charity multiplier and the Picklescan disclaimer; Dim 2 ships as a real audit module with section detector and datasets-resolution tier ladder; Dim 3 gains the three-tier `license: other` distinction and the named "Open with vendor restrictions" allow-list; Dim 4 promoted to actual scoring; Dim 5 gains community-evaluation bonus; Dim 6 ships with Annex IV CO2 sub-rule and Art. 13 Model Card Contact sub-rule.	Consolidates findings from the April 2026 1000-model recon into rule changes across every dimension.
v0.3	2026-04-26	Published the malware-tag coverage gap for Dim 1; refined the hard-fail wording to describe the collapsed "Shown for transparency only" disclosure that renders sub-dimensions when a hard-fail fires.	Honest disclosure of an empirically measured gap (250-model recon) plus precise wording on how rejected reports render.

How to read this

The version label on a score report tells you which row of this table the score was computed under. If you scanned a model while v0.3 was in force and it produced a v0.3 score, that score remains the canonical historical record of the model's trust profile at that point in time. A re-scan under a later methodology version may produce a different number; both records are preserved.
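One way to picture "both records are preserved" is an append-only history in which a re-scan adds a record rather than overwriting the earlier one. The sketch below is a minimal illustration, not the actual storage model: `ScoreRecord`, `ScoreHistory`, the model identifier, and the score values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScoreRecord:
    """One score, frozen so it cannot be changed after it is recorded."""
    model_id: str
    methodology_version: str  # a row label from the Versions table
    scanned_on: date
    score: float


class ScoreHistory:
    """Append-only log: a re-scan adds a record and never replaces one."""

    def __init__(self) -> None:
        self._records: list[ScoreRecord] = []

    def add(self, record: ScoreRecord) -> None:
        self._records.append(record)

    def for_model(self, model_id: str) -> list[ScoreRecord]:
        """Return every record for a model, in the order it was scanned."""
        return [r for r in self._records if r.model_id == model_id]


# Hypothetical model and made-up scores, for illustration only.
history = ScoreHistory()
history.add(ScoreRecord("example-org/example-model", "v0.3", date(2026, 4, 26), 71.0))
history.add(ScoreRecord("example-org/example-model", "v0.4", date(2026, 4, 27), 64.5))

versions = [r.methodology_version
            for r in history.for_model("example-org/example-model")]
```

After the re-scan, `versions` lists both methodology labels, so the earlier score stays available alongside the later one.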

Disputing a score

If you believe a score is incorrect under the methodology version it was produced with, the appeals route lives at Disputes. Confirmed errors are corrected, logged here, and trigger an automatic rescan.