Bracket Calculator Validation Report
We validate the ScrollVault Commander Bracket Calculator against an authoritative reference set of 36 decks: WotC-published Commander precons (Brackets 1–3), three decks forced to Bracket 4 by the Game Changer rules, and community-canon cEDH archetypes (Bracket 5). A separate 14-deck community-labeled cohort stress-tests the bracket boundaries the authoritative sources cannot cover. Every deck has a source URL in the tables below, so you can audit every entry yourself.
Headline accuracy — 36-deck authoritative core set
- Bracket-in-range: 36/36 (100%)
- Bracket-±1: 36/36 (100%)
- Bracket-exact: 35/36 (97.2%)
- Power-in-range: 36/36 (100%)
The boundary stress-test cohort — 14 community-labeled decks
The core set's strength (every label traceable to an authoritative source) is also its limit: WotC publishes no canonical B1 ("just a theme pile") or B4 ("optimized but not cEDH") decklists, so the core set cannot measure those boundaries. In June 2026 we added a separate cohort of 14 real community decks, sourced from Archidekt with owner-stated bracket labels and re-judged by us against WotC's framework, specifically to probe the B1/B2 and B2/B3 edges, including decks chosen because they are hard calls. These labels are community judgment, not WotC canon, so we report this cohort separately and never blend it into the headline numbers.
| Cohort | N | In-range | Within-1 | Exact | Power-in-range |
|---|---|---|---|---|---|
| Authoritative core | 36 | 36/36 (100%) | 36/36 (100%) | 35/36 (97.2%) | 36/36 (100%) |
| Boundary cohort (community-labeled) | 14 | 14/14 (100%) | 14/14 (100%) | 10/14 (71.4%) | 14/14 (100%) |
| All decks | 50 | 50/50 (100%) | 50/50 (100%) | 45/50 (90%) | 50/50 (100%) |
Reading the split honestly: the engine stays 100% in-range across all 50 decks, but exact-match drops to 71.4% on the boundary cohort. That is exactly where it should drop, because those decks were picked for sitting on the framework's fuzzy edges (theme piles that play like Core decks, owner-labeled Bracket 3 lists that could reasonably judge as Bracket 2). Where engine and community label disagree, the per-deck rows below let you judge for yourself.
Methodology
Reference deck sourcing
Every reference deck has a public source URL. For the 36-deck core set, we derive bracket assignments only from authoritative sources or from the framework's hard rules:
- WotC official precons: decklists from MTGJSON's canonical deck data, with secondary URLs to WotC's announcements. Bracket assignment follows WotC's Commander Brackets Beta framework: stock precons without Game Changers fall in B1–B2 (boundary fuzzy by design); stock precons with Game Changers are forced to a B3 floor.
- cEDH archetype canonicals: sourced from the cEDH Decklist Database, which curates competitive-tier decks via community submission plus curator review. By definition, any cEDH archetype is Bracket 5 per WotC's framework. The decklists are representative archetype lists, not specific tournament copies.
- Rule-forced B4 anchors: decks with four or more Game Changers (more than Bracket 3's "up to three" allowance) that do not meet B5/cEDH criteria, as described under Limits and honest framing.
Audit methodology
We cross-checked every precon's mainboard against the bracket calculator's 53-card Game Changers list (the GAME_CHANGERS constant in /tools/commander-bracket/bracket.js, mirroring WotC's Feb 2026 update). One precon, Abzan Armor (Tarkir Dragonstorm Commander), contains Seedborn Muse, which is on the GC list. Per WotC's framework, any Game Changer forces a Bracket 3 floor, so Abzan Armor's expected_bracket is 3 rather than the stock-precon range of B1–B2. This is documented in the reference data and matches the calculator's verdict.
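The audit above reduces to a set lookup plus a floor rule. The following is a minimal sketch, assuming a decklist flattened into hypothetical { name, quantity } entries and a GC list passed in as plain card names; findGameChangers and gameChangerFloor are illustrative names, not the actual bracket.js functions. The floor mapping follows the rules stated in this report: any Game Changer sets a Bracket 3 floor, and four or more exceed Bracket 3's "up to three" allowance.

```javascript
// Minimal sketch of the Game Changer (GC) audit and floor rule.
// ASSUMPTIONS: decklist entries are hypothetical { name, quantity } objects and the
// GC list is an array of card names. These are not the real bracket.js functions.

function normalizeName(name) {
  // Case- and whitespace-insensitive matching.
  return name.trim().replace(/\s+/g, " ").toLowerCase();
}

function findGameChangers(mainboard, gameChangerNames) {
  const gcSet = new Set(gameChangerNames.map(normalizeName));
  const hits = new Set();
  for (const entry of mainboard) {
    if (gcSet.has(normalizeName(entry.name))) hits.add(entry.name);
  }
  return [...hits]; // distinct GC names found in the deck
}

function gameChangerFloor(gcCount) {
  if (gcCount >= 4) return 4; // exceeds Bracket 3's "up to three" allowance
  if (gcCount >= 1) return 3; // any Game Changer forces a Bracket 3 floor
  return 1; // no floor from this rule
}

// Toy deck and a two-card stand-in for the real 53-card list (illustrative only).
const toyDeck = [
  { name: "Seedborn Muse", quantity: 1 },
  { name: "Forest", quantity: 10 },
];
const found = findGameChangers(toyDeck, ["Seedborn Muse", "Rhystic Study"]);
console.log(found.join(", "), gameChangerFloor(found.length)); // Seedborn Muse 3
```

Deduplicating by name matters little in a singleton format, but it keeps the GC count honest if an exported list repeats a card under separate printings.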
cEDH provenance chain
Every cEDH reference deck is sourced via a two-link chain: cEDH Decklist Database (community-curated tier list of cEDH archetypes) → linked Moxfield primer (community-vetted decklist for that archetype). We fetched the canonical Moxfield decklist via api2.moxfield.com/v3/decks/all/<id> on 2026-05-06 and confirmed each list is exactly 100 cards. Each row's source ↗ link goes to the human-readable Moxfield primer page; you can verify the decklist is identical to ours.
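The 100-card confirmation is a quantity sum over the boards that make up the deck. A minimal sketch, assuming the fetched list has already been flattened into hypothetical { name, quantity, board } entries; this is not the Moxfield API's response format, which this report does not document. Commanders and main-deck cards count toward 100; other boards do not.

```javascript
// Minimal sketch of the "exactly 100 cards" check.
// ASSUMPTION: entries are hypothetical { name, quantity, board } objects; the board
// labels below are illustrative and are not taken from the Moxfield API.

const COUNTED_BOARDS = new Set(["commanders", "mainboard"]);

function deckCardCount(entries) {
  // Sum quantities only for boards that are part of the 100-card deck.
  return entries
    .filter((entry) => COUNTED_BOARDS.has(entry.board))
    .reduce((sum, entry) => sum + entry.quantity, 0);
}

function checkHundred(deckId, entries) {
  const count = deckCardCount(entries);
  return { deckId, count, ok: count === 100 };
}

// Toy partner deck: 2 commanders + 98 main-deck cards; the maybeboard is ignored.
const toyList = [
  { name: "Thrasios, Triton Hero", quantity: 1, board: "commanders" },
  { name: "Tymna the Weaver", quantity: 1, board: "commanders" },
  { name: "Island", quantity: 98, board: "mainboard" },
  { name: "Spare Card", quantity: 3, board: "maybeboard" },
];
const result = checkHundred("toy-partner-deck", toyList);
console.log(result.count, result.ok); // 100 true
```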
Pass criteria
For each deck, we record three bracket-accuracy criteria and one power criterion:
- Bracket-in-range: predicted bracket ∈ [expected_bracket_min, expected_bracket_max]. Primary metric. WotC's B1/B2 boundary is intentionally fuzzy, so stock precons get a [1,2] range.
- Bracket-±1: predicted bracket within 1 of expected_bracket, the single reference bracket recorded for each deck (B2 for stock precons). Secondary metric, reported for comparability with industry tools (ScryCheck reports 80% bracket-exact, 92% bracket-±1).
- Bracket-exact: predicted bracket === expected_bracket. Strictest; affected by the inherent fuzziness of WotC's framework on stock precons.
- Power-in-range: predicted power level ∈ [expected_power_min, expected_power_max]. Independent check on the engine's continuous output.
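The four criteria can be expressed as one scoring function. A minimal sketch, assuming each reference record carries the field names used above and that expected_bracket is the single reference bracket (B2 for stock precons, as described under Limits and honest framing); scoreDeck is a hypothetical name, not the validation script's actual code.

```javascript
// Minimal sketch of the four per-deck pass criteria.
// ASSUMPTION: field names follow the text (expected_bracket_min, expected_bracket_max,
// expected_bracket, expected_power_min, expected_power_max); scoreDeck is hypothetical.

function scoreDeck(ref, predictedBracket, predictedPower) {
  return {
    // Primary metric: predicted bracket falls inside the reference range.
    inRange:
      predictedBracket >= ref.expected_bracket_min &&
      predictedBracket <= ref.expected_bracket_max,
    // Within one bracket of the single reference bracket.
    withinOne: Math.abs(predictedBracket - ref.expected_bracket) <= 1,
    // Strictest: exact match on the single reference bracket.
    exact: predictedBracket === ref.expected_bracket,
    // Independent check on the continuous power score.
    powerInRange:
      predictedPower >= ref.expected_power_min &&
      predictedPower <= ref.expected_power_max,
  };
}

// Stock precon reference: range [1,2], reference bracket 2, power 1-5.5.
const precon = {
  expected_bracket_min: 1,
  expected_bracket_max: 2,
  expected_bracket: 2,
  expected_power_min: 1,
  expected_power_max: 5.5,
};
// The Silverquill Statement row: predicted B1, power 1.1.
console.log(scoreDeck(precon, 1, 1.1));
```

Applied to the Silverquill Statement row, it reports in range, within one, and power in range, but not exact: the core set's single exact miss.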
Per-bracket accuracy (core set)
| Expected | N | In-range | Within-1 | Exact | Power-in-range |
|---|---|---|---|---|---|
| B2 | 11 | 11/11 (100%) | 11/11 (100%) | 10/11 (91%) | 11/11 (100%) |
| B3 | 6 | 6/6 (100%) | 6/6 (100%) | 6/6 (100%) | 6/6 (100%) |
| B4 | 3 | 3/3 (100%) | 3/3 (100%) | 3/3 (100%) | 3/3 (100%) |
| B5 | 16 | 16/16 (100%) | 16/16 (100%) | 16/16 (100%) | 16/16 (100%) |
Confusion matrix (core set)
Rows = expected bracket; columns = predicted bracket. Diagonal = exact match.
| | Pred B1 | Pred B2 | Pred B3 | Pred B4 | Pred B5 |
|---|---|---|---|---|---|
| Exp B1 | 0 | 0 | 0 | 0 | 0 |
| Exp B2 | 1 | 10 | 0 | 0 | 0 |
| Exp B3 | 0 | 0 | 6 | 0 | 0 |
| Exp B4 | 0 | 0 | 0 | 3 | 0 |
| Exp B5 | 0 | 0 | 0 | 0 | 16 |
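The confusion matrix and the bracket-exact rate come from the same (expected, predicted) pairs: exact matches are the diagonal. A minimal sketch, assuming a hypothetical flat array of { expected, predicted } bracket numbers (not the real results-file format), that rebuilds the core-set matrix above from its cell counts:

```javascript
// Minimal sketch: 5x5 bracket confusion matrix from (expected, predicted) pairs.
// ASSUMPTION: pairs is a hypothetical flat array of { expected, predicted } with
// brackets numbered 1..5; rows = expected, columns = predicted.

function confusionMatrix(pairs) {
  const matrix = Array.from({ length: 5 }, () => new Array(5).fill(0));
  for (const { expected, predicted } of pairs) {
    matrix[expected - 1][predicted - 1] += 1;
  }
  return matrix;
}

function exactRate(matrix) {
  // Exact matches are the diagonal; the total is every deck in the matrix.
  let diagonal = 0;
  let total = 0;
  matrix.forEach((row, i) => {
    row.forEach((count, j) => {
      total += count;
      if (i === j) diagonal += count;
    });
  });
  return { diagonal, total, rate: total > 0 ? diagonal / total : 0 };
}

// Rebuild the core-set matrix above from its cell counts.
const repeat = (n, pair) => Array.from({ length: n }, () => ({ ...pair }));
const corePairs = [
  ...repeat(1, { expected: 2, predicted: 1 }),
  ...repeat(10, { expected: 2, predicted: 2 }),
  ...repeat(6, { expected: 3, predicted: 3 }),
  ...repeat(3, { expected: 4, predicted: 4 }),
  ...repeat(16, { expected: 5, predicted: 5 }),
];
const m = confusionMatrix(corePairs);
const { diagonal, total, rate } = exactRate(m);
console.log(`${diagonal}/${total} = ${(rate * 100).toFixed(1)}%`); // 35/36 = 97.2%
```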
Engine vs frontier LLMs
Independent cross-validation: each model was given the decklist plus WotC's bracket framework and the 53-card Game Changers list, and asked to assign a bracket and power score. The same 36 authoritative core reference decks were used for every column. Methodology and per-deck verdicts are in llm-validation-results.json.
| Metric | ScrollVault engine | claude-sonnet-4-6 | claude-opus-4-7 | claude-haiku-4-5-20251001 |
|---|---|---|---|---|
| Bracket-in-range | 100% (36/36) | 94.4% (34/36) | 94.4% (34/36) | 83.3% (30/36) |
| Bracket-±1 | 100% (36/36) | 100% (36/36) | 100% (36/36) | 100% (36/36) |
| Bracket-exact | 97.2% (35/36) | 94.4% (34/36) | 94.4% (34/36) | 83.3% (30/36) |
| Power-in-range | 100% (36/36) | 91.7% (33/36) | 83.3% (30/36) | 63.9% (23/36) |
Per-deck results — authoritative core (36 decks)
Every row links to the deck's source URL. Click "source ↗" to verify decklist + bracket assignment yourself.
| Deck ID | Name | Expected | Predicted | Verdict | Power | Power range | Tipping | Source |
|---|---|---|---|---|---|---|---|---|
| wotc-precon-silverquillstatement-c21 | Silverquill Statement | B1–B2 | B1 | ✓ in range | 1.1 | 1–5.5 | T4 | source ↗ |
| wotc-precon-prismariperformance-c21 | Prismari Performance | B1–B2 | B2 | ✓ in range | 3.7 | 1–5.5 | T5 | source ↗ |
| wotc-precon-quantumquandrix-c21 | Quantum Quandrix | B1–B2 | B2 | ✓ in range | 3.8 | 1–5.5 | T4 | source ↗ |
| wotc-precon-witherbloomwitchcraft-c21 | Witherbloom Witchcraft | B1–B2 | B2 | ✓ in range | 3.6 | 1–5.5 | T4 | source ↗ |
| wotc-precon-loreholdlegacies-c21 | Lorehold Legacies | B1–B2 | B2 | ✓ in range | 4.0 | 1–5.5 | T4 | source ↗ |
| wotc-precon-abzanarmor-tdc | Abzan Armor | B3 | B3 | ✓ in range | 6.3 | 5–7.5 | T3 | source ↗ |
| wotc-precon-jeskaistriker-tdc | Jeskai Striker | B1–B2 | B2 | ✓ in range | 4.3 | 1–5.5 | T3 | source ↗ |
| wotc-precon-mardusurge-tdc | Mardu Surge | B1–B2 | B2 | ✓ in range | 3.9 | 1–5.5 | T3 | source ↗ |
| wotc-precon-sultaiarisen-tdc | Sultai Arisen | B1–B2 | B2 | ✓ in range | 3.7 | 1–5.5 | T4 | source ↗ |
| wotc-precon-temurroar-tdc | Temur Roar | B1–B2 | B2 | ✓ in range | 3.7 | 1–5.5 | T4 | source ↗ |
| wotc-precon-eternalmight-drc | Eternal Might | B1–B2 | B2 | ✓ in range | 4.0 | 1–5.5 | T3 | source ↗ |
| wotc-precon-livingenergy-drc | Living Energy | B1–B2 | B2 | ✓ in range | 4.1 | 1–5.5 | T4 | source ↗ |
| wotc-precon-counterblitzfinalfantasyx-fic | Counter Blitz (FINAL FANTASY X) | B3 | B3 | ✓ in range | 6.7 | 5–7.5 | T3 | source ↗ |
| wotc-precon-20waystowin-sld | 20 Ways to Win | B3 | B3 | ✓ in range | 7.0 | 5–7.5 | T3 | source ↗ |
| wotc-precon-creativeenergy-m3c | Creative Energy | B3 | B3 | ✓ in range | 6.5 | 5–7.5 | T4 | source ↗ |
| wotc-precon-deadlydisguise-mkc | Deadly Disguise | B3 | B3 | ✓ in range | 6.5 | 5–7.5 | T4 | source ↗ |
| wotc-precon-deepcluesea-mkc | Deep Clue Sea | B3 | B3 | ✓ in range | 6.2 | 5–7.5 | T4 | source ↗ |
| cedh-kinnan-infinite-mana | Kinnan Infinite Mana | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-thrasios-tymna-blue-farm | Blue Farm (Thrasios+Tymna) | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-najeela-blade-blossom | Najeela Combat Combo | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-tivit-stax | Tivit Stax | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-rograkh-silas-turbo-naus | Rograkh+Silas Turbo Ad Nauseam | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-kraum-tymna-breach | Kraum+Tymna Breach (Blue Farm) | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-halana-tymna-hulk | Halana+Tymna Flash Hulk | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-tana-tymna-turbo-naus | Tana+Tymna Turbo Ad Nauseam | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-yuriko-tempo | Yuriko Tempo | B5 | B5 | ✓ in range | 10.0 | 9–10 | T3 | source ↗ |
| cedh-malcolm-tymna-esper-turbo | Malcolm+Tymna Esper Turbo | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-sisay-tutors | Sisay, Weatherlight Captain (cEDH) | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-kenrith-midrange | Kenrith, the Returned King (cEDH) | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-korvold-sacrifice | Korvold, Fae-Cursed King (cEDH) | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-krark-sakashima | Krark / Sakashima (cEDH Storm) | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-rocco-tutor | Rocco, Cabaretti Caterer (cEDH) | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-stella-lee | Stella Lee, Wild Card (cEDH) | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| b4-urdragon-highpower | The Ur-Dragon (High-Power Dragons) | B4 | B4 | ✓ in range | 8.7 | 6.5–9.5 | T4 | source ↗ |
| b4-atraxa-superfriends-2022 | Atraxa Superfriends | B4 | B4 | ✓ in range | 9.3 | 6.5–9.5 | T4 | source ↗ |
| b4-atraxa-superfriends-primer | Atraxa Superfriends (primer build) | B4 | B4 | ✓ in range | 9.2 | 6.5–9.5 | T3 | source ↗ |
Per-deck results — boundary stress-test cohort (14 decks)
Expected brackets in this table are community labels re-judged by us against WotC's framework — not authoritative canon. Source links go to the public Archidekt decklists.
| Deck ID | Name | Expected | Predicted | Verdict | Power | Power range | Tipping | Source |
|---|---|---|---|---|---|---|---|---|
| arch-b1-gruulfriends-pia | Gruulfriends | B1–B2 | B2 | ✓ in range | 3.7 | 1–5 | T4 | source ↗ |
| arch-b1-progenitus-soup | Bracket 1 Progenitus | B1–B2 | B2 | ✓ in range | 3.7 | 1–5 | T3 | source ↗ |
| arch-b1-oops-all-jace | Oops, All Jace! | B1–B2 | B2 | ✓ in range | 3.8 | 1–5 | T4 | source ↗ |
| arch-b2-ognis-aggro | Ognis, the Dragon's Lash: Speed Kills | B2–B3 | B2 | ✓ in range | 3.8 | 1.5–6 | T3 | source ↗ |
| arch-b2-cant-block-this | 2. Can't Block This | B2–B3 | B2 | ✓ in range | 3.7 | 1.5–6 | T3 | source ↗ |
| arch-b2-glissa-engine | Glissa, the Traitor: The Engine of Betrayal | B2–B3 | B2 | ✓ in range | 3.9 | 1.5–6 | T3 | source ↗ |
| arch-b2-commander-commander | A "Commander" Commander Deck | B1–B2 | B2 | ✓ in range | 3.4 | 1–5.5 | T4 | source ↗ |
| arch-b2-ffvii-budget | FFVII Limit Break \| $30 Upgrade | B2–B3 | B2 | ✓ in range | 4.3 | 1.5–6 | T3 | source ↗ |
| arch-b3-baby-lasagna | Baby Lasagna | B2–B3 | B3 | ✓ in range | 6.6 | 3.5–8 | T3 | source ↗ |
| arch-b3-sauron-spellslinger | $100 Sauron, The Dark Lord Spellslinger B2/B3 | B2–B3 | B3 | ✓ in range | 6.8 | 3.5–8 | T3 | source ↗ |
| arch-b3-lathril-elves | 🧝♀️ It came from the woods \| Lathril, Blade of the Elves | B2–B3 | B3 | ✓ in range | 6.7 | 3.5–8 | T3 | source ↗ |
| arch-b3-meria-equipment | Equipment are the best mana rocks | B2–B3 | B2 | ✓ in range | 4.0 | 3–7.5 | T2 | source ↗ |
| arch-b3-goth-girl-tribal | Goth Girl Tribal | B3 | B3 | ✓ in range | 7.1 | 4.5–8 | T3 | source ↗ |
| wotc-precon-peaceoffering-blc | Peace Offering (Bloomburrow Commander) | B2–B3 | B2 | ✓ in range | 3.9 | 1.5–6 | T3 | source ↗ |
Limits and honest framing
- The reference set is still small (36 authoritative + 14 community-labeled decks). We're expanding incrementally; larger sets reduce variance. The clearest signal is at B5 (cEDH, unambiguous per WotC) and the rules-crisp boundaries; the precon band is where the framework itself is fuzzy. The boundary cohort's labels are our judgment informed by deck-owner labels — they measure the fuzzy edges, and we report them separately for exactly that reason.
- B4 ("Optimized") coverage is 3 decks — rule-forced, not judgment-labeled. No public source — not WotC, not the cEDH Decklist Database, not any tournament site — publishes a canonical set of "Bracket 4" decks; WotC explicitly declines to provide example B4 decklists, and community B4 labels are interpretive. Rather than synthesize judgment-based B4 references, our B4 anchors are decks rule-forced to at least Bracket 4 by the Game Changer floor (4+ Game Changers exceeds Bracket 3's "up to three" allowance) while clearly not meeting B5/cEDH criteria (no cheap two-card infinite win). These validate correct application of the GC floor and the absence of spurious B5 promotion — not the subjective "exactly B4 power" call, which the framework itself leaves interpretive.
- B3 coverage is 6 decks: WotC-published precons carrying one or more Game Changers (forced to a B3 floor by the framework), plus higher-power Universes Beyond precons. These are the cleanest authoritative path to B3 references, and we're expanding this set further.
- B1 ("Exhibition") coverage in the core set is zero — and that is the honest state. Like B4, WotC publishes no canonical B1 decklist, and B1 is defined by intent ("winning is not the primary goal; highly thematic or substandard win conditions"), not by card choices. Our earlier B1 anchors were stock precons, which WotC's framework and independent tools (Moxfield, ScryCheck) place at B2; we re-based them rather than assert an uncited B1. The community-labeled boundary cohort above includes 3 owner-declared theme decks at B1 for exactly this reason — measured separately because those labels are judgment, not canon.
- cEDH decklists are canonical Moxfield primers from cEDH-DDB tier-list panels. Bracket assignment (B5) is unambiguous per WotC's framework. The exact card-by-card list will vary across tournament copies; the primer is the community's reference build as of its last_verified date.
- The B1/B2 boundary is fuzzy by WotC's own design, and we grade precons to the authoritative default. WotC originally anchored "the average current preconstructed deck" at Bracket 2 (Core), then decoupled precons from a fixed bracket (Oct 2025: "precons span a range of power levels"; reaffirmed Feb 2026). Independent tools (Moxfield auto-bracket, ScryCheck) default stock no-Game-Changer precons to Bracket 2, and no authoritative source classifies them Bracket 1. We therefore set each stock precon's expected bracket to B2 with an in-range B1 floor (range [1,2]) and cite the basis per deck in reference-decks.json, so a B1 or a B2 verdict is in-range either way.
- "Exact" misses live entirely in the fuzzy precon band, not at the rules-crisp boundaries. The engine is 25/25 (100%) bracket-exact on every core deck with a single unambiguous expected bracket (the 6 B3 precons, the 3 rule-forced B4 anchors, and the 16 B5 cEDH lists). Core bracket-exact is 97.2% (35/36); the one miss is Silverquill Statement, a stock precon predicted B1 where our reference bracket is B2 and the allowed range is [1,2]. WotC explicitly treats that band as a range, so this is not an engine error. (The community-labeled boundary cohort is reported separately above.)
- We trend slightly high vs independent tools, and we show it. Cross-checked against Moxfield's independent bracket algorithm (see the cross-tool section below), our verdicts agree within ±1 the large majority of the time but lean marginally higher on optimization-heavy decks. We publish the disagreements rather than hide them.
- The B2/B3 line follows WotC exactly: synergy is not a bracket. Per the official framework, only two-card infinite combos (plus Game Changers, mass land denial, or chained extra turns) push a deck to Bracket 3 or higher. Three-or-more-card combos and high synergy density do not; a deck with one fragile 3-card combo and no Game Changers is correctly Bracket 2. This is the single most common source of "my deck should be higher" disputes, and the engine resolves it by the rules, not by feel.
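The B2/B3 rule is a checklist of specific, countable features rather than a synergy score. A minimal sketch, assuming hypothetical feature flags that an analyzer has already computed (detecting combos and mass land denial is the hard part and is out of scope here); aboveBracket2 is an illustrative name, not the engine's code.

```javascript
// Minimal sketch of the B2/B3 line: only specific features push a deck above
// Bracket 2; combo size and synergy density alone do not.
// ASSUMPTION: the feature object and its field names are hypothetical inputs.

function aboveBracket2(features) {
  const reasons = [];
  if (features.gameChangerCount > 0) reasons.push("Game Changer");
  if (features.twoCardInfiniteCombos > 0) reasons.push("two-card infinite combo");
  if (features.massLandDenial) reasons.push("mass land denial");
  if (features.chainedExtraTurns) reasons.push("chained extra turns");
  // Deliberately ignored: threeOrMoreCardCombos and synergyScore.
  return { above: reasons.length > 0, reasons };
}

// One fragile 3-card combo, high synergy, no Game Changers: stays Bracket 2.
const fragileComboDeck = {
  gameChangerCount: 0,
  twoCardInfiniteCombos: 0,
  threeOrMoreCardCombos: 1,
  synergyScore: 0.9,
  massLandDenial: false,
  chainedExtraTurns: false,
};
console.log(aboveBracket2(fragileComboDeck).above); // false
```

Returning the list of reasons, not just a boolean, makes "my deck should be higher" disputes easy to answer: an empty list means no rule-listed feature was found.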
Reproduce these results yourself
This validation is reproducible end-to-end. From a clone of the repo:
- node scripts/build-reference-decks.cjs: fetches MTGJSON precon data and cEDH archetype lists into data/reference-decks.json with full provenance metadata.
- node scripts/run-validation.cjs: runs each deck through the live bracket calculator via Puppeteer (defaults to staging; pass --prod for production).
- node scripts/render-validation-page.cjs: regenerates this page from the latest validation results.
Expected runtime: ~2 minutes for 50 decks. This run averaged ~7572 ms per deck; at that rate a strictly sequential run would take about 6.3 minutes (50 × 7.572 s ≈ 379 s).
What's next
- Expand the reference set toward 250+ decks. Priority: more cEDH archetypes (B5), more recent precons (B1–B3), authoritatively-tagged B3–B4 decks (community + tournament).
- Add automated CI: re-run validation on every bracket.js change. Keep accuracy honest as the engine evolves.
Browse the full precon library
Beyond this reference set, we've run every recent Commander precon through the same engine. Browse all 61 analyzed precons, filterable by bracket, set, or color identity. Each precon links to a full per-deck analysis with the same passport (bracket, power, Tipping Point) the calculator produces.
The methodology behind the metric
For the long-form story on how the engine produces the Tipping Point chip you see on every analysis (including the WASM Monte Carlo internals, a comparison to Frank Karsten's land-count formula, and why no competing bracket calculator can replicate it), read "We Simulated 5 Million Mana Bases. Here's What We Learned About Tipping Points."