An expanded 15-expert panel of dovish, hawkish, and independent analysts is scored across 113 past predictions and aggregated with 100,000 Monte Carlo draws. The consensus still points to prolonged quagmire: quick-win regime change is under 10%, and catastrophic spread, while real, is not the modal outcome.
The panel includes dovish, hawkish, and independent analysts to ensure ideological balance. Weighting is purely empirical: past accuracy determines each expert's influence on the consensus.
Methodology at a glance
Each expert's track record is scored on a 5-point scale — TRUE (1.0), MOSTLY TRUE (0.75), PARTIAL (0.5), MOSTLY FALSE (0.25), FALSE (0.0) — and fed into a Beta-distributed posterior with a Jeffreys prior of Beta(0.5, 0.5). The posterior mean E[p] = α/(α+β) gives each expert's expected accuracy while preserving uncertainty: an analyst with five calls at 0.75 has a wider posterior than one with twenty calls at the same rate.
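The Beta parameters in the reliability table are consistent with a fractional update in which each scored prediction adds its score to α and one minus its score to β, so that α + β = N + 1 under the Jeffreys prior. The sketch below assumes that rule, which the report does not spell out, and reproduces the five-versus-twenty-call comparison. The function names are illustrative.

```python
from math import sqrt

PRIOR_ALPHA = 0.5  # Jeffreys prior Beta(0.5, 0.5)
PRIOR_BETA = 0.5


def beta_posterior(scores):
    """Return (alpha, beta) after adding fractional verdict scores to the prior.

    Assumed rule: a score s contributes s to alpha and (1 - s) to beta.
    """
    alpha = PRIOR_ALPHA + sum(scores)
    beta = PRIOR_BETA + sum(1.0 - s for s in scores)
    return alpha, beta


def beta_mean(alpha, beta):
    """Posterior mean E[p] = alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


def beta_sd(alpha, beta):
    """Posterior standard deviation of a Beta(alpha, beta)."""
    total = alpha + beta
    return sqrt(alpha * beta / (total ** 2 * (total + 1.0)))


few = beta_posterior([0.75] * 5)    # five calls, each MOSTLY TRUE
many = beta_posterior([0.75] * 20)  # twenty calls at the same rate

print(few, round(beta_sd(*few), 3))
print(many, round(beta_sd(*many), 3))
```

Under this assumption, five calls at 0.75 give Beta(4.25, 1.75) with a standard deviation of about 0.17, while twenty give Beta(15.5, 5.5) with about 0.09.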
Scenario predictions are modelled as Dirichlet distributions over the four outcomes. The concentration parameter κ (range 6–16 across the panel) captures how confident and specific each expert's stated position is — higher κ for analysts with sharper public views, lower κ where their writings allow more variation. The probability vectors themselves are inferred from published writings, testimonies, and public statements, not elicited directly.
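One common way to encode a stated position plus a confidence level is Dirichlet(κ·p), where p is the expert's mean probability vector. The sketch below assumes that parameterization, which the report does not confirm, and uses a made-up vector p. It samples with the standard library by normalizing independent Gamma draws.

```python
import random


def sample_dirichlet(mean_probs, kappa, rng):
    """Draw one probability vector from Dirichlet(kappa * mean_probs)."""
    gammas = [rng.gammavariate(kappa * p, 1.0) for p in mean_probs]
    total = sum(gammas)
    return [g / total for g in gammas]


# Hypothetical view over (quick win, quagmire, stalemate, catastrophic spread).
# These numbers are illustrative and not taken from the report.
view = [0.10, 0.50, 0.25, 0.15]
rng = random.Random(42)


def quagmire_spread(kappa, n=5000):
    """Empirical standard deviation of the quagmire component."""
    draws = [sample_dirichlet(view, kappa, rng)[1] for _ in range(n)]
    mean = sum(draws) / n
    return (sum((d - mean) ** 2 for d in draws) / n) ** 0.5


loose = quagmire_spread(6)   # low end of the panel's kappa range
tight = quagmire_spread(16)  # high end of the panel's kappa range
print(round(loose, 3), round(tight, 3))
```

Under this parameterization each component has standard deviation sqrt(p(1 − p)/(κ + 1)), so moving from κ = 6 to κ = 16 narrows the spread on a 50% outcome from roughly 0.19 to roughly 0.12.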
Monte Carlo aggregation (N = 100,000)
Each draw samples every expert's reliability from their Beta posterior, then samples their scenario prediction from their Dirichlet, and computes three aggregations: a linear pool (reliability-weighted average), a logarithmic pool (geometric weighted average), and an extremized correlation-adjusted estimate. All reported intervals are Highest Density Intervals at 80% and 95% — proper Bayesian intervals from the MC posterior, not normal approximations.
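The loop described above can be sketched as follows. The two-expert panel, its Beta parameters, its κ values, and its scenario vectors are all hypothetical, and Dirichlet(κ·p) is an assumed parameterization. Only the two pooling formulas (weighted arithmetic mean, and renormalized weighted geometric mean) follow the definitions in the text. The extremized estimate is left out because its exact form is not given here.

```python
import math
import random


def linear_pool(vectors, weights):
    """Reliability-weighted arithmetic mean of probability vectors."""
    total = sum(weights)
    return [
        sum(w * v[j] for v, w in zip(vectors, weights)) / total
        for j in range(len(vectors[0]))
    ]


def log_pool(vectors, weights, floor=1e-12):
    """Reliability-weighted geometric mean, renormalized to sum to 1."""
    total = sum(weights)
    unnorm = [
        math.exp(sum(w / total * math.log(max(v[j], floor))
                     for v, w in zip(vectors, weights)))
        for j in range(len(vectors[0]))
    ]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def dirichlet(mean_probs, kappa, rng):
    """One draw from Dirichlet(kappa * mean_probs) via normalized Gammas."""
    g = [rng.gammavariate(kappa * p, 1.0) for p in mean_probs]
    s = sum(g)
    return [x / s for x in g]


# Hypothetical panel: (Beta alpha, Beta beta, kappa, mean scenario vector).
# Outcome order: quick win, quagmire, stalemate, catastrophic spread.
panel = [
    (7.0, 2.0, 12, [0.05, 0.60, 0.20, 0.15]),  # more reliable, sharper view
    (3.0, 6.0, 8, [0.30, 0.35, 0.20, 0.15]),   # less reliable, looser view
]

rng = random.Random(42)
n_draws = 2000  # kept small for illustration; the report uses 100,000
linear_draws, log_draws = [], []
for _ in range(n_draws):
    # Sample each expert's reliability, then each expert's scenario vector.
    weights = [rng.betavariate(a, b) for a, b, kappa, view in panel]
    vectors = [dirichlet(view, kappa, rng) for a, b, kappa, view in panel]
    linear_draws.append(linear_pool(vectors, weights))
    log_draws.append(log_pool(vectors, weights))

mean_linear = [sum(d[j] for d in linear_draws) / n_draws for j in range(4)]
mean_log = [sum(d[j] for d in log_draws) / n_draws for j in range(4)]
print([round(x, 3) for x in mean_linear])
print([round(x, 3) for x in mean_log])
```

Because the weights are sampled reliabilities, the pooled quick-win probability lands closer to the more reliable expert's 5% than to the less reliable expert's 30%.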
A 15×15 correlation matrix captures shared information between experts. The FDD cluster (Dubowitz, Gerecht, Ben Taleblu, Schanzer) shares institutional overlap at 0.45–0.60; the realist cluster (Mearsheimer, Sachs, Kinzer) shares framework overlap at 0.40–0.50; cross-ideology correlations sit at 0.05–0.25. Eigenvalue decomposition yields an effective number of independent experts of 10.13 out of 15, producing an extremizing factor d = 0.675 following Satopaa et al. (2014).
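The report does not state which eigenvalue formula it uses. One common choice, assumed here, is the participation ratio n_eff = (Σλ)² / Σλ². For a correlation matrix this equals n² divided by the sum of squared entries, so it can be computed without an eigen-solver. The 4×4 matrix below is a made-up example, not a slice of the report's 15×15 matrix. The last line checks that the reported d matches n_eff / n, which is an inference from the published numbers rather than a documented formula.

```python
def effective_experts(corr):
    """Participation ratio (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    For a symmetric matrix with unit diagonal, the eigenvalues sum to n and
    their squares sum to the sum of squared entries (the trace of corr @ corr).
    """
    n = len(corr)
    sum_sq = sum(x * x for row in corr for x in row)
    return n * n / sum_sq


# Hypothetical 4-expert matrix: experts 0 and 1 share an institution (0.5),
# every other pair is weakly correlated (0.1).
toy = [
    [1.0, 0.5, 0.1, 0.1],
    [0.5, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
n_eff_toy = effective_experts(toy)
print(round(n_eff_toy, 2))

# Reported values: 10.13 effective experts out of 15.
d_reported = round(10.13 / 15, 3)
print(d_reported)
```

For the toy matrix this gives about 3.48 effective experts out of 4. Fully independent experts would give 4, and perfectly correlated experts would give 1.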
The 15-expert panel is deliberately balanced: seven dovish / anti-interventionist analysts (Mearsheimer, Sachs, Kinzer, Bajoghli, Vaez, Ritter, Crooke), six hawkish / interventionist analysts (Dubowitz, Gerecht, Rubin, Takeyh, Ben Taleblu, Schanzer), and two independents (Molyneux, Friedman). The model assigns no ideological weights — accuracy alone determines influence. If hawks had better track records, they would naturally dominate the consensus.
Expert reliability scores
The posterior mean E[p] is the expected weight each expert carries in aggregation; individual Monte Carlo draws sample around it from the Beta posterior. The ordering below is by accuracy, not ideology: realists and Iran specialists cluster near the top, Iraq-era advocates near the bottom.
| Expert | Affiliation | Camp | N | Beta posterior | E[p] |
|---|---|---|---|---|---|
| Mearsheimer | U. Chicago (Realist) | Dovish | 8 | Beta(7.75, 1.25) | 0.861 |
| Vaez | Intl Crisis Group (ICG) | Dovish | 6 | Beta(5.75, 1.25) | 0.821 |
| Kinzer | Boston Univ / Author | Dovish | 7 | Beta(6.25, 1.75) | 0.781 |
| Bajoghli | Johns Hopkins (Ethnographer) | Dovish | 5 | Beta(4.50, 1.50) | 0.750 |
| Ben Taleblu | FDD Iran Program Dir. | Hawkish | 7 | Beta(5.75, 2.25) | 0.719 |
| Sachs | Columbia / UN Advisor | Dovish | 8 | Beta(6.00, 3.00) | 0.667 |
| Takeyh | CFR Sr Fellow | Hawkish | 8 | Beta(6.00, 3.00) | 0.667 |
| Schanzer | FDD Exec Director | Hawkish | 7 | Beta(5.25, 2.75) | 0.656 |
| Dubowitz | FDD (CEO) | Hawkish | 8 | Beta(5.75, 3.25) | 0.639 |
| Crooke | Conflicts Forum / Ex-MI6 | Dovish | 8 | Beta(5.50, 3.50) | 0.611 |
| Friedman | Geopolitical Futures | Other | 9 | Beta(5.75, 4.25) | 0.575 |
| Rubin | AEI / Middle East Forum | Hawkish | 8 | Beta(4.50, 4.50) | 0.500 |
| Ritter | Ex-UNSCOM Inspector | Dovish | 9 | Beta(3.75, 6.25) | 0.375 |
| Gerecht | FDD / Ex-CIA | Hawkish | 8 | Beta(3.00, 6.00) | 0.333 |
| Molyneux | Independent / YouTube | Other | 7 | Beta(1.00, 7.00) | 0.125 |
Ideological balance check
The accuracy gap between camps is moderate, not extreme, and each camp's average is pulled down by one outlier: Gerecht (0.333) among the hawks and Ritter (0.375) among the doves. The highest-accuracy hawk, Ben Taleblu at 0.719, is comparable to mid-tier doves. The data, not the ideology, drives the weighting.
| Group | Count | Avg accuracy | Key observation |
|---|---|---|---|
| Dovish / Anti-Interventionist | 7 | 0.695 | Strong on intervention blowback |
| Hawkish / Interventionist | 6 | 0.586 | Strong on Iran technical threats |
| Other / Independent | 2 | 0.350 | Molyneux pulls the average down |
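The camp averages follow from the E[p] column of the reliability table. A quick check, assuming they are unweighted means of the posterior means (which matches the published figures):

```python
from statistics import mean

# E[p] values copied from the expert reliability table, grouped by camp.
posterior_means = {
    "Dovish": [0.861, 0.821, 0.781, 0.750, 0.667, 0.611, 0.375],
    "Hawkish": [0.719, 0.667, 0.656, 0.639, 0.500, 0.333],
    "Other": [0.575, 0.125],
}

camp_avg = {camp: round(mean(vals), 3) for camp, vals in posterior_means.items()}
print(camp_avg)
```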
Selected prediction track record
The full ledger runs to 113 scored predictions. The table below highlights the calls that most strongly anchor each expert's posterior — the structural calls, the Iraq-era stress tests, and the Iran-specific technical forecasts that separate the panel's top and bottom tiers.
| Expert | Prediction | Year | Score | Verdict |
|---|---|---|---|---|
| Mearsheimer | NATO expansion would provoke Russian aggression vs Ukraine | 2014 | 1.00 | TRUE |
| Mearsheimer | Iraq War would be a strategic disaster / quagmire | 2003 | 1.00 | TRUE |
| Mearsheimer | Libya intervention would create a failed state / chaos | 2011 | 1.00 | TRUE |
| Mearsheimer | Ukraine conflict would become a prolonged war of attrition | 2022 | 1.00 | TRUE |
| Vaez | US withdrawal from JCPOA would accelerate enrichment | 2018 | 1.00 | TRUE |
| Vaez | Maximum pressure sanctions fail to change regime behavior | 2019 | 1.00 | TRUE |
| Vaez | Iran would accelerate nuclear program after JCPOA collapse | 2019 | 1.00 | TRUE |
| Kinzer | 1953 coup blowback pattern repeats in future interventions | 2003 | 1.00 | TRUE |
| Kinzer | Libya intervention would create chaos | 2011 | 1.00 | TRUE |
| Bajoghli | Sanctions hurt Iranian civilians more than regime elites | 2022 | 1.00 | TRUE |
| Bajoghli | Mahsa Amini protests = deep structural legitimacy crisis | 2022 | 0.75 | MOSTLY TRUE |
| Ben Taleblu | Iran would transfer drones / missiles to Russia after embargo | 2020 | 1.00 | TRUE |
| Ben Taleblu | Houthi missile capabilities pose a genuine regional threat | 2018 | 1.00 | TRUE |
| Sachs | Iraq sanctions cause humanitarian catastrophe w/o regime change | 1990s | 1.00 | TRUE |
| Sachs | Iraq War would destabilize the Middle East broadly | 2003 | 1.00 | TRUE |
| Takeyh | Regime multi-layered; bombing alone won't topple it | 2026 | 0.75 | MOSTLY TRUE |
| Takeyh | Air power can destroy nuclear / missile programs but Iran rebuilds | 2026 | 0.75 | MOSTLY TRUE |
| Dubowitz | Biden admin lax enforcement enabled Iran oil revenue surge | 2021–24 | 1.00 | TRUE |
| Dubowitz | Maximum pressure would bring Iran to the negotiating table | 2018 | 0.25 | MOSTLY FALSE |
| Rubin | Iraq regime change would benefit regional stability | 2002–03 | 0.00 | FALSE |
| Rubin | Chalabi as reliable US partner in post-Saddam Iraq | 2003 | 0.00 | FALSE |
| Rubin | Turkey under Erdogan increasingly hostile to US interests | 2015 | 1.00 | TRUE |
| Ritter | Iraq had no significant WMD stockpiles (pre-2003) | 2002 | 1.00 | TRUE |
| Ritter | Decisive Russian military victory in Ukraine in 2023 | 2023 | 0.25 | MOSTLY FALSE |
| Gerecht | Iraq War won't destabilize the Mideast | 2002 | 0.00 | FALSE |
| Gerecht | US invasion of Iraq would provoke democratic revolution in Iran | 2002 | 0.00 | FALSE |
| Molyneux | Western civilization collapse imminent due to immigration | 2015–19 | 0.00 | FALSE |
| Molyneux | Brexit would trigger EU collapse within years | 2016 | 0.00 | FALSE |
Selected excerpts; the complete ledger of 113 predictions is in the source report. Categories span Geopolitical, Military, Economic, Nuclear, and Regime outcomes.
Final scenario forecasts
The consensus forecast below is the extremized, correlation-adjusted posterior across 100,000 Monte Carlo draws. HDIs are Highest Density Intervals from the posterior sample.
| Scenario | Mean | Median | 80% HDI | 95% HDI | Std Dev |
|---|---|---|---|---|---|
| Quick Win / Regime Change | 9.2% | 8.9% | [5.4%, 12.2%] | [4.3%, 14.8%] | 2.8% |
| Prolonged Quagmire | 52.3% | 52.4% | [45.1%, 59.7%] | [41.2%, 63.4%] | 5.7% |
| Stalemate / De-escalation | 21.9% | 21.7% | [15.7%, 27.2%] | [13.5%, 31.1%] | 4.5% |
| Catastrophic Spread | 16.6% | 16.2% | [11.2%, 21.2%] | [9.2%, 24.5%] | 4.0% |
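A Highest Density Interval can be read off a posterior sample as the narrowest window that contains the requested share of the sorted draws. The sketch below is one standard way to do that for a unimodal sample. It is not taken from the report's code, the function name is illustrative, and the Beta(2, 8) draws are a stand-in for a real posterior sample.

```python
import math
import random


def hdi(samples, cred=0.80):
    """Narrowest interval containing a `cred` share of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(cred * n))  # number of points inside the interval
    # Slide a window of k consecutive sorted points and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Right-skewed stand-in sample (not the report's Monte Carlo draws).
rng = random.Random(42)
draws = [rng.betavariate(2, 8) for _ in range(10_000)]
lo80, hi80 = hdi(draws, 0.80)
lo95, hi95 = hdi(draws, 0.95)
print(round(lo80, 3), round(hi80, 3))
print(round(lo95, 3), round(hi95, 3))
```

For a skewed sample like this one, the HDI is narrower than the equal-tailed interval at the same credibility level, which is why the two can differ from percentile-based intervals.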
Model evolution: 4 → 10 → 15 experts
The panel has grown across three iterations. The 15-expert run re-balances the earlier, dove-heavy aggregations by explicitly adding hawkish voices — and the shifts, while real, are smaller than the rhetoric around them would suggest.
| Scenario | Original (4-Expert) | 10-Expert | 15-Expert | Δ (15 vs 10) |
|---|---|---|---|---|
| Quick Win / Regime Change | 25.0% | 5.2% | 9.2% | +4.0 pp |
| Prolonged Quagmire | 53.0% | 56.9% | 52.3% | −4.6 pp |
| Stalemate / De-escalation | 14.0% | 18.8% | 21.9% | +3.1 pp |
| Catastrophic Spread | 8.0% | 19.1% | 16.6% | −2.5 pp |
Key structural findings
At 52.3%, prolonged quagmire is the clear modal outcome. Adding hawkish experts pulled it down by only 4.6 percentage points from the 10-expert model. Even the highest-accuracy hawks, Takeyh and Ben Taleblu, explicitly argue that air power alone cannot topple the Iranian regime.
Quick win / regime change rises from 5.2% in the 10-expert model to 9.2% here. Hawks like Dubowitz (30%) and Gerecht (25%) push it up, but their lower track-record accuracy limits their influence. The consensus probability is still under 10%.
Stalemate / de-escalation rises from 18.8% to 21.9%. Several hawks (Rubin at 30%, Takeyh at 28%) assign meaningful probability to a negotiated off-ramp, reflecting policy sophistication even within interventionist frameworks.
Catastrophic spread falls from 19.1% to 16.6%. Hawks generally assign lower catastrophe probability (15–18%) than doves (20–30%), reflecting their assessment that Iranian conventional capabilities are degraded. The data partially validates this, but 16.6% is still the third-largest share in a four-outcome partition, well ahead of quick win at 9.2%.
Four FDD experts — Dubowitz, Gerecht, Ben Taleblu, Schanzer — share institutional correlations of 0.45–0.60. The correlation adjustment prevents their shared analytical framework from being counted four times. Effective independent experts: 10.13 out of 15.
Dovish experts average 0.695 accuracy versus hawkish 0.586. The gap is driven largely by the Iraq War era: hawkish analysts who predicted Iraq would stabilize or catalyze Iranian democracy were systematically wrong. On Iran-specific technical assessments — JCPOA flaws, missile threats, proxy networks — hawks have strong records.
Model parameters
| Parameter | Value |
|---|---|
| Monte Carlo draws | N = 100,000 |
| Prior | Jeffreys Beta(0.5, 0.5) |
| Total experts | 15 (7 dovish, 6 hawkish, 2 other) |
| Total scored predictions | 113 |
| Correlation matrix | 15×15 with FDD cluster (0.45–0.60), realist cluster (0.40–0.50) |
| Effective independent experts | 10.13 out of 15 (eigenvalue method) |
| Extremizing factor d | 0.675 (Satopaa et al. 2014) |
| Dirichlet κ range | 6–16 (confidence parameter per expert) |
| Aggregation methods | Linear Pool, Logarithmic Pool, Extremized (corr-adj) |
| HDI credibility levels | 80% and 95% |
| Scoring scale | TRUE = 1.0, MOSTLY TRUE = 0.75, PARTIAL = 0.5, MOSTLY FALSE = 0.25, FALSE = 0.0 |
| Random seed | 42 (for reproducibility) |
Limitations
What the model does not capture
- Subjective scoring. Different analysts could reasonably assign different verdict scores to the same prediction. The 5-point scale mitigates this versus binary scoring, but does not eliminate it.
- Inferred, not elicited. Scenario probability vectors are inferred from public statements; experts have not been surveyed to confirm the distributions attributed to them.
- Qualitative correlation matrix. The 15×15 matrix is constructed from a judgment-based assessment of shared sources and institutional affiliations, not estimated from data.
- Iraq-era shadow. Several hawkish experts — Gerecht, Rubin — have track records heavily shaped by 2002–03 calls. Whether their post-Iraq analytical evolution is captured in the static scoring is debatable.
- No temporal discount. All past predictions are equally weighted regardless of recency or difficulty. A more sophisticated model might discount older or easier calls.
- Living-conflict caveat. Ground truth for the four scenarios has not yet been determined. The consensus reflects expert-aggregated probabilities as of March 1, 2026, not certainty about outcomes.
Source: Bayesian Expert Aggregation — US–Israel vs Iran Conflict Scenario Forecasting, Unmitigated Wisdom, report dated March 1, 2026. Expanded 15-Expert Panel · 113 Scored Predictions · 100,000 Monte Carlo Draws.