Bayesian Expert Aggregation · 15-Expert Panel on US–Israel vs Iran

An expanded 15-expert panel — dovish, hawkish, and independent — scored across 113 past predictions and aggregated with 100,000 Monte Carlo draws. The consensus still points to prolonged quagmire: quick-win regime change is under 10%, and catastrophic spread, while real, is not the modal outcome.

Experts scored: 7 dovish · 6 hawkish · 2 independent
Scored predictions: 113
Monte Carlo draws: 100k

Prolonged quagmire: 52.3% (modal scenario · 80% HDI [45.1, 59.7])
Stalemate / de-escalation: 21.9%
Catastrophic spread: 16.6%
Quick win · regime change: 9.2%

Effective independent experts: 10.13 out of 15 (after correlation adjustment)

Panel includes dovish, hawkish, and independent analysts to ensure ideological balance. Weighting is purely empirical: past accuracy determines influence on consensus.

Methodology at a glance

Each expert's track record is scored on a 5-point scale — TRUE (1.0), MOSTLY TRUE (0.75), PARTIAL (0.5), MOSTLY FALSE (0.25), FALSE (0.0) — and fed into a Beta-distributed posterior with a Jeffreys prior of Beta(0.5, 0.5). The posterior mean E[p] = α/(α+β) gives each expert's expected accuracy while preserving uncertainty: an analyst with five calls at 0.75 has a wider posterior than one with twenty calls at the same rate.
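
The update described above can be sketched in a few lines. This is a minimal illustration, not the report's code: it assumes each score s adds s to α and (1 − s) to β, which is consistent with the reliability table (every posterior satisfies α + β = N + 1) but is not spelled out in the source. The function names are hypothetical.

```python
from math import sqrt

# Verdict scale as given in the methodology.
SCORES = {"TRUE": 1.0, "MOSTLY TRUE": 0.75, "PARTIAL": 0.5,
          "MOSTLY FALSE": 0.25, "FALSE": 0.0}

def beta_posterior(scores, prior_a=0.5, prior_b=0.5):
    """Assumed fractional-count update on top of the Jeffreys prior:
    each score s adds s to alpha and (1 - s) to beta."""
    a = prior_a + sum(scores)
    b = prior_b + sum(1.0 - s for s in scores)
    return a, b

def beta_mean(a, b):
    return a / (a + b)

def beta_sd(a, b):
    # Standard deviation of a Beta(a, b) distribution.
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

few = beta_posterior([SCORES["MOSTLY TRUE"]] * 5)    # five calls at 0.75
many = beta_posterior([SCORES["MOSTLY TRUE"]] * 20)  # twenty calls at 0.75

print(few, beta_mean(*few), beta_sd(*few))
print(many, beta_mean(*many), beta_sd(*many))
```

The five-call posterior has the larger standard deviation, which is the "wider posterior" the paragraph refers to.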

Scenario predictions are modelled as Dirichlet distributions over the four outcomes. The concentration parameter κ (range 6–16 across the panel) captures how confident and specific each expert's stated position is — higher κ for analysts with sharper public views, lower κ where their writings allow more variation. The probability vectors themselves are inferred from published writings, testimonies, and public statements, not elicited directly.
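
To illustrate how κ controls spread, the sketch below draws from a Dirichlet centred on a stated probability vector. The parameterisation Dirichlet(κ · p) is an assumption (the report does not give it), and the example vector is hypothetical rather than attributed to any panel member.

```python
import numpy as np

SCENARIOS = ["quick_win", "quagmire", "stalemate", "catastrophe"]

def sample_scenarios(p, kappa, size, rng):
    """Draw probability vectors from Dirichlet(kappa * p).
    Assumed parameterisation: p is the centre, kappa the concentration."""
    p = np.asarray(p, dtype=float)
    return rng.dirichlet(kappa * p, size=size)

rng = np.random.default_rng(42)

# Hypothetical stated position, for illustration only (not from the report).
p_example = [0.10, 0.50, 0.25, 0.15]

loose = sample_scenarios(p_example, kappa=6, size=20_000, rng=rng)   # vaguer expert
sharp = sample_scenarios(p_example, kappa=16, size=20_000, rng=rng)  # sharper expert

for name, lo, hi in zip(SCENARIOS, loose.std(axis=0), sharp.std(axis=0)):
    print(f"{name:12s} sd at kappa=6: {lo:.3f}   sd at kappa=16: {hi:.3f}")
```

Both sets of draws share the same mean vector; the κ = 16 draws simply cluster more tightly around it.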

Monte Carlo aggregation (N = 100,000)

Each draw samples every expert's reliability from their Beta posterior, then samples their scenario prediction from their Dirichlet, and computes three aggregations: a linear pool (reliability-weighted average), a logarithmic pool (geometric weighted average), and an extremized correlation-adjusted estimate. All reported intervals are Highest Density Intervals at 80% and 95% — proper Bayesian intervals from the MC posterior, not normal approximations.
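
A compact sketch of one way to run this loop for the linear and logarithmic pools follows. It is illustrative only: the function name, the Dirichlet(κ · p) parameterisation, and the three-expert inputs are assumptions, and the extremized correlation-adjusted estimate is left out because the report does not give its formula. The seed of 42 matches the parameter table, though the report does not say which generator it used.

```python
import numpy as np

def pooled_draws(betas, centers, kappas, n_draws=100_000, seed=42):
    """Return (linear, log_pool), each of shape (n_draws, n_scenarios)."""
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)      # (E, 2) Beta(alpha, beta) per expert
    centers = np.asarray(centers, dtype=float)  # (E, K) stated scenario probabilities
    kappas = np.asarray(kappas, dtype=float)    # (E,)   Dirichlet concentration

    # Step 1: one reliability draw per expert per iteration, normalised to weights.
    w = rng.beta(betas[:, 0], betas[:, 1], size=(n_draws, len(betas)))
    w /= w.sum(axis=1, keepdims=True)

    # Step 2: one scenario vector per expert per iteration, shape (n_draws, E, K).
    preds = np.stack(
        [rng.dirichlet(k * c, size=n_draws) for c, k in zip(centers, kappas)],
        axis=1,
    )

    # Linear pool: reliability-weighted arithmetic mean.
    linear = np.einsum("de,dek->dk", w, preds)

    # Logarithmic pool: weighted geometric mean, renormalised to sum to 1.
    log_p = np.log(np.clip(preds, 1e-12, None))  # clip guards against log(0)
    log_pool = np.exp(np.einsum("de,dek->dk", w, log_p))
    log_pool /= log_pool.sum(axis=1, keepdims=True)
    return linear, log_pool

# Hypothetical three-expert panel; none of these numbers come from the report.
betas = [(7.0, 2.0), (5.0, 4.0), (3.0, 6.0)]
centers = [[0.05, 0.60, 0.20, 0.15],
           [0.15, 0.45, 0.28, 0.12],
           [0.30, 0.35, 0.20, 0.15]]
kappas = [16, 10, 6]

linear, log_pool = pooled_draws(betas, centers, kappas, n_draws=10_000)
print("linear pool mean:", linear.mean(axis=0).round(3))
print("log pool mean:   ", log_pool.mean(axis=0).round(3))
```

Each row of either output is one posterior draw of the four-scenario forecast, so summary statistics and intervals can be read directly off the columns.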

A 15×15 correlation matrix captures shared information between experts. The FDD cluster (Dubowitz, Gerecht, Ben Taleblu, Schanzer) shares institutional overlap at 0.45–0.60; the realist cluster (Mearsheimer, Sachs, Kinzer) shares framework overlap at 0.40–0.50; cross-ideology correlations sit at 0.05–0.25. Eigenvalue decomposition yields an effective number of independent experts of 10.13 out of 15, producing an extremizing factor d = 0.675 following Satopaa et al. (2014).
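
The report does not say which eigenvalue formula it uses. One common choice is the participation ratio (Σλ)² / Σλ², sketched below on a toy 4×4 matrix as an assumption rather than a reconstruction of the report's method. The reported d = 0.675 does match 10.13 / 15, so the sketch takes d as the ratio of effective to nominal experts.

```python
import numpy as np

def effective_n(corr):
    """Participation-ratio estimate of the number of independent experts.
    Assumed formula; the report only says 'eigenvalue decomposition'."""
    eig = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    return eig.sum() ** 2 / (eig ** 2).sum()

# Toy matrix (hypothetical): experts 0 and 1 share a cluster at 0.5,
# every other pair sits at a cross-camp level of 0.1.
corr = np.full((4, 4), 0.1)
corr[0, 1] = corr[1, 0] = 0.5
np.fill_diagonal(corr, 1.0)

n_eff = effective_n(corr)
d = n_eff / corr.shape[0]
print(f"toy panel: n_eff = {n_eff:.2f} of {corr.shape[0]}, d = {d:.3f}")

# Ratio implied by the figures quoted in the report.
print(f"report: 10.13 / 15 = {10.13 / 15:.3f}")
```

With an identity matrix (no shared information) the function returns the full panel size, and it shrinks as off-diagonal correlations grow.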

Ideological balance by design

The 15-expert panel is deliberately balanced: seven dovish / anti-interventionist analysts (Mearsheimer, Sachs, Kinzer, Bajoghli, Vaez, Ritter, Crooke), six hawkish / interventionist analysts (Dubowitz, Gerecht, Rubin, Takeyh, Ben Taleblu, Schanzer), and two independents (Molyneux, Friedman). The model assigns no ideological weights — accuracy alone determines influence. If hawks had better track records, they would naturally dominate the consensus.

Expert reliability scores

The posterior mean E[p] is the weight each expert carries in aggregation. The ordering below is by accuracy, not ideology; realists and Iran-specialists cluster near the top, Iraq-era advocates near the bottom.

Expert	Affiliation	Camp	N	Beta posterior	E[p]
Mearsheimer	U. Chicago (Realist)	Dovish	8	Beta(7.75, 1.25)	0.861
Vaez	Intl Crisis Group (ICG)	Dovish	6	Beta(5.75, 1.25)	0.821
Kinzer	Boston Univ / Author	Dovish	7	Beta(6.25, 1.75)	0.781
Bajoghli	Johns Hopkins (Ethnographer)	Dovish	5	Beta(4.50, 1.50)	0.750
Ben Taleblu	FDD Iran Program Dir.	Hawkish	7	Beta(5.75, 2.25)	0.719
Sachs	Columbia / UN Advisor	Dovish	8	Beta(6.00, 3.00)	0.667
Takeyh	CFR Sr Fellow	Hawkish	8	Beta(6.00, 3.00)	0.667
Schanzer	FDD Exec Director	Hawkish	7	Beta(5.25, 2.75)	0.656
Dubowitz	FDD (CEO)	Hawkish	8	Beta(5.75, 3.25)	0.639
Crooke	Conflicts Forum / Ex-MI6	Dovish	8	Beta(5.50, 3.50)	0.611
Friedman	Geopolitical Futures	Other	9	Beta(5.75, 4.25)	0.575
Rubin	AEI / Middle East Forum	Hawkish	8	Beta(4.50, 4.50)	0.500
Ritter	Ex-UNSCOM Inspector	Dovish	9	Beta(3.75, 6.25)	0.375
Gerecht	FDD / Ex-CIA	Hawkish	8	Beta(3.00, 6.00)	0.333
Molyneux	Independent / YouTube	Other	7	Beta(1.00, 7.00)	0.125

Ideological balance check

The accuracy gap between camps is moderate, not extreme. Each camp's average is pulled down by one outlier: Gerecht (0.333) among the hawks and Ritter (0.375) among the doves. The highest-accuracy hawk, Ben Taleblu at 0.719, is comparable to mid-tier doves. The data, not the ideology, drives the weighting.

Group	Count	Avg accuracy	Key observation
Dovish / Anti-Interventionist	7	0.695	Strong on intervention blowback
Hawkish / Interventionist	6	0.586	Strong on Iran technical threats
Other / Independent	2	0.350	Molyneux pulls the average down
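
The camp averages can be reproduced from the E[p] column of the reliability table. The short check below uses only values printed in that table and assumes a simple unweighted mean per camp.

```python
from statistics import mean

# E[p] values copied from the expert reliability table, grouped by camp.
E_P = {
    "Dovish":  [0.861, 0.821, 0.781, 0.750, 0.667, 0.611, 0.375],
    "Hawkish": [0.719, 0.667, 0.656, 0.639, 0.500, 0.333],
    "Other":   [0.575, 0.125],
}

camp_avg = {camp: round(mean(vals), 3) for camp, vals in E_P.items()}
for camp, avg in camp_avg.items():
    print(f"{camp:8s} n={len(E_P[camp])}  avg={avg:.3f}")
```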

Selected prediction track record

The full ledger runs to 113 scored predictions. The table below highlights the calls that most strongly anchor each expert's posterior — the structural calls, the Iraq-era stress tests, and the Iran-specific technical forecasts that separate the panel's top and bottom tiers.

Expert	Prediction	Year	Score	Verdict
Mearsheimer	NATO expansion would provoke Russian aggression vs Ukraine	2014	1.00	TRUE
Mearsheimer	Iraq War would be a strategic disaster / quagmire	2003	1.00	TRUE
Mearsheimer	Libya intervention would create a failed state / chaos	2011	1.00	TRUE
Mearsheimer	Ukraine conflict would become a prolonged war of attrition	2022	1.00	TRUE
Vaez	US withdrawal from JCPOA would accelerate enrichment	2018	1.00	TRUE
Vaez	Maximum pressure sanctions fail to change regime behavior	2019	1.00	TRUE
Vaez	Iran would accelerate nuclear program after JCPOA collapse	2019	1.00	TRUE
Kinzer	1953 coup blowback pattern repeats in future interventions	2003	1.00	TRUE
Kinzer	Libya intervention would create chaos	2011	1.00	TRUE
Bajoghli	Sanctions hurt Iranian civilians more than regime elites	2022	1.00	TRUE
Bajoghli	Mahsa Amini protests = deep structural legitimacy crisis	2022	0.75	MOSTLY TRUE
Ben Taleblu	Iran would transfer drones / missiles to Russia after embargo	2020	1.00	TRUE
Ben Taleblu	Houthi missile capabilities pose a genuine regional threat	2018	1.00	TRUE
Sachs	Iraq sanctions cause humanitarian catastrophe w/o regime change	1990s	1.00	TRUE
Sachs	Iraq War would destabilize the Middle East broadly	2003	1.00	TRUE
Takeyh	Regime multi-layered; bombing alone won't topple it	2026	0.75	MOSTLY TRUE
Takeyh	Air power can destroy nuclear / missile programs but Iran rebuilds	2026	0.75	MOSTLY TRUE
Dubowitz	Biden admin lax enforcement enabled Iran oil revenue surge	2021–24	1.00	TRUE
Dubowitz	Maximum pressure would bring Iran to the negotiating table	2018	0.25	MOSTLY FALSE
Rubin	Iraq regime change would benefit regional stability	2002–03	0.00	FALSE
Rubin	Chalabi as reliable US partner in post-Saddam Iraq	2003	0.00	FALSE
Rubin	Turkey under Erdogan increasingly hostile to US interests	2015	1.00	TRUE
Ritter	Iraq had no significant WMD stockpiles (pre-2003)	2002	1.00	TRUE
Ritter	Decisive Russian military victory in Ukraine in 2023	2023	0.25	MOSTLY FALSE
Gerecht	Iraq War won't destabilize the Mideast	2002	0.00	FALSE
Gerecht	US invasion of Iraq would provoke democratic revolution in Iran	2002	0.00	FALSE
Molyneux	Western civilization collapse imminent due to immigration	2015–19	0.00	FALSE
Molyneux	Brexit would trigger EU collapse within years	2016	0.00	FALSE

Selected excerpts; the complete ledger of 113 predictions is in the source report. Categories span Geopolitical, Military, Economic, Nuclear, and Regime outcomes.

Final scenario forecasts

The consensus forecast below is the extremized, correlation-adjusted posterior across 100,000 Monte Carlo draws. HDIs are Highest Density Intervals from the posterior sample.

Scenario	Mean	Median	80% HDI	95% HDI	Std Dev
Quick Win / Regime Change	9.2%	8.9%	[5.4%, 12.2%]	[4.3%, 14.8%]	2.8%
Prolonged Quagmire	52.3%	52.4%	[45.1%, 59.7%]	[41.2%, 63.4%]	5.7%
Stalemate / De-escalation	21.9%	21.7%	[15.7%, 27.2%]	[13.5%, 31.1%]	4.5%
Catastrophic Spread	16.6%	16.2%	[11.2%, 21.2%]	[9.2%, 24.5%]	4.0%
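
For readers unfamiliar with HDIs, the sketch below shows one standard way to compute them from posterior samples: take the narrowest window that contains the requested share of sorted draws. The function name is hypothetical and the report's own implementation is not shown; the samples here are a stand-in Beta distribution, not the report's posterior.

```python
import numpy as np

def hdi(samples, cred=0.80):
    """Narrowest interval containing a `cred` share of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.ceil(cred * n))            # number of points inside the interval
    widths = x[k - 1:] - x[: n - k + 1]   # width of every window of k points
    i = int(np.argmin(widths))            # start index of the narrowest window
    return x[i], x[i + k - 1]

# Stand-in samples for illustration only.
rng = np.random.default_rng(42)
draws = rng.beta(20, 18, size=100_000)

lo80, hi80 = hdi(draws, 0.80)
lo95, hi95 = hdi(draws, 0.95)
print(f"80% HDI: [{lo80:.3f}, {hi80:.3f}]")
print(f"95% HDI: [{lo95:.3f}, {hi95:.3f}]")
```

Unlike an equal-tailed interval, this window is free to sit asymmetrically around the mean when the posterior is skewed.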

Model evolution: 4 → 10 → 15 experts

The panel has grown across three iterations. The 15-expert run re-balances the earlier, dove-heavy aggregations by explicitly adding hawkish voices — and the shifts, while real, are smaller than the rhetoric around them would suggest.

Scenario	Original (4-Expert)	10-Expert	15-Expert	Δ (15 vs 10)
Quick Win / Regime Change	25.0%	5.2%	9.2%	+4.0 pp
Prolonged Quagmire	53.0%	56.9%	52.3%	−4.6 pp
Stalemate / De-escalation	14.0%	18.8%	21.9%	+3.1 pp
Catastrophic Spread	8.0%	19.1%	16.6%	−2.5 pp
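
As a quick consistency check on the table, the snippet below confirms that each iteration's four probabilities sum to 100% and recomputes the Δ column. All figures are copied from the table; nothing is estimated.

```python
SCENARIOS = ["Quick Win", "Quagmire", "Stalemate", "Catastrophic Spread"]

# Scenario probabilities (percent) per model iteration, from the table above.
runs = {
    "4-expert":  [25.0, 53.0, 14.0, 8.0],
    "10-expert": [5.2, 56.9, 18.8, 19.1],
    "15-expert": [9.2, 52.3, 21.9, 16.6],
}

totals = {name: round(sum(vals), 1) for name, vals in runs.items()}

# Percentage-point change from the 10-expert to the 15-expert run.
delta = {
    s: round(new - old, 1)
    for s, old, new in zip(SCENARIOS, runs["10-expert"], runs["15-expert"])
}

print(totals)
print(delta)
```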

Key structural findings

1. Quagmire remains dominant

At 52.3%, prolonged attrition is the strong modal outcome. Adding hawkish experts pulled it down only ~4.6 percentage points from the 10-expert model. Even the highest-accuracy hawks — Takeyh and Ben Taleblu — explicitly argue that air power alone cannot topple the Iranian regime.

2. Quick Win recovers modestly

From 5.2% in the 10-expert model to 9.2% here. Hawks like Dubowitz (30%) and Gerecht (25%) boost this, but their lower track-record accuracy limits their influence. The consensus probability is still under 10%.

3. Stalemate rises

From 18.8% to 21.9%. Several hawks — Rubin at 30%, Takeyh at 28% — assign meaningful probability to a negotiated off-ramp, reflecting policy sophistication even within interventionist frameworks.

4. Catastrophe moderates

Down from 19.1% to 16.6%. Hawks generally assign lower catastrophe probability (15–18%) than doves (20–30%), reflecting their assessment that Iranian conventional capabilities are degraded. The data partially validates this, but at 16.6% catastrophic spread is still the third-largest of the four outcomes, ahead of a quick win.

5. The FDD cluster gets discounted

Four FDD experts — Dubowitz, Gerecht, Ben Taleblu, Schanzer — share institutional correlations of 0.45–0.60. The correlation adjustment prevents their shared analytical framework from being counted four times. Effective independent experts: 10.13 out of 15.

6. The accuracy gap is real but nuanced

Dovish experts average 0.695 accuracy versus hawkish 0.586. The gap is driven largely by the Iraq War era: hawkish analysts who predicted Iraq would stabilize or catalyze Iranian democracy were systematically wrong. On Iran-specific technical assessments — JCPOA flaws, missile threats, proxy networks — hawks have strong records.

Model parameters

Parameter	Value
Monte Carlo draws	N = 100,000
Prior	Jeffreys Beta(0.5, 0.5)
Total experts	15 (7 dovish, 6 hawkish, 2 other)
Total scored predictions	113
Correlation matrix	15×15 with FDD cluster (0.45–0.60), realist cluster (0.40–0.50)
Effective independent experts	10.13 out of 15 (eigenvalue method)
Extremizing factor d	0.675 (Satopaa et al. 2014)
Dirichlet κ range	6–16 (confidence parameter per expert)
Aggregation methods	Linear Pool, Logarithmic Pool, Extremized (corr-adj)
HDI credibility levels	80% and 95%
Scoring scale	TRUE = 1.0, MOSTLY TRUE = 0.75, PARTIAL = 0.5, MOSTLY FALSE = 0.25, FALSE = 0.0
Random seed	42 (for reproducibility)

Limitations

What the model does not capture

Subjective scoring. Different analysts could reasonably assign different verdict scores to the same prediction. The 5-point scale mitigates this versus binary scoring, but does not eliminate it.
Inferred, not elicited. Scenario probability vectors are inferred from public statements; experts have not been surveyed to confirm the distributions attributed to them.
Qualitative correlation matrix. The 15×15 matrix is constructed from assessment of shared sources and institutional affiliations, not estimated from data.
Iraq-era shadow. Several hawkish experts — Gerecht, Rubin — have track records heavily shaped by 2002–03 calls. Whether their post-Iraq analytical evolution is captured in the static scoring is debatable.
No temporal discount. All past predictions are equally weighted regardless of recency or difficulty. A more sophisticated model might discount older or easier calls.
Living-conflict caveat. Ground truth for the four scenarios has not yet been determined. The consensus reflects expert-aggregated probabilities as of March 1, 2026, not certainty about outcomes.
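
To make the temporal-discount limitation concrete, here is one hypothetical way a recency weight could enter the Beta update. It is not part of the reported model: the half-life parameter, the function name, and the two example calls are all assumptions chosen for illustration.

```python
def discounted_beta(scored_calls, as_of_year, half_life=10.0,
                    prior_a=0.5, prior_b=0.5):
    """Beta update in which a call's weight halves every `half_life` years.
    scored_calls is a list of (year, score) pairs with score in [0, 1]."""
    a, b = prior_a, prior_b
    for year, score in scored_calls:
        w = 0.5 ** ((as_of_year - year) / half_life)  # recency weight in (0, 1]
        a += w * score
        b += w * (1.0 - score)
    return a, b

# Hypothetical analyst: one old miss and one recent hit.
calls = [(2003, 0.0), (2023, 1.0)]

a, b = discounted_beta(calls, as_of_year=2026)
print(f"discounted posterior mean: {a / (a + b):.3f}")

# A very long half-life approaches the static, undiscounted scoring.
a0, b0 = discounted_beta(calls, as_of_year=2026, half_life=1e9)
print(f"undiscounted posterior mean: {a0 / (a0 + b0):.3f}")
```

Under this scheme the old miss counts for less than the recent hit, so the discounted mean sits above the static one; how much depends entirely on the chosen half-life.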

Source: Bayesian Expert Aggregation — US–Israel vs Iran Conflict Scenario Forecasting, Unmitigated Wisdom, report dated March 1, 2026. Expanded 15-Expert Panel · 113 Scored Predictions · 100,000 Monte Carlo Draws.