Bayesian Panel · 15 Experts · 113 Predictions

Bayesian Expert Aggregation on US–Israel vs Iran

By Unmitigated Wisdom  ·  Download PDF ↓

An expanded 15-expert panel — dovish, hawkish, and independent — scored across 113 past predictions and aggregated with 100,000 Monte Carlo draws. The consensus still points to prolonged quagmire: quick-win regime change is under 10%, and catastrophic spread, while real, is not the modal outcome.

15 experts scored (7 dovish · 6 hawkish · 2 independent)
113 scored predictions
100k Monte Carlo draws
52.3% Prolonged quagmire (modal scenario · 80% HDI [45.1, 59.7])
21.9% Stalemate / de-escalation
16.6% Catastrophic spread
9.2% Quick win · regime change
10.13 effective independent experts (out of 15 · after correlation adjustment)

Panel includes dovish, hawkish, and independent analysts to ensure ideological balance. Weighting is purely empirical: past accuracy determines influence on consensus.

Methodology at a glance

Each expert's track record is scored on a 5-point scale — TRUE (1.0), MOSTLY TRUE (0.75), PARTIAL (0.5), MOSTLY FALSE (0.25), FALSE (0.0) — and fed into a Beta-distributed posterior with a Jeffreys prior of Beta(0.5, 0.5). The posterior mean E[p] = α/(α+β) gives each expert's expected accuracy while preserving uncertainty: an analyst with five calls at 0.75 has a wider posterior than one with twenty calls at the same rate.
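The update can be sketched in a few lines of Python. This is a minimal illustration, assuming each verdict score s adds s to α and 1 − s to β on top of the Jeffreys prior. That rule is inferred from the reliability table, where α + β = N + 1 for every expert, and is not spelled out in the report. The function names and the individual scores in `scores` are hypothetical; only their total (7.25 over eight calls) is chosen so that the result is Beta(7.75, 1.25).

```python
def beta_posterior(scores, prior_alpha=0.5, prior_beta=0.5):
    """Update a Beta prior with fractional verdict scores in [0, 1]."""
    alpha = prior_alpha + sum(scores)                    # credit for being right
    beta = prior_beta + sum(1.0 - s for s in scores)     # debit for being wrong
    return alpha, beta


def posterior_mean(alpha, beta):
    """E[p] = alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


def posterior_sd(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return (alpha * beta / (total ** 2 * (total + 1))) ** 0.5


# Hypothetical ledger: eight calls whose scores sum to 7.25.
scores = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 0.5]
alpha, beta = beta_posterior(scores)

# Same 0.75 hit rate, different sample sizes: fewer calls, wider posterior.
few_sd = posterior_sd(*beta_posterior([0.75] * 5))
many_sd = posterior_sd(*beta_posterior([0.75] * 20))

print(alpha, beta, round(posterior_mean(alpha, beta), 3))
print(round(few_sd, 3), round(many_sd, 3))
```

The second print line makes the five-calls-versus-twenty comparison concrete: the posterior standard deviation shrinks as the number of scored calls grows, even though the hit rate is unchanged.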

Scenario predictions are modelled as Dirichlet distributions over the four outcomes. The concentration parameter κ (range 6–16 across the panel) captures how confident and specific each expert's stated position is — higher κ for analysts with sharper public views, lower κ where their writings allow more variation. The probability vectors themselves are inferred from published writings, testimonies, and public statements, not elicited directly.
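The role of κ can be illustrated with a short sampling sketch. It assumes each expert's Dirichlet is parameterised as κ times a mean probability vector, which is a common convention but is not confirmed by the report. The mean vector below is hypothetical, and the sampler uses normalised gamma variates from the standard library.

```python
import random


def sample_dirichlet(mean_vector, kappa, rng):
    """Draw one probability vector from Dirichlet(kappa * mean_vector)."""
    gammas = [rng.gammavariate(kappa * m, 1.0) for m in mean_vector]
    total = sum(gammas)
    return [g / total for g in gammas]


def component_sd(mean_vector, kappa, index, rng, n=5000):
    """Empirical standard deviation of one scenario's probability across n draws."""
    draws = [sample_dirichlet(mean_vector, kappa, rng)[index] for _ in range(n)]
    mu = sum(draws) / n
    return (sum((d - mu) ** 2 for d in draws) / n) ** 0.5


rng = random.Random(42)
# Hypothetical position: quick win, quagmire, stalemate, catastrophic spread.
mean_vector = [0.10, 0.50, 0.25, 0.15]

one_draw = sample_dirichlet(mean_vector, 10, rng)
loose_sd = component_sd(mean_vector, 6, 1, rng)    # low kappa: vaguer stated view
sharp_sd = component_sd(mean_vector, 16, 1, rng)   # high kappa: sharper stated view
print(round(loose_sd, 3), round(sharp_sd, 3))
```

Under this parameterisation the spread of the quagmire component is larger at κ = 6 than at κ = 16, matching the description of κ as a confidence parameter.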

Monte Carlo aggregation (N = 100,000)

Each draw samples every expert's reliability from their Beta posterior, then samples their scenario prediction from their Dirichlet, and computes three aggregations: a linear pool (reliability-weighted average), a logarithmic pool (geometric weighted average), and an extremized correlation-adjusted estimate. All reported intervals are Highest Density Intervals at 80% and 95% — proper Bayesian intervals from the MC posterior, not normal approximations.
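One draw of this loop, and its average over many draws, can be sketched as follows. This is an illustration under stated assumptions rather than the report's code: the three experts, their scenario vectors, and their κ values are made up; weights are taken to be sampled reliabilities normalised to sum to 1; and the extremized estimate is left out because the report does not give its formula.

```python
import math
import random

# Illustrative inputs only: (Beta alpha, Beta beta), Dirichlet mean vector, kappa.
# Scenario order: quick win, quagmire, stalemate, catastrophic spread.
EXPERTS = [
    ((7.75, 1.25), [0.05, 0.60, 0.15, 0.20], 12),
    ((5.75, 3.25), [0.30, 0.40, 0.15, 0.15], 10),
    ((5.75, 4.25), [0.10, 0.45, 0.30, 0.15], 8),
]


def sample_dirichlet(mean_vector, kappa, rng):
    """Draw from Dirichlet(kappa * mean_vector) via normalised gamma variates."""
    gammas = [rng.gammavariate(kappa * m, 1.0) for m in mean_vector]
    total = sum(gammas)
    return [g / total for g in gammas]


def linear_pool(weights, vectors):
    """Reliability-weighted arithmetic mean of the probability vectors."""
    return [sum(w * v[j] for w, v in zip(weights, vectors))
            for j in range(len(vectors[0]))]


def log_pool(weights, vectors):
    """Reliability-weighted geometric mean, renormalised to sum to 1."""
    raw = [math.exp(sum(w * math.log(max(v[j], 1e-12))
                        for w, v in zip(weights, vectors)))
           for j in range(len(vectors[0]))]
    total = sum(raw)
    return [r / total for r in raw]


def run_pools(n_draws, seed=42):
    """Average both pools over n_draws joint samples of reliability and forecast."""
    rng = random.Random(seed)
    k = len(EXPERTS[0][1])
    lin_sum, log_sum = [0.0] * k, [0.0] * k
    for _ in range(n_draws):
        reliab = [rng.betavariate(a, b) for (a, b), _, _ in EXPERTS]
        weights = [r / sum(reliab) for r in reliab]
        vectors = [sample_dirichlet(m, kappa, rng) for _, m, kappa in EXPERTS]
        for j, p in enumerate(linear_pool(weights, vectors)):
            lin_sum[j] += p
        for j, p in enumerate(log_pool(weights, vectors)):
            log_sum[j] += p
    return [s / n_draws for s in lin_sum], [s / n_draws for s in log_sum]


linear_mean, log_mean = run_pools(2000)
print([round(p, 3) for p in linear_mean])
print([round(p, 3) for p in log_mean])
```

The sketch uses 2,000 draws to keep it quick; the report's N = 100,000 would simply be passed as `n_draws`. Keeping the per-draw pooled vectors instead of running sums would give the posterior sample from which intervals are computed.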

A 15×15 correlation matrix captures shared information between experts. The FDD cluster (Dubowitz, Gerecht, Ben Taleblu, Schanzer) shares institutional overlap at 0.45–0.60; the realist cluster (Mearsheimer, Sachs, Kinzer) shares framework overlap at 0.40–0.50; cross-ideology correlations sit at 0.05–0.25. Eigenvalue decomposition yields an effective number of independent experts of 10.13 out of 15, producing an extremizing factor d = 0.675 following Satopaa et al. (2014).
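The report does not say which eigenvalue formula it uses. One common choice, assumed here purely for illustration, is the participation ratio n_eff = (Σλ)² / Σλ². For a symmetric correlation matrix, Σλ equals the trace and Σλ² equals the sum of squared entries, so the sketch below needs no explicit eigendecomposition. The 4×4 matrix is a toy example, not the report's 15×15 matrix. The last lines check that the reported d = 0.675 equals 10.13 / 15 to three decimals, which suggests (but does not prove) that d is defined as n_eff / N.

```python
def effective_experts(corr):
    """Participation ratio (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    For a symmetric matrix, the eigenvalue sum is the trace and the sum of
    squared eigenvalues is the sum of squared entries.
    """
    n = len(corr)
    trace = sum(corr[i][i] for i in range(n))
    sum_sq = sum(corr[i][j] ** 2 for i in range(n) for j in range(n))
    return trace ** 2 / sum_sq


# Toy matrix: experts 0 and 1 share a cluster (0.50); all other pairs sit at 0.10.
toy = [
    [1.00, 0.50, 0.10, 0.10],
    [0.50, 1.00, 0.10, 0.10],
    [0.10, 0.10, 1.00, 0.10],
    [0.10, 0.10, 0.10, 1.00],
]

n_eff = effective_experts(toy)
d_toy = n_eff / len(toy)

# Arithmetic check on the reported figures.
d_reported = 10.13 / 15
print(round(n_eff, 2), round(d_toy, 3), round(d_reported, 3))
```

Two sanity checks follow from the formula: an identity matrix (fully independent experts) returns n, and a matrix of all ones (perfectly redundant experts) returns 1.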

Ideological balance by design

The 15-expert panel is deliberately balanced: seven dovish / anti-interventionist analysts (Mearsheimer, Sachs, Kinzer, Bajoghli, Vaez, Ritter, Crooke), six hawkish / interventionist analysts (Dubowitz, Gerecht, Rubin, Takeyh, Ben Taleblu, Schanzer), and two independents (Molyneux, Friedman). The model assigns no ideological weights — accuracy alone determines influence. If hawks had better track records, they would naturally dominate the consensus.

Expert reliability scores

The posterior mean E[p] is the weight each expert carries in aggregation. The ordering below is by accuracy, not ideology; realists and Iran-specialists cluster near the top, Iraq-era advocates near the bottom.

| Expert | Affiliation | Camp | N | Beta posterior | E[p] |
| --- | --- | --- | --- | --- | --- |
| Mearsheimer | U. Chicago (Realist) | Dovish | 8 | Beta(7.75, 1.25) | 0.861 |
| Vaez | Intl Crisis Group (ICG) | Dovish | 6 | Beta(5.75, 1.25) | 0.821 |
| Kinzer | Boston Univ / Author | Dovish | 7 | Beta(6.25, 1.75) | 0.781 |
| Bajoghli | Johns Hopkins (Ethnographer) | Dovish | 5 | Beta(4.50, 1.50) | 0.750 |
| Ben Taleblu | FDD Iran Program Dir. | Hawkish | 7 | Beta(5.75, 2.25) | 0.719 |
| Sachs | Columbia / UN Advisor | Dovish | 8 | Beta(6.00, 3.00) | 0.667 |
| Takeyh | CFR Sr Fellow | Hawkish | 8 | Beta(6.00, 3.00) | 0.667 |
| Schanzer | FDD Exec Director | Hawkish | 7 | Beta(5.25, 2.75) | 0.656 |
| Dubowitz | FDD (CEO) | Hawkish | 8 | Beta(5.75, 3.25) | 0.639 |
| Crooke | Conflicts Forum / Ex-MI6 | Dovish | 8 | Beta(5.50, 3.50) | 0.611 |
| Friedman | Geopolitical Futures | Other | 9 | Beta(5.75, 4.25) | 0.575 |
| Rubin | AEI / Middle East Forum | Hawkish | 8 | Beta(4.50, 4.50) | 0.500 |
| Ritter | Ex-UNSCOM Inspector | Dovish | 9 | Beta(3.75, 6.25) | 0.375 |
| Gerecht | FDD / Ex-CIA | Hawkish | 8 | Beta(3.00, 6.00) | 0.333 |
| Molyneux | Independent / YouTube | Other | 7 | Beta(1.00, 7.00) | 0.125 |

Ideological balance check

The accuracy gap between camps is moderate, not extreme, and each camp carries one low outlier: Gerecht (0.333) pulls the hawkish average down and Ritter (0.375) pulls the dovish average down. The highest-accuracy hawk, Ben Taleblu at 0.719, is comparable to mid-tier doves. The data, not the ideology, drives the weighting.

| Group | Count | Avg accuracy | Key observation |
| --- | --- | --- | --- |
| Dovish / Anti-Interventionist | 7 | 0.695 | Strong on intervention blowback |
| Hawkish / Interventionist | 6 | 0.586 | Strong on Iran technical threats |
| Other / Independent | 2 | 0.350 | Molyneux pulls the average down |
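The camp averages can be reproduced from the posterior means in the reliability table. The sketch below assumes they are simple unweighted means of E[p] within each camp, which is what the figures imply; the function name is hypothetical.

```python
from collections import defaultdict

# (expert, camp, posterior mean E[p]) copied from the reliability table.
POSTERIOR_MEANS = [
    ("Mearsheimer", "Dovish", 0.861), ("Vaez", "Dovish", 0.821),
    ("Kinzer", "Dovish", 0.781), ("Bajoghli", "Dovish", 0.750),
    ("Ben Taleblu", "Hawkish", 0.719), ("Sachs", "Dovish", 0.667),
    ("Takeyh", "Hawkish", 0.667), ("Schanzer", "Hawkish", 0.656),
    ("Dubowitz", "Hawkish", 0.639), ("Crooke", "Dovish", 0.611),
    ("Friedman", "Other", 0.575), ("Rubin", "Hawkish", 0.500),
    ("Ritter", "Dovish", 0.375), ("Gerecht", "Hawkish", 0.333),
    ("Molyneux", "Other", 0.125),
]


def camp_averages(rows):
    """Unweighted mean of E[p] per camp, rounded to three decimals."""
    buckets = defaultdict(list)
    for _, camp, mean in rows:
        buckets[camp].append(mean)
    return {camp: round(sum(vals) / len(vals), 3) for camp, vals in buckets.items()}


averages = camp_averages(POSTERIOR_MEANS)
print(averages)  # {'Dovish': 0.695, 'Hawkish': 0.586, 'Other': 0.35}
```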

Selected prediction track record

The full ledger runs to 113 scored predictions. The table below highlights the calls that most strongly anchor each expert's posterior — the structural calls, the Iraq-era stress tests, and the Iran-specific technical forecasts that separate the panel's top and bottom tiers.

| Expert | Prediction | Year | Score | Verdict |
| --- | --- | --- | --- | --- |
| Mearsheimer | NATO expansion would provoke Russian aggression vs Ukraine | 2014 | 1.00 | TRUE |
| Mearsheimer | Iraq War would be a strategic disaster / quagmire | 2003 | 1.00 | TRUE |
| Mearsheimer | Libya intervention would create a failed state / chaos | 2011 | 1.00 | TRUE |
| Mearsheimer | Ukraine conflict would become a prolonged war of attrition | 2022 | 1.00 | TRUE |
| Vaez | US withdrawal from JCPOA would accelerate enrichment | 2018 | 1.00 | TRUE |
| Vaez | Maximum pressure sanctions fail to change regime behavior | 2019 | 1.00 | TRUE |
| Vaez | Iran would accelerate nuclear program after JCPOA collapse | 2019 | 1.00 | TRUE |
| Kinzer | 1953 coup blowback pattern repeats in future interventions | 2003 | 1.00 | TRUE |
| Kinzer | Libya intervention would create chaos | 2011 | 1.00 | TRUE |
| Bajoghli | Sanctions hurt Iranian civilians more than regime elites | 2022 | 1.00 | TRUE |
| Bajoghli | Mahsa Amini protests = deep structural legitimacy crisis | 2022 | 0.75 | MOSTLY TRUE |
| Ben Taleblu | Iran would transfer drones / missiles to Russia after embargo | 2020 | 1.00 | TRUE |
| Ben Taleblu | Houthi missile capabilities pose a genuine regional threat | 2018 | 1.00 | TRUE |
| Sachs | Iraq sanctions cause humanitarian catastrophe w/o regime change | 1990s | 1.00 | TRUE |
| Sachs | Iraq War would destabilize the Middle East broadly | 2003 | 1.00 | TRUE |
| Takeyh | Regime multi-layered; bombing alone won't topple it | 2026 | 0.75 | MOSTLY TRUE |
| Takeyh | Air power can destroy nuclear / missile programs but Iran rebuilds | 2026 | 0.75 | MOSTLY TRUE |
| Dubowitz | Biden admin lax enforcement enabled Iran oil revenue surge | 2021–24 | 1.00 | TRUE |
| Dubowitz | Maximum pressure would bring Iran to the negotiating table | 2018 | 0.25 | MOSTLY FALSE |
| Rubin | Iraq regime change would benefit regional stability | 2002–03 | 0.00 | FALSE |
| Rubin | Chalabi as reliable US partner in post-Saddam Iraq | 2003 | 0.00 | FALSE |
| Rubin | Turkey under Erdogan increasingly hostile to US interests | 2015 | 1.00 | TRUE |
| Ritter | Iraq had no significant WMD stockpiles (pre-2003) | 2002 | 1.00 | TRUE |
| Ritter | Decisive Russian military victory in Ukraine in 2023 | 2023 | 0.25 | MOSTLY FALSE |
| Gerecht | Iraq War won't destabilize the Mideast | 2002 | 0.00 | FALSE |
| Gerecht | US invasion of Iraq would provoke democratic revolution in Iran | 2002 | 0.00 | FALSE |
| Molyneux | Western civilization collapse imminent due to immigration | 2015–19 | 0.00 | FALSE |
| Molyneux | Brexit would trigger EU collapse within years | 2016 | 0.00 | FALSE |

Selected excerpts; the complete ledger of 113 predictions is in the source report. Categories span Geopolitical, Military, Economic, Nuclear, and Regime outcomes.

Final scenario forecasts

The consensus forecast below is the extremized, correlation-adjusted posterior across 100,000 Monte Carlo draws. HDIs are Highest Density Intervals from the posterior sample.

| Scenario | Mean | Median | 80% HDI | 95% HDI | Std Dev |
| --- | --- | --- | --- | --- | --- |
| Quick Win / Regime Change | 9.2% | 8.9% | [5.4%, 12.2%] | [4.3%, 14.8%] | 2.8% |
| Prolonged Quagmire | 52.3% | 52.4% | [45.1%, 59.7%] | [41.2%, 63.4%] | 5.7% |
| Stalemate / De-escalation | 21.9% | 21.7% | [15.7%, 27.2%] | [13.5%, 31.1%] | 4.5% |
| Catastrophic Spread | 16.6% | 16.2% | [11.2%, 21.2%] | [9.2%, 24.5%] | 4.0% |
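A Highest Density Interval can be computed from a posterior sample by sorting the draws and taking the narrowest window that holds the required share of them. The sketch below shows that standard sample-based approach; it is not necessarily the exact routine used in the report, and the Beta(7.75, 1.25) sample is only a stand-in for a real Monte Carlo posterior.

```python
import math
import random


def hdi(samples, cred_mass=0.80):
    """Narrowest interval containing at least cred_mass of the samples."""
    xs = sorted(samples)
    n = len(xs)
    window = max(1, math.ceil(cred_mass * n))   # number of draws inside the interval
    best_lo, best_hi = xs[0], xs[window - 1]
    for i in range(n - window + 1):
        lo, hi = xs[i], xs[i + window - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


rng = random.Random(42)
# Stand-in posterior sample; a real run would use the pooled scenario draws.
draws = [rng.betavariate(7.75, 1.25) for _ in range(20000)]

lo80, hi80 = hdi(draws, 0.80)
lo95, hi95 = hdi(draws, 0.95)
print(round(lo80, 3), round(hi80, 3))
print(round(lo95, 3), round(hi95, 3))
```

For a skewed posterior like this one, the HDI is shifted toward the mode compared with an equal-tailed interval, which is the practical reason to prefer it over a normal approximation.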

Model evolution: 4 → 10 → 15 experts

The panel has grown across three iterations. The 15-expert run re-balances the earlier, dove-heavy aggregations by explicitly adding hawkish voices — and the shifts, while real, are smaller than the rhetoric around them would suggest.

| Scenario | Original (4-Expert) | 10-Expert | 15-Expert | Δ (15 vs 10) |
| --- | --- | --- | --- | --- |
| Quick Win / Regime Change | 25.0% | 5.2% | 9.2% | +4.0 pp |
| Prolonged Quagmire | 53.0% | 56.9% | 52.3% | −4.6 pp |
| Stalemate / De-escalation | 14.0% | 18.8% | 21.9% | +3.1 pp |
| Catastrophic Spread | 8.0% | 19.1% | 16.6% | −2.5 pp |

Key structural findings

1. Quagmire remains dominant

At 52.3%, prolonged attrition is the strong modal outcome. Adding hawkish experts pulled it down only ~4.6 percentage points from the 10-expert model. Even the highest-accuracy hawks — Takeyh and Ben Taleblu — explicitly argue that air power alone cannot topple the Iranian regime.

2. Quick Win recovers modestly

From 5.2% in the 10-expert model to 9.2% here. Hawks like Dubowitz (30%) and Gerecht (25%) boost this, but their lower track-record accuracy limits their influence. The consensus probability is still under 10%.

3. Stalemate rises

From 18.8% to 21.9%. Several hawks — Rubin at 30%, Takeyh at 28% — assign meaningful probability to a negotiated off-ramp, reflecting policy sophistication even within interventionist frameworks.

4. Catastrophe moderates

Down from 19.1% to 16.6%. Hawks generally assign lower catastrophe probability (15–18%) than doves (20–30%), reflecting their assessment that Iranian conventional capabilities are degraded. The data partially validates this, but 16.6% is still the third-largest share in a four-outcome partition, ahead of the 9.2% assigned to a quick win.

5. The FDD cluster gets discounted

Four FDD experts — Dubowitz, Gerecht, Ben Taleblu, Schanzer — share institutional correlations of 0.45–0.60. The correlation adjustment prevents their shared analytical framework from being counted four times. Effective independent experts: 10.13 out of 15.

6. The accuracy gap is real but nuanced

Dovish experts average 0.695 accuracy versus hawkish 0.586. The gap is driven largely by the Iraq War era: hawkish analysts who predicted Iraq would stabilize or catalyze Iranian democracy were systematically wrong. On Iran-specific technical assessments — JCPOA flaws, missile threats, proxy networks — hawks have strong records.

Model parameters

| Parameter | Value |
| --- | --- |
| Monte Carlo draws | N = 100,000 |
| Prior | Jeffreys Beta(0.5, 0.5) |
| Total experts | 15 (7 dovish, 6 hawkish, 2 other) |
| Total scored predictions | 113 |
| Correlation matrix | 15×15 with FDD cluster (0.45–0.60), realist cluster (0.40–0.50) |
| Effective independent experts | 10.13 out of 15 (eigenvalue method) |
| Extremizing factor d | 0.675 (Satopaa et al. 2014) |
| Dirichlet κ range | 6–16 (confidence parameter per expert) |
| Aggregation methods | Linear Pool, Logarithmic Pool, Extremized (corr-adj) |
| HDI credibility levels | 80% and 95% |
| Scoring scale | TRUE = 1.0, MOSTLY TRUE = 0.75, PARTIAL = 0.5, MOSTLY FALSE = 0.25, FALSE = 0.0 |
| Random seed | 42 (for reproducibility) |

Limitations

What the model does not capture

  1. Subjective scoring. Different analysts could reasonably assign different verdict scores to the same prediction. The 5-point scale mitigates this versus binary scoring, but does not eliminate it.
  2. Inferred, not elicited. Scenario probability vectors are inferred from public statements; experts have not been surveyed to confirm the distributions attributed to them.
  3. Qualitative correlation matrix. The 15×15 matrix is constructed from assessment of shared sources and institutional affiliations, not estimated from data.
  4. Iraq-era shadow. Two hawkish experts, Gerecht and Rubin, have track records heavily shaped by 2002–03 calls. Whether their post-Iraq analytical evolution is captured in the static scoring is debatable.
  5. No temporal discount. All past predictions are equally weighted regardless of recency or difficulty. A more sophisticated model might discount older or easier calls.
  6. Living-conflict caveat. Ground truth for the four scenarios has not yet been determined. The consensus reflects expert-aggregated probabilities as of March 1, 2026, not certainty about outcomes.

Source: Bayesian Expert Aggregation — US–Israel vs Iran Conflict Scenario Forecasting, Unmitigated Wisdom, report dated March 1, 2026. Expanded 15-Expert Panel · 113 Scored Predictions · 100,000 Monte Carlo Draws. The full PDF is available via the Download link in the header.