An interactive exercise in what would change our mind about a global health funding recommendation, and whether we would notice in time.
Most funding recommendations look solid in writing. There is a positive expected value, a couple of supportive trials, a pathway from input to outcome that an intelligent reader can follow without too many uncomfortable questions. The model has been built carefully and the assumptions have been ticked off in a footnote.
Then reality intervenes. In one country office the effect holds. In another it collapses, often for reasons no one thought to put in the model. The monitoring system measures activity rather than impact, and by the time the underlying question (is the intervention still doing what we paid for?) receives an honest answer, the next funding cycle has already been locked in.
The question is not whether the model has any uncertainty. It is whether enough of that uncertainty sits in places that would actually change the funding call.
And would we recognise that evidence in time to change course, before the next funding cycle starts treating the current call as settled?
A research team tends to be judged less by the confidence of its first answer than by what it later admits to having got wrong.
Each parameter below is something a real research team has to estimate from imperfect evidence. The chart on the right shows the resulting distribution of plausible true returns. The dots are a hundred sample scenarios under your current assumptions. The optimizer's curse lives in how easily the headline number sits at the optimistic end of a much wider distribution.
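For readers who want to see the optimizer's curse in isolation, here is a minimal sketch. It is not the model behind the chart: the function names, the normal distributions and every default value are assumptions chosen for illustration. Several candidate programmes each get a noisy estimate of their true return, the one with the highest estimate is picked, and the gap between that estimate and its true value is averaged over many rounds.

```python
import random
import statistics


def simulate_selection(n_candidates=10, true_mean=2.0, true_sd=1.0,
                       noise_sd=1.5, rng=None):
    """Pick the candidate with the highest noisy estimate.

    All distributions and defaults are illustrative assumptions,
    not figures from any real cost-effectiveness model.
    Returns (estimate_of_winner, true_value_of_winner).
    """
    rng = rng or random.Random()
    true_values = [rng.gauss(true_mean, true_sd) for _ in range(n_candidates)]
    # Each estimate is the true value plus independent measurement noise.
    estimates = [t + rng.gauss(0.0, noise_sd) for t in true_values]
    best = max(range(n_candidates), key=lambda i: estimates[i])
    return estimates[best], true_values[best]


def average_overstatement(n_rounds=100, seed=0, **kwargs):
    """Mean of (headline estimate - true value) for the selected candidate."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_rounds):
        estimate, true_value = simulate_selection(rng=rng, **kwargs)
        gaps.append(estimate - true_value)
    return statistics.mean(gaps)


if __name__ == "__main__":
    print("noisy estimates:  ", round(average_overstatement(), 2))
    print("perfect estimates:", round(average_overstatement(noise_sd=0.0), 2))
```

With noisy estimates the average gap comes out positive, because selecting on the estimate also selects for lucky noise. With `noise_sd=0.0` the gap vanishes, which is the sense in which the curse lives in the uncertainty rather than in the act of choosing.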
I saw a version of this pattern repeatedly across 160+ UNDP country offices. The same procurement framework would produce a reliable signal in one operational setup and noise in another, less because of the framework itself than because of what was actually working in the field that quarter. Slide the context fidelity below to see how the realised effect changes. The original 4.8× sits on the left as a constant reminder of what the model promised.
Most monitoring systems are built to confirm that activities took place. Many fewer are built to detect whether the intervention is still working. The gap between those two questions is where most late-detected failures sit, quietly accruing, until someone finally looks for the right thing in the right place.
Each box below is a gap that any honest research team will recognise. The meter on the right gives a rough estimate of the confidence we would have that major underperformance would be detected in time to change course, rather than in time for a polite paragraph in the lookback.
You've moved sliders one at a time and tested one context at a time. The heatmap below shows the expected value for every combination of two parameters, with the others held at their defaults. Each cell is one possible reality. Green means the recommendation is robust, red means it breaks, and the amber band in the middle is where most real-world grantmaking actually lives. The ★ marks the cell where the headline 4.8× recommendation currently sits.
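The logic of a two-parameter sweep like this can be sketched in a few lines. What follows is a toy stand-in rather than the heatmap's actual model: the multiplicative formula, the parameter names `effect_retention` and `delivery_rate`, and the colour thresholds are all hypothetical. Only the 4.8× headline comes from this page.

```python
def realised_multiple(effect_retention, delivery_rate, headline=4.8):
    """Toy model (an assumption): the headline multiple shrinks in
    proportion to how much of the effect and of the delivery survives."""
    return headline * effect_retention * delivery_rate


def classify(value, robust_at=3.0, breaks_below=1.0):
    """Map a realised multiple to a heatmap colour.

    The thresholds are illustrative assumptions, not funding bars.
    """
    if value >= robust_at:
        return "green"
    if value < breaks_below:
        return "red"
    return "amber"


def sweep(steps=5):
    """Evaluate every combination of the two parameters on an even grid
    from 0 to 1, holding everything else fixed at its default."""
    levels = [i / (steps - 1) for i in range(steps)]
    return [
        [classify(realised_multiple(r, d)) for d in levels]
        for r in levels
    ]


if __name__ == "__main__":
    for row in sweep():
        print(" ".join(f"{cell:>5}" for cell in row))
```

In this toy version the headline sits in the corner where both parameters equal 1.0. The rest of the grid shows how quickly a cell leaves green once two assumptions slip together rather than one at a time.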
None of this is an argument against ambitious global health funding. It is an argument for building research and monitoring systems that are honest about how little they sometimes know, and how late that knowledge tends to arrive.
None of the numbers on this page reproduce a GiveWell estimate or grant figure. The concepts and language, though, are taken directly from GiveWell's published work on cost-effectiveness, mistakes and moral weights, and from the broader effective altruism research conversation around uncertainty and red-teaming.
The Fragility Lab started as a way to clarify my own thinking about how cost-effectiveness recommendations actually fail in the field, and how late a research team usually finds out about it. The tool is small and stylised. The intention behind it is not.
It sits in a portfolio of solo-built interactive evidence tools, alongside Travel Shockwaves (a global travel-disruption engine, my entry to the Capgemini UK Visualisation Guild Challenge 2026) and The Civilization Lab (193 nations across 1900–2050, 50+ indicators, currently in private development). All three are built end-to-end as single-file vanilla HTML, through a documented multi-model AI orchestration protocol in which I keep the editorial work to myself.
My background is twenty years across UNDP, UNICEF Supply Division, WHO Europe, LEGO and Capgemini, mostly working on the messy institutional data that senior decisions actually depend on. Ten of those years were at UNDP in cross-portfolio analytics across 160+ country offices, including health commodity procurement (bed nets, ARVs, TB and malaria treatments) with Global Fund PQR data alongside UNDP Atlas records.