Published on May 21, 2025 9:08 PM GMT
I appear on Patrick McKenzie's Complex Systems podcast, presenting my overall view of what's wrong in the world of drug development.
Here's a section I particularly enjoyed:
Patrick McKenzie: So, my fun anecdote about recruiting for a clinical trial. I happened to graduate from Washington University in St. Louis, and by coincidence, during my stint with the U.S. vaccine location information architecture project, WashU was running a trial called Stop COVID. They were studying the efficacy of fluvoxamine, an antidepressant that might have off-label potential as an acute COVID treatment. The idea was to offer patients a range of effective medications to reduce severe clinical outcomes from their exposure to COVID.
I got an outsider’s look at their efforts.
One of the unique circumstances for the WashU team was that, since fluvoxamine was already FDA-approved, they didn’t need the full regulatory gauntlet but just needed data showing improved outcomes if patients took this readily available drug. They still had to run a study, recruit thousands—or ultimately, a few hundred—patients, and do so within a very short window. You need to recruit each patient between contracting COVID and (days later!) either recovering totally or experiencing a severe clinical outcome.
So, they went with an Internet marketing campaign to reach people who’d recently gotten COVID. Naturally, they encountered some issues because, as it turns out, med school doesn’t exactly cover Internet marketing.
One skilled individual on the team who happened to know HTML wrote up a 200-question screener to assess eligibility for the trial. They sent potential patients to this screener, and, as many who work online could predict, many people dropped off before completing all 200 questions. When I connected with the team, I assumed there must be a solid regulatory or procedural constraint behind the length. I asked, “Is this an institutional review board requirement? Medical ethics? Something specific to drug development?” The answer? “We have 200 questions because that's what the grad student implemented.”
So, I asked, “How many of these questions do you actually need?” They replied, “Four.” At that point, I offered the help of someone skilled in conversion rate optimization to design a simple landing page with just those four essential questions to screen prospective patients and capture contact info for trial enrollment. Surprisingly, that approach worked. So, there are definitely opportunities for upskilling across the clinical trial supply chain. [Patrick notes: A full discussion of the clinical trial is beyond my ken, but I’m as-far-as-I-know accurately summarizing VaccinateCA’s small engagement with it, which was a) staffing my good friend Keith Perhac on their patient intake form and b) attempting to assist with patient recruitment via e.g. Facebook ads which we funded.]
And two, this was a clear example of how asking, “This thing that’s broken—why is it broken? Can it be improved?” often yields value. It reinforced my belief that, contrary to assumptions, just because many PhDs were involved in a project doesn’t mean that all possible alpha has been optimized out.
[Patrick notes: Eliezer Yudkowsky has a book-length treatment, Inadequate Equilibria, exploring why our intuition that “systems generally run about as efficiently as they can be made to run” functions well for certain systems (e.g. public equities markets, which are practically the base case for the Efficient Markets Hypothesis) and grossly mispredicts other systems. Medicine and drug development are both examined as miniature case studies within it.]
Ross Rheingans-Yoo: That’s right. One of the most surprising adjustments I had to make moving from public equity markets to clinical trials was realizing just how much untapped potential exists. There’s a surprising amount of low-hanging fruit, and there are practices that, if reconsidered, could lead to better medical and commercial outcomes. Often these simply haven’t been considered.
The fact that a 200-question survey could be reduced to four if someone simply asked, “How many questions do we really need?” doesn’t surprise me at all.
The Stop COVID—or StopCOVID-2—trial from WashU is interesting because the investigators were collaborating on another trial I know well.
[Patrick notes: Does the world ever seem preposterously small to anyone else?]
Ross Rheingans-Yoo: For some specifics, I believe the StopCOVID trial at WashU was recruiting for several months, averaging about 100 patients a month. [Patrick notes: I obviously wanted that to be higher, but 2021 was a bit busy for VaccinateCA and myself, and we were limited in how much we could throw at their project.] They stopped at around 500 participants, realizing that, at the recruitment rate and with the criteria for measuring outcomes, they wouldn’t achieve statistically significant results anytime soon. So, they closed it out once they understood it wasn’t on track to deliver meaningful data.
Patrick McKenzie: It’s counterintuitive, but there are plenty of ways to lose while winning in drug development.
You can end a trial early because it becomes unethical to continue—like if preliminary data suggests the drug’s effectiveness is so strong you should start offering it immediately to everyone in the study. "Immediately" might be a bit of a stretch administratively, but it applies to the patients currently enrolled.
Or, you might stop a trial for the opposite reason—if it becomes clear the treatment isn’t effective.
In the case of StopCOVID, the trial was partially overtaken by events: the rollout of COVID vaccines started reducing severe cases, making it harder to recruit patients for countermeasure studies.
[Patrick notes: The U.S. then had a rough experience with Omicron, for people wondering why the graph didn’t stay improved.]
Ross Rheingans-Yoo: Sure, at least in that period and in that locale.
The interesting aspect of the StopCOVID-2 trial, which ran in early 2021, was that it was happening around the same time as the TOGETHER trial in Brazil. The TOGETHER trial studied multiple drugs, including fluvoxamine, and their recruitment overlapped with StopCOVID-2’s fluvoxamine recruitment.
They were recruiting about three times as many patients in roughly the same timeframe—1,500 patients instead of 500. Crucially, around 15% of the TOGETHER trial patients experienced clinically significant events, whereas only about 5% of StopCOVID-2’s patients did. Those numbers matter because, statistically, the study’s aim is to set a threshold for worsening health outcomes and determine if we can prevent patients from reaching that threshold. If I give the drug to someone who doesn’t get worse—like the 85% of people who naturally improve—they don’t contribute meaningfully to the study’s statistical analysis. What we want to observe is whether the percentage needing additional care decreases, say from 15% to 11%, or from 5% to 4.2%. To detect shifts within smaller percentages, though, we need a larger sample size. So, when I say TOGETHER recruited three times as fast, it’s also about the trial progressing towards statistically meaningful results faster.
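As a rough back-of-envelope illustration of that point (my own sketch with assumed alpha and power, not anything from either trial's actual statistical analysis plan), a standard two-proportion sample-size approximation shows how much larger a study needs to be when the baseline event rate is around 5% instead of around 15%:

```python
from scipy.stats import norm

def n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect a drop from p_control to
    p_treated with a two-sided test at the given alpha and power."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)  # ~1.96 + ~0.84
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return z ** 2 * variance / (p_control - p_treated) ** 2

# TOGETHER-like setting: ~15% of patients hit the endpoint; hope to see ~11%.
print(round(n_per_arm(0.15, 0.11)))    # roughly 1,100 per arm

# StopCOVID-2-like setting: ~5% hit the endpoint; hope to see ~4.2%.
print(round(n_per_arm(0.05, 0.042)))   # roughly 10,700 per arm
```

Only the orders of magnitude matter here: with the rarer endpoint, you need something like ten times as many patients to see the same kind of shift, which is why a lower event rate translates directly into a slower march toward statistically meaningful results.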
Patrick McKenzie: This is the classic “statistics 102” topic on statistical power that often gets overlooked—even by people in marketing. A common mistake in A/B testing, for example, is looking for statistically significant results without enough sample size to detect a real effect at the conversion rates one actually expects.
[Patrick notes: As an illustration of this, a software company which currently has a trial-to-paid conversion rate of 3% and wants to run A/B tests which might generate 10% lifts (i.e. 3.3%) needs to burn ~60k trials per iteration to achieve 95% confidence that it will detect those lifts. This is difficult for companies which don’t get 60k trials in a 6 month period.
Awareness of statistical power in marketing departments is below awareness of statistical significance, and marketing departments understand statistical significance better than poorly educated sectors of society like e.g. doctors, who mostly are not given the lecture on why Bayes’ Rule matters for ordering tests.
(A test with a 1% false positive rate and a 1% false negative rate, applied to a population with a 1% true incidence of Disease X, flags a patient as having X. What is the chance that the flag is accurate? Most physicians answer ~99% when asked, not ~50%.)]
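For anyone who wants to check the arithmetic behind that quiz, here is the Bayes' Rule calculation spelled out, using just the numbers in Patrick's parenthetical:

```python
# Bayes' Rule on the quiz numbers: 1% prevalence, 1% false positive rate,
# 1% false negative rate (i.e. 99% sensitivity).
prevalence = 0.01
sensitivity = 0.99          # P(test positive | has Disease X)
false_positive_rate = 0.01  # P(test positive | does not have Disease X)

true_positives = sensitivity * prevalence
false_positives = false_positive_rate * (1 - prevalence)

# P(has Disease X | test positive), i.e. the positive predictive value
ppv = true_positives / (true_positives + false_positives)
print(f"{ppv:.0%}")  # 50% -- not the ~99% most physicians guess
```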
Ross Rheingans-Yoo: Exactly. TOGETHER’s recruitment rate for fluvoxamine patients didn’t just outpace StopCOVID-2 by sheer numbers; statistically, they were achieving useful data far faster on a per-patient basis. In practical terms, they were progressing toward their goal about nine times faster than StopCOVID-2. This difference highlights why a trial in the U.S.—with the vaccine rollout impacting COVID cases—proceeded very differently from one in Brazil, where TOGETHER was unaffected by widespread vaccination and operating under different conditions.
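A back-of-the-envelope consistency check on that “about nine times faster” figure, under my own simplifying assumptions (both trials chasing the same relative effect, the recruitment rates mentioned in the conversation, and an assumed 80% power target):

```python
from scipy.stats import norm

def n_per_arm(p_control, relative_reduction, alpha=0.05, power=0.80):
    """Patients per arm to detect a given relative reduction from p_control."""
    p_treated = p_control * (1 - relative_reduction)
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return z ** 2 * variance / (p_control - p_treated) ** 2

# Hypothetically give both trials the same ~30% relative-reduction target.
n_together = n_per_arm(0.15, 0.30)    # higher event rate: fewer patients needed
n_stopcovid = n_per_arm(0.05, 0.30)   # lower event rate: ~3x more patients needed

# Recruitment rates as described in the conversation: ~300/month vs ~100/month.
months_together = 2 * n_together / 300
months_stopcovid = 2 * n_stopcovid / 100

print(round(months_stopcovid / months_together, 1))  # ~10x longer to the same answer
```

Three times the recruitment rate, multiplied by needing only about a third as many patients, lands right around the order of magnitude Ross describes; the exact factor moves around with the assumed effect sizes and power.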
Ultimately, TOGETHER’s fluvoxamine study showed a roughly 30% reduction in patients needing hospitalization or intensive care, which was a significant finding. That meant a reduction from about 16% of patients needing further care down to 11%.
Frankly, if I could push a button to reduce my hospitalization odds by 30%, I’d push it immediately. By comparison, StopCOVID-2, with around 500 patients prior to cutting their fluvoxamine arm, saw a reduction in severe outcomes from about 5.5% to 4.5%. While it was still a positive result, it leaned closer to a 10% reduction than a 30% one, possibly due to the baseline context of medical care and vaccination rates in the U.S. versus Brazil.... (continues)