Rejecting the Accusation of a Violated STAR*D Protocol

Author(s): John Rush, MD, Madhukar H. Trivedi, MD, Maurizio Fava, MD

The original STAR*D authors refute claims of a violated protocol.

At the heart of this matter is the eye-popping and contentious difference in the results of 2 teams analyzing ostensibly the same data set, with one team reporting an estimated cumulative remission rate of 67% across up to 4 levels of treatment, provided sequentially over about 1 year,¹ and the other team asserting that the actual cumulative remission rate in “per protocol” analyses was only 35%.² We understand the obvious dissonance raised by seeing such discrepant results reported by 2 teams analyzing the same set of data. However, this difference is not the result of one team being the accurate conveyor of the truth and the other being caught using statistical sleight of hand. We wish to briefly address the issues raised in the Psychiatric Times cover story³ before turning attention to the additional points raised by Pigott et al in their commentary in this issue.

The December 2023 story stated that the STAR*D study is often cited as having merit and as providing guidance to clinicians 24 years following its conceptualization. The story further questioned whether the study’s findings should be challenged. We should always question our views and positions as clinical researchers, as we owe it to our patients. It is worthwhile for both sides to discuss feasible explanations for the discrepancies, as doing so offers an entry point for adjusting how we think about making sequential treatment decisions for patients.

Given that the STAR*D design derives from a now-dated scheme, there is a clear need for modernizing our approach to patients with major depressive disorder. Although it is true that the STAR*D study was a huge undertaking, funded by the National Institute of Mental Health (NIMH) and conducted in collaboration with NIMH program officers, the study was designed 24 years ago and reflected the state of knowledge and research on depression therapeutics circa 2000. We believe that, looking back across the decades, the methods used to conduct this large-scale, multistage pragmatic research program were sound, although we appreciate that some of the options studied in the third and fourth steps of STAR*D are no longer widely used. Obviously, STAR*D does not address many treatment options that have emerged in the 21st century for individuals with difficult-to-treat depression. This issue was addressed in our 2022 editorial, published in JAMA Psychiatry, suggesting that many of the STAR*D findings are no longer relevant to the current pharmacopoeia.⁴

With respect to the more specific issues raised, it appears the piece accepted Pigott et al’s assertion that we “violated the STAR*D protocol” by not using the Hamilton Rating Scale for Depression (HRSD) to define remission in our 2006 paper.¹ To be clear, we chose the Quick Inventory of Depressive Symptomatology-Self Rated (QIDS-SR) because it permitted inclusion of a larger number of participants than the HRSD, not because we wished to “inflate” the benefits of treatment. As we have noted elsewhere,⁵ these analyses were not part of the prespecified data analysis plan that was used for the main STAR*D outcome papers, and we were free to select the most appropriate method to accurately assess the outcome of our sample. As reported by Pigott et al in their very conservative reanalysis of the STAR*D outcome data,² the use of QIDS-SR instead of the HRSD resulted in about an 18% relative increase in the estimated likelihood of the cumulative remission rate (in absolute terms, from 35% to 41%).
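The distinction between the absolute and relative change quoted above can be checked with a short calculation. This is a minimal sketch using only the rounded rates given in the text (35% and 41%); the function and variable names are illustrative. With these rounded inputs the relative increase comes out near 17%, consistent with the "about 18%" figure, which presumably reflects unrounded rates.

```python
def relative_increase(old_pct: float, new_pct: float) -> float:
    """Change from old_pct to new_pct, expressed as a percentage of old_pct."""
    return (new_pct - old_pct) / old_pct * 100.0


# Rounded cumulative remission rates as quoted in the text (assumed inputs).
hrsd_rate = 35.0   # HRSD-based estimate from the reanalysis
qids_rate = 41.0   # QIDS-SR-based estimate from the reanalysis

absolute_gain = qids_rate - hrsd_rate                    # in percentage points
relative_gain = relative_increase(hrsd_rate, qids_rate)  # in percent of the HRSD rate

print(f"absolute change: {absolute_gain:.0f} percentage points")
print(f"relative change: {relative_gain:.1f}%")
```

The absolute change is the figure that bears on how many additional patients are classified as remitted; the relative change depends on the size of the starting rate.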

The cover story also suggested that a patient-reported outcome (PRO), such as the QIDS-SR, is a less valid indicator of benefit than the HRSD.³ It was also suggested that the QIDS-SR was a parochial measure (ie, that it had only been studied by members of the STAR*D investigative team). We wish to simply state that PROs are now routinely used in large-scale, pragmatic clinical trials to represent the participants’ perceptions of benefit, independent of the clinician assessment bias. We also wish to note that research in the past 20 years since the original publications on the QIDS-SR has confirmed the reliability and validity of this PRO.⁶

A third, arguably most important, point is that the cover story appears to accept the contention by Pigott et al that we further “inflated” the cumulative estimate of remission by “violating” the STAR*D protocol to include the data of patients who had low HRSD scores at various baseline assessment points (some patients had baseline assessments for up to 4 steps of treatment) and by not using the “per protocol” method of handling missing data. This again reflects a misunderstanding of the rationale for estimating a cumulative remission rate for the STAR*D participants across up to 1 year of treatment. Such a cumulative assessment of outcome is only truly relevant for those who remained in the study despite not benefiting from 1 or more courses of treatment and participated in up to 4 treatment trials. Pigott et al assumed that patients who left the STAR*D study at any point before achieving a remission level of benefit would have remained unremitted for the study duration. This is the largest source of disagreement in our findings, and it needs to be acknowledged that, in our 2006 paper, STAR*D investigators did not violate the study’s protocol or “NIMH-approved” data analysis plan.¹

As pointed out in our letter to the American Journal of Psychiatry,⁵ the analytic approach taken by Pigott et al² has significant methodological flaws. Pigott et al selectively eliminated the data from 561 (15%) of the 3671 patients reported by Rush et al who enrolled in Level 1 of STAR*D, 297 (21%) of the 1439 patients who enrolled in Level 2, 80 (21%) of the 377 patients who enrolled in Level 3, and 3 (3%) of the 109 patients who enrolled in Level 4. In total, 941 patients included in our original analyses¹ were eliminated from Pigott et al’s reanalyses² based on their post hoc criteria. The rationale for removing these participants from the longitudinal analysis appears to reflect a studious misunderstanding of the aims of the Rush et al paper, with the resulting large difference in remission rates most likely the result of exclusion by Pigott et al² of hundreds of patients with low symptom scores at the time of study exit.
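The exclusion counts quoted above can be tallied directly. The sketch below uses only the per-level numbers given in the text; the variable names are illustrative, and percentages are rounded to whole numbers as in the text.

```python
# (level, patients enrolled, patients excluded in the reanalysis), as quoted in the text.
levels = [
    ("Level 1", 3671, 561),
    ("Level 2", 1439, 297),
    ("Level 3", 377, 80),
    ("Level 4", 109, 3),
]

# Sum of the per-level exclusions.
total_excluded = sum(excluded for _, _, excluded in levels)

# Share of each level's enrollees that was excluded, rounded to a whole percent.
shares = {
    name: round(excluded / enrolled * 100)
    for name, enrolled, excluded in levels
}

for name, enrolled, excluded in levels:
    print(f"{name}: {excluded} of {enrolled} excluded ({shares[name]}%)")
print(f"Total excluded: {total_excluded}")
```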

As discussed in that letter,⁵ the overall goal of STAR*D was to conduct a series of randomized comparisons of the effectiveness of several commonly used antidepressant medications and adjunctive strategies across 3 steps (Levels 2, 3, and 4) in a representative sample of outpatients with depression. To enter the sequential comparative effectiveness trials, patients first were treated for up to 3 months with the antidepressant citalopram (Level 1). Effectiveness trials by design aim to be more inclusive and representative of the real world than efficacy trials. By removing the data of more than 900 study participants from their reanalyses, Pigott et al failed to recognize the purpose of inclusiveness. It appears that the authors created rules to define post hoc which participants to include; this eliminated many individuals who experienced large improvements during one or another of the study’s levels. By doing so, the sample is biased to underestimate the actual remission rates.

Our original report of the Level 1 outcome in STAR*D⁷ was criticized by researchers for underestimating remission rates in Level 1, as Pigott et al mention: “STAR*D investigators state in their level 1 article, ‘our primary analyses classified patients with missing exit HRSD scores as nonremitters a priori.’ ”² This approach, which also was used in the primary analyses of each of the randomized treatment levels, is very conservative because some patients who drop out of studies are improved at the time of exit from the study. One of the limitations of STAR*D was the fact that the HRSD was only administered at baseline and at the end of each level (typically after 3 months of treatment). As a result, HRSD scores were typically not available when patients dropped out of the study. By contrast, the patient-reported QIDS-SR, a well-validated measure of depression severity,⁶ was administered at every patient visit and was therefore deemed by the authors to provide a more accurate reflection of the patients’ clinical status during the trial. It is for this reason that the QIDS-SR was used in the Rush et al¹ paper, because the QIDS-SR captured patients’ symptom status regardless of level/step and regardless of whether the HRSD was obtained at study exit.

A primary criticism leveled in the Pigott et al paper is that “the STAR*D investigators did not use the protocol-stipulated HRSD to report cumulative remission and response rates in their summary.”² As previously reported,⁵ what Pigott et al fail to appreciate is that the overall outcomes of patients across 1 year of treatment reported by Rush et al¹ were not an a priori–identified analysis in the protocol,⁸ but a secondary post hoc report, specifically requested by the editor-in-chief of the American Journal of Psychiatry at that time. Its goal was to summarize the clinical outcomes of this complicated multilevel trial as measured by the QIDS-SR, which captured the symptom status of each patient at the last visit regardless of level and regardless of whether the HRSD was obtained at study exit. The use of different methods and alternate measures in secondary analyses is a well-accepted scientific approach to explore the data and develop new hypotheses for future research. Moreover, as clearly stated in the Rush et al paper,¹ the estimated cumulative remission rate was based on the assumption that the patient remained in the study, completed it, and, if needed, participated in all 4 levels of treatment.
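To illustrate why the handling of dropouts matters so much for a cumulative estimate, the sketch below contrasts two ways of combining per-step remission rates. It is an illustration under stated assumptions, not the computation used in either paper: the per-step rates and retention fractions are hypothetical values chosen for the example, not STAR*D data, and the function names are invented here.

```python
from math import prod


def cumulative_if_all_continue(step_rates):
    """Chance of remitting at some step, assuming every patient who has
    not yet remitted goes on to the next step (no dropout)."""
    return 1.0 - prod(1.0 - r for r in step_rates)


def cumulative_dropouts_as_nonremitters(step_rates, retention):
    """Same sequence of steps, but only a fraction retention[i] of the
    non-remitters after step i+1 continues; those who leave are counted
    as never remitting."""
    remitted, at_risk = 0.0, 1.0
    for i, rate in enumerate(step_rates):
        remitted += at_risk * rate      # remitters at this step
        at_risk *= 1.0 - rate           # still unremitted after this step
        if i < len(retention):
            at_risk *= retention[i]     # only some of them continue
    return remitted


# Hypothetical inputs for illustration only (NOT STAR*D results).
step_rates = [0.30, 0.25, 0.15, 0.10]   # remission rate at each of 4 steps
retention = [0.60, 0.50, 0.50]          # fraction continuing after steps 1 to 3

print(f"all continue:        {cumulative_if_all_continue(step_rates):.3f}")
print(f"dropouts as failures: "
      f"{cumulative_dropouts_as_nonremitters(step_rates, retention):.3f}")
```

With identical per-step rates, the second estimate is lower simply because patients who leave are treated as nonremitters. When retention is set to 1.0 at every step, the two functions agree.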

The large discrepancy in remission rates reported in 2 papers working with the same set of patient data is surely provocative but indicates that one of the conclusions is not plausible. Pigott et al concluded that only 35% of depressed participants achieved remission with up to 4 consecutive antidepressant treatments in the course of approximately 12 months.² In Level 1 of STAR*D, remission rates were 28% based on the HRSD and 33% based on the QIDS-SR.⁵ Therefore, the finding of Pigott et al implies that only an additional 7% of the patients with depression (35% minus the 28% who remitted in Level 1) achieved remission in Levels 2, 3, or 4. Our primary papers reporting the outcomes of those levels disprove that.

We have noted⁵ that Jay Amsterdam, MD, senior author of the Pigott et al paper, coauthored a paper reporting the results of a study in which he played a key role and that utilized a sequential pharmacotherapy protocol informed by the STAR*D results.⁹ That study found a 60% cumulative remission rate across 12 months with antidepressant treatment alone, a result that is much closer to the 67% remission rate of the original STAR*D report¹ than the Pigott et al rate of 35%.² In their response to our published comment, Pigott et al stated: “Once again though, STAR*D PIs make yet another scientific error by their apples-to-oranges comparison and instead should have compared STAR*D’s outcomes to that of the precursor study to the STAR*D trial: the Texas Medication Algorithm Project [T-MAP] Depression Study.”

This comment demonstrates a lack of familiarity with the T-MAP study, as the T-MAP study engaged only public sector patients in Texas with mental health disorders, the vast majority of whom were disabled, semiemployed, or unemployed, and no primary care patients participated. The STAR*D trial was meant to be far more inclusive, had a clear focus on the recruitment of primary and specialty care patients, and included patients seen in public and private settings.

In addition, Pigott et al argue that the STAR*D investigators used data from 941 patients deemed ineligible for analysis in step 1 because they lacked a blindly administered HRSD score of 14 or higher at entry into the study. The purpose of the reanalysis was inclusiveness; excluding participants because of a single missing assessment, in the presence of patients’ self-rated reports of adequate depression severity, was therefore not deemed appropriate.

Concluding Thoughts

We believe that these responses have provided ample evidence to reject Pigott et al’s accusation that we “violated the STAR*D protocol.” Furthermore, Pigott et al imply that the exposure to antidepressant therapies in STAR*D may have increased the risk for suicide among its participants. As Pigott et al mention in their commentary, we published a post hoc analysis of emergent suicidality that reported a likely suicide in 1 of the 234 patients not included in the primary analyses of STAR*D.¹⁰ No other suicides occurred among the remaining 233 patients. Pigott et al argue that this was a significant omission. It is standard practice in clinical trials to exclude from the analyses patients who never come back for a visit after antidepressant treatment initiation and, for that reason, we excluded from our analyses the 234 patients who were prescribed citalopram and dropped out prior to their first postbaseline visit. In fact, any adverse event that happens to patients who are given a prescription but never return for a first visit could be attributed to not getting any treatment rather than to the treatment itself, as patients may not return for their first visit because they never started the medicine or changed their minds about participating in the trial.

Because standard antidepressants can be helpful to patients but are not a panacea and are not devoid of adverse effects, a balanced approach to patient care incorporating a discussion of the utility as well as risks and benefits of standard antidepressants is necessary, without either exaggerating or minimizing their efficacy or safety.

Dr Rush is a professor emeritus at Duke-NUS Medical School at the National University of Singapore, and an adjunct professor of psychiatry and behavioral sciences at Duke University School of Medicine in Durham, North Carolina. Dr Trivedi is professor of psychiatry, chief of the Division of Mood Disorders, and director of the Center for Depression Research and Clinical Care at the University of Texas Southwestern Medical Center in Dallas. Dr Fava is associate dean for clinical and translational research at Harvard Medical School in Boston, Massachusetts. He is psychiatrist-in-chief in the Department of Psychiatry at Massachusetts General Hospital and vice chair of the Executive Committee on Research at the Massachusetts General Hospital in Boston. He is also executive director of the Clinical Trials Network & Institute, and the Slater Family Professor of Psychiatry at Harvard Medical School. Dr Thase is a professor of psychiatry at Perelman School of Medicine, University of Pennsylvania, and the Corporal Michael J. Crescenz Veterans Affairs Medical Center, both in Philadelphia. Dr Wisniewski is professor and codirector of the Epidemiology Data Center, and the vice provost for budget and analytics at the University of Pittsburgh, Pennsylvania.

References

1. Rush AJ, Trivedi MH, Wisniewski SR, et al. Acute and longer-term outcomes in depressed outpatients requiring one or several treatment steps: a STAR*D report. Am J Psychiatry. 2006;163(11):1905-1917.

2. Pigott HE, Kim T, Xu C, et al. What are the treatment remission, response and extent of improvement rates after up to four trials of antidepressant therapies in real-world depressed patients? A reanalysis of the STAR*D study’s patient-level data with fidelity to the original research protocol. BMJ Open. 2023;13(7):e063095.

3. Miller JJ. STAR*D dethroned? Psychiatric Times. 2023;40(12).

4. Perlis RH, Fava M. Is it time to try sequenced treatment alternatives to relieve depression (STAR*D) again? JAMA Psychiatry. 2022;79(4):281-282.

5. Rush AJ, Trivedi MH, Fava M, et al. The STAR*D data remain strong: reply to Pigott et al. Am J Psychiatry. 2023;180(12):919-920.

6. Reilly TJ, MacGillivray SA, Reid IC, Cameron IM. Psychometric properties of the 16-item Quick Inventory of Depressive Symptomatology: a systematic review and meta-analysis. J Psychiatr Res. 2015;60:132-140.

7. Trivedi MH, Rush AJ, Wisniewski SR, et al; STAR*D Study Team. Evaluation of outcomes with citalopram for depression using measurement-based care in STAR*D: implications for clinical practice. Am J Psychiatry. 2006;163(1):28-40.

8. Rush AJ, Fava M, Wisniewski SR, et al; STAR*D Investigators Group. Sequenced treatment alternatives to relieve depression (STAR*D): rationale and design. Control Clin Trials. 2004;25(1):119-142.

9. Hollon SD, DeRubeis RJ, Fawcett J, et al. Effect of cognitive therapy with antidepressant medications vs antidepressants alone on the rate of recovery in major depressive disorder: a randomized clinical trial. JAMA Psychiatry. 2014;71(10):1157-1164.

10. Zisook S, Trivedi MH, Warden D, et al. Clinical correlates of the worsening or emergence of suicidal ideation during SSRI treatment of depression: an examination of citalopram in the STAR*D study. J Affect Disord. 2009;117(1-2):63-73.