A Response to "How Reliable Is Reliable Enough?"

"DSM-5: How Reliable Is Reliable Enough?" is the title of a disturbing commentary written by the leaders of the DSM-5 Task Force and published in this month’s American Journal of Psychiatry.¹ Its contents suggest that we must lower our expectations and be satisfied with levels of unreliability in DSM-5 that historically have been clearly unacceptable. Two approaches are possible when the DSM-5 field trials reveal low reliability for a given suggestion: (1) admit that the suggestion was a bad idea or that it is written so ambiguously as to be unusable in clinical practice, research, and forensics, or (2) declare by arbitrary fiat that the low reliability is now to be relabeled “acceptable.”

In the past, “acceptable” meant kappas of 0.6 or above. When the personality disorders in DSM-III came in at 0.54, they were roundly derided and given only a reluctant bye. For DSM-5, “acceptable” reliability has been reduced to a startling 0.2-0.4. This barely exceeds the level of agreement you might expect to get by pure chance.
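For readers who want to see what a kappa in this range looks like, here is a minimal sketch that computes Cohen's kappa for two raters. The ratings are hypothetical, invented purely for illustration, and Cohen's two-rater kappa is an assumption made for simplicity; the field trials may well have used a different variant of the statistic. Kappa is the observed agreement corrected for the agreement expected by chance, so a value of 0 means chance-level agreement and 1 means perfect agreement.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)

    # Observed agreement: share of cases where the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: probability of a match if each rater assigned labels
    # independently, at his or her own base rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical data: 1 = diagnosis given, 0 = diagnosis not given, 10 patients.
clinician_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
clinician_2 = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]

# The raters agree on 6 of 10 patients; chance alone would give 5 of 10.
print(round(cohens_kappa(clinician_1, clinician_2), 2))  # prints 0.2
```

In this made-up example the two clinicians agree on 60% of patients, yet because each gives the diagnosis half the time, 50% agreement would be expected by chance, and kappa comes out at about 0.2.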

Earlier in its development, DSM-5 set great store by its field trials. This quote is from the Chair of the DSM-5 Task Force: “There’s a myth that all the decisions have been made, when in fact, all the decisions haven’t been made. Just because things have been proposed doesn’t necessarily mean they’ll end up in the DSM-5. If they don’t achieve a level of reliability, clinician acceptability, and utility, it’s unlikely they’ll go forward.”

And this quote is from a 2010 interview given to a science writer by the head of the DSM-5 Oversight Committee: “It’s going to be based on the work of the field trials, based on the assessment and analysis of them. I don’t think anyone is going to say we’ve got to go forward if we get crappy results.”

The DSM-5 tune has now changed dramatically. The commentary written for AJP by the leadership of the DSM-5 Task Force appears to suggest that they will, in fact, “go forward,” and with subpar reliabilities of 0.2-0.4. Now consider that the original field trial plan included a second phase to permit fixing those diagnostic criteria that were found to have unacceptable reliability in the first phase. These would go back to the work groups, who could then rewrite the offending criteria and retest the new version in the second phase of the field trial.

But poor planning and administrative foul-ups kept pushing back the field trials so that they are now at least 18 months late in completion. As time was running out, DSM-5 leadership quietly dropped the second phase of the field trials, removing any reference to it from the timeline posted on the DSM-5 Web site. Their Plan B substitute for adequate field testing appears in AJP. To wit: a drastic lowering of the bar for what is “acceptable” reliability.

Can “accepting” unacceptably poor agreement uphold the integrity of psychiatric diagnosis? Poor reliability degrades our ability to communicate with one another clinically and prohibits meaningful research. To “accept” kappas of 0.2-0.4 as reliable is to go backward more than 30 years to the days of DSM-II. Before DSM-III, Dr Robert Spitzer and Dr Mel Sabshin saw the need to develop a criterion-based system that could achieve reasonable diagnostic agreement. Such agreement is the very minimum condition necessary for current clinical work and future progress in psychiatry.

Reference

1. Kraemer HC, Kupfer DJ, Clarke DE, et al. DSM-5: How Reliable Is Reliable Enough? Am J Psychiatry. 2012;169:13-15.