Clinical use of the Hamilton Depression Rating Scale: is increased efficiency possible? A post hoc comparison of Hamilton Depression Rating Scale, Maier and Bech subscales, Clinical Global Impression, and Symptom Checklist-90 scores☆
Introduction
Major depressive disorder is a severe, disabling illness, expected to rank as the world's second health problem in 2020 [1]. Depression is associated with high costs, both the direct costs of treatment and the indirect costs of lost productivity and quality of life [2]. Several clinical guidelines have been developed to guide the treatment of this disorder; psychotherapy, pharmacotherapy, and their combination all appear effective [3], [4], [5], [6], [7], [8], [9], [10].
The use of self-report or clinician-rated symptom scales is recommended to assess severity and response to treatment [8], [11], [12]. Some experts claim that clinician-rated symptom scales have greater validity and reliability than self-report scales, especially in patients with cognitive impairment and in those with more severe or psychotic depression [11], [13], [14]. Specific symptom scales are more reliable than global rating scales [11], [13], [15]. In particular, rating scales can be used to objectively determine specific cutoff points for response and remission [12], [16], [17].
In most clinical trials, the Hamilton Depression Rating Scale (HDRS) [18], [19], a clinician-rated symptom scale, is used as the standard to determine severity and response [5], [8], [11], [15], [20], [21], [22], [23]. Many versions of the HDRS exist, with the number of items usually varying between 17 and 24 [11], [18], [19], [22]; however, up to 36 items have been described [23]. Longer versions were developed especially to cover reverse neurovegetative (atypical) symptoms [23]. The Clinical Global Impression (CGI) [24], a clinician-rated global scale, is also frequently used [5], [8], [15], [25]. In clinical practice, rating scales are not used routinely, although they are recommended. Explanations for this discrepancy could be unfamiliarity with existing scales, a strong belief in one's own clinical judgement, an unsystematic approach to depression, the amount of time needed for rating scales (eg, 15-20 minutes for the HDRS [11]), and the necessity of training [20], [26].
The HDRS is criticized for being sensitive to somatic symptoms (eg, somatic illness or side effects of drugs) [11], [15], [27], [28], for not rating all 9 Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition domains, for its unequal weighting of different symptoms, and for the multidimensionality of the HDRS total score [13], [21], [29], [30], [31]. Multidimensionality is important to cover the maximum range of clinical features of major depressive disorder, but a multidimensional score does not necessarily measure depression severity. Multidimensional scales can be misleading when measurement of severity and treatment response is concerned [13], [21], [28], especially when the measured depressive symptoms do not change proportionally with depression severity. Finally, some reports emphasize that the HDRS systematically favors (sedative) tricyclic antidepressants (TCAs) over selective serotonin reuptake inhibitors (SSRIs) [27], [32], [33], [34], [35]. Sleep and somatic items may appear to be “improved” by side effects of TCAs but worsened by side effects of SSRIs (eg, insomnia, gastrointestinal complaints, and agitation).
To overcome the problems of the multidimensional HDRS mentioned above, a more unidimensional subscale of the HDRS that covers the core symptoms of severity is desirable. Also, from a clinical point of view, a scale with fewer items takes busy clinicians less time to apply. However, for the purpose of reference, subscale scores must remain anchored to the original HDRS. To identify shorter unidimensional subscales, Maier and Philipp [28] used Rasch and Mokken analyses, and Gibbons et al [29] used factor analysis. Bech et al [36] developed another 6-item subscale. This scale initially emerged from an analysis that used the judgement of experienced psychiatrists as a validity criterion [36] and was validated psychometrically thereafter using Rasch analyses [37], [38]. This Bech subscale was combined with 4 items of the Cronholm-Ottosson Depression Scale to form the Bech-Rafaelsen Melancholia Scale [39]. Santor and Coyne [21] examined the score performances of individual HDRS items as a function of depression severity with a nonparametric Item Response Theory (IRT) approach, retaining 14 items. These 14 items included all 6 items of the Maier subscale and all 8 items of the Gibbons subscale. However, 1 item from the Bech subscale (13, somatic symptoms) was not included.
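The idea that subscale scores stay anchored to the original HDRS can be illustrated with a minimal sketch: a subscale score is simply the sum of a subset of the 17 item ratings, so it can always be derived from a completed HDRS. The item numbers used below (Maier: 1, 2, 7, 8, 9, 10; Bech: 1, 2, 7, 8, 10, 13) are the commonly cited compositions and are stated here as an assumption; only item 13 of the Bech subscale is named in the text above. The function name and the example ratings are hypothetical.

```python
# Item numbers refer to the 17-item HDRS.
# ASSUMPTION: these compositions are the commonly cited ones, not taken
# from the text of this paper (which only names Bech item 13).
MAIER_ITEMS = (1, 2, 7, 8, 9, 10)
BECH_ITEMS = (1, 2, 7, 8, 10, 13)


def subscale_score(ratings, items):
    """Sum the ratings of the given HDRS item numbers.

    ratings: dict mapping item number (1..17) to the clinician's rating.
    items:   iterable of item numbers that make up the subscale.
    """
    missing = [i for i in items if i not in ratings]
    if missing:
        raise ValueError(f"missing ratings for items: {missing}")
    return sum(ratings[i] for i in items)


# Hypothetical ratings for one patient (illustration only, not trial data).
example = {
    1: 3, 2: 2, 3: 1, 4: 1, 5: 1, 6: 0, 7: 3, 8: 1, 9: 1,
    10: 2, 11: 2, 12: 1, 13: 2, 14: 1, 15: 0, 16: 1, 17: 0,
}

hdrs_total = sum(example.values())
maier = subscale_score(example, MAIER_ITEMS)
bech = subscale_score(example, BECH_ITEMS)
print(hdrs_total, maier, bech)
```

Because both subscales are computed from the same item ratings, a clinician who rates only the 6 subscale items obtains a score that is directly comparable with the corresponding subscore of a full HDRS.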
In a meta-analysis of individual patient data, Faries et al [40] evaluated the responsiveness of total HDRS and subscale scores in TCA and SSRI pharmacotherapy trials, finding a maximal sensitivity for the Maier subscale. In a similar reanalysis, Entsuah et al [41] found larger effect sizes (E-S) for the Bech, Maier, and Gibbons subscales compared with the HDRS in trials comparing SSRIs or venlafaxine. O'Sullivan et al [20] found comparable sensitivity to detect changes for the 6-item Bech subscale compared with the 17-item HDRS. Hooper and Bakish [26] found equal sensitivity to change during treatment for the 6-item Bech subscale compared with the HDRS 17-item version. Moller [32] and Bech et al [42], [43], [44] used the Bech subscale to reexamine treatment efficacy of SSRIs and mirtazapine (vs TCAs or placebo). The latter publications did not provide data for the Maier subscale.
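As a rough illustration of what an effect size (E-S) for change expresses, the sketch below computes a standardized mean change: the mean decrease in score divided by the standard deviation of the baseline scores. This particular definition is an assumption chosen for illustration; the cited reanalyses may have used other formulas (eg, between-group effect sizes). The function name and the scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_change(baseline, endpoint):
    """Return (mean baseline - mean endpoint) / SD of the baseline scores.

    ASSUMPTION: one common within-group definition of effect size; not
    necessarily the definition used in the studies cited above.
    """
    if len(baseline) != len(endpoint):
        raise ValueError("baseline and endpoint must have the same length")
    sd = stdev(baseline)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("baseline scores have zero variance")
    return (mean(baseline) - mean(endpoint)) / sd


# Hypothetical scores for five patients (illustration only).
baseline_scores = [20, 22, 24, 26, 28]
endpoint_scores = [10, 12, 14, 16, 18]

es = standardized_change(baseline_scores, endpoint_scores)
print(round(es, 2))
```

Expressing change in standard deviation units is what allows scales with different numbers of items, and therefore different score ranges, to be compared for their sensitivity to change.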
In this paper, we describe a secondary analysis of our trial data to answer the following questions:
(1) Are the Maier, Bech, and HDRS scales comparable in the measurement of depression severity and in the sensitivity to measure changes in severity?
(2) Is this comparability stable across the full range of response to treatment (eg, nonresponse, partial response, and full response), across different treatments, and across different baseline severities of depression?
(3) What are the clinical cutoff points for the subscales to determine remission, compared with conventional definitions [12], [16], [17]?
We hypothesized that the differences between the Maier, Bech, and HDRS scales would be small and that there would be no apparent effect modification across either treatments or baseline severity. In contrast, we hypothesized that the E-S would be smaller for nonresponders and partial responders than for responders. Such a finding would additionally support the hypothesis of sensitivity to change.
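The response categories referred to above can be made concrete with a minimal sketch that classifies a patient by the percentage reduction in score from baseline. The thresholds used (at least 50% reduction for response, 25% to less than 50% for partial response) are a widely used convention and are an assumption here; the definitions actually applied in this analysis are those of the cited references [12], [16], [17]. The function names are hypothetical.

```python
def percent_reduction(baseline, endpoint):
    """Percentage decrease in score from baseline to endpoint."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - endpoint) / baseline


def classify_response(baseline, endpoint,
                      partial_threshold=25.0, response_threshold=50.0):
    """Classify treatment response from the percentage reduction.

    ASSUMPTION: the default thresholds follow a common convention and are
    not taken from this paper.
    """
    reduction = percent_reduction(baseline, endpoint)
    if reduction >= response_threshold:
        return "response"
    if reduction >= partial_threshold:
        return "partial response"
    return "nonresponse"


# Hypothetical baseline and endpoint scores (illustration only).
print(classify_response(24, 10))
print(classify_response(24, 16))
print(classify_response(24, 22))
```

Because the classification uses a relative reduction, the same function can be applied to the HDRS total score and to either subscale score without rescaling.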
Patient selection
In the present analyses, we use data from 2 published randomized controlled trials conducted between 1993 and 1998 [45], [46]. The first trial examined the efficacy and effectiveness of pharmacotherapy versus the combination of pharmacotherapy with 16 sessions of Short Psychodynamic Supportive Psychotherapy (SPSP) [47], [48], [49], [50], [45]. The second trial investigated the efficacy and effectiveness of a combination of pharmacotherapy with 8 versus 16 sessions of SPSP [46]. Pharmacotherapy in both
Patient characteristics
Table 1 shows demographics for the diagnostic and per protocol samples. No significant differences were observed between the diagnostic and per protocol samples (tested as excluded vs included), except for a lower mean HDRS score (and lower Maier, Bech, and SCL-90 depression scores) in the diagnostic sample. This difference was caused by the application of the entrance criterion (HDRS ≥14) for randomization. No significant differences existed between the different treatment groups. The studied
Major findings
This study examined the relative effectiveness of the HDRS subscales as developed by Maier and Philipp [28] and Bech et al [37] in monitoring severity and treatment effects in depression. We found that the Maier and Bech subscales gave results comparable to the original 17-item HDRS, with high concurrent validity and increased mean inter-item correlations and internal consistency. Maier and Bech subscales were highly comparable to each other in the measurement of treatment changes. Differences
Conclusion
We think that both the Maier and Bech subscales of the HDRS are equivalent to the HDRS and can easily be used to measure treatment response more efficiently in clinical practice. On theoretical grounds, we have a slight preference for the Maier subscale. The use of subscales would improve the efficiency and objectivity of measuring response in clinical practice, where often no scale at all (rather than a CGI) is used. This would further bridge the gap between clinical practice and
Acknowledgment
The original randomized controlled trials were supported by an unrestricted educational grant from Eli Lilly Netherlands. All studies were carried out by the Mentrum Depression Research Group. The authors thank all psychotherapists, psychiatrists, and residents for their excellent work.
References (65)
- et al. A comparison of alternative assessments of depressive symptom severity: a pilot study. Psychiatry Res (2000)
- et al. Exactly what does the Hamilton Depression Rating Scale measure? J Psychiatr Res (1993)
- et al. The responsiveness of the Hamilton Depression Rating Scale. J Psychiatr Res (2000)
- et al. A critical examination of the sensitivity of unidimensional subscales derived from the Hamilton Depression Rating Scale to antidepressant drug effects. J Psychiatr Res (2002)
- et al. Combining psychotherapy and antidepressants in the treatment of depression. J Affect Disord (2001)
- et al. Definition and epidemiology of treatment-resistant depression. Psychiatr Clin North Am (1996)
- et al. The 16-Item Quick Inventory of Depressive Symptomatology (QIDS), clinician rating (QIDS-C), and self-report (QIDS-SR): a psychometric evaluation in patients with chronic major depression. Biol Psychiatry (2003)
- et al. Relationships among measures of treatment outcome in depressed patients. J Affect Disord (2003)
- et al. Evidence-based health policy-lessons from the Global Burden of Disease Study. Science (1996)
- et al. Depression: a neglected major illness. J Clin Psychiatry (1993)
- Practice guideline for the treatment of patients with major depressive disorder (revision). American Psychiatric Association. Am J Psychiatry
- Evidence-based guidelines for treating depressive disorders with antidepressants: a revision of the 1993 British Association for Psychopharmacology guidelines. J Psychopharmacol
- Evidence report on: treatment of depression—newer pharmacotherapies. Psychopharmacol Bull
- Consensusbijeenkomst depressie bij volwassenen
- Depression in primary care: volume 1. Detection and diagnosis. Clinical practice guideline, number 5. AHCPR publication no. 93-0550
- Depression in primary care: volume 2. Treatment of major depression. Clinical practice guideline, number 5. AHCPR publication no. 93-0551
- Consensus guidelines in the treatment of major depressive disorder. J Clin Psychiatry
- Clinical results for patients with major depressive disorder in the Texas Medication Algorithm Project. Arch Gen Psychiatry
- Mood disorders measures
- Clinical guidelines for establishing remission in patients with depression and anxiety. J Clin Psychiatry
- Depression rating scales. A critical review. Arch Gen Psychiatry
- Scales for assessment of diagnosis and severity of mental disorders. Acta Psychiatr Scand Suppl
- Conceptualization and rationale for consensus definitions of terms in major depressive disorder. Remission, recovery, relapse, and recurrence. Arch Gen Psychiatry
- The definition and operational criteria for treatment outcome of major depressive disorder. A review of the current research literature. Arch Gen Psychiatry
- A rating scale for depression. J Neurol Neurosurg Psychiatry
- Development of a rating scale for primary depressive illness. Br J Soc Clin Psychol
- Sensitivity of the six-item Hamilton Depression Rating Scale. Acta Psychiatr Scand
- Examining symptom expression as a function of symptom severity: item performance on the Hamilton Rating Scale for Depression. Psychol Assess
- The Hamilton Depression Rating Scale: has the gold standard become a lead weight? Am J Psychiatry
- Standardizing the Hamilton Depression Rating Scale: past, present, and future. Eur Arch Psychiatry Clin Neurosci
- ECDEU Assessment manual for psychopharmacology. DHEW publication (ADM) 76-338
- Handbook of psychiatric measures
☆ Conflicts of interest. External funding did not support these post hoc analyses.