J Gen Intern Med. 2001 Sep; 16(9): 606–613.

The PHQ-9

Validity of a Brief Depression Severity Measure

Kurt Kroenke

¹Received from the Regenstrief Institute for Health Care and Department of Medicine, Indiana University, Indianapolis, Ind

Robert L Spitzer

²The New York State Psychiatric Institute and Department of Psychiatry, Columbia University, New York, NY

Janet B W Williams

²The New York State Psychiatric Institute and Department of Psychiatry, Columbia University, New York, NY

Abstract

OBJECTIVE

While considerable attention has focused on improving the detection of depression, assessment of severity is also important in guiding treatment decisions. Therefore, we examined the validity of a brief, new measure of depression severity.

MEASUREMENTS

The Patient Health Questionnaire (PHQ) is a self-administered version of the PRIME-MD diagnostic instrument for common mental disorders. The PHQ-9 is the depression module, which scores each of the 9 DSM-IV criteria as "0" (not at all) to "3" (nearly every day). The PHQ-9 was completed by 6,000 patients in 8 primary care clinics and 7 obstetrics-gynecology clinics. Construct validity was assessed using the 20-item Short-Form General Health Survey, self-reported sick days and clinic visits, and symptom-related difficulty. Criterion validity was assessed against an independent structured mental health professional (MHP) interview in a sample of 580 patients.

RESULTS

As PHQ-9 depression severity increased, there was a substantial decrease in functional status on all 6 SF-20 subscales. Likewise, symptom-related difficulty, sick days, and health care utilization increased. Using the MHP reinterview as the criterion standard, a PHQ-9 score ≥10 had a sensitivity of 88% and a specificity of 88% for major depression. PHQ-9 scores of 5, 10, 15, and 20 represented mild, moderate, moderately severe, and severe depression, respectively. Results were similar in the primary care and obstetrics-gynecology samples.

CONCLUSION

In addition to making criteria-based diagnoses of depressive disorders, the PHQ-9 is also a reliable and valid measure of depression severity. These characteristics plus its brevity make the PHQ-9 a useful clinical and research tool.

Keywords: depression, diagnosis, screening, psychological tests, health status

Depression is one of the most prevalent and treatable mental disorders and is regularly seen by a wide spectrum of health care providers, including mental health specialists, medical and surgical subspecialists, and primary care clinicians. There are a number of case-finding instruments for detecting depression in primary care, ranging from 2 to 28 items in length.¹,² Typically, these can be scored as continuous measures of depression severity and also have established cut points above which the probability of major depression is substantially increased. Scores on these various measures tend to be highly correlated,³ and it is not evident that any 1 measure is superior to the others.¹,²,⁴

The Patient Health Questionnaire (PHQ) is a new instrument for making criteria-based diagnoses of depressive and other mental disorders commonly encountered in primary care. The diagnostic validity of the PHQ has recently been established in 2 studies involving 3,000 patients in 8 primary care clinics and 3,000 patients in 7 obstetrics-gynecology clinics.⁵,⁶ At 9 items, the PHQ depression scale (which we call the PHQ-9) is half the length of many other depression measures, has comparable sensitivity and specificity, and consists of the actual 9 criteria upon which the diagnosis of DSM-IV depressive disorders is based. The latter feature distinguishes the PHQ-9 from other "2-step" depression measures for which, when scores are high, additional questions must be asked to establish DSM-IV depressive diagnoses. The PHQ-9 has the potential of being a dual-purpose instrument that, with the same 9 items, can establish depressive disorder diagnoses as well as grade depressive symptom severity. In this paper, we analyze data regarding the PHQ-9 to address 3 major questions:

What is the reliability and efficiency of the PHQ-9 in clinical practice?
What are the operating characteristics (sensitivity and specificity) of the PHQ-9 as a diagnostic instrument for depressive disorders?
What is the construct validity of the PHQ-9 as a depression severity measure in relation to functional status, disability days, and health care utilization?

METHODS

Description of the PHQ and PHQ-9

The Patient Health Questionnaire (PHQ) is a 3-page questionnaire that can be entirely self-administered by the patient.⁵ The clinician scans the completed questionnaire, verifies positive responses, and applies diagnostic algorithms that are abbreviated at the bottom of each page. The PHQ assesses 8 diagnoses, divided into threshold disorders (disorders that correspond to specific DSM-IV diagnoses: major depressive disorder, panic disorder, other anxiety disorder, and bulimia nervosa), and subthreshold disorders (disorders whose criteria encompass fewer symptoms than are required for any specific DSM-IV diagnoses: other depressive disorder, probable alcohol abuse/dependence, somatoform, and binge eating disorder).

The PHQ-9 (Appendix) is the 9-item depression module from the full PHQ. Major depression is diagnosed if 5 or more of the 9 depressive symptom criteria have been present at least "more than half the days" in the past 2 weeks, and 1 of the symptoms is depressed mood or anhedonia. Other depression is diagnosed if 2, 3, or 4 depressive symptoms have been present at least "more than half the days" in the past 2 weeks, and 1 of the symptoms is depressed mood or anhedonia. One of the 9 symptom criteria ("thoughts that you would be better off dead or of hurting yourself in some way") counts if present at all, regardless of duration. As with the original PRIME-MD, before making a final diagnosis, the clinician is expected to rule out physical causes of depression, normal bereavement, and history of a manic episode.
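
The diagnostic rule in the preceding paragraph can be sketched in code. This is a minimal illustration rather than the authors' implementation: the function name `phq9_diagnosis`, the list-of-integers input, and the item ordering (taken from the Appendix checklist) are assumptions made for the example. It covers only the symptom-count rule and does not replace the clinician's rule-outs (physical causes, bereavement, history of mania).

```python
from typing import Sequence

# Item order assumed to follow the Appendix checklist:
# index 0 = little interest or pleasure (anhedonia),
# index 1 = feeling down or depressed (depressed mood),
# index 8 = thoughts of being better off dead or of self-harm.
CORE_ITEMS = (0, 1)
SELF_HARM_ITEM = 8


def phq9_diagnosis(responses: Sequence[int]) -> str:
    """Return 'major', 'other', or 'none' for nine responses scored 0-3."""
    if len(responses) != 9 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected nine responses, each scored 0, 1, 2, or 3")

    # A symptom counts when present "more than half the days" (score >= 2);
    # the self-harm item counts if present at all (score >= 1).
    counted = [
        r >= (1 if i == SELF_HARM_ITEM else 2)
        for i, r in enumerate(responses)
    ]
    n_symptoms = sum(counted)
    has_core = any(counted[i] for i in CORE_ITEMS)

    if has_core and n_symptoms >= 5:
        return "major"
    if has_core and 2 <= n_symptoms <= 4:
        return "other"
    return "none"
```

For instance, a patient reporting five symptoms at "more than half the days" with neither depressed mood nor anhedonia among them would be classified as `none` under this rule, because a core symptom is required.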

As a severity measure, the PHQ-9 score can range from 0 to 27, since each of the 9 items can be scored from 0 (not at all) to 3 (nearly every day). An item was also added to the end of the diagnostic portion of the PHQ-9 asking patients who checked off any problems on the questionnaire: "How difficult have these problems made it for you to do your work, take care of things at home, or get along with other people?"

PHQ Study Samples and Procedures

From May 1997 to November 1998, 3,890 patients, 18 years or older, were invited to participate in the PHQ Primary Care Study.⁵ There were 190 who declined to participate, 266 who started but did not complete the questionnaire (often because there was inadequate time before seeing their physician), and 434 whose questionnaires were not entered into the data set because the equivalent of approximately 1 page (20 items) was not completed. This resulted in the 3,000 primary care patients reported here (1,422 from 5 general internal medicine clinics and 1,578 from 3 family practice clinics). From May 1997 to March 1999, 3,636 patients, 18 years or older, were approached to participate in the PHQ Obstetrics-Gynecology (Ob-Gyn) Study.⁶ There were 245 patients who declined to participate, 127 who started but did not complete the questionnaire, and 264 whose questionnaires were not entered into the data set because the equivalent of approximately 1 page was not completed. This resulted in the 3,000 subjects from 7 obstetrics-gynecology (ob-gyn) sites. All sites used one of 2 subject selection methods to minimize sampling bias: either consecutive patients for a given clinic session or every nth patient until the intended quota for that session was achieved. Patient characteristics are summarized in Table 1. Besides being entirely women, the ob-gyn sample had a younger average age, more Hispanic subjects, lower average education, and less medical comorbidity.

Table 1

Characteristics of Patients in the PHQ Primary Care and Obstetrics-gynecology Studies

Patient Characteristic	Study 1 PHQ Primary Care	Study 2 PHQ Ob-gyn
Subjects, N	3,000	3,000
Established clinic patient, %	80	71
Mean age, y ±SD	46 ± 17	31 ± 11
Women, %	66	100
Race, %
White	79	39
African American	13	15
Hispanic	4	39
Marital status, %
Married	48	52
Never married	23	33
Divorced/separated/widowed	29	15
Education, %
College graduate	27	16
Partial college	27	25
High school graduate only	33	32
Less than high school	13	27
Medical conditions, %
Hypertension	25	2
Arthritis	11	1
Diabetes	8	1
Pulmonary	7	2

A total of 62 physicians participated in the PHQ Primary Care Study (21 general internal medicine and 41 family practice [19 of whom were family practice residents]). Their mean age was 37 years (standard deviation [SD], 6.5), and 63% were male. A total of 40 physicians and 21 nurse practitioners participated in the PHQ Ob-Gyn Study. Their mean age was 39 years (SD, 8.9), and 48% were male.

Before seeing the physician, all patients completed the PHQ. Additionally, they completed the Medical Outcomes Study Short-Form General Health Survey (SF-20).⁷ The SF-20 measures functional status in 6 domains (all scores from 0 to 100; 100 = best health). Also, patients estimated the number of physician visits and disability days during the past 3 months.

Mental Health Professional (MHP) Validation Interviews

To determine the agreement of PHQ diagnoses with those of MHPs, midway through the PHQ Primary Care Study, a MHP (a PhD clinical psychologist or 1 of 3 senior psychiatric social workers) attempted to interview by telephone all subsequently entered subjects who had a telephone, agreed to be interviewed, and could be contacted within 48 hours. All except 1 site participated in these validation interviews. The MHP was blinded to the results of the PHQ. The rationale and further details of the MHP telephone interview, which used the overview from the SCID⁸ and diagnostic questions from the PRIME-MD, are described in the original PRIME-MD report.⁹ To examine test-retest reliability, the MHP graded the 9 PRIME-MD questions assessing DSM-IV symptoms using the same 4 response options as the PHQ-9 (i.e., not at all, several days, more than half the days, nearly every day).

The 580 subjects who had a MHP interview within 48 hours of completing the PHQ were, within each site, similar to patients not reinterviewed in terms of demographic profile, functional status, and frequency of psychiatric diagnoses. Agreement between the PHQ diagnoses and the MHP diagnoses was examined. One modification from the original PRIME-MD algorithm was necessary. The number of criteria required for diagnosing major depressive disorder could remain the same as in DSM-IV, i.e., 5 of 9 during the past 2 weeks. However, because the PHQ response set was expanded from the simple "yes/no" in the original PRIME-MD to 4 frequency levels, lowering the PHQ threshold from "nearly every day" to "more than half the days" raised the sensitivity from 37% to 73% while maintaining high specificity (94%).

Analysis

For most analyses, the PHQ-9 score was divided into the following categories of increasing severity: 0–4, 5–9, 10–14, 15–19, and 20 or greater. These categories were chosen for several reasons. The first was pragmatic, in that the cut points of 5, 10, 15, and 20 are simple for clinicians to remember and apply. The second reason was empiric, in that using different cut points did not noticeably change the associations between increasing PHQ-9 severity and measures of construct validity.
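
As a small illustration of how these categories might be applied, the sketch below sums the nine item scores and maps the total onto the five intervals. The function names and the lowercase labels are assumptions made for this example (the labels mirror the severity names used in Table 2); this is not code from the study.

```python
from bisect import bisect_right
from typing import Sequence

# Lower limits of the mild, moderate, moderately severe, and severe categories.
CUT_POINTS = (5, 10, 15, 20)
LABELS = ("minimal", "mild", "moderate", "moderately severe", "severe")


def phq9_total(responses: Sequence[int]) -> int:
    """Sum nine item responses, each scored 0-3, into a 0-27 total."""
    if len(responses) != 9 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected nine responses, each scored 0, 1, 2, or 3")
    return sum(responses)


def phq9_severity(score: int) -> str:
    """Map a 0-27 total onto the 0-4, 5-9, 10-14, 15-19, 20-27 categories."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    # bisect_right counts how many cut points are <= score,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(CUT_POINTS, score)]
```

Keeping the cut points in one tuple makes it easy to rerun an analysis with alternative thresholds, which is the kind of sensitivity check the paragraph above describes.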

For analyses assessing the operating characteristics of various PHQ-9 intervals or cut points, diagnostic status (major depressive disorder, other depressive disorder, or no depressive disorder) was that assigned by the independent MHP structured psychiatric interview. The latter is considered the criterion standard and provides the most conservative estimate of the operating characteristics of the PHQ-9 score. Besides calculating sensitivity and specificity of the PHQ-9 over various intervals, we also determined likelihood ratios¹⁰ and conducted ROC curve analysis¹¹ as quantitative methods for combining sensitivity and specificity into a single metric.

Construct validity of the PHQ-9 as a measure of depression severity was assessed by examining functional status (the 6 SF-20 scales), disability days, symptom-related difficulty, and health care utilization (clinic visits) over the 5 PHQ-9 intervals. Analysis of covariance was used, with PHQ-9 category as the independent variable and adjusting for age, gender, race, education, study site, and number of physical disorders. Bonferroni's correction was used to adjust for multiple comparisons.

RESULTS

Reliability and Efficiency of the PHQ-9

The internal reliability of the PHQ-9 was excellent, with a Cronbach's α of 0.89 in the PHQ Primary Care Study and 0.86 in the PHQ Ob-Gyn Study. Test-retest reliability of the PHQ-9 was also excellent. Correlation between the PHQ-9 completed by the patient in the clinic and that administered telephonically by the MHP within 48 hours was 0.84, and the mean scores were nearly identical (5.08 vs 5.03).
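
For readers unfamiliar with the statistic, Cronbach's α is conventionally computed from the item variances and the variance of the total score, α = k/(k − 1) × (1 − Σ var(item) / var(total)), where k is the number of items. The sketch below is a generic illustration of that formula; the function name is an assumption, and the study's patient-level data are not available here, so no attempt is made to reproduce the reported 0.89 or 0.86.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores.

    Each row is one respondent; each column is one questionnaire item.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items and equal-length rows")

    # Sample variance of each item across respondents.
    item_vars: List[float] = [
        variance([row[i] for row in rows]) for i in range(k)
    ]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When every item moves in lockstep the function returns 1, and when items are uncorrelated it returns 0, which gives a quick sanity check on any real data set.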

In 85% of cases clinicians required less than 3 minutes to review responses on the full 3-page PHQ,⁵ which consists of 5 modules and 28 to 58 items (depending upon the number of skip-outs). Although time to review the PHQ depression items was not measured separately, it is unlikely this took more than a minute, since the PHQ-9 includes less than one third of the items contained in the full PHQ.

Distribution of PHQ-9 Scores According to Depression Diagnostic Status

Table 2 shows the distribution of PHQ-9 scores according to depression diagnostic status in the 580 patients interviewed by a mental health professional who was blinded to the PHQ-9 results. The mean PHQ-9 score was 17.1 (SD, 6.1) in the 41 patients diagnosed by the MHP as having major depression; 10.4 (SD, 5.4) in the 65 patients diagnosed as other depressive disorder; and 3.3 (SD, 3.8) in the 474 patients with no depressive disorder. The vast majority of patients (93%) with no depressive disorder had a PHQ-9 score less than 10, while most patients (88%) with major depression had scores of 10 or greater. Scores less than 5 almost always signified the absence of a depressive disorder; scores of 5 to 9 predominantly represented patients with either no depression or subthreshold (i.e., other) depression; scores of 10 to 14 represented a spectrum of patients; and scores of 15 or greater usually indicated major depression.

Table 2

Distribution of PHQ-9 Scores According to Depression Diagnostic Status*

	Major Depressive Disorder (N = 41)	Other Depressive Disorder (N = 65)	No Depressive Disorder (N = 474)
Level of Depression Severity, PHQ-9 Score	n (%)	n (%)	n (%)
Minimal, 0–4	1 (2.4)	8 (12.3)	348 (73.4)
Mild, 5–9	4 (9.8)	23 (35.4)	93 (19.6)
Moderate, 10–14	8 (19.5)	17 (26.1)	23 (4.9)
Moderately severe, 15–19	14 (34.1)	14 (21.5)	8 (1.7)
Severe, 20–27	14 (34.1)	3 (4.6)	2 (0.4)

Criterion Validity of PHQ-9 Assessed by Mental Health Professional Interview

Because PHQ-9 scores in the 10 to 15 range appear to represent an important "gray zone," we conducted a more detailed examination of the operating characteristics of various cut points in this range. Table 3 displays the sensitivity, specificity, and likelihood ratios for different PHQ-9 thresholds in diagnosing major depression in the 580 patients who had a MHP interview. For example, a patient with major depression is 6 times more likely than a patient without major depression to have a PHQ-9 score of 9 or greater and 13.6 times more likely to have a score of 15 or greater. In this sample with a 7% prevalence of major depression (41 out of 580 patients), the positive predictive value for major depression ranged from 31% for a PHQ-9 cut point of 9 to 51% for a cut point of 15.

Table 3

Operating Characteristics of Various PHQ-9 Cutpoints for Diagnosing Major Depression*

PHQ-9 Depression Score	Sensitivity (%)	Specificity (%)	Likelihood Ratio
≥9	95	84	6.0
≥10	88	88	7.1
≥11	83	89	7.8
≥12	83	92	10.2
≥13	78	93	11.1
≥14	73	94	12.0
≥15	68	95	13.6
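
The likelihood ratios and predictive values quoted for these cut points follow from standard definitions: the positive likelihood ratio is sensitivity / (1 − specificity), and the positive predictive value follows from Bayes' rule given the prevalence. The sketch below illustrates those formulas; the function names are assumptions. Because the table reports sensitivity and specificity rounded to whole percentages, ratios recomputed from them can differ slightly from the tabulated values (for example at the cut point of 10).

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = P(test positive | disease) / P(test positive | no disease)."""
    if not (0 <= sensitivity <= 1 and 0 <= specificity < 1):
        raise ValueError("sensitivity must be in [0, 1], specificity in [0, 1)")
    return sensitivity / (1 - specificity)


def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """P(disease | test positive) from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive tests expected with these inputs")
    return true_pos / (true_pos + false_pos)


# Prevalence of major depression in the validation sample: 41 of 580 patients.
PREVALENCE = 41 / 580

ppv_cut_9 = positive_predictive_value(0.95, 0.84, PREVALENCE)
ppv_cut_15 = positive_predictive_value(0.68, 0.95, PREVALENCE)
lr_cut_15 = positive_likelihood_ratio(0.68, 0.95)
```

With the rounded inputs from the table, the two predictive values come out at roughly 31% and 51%, matching the range stated in the text, and the likelihood ratio for a cut point of 15 is about 13.6.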

Examination of likelihood ratios further confirmed the substantial association between increasing PHQ-9 scores and the likelihood of major depression. The positive likelihood ratios of PHQ-9 scores of 0–4, 5–9, 10–14, 15–19, and 20–27 for major depression were 0.04, 0.5, 2.6, 8.4, and 36.8, respectively. Interpretation of these likelihood ratios means that, for example, a PHQ-9 score in the 0–4 range is only 0.04 (i.e., 1/25) times as likely in a patient with major depression compared to a patient without major depression, while a score of 10 to 14 is 2.6 times as likely and a score of 15 to 19 is 8.4 times as likely. The positive likelihood ratio of these same 5 PHQ-9 intervals for any depression (i.e., major or other depressive disorder) was 0.12, 1.3, 4.9, 15.7, and 38.0, respectively.
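
These interval likelihood ratios can be recomputed from the counts in Table 2: for each score interval, divide the proportion of patients with the disorder who fall in that interval by the proportion of patients without the disorder who fall in it. The sketch below is an illustration of that calculation, not the authors' code; the dictionary layout and function name are assumptions.

```python
from typing import Dict, Tuple

# Counts per PHQ-9 interval from Table 2:
# (major depressive disorder, other depressive disorder, no depressive disorder)
TABLE_2_COUNTS: Dict[str, Tuple[int, int, int]] = {
    "0-4": (1, 8, 348),
    "5-9": (4, 23, 93),
    "10-14": (8, 17, 23),
    "15-19": (14, 14, 8),
    "20-27": (14, 3, 2),
}


def interval_likelihood_ratios(
    counts: Dict[str, Tuple[int, int, int]], any_depression: bool = False
) -> Dict[str, float]:
    """Likelihood ratio of each score interval for major (or any) depression."""

    def split(row: Tuple[int, int, int]) -> Tuple[int, int]:
        major, other, none = row
        # "Any depression" groups major and other together as disorder present;
        # otherwise only major depression counts as present.
        if any_depression:
            return major + other, none
        return major, other + none

    present_total = sum(split(row)[0] for row in counts.values())
    absent_total = sum(split(row)[1] for row in counts.values())

    ratios = {}
    for interval, row in counts.items():
        present, absent = split(row)
        ratios[interval] = (present / present_total) / (absent / absent_total)
    return ratios


major_lr = interval_likelihood_ratios(TABLE_2_COUNTS)
any_lr = interval_likelihood_ratios(TABLE_2_COUNTS, any_depression=True)
```

Rounded to the precision used in the text, `major_lr` and `any_lr` agree with the two series of ratios reported above.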

ROC analysis showed that the area under the curve for the PHQ-9 in diagnosing major depression was 0.95, suggesting a test that discriminates well between persons with and without major depression. The area under the curve for the 5-item mental health scale of the SF-20 was 0.93.

Construct Validity of PHQ-9 Assessed by Functional Status and other Measures

As shown in Table 4, there was a strong association between increasing PHQ-9 depression severity scores and worsening function on all 6 SF-20 scales. Several findings should be noted. First, results were essentially the same for both the primary care and obstetrics-gynecology samples. Second, the monotonic decrease in SF-20 scores with increasing PHQ-9 scores was greatest for the scales that previous studies have shown should be most strongly related to depression, i.e., mental health, followed by social, overall, and role functioning, with a lesser relationship to pain and physical functioning.¹² Third, most pairwise comparisons within each SF-20 scale between successive PHQ-9 levels were highly significant.

Table 4

Relationship Between PHQ-9 Depression Score and SF-20 Health-related Quality of Life Scales*

	Mean (95% CI) SF-20 Scale Score
	Mental		Social		Role		General		Pain		Physical
Level of Depression Severity, PHQ-9 Score	Primary Care	Ob-gyn	Primary Care	Ob-gyn	Primary Care	Ob-gyn	Primary Care	Ob-gyn	Primary Care	Ob-gyn	Primary Care	Ob-gyn
Minimal, 1–4	81 (80 to 82)	81 (80 to 82)	92 (91 to 93)	91 (90 to 92)	86 (84 to 88)	88 (87 to 90)	70 (69 to 71)	75 (73 to 76)	66 (65 to 68)	73 (72 to 74)	83 (81 to 83)	86 (85 to 87)
Mild, 5–9	65 (64 to 66)	66 (64 to 67)	77 (75 to 79)	81 (79 to 83)	63 (60 to 66)	77 (74 to 79)	50 (48 to 52)	57 (55 to 58)	52^a (50 to 54)	59^a (57 to 61)	69 (67 to 71)	76^a (74 to 77)
Moderate, 10–14	51 (50 to 53)	53 (51 to 55)	65 (62 to 68)	75^a (72 to 78)	53^a (49 to 58)	64^a (60 to 69)	40^a (37 to 43)	48 (45 to 51)	49^a (45 to 52)	53^a,b (50 to 57)	63^a (60 to 66)	74^a (71 to 77)
Moderately severe, 15–19	43 (40 to 45)	45 (42 to 48)	55 (51 to 59)	68^a (63 to 72)	42^a (36 to 48)	64^a,b (57 to 71)	33^a,b (29 to 37)	40^a (35 to 44)	45^a,b (41 to 50)	50^b (45 to 55)	57^a,b (53 to 61)	74^a (69 to 78)
Severe, 20–27	29 (25 to 31)	35 (31 to 39)	40 (35 to 44)	50 (43 to 56)	27 (20 to 35)	48^b (39 to 58)	27^b (22 to 31)	30^a (24 to 36)	40^b (35 to 45)	46^b (40 to 53)	53^b (48 to 57)	56 (50 to 62)

Figure 1 illustrates graphically the relationship between increasing PHQ-9 scores and worsening functional status. Decrements in SF-20 scores are shown in terms of effect size, which is the difference in mean SF-20 scores, expressed as the number of standard deviations, between each PHQ-9 interval subgroup and the reference group. The reference group is the group with the lowest PHQ-9 scores (i.e., 0–4), and the standard deviation used is that of the entire sample. Effect sizes of 0.5 and 0.8 are typically considered moderate and large between-group differences, respectively.¹³ Figure 1 shows effect sizes for the primary care sample; results for the obstetrics-gynecology sample (not displayed) were similar.

Figure 1

Relationship between depression severity as measured by the PHQ-9 and decline in functional status as measured by the 6 subscales of the SF-20. The decrement in SF-20 scores is shown as the difference between each PHQ-9 severity group and the nondepressed reference group (i.e., those with PHQ-9 scores of 0 to 4). Effect size is the difference in group means divided by the standard deviation of the entire sample.
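
The effect size described in the caption can be written as a one-line calculation. In the sketch below the function name is an assumption, and the standard deviation passed in the usage line is a placeholder chosen purely for illustration: the paper does not report the whole-sample SD of each SF-20 scale, so the result is not a value from the study.

```python
def effect_size(reference_mean: float, group_mean: float, sample_sd: float) -> float:
    """Decrement from the reference group, in whole-sample SD units.

    Positive values mean the group scores lower (worse function)
    than the reference group.
    """
    if sample_sd <= 0:
        raise ValueError("sample standard deviation must be positive")
    return (reference_mean - group_mean) / sample_sd


# Mental health means for the primary care sample (Table 4):
# 81 in the lowest PHQ-9 category and 65 in the mild category.
# HYPOTHETICAL_SD is an assumed placeholder, not a reported statistic.
HYPOTHETICAL_SD = 20.0
example = effect_size(81, 65, HYPOTHETICAL_SD)
```

Under that assumed SD, the 16-point difference would correspond to an effect size of 0.8, which the text above classes as a large between-group difference.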

When the PHQ-9 was examined as a continuous variable, its strength of association with the SF-20 scales was concordant with the pattern seen in Figure 1. The PHQ-9 correlated most strongly with mental health (0.73), followed by general health perceptions (0.55), social functioning (0.52), role functioning (0.43), physical functioning (0.37), and bodily pain (0.33).

Table 5 shows the association between PHQ-9 severity levels and 3 other measures of construct validity: self-reported disability days, clinic visits, and the general amount of difficulty patients attribute to their symptoms. Greater levels of depression severity were associated with a monotonic increase in disability days, health care utilization, and symptom-related difficulty in activities and relationships. When the PHQ-9 was examined as a continuous variable, its correlation was 0.39 with disability days, 0.24 with physician visits, and 0.55 with symptom-related difficulty.

Table 5

Relationship Between PHQ-9 Depression Severity Score and Disability Days, Symptom-related Difficulty, and Clinic Visits

	Mean Disability Days (95% CI)*		Symptom-related Difficulty (%)†		Mean Physician Visits (95% CI)*
Level of Depression Severity, PHQ-9 Score	Primary Care	Obstetrics-gynecology	Primary Care	Obstetrics-gynecology	Primary Care	Obstetrics-gynecology
Minimal, 1–4	2.4 (1.7 to 3.1)	2.2 (1.7 to 2.7)	1.5	0.6	1.0 (0.9 to 1.1)	0.9^a (0.8 to 1.0)
Mild, 5–9	6.7 (5.5 to 7.8)	5.8 (4.9 to 6.6)	10.2	4.8	1.8^a (1.6 to 2.0)	0.9^a (1.0 to 1.4)
Moderate, 10–14	11.4 (9.5 to 13.1)	9.9^a (8.4 to 11.3)	24.4	16.8	2.0^a (1.7 to 2.4)	1.3^a (1.0 to 1.6)
Moderately severe, 15–19	16.6 (14.1 to 19.0)	10.8^a (8.6 to 13.0)	45.1^a	36.0	2.4^a (1.9 to 2.8)	2.3^b (1.8 to 2.8)
Severe, 20–27	28.1 (25.2 to 31.0)	13.8^a (10.8 to 16.7)	57.1^a	56.6	3.7 (3.2 to 4.2)	2.3^b (1.7 to 3.0)

Because our sample was relatively young and disproportionately female, we examined the influence of age and gender in several ways. First, simple correlations between PHQ-9 score and measures of construct validity were similar when examined separately for women and men, while correlations were somewhat lower but still highly significant in patients 65 years and older compared to younger individuals. Second, analysis of covariance results showed age had an independent and weak effect on only 1 outcome (SF-20 physical functioning), while gender had no independent effect.

The single item assessing difficulty that the patients attributed to their depressive symptoms correlated strongly with impairment as measured by the SF-20 subscales, particularly those domains known to be most affected by mental disorders. Correlations of the single symptom-related difficulty item with the SF-20 scales in the primary care sample were 0.53 for mental health, 0.42 for general health perceptions, 0.40 for social functioning, 0.38 for role functioning, 0.27 for bodily pain, and 0.27 for physical functioning. Although slightly lower in the obstetrics-gynecology sample, correlations showed a similar rank order.

DISCUSSION

Data from our 2 studies totaling 6,000 patients provide strong evidence for the validity of the PHQ-9 as a brief measure of depression severity. Criterion validity was demonstrated in the sample of 580 primary care patients who underwent an independent reinterview by a mental health professional. Construct validity was established by the strong association between PHQ-9 scores and functional status, disability days, and symptom-related difficulty. External validity was achieved by replicating the findings from the 3,000 primary care patients in a second sample of 3,000 obstetrics-gynecology patients. Indeed, the similar results seen in rather different patient populations suggest our PHQ-9 findings may be generalizable to outpatients seen in a variety of clinic settings.

Our analysis of the full range of PHQ-9 scores complements rather than supersedes the validated PHQ-9 algorithm for establishing categorical diagnoses. However, as the PHQ-9 is increasingly used as a continuous measure of depression severity, it will be helpful to know the probability of a major or subthreshold depressive disorder at various cut points. PHQ-9 scores of 5, 10, 15, and 20 represent valid and easy-to-remember thresholds demarcating the lower limits of mild, moderate, moderately severe, and severe depression. In particular, scores less than 10 seldom occur in individuals with major depression while scores of 15 or greater usually signify the presence of major depression. In the "gray zone" of 10 to 14, increasing PHQ-9 scores are associated, as expected, with increasing specificity and declining sensitivity. Nonetheless, the operating characteristics of the PHQ-9 displayed at various cut points in Table 3 compare favorably to 9 other case-finding instruments for depression in primary care, which have an overall sensitivity of 84%, a specificity of 72%, and a positive likelihood ratio of 2.86.¹ Likewise, the positive predictive value of the PHQ-9 (ranging from 31% to 51% depending upon the cut point) is similar to other instruments; of note, predictive value is related not only to a measure's sensitivity and specificity but also to the prevalence of depressive disorders.

The 1 depression measure that was used concurrently with the PHQ-9 in our subjects was the 5-item mental health scale of the SF-20, also known as the Mental Health Inventory (MHI-5). PHQ-9 scores were strongly correlated with MHI-5 scores in our subjects (Table 4 and Figure 1). Berwick et al. used ROC analysis to determine how well the MHI-5 and several other measures discriminated between patients with and without major depression.¹⁴ In their study, the area under the curve (AUC) was 0.89 for the MHI-5, 0.90 for the longer MHI-18, 0.89 for the 30-item General Health Questionnaire, and 0.80 for the 28-item Somatic Symptom Inventory. In our study, the AUC for major depression was 0.95 for the PHQ-9 and 0.93 for the MHI-5. It is unlikely that other depression-specific measures would be significantly better than the PHQ-9 since an AUC of 1.0 represents a perfect test.

A particularly important characteristic of a severity measure is its sensitivity to change over time. In other words, how precisely do declining or rising scores on the measure reflect improving or worsening depression in response to effective therapy or natural history? Although an exhaustive review of depression measures is beyond the scope of this paper and can be found elsewhere,⁴,¹² a brief discussion of selected measures is warranted. The Hamilton Rating Scale for Depression (HAM-D) has been the criterion standard outcome measure in clinical trials, but it can require 15 to 30 minutes of clinician time to administer and is therefore not feasible in many practice settings. The HAM-D is also rather complicated to score and requires substantial training in order to get reasonable inter-rater agreement. The Montgomery-Asberg Depression Rating Scale is about half as long as the HAM-D and probably just as sensitive to change.¹⁵,¹⁶ Like the HAM-D, however, the Montgomery-Asberg scale must be administered by a clinician with special training and still is moderately time intensive. Several self-administered scales (the 21-item Beck Depression Inventory and the 20-item Zung Self-Rating Depression Scale) also have been used as outcome measures but may be somewhat less sensitive to change than the HAM-D.¹⁷ The SCL-20 has been used as an outcome measure in primary care clinical trials,¹⁸–²⁰ although published evidence on its sensitivity to change as well as other psychometric characteristics is limited. Epidemiological and clinical studies have established the 20-item CES-D as a valid measure for identifying depression, but there is less information regarding its sensitivity to change.

In summary, there appear to be many comparable measures for identifying depression,¹,²,⁴,¹² including a number of self-administered scales. In contrast, it is less clear what the optimal measure for monitoring response to treatment may be, especially outside the setting of a clinical trial. Sensitivity to change is clearly a necessary feature, but other pragmatic considerations include the number of items, time required for completion, mode of administration (self-rating vs interviewer-administered scale), complexity of scoring, inter-rater agreement, and special training requirements. The specific items included in the scale are another factor. One advantage of the PHQ-9 is its exclusive focus on the 9 diagnostic criteria for DSM-IV depressive disorders. On the other hand, some may argue that instruments including symptoms not in the DSM-IV criteria (e.g., loneliness, hopelessness, and anxiety) may have additional value to the clinician. At the same time, it is possible that such scales are less specific for major depression and other mood disorders and may discriminate less accurately depression from anxiety or even general psychological distress.

The major limitation of our study is its cross-sectional design. While our large sample establishes the construct and criterion validity of the PHQ-9, longitudinal studies are needed to establish its sensitivity to change. This will require the completion of several large ongoing clinical trials using the PHQ-9 in parallel with the HAM-D or other established outcome measures. It will also be useful to define the threshold that represents an adequate clinical response. A preliminary approach would be to consider a PHQ-9 score less than 10 and a 50% decline from the pretreatment score as clinically significant improvement. While any proposed threshold requires prospective verification, this approach would be consistent with that established for the HAM-D. Other study limitations are that validation was based on telephone rather than face-to-face interviews and the time for patients to complete the PHQ-9 was not determined.
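
The preliminary response criterion proposed above is simple enough to express directly. The sketch below is one possible reading, offered as an illustration only: the function name is an assumption, and "a 50% decline" is interpreted here as a decline of at least 50% from the pretreatment score. As the text notes, any such threshold still requires prospective verification.

```python
def is_clinically_significant_improvement(pre_score: int, post_score: int) -> bool:
    """Apply the preliminary criterion: post-treatment PHQ-9 below 10
    AND a decline of at least 50% from the pretreatment score.

    The "at least 50%" reading is an assumption made for this sketch.
    """
    for score in (pre_score, post_score):
        if not 0 <= score <= 27:
            raise ValueError("PHQ-9 totals must be between 0 and 27")
    if pre_score == 0:
        # No decline is possible from a pretreatment score of zero.
        return False
    return post_score < 10 and post_score <= 0.5 * pre_score
```

Both conditions matter: a patient moving from 20 to 11 has improved substantially but remains at or above 10, while a patient moving from 12 to 8 is below 10 but has not reached a 50% decline.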

Detecting depression and initiating treatment are necessary but often insufficient steps to improve outcomes in primary care.²¹ Monitoring clinical response to therapy is also critical. Multiple studies have shown that monitoring is often inadequate, resulting in clinician failure to detect medication noncompliance, increase the antidepressant dosage, change or augment pharmacotherapy, or add psychotherapy as needed.²¹,²² Having a simple self-administered measure to complete either in the clinic or by telephone administration (e.g., nurse administration²³ or interactive voice recording²⁴) would save clinicians the time needed to inquire about the presence and severity of each of the 9 DSM-IV symptoms to assess outcomes.

Brief measures are more likely to be used in the busy setting of clinical practice. For example, many practitioners have found it more feasible to use the 4-item CAGE questionnaire than a number of longer alcohol screening measures. Of note, as few as 1 or 2 questions have demonstrated a high sensitivity in screening for major depression.²,²⁵ Brevity is just as likely to be a valued attribute when it comes to assessing depression severity as it is when establishing depressive diagnoses. Brevity coupled with its construct and criterion validity makes the PHQ-9 an attractive, dual-purpose instrument for making diagnoses and assessing severity of depressive disorders. If the PHQ-9 proves sensitive to change in clinical trials, it could also be a useful measure for monitoring outcomes of depression therapy.

Acknowledgments

The development of the PHQ-9 was underwritten by an educational grant from Pfizer US Pharmaceuticals, New York, NY. PRIME-MD is a trademark of Pfizer. Copyright held by Pfizer.

APPENDIX

Nine-symptom Checklist

Name ______________________ Date _________
Over the last 2 weeks, how often have you been bothered by any of the following problems?	Not at all	Several days	More than half the days	Nearly every day
1. Little interest or pleasure in doing things	0	1	2	3
2. Feeling down, depressed, or hopeless	0	1	2	3
3. Trouble falling or staying asleep, or sleeping too much	0	1	2	3
4. Feeling tired or having little energy	0	1	2	3
5. Poor appetite or overeating	0	1	2	3
6. Feeling bad about yourself, or that you are a failure or have let yourself or your family down	0	1	2	3
7. Trouble concentrating on things, such as reading the newspaper or watching television	0	1	2	3
8. Moving or speaking so slowly that other people could have noticed? Or the opposite, being so fidgety or restless that you have been moving around a lot more than usual	0	1	2	3
9. Thoughts that you would be better off dead or of hurting yourself in some way	0	1	2	3
(For office coding: Total Score ____ = ____ + ____ + ____)

If you checked off any problems, how difficult have these problems made it for you to do your work, take care of things at home, or get along with other people?

Not difficult at all	Somewhat difficult	Very difficult	Extremely difficult
□	□	□	□

From the Primary Care Evaluation of Mental Disorders Patient Health Questionnaire (PRIME-MD PHQ). The PHQ was developed by Drs. Robert L. Spitzer, Janet BW Williams, Kurt Kroenke, and colleagues. For research information, contact Dr. Spitzer at rls8@columbia.edu. PRIME-MD is a trademark of Pfizer Inc. Copyright 1999 Pfizer Inc. All rights reserved. Reproduced with permission.
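As a rough illustration of how the checklist might be scored, the sketch below sums the nine item responses (each 0 to 3) and maps the total onto the severity cutpoints of 5, 10, 15, and 20 reported in the abstract. The function names and the label used for totals below 5 are assumptions made for this example and are not part of the instrument.

```python
from typing import Sequence

# Lower bounds of the severity bands reported in the abstract,
# checked from the highest cutpoint down.
SEVERITY_CUTPOINTS = [
    (20, "severe"),
    (15, "moderately severe"),
    (10, "moderate"),
    (5, "mild"),
]


def phq9_total(responses: Sequence[int]) -> int:
    """Sum the nine item responses (0 = not at all ... 3 = nearly every day)."""
    if len(responses) != 9:
        raise ValueError("the PHQ-9 has exactly nine items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(responses)


def phq9_severity(total: int) -> str:
    """Map a total score (0-27) to a severity label."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total scores must be between 0 and 27")
    for lower_bound, label in SEVERITY_CUTPOINTS:
        if total >= lower_bound:
            return label
    # Hypothetical label for totals under the lowest cutpoint.
    return "below mild cutpoint"


if __name__ == "__main__":
    answers = [1, 2, 0, 3, 1, 0, 2, 0, 0]
    total = phq9_total(answers)
    print(total, phq9_severity(total))
```

The "Total Score ____ = ____ + ____ + ____" line on the form yields the same total: it adds the column subtotals for the 1, 2, and 3 responses rather than summing item by item.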

REFERENCES

1. Mulrow CD, Williams JW, Gerety MB, Ramirez G, Montiel OM, Kerber C. Case-finding instruments for depression in primary care settings. Ann Intern Med. 1995;122:913–21.

2. Whooley MA, Avins AL, Miranda J, Browner WS. Case-finding instruments for depression: two questions are as good as many. J Gen Intern Med. 1997;12:439–45.

3. Keller MB, Kocsis JH, Thase ME, et al. Maintenance phase efficacy of sertraline for chronic depression: a randomized controlled trial. JAMA. 1998;280:1665–72.

4. McDowell I, Kristjansson E, Newell C. Depression. In: McDowell I, Newell C, editors. Measuring Health: A Guide to Rating Scales and Questionnaires. 2nd ed. New York, NY: Oxford University Press; 1996. pp. 238–86.

5. Spitzer RL, Kroenke K, Williams JBW. Patient Health Questionnaire Study Group. Validity and utility of a self-report version of PRIME-MD: the PHQ Primary Care Study. JAMA. 1999;282:1737–44.

6. Spitzer RL, Williams JBW, Kroenke K, et al. Validity and utility of the Patient Health Questionnaire in assessment of 3000 obstetric-gynecologic patients: the PRIME-MD Patient Health Questionnaire Obstetrics-Gynecology Study. Am J Obstet Gynecol. 2000;183:759–69.

7. Stewart AL, Hays RD, Ware JE. The MOS Short-Form General Health Survey: reliability and validity in a patient population. Med Care. 1988;26:724–32.

8. Spitzer RL, Williams JBW, Gibbon M, First MB. The structured clinical interview for DSM-III-R (SCID). Arch Gen Psychiatry. 1992;49:624–9.

9. Spitzer RL, Williams JBW, Kroenke K, et al. Utility of a new procedure for diagnosing mental disorders in primary care: the PRIME-MD 1000 study. JAMA. 1994;272:1749–56.

10. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. 2nd ed. Boston, MA: Little, Brown and Company; 1991. pp. 1–441.

11. Murphy JM, Berwick DM, Weinstein MC, et al. Performance of screening and diagnostic tests: application of receiver operating characteristic analysis. Arch Gen Psychiatry. 1987;44:550–5.

12. Pasacreta JV. Measuring depression. In: Frank-Stromborg M, Olsen SJ, editors. Instruments for Clinical Health-Care Research. 2nd ed. Sudbury, MA: Jones and Bartlett Publishers; 1997. pp. 342–630.

13. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care. 1989;27:S178–89.

14. Berwick DM, Murphy JM, Goldman PA, Ware JE, Barsky AJ, Weinstein MC. Performance of a 5-item mental health screening test. Med Care. 1991;29:169–76.

15. Montgomery SA, Asberg M. A new depression scale designed to be sensitive to change. Br J Psychiatry. 1979;134:382–9.

16. Davidson J, Turnbull CD, Strickland R, et al. The Montgomery-Asberg Depression Scale: reliability and validity. Acta Psychiatr Scand. 1986;73:544–8.

17. Lambert MJ, Hatch DR, Kingston MD, et al. Zung, Beck, and Hamilton rating scales as measures of treatment outcome: a meta-analytic comparison. J Consult Clin Psychol. 1986;54:54–9.

18. Katon W, Robinson P, Von Korff M, et al. A multifaceted intervention to improve treatment of depression in primary care. Arch Gen Psychiatry. 1996;53:924–32.

19. Katon W, Von Korff M, Lin E, et al. Collaborative management to achieve treatment guidelines: impact on depression in primary care. JAMA. 1995;273:1026–31.

20. Williams JW, Barrett J, Oxman T, et al. Treatment of dysthymia and minor depression in primary care: a randomized controlled trial in older adults. JAMA. 2000;284:1519–26.

21. Kroenke K, Taylor-Vaisey A, Dietrich AJ, Oxman TE. Interventions to improve provider diagnosis and treatment of mental disorders in primary care: a critical review of the literature. Psychosomatics. 2000;41:39–52.

22. Simon GE. Can depression be managed appropriately in primary care? J Clin Psychiatry. 1998;59(suppl 2):3–8.

23. Hunkeler EM, Meresman J, Hargreaves WA, et al. Efficacy of nurse telehealth care and peer support in augmenting treatment of depression in primary care. Arch Fam Med. 2000;9:700–8.

24. Kobak KA, Taylor LH, Dottl SL, et al. A computer-administered telephone interview to identify mental disorders. JAMA. 1997;278:905–10.

25. Williams JW, Mulrow CD, Kroenke K, et al. Case-finding for depression improves patient outcomes: results from a randomized trial in primary care. Am J Med. 1999;106:36–43.

Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1495268/
