Intra- and inter-rater concordance

Campo-Arias, Adalberto; Herazo, Edwin

doi:10.1016/S0034-7450(14)60261-4

Summary

Introduction

In psychiatry, intra- and inter-rater concordance studies are important for measuring the reliability or reproducibility of evaluations (interviews or rater-administered scales).

Objective

To present some principles of the validation process for diagnostic interviews or rater-administered scales, and of the use and interpretation of the statistical tests most useful for this purpose.

Method

Literature review.

Results

Concordance is the degree of agreement or disagreement between evaluations of the same subject, made either successively by a single rater or by two or more interviewers. This process is part of instrument validation, whether the goal is to identify possible cases or to confirm the presence of a mental disorder. In inter-rater concordance, two or more psychiatrists interview a person independently and almost simultaneously, which makes it possible to estimate the degree of agreement, convergence or concordance (or the opposite) between the evaluations and the resulting diagnoses. Intra-rater concordance is the degree of agreement over time in the diagnoses made by the same rater. Cohen's kappa is used to estimate concordance, and values above 0.50 are generally expected; however, the expected prevalence of the mental disorder, the number of raters or evaluations, and the number of possible diagnostic categories must be known.

Key words:

psychometrics

scales

reproducibility of results

validation studies

review

Abstract

Introduction

Intra- and inter-rater concordance studies are important in psychiatry for measuring the reliability or reproducibility of evaluations (interviews or rater-administered scales).

Objective

To present some principles of the validation process for diagnostic interviews or rater-administered scales, and of the use and interpretation of the statistical tests most useful for this purpose.

Method

Literature review.

Results

Concordance is the degree of agreement or disagreement between evaluations of the same subject, made either successively by a single rater or by two or more interviewers. This process is part of instrument validation (scale reliability), whether the goal is to identify possible cases or to confirm the presence of a mental disorder. Inter-rater concordance refers to the case in which two or more psychiatrists interview a person independently and almost simultaneously; this makes it possible to estimate the degree of agreement, convergence or concordance (or of disagreement, divergence or discordance) between the evaluations and the resulting diagnoses. Intra-rater concordance is the degree of agreement in the diagnoses made by the same rater at different times. Cohen's kappa is used to estimate concordance, and values higher than 0.50 are generally expected. To estimate Cohen's kappa reliably, it is necessary to know in advance the expected prevalence of the mental disorder, the number of evaluations or raters, and the number of possible diagnostic categories.
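
The dependence of kappa on prevalence noted above can be illustrated with a small calculation. The following is a minimal sketch, not the authors' procedure: the function name `cohen_kappa` and both 2x2 tables are hypothetical and were invented for illustration. It computes kappa as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's marginal totals.

```python
from typing import Sequence


def cohen_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square agreement table (hypothetical helper).

    table[i][j] is the number of subjects that rater A placed in
    category i and rater B placed in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of subjects on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum over categories of the product of the
    # two raters' marginal proportions.
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    # Undefined (division by zero) if chance agreement is exactly 1.
    return (p_o - p_e) / (1 - p_e)


# Invented data: 100 patients, two psychiatrists, disorder present/absent.
balanced = [[40, 5], [5, 50]]  # about half of the patients are diagnosed
rare = [[2, 5], [5, 88]]       # each rater diagnoses only 7% of patients

for name, t in (("balanced", balanced), ("rare", rare)):
    total = sum(map(sum, t))
    agreement = (t[0][0] + t[1][1]) / total
    print(f"{name}: observed agreement = {agreement:.2f}, "
          f"kappa = {cohen_kappa(t):.2f}")
```

In this made-up example both pairs of raters agree on 90 of 100 patients, yet kappa is about 0.80 when roughly half of the patients receive the diagnosis and about 0.23 when only 7% do. This is one reason the expected prevalence has to be considered when planning and interpreting a concordance study.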

Key words:

Psychometrics

scales

reproducibility of results

validation studies

review

The full text is available as a PDF.

Conflict of interest: the authors declare that they have no conflict of interest regarding this article.
