Vol. 48. Issue 2.
Pages 80-87 (April - June 2019)
Original article
DOI: 10.1016/j.rcpeng.2019.03.003
Inter-rater reliability in videos of patients with a suspected diagnosis of autism and child psychosis
Concordancia entre observadores de videos de pacientes con sospecha diagnóstica de autismo y psicosis infantiles
Martha Isabel Jordán-Quintero (a, b; corresponding author), Catalina Ayala Corredor (c), José Francisco Cepeda Torres (d), Carolina Porras Chaparro (e), Virginia Coromoto Sánchez Arenas (f)
a Departamento de Psiquiatría y Salud Mental, Facultad de Medicina, Pontificia Universidad Javeriana, Bogotá, Colombia
b Instituto de Ortopedia Infantil Roosevelt, Bogotá, Colombia
c Sanitas, Bogotá, Colombia
d Clínica El Prado, Armenia, Colombia
e Hospital San Camilo, Bucaramanga, Colombia
f Clínica Nuestra Señora de la Paz, Bogotá, Colombia
Table 1. Assessment instrument. Psychosis: inter-rater reliability.
Table 2. Quantitative results.
Diagnosing and treating autism and child psychoses is very difficult; these pathologies affect not only the child's life but also the family as a whole. Therefore, caution is required when giving a diagnosis with prognostic implications. We use the psychodynamic perspective in order to take into consideration both pathological and healthy diagnostic aspects, and to promote the developmental potential of each patient.


To determine the inter-rater reliability and to test quantitatively and qualitatively the variables involved in the diagnoses of patients with suspected autism and child psychoses, based upon psychodynamic concepts.


An inter-rater reliability study was carried out, based on the diagnostic evaluation of videos of patients with suspected autism or child psychoses who attended the diagnostic meeting (observation session) at the Instituto de Ortopedia Infantil Roosevelt.


Kappa values ranging from .24 to .50 were obtained, with a strength of agreement varying from slight to moderate, and κ=.24 for personality organisation. This type of diagnosis takes into account both the pathological and healthy aspects which make up personality organisation. Finally, limitations and aspects that should be considered in further studies are discussed.


The results reinforce the need for a child with major disorders to be evaluated by an interdisciplinary team, and not by a single observer, in order to allow discussion and debate and thereby avoid partial readings of the patient's psychological functioning.

Keywords:
Child psychosis
Interdisciplinary diagnosis
Inter-rater reliability

There are various perspectives for understanding the behaviour, development and psychopathology of humans, not just from a medical point of view but also from psychological, anthropological, social, pedagogical and philosophical points of view, among others.

From the medical perspective, classifications such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) and the mental disorders chapter of the International Classification of Diseases (ICD) have been created.

In child and adolescent psychiatry, the difference in perspectives is even more evident, as the diagnoses can change as the child grows and, to a greater or lesser extent, all approaches must take environmental and family influences into account.

Concepts exist that are inherent to childhood and adolescence, such as evolutive disharmony, that mean it is essential to consider the whole person, both the healthy and pathological aspects, as well as the underlying organisation, also taking into account object relations and attachment.

Given this heterogeneity, different ways of conceiving and approaching psycho-emotional development, behaviour and childhood psychopathology have come to exist. The French Classification for Child and Adolescent Mental Disorders, 2000 revised version (CFTMEA-R 2000), a specific classification for understanding psychiatric disorders in children and adolescents, is inspired by psychodynamics and uses a multi-dimensional perspective that gives room to the various currents in psychiatry. Diagnosis using this classification is carried out in line with the guideline, which offers psychopathological clinical reflection, i.e. both symptoms and the place they occupy in the subject's psychic economy are taken into account. It is a biaxial classification in which axis I establishes the major categories, which correspond to the organisation of the basic personality (psychosis, neurosis, borderline personality disorder, reactive disorders), followed by secondary categories, which establish the comorbidity, where applicable. Axis II covers organic factors and environmental factors and conditions. This classification also includes an Infant Axis, which allows early detection in children at risk of developing pathological organisation and focuses on the nature of the bond with the adult carer.

Given the complexity of psychological phenomena when assessing severely affected children, the views of various professionals and discussion in order to integrate the concepts necessary for a full understanding of the phenomenon observed are essential. In the Instituto de Ortopedia Infantil Roosevelt [Roosevelt Institute of Paediatric Orthopaedics], an interdisciplinary mechanism, the “Juntas de autismo y psicosis infantiles-sesiones de observación” [autism and child psychosis committees—observation sessions] has been formed over time, made up of professionals in child psychiatry, psychology, neuropsychology, occupational therapy and speech therapy, in order to diagnose and propose therapeutic approaches for these patients. This mechanism has been used to form, corroborate, specify or rule out diagnoses of autism, child psychosis and severe psychological disorders linked to alterations in early interactions and somatic conditions.

In Colombia, there are no published studies to date and, to our knowledge, no studies currently being conducted that assess inter-rater reliability in similar clinical scenarios or diagnostic medical committees for child psychosis. Likewise, a systematic review of the literature revealed no such reliability studies with a psychoanalytical basis. The Universidad Javeriana's line of research in child psychosis is a result of the need to transmit and bring to light the experiences of the current team in observation sessions and the importance of discussion to reach a diagnosis that will change the child's life and have major prognostic implications on their quality of life and environment. In the psychodynamic model, qualitative research predominates. This study is a first attempt at a quantitative approximation that will enable a dialogue between professionals with different focuses in child mental health, in a common language.


This was a diagnostic test study to assess inter-rater reliability of diagnosis and personality organisation through videos of patients with suspected autism or child psychosis. With this aim, the team designed and applied an instrument based on psychodynamic concepts, both to enable a diagnostic approximation to be made and to standardise the variables taken into account when assessing patients.

Design and application of the instrument

The assessors and authors of this article were familiarised with the methodology used in the observation sessions conducted at the Instituto de Ortopedia Infantil Roosevelt. Taking this past experience as a reference,1 the authors established by consensus the theoretical considerations for the team to take into account in the diagnostic approach.2–4 Thus, six domains were established (positions according to M. Klein, self or self image, transitionality, object relations/attachment, transference/countertransference and anxiety), and from these, eight questions were formulated that would compose the questionnaire, corresponding to the inquiries made by each of the observers in order to locate their patient in one of the organisations5 (major category of axis I) according to CFTMEA-R 2000.6

Selecting and editing the videos

The observation sessions at the Instituto de Ortopedia Infantil Roosevelt are made up of two parts: the first is clinical and has three phases, the child's interaction with the parents or carers, then the parents’ departure and the child's response to this, and finally the one-to-one interaction of the child with a member of the observing team and the end of the session. The second part is the discussion allowing clinical and psychopathological reflection among the team members (Fig. 1).

Fig. 1. Diagram of observation sessions.


The institute has a database of videos corresponding to observation sessions from April 2013; on the study's start date there were 48 videos. Before beginning the session, the parents and/or carers are informed that a video of the session will be recorded and used for clinical purposes, research studies and teaching, always respecting the confidentiality of the patient's identity and personal data. The informed consent of the parents or legal guardians was obtained at the time of the assessment for all videos used in this study.

A total of 10 videos were selected by convenience sampling. One collaborator, a resident in psychiatry, performed the selection so that none of the observers would be familiar with the audiovisual material or medical history prior to the assessment. The collaborator excluded videos in which the usual structure of the observation sessions was not followed because another modality, known as a therapeutic consultation, was used due to the needs of the patients or their family.36–39 The first 10 videos making up the initial study sample were selected in this manner.

The videos had a duration of approximately 2 h, so it was decided, by consensus among the researchers, to edit them so that each item to be observed had a duration of 20 min, distributed as follows: 5 min from the patient's entry, 5 min before the parents’ departure, the 5 min following this and, finally, the 5 min prior to the end of the first part of the session. This editing was performed by the psychiatry resident.
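
The editing rule described above can be sketched as a small helper that returns the four 5-minute windows kept from one recording. This is only an illustration: it assumes the three event times (patient's entry, parents' departure, end of the first part) were annotated by hand in minutes from the start of the recording, and both the function name `clip_windows` and the example timestamps are hypothetical, not taken from the study.

```python
def clip_windows(entry, departure, end_first_part, length=5):
    """Return the four (start, end) windows, in minutes, kept from one recording.

    The three event times are assumed to be hand-annotated minutes from the
    start of the recording (hypothetical inputs, not part of the study data).
    """
    windows = [
        (entry, entry + length),                    # from the patient's entry
        (departure - length, departure),            # before the parents' departure
        (departure, departure + length),            # following the departure
        (end_first_part - length, end_first_part),  # before the end of the first part
    ]
    # The rule only yields 20 distinct minutes if consecutive windows do not overlap.
    for (_, end_a), (start_b, _) in zip(windows, windows[1:]):
        if end_a > start_b:
            raise ValueError("windows overlap: session too short for this rule")
    return windows


# Hypothetical session: entry at minute 2, departure at minute 40,
# first part ending at minute 90.
example = clip_windows(entry=2, departure=40, end_first_part=90)
total_minutes = sum(end - start for start, end in example)
print(example, total_minutes)
```

The overlap check makes explicit a condition the rule relies on implicitly: a very short session could not supply four separate 5-minute segments.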

Viewing the videos

With the material ready for analysis, the group of five researchers met on five occasions to view the videos and ensure that the variables relating to timing, the projection method of the video and sound were homogeneous for all members of the team. The maximum number of videos observed per session was three, in order to reduce fatigue and emotional burden, which could bias the assessment. Each of the researchers applied the instrument individually immediately after viewing each video. The appraisals of each could only be shared once all instruments had been completed. Each observer was assigned a number, which remained the same throughout the study, so that the statistician was blinded when analysing the data.

Compiling and analysing the data obtained

A database with the information collected was created in Excel. To facilitate the subsequent analysis, the information was differentiated by question and each response option was assigned a category. The kappa statistic was calculated for each question and for the organisation. The kappa statistic used allowed reliability to be assessed for k categories and n raters; for questions 7 and 8 (Table 1), for which multiple responses were allowed, reliability was calculated for each category (see “Results”). The statistical analysis was performed using Stata version 13 (StataCorp.; College Station, Texas, USA).
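
A kappa for several raters and several categories can be sketched as follows. This is a minimal illustration that assumes the Fleiss formulation; the article only says that the statistic handled k categories and n raters and does not name the exact variant computed in Stata. The function name `fleiss_kappa` and the toy ratings are hypothetical and are not the study's data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each a list of labels (one per rater).

    Assumes every subject was rated by the same number of raters.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Agreement within each subject: proportion of agreeing rater pairs.
    per_subject = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall share of each category.
    shares = [
        sum(c[cat] for c in counts) / (n_subjects * n_raters) for cat in categories
    ]
    p_expected = sum(share ** 2 for share in shares)

    if p_expected == 1.0:
        return float("nan")  # only one category ever used: kappa is undefined
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data (hypothetical): 4 videos, 5 raters, one organisation label each.
toy = [
    ["psychotic"] * 5,
    ["psychotic", "psychotic", "psychotic", "borderline", "psychotic"],
    ["neurotic", "borderline", "neurotic", "neurotic", "borderline"],
    ["borderline", "psychotic", "neurotic", "borderline", "borderline"],
]
print(round(fleiss_kappa(toy), 2))
```

For questions that allow multiple responses, the same function could be applied once per category by recoding each rating as selected or not selected, which matches the per-category reporting described above.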

Table 1. Assessment instrument. Psychosis: inter-rater reliability.

Id.  Item assessed  Question  Response options
1  Positions according to M. Klein7,8  Which position did the patient adopt the majority of the time?  Paranoid-schizoid9; Early depressive position10; Established depressive position10; Not clear from what I have seen
2  Self  Does the patient feel like a whole person?  Yes
3  Transitionality11–16  Does the child have the capacity to play?  Yes
4  Object relations17–22  Do you consider the child to be relating to:  Whole objects; Part objects; Not relating
5  Transference/countertransference23,24  Do you consider the parents to be whole people to the child?  Yes
6  Transference/countertransference23,24  Does the child relate to the assessor as a whole person?25,26,20,27–30  Yes
7  Anxiety31,32  What type of anxiety is caused? (you can choose more than one response)  Psychotic (non-integration, fusion/diffusion, persecutory, invasion, falling, annihilation); Depressive or precastration (“I was bad to the breast”, separation); Neurotic33 (castration)
8  Anxiety  Who feels the anxiety? (you can choose more than one response)  The patient; Family member(s)
*  Organisation6,34  What organisation do you consider the child to have?  Psychotic; Category 5 (mental deficiency) as principal; Consequence of alteration of the mother–baby dyad

In 100% of cases where the rater selected “Paranoid-schizoid position” as the response to question 1, they concluded that the child's psychological organisation (according to CFTMEA-R 2000) was psychotic.

In 100% of cases where the rater selected “Depressive position” as the response to question 1, they concluded that the child's psychological organisation was neurotic.

When the option “Established depressive position” was selected, this corresponded to a neurotic organisation in 100% of cases.

Responses to questions 2, 5 and 6 coincided in 90% of cases, regardless of whether they were positive or negative. This indicates that, if the child has a clear self-image, they also recognise others as whole people.

In this vein, a negative response to question 2 was accompanied by a response of psychotic anxiety to question 7 in 100% of cases. A child who does not feel integrated as a whole person is overwhelmed by psychotic-type anxiety (Table 2).

Table 2. Quantitative results.

Assessment question  κ 
1. M. Klein position  0.24 
2. Patient as whole person  0.45 
3. Capacity for play  0.04 
4. The child relates to…  0.48 
5. Parents as whole people  0.46 
6. The child relates to the assessor as a whole person  0.50 
7. What type of anxiety is caused?
Psychotic anxiety  0.39 
Depressive anxiety  0.18 
Neurotic anxiety  0.11 
8. Who feels the anxiety?
Patient  0.32 
Family member  0.28 
Assessor  0.11 
Observer  0.26 
*Personality organisation  0.29 
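
One common way to read coefficients like those in Table 2 is the Landis and Koch benchmark scale. The sketch below applies that scale to the tabulated values. The article does not state which scale the authors used for their verbal labels, so the choice of bands is an assumption, and `agreement_label` is a hypothetical helper.

```python
def agreement_label(kappa):
    """Verbal label for a kappa value using the Landis and Koch bands (assumed scale)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values as reported in Table 2.
table_2 = {
    "1. M. Klein position": 0.24,
    "2. Patient as whole person": 0.45,
    "3. Capacity for play": 0.04,
    "4. The child relates to...": 0.48,
    "5. Parents as whole people": 0.46,
    "6. The child relates to the assessor as a whole person": 0.50,
    "7. Psychotic anxiety": 0.39,
    "7. Depressive anxiety": 0.18,
    "7. Neurotic anxiety": 0.11,
    "8. Patient": 0.32,
    "8. Family member": 0.28,
    "8. Assessor": 0.11,
    "8. Observer": 0.26,
    "* Personality organisation": 0.29,
}

for item, kappa in table_2.items():
    print(f"{item}: {kappa:.2f} ({agreement_label(kappa)})")
```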

As mentioned in the “Introduction”, the systematic literature search performed as part of this study found no studies of inter-rater reliability in autism or child psychosis diagnoses made using instruments with a psychoanalytical basis.

A study conducted by Mahoney et al.40 in 1998 found that inter-rater reliability using DSM-IV criteria varied significantly between categories and between diagnostic subtypes. The level of reliability was established with the kappa value (κ=0.67) among the raters, with a reliability of 91% for differentiating patients with pervasive developmental disorder (PDD) from those without. In spite of this, the level of inter-rater reliability varied between the different subtypes of PDD. In 2000, Klin et al.41 published a field test of the DSM-IV diagnostic criteria for autism with the participation of 977 clinical professionals in different areas and with different levels of training. They assessed 131 cases in 13 locations in North America, 4 in Europe and 4 in the Middle East, Asia and Oceania. They found inter-rater reliability for all raters in the clinical diagnosis of autism versus no PDD (κ=0.95) and autism versus other PDDs (κ=0.81), while the level of reliability was merely good when comparing autism and non-autism PDDs (κ=0.65). On the other hand, this study found that there were variations in the levels of reliability when comparing diagnoses made using the DSM-IV criteria between experienced and inexperienced clinicians. While the level of reliability reached κ=0.84 and an observed reliability of 98% among experienced clinicians, among inexperienced clinicians these figures were κ=0.59 and 80%. It is also worth noting that, on examining the results obtained, it was found that experienced clinicians’ diagnoses made without using the DSM-IV criteria had a better level of inter-rater reliability than when using said criteria (κ=0.94).
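
The gap between percentage agreement and kappa in these studies can be illustrated with a short calculation. The sketch assumes the two-rater, Cohen-style definition κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance; the cited studies may have used a different variant, so the output is illustrative only. The helper `implied_chance_agreement` is hypothetical.

```python
def implied_chance_agreement(observed, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e (Cohen-style definition assumed)."""
    if kappa >= 1:
        raise ValueError("kappa must be below 1 to solve for chance agreement")
    return (observed - kappa) / (1 - kappa)


# (observed agreement, kappa) pairs as quoted in the paragraph above.
quoted = {
    "PDD vs no PDD": (0.91, 0.67),
    "experienced clinicians": (0.98, 0.84),
    "inexperienced clinicians": (0.80, 0.59),
}

for label, (p_o, kappa) in quoted.items():
    p_e = implied_chance_agreement(p_o, kappa)
    print(f"{label}: observed {p_o:.2f}, kappa {kappa:.2f}, implied chance {p_e:.2f}")
```

Under this assumption, a high percentage agreement can coexist with a more modest kappa whenever a large share of the agreement would be expected by chance alone.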

Subsequently, with the development of the DSM-V diagnostic classification, in 2012 Huertas et al.42 conducted a study assessing the specificity and sensitivity of the DSM-V diagnostic criteria for autism spectrum disorders (ASD) and the DSM-IV criteria for PDD. Although this is not a study designed specifically to detect inter-rater reliability, it is designed in such a way that diagnoses made by experienced clinicians (psychologists and psychiatrists) according to the DSM-IV criteria were assessed by the authors in view of the DSM-V criteria, finding that 91% of patients with a PDD diagnosis under DSM-IV were identified as patients with ASD according to the DSM-V criteria. There are many studies such as the one just cited, conducted to establish the sensitivity and specificity of the diagnostic criteria proposed for ASD in the DSM-V; however, no studies designed specifically to determine inter-rater reliability in the diagnosis of ASD were found.

Although the criteria used to define a diagnosis were not specified, the study conducted by Van Daalen et al.43 in 2009, which assessed inter-rater reliability and stability of the diagnosis of ASD in children aged between 14 and 42 months, was included. This study found a level of inter-rater reliability for ASD of κ=0.74 and a reliability of 87% among psychiatrist raters. In spite of these high levels of reliability, these diagnoses were made using instruments such as the Autism Diagnostic Observation Schedule (ADOS) and the Autism Diagnostic Interview (ADI) but did not use a set diagnostic classification.

Finally, Garrido44 states that the work of Manzano has included various studies that seek to determine the reliability of the structural psychopathological assessment in terms of the CFTMEA6 using an instrument derived from Anna Freud's psychological profile. Garrido states that these studies have obtained an inter-rater reliability of between 85% and 95%; however, it was not possible to access the study described or the methodology used, in spite of contacting the author.

As a research team, we are of the opinion that the studies reviewed should be given consideration, but cannot be compared with this study, as its theoretical basis is different.

Having compiled the data, the team met for the discussion and qualitative analysis of the information obtained from the instruments. Among the points in common for all raters was the tendency of the team to always attempt to evoke the child's healthiest moments when responding to questions, even though they were instructed that the response should reflect what occurred for the majority of the time. We consider only viewing 20 min of each video to be a limitation, as it eliminated key moments for the formulation of the diagnosis.

The team were more likely to have the same responses when applying the instrument to patients with psychotic organisation, which raises the question of whether this is due to the severity of this diagnosis and the more florid and constant symptoms, which facilitate understanding via video.

Regarding the interpretation of the quantitative results, the kappa values obtained tend to be low. The team is of the view that this may be because diagnosing such a complex clinical phenomenon through a video is very different from the in-person experience of a committee. When viewing a video, countertransference does not act with the same magnitude, there is no possibility of feedback or dialogue with the patient and their family, the clinician does not participate in all of the psychological manoeuvres that take place in a diagnostic session, and they cannot complement their view of the child with that of other assessors. The committee format avoids diagnoses being guided by each observer's blind spots.

Seeking to further justify the point above, the team considered it necessary to compare the results obtained in this study with the clinical diagnosis given by the observation session committee, which serves as a reference standard for the diagnosis. On performing this comparison, it was found that the study results matched those of the committee for patients with nuclear autism. However, in patients with autistic-type functioning in response to severe anaclitic-type depression (autistic functioning but not organisation), the inter-rater reliability was low and there was some discrepancy with the diagnoses made by the committee. We believe this to have occurred because very early depression truncates development, producing secondary autistic traits.

Another diagnosis that generated lower Kappa values and greater discrepancy with the committee's diagnosis was borderline personality disorder. In this disorder, both psychotic and neurotic elements coexist, so the results of individual assessment can range from one extreme to the other.

This study looked at reliability among the individual criteria of raters. All observers were child psychiatrists and used an instrument that allowed some responses to the lines of questioning formulated by each rater to be captured objectively. It is not a scale and does not replace each rater's complex diagnostic process. The question of whether low reliability can also be attributed to a fault in the instrument designed remains unresolved. The finding of low inter-rater reliability confirms the need for a committee to diagnose children with suspected autism and child psychosis. The 100% reliability achieved in committees (reference standard) is the result of discussion among the members of an interdisciplinary team (not only psychiatrists). It is indispensable to underline the fact that discrepancies among individual criteria are welcome, and are the main argument for seeking a broad and comprehensive view of the various aspects of psychological functioning that can be present in a developing subject living in a particular context. Thus, we propose personalised treatment that takes into account both the diagnosis and the specific characteristics of the individual and their family.


The diagnosis of a child with complex problems affecting various aspects of their personality must be made by a multidisciplinary team. Nuclear clinical approaches (fragmented, without integration) favour partial comprehension, while a group mechanism allows each to fill in the gaps, complement and shed light on individual blind spots.

This makes evident the importance of round-table discussion of each team member's diagnostic considerations for understanding the patient's psychological functioning, considering both the pathological and healthy aspects that make up their organisation.

In-person experience of transference/countertransference is essential to the diagnosis made by clinicians with a psychodynamic focus; it therefore played a fundamental role in the design of the instrument in this study. Nevertheless, each of the assessors expressed the difference in these phenomena as assessed via a video or in person. With the results of this study we can confirm that the observation of a patient using video can lead to a correct diagnosis in cases of psychotic organisation, but the same is not true of borderline-type organisation or severe depressive syndromes. In the Colombian clinical reality, this result supports the possibility of remote consultations (telemedicine) with health professionals to identify which patients need to be seen by a committee and which do not.

This study gives rise to a line of research for new studies: one to compare inter-rater reliability during in-person and video sessions, a second in which the same group of peers assesses the same videos several years later, and another to compare the diagnostic sensitivity of the CFTMEA 2012 and DSM-V.

Ethical responsibilities

Protection of people and animals

The authors declare that no experiments were performed on humans or animals for this research.

Data confidentiality

The authors declare that they have followed the protocols implemented in their place of work regarding the publication of patient data.

Right to privacy and informed consent

The authors declare that no patient data appear in this article.

Conflicts of interest

The authors have no conflicts of interest to declare.


To Dr Liliana Betancourt, Child Psychiatrist, Instituto de Ortopedia Infantil Roosevelt. To Dr Juan Omar Carrillo, Clinician, Resident in General Psychiatry, Pontificia Universidad Javeriana. To Dr Diana Carolina Poveda, Clinical Psychologist, Instituto de Ortopedia Infantil Roosevelt. To Dr Fabian Gil, Statistician, Master in Biostatistics, Associate Lecturer, Department of Clinical Epidemiology and Biostatistics, Pontificia Universidad Javeriana. To Dr Nathalie Tamayo, Psychiatric Clinician, Specialist in Liaison Psychiatry, student of the Master in Clinical Epidemiology, Pontificia Universidad Javeriana. To Dr Carlos Gómez-Restrepo, Psychiatric Clinician, Specialist in Liaison Psychiatry, Psychoanalyst, MSc in Clinical Epidemiology, Tenured Professor and Director of the Department of Clinical Epidemiology and Biostatistics, Pontificia Universidad Javeriana.

