International Journal of Clinical and Health Psychology
Vol. 16, No. 2, Pages 201-210 (May - August 2016)
Original article
Open Access
Criteria-Based Content Analysis (CBCA) reality criteria in adults: A meta-analytic review
Criterios de realidad del Criteria-Based Content Analysis (CBCA) en adultos: una revisión meta-analítica
Bárbara G. Amado (a), Ramón Arce (a,*), Francisca Fariña (b), Manuel Vilariño (a)
* Corresponding author: Departamento de Psicoloxía Organizacional, Xurídico-Forense e Metodoloxía das Ciencias do Comportamento, Facultade de Psicoloxía, Campus Vida, s/n, 15782 Santiago de Compostela, Spain.
(a) Universidade de Santiago de Compostela, Spain
(b) Universidade de Vigo, Spain

Abstract

Background/Objective: Criteria-Based Content Analysis (CBCA) is the most extensively used tool worldwide for evaluating the veracity of a testimony. CBCA, initially designed for evaluating the testimonies of victims of child sexual abuse, has been empirically validated. Moreover, its use has been generalized to adult populations and other contexts, though this generalization has not been endorsed by the scientific literature. Method: Thus, a meta-analysis was performed to assess the Undeutsch Hypothesis and the CBCA checklist of criteria in discriminating, in adults, between memories of self-experienced real-life events and fabricated or fictitious memories. Results: Though the results corroborated the Undeutsch Hypothesis and validated CBCA as a technique, they were not generalizable, and the self-deprecation and pardoning-the-perpetrator criteria failed to discriminate between the two types of memories. The technique can be complemented with additional reality criteria. The study of moderators revealed that discriminating efficacy was significantly higher in field studies on sexual offences and intimate partner violence. Conclusions: The findings are discussed in terms of their implications, as well as the limitations and conditions for applying these results to forensic settings.

Keywords: Criteria-Based Content Analysis


The credibility of a testimony, primarily the victim's and particularly in relation to crimes committed in private (e.g., sexual offences, domestic violence), is the key element determining legal judgements (Novo & Seijo, 2010), affecting an estimated 85% of cases worldwide (Hans & Vidmar, 1986). Though an array of tools for evaluating credibility has been designed and tested (Vrij, 2008), Criteria-Based Content Analysis [CBCA] (Steller & Köhnken, 1989) remains the technique of choice: it enjoys wide acceptance among the scientific community (Amado, Arce, & Fariña, 2015) and is admissible as valid evidence in the law courts of several countries (Steller & Böhm, 2006; Vrij, 2008). Though the technique was initially designed to be applied to the testimony of victims of child sexual abuse, its application has been extended to adults, witnesses, offenders, and other case types by Forensic Psychology Institutes in judicial proceedings (Arce & Fariña, 2012). The meta-analysis of Amado et al. (2015) found that the technique, underpinned by the Undeutsch Hypothesis (Undeutsch, 1967), which contends that memories of self-experienced events differ in content and quality from memories of fabricated or fictitious accounts, was equally valid in other contexts and in age ranges up to 18 years. Prior to the present review, empirical studies had already contrasted the validity of the Hypothesis in adult populations and in different contexts (Vrij, 2005, 2008). Moreover, as the Hypothesis is grounded in memory content, it had been theoretically advanced that it would be equally applicable to adults and to contexts other than sexual abuse (Berliner & Conte, 1993).

CBCA consists of 19 reality criteria grouped into two factors: cognitive (criteria 1 to 13) and motivational (criteria 14 to 18). According to the original formulation, both factors are underpinned by the Undeutsch Hypothesis, but Raskin, Esplin, and Horowitz (1991) have underscored that only 14 of the criteria conform to the aforementioned Hypothesis (the 14-criteria version).

CBCA has been complemented with additional categories, some applicable to all contexts (Table 1) (Höfer, Köhnken, Hanewinkel, & Bruhn, 1993) and others designed for specific case types (Arce & Fariña, 2009; Juárez, Mateu, & Sala, 2007; Volbert & Steller, 2014), and it may be combined with other techniques with different theoretical underpinnings, such as memory attributes (Vrij, 2008).

Table 1.

Additional criteria.

• Reporting style (long-winded: the interviewee describes irrelevant aspects that were not asked about). 
• Display of insecurities (uncertainty about the description of an item). 
• Providing reasons for lack of memory (expressing reasons for not being able to give a detailed description). 
• Clichés (expressions or utterances that introduce delays into the report). 
• Repetitions (elements already described are repeated without additional details). 

CBCA is extensively used in forensic practice as a tool for discriminating between adults' memories of self-experienced and fabricated events in different case types. However, owing to the numerous inconsistencies in the literature (e.g., designs failing to meet the requirements for applying CBCA; conclusions of non-significant effects not substantiated by the data, given the poor statistical power of the studies, 1-β<.80) and to the contradictory use of CBCA with adults, a meta-analysis was performed to assess: the Undeutsch Hypothesis in an adult population; the discriminating efficacy of CBCA and of additional reality criteria; and the effects of context (case type), lie coaching, witness status, and research paradigm.

Method

Literature search

An extensive scientific literature search was undertaken to identify empirical studies applying content analysis to adult testimony in order to discriminate between self-experienced and fabricated statements, be they deliberately invented or implanted memories. The literature search consisted of a multimethod approach to meta-search engines (Google, Google Scholar, Yahoo); world leading scientific databases (PsycInfo, MedLine, Web of Science, Dissertation Abstracts International); academic social networks for the exchange of knowledge in the scientific community (i.e., Researchgate, Academia.edu); ancestry approach (crosschecking the bibliography of the selected studies); and contacting researchers to request unpublished studies mentioned in published studies. A list of descriptors was generated for successive approximations (i.e., the descriptors of the keywords in the selected articles were included): reality criteria, content analysis, verbal cues, verbal indicators, testimony, CBCA, Criteria Based Content Analysis, credibility, adult, statement, allegation, deception, detection, lie detection, truthful account, Statement Validity Assessment, SVA. These descriptors were used to formulate the search algorithms applied to the literature search.

Inclusion and exclusion criteria

Though reality criteria are mainly applied in judicial contexts to assess whether a victim's testimony may be admitted as valid evidence, a review of the literature reveals they have also been applied to witnesses and offenders, so both populations were included, as the studies were numerically sufficient for performing a meta-analysis. The concept of adulthood in the judicial context is associated with being 18 years of age, and the vast majority of studies endorsed this legal age; notwithstanding, in a few studies the legal age was set at 17. Since this difference in age has no effect on the capacity to give testimony, either on cognitive or on legal grounds, the studies with 17-year-old adult populations were included. The inclusion criterion for primary studies was that the effect sizes of the reality criteria analysed for discriminating between truthful and fabricated statements were reported or, in their absence, the statistical data allowing them to be computed, including studies with errors in data analysis that nonetheless enabled the effect sizes to be computed.

The exclusion criteria were: data derived from a unit of analysis other than the statement; and two CBCA criteria combined into one new criterion (failing the mutual-exclusion requirement for creating methodical categorical systems). As for the additional criteria, data that were not formulated as additional to CBCA, or that were specific to only one context, were excluded. Likewise, duplicate publications of data were eliminated, but not piecemeal publications (independent data).

Finally, 39 primary studies fulfilling the inclusion and exclusion criteria were selected. The total CBCA score was calculated using 31 effect sizes, whereas for the individual criteria the number of effect sizes ranged from 5, for criteria 10 and 19, to 35, for criteria 3 and 8.


Procedure

The procedure observed the stages in meta-analysis proposed by Botella and Gambara (2006). Having performed the literature search and selected the studies for the present meta-analysis, these were coded according to variables that had been found to have a moderating role in previous studies (Fariña, Arce, & Real, 1994; Höfer et al., 1993; Raskin et al., 1991; Volbert & Steller, 2014; Vrij, 2005) and in a previous meta-analysis with a child population (Amado et al., 2015); the research paradigm (field vs. experimental studies) under US case law (Daubert v. Merrell Dow Pharmaceuticals, 1993); compliance with the Daubert standard publication criterion (DSPC), i.e., publication in peer-reviewed journals for evidence to be admitted as scientifically valid legal evidence; the lie coaching condition; and the version of the categorical system (full reality criteria vs. the 14-criteria version). Having applied a procedure of successive approximations to the coding of the primary studies (Fariña, Arce, & Novo, 2002), the following moderators were detected: status of the declarant, i.e., testimony target (victim, offender, or witness); event target (self-experienced events or video-observed events/witness); and judicial context, i.e., case type.

As some researchers had renamed the original criteria (Steller & Köhnken, 1989), a Thurstone-style evaluation was used, in which 10 judges rated the degree of overlap between each original criterion and its reformulation. When the interval between Q1 and Q3 fell within the region of independence, the criterion was considered additional, whereas when it fell in the region of dependence on the original, it was coded as the original criterion.

The coding of the studies and moderators, carried out by two independent researchers, showed complete agreement (kappa=1).

Data analysis

The effect sizes were taken directly from the primary studies when these were disclosed, or the effect size d was computed from the means and standard deviations/standard errors of the mean (Cohen's d when N1=N2 and Glass's Δ when N1≠N2), the t value, or the F value. When the results were expressed as proportions, the effect size δ (Hedges & Olkin, 1985), equivalent to Cohen's d, was used, whereas when they were expressed in 2×2 contingency tables, the phi obtained was transformed into Cohen's d.
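The effect-size conversions described above can be sketched as follows (a minimal illustration using standard formulas; the function names are ours, not the authors'):

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    # Standardized mean difference with a pooled SD (used when N1 = N2)
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

def glass_delta(m1, m2, sd_control):
    # Glass's delta: standardize by one group's SD (used when N1 != N2)
    return (m1 - m2) / sd_control

def d_from_t(t, n1, n2):
    # d recovered from an independent-samples t value
    return t * math.sqrt(1 / n1 + 1 / n2)

def d_from_phi(phi):
    # phi from a 2x2 contingency table converted to d
    return 2 * phi / math.sqrt(1 - phi**2)
```

For example, `d_from_t(2.0, 100, 100)` gives roughly 0.28, and a phi of .30 converts to a d of about 0.63.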

The meta-analysis was performed in accordance with the procedure of Hunter and Schmidt (2015): the unit of analysis (n) was the number of statements; the effect sizes were weighted by sample size, i.e., the number of statements (dw); and the effect sizes were corrected for criterion reliability (δ).
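A bare-bones sketch of these two steps (weighting by sample size, then correcting for criterion unreliability); the numbers in the usage lines are toy values, not data from the meta-analysis:

```python
import math

def weighted_mean_d(ds, ns):
    # Hunter-Schmidt bare-bones step: weight each effect size by its sample size
    return sum(d * n for d, n in zip(ds, ns)) / sum(ns)

def correct_for_reliability(dw, ryy):
    # Attenuation correction: divide the weighted mean d by sqrt(criterion reliability)
    return dw / math.sqrt(ryy)

# Toy example: two studies, then correction with the average criterion reliability (.61)
dw = weighted_mean_d([0.5, 0.7], [100, 300])     # -> 0.65
delta = correct_for_reliability(dw, 0.61)        # corrected (true) effect size
```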

The differences between effect sizes were estimated using the difference between correlations (q statistic; Cohen, 1988), obtained by transforming the effect sizes into correlations. In the study of moderators, the average of the criteria for each moderator was computed.
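The q statistic can be computed by converting each d to a correlation and taking the difference of the Fisher z-transforms (the conventional formulas; the implementation details are ours). As a check, applying it to the effect sizes for all field studies (0.45) versus sexual offence/IPV field studies (0.87) reported in the Results reproduces q ≈ .199:

```python
import math

def d_to_r(d):
    # Convert d to a point-biserial-type correlation (equal-n conversion)
    return d / math.sqrt(d**2 + 4)

def cohens_q(d1, d2):
    # Difference between two effect sizes via Fisher's z-transformed correlations
    z1 = math.atanh(d_to_r(d1))
    z2 = math.atanh(d_to_r(d2))
    return abs(z1 - z2)

# All field studies (0.45) vs. sexual offence/IPV field studies (0.87)
q = cohens_q(0.45, 0.87)  # ~= .199
```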

In order to estimate the practical utility of the results of the meta-analysis in forensic settings, three recommended statistics were employed (Amado et al., 2015): U1, the Binomial Effect Size Display (BESD), and the Probability of Superiority (PS).
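A sketch of the three utility statistics under normality assumptions, using the conventional formulas (the helper names are ours). For instance, with the corrected effect size δ = 0.96 reported later for significant criteria in sexual offence/IPV field studies, these reproduce U1 ≈ .54, PS ≈ .75, and a BESD false-positive rate of about .28:

```python
import math

def phi_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def probability_of_superiority(d):
    # PS: chance that a random truthful statement scores above a random fabricated one
    return phi_cdf(d / math.sqrt(2))

def cohens_u1(d):
    # U1: proportion of non-overlap between the two distributions
    p = phi_cdf(abs(d) / 2)
    return (2 * p - 1) / p

def besd_false_positive_rate(d):
    # BESD: success rates of .50 +/- r/2; the lower cell is the false-positive rate
    r = d / math.sqrt(d**2 + 4)
    return 0.5 - r / 2

d = 0.96
# probability_of_superiority(d) ~= .75
# cohens_u1(d)                  ~= .54
# besd_false_positive_rate(d)   ~= .28
```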

Criterion reliability

Not all of the primary studies provided data on inter-rater reliability or agreement for the reality criteria and for the total CBCA score. Moreover, the reported reliability coefficients varied among studies and, in some studies, several were reported, in which case those approximating the results obtained by Anson, Golding, and Gully (1993) and Horowitz et al. (1997) were taken. Owing to the lack of data on coding reliability in studies of specific criteria, average reliability was estimated for the criteria and for the total CBCA score, bearing in mind that the reliability of the criteria differs from that of the instrument (Horowitz et al., 1997). Reliability was estimated on the basis of reliability coefficients, since agreement indexes do not measure reliability. Thus, on the basis of 172 reliability coefficients for CBCA criteria in the primary studies, the average reliability of the CBCA reality criteria was r=.61 (SEM=0.020, 95% CI=0.57, 0.65), and for the total CBCA score the Spearman-Brown prediction formula yielded r=.97. Moreover, the average reliability of the proposed additional reality criteria, calculated from 7 reliability coefficients, was r=.74 (SEM=0.041, 95% CI=0.66, 0.82). The low average reliability observed has sometimes been considered a methodological weakness of the system. Nevertheless, this potential deficiency is corrected for criterion unreliability in Hunter and Schmidt's (2015) meta-analytical procedure.
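The Spearman-Brown step can be verified directly: stepping the average criterion reliability of .61 up to a 19-criterion composite yields the reported total-score reliability of .97 (a minimal check; the composite length of 19 is our reading of the full criteria set):

```python
def spearman_brown(r_item, k):
    # Reliability of a k-item composite from the average single-item reliability
    return k * r_item / (1 + (k - 1) * r_item)

total_score_reliability = spearman_brown(0.61, 19)  # ~= .97
```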

Results

Study of outliers

An analysis and initial control of outliers was carried out for each of the reality criteria, the total CBCA score, and the conditions. The criterion chosen was ±3*IQR (extreme cases) around the sample-size-weighted mean effect size, given that more conservative criteria such as ±1.5*IQR or ±2SD eliminated more than 10% of the effect sizes, indicating that these were more probably moderators than outliers (Tukey, 1960).
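The outlier screen can be illustrated as follows (a simplified sketch: quartiles are taken crudely from the sorted effect sizes, whereas the original analysis may have computed them differently):

```python
def extreme_outliers(effect_sizes):
    # Flag effect sizes outside the +/- 3*IQR fences (Tukey's extreme-case rule)
    xs = sorted(effect_sizes)
    n = len(xs)
    q1, q3 = xs[n // 4], xs[(3 * n) // 4]  # crude quartile positions
    iqr = q3 - q1
    lower, upper = q1 - 3 * iqr, q3 + 3 * iqr
    return [d for d in effect_sizes if d < lower or d > upper]

# A study with d = 5.0 amid typical effect sizes is flagged as an extreme case
flagged = extreme_outliers([0.1, 0.2, 0.3, 0.4, 5.0])  # -> [5.0]
```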

Global meta-analysis of reality criteria in adults

The results (Table 2) show a positive (between criteria presence and statement reality) and significant (the confidence interval did not contain zero) mean true effect size (δ) for the CBCA reality criteria, with the exception of the 'self-deprecation' and 'pardoning the perpetrator' criteria, and for the total CBCA score. Nevertheless, these results are not generalizable to future samples (the credibility interval contained zero, indicating the effect size was not generalizable to 80% of other samples); moreover, criteria 10 and 19 were affected by a second-order sampling error, so the results were invalid for this estimate. For the additional criteria (Höfer et al., 1993), the meta-analysis revealed a positive, significant, and generalizable mean true effect size for the 'reporting style' and 'display of insecurities' criteria. The mean true effect size for the 'repetitions' criterion was negative and significant, confirming it was not a reality criterion, but it was not generalizable. As for the 'providing reasons for lack of memory' and 'clichés' criteria, a non-significant mean true effect size was found. The 'repetitions' and 'clichés' criteria were related to fabricated events, that is, they were not reality criteria in themselves, so they were not included in successive analyses. The 75% rule and the credibility interval (Hunter & Schmidt, 2015) warranted the study of moderators.

Table 2.

Results of global meta-analysis for individual reality criteria and total CBCA score, and additional criteria.

CBCA Criterion  k  n  dw  SDd  SDpre  SDres  δ  SDδ  %Var  95% CId  80% CVδ 
1. Logical structure  30  2,265  0.48  0.6990  0.2503  0.6527  0.62  0.8493  13  0.40, 0.56  −0.46, 1.71 
2. Unstructured production  27  1,987  0.53  0.9241  0.2570  0.8876  0.69  1.1551  0.45, 0.61  −0.79, 2.17 
3. Quantity of details  35  2,714  0.55  0.8294  0.2529  0.7899  0.71  1.0279  0.47, 0.63  −0.60, 2.03 
4. Contextual embedding  29  2,137  0.19  0.6169  0.2372  0.5868  0.24  0.7411  15  0.11, 0.27  −0.70, 1.19 
5. Description of interactions  29  2,243  0.27  0.3742  0.2349  0.2912  0.36  0.3790  39  0.19, 0.35  −0.13, 0.84 
6. Reproduction conversations  34  2,528  0.34  0.4990  0.1780  0.4662  0.44  0.6067  13  0.26, 0.42  −0.33, 1.22 
7. Unexpected complications  29  1,956  0.25  0.3788  0.2498  0.2847  0.32  0.3705  43  0.17, 0.33  −0.15, 0.79 
8. Unusual details  35  2,441  0.31  0.6532  0.2489  0.6039  0.41  0.7859  14  0.23, 0.39  −0.59, 1.42 
9. Superfluous details  27  1,863  0.14  0.5676  0.2437  0.5126  0.18  0.6670  18  0.04, 0.24  −0.67, 1.04 
10. Details misunderstood  5  376  0.22  0.1208  0.2357  0.0000  0.28  0.0000  100  0.02, 0.42  0.28 
11. External associations  22  1,612  0.26  0.4781  0.2405  0.3268  0.34  0.5376  25  0.16, 0.36  −0.35, 1.02 
12. Subjective mental state  28  2,170  0.18  0.4843  0.2312  0.4256  0.23  0.5538  23  0.10, 0.26  −0.47, 0.94 
13. Perpetrator's mental state  31  2,232  0.09  0.6212  0.2376  0.5741  0.11  0.7470  15  0.01, 0.17  −0.84, 1.07 
14. Spontaneous corrections  29  1,842  0.16  0.5276  0.2545  0.4622  0.20  0.6014  23  0.06, 0.26  −0.56, 0.97 
15. Admitting lack of memory  34  2,305  0.25  0.3823  0.2494  0.2897  0.32  0.3770  42  0.17, 0.33  −0.16, 0.80 
16. Doubts one's testimony  26  1,755  0.20  0.4521  0.2478  0.3781  0.26  0.4919  30  0.10, 0.30  −0.37, 0.89 
17. Self-deprecation  13  948  0.04  0.4629  0.2354  0.3985  0.05  0.5186  26  −0.08, 0.16  −0.61, 0.71 
18. Pardoning the perpetrator  680  −0.02  0.2796  0.2178  0.1753  −0.02  0.2281  61  −0.18, 0.14  −0.31, 0.27 
19. Details characteristic of the offence  5  562  0.28  0.1894  0.1966  0.0000  0.36  0.0000  100  0.12, 0.44  0.36 
TOTAL CBCA SCORE  31  2,124  0.55  0.6759  0.2475  0.6290  0.56  0.6386  13  0.47, 0.63  −0.25, 1.37 
Average (original criteria)  46  3,223  0.25  0.5032  0.2368  0.4269  0.33  0.5614  32  0.17, 0.33  −0.39, 1.05 
Additional Criteria
Reporting style  357  0.41  0.2030  0.1874  0.0781  0.48  0.0909  85  0.20, 0.63  0.36, 0.59 
Display insecurities  297  0.67  0.5540  0.2111  0.5122  0.78  0.5965  14  0.43, 0.90  0.01, 1.54 
Providing reasons for lack of memory  447  0.15  0.2877  0.1902  0.2158  0.18  0.2514  44  −0.03, 0.33  −0.14, 0.50 
Clichés  267  −0.18  0.5145  0.2134  0.4682  −0.21  0.5452  17  −0.41, 0.05  −0.90, 0.49 
Repetitions  417  −0.47  0.5851  0.2011  0.5494  −0.54  0.6399  12  −0.67, −0.27  −1.36, 0.27 
Average (original+additional)  46  3,223  0.27  0.4821  0.2313  0.4053  0.34  0.5275  34  0.19, 0.35  −0.33, 1.01 

Note. k=number of studies; n=total sample size; dw=effect size weighted for sample size; SDd=observed standard deviation of d; SDpre=standard deviations of observed d-values corrected from all artifacts; SDres=standard deviation of observed d-values after removal of variance due to all artifacts; δ=effect size corrected for criterion reliability; SDδ=standard deviation of δ; %Var=variance accounted for by artifactual errors; 95% CId=95% confidence interval for d; 80% CVδ=80% credibility interval for δ.

Study of moderators

The study of moderators (with the criteria average as the dependent variable; Table 3) showed a positive and significant, but not generalizable, mean true effect size for all of the moderators analysed. As for the magnitude of the effect sizes, excluding the witness condition, which showed a medium effect size (δ>0.50), all were small (0.20<δ<0.50). Arce and Fariña (2009) have suggested (and implemented) the specification of categorical systems based on bottom-up rather than top-down procedures, to ensure that only categories that effectively discriminate between memories of experienced events and fabricated memories form part of the system. This maximizes the efficacy of the resulting categorical system by eliminating the noise produced by non-discriminating top-down categories. Thus, the meta-analyses were repeated with only those categories of content analysis with a significant effect size, i.e., those whose confidence interval for d did not contain zero. The results (Table 3) revealed a significant increase in the effect size of field studies, qc=.119, p<.05 (one-tailed; a larger effect size was expected with significant criteria); that is, the effect size was significantly larger with significant criteria. Moreover, for significant criteria the results became generalizable (the credibility interval did not contain zero), whereas not all of the reality criteria had been generalizable. As for the experimental studies and the remaining moderators, this increase was not corroborated, as the reality categories had been initially or subsequently screened to eliminate the non-significant ones.

Table 3.

Results of the meta-analysis of moderators.

Moderator  k  n  dw  SDd  SDpre  SDres  δ  SDδ  %Var  95% CId  80% CVδ 
CBCA significant criteria (17)  46  3,223  0.27  0.5187  0.2380  0.4433  0.36  0.5835  31  0.19, 0.35  −0.39, 1.11 
14-criteria version  45  3,143  0.28  0.5567  0.2394  0.4906  0.36  0.6465  25  0.22, 0.34  −0.47, 1.19 
Daubert standard publication criterion
All criteria (22)  35  2,256  0.20  0.4575  0.2407  0.3733  0.26  0.4786  39  0.12, 0.28  −0.35, 0.87 
Self-experienced events
All criteria (22)  34  2,277  0.26  0.4647  0.2371  0.3879  0.33  0.5022  40  0.18, 0.34  −0.31, 0.97 
Non self-experienced events (witness)
All criteria (13)  11  625  0.39  0.5835  0.2707  0.5032  0.51  0.6548  65  0.23, 0.55  −0.33, 1.35 
All criteria (21)  11  1,067  0.27  0.4662  0.2024  0.3743  0.35  0.4975  41  0.15, 0.39  −0.29, 0.99 
All criteria (18)  11  840  0.27  0.4781  0.2355  0.4012  0.35  0.5221  35  0.13, 0.41  −0.32, 1.02 
Field studies
All field studies (18)  422  0.34  0.4948  0.2385  0.4153  0.45  0.5404  35  0.14, 0.54  −0.24, 1.14 
Significant criteria (10)a  422  0.53  0.4774  0.2458  0.3834  0.69  0.4989  42  0.33, 0.73  0.05, 1.33 
Sexual and IPV field studies
All criteria (17)b  263  0.67  0.3587  0.2871  0.1957  0.87  0.2459  72  0.41, 0.92  0.55, 1.18 
Significant criteria (15)c  263  0.74  0.3654  0.2892  0.2134  0.96  0.2478  72  0.48, 0.99  0.64, 1.28 
Experimental studies
All criteria (22)  39  2,721  0.25  0.4497  0.2336  0.3934  0.32  0.4933  37  0.17, 0.33  −0.31, 0.95 



a. Significant criteria (CBCA criteria; for the additional criteria, studies were insufficient): 1-3, 5-8, 11, 12 and 19.
b. Significant criteria (CBCA criteria): 1-9, 11-18.
c. Significant criteria (CBCA criteria): 1-9, 11-12, 14-17.

The meta-analytical technique does not take into account the theoretical foundations or the reliability of the studies included, that is, all of the studies on reality categories are included. Moreover, the experimental designs of studies on witnesses are not really about witnesses of self-experienced events but about non-self-experienced, i.e., video-observed, events (watched on video, not self-experienced), which do not fit the original theory hypothesizing that reality criteria discern between memories of self-experienced real-life events and fabricated or fictitious memories. Only one of the studies on witnesses involved self-experienced events (Gödert, Gamer, Rill, & Vossel, 2005); in it, the total reality criteria were found to discriminate significantly real witnesses from real offenders giving false testimony, d=0.59, 1-β=.78, and from uninvolved participants, d=0.83, 1-β=.96. Nevertheless, reality criteria also discriminated between memories of video-observed events and fabricated events. The only study (Lee, Klaver, & Hart, 2008) comparing memories of self-experienced events (truth condition) and video-observed events (lie condition) found that the CBCA reality criteria and the total CBCA score discriminated significantly between both memories, in line with the Undeutsch Hypothesis.

The high observed variability in effect sizes in field studies, which was mostly due to one study alone, suggested differences in experimental design (the crime context in that study was found to differ from the other studies). As the effect of context has been hypothesized (Köhnken, 1996; Volbert & Steller, 2014), and found (Arce, Fariña, & Vilariño, 2010; Vilariño, Novo, & Seijo, 2011), to mediate the discriminating efficacy of reality categories, the meta-analysis was repeated for field studies on sexual offences and intimate partner violence (IPV) cases (crimes committed in the privacy of one's home, according to the categorization of Arce & Fariña, 2005). The results showed a positive, significant, and generalizable (whereas not generalizable across all field studies) mean true effect size for studies under this condition. Moreover, the magnitude of the effect sizes was significantly larger in sexual offence and IPV cases than in all field studies, both for all the reality criteria (0.45 for all field studies vs. 0.87 for sexual offence and IPV cases), qc=.199, p<.01 (one-tailed; a higher effect size was expected in specific criminal contexts), and for the significant criteria, qc=.168, p<.05 (0.69 vs. 0.96). Likewise, reality criteria were significantly more efficacious, qc=.2622, p<.01, in sexual offence and IPV cases than in all other types of cases (0.32 vs. 0.87).

Results for the comparison between the statements of participants instructed to lie (lie coaching condition) and truthful statements were inconclusive as to the effectiveness of reality criteria in discriminating between truthful and false statements (a meta-analysis could not be performed because the ks and ns were insufficient and the research designs incomparable).


Discussion

The following conclusions may be drawn from the results of this study. First, the results confirmed the Undeutsch Hypothesis, that is, reality criteria discriminated between memories of self-experienced and fabricated events [File Drawer Analysis (FDA): to bring the average effect for the CBCA criteria down to a trivial effect, .05 (McNatt, 2000), 184 studies with null effects would be required (Hunter & Schmidt, 2015), which is highly unlikely]. Besides fulfilling the DSPC, the Hypothesis was also valid for memories of victims/claimants and offenders (for witnesses of self-experienced events further research is required), and robust in both experimental studies (high internal validity) and field studies (high external validity). Notwithstanding, the reality criteria also discriminated between memories of video-observed, i.e., non-self-experienced, events and fabricated events, for which the Hypothesis was not formulated, and research findings are inconclusive as to the validity of the Hypothesis with lie-coached subjects. Second, though the results validated CBCA as a categorical system based on the Undeutsch Hypothesis, not all of the criteria were validated, nor were they generalizable, and some even contradicted the Hypothesis. Thus, these criteria can be used neither in all types of contexts nor indiscriminately. Both versions of the CBCA (all criteria or 14 criteria) were exactly equal (δ=0.36) in discriminating between memories of self-experienced and fabricated events. Though the results open the door to the inclusion of new reality criteria, some of the proposed additional criteria fail to fulfil the Undeutsch Hypothesis (significant negative effect sizes, i.e., not reality criteria), so they cannot be included in the CBCA.
Third, in field studies the discriminating power of the reality criteria was significantly higher in sexual offence and IPV cases (FDA: to bring the results in sexual offence and IPV cases down to a trivial effect, 62 and 69 studies with null effects would be required for all criteria and for the significant criteria, respectively, which is unlikely) in comparison with other types of contexts (FDA: to reduce the efficacy of the reality criteria in discriminating between real and fabricated memories in any context of field studies to a trivial effect, 35 studies with null effects would be required, which is unlikely). Succinctly, the distributions of both populations do not overlap in 54% of cases (U1=0.54); thus the efficacy of the reality criteria in discriminating between memories of self-experienced and fabricated events in sexual offence and IPV cases was total in 54% of credibility evaluations. Moreover, 75% of statements of self-experienced events contained more reality criteria than statements of fabricated events (probability of superiority, PS=0.75), and the probability of false positives was 28% (BESD). These results were highly robust, i.e., they not only establish a positive and significant relation between reality criteria and true statements, but are also generalizable to all types of sexual offence and IPV cases and homogeneous (i.e., subject to little variability, since the correlation between the effect sizes was .72).

As for the implications for forensic practice, the results of the present meta-analysis reveal that the reality criteria were statistically effective in discriminating between memories of self-experienced and fabricated events, but this does not imply that they are directly generalizable to forensic practice. Even under the best discriminating conditions, i.e., field studies in sexual offence and IPV cases, the probability of false positives may reach .22, whereas this probability must be zero in forensic settings (Arce, Fariña, & Fraga, 2000). In general, only the significant reality criteria, i.e., scientifically attested evidence, were admissible for forensic practice (see note to Table 3), since their results were generalizable, whereas those for all criteria were not. However, as the lower limit of the credibility interval was 0.05, the practical utility of these categories was almost negligible (PS = .51): in only 51% of true statements were there more reality criteria than in false statements, and under what specific conditions this contingency occurred remains unknown. By contrast, the lower limit of the credibility interval for the reality criteria applied to cases of sexual offences and IPV, which were also generalizable both for all criteria and for the significant criteria, was larger, PS = .73 and .75 (Hedges and Olkin's δ = 0.59 and 0.65, test value = .51), for all the reality criteria and the significant criteria, respectively. Nevertheless, these conclusions are not directly applicable to forensic practice, where the decision criterion must be the 'strict decision criterion', under which a type II error (classifying a false statement as true) is not admissible, i.e., must be equal to zero. Applying the strict decision criterion, Arce et al. (2010) found up to 13 CBCA reality criteria in fabricated statements in IPV cases, which means that at least 14 reality criteria would have to be detected in a statement to conclude that the testimony was true, with a correct classification of true positives (true statements classified as such) of 36%. Succinctly, the CBCA reality criteria were a poor tool for assessing the credibility of IPV victim testimony. Thus, to enhance efficacy, the CBCA reality criteria must be complemented with additional criteria. In this line, Arce and Fariña (2009), Vilariño (2010) and Vilariño et al. (2011) combined CBCA and SRA criteria, memory attributes, and additional reality criteria specific to IPV cases derived from real statements (judicial judgements as ground truth) to create and validate a categorical system specific to IPV cases, including sexual offences, with a strict decision criterion that reduced the rate of false negatives to 2%. In any case, only results obtained with a strict decision criterion can be translated into forensic practice.
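The strict decision criterion described above amounts to a simple thresholding rule: set the cut-off one criterion above the maximum ever observed in a fabricated statement, so no false statement can be classified as true, and then accept whatever loss in sensitivity follows. The score distributions below are hypothetical; only the maximum of the fabricated group (13, as in Arce et al., 2010) mirrors the text:

```python
def strict_threshold(fabricated_scores):
    # One more criterion than ever observed in a fabricated statement,
    # so no false statement is classified as true (zero type II error).
    return max(fabricated_scores) + 1

def sensitivity(truthful_scores, threshold):
    # Proportion of true statements still classified as true.
    return sum(s >= threshold for s in truthful_scores) / len(truthful_scores)

# Hypothetical numbers of CBCA reality criteria per statement.
fabricated = [5, 8, 11, 13, 9, 7]
truthful = [14, 16, 12, 18, 10, 15, 13, 11, 17, 9]

threshold = strict_threshold(fabricated)             # 14
print(threshold, sensitivity(truthful, threshold))   # 14 0.5
```

With these invented data half of the true statements fall below the cut-off, illustrating how a zero-false-positive rule trades away true positives (36% correct classification in the real data of Arce et al., 2010).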

In terms of future research, the results of the present meta-analysis underscore the need for further studies with experimental designs assessing the efficacy of the reality criteria in discriminating between memories of self-experienced events and of non-self-experienced events witnessed on video; between self-experienced witnessed events and fabricated events; and between the memories of participants coached to lie and those of honest participants; as well as research aimed at finding new reality categories (bottom-up), mainly for specific contexts, i.e., crime victimization.

This meta-analysis is subject to the following limitations. First, previous publications may have biased the results in that non-significant results or predictably inefficacious categories were eliminated (favouring the validation of the Undeutsch Hypothesis). Second, the feigning methodology (experimental studies) has no proven external validity (Sarwar, Allwood, & Innes-Ker, 2014), but only 'face validity' (Konecni & Ebbesen, 1992). Third, in part of the experimental literature the statements are insufficient material for reality content analysis (Köhnken, 2004), which favours the rejection of the Undeutsch Hypothesis. Fourth, there was no control of the effects of the interviewer on the contents of the statement, or of the reliability of the interviews, which were often carried out by poorly trained interviewers. Fifth, few studies comply with the SVA standards that are a requirement for applying CBCA. Sixth, the results of some meta-analyses may be subject to a degree of variability, given that Ns < 400 do not guarantee stability in sample estimates (Hunter & Schmidt, 2015). Seventh, the primary studies did not estimate the reliability of the codings, thus the reliability of the results is uncertain.


This research was supported by a grant from the Spanish Ministry of Economy and Competitiveness (PSI2014-53085-R).

Further reading
[Akehurst et al., 2015]
L. *Akehurst, S. Easton, E. Fullar, G. Drane, K. Kuzmin, S. Litchfield.
An evaluation of a new tool to aid judgements of credibility in the medico-legal setting.
Legal and Criminological Psychology, (2015),
[Beaulieu-Prévost, 2001]
*Beaulieu-Prévost, D. (2001). Analyse de validité de la déclaration (SVA), mensonge et faux souvenirs: Validité et efficacité chez les adultes (Doctoral dissertation). Retrieved from ProQuest Dissertations & Theses Global. (Order No. MQ60609).
[Bensi et al., 2009]
L. *Bensi, E. Gambetti, R. Nori, F. Giusberti.
Discerning truth from deception: The sincere witness profile.
European Journal of Psychology Applied to Legal Context, 1 (2009), pp. 101-121
[Biland et al., 1999]
C. *Biland, J. Py, S. Rimboud.
Evaluer la sincérité d’un témoin grâce à trois techniques d’analyse, verbales et non verbale.
European Review of Applied Psychology, 49 (1999), pp. 115-122
[Blandón-Gitlin et al., 2009]
I. *Blandón-Gitlin, K. Pezdek, D.S. Lindsay, L. Hagen.
Criteria-Based Content Analysis of true and suggested accounts of events.
Applied Cognitive Psychology, 23 (2009), pp. 901-917
[Bogaard et al., 2013]
G. *Bogaard, E.H. Meijer, A. Vrij.
Using an example statement increases information but does not increase accuracy of CBCA, RM, and SCAN.
Journal of Investigative Psychology and Offender Profiling, 11 (2013), pp. 151-163
[Caso et al., 2006]
L. *Caso, A. Vrij, S. Mann, G. de Leo.
Deceptive responses: The impact of verbal and non-verbal countermeasures.
Legal and Criminological Psychology, 11 (2006), pp. 99-111
[Critchlow, 2011a]
*Critchlow, N. (2011). Applying Criteria Based Content Analysis to assessing the veracity of rape statements (Unpublished doctoral dissertation). Manchester Metropolitan University, Manchester, UK.
[Critchlow, 2011b]
*Critchlow, N. (2011). [A field validation of CBCA when assessing authentic police rape statements: evidence for discriminant validity to prescribe veracity to adult narrative]. Unpublished raw data.
[Dana-Kirby, 1997]
*Dana-Kirby, L. (1997). Discerning truth from deception: Is Criteria-Based Content Analysis effective with adult statements? (Unpublished doctoral thesis). University of Oregon, Oregon.
[Evans et al., 2013]
J. *Evans, S.W. Michael, C.A. Meissner, S.E. Brandon.
Validating a new assessment method for deception detection: Introducing a psychologically based credibility assessment tool.
Journal of Applied Research in Memory and Cognition, 2 (2013), pp. 33-41
[Godoy and Higueras, 2008]
*Godoy, V., & Higueras, L. (2008). El análisis de contenido basado en criterios (CBCA) y la entrevista cognitiva aplicados a la credibilidad del testimonio en adultos. In F. Rodríguez, C. Bringas, F. Fariña, R. Arce, & A. Bernardo (Eds.), Psicología Jurídica: Entorno judicial y delincuencia (pp. 117-125). Retrieved from http://gip.uniovi.es/T5EJD.pdf.
[Honts and Devitt, 1993]
*Honts, C.R., & Devitt, M.K. (1993). Credibility Assessment of Verbatim Statements (CAVS). Retrieved from http://www.dtic.mil/dtic/tr/fulltext/u2/a271575.pdf.
[Johnston et al., 2014]
S. *Johnston, A. Candelier, D. Powers-Green, S. Rahmani.
Attributes of truthful versus deceitful statements in the evaluation of accused child molesters.
Sage Open, 4 (2014), pp. 1-10
[Köhnken et al., 1995]
G. *Köhnken, E. Schimossek, E. Aschermann, E. Höfer.
The cognitive interview and the assessment of the credibility of adults’ statements.
Journal of Applied Psychology, 80 (1995), pp. 671-684
[Leal et al., 2015]
S. *Leal, A. Vrij, L. Warmelink, Z. Vernham, R.P. Fisher.
You cannot hide your telephone lies: Providing a model statement as an aid to detect deception in insurance telephone calls.
Legal and Criminological Psychology, 20 (2015), pp. 129-146
[Merckelbach, 2004]
H. *Merckelbach.
Telling a good story: Fantasy proneness and the quality of fabricated memories.
Personality and Individual Differences, 37 (2004), pp. 1371-1382
[Porter and Yuille, 1996]
S. *Porter, J.C. Yuille.
The language of deceit: An investigation of the verbal clues to deception in the interrogation context.
Law and Human Behavior, 20 (1996), pp. 443-458
[Porter et al., 1999]
S. *Porter, J.C. Yuille, D.R. Lehman.
The nature of real, implanted, and fabricated memories for emotional childhood events: Implications for the recovered memory debate.
Law and Human Behavior, 23 (1999), pp. 517-537
[Rassin and van-der-Sleen, 2005]
E. *Rassin, J. van-der-Sleen.
Characteristics of true versus false allegations of sexual offences.
Psychological Reports, 97 (2005), pp. 589-598
[Schelleman-Offermans and Merckelbach, 2010]
K. *Schelleman-Offermans, H. Merckelbach.
Fantasy proneness as a confounder of verbal lie detection tools.
Journal of Investigative Psychology and Offender Profiling, 7 (2010), pp. 247-260
[Sporer, 1997]
S.L. *Sporer.
The less travelled road to truth: Verbal cues in deception detection in accounts of fabricated and self-experienced events.
[Ternes, 2009]
M. *Ternes.
Verbal credibility assessment of incarcerated violent offenders’ memory reports.
Vancouver, (2009),
[Vrij et al., 2004a]
A. *Vrij, L. Akehurst, R. Soukara, R. Bull.
Detecting deceit via analyses of verbal and nonverbal behavior in children and adults.
Human Communication Research, 30 (2004), pp. 8-41
[Vrij et al., 2004b]
A. *Vrij, H. Evans, L. Akehurst, S. Mann.
Rapid judgements in assessing verbal and nonverbal cues: Their potential for deception researchers and lie detection.
Applied Cognitive Psychology, 18 (2004), pp. 283-296
[Vrij et al., 2001]
A. *Vrij, K. Edward, R. Bull.
People's insight into their own behaviour and speech content while lying.
British Journal of Psychology, 92 (2001), pp. 373-389
[Vrij et al., 2000a]
A. *Vrij, K. Edward, K.P. Roberts, R. Bull.
Detecting deceit via analysis of verbal and nonverbal behavior.
Journal of Nonverbal Behavior, 24 (2000), pp. 239-263
[Vrij et al., 2000c]
A. *Vrij, S. Mann, K. Edward.
I think it was a green scarf but I am not sure. Raising doubts about one's own testimony during lying and truth telling.
Forensic psychology and law. Traditional questions and new ideas, pp. 205-207
[Vrij and Heaven, 1999]
A. *Vrij, S. Heaven.
Vocal and verbal indicators of deception as a function of lie complexity.
Psychology, Crime and Law, 5 (1999), pp. 203-215
[Vrij and Mann, 2006]
A. *Vrij, S. Mann.
Criteria-Based Content Analysis: An empirical test of its underlying processes.
Psychology, Crime and Law, 12 (2006), pp. 337-349
[Vrij et al., 2007]
A. *Vrij, S. Mann, S. Kristen, R.P. Fisher.
Cues to deception and ability to detect lies as a function of police interview styles.
Law and Human Behavior, 31 (2007), pp. 449-518
[Willén and Strömwall, 2011]
R.M. *Willén, L.A. Strömwall.
Offenders’ uncoerced false confessions: A new application of statement analysis?
Legal and Criminological Psychology, 17 (2011), pp. 346-359
[Wojciechowski, 2014]
B.W. *Wojciechowski.
Content analysis algorithms: An innovative and accurate approach to statement veracity assessment.
European Polygraph, 8 (2014), pp. 119-128

* Indicates the primary studies included in the meta-analysis.

[Amado et al., 2015]
B.G. Amado, R. Arce, F. Fariña.
Undeutsch hypothesis and Criteria Based Content Analysis: A meta-analytic review.
European Journal of Psychology Applied to Legal Context, 7 (2015), pp. 3-12
[Anson et al., 1993]
D.A. Anson, S.L. Golding, K.J. Gully.
Child sexual abuse allegations: Reliability of Criteria-Based Content Analysis.
Law and Human Behavior, 17 (1993), pp. 331-341
[Arce and Fariña, 2005]
R. Arce, F. Fariña.
Peritación psicológica de la credibilidad del testimonio: la huella psíquica y la simulación: El Sistema de Evaluación Global (SEG).
Papeles del Psicólogo, 26 (2005), pp. 59-77
[Arce and Fariña, 2009]
R. Arce, F. Fariña.
Evaluación psicológica forense de la credibilidad y daño psíquico en casos de violencia de género mediante el Sistema de Evaluación Global.
Violencia de género. Tratado psicológico y legal, pp. 147-168
[Arce and Fariña, 2012]
R. Arce, F. Fariña.
Psicología social aplicada al ámbito jurídico.
Psicología social aplicada, pp. 157-182
[Arce et al., 2000]
R. Arce, F. Fariña, A. Fraga.
Género y formación de juicios en un caso de violación [Gender and juror judgment making in a case of rape].
Psicothema, 12 (2000), pp. 623-628
[Arce et al., 2010]
R. *Arce, F. Fariña, M. Vilariño.
Contraste de la efectividad del CBCA en la evaluación de la credibilidad en casos de violencia de género.
Intervención Psicosocial, 19 (2010), pp. 109-119
[Berliner and Conte, 1993]
L. Berliner, J.R. Conte.
Sexual abuse evaluation: Conceptual and empirical obstacles.
Child Abuse and Neglect, 17 (1993), pp. 111-125
[Botella and Gambara, 2006]
J. Botella, H. Gambara.
Doing and reporting a meta-analysis.
International Journal of Clinical & Health Psychology, 6 (2006), pp. 425-440
[Cohen, 1988]
J. Cohen.
Statistical power analysis for the behavioral sciences.
2nd ed., LEA, (1988),
[Daubert, 1993]
Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993).
[Fariña et al., 2002]
F. Fariña, R. Arce, M. Novo.
Heurístico de anclaje en las decisiones judiciales [Anchorage in judicial decision making].
Psicothema, 14 (2002), pp. 39-46
[Fariña et al., 1994]
F. Fariña, R. Arce, S. Real.
Ruedas de identificación: De la simulación y la realidad.
Psicothema, 6 (1994), pp. 395-402
[Gödert et al., 2005]
H.W. *Gödert, M. Gamer, H.G. Rill, G. Vossel.
Statement Validity Assessment: Inter-rater reliability of Criteria-Based Content Analysis in the mock-crime paradigm.
Legal and Criminological Psychology, 10 (2005), pp. 225-245
[Hans and Vidmar, 1986]
V.P. Hans, N. Vidmar.
Judging the jury.
Plenum Press, (1986),
[Hedges and Olkin, 1985]
L.V. Hedges, I. Olkin.
Statistical methods for meta-analysis.
Academic Press, (1985),
[Höfer et al., 1993]
Höfer, E., Köhnken, G., Hanewinkel, R., & Bruhn, C. (1993). Diagnostik und attribution von glaubwürdigkeit. Unpublished final report. University of Kiel, Germany.
[Horowitz et al., 1997]
S.W. Horowitz, M.E. Lamb, P.W. Esplin, T.D. Boychuk, O. Krispin, L. Reiter-Lavery.
Reliability of criteria-based content analysis of child witness statements.
Legal and Criminological Psychology, 2 (1997), pp. 11-21
[Hunter and Schmidt, 2015]
J.E. Hunter, F.L. Schmidt.
Methods of meta-analysis: Correcting error and bias in research findings.
Sage, (2015),
[Juárez et al., 2007]
*Juárez, J.R., Mateu, A., & Sala, E. (2007). Criterios de evaluación de la credibilidad en las denuncias de violencia de género. Retrieved from http://justicia.gencat.cat/web/.content/documents/arxius/sc-3-143-07-cas.pdf.
[Köhnken, 1996]
G. Köhnken.
Social psychology and the law.
Applied social psychology, pp. 257-282
[Köhnken, 2004]
G. Köhnken.
Statement Validity Analysis and the detection of the truth.
The detection of deception in forensic contexts, pp. 41-63 http://dx.doi.org/10.1017/CBO9780511490071.003
[Konecni and Ebbesen, 1992]
V.J. Konecni, E.B. Ebbesen.
Methodological issues on legal decision-making, with special reference to experimental simulations.
Psychology and law. International perspectives, pp. 413-423
[Lee et al., 2008]
Z. *Lee, J.R. Klaver, S.D. Hart.
Psychopathy and verbal indicators of deception in offenders.
Psychology, Crime & Law, 14 (2008), pp. 73-84
[McNatt, 2000]
D.B. McNatt.
Ancient Pygmalion joins contemporary management: A meta-analysis of the result.
Journal of Applied Psychology, 85 (2000), pp. 314-322
[Novo and Seijo, 2010]
M. Novo, D. Seijo.
Judicial judgement-making and legal criteria of testimonial credibility.
European Journal of Psychology Applied to Legal Context, 2 (2010), pp. 91-115
[Raskin et al., 1991]
Raskin, D.C., Esplin, F.W., & Horowitz, S. (1991). Investigative interviews and assessment of children in sexual abuse cases. Unpublished manuscript, Department of Psychology, University of Utah, Utah.
[Sarwar et al., 2014]
F. Sarwar, C.M. Allwood, A. Innes-Ker.
Effects of different types of forensic information on eyewitness’ memory and confidence accuracy.
European Journal of Psychology Applied to Legal Context, 6 (2014), pp. 17-27
[Steller and Böhm, 2006]
M. Steller, C. Böhm.
Cincuenta años de jurisprudencia del Tribunal Federal Supremo alemán sobre la psicología del testimonio. Balance y perspectiva.
Nuevos caminos y conceptos en la psicología jurídica, pp. 53-67
[Steller and Köhnken, 1989]
M. Steller, G. Köhnken.
Criteria-Based Content Analysis.
Psychological methods in criminal investigation and evidence, pp. 217-245
[Tukey, 1960]
J.W. Tukey.
A survey of sampling from contaminated distributions.
Contributions to probability and statistics, pp. 448-485
[Undeutsch, 1967]
U. Undeutsch.
Beurteilung der glaubhaftigkeit von aussagen.
Handbuch der psychologie, Vol. 11: Forensische psychologie, pp. 26-181
[Vilariño, 2010]
*Vilariño, M. (2010). ¿Es posible discriminar declaraciones reales de imaginadas y huella psíquica real de simulada en casos de violencia de género? (Doctoral thesis, Universidad de Santiago de Compostela, Spain). Retrieved from http://hdl.handle.net/10347/2831.
[Vilariño et al., 2011]
M. Vilariño, M. Novo, D. Seijo.
Estudio de la eficacia de las categorías de realidad del testimonio del Sistema de Evaluación Global (SEG) en casos de violencia de género.
Revista Iberoamericana de Psicología y Salud, 2 (2011), pp. 1-26
[Volbert and Steller, 2014]
R. Volbert, M. Steller.
Is this testimony truthful, fabricated, or based on false memory? Credibility assessment 25 years after Steller and Köhnken (1989).
European Psychologist, 19 (2014), pp. 207-220
[Vrij, 2005]
A. Vrij.
Criteria-Based Content Analysis: A qualitative review of the first 37 studies.
Psychology, Public Policy, and Law, 11 (2005), pp. 3-41
[Vrij, 2008]
A. Vrij.
Detecting lies and deceit: Pitfalls and opportunities.
2nd ed., John Wiley and Sons., (2008),
[Vrij et al., 2002]
A. Vrij, L. Akehurst, S. Soukara, R. Bull.
Will the truth come out? The effect of deception, age, status, coaching and social skills on CBCA scores.
Law and Human Behavior, 26 (2002), pp. 261-283
[Vrij et al., 2000b]
A. *Vrij, W. Kneller, S. Mann.
The effect of informing liars about Criteria-Based Content Analysis on their ability to deceive CBCA-raters.
Legal and Criminological Psychology, 5 (2000), pp. 57-70

Additional results and resources at http://www.researchgate.net/profile/Ramon_Arce

Conclusions in the primary studies about non-significant results are inconclusive, as the statistical power, 1−β < .80, is insufficient to draw conclusions (d = −0.44, 1−β = .41, Bogaard, Meijer, & Vrij, 2013; d = 0.37, 1−β = .26, Vrij, Akehurst, Soukara, & Bull, 2002; d = 0.11, 1−β = .06, Vrij, Kneller, & Mann, 2000).
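Power figures such as those in this note can be approximated from d and the per-group sample size with the normal approximation to a two-tailed two-sample comparison at α = .05. The sketch below uses n = 31 per group purely as an illustrative assumption (the primary studies' sample sizes are not given here):

```python
from math import sqrt, erf

def phi(x):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_sample(d, n_per_group, z_crit=1.96):
    # Approximate power of a two-tailed two-sample comparison at alpha = .05:
    # 1 - beta = Phi(|d| * sqrt(n / 2) - z_crit).
    return phi(abs(d) * sqrt(n_per_group / 2.0) - z_crit)

print(round(power_two_sample(0.44, 31), 2))  # ≈ 0.41
```

The approximation makes the note's point concrete: even a moderate effect such as d = 0.44 is detected less than half the time with groups of this size, so a non-significant result does not support the null hypothesis.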


Copyright © 2016. Asociación Española de Psicología Conductual