Spanish Journal of Psychiatry and Mental Health
Scientific letter
Available online 11 September 2025
Potential duplicate cases in the official suicide statistics reported by the Spanish National Statistics Institute (INE)
Sergio Sanz Gómez a, Julio A. Guija a,b, Lucas Giner a,*
* Corresponding author: lginer@us.es
a Department of Psychiatry, Universidad de Sevilla, Seville, Spain
b Servicio de Psiquiatría Forense, Instituto de Medicina Legal de Sevilla, Seville, Spain

Suicide is a major global public health problem, as evidenced by the most recent estimates from the World Health Organization (WHO).1 In Spain, it is one of the leading external causes of death, as documented by the Spanish National Statistics Institute (INE).2 Accurate epidemiological monitoring of this phenomenon is a fundamental pillar for understanding its dimensions, identifying temporal and geographical trends, detecting particularly vulnerable groups, and, ultimately, designing and evaluating the effectiveness of prevention strategies.3

In Spain, the INE is the body responsible for publishing official mortality figures, based on the Statistical Death Bulletins. However, for years, certain discrepancies have been observed and debated between the suicide statistics provided by the INE and those registered by forensic sources: the Institutes of Legal Medicine and Forensic Sciences (IMLCF) across Spain, which judicially investigate deaths due to external causes.4

While the reasons for this divergence may be multiple and complex (including differences in coding criteria or recording times), our research group hypothesized that undetected duplicate records in the INE databases could contribute to these differences and, more generally, affect the quality of the available statistical information. This letter presents preliminary evidence supporting this hypothesis, derived from the analysis of data from 2016 through 2020.

To explore this possibility, we analyzed a specific database requested from the INE for a previous study.5 This database contained anonymized information on suicide cases from 2016 through 2020. The INE variables used for identification were year of death, province of residence, municipality of residence, age, sex, occupation (NCO-11 code), and detailed cause of death (ICD-10 coding). A check for exact duplicates was performed based on the combination of these seven variables.

The duplicate detection algorithm followed a simple and reproducible logic: (1) all suicide records were grouped by the unique combination of the seven identifying variables; (2) for each resulting group, the number of records was counted; (3) if the count for a group was >1, the records beyond the first were considered possible duplicates. Our analysis revealed multiple records sharing identical values across all these variables within the same year (Table 1).
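The three steps can be sketched in a few lines of Python. This is only an illustration of the grouping logic described above, not the authors' actual code: the field names in `KEY_FIELDS`, the function name, and the sample records are hypothetical placeholders for the seven INE variables, and the sketch assumes that "additional records" means every record in a group beyond the first.

```python
from collections import Counter

# Hypothetical field names standing in for the seven INE identifying variables.
KEY_FIELDS = ("year", "province", "municipality", "age", "sex", "occupation", "cause")


def count_possible_duplicates(records):
    """Return {year: number of records beyond the first in each identical group}."""
    # Steps 1 and 2: group records by the full combination of variables and count them.
    groups = Counter(tuple(r[f] for f in KEY_FIELDS) for r in records)
    surplus_by_year = Counter()
    for key, n in groups.items():
        # Step 3: in a group with more than one record, the extras are possible duplicates.
        if n > 1:
            surplus_by_year[key[0]] += n - 1  # key[0] is the year of death
    return dict(surplus_by_year)


# Synthetic example records (invented values, not INE data).
base = {"year": 2016, "province": "P1", "municipality": "M1",
        "age": 45, "sex": "M", "occupation": "O1", "cause": "X70"}
records = [
    dict(base),
    dict(base),                # identical to the first record
    dict(base),                # identical again
    dict(base, age=60),        # differs in age, so it forms its own group
    dict(base, year=2017),     # differs in year, so it forms its own group
]
possible_duplicates = count_possible_duplicates(records)
```

In this synthetic example the three identical records form one group of size 3, so two of them are flagged for 2016 and no other year appears in the result.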

Table 1.

Total suicide cases, detected duplicates, and percentage of the total per year (2016–2020).

Year  Total suicides (INE)  Possible duplicates  Percentage (%)
2016  3569                  44                   1.23
2017  3679                  59                   1.60
2018  3539                  55                   1.55
2019  3671                  39                   1.06
2020  3941                  55                   1.40
Source: Authors’ own elaboration based on INE microdata and official figures on deaths by cause.
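The percentage column can be recomputed from the other two columns as possible duplicates divided by total suicides, multiplied by 100. The short Python check below does that with the figures from Table 1. The names `TABLE_1` and `duplicate_percentage` are illustrative, and rounding to two decimals is an assumption based on how the table is printed.

```python
# (total suicides, possible duplicates) per year, as reported in Table 1.
TABLE_1 = {
    2016: (3569, 44),
    2017: (3679, 59),
    2018: (3539, 55),
    2019: (3671, 39),
    2020: (3941, 55),
}


def duplicate_percentage(total, duplicates):
    """Share of possible duplicates among all recorded suicides, in percent."""
    return round(100 * duplicates / total, 2)


percentages = {
    year: duplicate_percentage(total, duplicates)
    for year, (total, duplicates) in TABLE_1.items()
}
```

Under these assumptions the recomputed values agree with the published column, ranging from 1.06% (2019) to 1.60% (2017).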

While we cannot completely rule out that some of these cases represent real coincidences (two or more individuals with the same demographic characteristics and suicide method who died in the same year and resided in the same municipality), the chances of such exact matches across multiple variables are low. Furthermore, we observed instances where the same data combination was repeated up to five times within a single year, which reinforces the hypothesis of accidental record duplication during data compilation or processing by the INE.

In conclusion, our analysis of the INE death microdata from 2016 through 2020 reveals the consistent presence of potential duplicate suicide cases. Although these preliminary findings require further validation in collaboration with the INE, they suggest the need for a review of the quality control processes for these vital statistics. The presence of duplicate records, even if it affects a relatively small percentage of the total annual cases (between 1.06% and 1.60%), may have significant implications for the accuracy of reported mortality rates and the reliability of derived analyses. Duplicates can introduce biases into epidemiological research on risk factors, temporal trends, geographical distributions, or interregional comparisons, hindering a thorough understanding of the phenomenon.6

Because official statistics are a fundamental tool for informing research and guiding public health policy and resource allocation for suicide prevention, ensuring their accuracy is of paramount importance. Therefore, we consider it essential to promote greater transparency in the processes of generating these statistics and recommend the implementation of more comprehensive data validation and cleaning procedures by the INE, in line with European guidelines on mortality data quality,7 possibly incorporating data linkage or periodic audits.8 Improving the quality of official suicide records is an indispensable step to strengthen evidence-based prevention strategies in Spain.

Funding

The work of Sergio Sanz Gómez was supported by VI-PPITUS.

Conflicts of interest

None declared.

References
[1] World Health Organization. World Health Statistics 2023: Monitoring Health for the SDGs, Sustainable Development Goals. (2023), pp. 27
[2] Instituto Nacional de Estadística. Inst Nac Estadística, (2024), pp. 1
[3] G. Zalsman, K. Hawton, D. Wasserman, et al. Suicide prevention strategies revisited: 10-year systematic review. Lancet Psychiatry, 3 (2016), pp. 646-659
[4] L. Giner, J.A. Guija. Número de suicidios en España: diferencias entre los datos del Instituto Nacional de Estadística y los aportados por los Institutos de Medicina Legal. Rev Psiquiatr Salud Ment, 7 (2014), pp. 139-146
[5] S. Sanz-Gómez, A. Alacreu-Crespo, A. Fructuoso, M.I. Perea-González, J.A. Guija, L. Giner. Pandemics and suicide rates in Spain: from the Spanish flu to COVID-19. J Clin Psychiatry, 84 (2023), pp. 1-5
[6] Federal Committee on Statistical Methodology. A Framework for Data Quality. FCSM 20-04. (2020), pp. 478-489
[7] Eurostat. Causes of Death Statistics Manual – 2024 Edition. (2024)
[8] A. Lighterness, M. Adcock, L.A. Scanlon, G. Price. Data quality-driven improvement in health care: systematic literature review. J Med Internet Res, 26 (2024)
Copyright © 2025. The Authors