Use of artificial intelligence to predict complications in degenerative thoracolumbar spine surgery: A systematic review

Ricciardi, G.; Cirillo Totera, J.I.; Pons Belmonte, R.; Romero Valverde, L.; López Muñoz, F.; Manríquez Díaz, A.

doi:10.1016/j.recot.2025.06.015


Abstract

Objective

We aim to conduct a systematic review of the literature to evaluate the effectiveness of artificial intelligence prediction models in predicting complications in adult patients undergoing surgery for degenerative thoracolumbar pathology compared with other commonly used prediction techniques.

Methods

A systematic literature review was conducted in Medline/Pubmed, Cochrane Library, and Lilacs/Portal de la BVS to identify machine learning models in predicting complications in patients undergoing surgery for degenerative thoracolumbar spine pathology between January 1, 2000, and May 1, 2023. The risk of bias was assessed using the PROBAST tool. Study characteristics and outcomes focusing on general or specific complications were recorded.

Results

A total of 2321 titles were identified (763 were duplicates). Screening was performed on 1558 titles, and 22 were selected for full-text reading, with 18 exclusions and 4 publications selected for the subsequent review. Additionally, 8 publications were included from other sources (Argentine Association of Orthopaedics and Traumatology Library; manual citation search). In 5 (41.6%) articles, the effectiveness of artificial intelligence predictive models was compared with conventional techniques. All were globally classified as having a very high risk of bias. Due to heterogeneity in samples, outcomes of interest, and algorithm evaluation metrics, a meta-analysis was not performed.

Conclusion

Although the available evidence is limited and carries a high risk of bias, the studies analysed suggest that these models may achieve promising performance in predicting complications, with area under the curve values mostly ranging from acceptable to excellent.

Keywords:

Artificial intelligence

Machine learning

Deep learning

Artificial neural networks

Degenerative pathology

Adult deformity

Summary

Introduction

The authors aim to conduct a systematic review of the literature to evaluate the effectiveness of artificial intelligence predictive models in predicting complications in adult patients treated surgically for degenerative thoracolumbar disease, compared with other commonly used predictive techniques.

Materials and methods

A systematic review of the literature was conducted in Medline/Pubmed, Cochrane Library and Lilacs/Portal de la BVS on the effectiveness of artificial intelligence predictive models for possible complications in patients operated on for degenerative disease of the thoracolumbar spine between 1 January 2000 and 1 May 2023. Risk of bias was assessed with the ROBINS-I and PROBAST tools. Study characteristics and results were recorded, with general or specific complications as the outcome.

Results

A total of 2321 titles were identified, of which 763 were duplicates. Screening was performed on 1558 titles; 22 were chosen for full-text reading, of which 18 were excluded and 4 publications were finally selected for the subsequent review. Additionally, 8 publications were included from other sources (Library of the Argentine Association of Orthopaedics and Traumatology, with manual citation search). In 5 articles (41.6%), the effectiveness of artificial intelligence predictive models was compared with conventional techniques. All were globally classified as having a very high risk of bias. Given the heterogeneity of the samples, the outcomes of interest and the algorithm evaluation metrics, a meta-analysis was not performed.

Conclusion

Although the available evidence is limited and carries a high risk of bias, the studies analysed indicate that these models can achieve promising performance in predicting complications, with area under the curve values that mostly range from acceptable to excellent.

Keywords:

Artificial intelligence

Machine learning

Deep learning

Artificial neural networks

Degenerative pathology

Adult deformity


Introduction

According to U.S. statistics, the estimated cost of degenerative vertebral disease is around $100 billion annually.1 It is estimated that 2 out of 3 adults will experience low back pain at some point in their lives.2 The complexity of patients with spinal disease and the complications associated with surgery have motivated research into strategies for accurately predicting these events, as well as for estimating clinical outcomes in advance. Traditionally, statistical models have made it possible to identify predictive factors for complications; multivariate models have been particularly popular, such as logistic regression, which produces a measure of risk (odds ratio) for each independent variable on a specific effect or outcome.3

The field of artificial intelligence (AI) has had a significant impact on multiple areas of health care, and spinal surgery is no exception.3,4 AI is concerned not only with understanding but also with building “intelligent entities”: machines that can calculate how to act effectively and safely.4 AI comprises a variety of disciplines including: natural language processing, knowledge representation, automated reasoning, machine learning (ML), and robotics. ML is a subarea that enables the system to learn and provide feedback to itself; that is, to develop algorithms that improve with experience. ML involves numerous methods, such as deep learning, based on artificial neural networks.3,4 ML has also made it possible to develop predictive models, and in the last decade numerous articles have been published for their application in specific areas, such as spinal surgery.3,4

The authors aimed to conduct a systematic review of the literature to assess the effectiveness of predictive artificial intelligence models in predicting complications in adult patients treated with surgery for degenerative thoracolumbar disease, compared to other commonly used predictive techniques.

Materials and methods

A systematic review of the literature in the main biomedical databases (Medline/Pubmed, Cochrane Library and Lilacs/VHL Portal) was carried out on the effectiveness of the use of predictive AI models to predict complications in patients operated on for degenerative disease of the thoracolumbar spine during the period between the 1st of January 2000 and the 1st of May 2023.

Eligibility criteria

Studies were selected according to the following eligibility criteria:

Study designs: randomised, controlled clinical trials, prospective non-randomised studies, prospective and retrospective cohort observational studies, cross-sectional studies, and descriptive series with more than 10 cases. Case reports, reviews (systematic, narrative), editorials, letters to the editor, and consensus documents were excluded.

Participants: adult patients (18–65 years) of both sexes, treated for degenerative disease of the thoracolumbar spine (herniated disc, lumbar canal stenosis, and adult sagittal or coronal deformity). Studies of populations with idiopathic, neuromuscular, congenital or syndromic scoliosis, osteoporotic/metabolic disease fractures, rheumatoid arthritis, ankylosing spondylitis/diffuse idiopathic skeletal hyperostosis, or vertebral oncological disease, and studies of patients treated with nerve blocks as the sole treatment (with no surgery), were excluded.

Intervention: use of AI for the creation of predictive models of complications, considering deep learning, machine learning, artificial neural networks, and other novel methods whose development involves the use of artificial intelligence. We excluded studies that used AI models for purposes other than complication prediction, such as patient and imaging assessment, classification, application in navigated surgery, or robotics.

Comparator: other common methods for predicting complications such as statistical methods or scales. Due to the novelty of the topic, studies without a comparator were also considered.

Outcomes: studies that recorded complications in surgical patients due to degenerative thoracolumbar disease, mainly covering intraoperative and early postoperative complications (90 days after surgery). Secondarily, complications over longer periods (6 months, 1 and 2 years) and other outcome variables, such as pain, functional disability, length of hospitalisation, readmissions, and morbidity and mortality.

Time: studies with follow-up time greater than or equal to 90 days.

Language: studies in English, Spanish and Portuguese.

Table 1 summarises the research question according to the PICO model, which enabled us to provide structure for the scientific problem, describing the eligibility criteria and guiding the bibliographical search.

Table 1.

Research question according to PICO model.

PICO	Inclusion	Exclusion
Patients	Surgically treated adult patients (aged 18–65 years) of both sexes with degenerative thoracolumbar spine conditions, including herniated disc, lumbar stenosis, and adult spinal deformity (sagittal and/or coronal).	Conditions such as idiopathic, neuromuscular, congenital or syndromic scoliosis, fractures caused by osteoporosis/metabolic disease, rheumatoid arthritis, ankylosing spondylitis/diffuse idiopathic skeletal hyperostosis (DISH), spinal oncological conditions; patients who underwent nerve blocks as the sole therapeutic procedure (with no surgery).
Intervention	The use of artificial intelligence in developing predictive models for complications. We took into account methods such as deep learning, machine learning, artificial neural networks, and other new approaches that involve artificial intelligence.	Studies that used artificial intelligence models for purposes other than prediction of complications were excluded.
Comparison	Other frequently used methods to predict complications, such as statistical models or measurement scales, were also considered. Due to the newness of the topic, studies without a comparison group were also included in the analysis.
Outcome	Studies reporting complications, with a focus on intraoperative and early postoperative complications (within 90 days of surgery). Furthermore, we examined complications beyond the 90-day period, up to 6 months, 1 year, and 2 years. We also considered specific complications.	Studies in which no complications were recorded.
Time	Studies with a follow-up period of 90 days or longer.
Study design	Controlled randomised clinical trials (RCTs), prospective non-randomised studies, prospective and retrospective cohort studies, cross-sectional studies, and descriptive series with over 10 cases. Language: English, Spanish, and Portuguese.	Case reports, systematic and narrative reviews, editorials, letters to the editor, and consensus papers were excluded.

PICO: P=patient; I=intervention; C=comparator; O=outcome.

Sources of information

A bibliographical search strategy was developed using the MEDLINE, Cochrane and LILACS databases (Latin American and Caribbean Literature in Health Sciences) through the Pubmed and Cochrane Library search engines and the Virtual Health Library (VHL) portal. In addition, other sources of bibliographical citations were considered, such as consulting the library of the Argentine Association of Orthopaedics and Traumatology and manually searching the reference lists of the studies included or reviews (narrative/systematic) identified during the search (snowballing).

Search strategy

A search strategy was developed using MeSH terms and keywords on the use of artificial intelligence for the prediction of complications in patients undergoing surgery for degenerative thoracolumbar spine disease. The strategy was developed by the research team and is described below: ((((((artificial intelligence) OR (deep learning)) OR (machine learning)) OR (AI)) OR (artificial intelligence)) AND (spine)) AND ((((thoracolumbar) OR (lumbar)) OR (thoracic)) OR (lumbosacral)). The bibliographical search was limited by language filters (Spanish, English and Portuguese) and by date, covering the period between 1st January 2000 and 1st May 2023. We did not use search filters on study design or type.

Data management

The results of the literature search were uploaded to the Zotero programme, which manages bibliographical citations and facilitates collaboration between reviewers during the study selection process. Abstracts were uploaded and duplicates were deleted. Prior to the formal selection process, training was provided for the members of the review team who were unfamiliar with the programme.

Selection process

The review authors were divided into 2 groups of 2 members each; both groups independently screened titles and abstracts according to the inclusion criteria. Disagreements were resolved through discussion among the reviewers and, where necessary, by the opinion of an additional reviewer, an experienced member of the research team. After the selection of articles eligible for full-text review, all full-text articles were retrieved through library sources. Each group of reviewers then assessed the full-text articles that had been selected by the other group (cross-design) to limit possible selection bias. During the full-text review, the references of the articles were also checked for possible eligibility (snowballing). Again, any potential conflicts were resolved first by the reviewers in each group and, if necessary, by the opinion of an additional experienced reviewer.

Data extraction

Data extraction was undertaken in duplicate, with the review authors in charge working independently. Data were recorded in tables. A table on the characteristics of the selected studies included the following: author, year, participating countries, disease under study, algorithm used, number of sites participating, sample size, outcome variable (general complications or of a specific type), data source (database), validation, reported results, accuracy (percentage), area under the curve (AUC ROC) and operating characteristics (sensitivity, specificity). Inclusion and exclusion criteria, demographic characteristics of participants, follow-up period, data on funding and possible conflicts of interest were also recorded.

Assessing risk of bias

We assessed the risk of bias of non-randomised observational studies using the ROBINS-I5 tool. To assess the risk of bias in the use of predictive risk models, the PROBAST6 tool was considered. Bias assessment was performed by at least 2 evaluators independently. Conflicts were resolved by consensus.

To ensure consistency, the lead author screened all abstracts and full texts for eligibility, extracted the data, and assessed risk of bias in all studies included.

Strategy for data synthesis

Subsequently, all the results of the individual reviewers were combined into one single data table. This table was discussed with the full team of reviewers to reach a consensus over the results of our review.

For the assessment of the performance of the predictive models, the AUC was mainly considered. For its categorisation the following classification was adopted: AUC=0.5, useless; AUC=0.6–0.7, possibly useful; AUC=0.7–0.8, acceptable; AUC=0.8–0.9, excellent; and AUC>0.9, exceptional.
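The classification above can be expressed as a small lookup function. The following is only a minimal sketch: the function name is hypothetical, and two points that the classification does not specify are treated as assumptions, namely that a value on a shared boundary (0.7 or 0.8) is assigned to the higher band, and that values between 0.5 and 0.6 are returned as "unclassified".

```python
def categorise_auc(auc: float) -> str:
    """Map an AUC value to the qualitative bands adopted in this review."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc > 0.9:
        return "exceptional"
    if auc >= 0.8:  # assumption: 0.8 itself counts as excellent
        return "excellent"
    if auc >= 0.7:  # assumption: 0.7 itself counts as acceptable
        return "acceptable"
    if auc >= 0.6:
        return "possibly useful"
    if auc <= 0.5:  # assumption: values below 0.5 are grouped with AUC=0.5
        return "useless"
    # The adopted classification does not name the 0.5-0.6 interval.
    return "unclassified"


# AUC values taken from Table 3 of this review.
for study, auc in [("Liu (2022)", 0.923), ("Scheer et al. (2016)", 0.89),
                   ("Pellisé et al. (2019)", 0.717)]:
    print(study, categorise_auc(auc))
```

Under these assumptions, an AUC of exactly 0.9 falls in the "excellent" band, because the "exceptional" band is defined as strictly greater than 0.9.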

In addition, other parameters that reflect the performance of the predictive models were considered: accuracy, recall (sensitivity), specificity, and positive predictive value (precision).
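For readers less familiar with these parameters, the sketch below shows how each one is derived from the four cells of a confusion matrix. The class and function names are hypothetical, and the counts in the usage example are invented for illustration; they do not come from any of the studies reviewed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # complication predicted and observed
    fp: int  # complication predicted but not observed
    tn: int  # no complication predicted, none observed
    fn: int  # no complication predicted, but one observed


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def performance_metrics(cm: ConfusionMatrix) -> dict:
    """Compute accuracy, recall, specificity and precision."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    return {
        "accuracy": _ratio(cm.tp + cm.tn, total),
        "recall": _ratio(cm.tp, cm.tp + cm.fn),       # sensitivity
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),
        "precision": _ratio(cm.tp, cm.tp + cm.fp),    # positive predictive value
    }


# Illustrative counts only (200 hypothetical patients, 40 with a complication).
example = ConfusionMatrix(tp=30, fp=20, tn=140, fn=10)
print(performance_metrics(example))
```

With these invented counts the accuracy is 0.85 while the precision is only 0.6, which illustrates why accuracy alone can be misleading when complications are uncommon.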

To assess the effectiveness of predictive models compared to other methods, we considered as alternatives the use of instruments such as scales or scores and traditional statistical methods, either linear regression or multivariate logistic regression. These statistical methods are the ones most typically used to generate clinical predictive or prognostic models, and their performance can be considered a benchmark. It should be clarified that any more advanced type of algorithm can be considered a form of ML.

Results

A total of 2321 titles were identified, of which 763 were duplicates. Screening was run on 1558 titles, of which 22 were chosen for complete reading.8–29 A total of 18 articles were excluded according to the proposed selection criteria.9–17,20–28 Finally, 4 articles were chosen for the next review.8,18,19,29 In addition, 8 publications were retrieved from other sources (Library of the Argentine Association of Orthopaedics and Traumatology and manual search for citations or snowballing).30–37 Fig. 1 presents the PRISMA flowchart.

Figure 1.

Flowchart according to PRISMA 2020. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: An updated guide to reporting systematic reviews. BMJ 2021;372:n71. doi:10.1136/bmj.n71. For more information: http://www.prisma-statement.org/.
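The selection counts reported in the Results can be checked with simple arithmetic. The short sketch below reproduces the flow using the figures given in the text; the variable and function names are illustrative only.

```python
# Counts as reported in the Results section of this review.
identified = 2321
duplicates = 763
screened = identified - duplicates

full_text_read = 22
full_text_excluded = 18
included_from_databases = full_text_read - full_text_excluded

included_from_other_sources = 8
total_included = included_from_databases + included_from_other_sources


def share(count: int, total: int = total_included) -> float:
    """Percentage of the included studies, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(screened, included_from_databases, total_included)
print(share(5), share(7))
```

Five of 12 studies corresponds to 41.7% when rounded to one decimal place; the text reports this proportion as 41.6%, which is the same value truncated rather than rounded.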

All studies included describe the development and internal validation of predictive models based on the use of AI for the prediction of complications in thoracolumbar spinal surgery as a result of degenerative disease. We did not find any studies that carry out external validation of previously developed predictive models.

According to the type of degenerative disease, 7 publications (58.3%) included patients with adult scoliosis30–35,37; 4 (33.3%) included patients with degenerative disease in general (not scoliosis)18,19,29,36 and one (8.3%) included patients with degenerative spondylolisthesis exclusively.8

Although all publications assessed complications as a primary outcome, the variable “complications” was defined differently in every publication. In 5 articles (41.6%), perioperative complications were assessed as the primary outcome, including clinical and surgical complications, with no consensus on the definition.8,31,35–37 In 2 articles (16.6%) surgical site infection was considered19,29; in 2 (16.6%) proximal junctional kyphosis or failure32,33; one (8.3%) grouped mechanical complications (proximal junctional failure, proximal junctional kyphosis, implant complications, rod breakage),30 another (8.3%) assessed pseudoarthrosis34 and another (8.3%) deep vein thrombosis/pulmonary thromboembolism.18 Table 2 summarises the characteristics of the studies included.

Table 2.

Study characteristics.

Author (year); institutions; country	Pathology	Selection criteria	Machine learning algorithm	Demographic data	Study period; follow-up	Sample split (% training:validation)	Outcome	Funding and conflict of interest
Kim et al. (2018)37; multicentre; USA	ASD	Inclusion: patients aged over 18 years undergoing ASD surgery. Exclusion: patients with missing preoperative data, emergency cases, class 2, 3 or 4 wounds, open wounds on the body, sepsis, pneumonia, previous surgery within 30 days, cardiopulmonary resuscitation before surgery, or spinal neoplasm.	LR and ANN	Sample: 5794; M: 2376 (41%); F: 3418 (59%); age, mean 59.5 (SD: NR)	2010–2014	70:30	Complications: cardiac complications; PE/DVT; wound	No
Noh et al. (2023)30; single centre; Korea	ASD	Inclusion: spine surgery for ASD and one or more radiological criteria (coronal Cobb angle greater than 20°; sagittal vertical axis greater than 5 cm; pelvic tilt greater than 25°; TK>60°; PI-LL>10°; fixation of at least 4 levels); follow-up of 2 years or more. Exclusion: syndromic deformity, autoimmune disease, infection, tumour, or any other pathological conditions.	LR; gradient boosting; random forest; ANN	Sample: 238; M: 34 (14%); F: 204 (86%); age, mean: NR (training set: 67.8±7.49 years; validation set: 66.94±6.98 years)	2009–2017; follow-up >2 years	70:30	Mechanical complications	No
Yagi et al. (2018)33; single centre; Japan	ASD	Inclusion: ASD patients aged ≥50 years, meeting radiological criteria (Cobb angle ≥20°; C7 SVA ≥5 cm; PT ≥25°), with fusion of ≥5 levels and minimum follow-up of ≥2 years. Exclusion: poor-quality radiographs; syndromic, neuromuscular or other spinal pathologies.	DNDT; C5.0 decision tree	Sample: 145; sex and age NR. Training group: n=112; M:F 5:107; age 63.9±9.4 years. Validation group: n=33; age and sex NR	Study period: NR; follow-up: 2 years	70:30	PJK/PJF	No
Scheer et al. (2016)32; multicentre; USA	ASD	Inclusion: patients aged over 18 years; radiological criteria: coronal Cobb angle ≥20°; C7 SVA ≥5 cm; PT ≥25°; and/or thoracic kyphosis ≥60°; fusion of 4 or more levels; minimum follow-up of 2 years. Exclusion: patients with neuromuscular deformity, infection or malignancy.	DNDT; C5.0 decision tree	Sample: 510; F:M 396:114; age, mean 57.2±13.9 years	Study period: NR; follow-up: 2 years	70:30	PJK/PJF	Yes (a)
Scheer et al. (2018)34; multicentre; USA	ASD	Inclusion: participants aged over 18 years; radiological criteria: Cobb angle ≥20°; C7 SVA ≥5 cm; PT ≥25°; and/or thoracic kyphosis ≥60°; fusion of 4 or more levels; minimum follow-up of 2 years. Exclusion: neuromuscular deformities, infections, and malignancies. Revision surgery was indicated only if there were reasons other than pseudoarthrosis.	DNDT; C5.1 decision tree	Sample: 336; F:M 268:68; age, mean 57.7±15.1 years	Study period: NR; follow-up: 2 years	Validation set n=126 (randomised)	Pseudoarthrosis	Yes (a)
Pellisé et al. (2019)35; multicentre; Spain, USA, Switzerland, Turkey, France	ASD	Inclusion: age >18 years; radiological criteria: coronal Cobb angle ≥20°; SVA ≥5 cm; PT ≥25°; and/or thoracic kyphosis ≥60°. Exclusion: NR	Random forest	Sample: 1612; F:M NR; age, mean NR. Training: n=1289; F:M 1000:289; age, mean 56.5±17.3 years. Validation: n=323; F:M 235:88; age, mean 57.6±17.8 years	2008–2016; follow-up: 730 days	80:20	Major complication	Yes (a)
Xiong (2022)29; single centre; China	DSD	Inclusion: patients aged 18 years or older with degenerative lumbar disease (herniated disc, lumbar stenosis, spondylolisthesis, or instability) who had undergone posterior lumbar interbody fusion (at least one level). Exclusion: history of spinal surgery, active infection or tumour, and deformity.	Boosted Classification Trees, Boosted Logistic Regression, Extreme Gradient Boosting, Stochastic Gradient Boosting, Generalised Linear Model, AdaBoost Classification Trees, and Random Forest	Sample: 584; F:M 321:263; age, mean 58.36±13.76 years; disc herniation: 284; lumbar stenosis: 137; spondylolisthesis/instability: 163	2019–2021; follow-up: 90 days	50:50	Surgical site infection	No
Fatima (2020)8; multicentre; USA	DSD	Inclusion: decompression surgery, arthrodesis or instrumentation of the lumbar spine; lumbar degenerative spondylolisthesis; operated between 2005 and 2016 by neurosurgery or traumatology, under general anaesthesia, as inpatients. Exclusion: NR	LR and LASSO (least absolute shrinkage and selection operator)	Sample: 80,610; age, median 58 years (range: 18–89); F:M 38,874:41,654	2005–2016; follow-up: 30 days	70:30	Adverse events	No
Zehnder (2021)36; multicentre; Switzerland, UK, Italy	DSD	Inclusion: spinal surgery for degenerative lumbar disease; age 18–95 years. Exclusion: cases with missing data.	Shrinkage algorithm (dfbeta method)	Sample: 23,714; F:M 12,264:11,450; age, mean 58.9±15.7 years	2012–2017; follow-up until hospital discharge	NR	Surgical complications: perioperative and general	No
Scheer (2017)31; multicentre; USA	ASD	Inclusion: age >18 years; radiological criteria: coronal Cobb angle ≥20°; SVA ≥5 cm; PT ≥25°; or thoracic kyphosis ≥60°. Exclusion: neuromuscular deformity, infection or malignant neoplasia.	DNDT; C5.0 decision tree	Sample: 557; F:M 439:118; age, mean 57.5±15.3 years	Study period: NR; follow-up: 6 weeks	70:30	Major complication	Yes (a)
Wang (2021)18; multicentre; USA	DSD	Inclusion: posterior lumbar fusion (1 level). Exclusion: trauma, tumours, revision surgery.	XGBoost (extreme gradient boosting)	Sample: 13,500; age categories, n (%): 19–34 years=490 (3.63); 35–49 years=2146 (15.9); 50–65 years=5050 (37.41); >65 years=5814 (43.07); F:M 7516:5984	2010–2017; follow-up: 30 days	80:20	PE/DVT	No
Liu (2022)19; single centre; China	DSD	Inclusion: degenerative low back disease (canal stenosis; herniated disc; degenerative spondylolisthesis); single posterior approach surgery; elective surgery. Exclusion: emergency surgery.	LR, multilayer perceptron, decision tree, random forest, gradient boosting machine, and XGBoost (extreme gradient boosting)	Sample: 288; age, mean 55.3±12.3 years; F:M NR	2010–2019; follow-up: NR	70:30	Surgical site infection	Yes (a)

Abbreviations: ANN=artificial neural network; ASD=adult spinal deformity; DNDT=deep neural decision tree; DSD=degenerative spine disorders; F:M=female:male; LR=logistic regression; NR=not reported; PE/DVT=pulmonary embolism/deep venous thrombosis; PI=pelvic incidence; PI-LL=pelvic incidence minus lumbar lordosis; PJK/PJF=proximal junctional kyphosis/failure; PT=pelvic tilt; SD=standard deviation; SVA=sagittal vertical axis; TK=thoracic kyphosis.

(a) Declared funding and/or at least one financial conflict of interest.

The measures most commonly used to assess the performance of predictive models were the area under the curve (n=12; 100%) and the accuracy of the model (n=7; 58.3%). To a lesser extent, sensitivity (recall; n=4; 33.3%) and specificity (n=3; 25%) were reported; positive predictive value (precision) was rarely reported. The performance of the predictive models was variable, depending on the outcome considered (general versus specific complications) and the type of machine learning model used. Taking the model with the best performance of each publication, the area under the curve (AUC) ranged between 0.6 and 1.0, and was excellent or exceptional (AUC>0.8) in more than half of the publications (n=7; 58.3%).19,29–34 In the other 5 publications, the performance according to the AUC was acceptable (AUC=0.7–0.8) in at least one of the outcome variables analysed.8,18,35–37 Half of the studies did not report the precision of the AUC estimate (95% CI). The results of the studies are described in Table 3.

Table 3.

Results of the studies.

Author (year); centres; country	Pathology	Data source	Algorithm	Outcome	Model performance (a)
					Accuracy (95% CI)	AUC–ROC (95% CI)	Recall (95% CI)	Specificity (95% CI)	Observations
Kim et al. (2018)37; multicentre; USA	ASD	NSQIP	LR and ANN	Complications: cardiac complications; PE/DVT; wound	NR	Cardiac complications=0.768 (0.76–0.77); PE/DVT=0.542 (0.53–0.55); wound=0.606 (0.60–0.61)	Wound=0.657 (NR)	Wound=0.587 (NR)	Better results with ANN (except for PE/DVT)
Noh et al. (2023)30; single centre; Korea	ASD	RC	LR; gradient boosting; random forest; DNN	Mechanical complications	1.000 (1.000–1.000)	1.000 (1.000–1.000)	1.000 (1.000–1.000)	1.000 (1.000–1.000)	Better results with random forest
Yagi et al. (2018)33; single centre; Japan	ASD	RC	DNDT; C5.0 decision tree	PJK/PJF	0.981 (NR)	1.0 (NR)	NR	NR	Better results including the predictive variable “T-score≤−1.5”
Scheer et al. (2016)32; multicentre; USA	ASD	RC	DNDT; C5.0 decision tree	PJK/PJF	0.863 (NR)	0.89 (NR)	NR	NR	–
Scheer et al. (2018)34; multicentre; USA	ASD	RC	DNDT; C5.1 decision tree	Pseudoarthrosis	0.876 (NR)	0.89 (NR)	NR	NR	–
Pellisé et al. (2019)35; multicentre; Spain, USA, Switzerland, Turkey, France	ASD	RC	Random forest	Major complications	NR	0.717 (0.68–0.75)	NR	NR	–
Xiong (2022)29; single centre; China	DSD	RC	Boosted Classification Trees, Boosted Logistic Regression, Extreme Gradient Boosting, Stochastic Gradient Boosting, Generalised Linear Model, AdaBoost Classification Trees, and Random Forest	Surgical site infection	0.8247 (NR)	0.906 (NR)	0.9375 (NR)	0.818 (NR)	Better results with AdaBoost Classification Trees
Fatima (2020)8; multicentre; USA	DSD	NSQIP	LR and LASSO (least absolute shrinkage and selection operator)	Adverse events	NR	General: 0.70 (0.62–0.74); surgical complications: 0.70 (NR); clinical complications: 0.70 (NR)	NR	NR	Better results with LR
Zehnder (2021)36; multicentre; Switzerland, UK, Italy	DSD	EUROSPINE Spine Tango	Shrinkage algorithm (dfbeta method)	Surgical complications: perioperative and general	NR	General: 0.74 (0.72–0.76); surgical: 0.64 (0.62–0.65)	NR	NR	–
Scheer (2017)31; multicentre; USA	ASD	RC	DNDT; C5.0 decision tree	Major complication	0.876 (NR)	0.89 (NR)	NR	NR	–
Wang (2021)18; multicentre; USA	DSD	NSQIP	XGBoost (extreme gradient boosting)	PE/DVT	NR	0.716 (0.701–0.731)	NR	NR	–
Liu (2022)19; single centre; China	DSD	RC	LR, multilayer perceptron, decision tree, random forest, gradient boosting machine, and XGBoost (extreme gradient boosting)	Surgical site infection	0.860 (NR)	0.923 (NR)	0.834 (NR)	NR	Better results with XGBoost

Abbreviations: ANN=artificial neural network; ASD=adult spinal deformity; AUC=area under the curve; DNDT=deep neural decision tree; DNN=deep neural network; DSD=degenerative spine disorders; LR=logistic regression; NR=not reported; NSQIP=The National Surgical Quality Improvement Programme; PE/DVT=pulmonary embolism/deep vein thrombosis; PJK/PJF=proximal junctional kyphosis/proximal junctional failure; RC=retrospective cohort; SSIs=surgical site infections.

(a) In the case of multiple predictive models, the results of the best predictive model were reported.

Effectiveness against other predictive methods

In 5 publications (41.6%), the effectiveness of predictive AI models for the prediction of general or specific complications was compared.8,18,19,30,37

Kim et al. compared the performance of an artificial neural network (ANN)-based machine learning predictive algorithm with logistic regression and the American Society of Anesthesiologists (ASA) pre-anaesthesia assessment scale for the prediction of 3 outcome variables (cardiac complications, deep vein thrombosis/pulmonary thromboembolism, and wound complications). The AUC of the AI predictive algorithm was superior to that of logistic regression in 2 of the 3 outcomes (the exception being the prediction of deep vein thrombosis/pulmonary thromboembolism) and superior to that of the ASA scale in all 3. Additionally, the sensitivity of the ANN was higher than that of logistic regression in predicting wound complications37: [ANN AUC: cardiac complications 0.768 (95% CI: 0.76–0.77); DVT/PTE 0.542 (95% CI: 0.53–0.55); wound complications 0.606 (95% CI: 0.60–0.61). Logistic regression AUC: cardiac complications 0.690 (95% CI: 0.68–0.69); DVT/PTE 0.547 (95% CI: 0.54–0.55); wound complications 0.575 (95% CI: 0.56–0.58). ASA AUC: cardiac complications 0.469 (95% CI: 0.46–0.47); DVT/PTE 0.485 (95% CI: 0.47–0.49); wound complications 0.508 (95% CI: 0.50–0.51)].

In the publication by Wang et al. on the prediction of deep vein thrombosis/pulmonary thromboembolism, the AUC of the machine learning predictive model (0.716; 95% CI: 0.701–0.731) was significantly higher (p<0.001) than the AUC of the ASA scale and of the Charlson Comorbidity Index.18

Noh et al. compared 3 predictive machine learning models (gradient boosting, random forest and deep neural network) with logistic regression. The random forest AI model [AUC=1.000 (95%CI: 1.000–1.000)] achieved the best predictive performance.30

Fatima et al. compared the predictive machine learning model (LASSO) with 2 frailty indices (mFI-5 and mFI-11) and with the logistic regression method. The performance of the AI-based predictive model [AUC=0.65; 95% CI: 0.61–0.69] was lower than that of logistic regression [AUC=0.70; 95% CI: 0.62–0.74] for the general prediction of adverse events and for specific events. However, its performance was significantly better (p<0.001) than that of the 2 frailty indices [mFI-5 AUC=0.50 (95% CI: 0.47–0.53); mFI-11 AUC=0.56 (95% CI: 0.54–0.59)].8

Liu et al. compared the performance of 6 predictive models including logistic regression (AUC=0.871) and determined that the extreme gradient boosting model had the best predictive performance (AUC=0.923).19

Risk of bias

Using the ROBINS-E (Risk Of Bias In Non-randomised Studies of Exposures) tool for the assessment of risk of bias in non-randomised observational studies, all included articles were classified overall as having a very high risk of bias, with high or very high risk in almost all domains of the tool (confounding, exposure measurement, selection of participants, missing data) (Fig. 2).

Figure 2.

Stacked bar chart. Distribution of articles by domains of the ROBINS-E tool for the assessment of the risk of bias.

With PROBAST (Prediction Model Risk Of Bias Assessment Tool), all studies (n=12; 100%) were at high risk of bias in at least one of the 4 domains that make up the scale (selection bias; bias associated with predictive factors; bias in outcome assessment; analysis bias). Patient selection and outcome assessment were the 2 domains most frequently rated as being at high risk of bias (Fig. 3).

Figure 3.

Stacked bar chart. Distribution of articles according to domains of the PROBAST tool for risk of bias assessment in predictive modelling studies.

Given the heterogeneity of the samples (cohorts or databases), the results of interest (definition of complications) and the evaluation metrics of the algorithms, a meta-analysis was not performed.

Discussion

The field of AI includes a variety of areas with current or potential applications in health care. Among these are ML (the focus of this review); natural language processing used in chatbots; augmented, mixed and virtual reality; and robotic surgery. These technologies not only impact spinal surgery but also broad areas of medical practice and other disciplines.3,4,38

Machine learning is a branch of AI that enables computers to learn. It involves the development of algorithms whose performance improves with experience, that is, as new data are incorporated into the system.7 Machine learning has a wide range of applications, one of these being the development of multivariable predictive models.3,4 A multivariable prediction model is a mathematical equation that relates multiple predictors (also called risk factors, predictive factors, independent variables or covariates) for a particular individual to the probability or risk of the presence (diagnosis) or future occurrence (prognosis) of a particular outcome.38 The development of predictive models involves the selection of predictors and their combination in a multivariable model. Traditionally, the estimation of multivariable prognostic outcomes was based on statistical techniques, such as logistic regression and Cox regression.37 The use of AI techniques makes it possible to address a limiting factor of traditional statistical methodology, namely that statistical power decreases as the number of variables in the multivariable analysis increases. In addition, machine learning does not necessarily start from a predetermined hypothesis, and algorithms can identify correlations and associations that might otherwise have gone unnoticed because of their complexity and multifactorial origins.3
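As a minimal illustration of the "mathematical equation" described above, the following sketch implements a logistic model, one of the traditional techniques mentioned in this paragraph. The predictor names, the intercept and the coefficients are hypothetical and were chosen only for illustration; they are not taken from any of the included studies.

```python
import math

# Hypothetical coefficients on the log-odds scale (illustration only,
# not estimated from any study included in this review).
INTERCEPT = -4.0
COEFFICIENTS = {
    "age_decades": 0.30,   # per 10 years of age
    "asa_class": 0.50,     # per ASA class
    "fused_levels": 0.20,  # per instrumented level
}


def predicted_risk(patient):
    """Return the predicted probability of a complication for one patient."""
    # Linear predictor: intercept plus the weighted sum of the predictors.
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    # The logistic function maps the linear predictor to a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


# Two made-up patients with different predictor profiles.
low_risk = predicted_risk({"age_decades": 4, "asa_class": 1, "fused_levels": 1})
high_risk = predicted_risk({"age_decades": 8, "asa_class": 3, "fused_levels": 4})
print(f"lower-risk profile: {low_risk:.3f}, higher-risk profile: {high_risk:.3f}")
```

In a model of this kind the coefficients are estimated from data; machine learning approaches differ mainly in how the relationship between predictors and risk is learned, not in the idea of mapping an individual's predictors to a probability.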

In this review, the authors set out to assess the effectiveness of AI-based predictive models for predicting complications in patients undergoing surgery for degenerative thoracolumbar spine disease. We found no robust evidence in favour of the performance of AI-based algorithms compared with other traditional predictive methods. Studies of development and internal validation of predictive models predominated, with good performance according to the AUC, which ranged mostly between acceptable and excellent. However, only 5 (41.6%) studies compared their performance with traditional statistical techniques or with scales or scoring systems.8,18,19,30,37

The evidence was weak, due to the high risk of bias in all studies, with bias predominating in the assessment of the outcome variable and the selection of patients. In the retrieved publications, there was heterogeneity in the definition of the outcome variable “complications” that prevented data synthesis and the formulation of a recommendation. Sometimes, the definition of perioperative complication included those that occurred during both the intraoperative and immediate postoperative periods, which, according to the researchers, is a weakness, since these can be conditioned by different risk variables and grouping them together adds to the possibility of confounding bias.8,31,35,36 On the other hand, in some of the publications, the complication was estimated from the information available in national databases, previously set up for a different purpose and with limited follow-up time (30 days).18,37

It should be noted that, in a surgical specialty whose performance may be conditioned by the environment, the experience of the surgeons and institutions, and the resources and characteristics of the health care system in each country or region, it is difficult to demonstrate the benefits of algorithms for predicting surgical complications using samples made up of single-centre retrospective cohorts, non-representative multicentre cohorts, databases prepared for a different purpose, or samples obtained by non-probabilistic sampling techniques subject to selection bias. Other main sources of bias in the publications included in this review were the lack of prospective studies or of randomly selected samples, and the absence of external validation studies that would make it possible to estimate the performance of the predictive algorithms with data from outside the database used for their development, training and validation. Only half of the articles reported point estimates (e.g. the AUC) with their respective confidence intervals, which made it impossible to assess the precision of the estimates in the remaining studies.
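To make concrete what reporting an AUC with its confidence interval involves, the sketch below computes the AUC as the proportion of (event, non-event) pairs that a model ranks correctly and then derives a percentile bootstrap interval. The bootstrap is only one common option and is not claimed to be the method used in any included study; the function names, the scores and the outcomes are hypothetical toy values.

```python
import random


def auc(scores, labels):
    """AUC as the share of (event, non-event) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_with_bootstrap_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Return (point estimate, lower bound, upper bound) for the AUC."""
    point = auc(scores, labels)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    while len(estimates) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        # Skip resamples that contain only one outcome class.
        if len(set(sample_labels)) < 2:
            continue
        estimates.append(auc([scores[i] for i in idx], sample_labels))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


# Toy data: predicted risks and observed complications (1 = complication).
toy_scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
toy_labels = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]
point, lower, upper = auc_with_bootstrap_ci(toy_scores, toy_labels)
print(f"AUC = {point:.3f} (95% CI: {lower:.3f}-{upper:.3f})")
```

With a sample this small the interval is wide, which is exactly the information that is lost when only the point estimate is published.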

Despite the above and the evident low quality of the available evidence, the authors observed a trend towards a benefit of the use of AI-based predictive models as a tool to establish the individual risk of complications of spinal surgery in patients with degenerative thoracolumbar vertebral disease. In the near future, these techniques could guide the decision-making of spinal surgeons. Estimating the surgical risk in a given patient represents a real challenge due to the large number of variables that interact in a complex manner and impact on the overall risk. Variables include some characteristics that can be generalised along with others that are specific to the environment. Therefore, the recording of local and regional data is the basis for the development of future predictive algorithms that enable us to recognise the risk of our patients with accuracy and precision.

The main limitations of this review are that some relevant literature may not have been retrieved, because the search was carried out exclusively in the MEDLINE, Cochrane Library and Lilacs databases and was restricted to articles in English, Spanish and Portuguese. In addition, the grey literature was not consulted. There is consensus, however, on the adequate reporting of predictive algorithm research, which would enable a more rigorous selection of articles for data synthesis. Nevertheless, the scarcity of available studies and the lack of previous systematic reviews on the topic led the authors of the present review to adopt more flexible eligibility criteria.

Conclusions

This systematic review provides an up-to-date view of the application of predictive AI models, in particular machine learning, for the identification of the risk of complications in patients treated with surgery for degenerative disease of the thoracolumbar spine. Although the available evidence is limited and at high risk of bias, the studies analysed indicate that these models may have a promising performance in predicting complications, with AUC values ranging mostly from acceptable to excellent. Future research with regional databases, more robust methodologies and external validation is needed to improve the reliability and applicability of these models.

Level of evidence

Level of evidence III.

Ethical considerations

This paper is a systematic review of the literature, based on data from published primary studies, and is therefore exempt from evaluation by an ethics committee. It does not include primary data from patients or animals.

Funding

No external funding.

Conflict of interest

The authors have no conflicts of interest to declare.

Acknowledgements

The authors thank Dr. Víctor Barrientos, from the Hospital del Trabajador (Santiago, Chile) for his help with the methodology.

References

[1]

S. Dagenais, J. Caro, S. Haldeman.

A systematic review of low back pain cost of illness studies in the United States and internationally.

Spine J, 8 (2008), pp. 8-20

http://dx.doi.org/10.1016/j.spinee.2007.10.005 | Medline

[2]

G.B. Andersson.

Epidemiologic features of chronic low-back pain.

Lancet, 354 (1999), pp. 581-585

http://dx.doi.org/10.1016/S0140-6736(99)01312-4 | Medline

[3]

S.R. Browd, C. Park, D.A. Donoho.

Potential applications of artificial intelligence and machine learning in spine surgery across the continuum of care.

Int J Spine Surg, (2023 Jun 8), pp. 8507

http://dx.doi.org/10.14444/8507

[4]

N.J. Lee, J.M. Lombardi, R.A. Lehman.

Artificial intelligence and machine learning applications in spine surgery.

Int J Spine Surg, 16 (2023), pp. 8503

http://dx.doi.org/10.14444/8503

[5]

L. Bero, N. Chartres, J. Diong, A. Fabbri, D. Ghersi, J. Lam, et al.

The risk of bias in observational studies of exposures (ROBINS-E) tool: concerns arising from application to observational studies of exposures.

Syst Rev, 7 (2018), pp. 242

http://dx.doi.org/10.1186/s13643-018-0915-2 | Medline

[6]

R.F. Wolff, K.G. Moons, R.D. Riley, P.F. Whiting, M. Westwood, G.S. Collins, et al.

PROBAST: a tool to assess the risk of bias and applicability of prediction model studies.

Ann Intern Med, 170 (2019), pp. 51-58

http://dx.doi.org/10.7326/M18-1376 | Medline

[7]

J.N. Mandrekar.

Receiver operating characteristic curve in diagnostic test assessment.

J Thorac Oncol, 5 (2010), pp. 1315-1316

http://dx.doi.org/10.1097/JTO.0b013e3181ec173d | Medline

[8]

N. Fatima, H. Zheng, E. Massaad, M. Hadzipasic, G.M. Shankar, J.H. Shin.

Development and validation of machine learning algorithms for predicting adverse events after surgery for lumbar degenerative spondylolisthesis.

World Neurosurg, 140 (2020), pp. 627-641

http://dx.doi.org/10.1016/j.wneu.2020.04.135 | Medline

[9]

G.K. Harada, Z.K. Siyaji, G.M. Mallow, A.L. Hornung, F. Hassan, B.A. Basques, et al.

Artificial intelligence predicts disk re-herniation following lumbar microdiscectomy: development of the “RAD” risk profile.

Eur Spine J, 30 (2021), pp. 2167-2175

http://dx.doi.org/10.1007/s00586-021-06866-5 | Medline

[10]

A.V. Karhade, H.A. Fogel, T.D. Cha, S.H. Hershman, T.P. Doorly, J.D. Kang, et al.

Development of prediction models for clinically meaningful improvement in PROMIS scores after lumbar decompression.

Spine J, 21 (2021), pp. 397-404

[11]

D. Müller, D. Haschtmann, T.F. Fekete, F. Kleinstück, R. Reitmeir, M. Loibl, et al.

Development of a machine-learning based model for predicting multidimensional outcome after surgery for degenerative disorders of the spine.

Eur Spine J, 31 (2022), pp. 2125-2136

http://dx.doi.org/10.1007/s00586-022-07306-8 | Medline

[12]

C.F. Pedersen, M.Ø. Andersen, L.Y. Carreon, S. Eiskjær.

Applied machine learning for spine surgeons: predicting outcome for patients undergoing treatment for lumbar disc herniation using PRO data.

Global Spine J, 12 (2022), pp. 866-876

[13]

Z. Ghogawala, M.R. Dunbar, I. Essa.

Lumbar spondylolisthesis: modern registries and the development of artificial intelligence.

J Neurosurg Spine, 30 (2019), pp. 729-735

http://dx.doi.org/10.3171/2019.2.SPINE18751 | Medline

[14]

G. Purohit, M. Choudhary, V.D. Sinha.

Use of artificial intelligence for the development of predictive model to help in decision-making for patients with degenerative lumbar spine disease.

Asian J Neurosurg, 17 (2022), pp. 274-279

[15]

J.S. Kim, R.K. Merrill, V. Arvind, D. Kaji, S.D. Pasik, C.C. Nwachukwu, et al.

Examining the ability of artificial neural networks machine learning models to accurately predict complications following posterior lumbar spine fusion.

Spine (Phila Pa 1976), 43 (2018), pp. 853-860

http://dx.doi.org/10.1097/BRS.0000000000002442 | Medline

[16]

A. Wirries, F. Geiger, A. Hammad, L. Oberkircher, I. Blümcke, S. Jabari.

Artificial intelligence facilitates decision-making in the treatment of lumbar disc herniations.

Eur Spine J, 30 (2021), pp. 2176-2184

http://dx.doi.org/10.1007/s00586-020-06613-2 | Medline

[17]

K.U. Lewandrowski, N. Muraleedharan, S.A. Eddy, V. Sobti, B.D. Reece, J.F. Ramírez León, et al.

Artificial intelligence comparison of the radiologist report with endoscopic predictors of successful transforaminal decompression for painful conditions of the lumber spine: application of deep learning algorithm interpretation of routine lumbar magnetic resonance imaging scan.

Int J Spine Surg, 14 (2020), pp. S75-S85

http://dx.doi.org/10.14444/7130 | Medline

[18]

K.Y. Wang, I. Ikwuezunma, V. Puvanesarajah, J. Babu, A. Margalit, M. Raad, et al.

Using predictive modeling and supervised machine learning to identify patients at risk for venous thromboembolism following posterior lumbar fusion.

Global Spine J, (2021),

http://dx.doi.org/10.1177/21925682211019361

[19]

W.C. Liu, H. Ying, W.J. Liao, M.P. Li, Y. Zhang, K. Luo, et al.

Using preoperative and intraoperative factors to predict the risk of surgical site infections after lumbar spinal surgery: a machine learning-based study.

World Neurosurg, 162 (2022), pp. e553-e560

http://dx.doi.org/10.1016/j.wneu.2022.03.060 | Medline

[20]

A.A. Shah, S.K. Devana, C. Lee, A. Bugarin, E.L. Lord, A.N. Shamie, et al.

Prediction of major complications and readmission after lumbar spinal fusion: a machine learning-driven approach.

World Neurosurg, 152 (2021), pp. e227-e234

http://dx.doi.org/10.1016/j.wneu.2021.05.080 | Medline

[21]

G. Ren, L. Liu, P. Zhang, Z. Xie, P. Wang, W. Zhang, et al.

Machine learning predicts recurrent lumbar disc herniation following percutaneous endoscopic lumbar discectomy.

Global Spine J, 2 (2022),

[22]

N. Agarwal, A.A. Aabedi, A.K. Chan, V. Letchuman, S. Shabani, E.F. Bisson, et al.

Leveraging machine learning to ascertain the implications of preoperative body mass index on surgical outcomes for 282 patients with preoperative obesity and lumbar spondylolisthesis in the Quality Outcomes Database.

J Neurosurg Spine, 38 (2023), pp. 182-191

[23]

M.S. Shamim, S.A. Enam, U. Qidwai.

Fuzzy Logic in neurosurgery: predicting poor outcomes after lumbar disk surgery in 501 consecutive patients.

Surg Neurol, 72 (2009), pp. 565-572

http://dx.doi.org/10.1016/j.surneu.2009.07.012 | Medline

[24]

V.E. Staartjes, V. Stumpo, L. Ricciardi, N. Maldaner, H.A. Eversdijk, M. Vieli, et al.

FUSE-ML: development and external validation of a clinical prediction model for mid-term outcomes after lumbar spinal fusion for degenerative disease.

Eur Spine J, 31 (2022), pp. 2629-2638

http://dx.doi.org/10.1007/s00586-022-07135-9 | Medline

[25]

S. Dong, Y. Zhu, H. Yang, N. Tang, G. Huang, J. Li, et al.

Evaluation of the predictors for unfavorable clinical outcomes of degenerative lumbar spondylolisthesis after lumbar interbody fusion using machine learning.

Front Public Health, 10 (2022), pp. 835938

http://dx.doi.org/10.3389/fpubh.2022.835938 | Medline

[26]

M. Yagi, T. Michikawa, T. Yamamoto, T. Iga, Y. Ogura, A. Tachibana, et al.

Development and validation of machine learning-based predictive model for clinical outcome of decompression surgery for lumbar spinal canal stenosis.

Spine J, 22 (2022), pp. 1768-1777

http://dx.doi.org/10.1016/j.spinee.2022.06.008 | Medline

[27]

V.E. Staartjes, M.P. de Wispelaere, W.P. Vandertop, M.L. Schröder.

Deep learning-based preoperative predictive analytics for patient-reported outcomes following lumbar discectomy: feasibility of center-specific modeling.

Spine J, 19 (2019), pp. 853-861

http://dx.doi.org/10.1016/j.spinee.2018.11.009 | Medline

[28]

P.S. Page, G.P. Greeneway, S.G. Ammanuel, D.K. Resnick.

Creation and validation of a predictive model for lumbar synovial cyst recurrence following decompression without fusion.

J Neurosurg Spine, 37 (2022), pp. 851-854

http://dx.doi.org/10.3171/2022.5.SPINE22504 | Medline

[29]

C. Xiong, R. Zhao, J. Xu, H. Liang, C. Zhang, Z. Zhao, et al.

Construct and validate a predictive model for surgical site infection after posterior lumbar interbody fusion based on machine learning algorithm.

Comput Math Methods Med, 2022 (2022), pp. 2697841

http://dx.doi.org/10.1155/2022/2697841 | Medline

[30]

S.H. Noh, H.S. Lee, G.E. Park, Y. Ha, J.Y. Park, S.U. Kuh, et al.

Predicting mechanical complications after adult spinal deformity operation using a machine learning based on modified global alignment and proportion scoring with body mass index and bone mineral density.

Neurospine, 20 (2023), pp. 265-274

http://dx.doi.org/10.14245/ns.2244854.427 | Medline

[31]

J.K. Scheer, J.S. Smith, F. Schwab, V. Lafage, C.I. Shaffrey, S. Bess, et al.

Development of a preoperative predictive model for major complications following adult spinal deformity surgery.

J Neurosurg Spine, 26 (2017), pp. 736-743

http://dx.doi.org/10.3171/2016.10.SPINE16197 | Medline

[32]

J.K. Scheer, J.A. Osorio, J.S. Smith, F. Schwab, V. Lafage, R.A. Hart, et al.

Spine (Phila Pa 1976), 41 (2016), pp. E1328-E1335

http://dx.doi.org/10.1097/BRS.0000000000001598 | Medline

[33]

M. Yagi, N. Fujita, E. Okada, O. Tsuji, N. Nagoshi, T. Asazuma, et al.

Fine-tuning the predictive model for proximal junctional failure in surgically treated patients with adult spinal deformity.

Spine (Phila Pa 1976), 43 (2018), pp. 767-773

http://dx.doi.org/10.1097/BRS.0000000000002415 | Medline

[34]

J.K. Scheer, T. Oh, J.S. Smith, C.I. Shaffrey, A.H. Daniels, D.M. Sciubba, et al.

Development of a validated computer-based preoperative predictive model for pseudarthrosis with 91% accuracy in 336 adult spinal deformity patients.

Neurosurg Focus, 45 (2018), pp. E11

http://dx.doi.org/10.3171/2018.9.FOCUS18381 | Medline

[35]

F. Pellisé, M. Serra-Burriel, J.S. Smith, S. Haddad, M.P. Kelly, A. Vila-Casademunt, et al.

Development and validation of risk stratification models for adult spinal deformity surgery.

J Neurosurg Spine, 28 (2019), pp. 1-13

http://dx.doi.org/10.3171/2017.5.SPINE16736 | Medline

[36]

P. Zehnder, U. Held, T. Pigott, A. Luca, M. Loibl, R. Reitmeir, et al.

Development of a model to predict the probability of incurring a complication during spine surgery.

Eur Spine J, 30 (2021), pp. 1337-1354

http://dx.doi.org/10.1007/s00586-021-06777-5 | Medline

[37]

J.S. Kim, V. Arvind, E.K. Oermann, D. Kaji, W. Ranson, C. Ukogu, et al.

Predicting surgical complications in patients undergoing elective adult spinal deformity procedures using machine learning.

Spine Deform, 6 (2018), pp. 762-770

[38]

A. Combalia, M.V. Sánchez-Vives, T. Donegan.

Immersive virtual reality in orthopaedics – a narrative review.

Int Orthop, 48 (2024), pp. 21-30

http://dx.doi.org/10.1007/s00264-023-05911-w | Medline
