Glucose metabolism disorders (GMDs) are a serious global public health issue, characterized by high incidence and an increasingly young age of onset. GMDs contribute to the occurrence of major adverse cardiovascular events (MACEs) and act synergistically with other cardiovascular risk factors (CVRFs) to influence MACEs. Although numerous studies have explored CVRFs and their prognostic role in cardiovascular diseases, predictive models relating novel cardiovascular risk factors to MACEs are lacking for the young population, particularly for patients with GMDs. This study aims to identify important CVRFs affecting MACEs in young patients with GMDs using LASSO regression, random forest, and XGBoost, providing new evidence to support the early adoption of proactive and comprehensive interventions and management.
Methods
The study included 411 young patients with GMDs who visited the First People's Hospital of Anqing Affiliated to Anhui Medical University between September 2022 and June 2023. The patients were randomly divided into a training set and a testing set in a 7:3 ratio. Comprehensive analysis was performed using LASSO regression, random forest, and XGBoost. The performance of the models was evaluated and validated, and the important cardiovascular risk factors identified by the three models were compared.
Results
After one year of follow-up, men accounted for most of the recorded events (85.2% vs. 14.8% in women). LASSO regression identified blood pressure (BP), the triglyceride-glucose index (TyG), body mass index (BMI), fasting blood glucose (FBG), and the triglyceride to HDL ratio (TG/HDL) as significant variables associated with MACEs. The random forest method highlighted BMI, TyG, TG/HDL, triglycerides (TG), and FBG as key factors related to MACEs. The XGBoost model also emphasized the important roles of BMI, TyG, TG, and BP. Across the three models, BMI, TyG, and BP were consistently important.
Conclusions
To some extent, men were more likely than women to experience adverse cardiovascular events. Taken together, the three predictive models suggest that traditional CVRFs (BP, BMI) and a novel CVRF (TyG) play an important role in the development of long-term MACEs in young patients with GMDs, and that interventions targeting them may help prevent progression to diabetes mellitus (DM).
Glucose metabolism disorders (GMDs) are among the most prevalent metabolic abnormalities and encompass two primary states: diabetes mellitus (DM) and prediabetes mellitus (PDM). The latter includes impaired fasting glucose (IFG), impaired glucose tolerance (IGT), or a combination of both. A recent study indicated that the worldwide prevalence of PDM is approximately 7.3%. Alarmingly, in China, the prevalence of PDM has surged to 35.2%, with a trend towards younger age groups.1 PDM, alternatively termed intermediate hyperglycemia or non-diabetic hyperglycemia,2 is regarded as a high-risk condition that predisposes individuals to type 2 DM,3 hypertension,4 subclinical atherosclerosis,5 and cardiovascular diseases.6 Patients with PDM are phenotypically similar to patients with DM, but are characterized by relatively higher body mass index (BMI) and blood pressure and by dyslipidemia.7 Approximately 5–10% of patients with PDM progress to overt type 2 DM each year, and type 2 DM is associated with significant cardiovascular risk.3
It is commonly understood that DM is an independent risk factor for cardiovascular disease,8 and that PDM is a strong predictor of cardiovascular conditions.9 Over the past few years, extensive studies have examined the association between PDM and the risk of cardiovascular disease and mortality.10–12 According to surveys conducted by the American Diabetes Association (ADA) and the World Health Organization, IFG or IGT was associated with increased all-cause mortality and cardiovascular risk in the general population,6 which may be related to microvascular and macrovascular damage.13 Increasing evidence suggests that PDM exerts serious adverse effects on cardiovascular health and prognosis,14 and individuals in a hyperglycemic state are considered to be at high risk for cardiovascular events.15 The study by Yong et al. indicated that PDM, compared with normal blood glucose levels, is independently associated with major adverse cardiovascular events (MACEs) following percutaneous coronary intervention (PCI).16 Furthermore, related studies have reported that GMDs progress faster and more destructively in younger patients than in patients who develop the condition later, resulting in a lower quality of life.17,18
Cardiovascular risk factors (CVRFs) are factors that increase the incidence and mortality of atherosclerotic cardiovascular diseases. The clustering of CVRFs is particularly prominent in DM and is associated with cardiovascular diseases. Traditional CVRFs include hypertension, dyslipidemia, diabetes, smoking, and obesity. Novel cardiovascular risk factors include hyperhomocysteinemia, hyperuricemia, vitamin D levels, C-reactive protein (CRP), fibrinogen, the triglyceride-glucose index (TyG), obstructive sleep apnea (OSA), gut microbiota, physical activity, and others. Recent domestic and international research has underscored the significant impact of clustered CVRFs on the occurrence of MACEs. As the number of CVRFs increases, the associated risk increases,19 particularly with factors such as hypertension, dyslipidemia, overweight/obesity, smoking, and elevated blood glucose levels.20,21 Although numerous studies have explored cardiovascular risk factors and their prognostic role in cardiovascular diseases, predictive models relating novel cardiovascular risk factors to MACEs are lacking for the young population, particularly for patients with metabolic dysregulation. This study aims to identify important CVRFs affecting MACEs in young patients with GMDs using LASSO regression, random forest, and XGBoost, providing new evidence to support the early adoption of proactive and comprehensive interventions and management.
Materials and methods
Materials
Subjects
This study included 468 patients with GMDs aged 18–45 years who attended the First People's Hospital of Anqing Affiliated to Anhui Medical University between September 2022 and June 2023. During 12 months of follow-up, 56 patients were lost to follow-up; ultimately, 411 patients completed follow-up or experienced endpoint events.
Inclusion criteria
The inclusion criteria were as follows: (1) age 18–45 years; (2) fulfillment of the diagnostic criteria for GMDs; (3) signed informed consent and voluntary participation in the follow-up study; (4) complete clinical data.
Exclusion criteria
The exclusion criteria were as follows: (1) severe complications of glucose metabolism disorders, such as diabetic ketoacidosis, hyperosmolar hyperglycemic syndrome, and organ failure due to diabetes-related complications; (2) acute myocardial infarction (MI) or New York Heart Association (NYHA) functional class III/IV; (3) concomitant severe respiratory, renal, hepatic, pancreatic, neoplastic, or related diseases; (4) autoimmune diseases; (5) refusal to sign the informed consent form or inability to continue follow-up.
Methods
Inclusion indicators of CVRFs
The CVRFs assessed included traditional cardiovascular risk factors, such as hypertension, smoking, overweight/obesity, and dyslipidemia, and novel cardiovascular risk factors, such as C-reactive protein, uric acid, fibrinogen, the triglyceride-glucose index (TyG), the TG/HDL ratio, the uric acid/HDL ratio (UHR), and the ZJU index.
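Several of the novel indicators listed above are simple functions of routine laboratory values. The sketch below is illustrative only: it assumes the commonly used definition TyG = ln(TG × FBG / 2) with both values in mg/dL, which the text does not state explicitly, and it assumes that uric acid and HDL are also reported in mg/dL. The function names are hypothetical, and the input values are the training-set medians from Table 1, used here as a made-up patient.

```python
import math


def tyg_index(tg_mg_dl: float, fbg_mg_dl: float) -> float:
    """Triglyceride-glucose index, assuming TyG = ln(TG * FBG / 2) in mg/dL."""
    return math.log(tg_mg_dl * fbg_mg_dl / 2)


def tg_hdl_ratio(tg_mg_dl: float, hdl_mg_dl: float) -> float:
    """Triglyceride to HDL cholesterol ratio (same units for both)."""
    return tg_mg_dl / hdl_mg_dl


def uhr(uric_acid_mg_dl: float, hdl_mg_dl: float) -> float:
    """Uric acid to HDL cholesterol ratio (units assumed, not given in the text)."""
    return uric_acid_mg_dl / hdl_mg_dl


# Hypothetical patient built from the training-set medians in Table 1.
tg, fbg, hdl, uric_acid = 149.64, 133.74, 21.42, 5.32
print(round(tyg_index(tg, fbg), 2))     # about 9.21
print(round(tg_hdl_ratio(tg, hdl), 2))  # about 6.99
print(round(uhr(uric_acid, hdl), 2))    # about 0.25
```

The results fall in the same range as the TyG, TG/HDL, and UHR rows of Table 1, which is consistent with these definitions but does not confirm that the authors used them.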
Diagnosis of MACEs
This study primarily observed the occurrence of MACEs in patients, including: (1) unstable angina: patients typically present with chest pain or discomfort, which is more frequent and prolonged; (2) non-fatal myocardial infarction: patients usually exhibit elevated troponin levels or electrocardiogram (ECG) changes; (3) heart failure: patients commonly experience shortness of breath and fatigue, and may have an NYHA classification of III/IV; (4) arrhythmias: common arrhythmias include premature ventricular contractions (PVCs), atrial fibrillation (AF), and supraventricular tachycardia (SVT); (5) stroke: often presents with neurological deficits or impaired brain function.
Construction and evaluation of predictive models
The research sample was randomly divided into a training set (n=287) and a test set (n=124) at a 7:3 ratio using R (version 4.3.3) (Fig. 1). Kaplan–Meier curves were used to describe and compare event-free survival between men and women. A comprehensive analysis was conducted using three methods, least absolute shrinkage and selection operator (LASSO) regression, random forest, and eXtreme Gradient Boosting (XGBoost), to compare the CVRFs retained by each model. Important variables in the LASSO regression model were visualized with a nomogram, while variable importance in the random forest model was depicted using a ranking plot. The key variables in the XGBoost model were identified through SHapley Additive exPlanations (SHAP). Furthermore, the performance of the three models was evaluated and validated using the area under the receiver operating characteristic (ROC) curve (AUC).
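The AUC used to compare the three models can be read as the probability that a randomly chosen patient with a MACE receives a higher predicted risk than a randomly chosen patient without one. The following minimal sketch computes it directly from that definition. The labels and scores are invented for illustration, and the function name is hypothetical; the study itself used R.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = MACE, 0 = no MACE; scores are made-up predicted risks.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.6, 0.1, 0.4]
print(round(roc_auc(labels, scores), 3))  # 7.5 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the values reported for each model.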
Statistical analysis
Data were analyzed using SPSS (version 25.0) and R (version 4.3.3). Normally distributed continuous variables were presented as mean±standard deviation, and differences between groups were compared using the independent samples t-test. Non-normally distributed continuous data were presented as median and interquartile range (IQR), and differences were analyzed using the Mann–Whitney U test. Categorical variables were presented as percentages, and comparisons between two groups were conducted using the χ² test or Fisher's exact test. Cumulative risk curves were generated using the Kaplan–Meier method, and the log-rank test was employed to compare the cumulative incidence rates between the two groups. Results were considered statistically significant when p<0.05.
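As a worked illustration of the χ² comparison described above, the sketch below computes the Pearson χ² statistic for a 2×2 table using only the standard library. It assumes no continuity correction, which the text does not specify. The counts are the sex and BP rows of Table 1, and the function name is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]], 1 df."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: female, male. Columns: train, validation (counts from Table 1).
chi2_sex, p_sex = pearson_chi2_2x2(16, 9, 271, 115)
# Rows: no hypertension, hypertension. Columns: train, validation.
chi2_bp, p_bp = pearson_chi2_2x2(206, 100, 81, 24)
print(round(chi2_sex, 3), round(p_sex, 3))  # close to 0.429 and 0.512
print(round(chi2_bp, 3), round(p_bp, 3))    # close to 3.580 and 0.058
```

Both results agree with the statistics listed in Table 1 to three decimals, which suggests, but does not prove, that the uncorrected test was used there.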
Results
Comparison of baseline data
A total of 411 patients were enrolled from September 2022 to June 2023. The patients were randomly assigned to the training group (287 cases) and the testing group (124 cases) in a 7:3 ratio. No statistically significant differences were observed between the two groups in demographic characteristics such as age and gender. Furthermore, no significant differences were found between the groups in terms of cardiovascular risk factors (Table 1).
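The 287/124 split follows from taking 70% of 411 and rounding down. The sketch below reproduces those group sizes with a seeded shuffle. It is an assumption-laden illustration: the study performed the split in R, and the seed, function name, and rounding rule used here are hypothetical.

```python
import random


def split_indices(n_patients, train_fraction=0.7, seed=2023):
    """Shuffle patient indices reproducibly and cut them at the training fraction."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)  # seed value is arbitrary
    n_train = int(n_patients * train_fraction)  # floor of 411 * 0.7 = 287
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(411)
print(len(train_idx), len(test_idx))  # 287 124
```

With so few events in the cohort, a purely random split can leave very few MACEs in the test set, so a split stratified by outcome is a common alternative worth considering.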
Table 1. Comparison of baseline data.
| Variable | Train (n=287) | Validation (n=124) | Statistic | p-Value |
|---|---|---|---|---|
| Sex, n (%) | | | χ²=0.429 | 0.512 |
| Female | 16 (5.57) | 9 (7.26) | | |
| Male | 271 (94.43) | 115 (92.74) | | |
| Age | 36.12±6.55 | 36.65±5.82 | t=−0.817 | 0.415 |
| BMI | 24.66±4.50 | 24.68±5.44 | t=−0.030 | 0.976 |
| FBG | 133.74 (119.16, 180.27) | 132.30 (117.72, 189.76) | Z=0.209 | 0.834 |
| BP, n (%) | | | χ²=3.580 | 0.058 |
| No | 206 (71.78) | 100 (80.65) | | |
| Yes | 81 (28.22) | 24 (19.35) | | |
| CRP | 5.78 (3.11, 13.52) | 5.00 (2.54, 10.81) | Z=1.936 | 0.053 |
| Uric acid | 5.32 (4.32, 6.75) | 5.43 (4.32, 6.52) | Z=0.377 | 0.706 |
| Fibrinogen | 2.78 (2.19, 3.78) | 2.56 (2.16, 3.23) | Z=1.919 | 0.055 |
| TC | 177.88 (154.68, 220.42) | 181.75 (162.41, 216.55) | Z=−0.482 | 0.630 |
| TG | 149.64 (90.76, 259.88) | 138.13 (82.13, 247.93) | Z=1.046 | 0.295 |
| HDL | 21.42 (18.81, 25.38) | 20.79 (17.77, 24.71) | Z=1.432 | 0.152 |
| LDL | 100.54 (81.21, 122.78) | 93.39 (76.86, 120.65) | Z=1.430 | 0.153 |
| Smoking, n (%) | | | χ²=1.723 | 0.189 |
| No | 206 (72.03) | 97 (78.23) | | |
| Yes | 80 (27.97) | 27 (21.77) | | |
| ZJU | 331.58 (252.43, 460.37) | 317.71 (249.33, 434.01) | Z=0.565 | 0.572 |
| TG/HDL | 7.56 (3.73, 13.56) | 6.27 (3.57, 13.19) | Z=1.316 | 0.188 |
| UHR | 0.26 (0.19, 0.35) | 0.24 (0.17, 0.35) | Z=0.790 | 0.429 |
| TyG | 9.41±0.93 | 9.36±1.00 | t=0.544 | 0.587 |
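Values are n (%), mean±standard deviation, or median (interquartile range). BMI, body mass index; FBG, fasting blood glucose; BP, blood pressure; CRP, C-reactive protein; TC, total cholesterol; TG, triglycerides; HDL, high-density lipoprotein; LDL, low-density lipoprotein; ZJU, Zhejiang University index; TG/HDL, triglyceride to HDL ratio; UHR, uric acid to HDL ratio; TyG, triglyceride-glucose index.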
After one year of follow-up, a total of 27 MACEs were recorded, 23 in men and 4 in women, so men accounted for 85.2% of events and women for 14.8%. However, survival analysis by sex showed that women had a significantly higher risk of adverse cardiovascular events than men (p=0.039) (Fig. 2).
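The 85.2% and 14.8% figures are shares of the 27 recorded events, not sex-specific event rates, which helps explain why the survival analysis points in the opposite direction. The short calculation below makes the distinction explicit. It assumes that the sex counts in Table 1 (386 men and 25 women across both sets) describe the followed cohort, and it computes crude one-year proportions rather than Kaplan–Meier estimates.

```python
events = {"men": 23, "women": 4}
# Sex counts taken from Table 1 (training set + validation set).
patients = {"men": 271 + 115, "women": 16 + 9}

total_events = sum(events.values())
share_of_events = {s: round(100 * events[s] / total_events, 1) for s in events}
crude_rate = {s: round(100 * events[s] / patients[s], 1) for s in events}

print(share_of_events)  # {'men': 85.2, 'women': 14.8}
print(crude_rate)       # {'men': 6.0, 'women': 16.0}
```

Under these assumptions the crude event proportion is higher in women, although with only four events among 25 women the estimate is very imprecise.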
LASSO regression model construction and evaluation
We conducted variable selection using LASSO regression on 17 variables, including traditional CVRFs (such as hypertension, dyslipidemia, diabetes, smoking, and obesity) and novel CVRFs (CRP, uric acid, fibrinogen, TyG, TG/HDL, ZJU, UHR). The LASSO algorithm identified five important variables: BP, TyG, BMI, FBG, and TG/HDL. The model was visualized as a nomogram (Fig. 3c), and evaluation showed excellent predictive performance (AUC=0.917) (Fig. 3d).
Fig. 3. LASSO regression model construction and evaluation. (a) LASSO regression path plot, showing the changes in the coefficients of different variables in the LASSO regression model. When λ is set to 1×10⁻⁴ (the blue line in the figure), the coefficient trends for each variable remain relatively stable. (b) Bar chart of important variable coefficients in the LASSO algorithm. (c) The nomogram constructed based on LASSO regression. (d) The ROC curve analysis plot. The blue curve represents the ROC curve for the training set, while the red curve represents the ROC curve for the testing set. Abbreviations: BMI, body mass index; BP, blood pressure; CRP, C-reactive protein; FBG, fasting blood glucose; HDL, high-density lipoprotein; LDL, low-density lipoprotein; TC, total cholesterol; TG, triglycerides; TyG, triglyceride-glucose index.
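For orientation, the penalized objective behind this kind of variable selection can be written out. The formulation below assumes a logistic (binomial) LASSO, which is the usual choice for a binary MACE outcome; the text does not state the model family or the R package used. Here n is the number of training patients, p the number of candidate variables (17), y_i the MACE indicator, x_i the vector of CVRFs for patient i, and λ the penalty weight.

```latex
\left(\hat{\beta}_0, \hat{\beta}\right)
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      -\frac{1}{n} \sum_{i=1}^{n}
        \Big[ y_i \left(\beta_0 + x_i^{\top}\beta\right)
              - \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \Big]
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

Larger values of λ shrink more coefficients exactly to zero, so the variables that keep non-zero coefficients are the ones reported as selected. With a very small λ the penalty is weak and most variables tend to be retained.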
The random forest algorithm was applied to the same 17 variables and identified BMI, TyG, TG/HDL, TG, and FBG as key factors associated with MACEs (Fig. 4a). The random forest feature importance plot showed that BMI had the highest importance score, while TyG ranked highest on the accuracy measure, indicating that these two variables are the most important and reliable in the model (Fig. 4b). However, 10-fold cross-validation of the model showed relatively weak discrimination, with an AUC of 0.582 (Fig. 4c).
Fig. 4. Random forest model construction and evaluation. (a) The Gini index-based variable importance ranking plot derived from the random forest model. (b) The feature importance ranking plot based on the random forest model. (c) The cross-validation ROC curve plot based on the random forest model. Abbreviations: BMI, body mass index; BP, blood pressure; CRP, C-reactive protein; FBG, fasting blood glucose; HDL, high-density lipoprotein; LDL, low-density lipoprotein; TC, total cholesterol; TG, triglycerides; TyG, triglyceride-glucose index.
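Gini-based importance rankings such as the one in panel (a) accumulate, over all trees, how much each variable's splits reduce node impurity. The sketch below shows that building block for a single split on a binary outcome. The BMI values, outcomes, and threshold are invented, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a node with binary labels (0 = no MACE, 1 = MACE)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)  # equals 1 - p**2 - (1 - p)**2


def gini_decrease(values, labels, threshold):
    """Impurity reduction obtained by splitting a node at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_child = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted_child


# Toy node: six hypothetical patients, split at BMI 25.
bmi = [21.0, 23.5, 24.0, 27.5, 30.0, 32.0]
mace = [0, 0, 0, 1, 0, 1]
print(round(gini_decrease(bmi, mace, 25.0), 4))  # 2/9, about 0.2222
```

A random forest sums these decreases for each variable across every split and tree, then averages over trees, which yields a mean-decrease-in-Gini ranking.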
XGBoost was applied to the same 17 variables, revealing that BMI, TyG, TG, fibrinogen, and TG/HDL play significant roles in MACEs (Fig. 5a). To better understand the impact of these variables on outcomes, SHAP values were computed for each feature, and the top 10 were selected. Based on the ranking of mean absolute SHAP values, the five most important variables (BMI, TyG, TG, BP, and FBG) were identified (Fig. 5b). Fig. 5c shows the violin plots for each feature, illustrating the correlation between feature values and their corresponding SHAP values. Additionally, model evaluation demonstrated moderate discrimination (AUC=0.743) (Fig. 5d). Taken together with the results from the LASSO regression and random forest models, BMI, TyG, and BP were consistently important across all models.
Fig. 5. XGBoost model construction and evaluation. (a) The feature importance scatter plot based on the XGBoost model. (b) The feature importance bar plot based on SHAP values. (c) The variable impact scatter plot based on model output. (d) The ROC curve and AUC value based on the XGBoost model. Abbreviations: BMI, body mass index; BP, blood pressure; CRP, C-reactive protein; FBG, fasting blood glucose; HDL, high-density lipoprotein; LDL, low-density lipoprotein; TC, total cholesterol; TG, triglycerides; TyG, triglyceride-glucose index; SHAP, SHapley Additive exPlanation.
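SHAP values attribute one prediction to individual features by averaging each feature's marginal contribution over all orders in which features can be switched from a baseline to the patient's values. The sketch below computes exact Shapley values for a toy three-feature score. The scoring function, coefficients, patient, and baseline are all hypothetical and are not the study's fitted XGBoost model, which would normally be explained with a tree-specific algorithm and a background dataset.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering."""
    names = list(instance)
    phi = dict.fromkeys(names, 0.0)
    for order in permutations(names):
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch this feature on
            updated = predict(current)
            phi[name] += updated - previous
            previous = updated
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in phi.items()}


def toy_risk(x):
    # Hypothetical score with a BMI x BP interaction, for illustration only.
    return 0.02 * x["BMI"] + 0.05 * x["TyG"] + 0.10 * x["BP"] + 0.01 * x["BMI"] * x["BP"]


patient = {"BMI": 30.0, "TyG": 10.0, "BP": 1}
reference = {"BMI": 24.0, "TyG": 9.0, "BP": 0}
phi = shapley_values(toy_risk, patient, reference)
ranking = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
print(phi, ranking)
```

The attributions sum to the difference between the patient's score and the reference score. Averaging the absolute values over many patients gives the kind of mean absolute SHAP ranking shown in panel (b).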
This study indicates that, to some extent, men are more susceptible to MACEs than women. Additionally, traditional CVRFs (BP and BMI) and a novel risk factor (TyG) play a significant role in the occurrence of long-term cardiovascular events in young patients with GMDs. Intervening in these factors may be crucial for preventing progression to DM.
Studies disagree on gender differences in the relationship between CVRFs and MACEs. A study by Tsao and Vasan22 found that women with diabetes have a higher risk of cardiovascular disease and a higher mortality rate after onset than men. However, the study by Muthalaly et al.23 shares the view of the present study, suggesting that the association between cardiovascular events and male sex is stronger than that with female sex. That study noted that for every 1% increase in males, there were 6.4 additional cardiovascular events per 10,000 people. In this study, although the risk of MACEs was higher in women than in men, the total number of events in men remained higher. Besides the larger number of men at baseline, this may also be related to the higher smoking rate in men. Relevant studies have indicated that smoking is associated with myocardial infarction and sudden death, and that smoking cessation can significantly reduce the occurrence of severe adverse cardiovascular events.24
Blood pressure and obesity, as traditional cardiovascular risk factors, are well recognized for their impact on MACEs. Studies such as the Framingham study,25 MRFIT,26 and many others27 have consistently demonstrated that as blood pressure rises, the risk of myocardial infarction and stroke increases significantly. Similarly, obesity has been found to promote inflammatory responses and contribute to insulin resistance, and it is closely associated with an elevated risk of adverse cardiometabolic conditions and increased mortality.28 TyG, as a novel CVRF, has been recognized as a reliable surrogate index for assessing insulin resistance. Tai et al.29 explored the relationship between cumulative TyG and the risk of MACEs in patients with type 2 diabetes. The incidence of MACEs increased significantly with increasing cumulative TyG, suggesting that monitoring TyG could be useful in predicting MACEs. Additionally, related studies30 have shown that elevated TyG is closely associated with insulin resistance (IR). In this study, TyG played a significant role in all three models, highlighting its important association with adverse cardiovascular events.
In recent years, several epidemiological surveys have revealed that elevated systolic or diastolic blood pressure in young patients increases the risk of long-term MACEs.31 This is primarily attributed to the vascular remodeling induced by elevated blood pressure. Extensive lipid deposition leads to vascular fibrosis and the formation of atherosclerotic plaques. This results in luminal narrowing and ultimately induces ischemic necrosis of various organs throughout the body, further increasing the occurrence of MACEs.32 Recent research highlighted the pivotal role of insulin resistance and hyperinsulinemia in the onset of hypertension, which is primarily linked to the upregulation of the renin–angiotensin–aldosterone system and heightened sympathetic nervous system activity.33 At the same time, hypertension, as a potent factor in damaging the vascular wall, leads to endothelial dysfunction, ultimately culminating in clinical vascular events.34
Various models have been proposed both domestically and internationally to comprehensively assess cardiometabolic risk, such as the Framingham risk prediction model, the U.S. pooled cohort equations, and the recently developed China-PAR model. However, research on clinical prediction models for cardiometabolic risk in young individuals with GMDs remains insufficient. This population therefore urgently needs more attention and research, which is crucial for the prevention of adverse outcomes and the management of long-term prognosis.
The present study has some limitations. First, it is a single-center, prospective study with a small sample size and a short follow-up period. Second, because TyG is derived from the product of TG and fasting plasma glucose (FPG), it remains unclear whether novel CVRFs have a less significant impact on adverse events than traditional CVRFs. Third, whether intervening in CVRFs reduces the risk of conversion to DM remains to be established. To address these issues, further multi-center studies with larger sample sizes, extended follow-up durations, and interventions targeting risk factors are needed, along with the inclusion of additional novel CVRFs for further evaluation.
Conclusions
In this study, we identified important CVRFs affecting MACEs in young patients with GMDs using LASSO regression, random forest, and XGBoost. The results showed that traditional CVRFs (BP, BMI) and a novel CVRF (TyG) play an important role in the occurrence of long-term MACEs in young patients with GMDs, and interventions targeting them may be significant in preventing progression to DM.
Ethical considerations
The present study was approved by the Institutional Ethics Committee for Clinical Research of the First People's Hospital of Anqing Affiliated to Anhui Medical University (No. AQYY-YXLL-KJXM-21). This study was conducted in compliance with the ethical standards of the responsible institution on human subjects as well as with the Helsinki Declaration. Informed consent was obtained from all patients.
Funding
This research was supported by the self-raised fund science and technology project of Anqing City in 2021, “Risk Factors for Coronary Microvascular Dysfunction in HFpEF Patients,” led by Dr. Jiecheng Peng (Project No. 2021Z3002), and the 2023 Anhui Medical University Graduate Research and Practice Innovation Project, “A Single-Center Prospective Study of Radiofrequency Ablation for Persistent Atrial Fibrillation Complicated with Heart Failure with Preserved Ejection Fraction,” directed by Ms. Yangyang Zhao (Project No. YJS20230205).
Conflict of interest
The authors declare that there is no conflict of interest.
We are deeply grateful to all the patients who participated in this study, as well as to everyone who contributed to the writing of this paper. Special thanks to those who provided invaluable insights into the study design, data acquisition and analysis, and manuscript revision. We acknowledge with appreciation the financial support from Dr. Jiecheng Peng's 2021 Self-raised Fund Scientific and Technological Project of Anqing City (Project No. 2021Z3002) and the 2023 Anhui Medical University Graduate Research and Practice Innovation Project directed by Ms. Yangyang Zhao (Project No. YJS20230205). We have obtained permission from all individuals mentioned in this section to include their names and contributions.









