Systematic review and meta-analysis of explainable machine learning models for clinical depression detection

Trelles Urgiles, Francisco Ariosto

Please use this identifier to cite or link to this item: http://repositorio.utmachala.edu.ec/handle/48000/25373

Title:	Systematic review and meta-analysis of explainable machine learning models for clinical depression detection
Author:	Trelles Urgiles, Francisco Ariosto
Advisor(s):	Fontaines Ruiz, Tomás Iván
Keywords:	MACHINE LEARNING;DEPRESSION;EXPLAINABILITY
Publication date:	2025
Citation:	Trelles Urgiles, F. A. (2025). Systematic review and meta-analysis of explainable machine learning models for clinical depression detection. [Degree project, Universidad Técnica de Machala]. Repositorio Institucional-Universidad Técnica de Machala.
Abstract:	Depression is among the most prevalent mental disorders, and its early detection is essential to improving therapeutic outcomes in psychotherapy. This systematic review and meta-analysis evaluated the accuracy, interpretability, and generalizability of supervised algorithms (SVM, Random Forest, XGBoost, and GCN) for clinical detection of depression using real-world data. Following PRISMA guidelines, 20 studies published between 2014 and 2025 and retrieved from major scientific databases were analyzed. Extracted data included F1-Score, AUC-ROC, interpretability methods (SHAP/LIME), and cross-validation strategies; statistical analyses used ANOVA and Pearson correlations. XGBoost achieved the best average performance (F1-Score: 0.86; AUC-ROC: 0.84), although differences across algorithms were not statistically significant (p > 0.05), challenging claims of algorithmic superiority. SHAP was the predominant interpretability approach (70% of studies). Studies implementing combined SHAP+LIME showed higher F1-Score values (F(1,7) = 8.71, p = 0.021), although this association likely reflects greater overall methodological rigor rather than a direct causal effect of interpretability on predictive performance. Clinical surveys and electronic health records demonstrated the most stable predictive outputs across validation schemes, whereas neurophysiological data achieved the highest point estimates but with limited sample representation. F1-Score correlated strongly with AUC-ROC (r = 0.950, p < 0.001). Considerable heterogeneity was observed for both metrics (I² = 74.37% for F1; I² = 71.49% for AUC), and Egger’s test indicated publication bias for AUC (p = 0.0048). Overall, the findings suggest that algorithmic performance depends more on data quality, context, and interpretability than on the choice of model, with explainable approaches offering practical value for personalized and collaborative clinical decision-making.
URI:	http://repositorio.utmachala.edu.ec/handle/48000/25373
Appears in collections:	Maestría en Psicología Clínica
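
The heterogeneity figures quoted in the abstract are I² statistics. As a minimal sketch, assuming the standard Higgins-Thompson definition, I² = max(0, (Q - df) / Q) × 100, with Cochran's Q computed from inverse-variance weights, the calculation could look like the following. The function name `i_squared` and any input values are illustrative assumptions, not code or data from the thesis; the record does not state which software or estimator the author used.

```python
def i_squared(effects, variances):
    """Return (Q, I2 in percent) for a set of study-level estimates.

    effects   -- per-study effect estimates (hypothetical inputs)
    variances -- per-study sampling variances, all strictly positive
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    # Inverse-variance weights and the fixed-effect pooled estimate.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I2 is the share of total variation attributed to heterogeneity,
    # truncated at zero when Q does not exceed its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2
```

For example, two hypothetical studies with effects 0.0 and 2.0 and equal variances of 0.5 give Q = 4.0 and I² = 75%. Values above roughly 75% are conventionally read as considerable heterogeneity, which is the range the abstract reports approaching for both metrics.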

Files in this item:

File	Description	Size	Format
Trelles Ariosto. Artículo.pdf		703.69 kB	Adobe PDF

This item is subject to a Creative Commons license.