
Abstracts of the 2025 Annual Meeting of the ALEH
Autoimmune liver diseases (AILDs) pose significant diagnostic and management challenges. Following our initial evaluation of large language models (LLMs), we developed and assessed three specialized retrieval-augmented generation (RAG) systems. These systems incorporate comprehensive clinical guidelines and medication safety information to improve the accuracy of clinical decision support. Our aim was to evaluate how effectively these RAG systems provide evidence-based recommendations for AILD management.
Materials and Methods
We built three distinct RAG systems: HepaChat, RAG-ChatGPT, and RAG-Claude. Each system integrated 13 international clinical guidelines covering the management of autoimmune hepatitis (AIH), primary biliary cholangitis (PBC), and primary sclerosing cholangitis (PSC). We also incorporated a database of 12,465 FDA medication warnings to support adherence to safety protocols. Ten liver specialists (six European, four American) rated the systems' responses to 56 standardized clinical questions on a 1-10 Likert scale. The questions covered disease understanding, therapeutic approaches, and clinical decision-making across all three major AILDs.
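The abstract does not describe how the systems retrieve guideline text or FDA warnings. The following is a minimal sketch of the general retrieval-augmented pattern, assuming a simple bag-of-words cosine retriever. The corpus entries are invented placeholders (not real guideline or FDA content), and the names `CORPUS`, `retrieve`, and `build_prompt` are hypothetical; the actual systems may use embeddings, chunking, or reranking that are not reported here.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder corpus: (source, passage) pairs standing in for
# guideline chunks and FDA medication-warning records. Not real content.
CORPUS = [
    ("guideline", "AIH guideline passage on induction therapy and remission"),
    ("guideline", "PBC guideline passage on first-line and second-line therapy"),
    ("guideline", "PSC guideline passage on surveillance and endoscopy"),
    ("fda_warning", "Medication warning passage on hepatotoxicity monitoring"),
]

def tokenize(text: str) -> list[str]:
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k (source, passage) pairs most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), src, text) for src, text in CORPUS]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, text) for _, src, text in scored[:k]]

def build_prompt(question: str, k: int = 2) -> str:
    """Prepend the retrieved passages to the question before it is sent to an LLM."""
    context = "\n".join(f"[{src}] {text}" for src, text in retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."

if __name__ == "__main__":
    print(build_prompt("What is first-line therapy in PBC?"))
```

Keeping guideline passages and safety warnings in one tagged corpus lets a single retrieval step surface both kinds of evidence; a real system might instead query the two sources separately.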
Results
HepaChat achieved the highest mean score (7.58±1.48) and the most best-rated responses (33), followed by RAG-Claude (7.22±1.58; 12 best-rated) and RAG-ChatGPT (7.21±1.67; 9 best-rated). Stratification by evaluator region revealed differences in scoring patterns (Americas: 7.97 vs Europe: 6.40). By disease, HepaChat performed best in AIH (Europe: 7.12; Americas: 8.17), in PSC among European evaluators (6.89), and in PBC among evaluators from the Americas (8.37). All three systems scored higher than conventional LLMs in the 2023 benchmark (6.72±1.67).
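The abstract reports a mean±SD and a count of best-rated responses per system but does not state how these were aggregated. A minimal sketch, assuming that a system's mean±SD pools every individual rating it received (sample SD) and that a response is "best-rated" when its system has the strictly highest mean score on that question, with ties credited to no system. All ratings below are invented toy values, not study data, and `ratings`, `summarize`, and `best_rated_counts` are hypothetical names.

```python
import statistics
from collections import Counter

# Hypothetical toy data (NOT study data):
# ratings[system][question] -> 1-10 scores, one per evaluator.
ratings = {
    "HepaChat":    {"Q1": [8, 9, 7], "Q2": [7, 8, 8], "Q3": [6, 7, 7]},
    "RAG-Claude":  {"Q1": [7, 8, 7], "Q2": [8, 8, 7], "Q3": [6, 7, 7]},
    "RAG-ChatGPT": {"Q1": [7, 7, 6], "Q2": [6, 7, 8], "Q3": [7, 7, 8]},
}

def summarize(system: str) -> tuple[float, float]:
    """Mean and sample SD over every individual rating a system received."""
    scores = [s for per_question in ratings[system].values() for s in per_question]
    return statistics.mean(scores), statistics.stdev(scores)

def best_rated_counts() -> Counter:
    """Count questions where a system has the strictly highest mean score.

    Ties are credited to no system (an assumption of this sketch)."""
    wins = Counter()
    questions = next(iter(ratings.values())).keys()
    for q in questions:
        means = {name: statistics.mean(r[q]) for name, r in ratings.items()}
        top = max(means.values())
        leaders = [name for name, m in means.items() if m == top]
        if len(leaders) == 1:
            wins[leaders[0]] += 1
    return wins

if __name__ == "__main__":
    wins = best_rated_counts()
    for name in ratings:
        mean, sd = summarize(name)
        print(f"{name}: {mean:.2f}±{sd:.2f}, best-rated on {wins[name]} question(s)")
```

Under this tie rule, the best-rated counts need not sum to the number of questions, which is one possible reading of why 33 + 12 + 9 is fewer than 56; the abstract does not say how ties were handled.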
Conclusions
This evaluation shows that specialized RAG systems incorporating clinical guidelines and medication safety data can improve decision support for AILD management. Regional differences in assessment highlight the importance of considering regional clinical perspectives when developing AI systems.
Conflict of interest: None





