RDGuru: A Conversational Intelligent Agent for Rare Diseases

Intelligent Conversational Agent for Rare Diseases—RDGuru: Cutting-edge Technology Powers a New Revolution in Clinical Diagnosis

Academic Background and Research Motivation

Rare diseases (RDs) are conditions that affect fewer than roughly 6.5 to 10 people per 10,000, with the exact threshold varying by jurisdiction. Because each disease is individually rare, clinically complex, and driven by diverse mechanisms, clinical diagnosis is extremely challenging. High clinical heterogeneity and overlapping symptoms send many patients on a long and arduous “diagnostic odyssey”, leading to delayed diagnosis, higher misdiagnosis rates, and delayed treatment. Although expert knowledge bases such as Orphanet and OMIM exist, clinicians still face significant obstacles in retrieving and using this information in practice. This makes improving the efficiency and accuracy of rare disease diagnosis all the more urgent.

Meanwhile, rapid progress in Artificial Intelligence (AI) and Large Language Models (LLMs) is reshaping many industries, and tools such as ChatGPT have prompted major changes in healthcare. LLMs can understand natural language and generate high-quality text, and they are increasingly used for medical knowledge Q&A and diagnostic support. However, general-purpose LLMs, limited by their training corpora, suffer from “hallucination” (generating false or erroneous information) and insufficient credibility, problems that are especially risky in the rare disease domain. Moreover, current LLMs cannot trace their answers back to evidence in expert knowledge bases, and they lack the clinical interpretability of specialized diagnostic tools.

To address these challenges, the research team combines AI with medical expertise to turn LLMs into tools tailored for rare disease diagnosis and knowledge retrieval, aiming both to make responses more credible and to improve diagnostic accuracy and clinical utility. This motivation underpins the study.

Authors and Source

The paper, titled “RDGuru: A Conversational Intelligent Agent for Rare Diseases,” is authored by Jian Yang, Liqi Shu, Huilong Duan, and Haomin Li, who are affiliated with the Clinical Data Center at Children’s Hospital, Zhejiang University School of Medicine; the College of Biomedical Engineering and Instrument Science, Zhejiang University; and Rhode Island Hospital, Brown University, USA. It was published as an original research article in the IEEE Journal of Biomedical and Health Informatics (September 2025).

Workflow and Technical Innovations Explained

1. Overall Research Workflow

RDGuru is a conversational agent for rare diseases, built on LangChain (an open-source framework for developing LLM applications and agents) and powered by the GPT-3.5-turbo LLM. Its core functionality consists of two modules: evidence-traceable knowledge Q&A, and professional medical consultation, including differential diagnosis of diagnostically difficult diseases. The workflow spans several key steps:
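As a rough illustration of how an agent might route a user query to one of these two modules, here is a minimal Python sketch. It is a stand-in under stated assumptions: in LangChain the LLM itself chooses a tool by reading each tool's description, whereas this sketch substitutes a hypothetical keyword rule, and the functions `knowledge_qa`, `consultation`, `route`, and `answer` are illustrative names, not RDGuru's code.

```python
from typing import Callable, Dict


# Hypothetical placeholders for RDGuru's two modules.
def knowledge_qa(question: str) -> str:
    return f"[knowledge Q&A] retrieve evidence for: {question}"


def consultation(question: str) -> str:
    return f"[consultation/DDX] annotate phenotypes in: {question}"


# Hypothetical keyword cues; an LLM agent would instead pick a tool
# from the tools' natural-language descriptions.
CONSULTATION_CUES = ("patient", "presents with", "symptoms", "differential")

TOOLS: Dict[str, Callable[[str], str]] = {
    "knowledge_qa": knowledge_qa,
    "consultation": consultation,
}


def route(question: str) -> str:
    """Return the name of the module that should handle the question."""
    text = question.lower()
    if any(cue in text for cue in CONSULTATION_CUES):
        return "consultation"
    return "knowledge_qa"


def answer(question: str) -> str:
    return TOOLS[route(question)](question)


if __name__ == "__main__":
    print(answer("What gene is associated with Marfan syndrome?"))
    print(answer("A 3-year-old patient presents with seizures and hypotonia."))
```

Keeping the modules behind a name-to-function table means a smarter router (such as an LLM's tool choice) can replace `route` without touching the tools themselves.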

a) Development of Rare Disease Knowledge Q&A System

  • Innovative Application of the RAG Framework
    The team adopts the RAG (Retrieval-Augmented Generation) framework: knowledge retrieved from authoritative databases is inserted into the LLM's input at generation time, improving both the accuracy and the authority of responses. Data sources include Orphanet, OMIM, GARD, and Orphadata (Orphanet's downloadable datasets).

  • Customization and Integration of LangChain Toolchain
    The system integrates a suite of tool modules, including loaders for web HTML pages, text chunking and embedding, vector similarity search (using the FAISS library), and biomedical ontology parsing. A dedicated disease entity recognition module (the Orpha retriever) matches non-standard disease descriptions to the corresponding diseases, keeping retrieval both flexible and accurate.

  • Enhanced Multi-Faceted Q&A Tools
    Dedicated tools are built for different query types (e.g., genetic etiology, phenotypic features, epidemiological data), so that the relevant knowledge fragments can be extracted precisely and aggregated, improving Q&A coverage and relevance.
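The retrieval side of this pipeline can be sketched in a few lines. The example below is a minimal, self-contained approximation under stated assumptions: `difflib` fuzzy matching stands in for the Orpha retriever's entity recognition, bag-of-words cosine similarity stands in for neural embeddings plus a FAISS index, and the knowledge base, source tags (`src-1`, `src-2`), and function names are hypothetical. It shows the core RAG idea: retrieved snippets, tagged with their source, are placed in the prompt so the answer can cite its evidence.

```python
import difflib
import math
import re
from collections import Counter
from typing import List, Optional, Tuple

# Toy knowledge base of (disease, source tag, snippet); tags are hypothetical.
KB: List[Tuple[str, str, str]] = [
    ("Marfan syndrome", "src-1", "Genetic etiology: caused by variants in the FBN1 gene."),
    ("Marfan syndrome", "src-2", "Phenotype: tall stature, lens dislocation and aortic dilatation."),
    ("Fabry disease", "src-1", "Epidemiology: an X-linked lysosomal storage disorder."),
]
DISEASES = sorted({disease for disease, _, _ in KB})


def embed(text: str) -> Counter:
    # Bag-of-words counts stand in for a neural embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def normalize_disease(mention: str) -> Optional[str]:
    # Fuzzy string matching stands in for disease entity recognition.
    by_lower = {d.lower(): d for d in DISEASES}
    hits = difflib.get_close_matches(mention.lower(), list(by_lower), n=1, cutoff=0.6)
    return by_lower[hits[0]] if hits else None


def retrieve(question: str, disease: str, k: int = 2) -> List[Tuple[float, str, str]]:
    # Linear scan over one disease's snippets, ranked by cosine similarity.
    q = embed(question)
    scored = [(cosine(q, embed(text)), src, text) for d, src, text in KB if d == disease]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:k]


def build_prompt(question: str, mention: str) -> str:
    disease = normalize_disease(mention)
    if disease is None:
        return f"No matching disease found for '{mention}'."
    evidence = "\n".join(f"[{src}] {text}" for _, src, text in retrieve(question, disease))
    return ("Answer using only the evidence below and cite the source tags.\n"
            f"{evidence}\nDisease: {disease}\nQuestion: {question}")


if __name__ == "__main__":
    print(build_prompt("Which phenotype features occur?", "marfans syndrom"))
```

In a production pipeline, `embed` would call an embedding model and the linear scan would be replaced by a vector index; the source tags are what make each answer traceable to its evidence.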

b) Clinical Medical Consultation and Differential Diagnosis (DDX)

  • Automated Phenotype Annotation and Context Analysis
    Using the Human Phenotype Ontology (HPO) as the standard vocabulary, the system integrates the NCBO Annotator from BioPortal to automatically extract phenotypes from case descriptions and normalize them to HPO terms. The FastContext algorithm (a rule engine based on n-tries) then determines each phenotype's context (affirmed or negated, certain or uncertain, temporality, etc.), improving parsing accuracy and clinical applicability.

  • Innovative Phenotype-Driven Disease Recommendation Algorithm
    PHELR (Phenotype-driven Likelihood Ratio Analysis) uses Bayesian likelihood ratios to quantify the association between each phenotype and each candidate disease, making its diagnostic suggestions interpretable.

  • Multi-Round Differential Diagnosis via Intelligent Dialogue
    Building on the RDMaster system, the agent uses the AIGGI (Adaptive Information Gain and Gini Index) scoring method to select the most diagnostically informative phenotypes, and in each consultation round it asks targeted questions about phenotypes across organ systems. After each user response, the differential diagnosis is updated in real time and new questions are recommended.

  • Development of Multi-Source Fusion Diagnostic Model—MixDiagDQN
    A key innovation of the study is MixDiagDQN, which merges the outputs of PHELR, GPT-4, and phenotype frequency-based matching using DQN (Deep Q-Network) reinforcement learning. Over multiple rounds of interaction with its environment, the model learns how to assemble the mixed diagnosis list so that the true diagnosis is recalled more often. It was trained on 10,000 cases simulated from Orphadata and tested on 238 published real-world rare disease cases.
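The likelihood-ratio idea behind a phenotype-driven ranker such as PHELR can be written as posterior odds = prior odds × the product of per-phenotype likelihood ratios. The sketch below implements that textbook Bayesian calculation in log space. It is not the published PHELR implementation: the frequency table, the background rate `BACKGROUND`, the floor `EPS`, and the prior `PRIOR` are hypothetical values chosen for illustration. Present phenotypes contribute the positive likelihood ratio P(h|D)/P(h|¬D), and phenotypes recorded as absent (for example, negated by a context detector) contribute (1 − P(h|D))/(1 − P(h|¬D)).

```python
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical P(phenotype | disease) table; placeholder names, not Orphadata values.
FREQ: Dict[str, Dict[str, float]] = {
    "disease_A": {"phenotype_1": 0.9, "phenotype_2": 0.5},
    "disease_B": {"phenotype_1": 0.2, "phenotype_3": 0.8},
}
BACKGROUND = 0.05  # assumed P(phenotype | not this disease), same for every phenotype
EPS = 0.01         # assumed frequency floor for phenotypes not annotated to a disease
PRIOR = 1e-4       # assumed uniform prior probability of each disease


def freq(disease: str, phenotype: str) -> float:
    # Clamp to [EPS, 1 - EPS] so log() never receives 0.
    p = FREQ[disease].get(phenotype, EPS)
    return min(max(p, EPS), 1 - EPS)


def log_posterior_odds(disease: str, present: Iterable[str], absent: Iterable[str]) -> float:
    score = math.log(PRIOR / (1 - PRIOR))  # log prior odds
    for h in present:  # positive likelihood ratio
        score += math.log(freq(disease, h) / BACKGROUND)
    for h in absent:   # negative likelihood ratio
        score += math.log((1 - freq(disease, h)) / (1 - BACKGROUND))
    return score


def rank_diseases(present: List[str], absent: List[str]) -> List[Tuple[str, float]]:
    scores = [(d, log_posterior_odds(d, present, absent)) for d in FREQ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for disease, score in rank_diseases(["phenotype_1", "phenotype_2"], ["phenotype_3"]):
        print(f"{disease}: log posterior odds = {score:.2f}")
```

Because the prior here is uniform, it shifts every score equally and does not change the ranking; per-disease priors (such as prevalence) would. Each phenotype's additive log term is also what makes this kind of ranking easy to explain to a clinician.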

2. Experimental Procedures and Data Processing

  • Knowledge Q&A Module Testing
    Questions were designed across eight knowledge dimensions, and 23 question templates were used to generate 4,000 items covering symptoms, diagnostic methods, epidemiology, and more. RDGuru was compared against the base GPT-3.5 and GPT-4 models on metrics including text similarity, phrase accuracy, and trustworthiness.

  • Phenotype Annotation Evaluation
    From the 238 literature cases, 102 cases with textual descriptions were manually annotated, yielding a gold standard of 1,018 known phenotypes and 97 unobserved phenotypes. RDGuru (NCBO&FastContext) was compared with NCR&FastContext, Doc2HPO, and other tools using precision, recall, and F1-score.

  • Multi-Source Fusion Diagnosis Assessment
    Across all 238 real cases, with a pool of 4,257 candidate rare diseases, the Top 1, Top 5, and Top 10 recall of each diagnostic method was computed, followed by an in-depth analysis of where the recommended diagnoses came from, how the methods' candidate lists overlapped, and why MixDiagDQN performs better.

  • Dynamic Assessment of Multi-Round Symptom Q&A
    In simulated multi-round consultations, the team tracked how RDGuru's phenotype-oriented questions changed the rank of the true diagnosis, and analyzed how the symptoms captured in each round related to diagnostic accuracy.
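Top-k recall, the metric used in the multi-source fusion assessment, is the fraction of cases whose true diagnosis appears among a method's first k candidates. A minimal sketch follows; the function name `top_k_recall` and the toy disease labels are hypothetical.

```python
from typing import Sequence


def top_k_recall(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Fraction of cases whose true diagnosis is among the top k ranked candidates."""
    if len(rankings) != len(truths) or not truths:
        raise ValueError("need one ranking per case and at least one case")
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


if __name__ == "__main__":
    # Three toy cases: the third method ranking misses its true diagnosis entirely.
    rankings = [["D1", "D2", "D3"], ["D4", "D1", "D5"], ["D6", "D7", "D8"]]
    truths = ["D1", "D1", "D9"]
    for k in (1, 2, 3):
        print(f"Top {k} recall: {top_k_recall(rankings, truths, k):.3f}")
```

For scale: with 238 test cases, each case is worth 1/238, or about 0.42 percentage points of recall.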

Main Results Detailed

Knowledge Q&A Module

RDGuru substantially outperforms the base GPT models under all evaluation schemes. Its ROUGE-1 recall and NP-ARE recall are clearly higher for questions on symptoms and disease natural history, and its precision (conciseness and consistency) is consistently higher, especially for complex or ambiguous disease queries. Under the Ragas framework, RDGuru scores highly on both retrieval and generation metrics (Context Precision, Context Recall, Faithfulness, etc.), indicating that its answers stay grounded in traceable, authoritative sources.
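ROUGE-1 recall, one of the text-similarity metrics above, measures the share of reference unigrams that also appear in the generated answer, with counts clipped so that a repeated word is not over-credited. The sketch below is a simplified version (lowercasing and regex tokenization, no stemming), so its scores may differ slightly from standard ROUGE packages; the function name `rouge1_recall` is illustrative, and NP-ARE is not reproduced here.

```python
import re
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Clipped unigram overlap divided by the number of reference unigrams."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    if not ref:
        return 0.0
    # Each reference token is credited at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / sum(ref.values())


if __name__ == "__main__":
    reference = "seizures and hypotonia are common"
    print(rouge1_recall("hypotonia is common", reference))
```

Because it is recall-oriented, this metric rewards answers that cover the reference content but does not penalize extra text, which is why the study pairs it with precision-style measures.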

Regarding tool invocation, the system failed to invoke a tool automatically in only 6.13% of 800 Q&A cases; in the vast majority it parsed the disease automatically and selected the appropriate tool, supporting reproducible and stable responses.
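As a quick check on the scale of that failure rate, assuming the percentage is computed over all 800 cases:

```latex
0.0613 \times 800 \approx 49, \qquad \frac{49}{800} = 6.125\,\% \approx 6.13\,\%
```

That is, roughly 49 questions were handled without an automatic tool call, and about 751 were routed to a tool.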

Medical Consultation and Differential Diagnosis Module

RDGuru also performs well in automatic phenotype annotation: NCBO&FastContext outperforms Doc2HPO and other mainstream tools in precision, recall, and F1-score, especially for extracting positive phenotypes, where it balances accuracy and coverage.
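Precision, recall, and F1 for phenotype annotation can be computed by comparing the set of predicted annotations with the gold standard. The sketch below assumes, for illustration, that each annotation is scored as an (HPO ID, status) pair, so a phenotype extracted with the wrong polarity (for example, “present” when the text negates it) counts as both a false positive and a false negative. Whether the study scored polarity exactly this way is an assumption, the HPO IDs are illustrative only, and `precision_recall_f1` is a hypothetical name.

```python
from typing import Set, Tuple

Annotation = Tuple[str, str]  # (HPO ID, "present" or "absent")


def precision_recall_f1(predicted: Set[Annotation], gold: Set[Annotation]) -> Tuple[float, float, float]:
    tp = len(predicted & gold)  # annotations matching both ID and status
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("HP:0001250", "present"), ("HP:0001263", "present"), ("HP:0000252", "absent")}
    # The third phenotype is found, but its negation is missed.
    predicted = {("HP:0001250", "present"), ("HP:0001263", "present"), ("HP:0000252", "present")}
    p, r, f = precision_recall_f1(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

For a corpus-level score, such as one over the 102 annotated cases, one option is to pool true positives and set sizes across cases (micro-averaging) before dividing.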

In the multi-source fusion diagnosis evaluation, MixDiagDQN achieves a Top 5 recall of 63.87%, 5.47 percentage points above standalone PHELR (58.4%), while GPT-4 alone reaches 42%. MixDiagDQN also leads in Top 10 recall. The fusion benefits from complementary strengths: PHELR dominates first-ranked recommendations, while GPT-4 adds diversity among lower-ranked diagnoses.
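To illustrate why combining ranked lists can raise Top 5 recall, here is a minimal sketch that interleaves candidate lists according to a fixed schedule and skips duplicates. This is explicitly not MixDiagDQN: in the study, a DQN learns over multiple rounds how to build the mixed list, whereas here the schedule (two picks from the first list, then one from the second) is a hypothetical, hand-chosen policy, and `merge_rankings` and the disease labels are illustrative names.

```python
from typing import List, Sequence


def merge_rankings(sources: Sequence[Sequence[str]], schedule: Sequence[int], k: int) -> List[str]:
    """Merge ranked lists by cycling through `schedule` (indices into `sources`).

    Each step takes the next not-yet-chosen disease from the scheduled source.
    """
    cursors = [0] * len(sources)
    active = set(schedule)
    merged: List[str] = []
    seen = set()
    step = 0
    # Stop at k items or once every scheduled source is exhausted.
    while len(merged) < k and any(cursors[i] < len(sources[i]) for i in active):
        src = schedule[step % len(schedule)]
        step += 1
        ranked = sources[src]
        # Skip candidates already contributed by another source.
        while cursors[src] < len(ranked) and ranked[cursors[src]] in seen:
            cursors[src] += 1
        if cursors[src] < len(ranked):
            item = ranked[cursors[src]]
            cursors[src] += 1
            merged.append(item)
            seen.add(item)
    return merged


if __name__ == "__main__":
    phelr_like = ["D1", "D2", "D3", "D4"]  # strong at the top of the list
    llm_like = ["D5", "D2", "D6"]          # adds different candidates
    print(merge_rankings([phelr_like, llm_like], schedule=[0, 0, 1], k=5))
```

A learned policy can condition this choice on the current state rather than follow a fixed pattern, which is the role the DQN plays in MixDiagDQN.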

In multi-round phenotype-oriented Q&A, RDGuru's questions capture 59.1% of the valid symptom information, far exceeding the theoretical baseline of random questioning. This improves the rank of the true diagnosis, enriches the patient's phenotype profile, and effectively narrows the differential toward the correct answer.
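The question-selection idea behind this multi-round Q&A can be sketched with standard expected impurity reduction: for each candidate phenotype, compute how much asking about it is expected to reduce uncertainty over the disease distribution, using either entropy (information gain) or Gini impurity. The exact AIGGI scoring is not described in this summary, so the functions below (`expected_gain`, `best_question`, `update`), the frequency table, and `EPS` are hypothetical; the sketch shows only the general technique.

```python
import math
from typing import Callable, Dict, Iterable

Dist = Dict[str, float]
EPS = 0.01  # assumed P(phenotype | disease) when a phenotype is not annotated


def entropy(dist: Dist) -> float:
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def gini(dist: Dist) -> float:
    return 1.0 - sum(p * p for p in dist.values())


def update(prior: Dist, freq: Dict[str, Dist], phenotype: str, present: bool) -> Dist:
    # Bayes update of the disease distribution after a yes/no answer.
    unnorm = {}
    for disease, p in prior.items():
        f = freq[disease].get(phenotype, EPS)
        unnorm[disease] = p * (f if present else 1.0 - f)
    total = sum(unnorm.values())
    if total == 0:  # impossible answer under the model; keep the prior
        return dict(prior)
    return {d: v / total for d, v in unnorm.items()}


def expected_gain(prior: Dist, freq: Dict[str, Dist], phenotype: str,
                  impurity: Callable[[Dist], float] = entropy) -> float:
    p_yes = sum(p * freq[d].get(phenotype, EPS) for d, p in prior.items())
    expected_after = (p_yes * impurity(update(prior, freq, phenotype, True))
                      + (1.0 - p_yes) * impurity(update(prior, freq, phenotype, False)))
    return impurity(prior) - expected_after


def best_question(prior: Dist, freq: Dict[str, Dist], candidates: Iterable[str],
                  impurity: Callable[[Dist], float] = entropy) -> str:
    return max(candidates, key=lambda h: expected_gain(prior, freq, h, impurity))


if __name__ == "__main__":
    freq = {
        "disease_A": {"phenotype_1": 0.9, "phenotype_2": 0.5},
        "disease_B": {"phenotype_1": 0.9, "phenotype_2": 0.5},
        "disease_C": {"phenotype_1": 0.1, "phenotype_2": 0.5},
    }
    prior = {d: 1 / 3 for d in freq}
    for h in ("phenotype_1", "phenotype_2"):
        print(h, round(expected_gain(prior, freq, h), 3), round(expected_gain(prior, freq, h, gini), 3))
    print("ask about:", best_question(prior, freq, ["phenotype_1", "phenotype_2"]))
```

Asking about phenotype_2, which every candidate shares at the same frequency, yields zero expected gain, so the selector prefers phenotype_1, which separates disease_C from the others.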

Research Conclusions and Impact

The research team built RDGuru, a conversational agent for rare diseases, by integrating RAG, LLMs, and reinforcement learning, achieving authoritative, evidence-traceable medical knowledge Q&A and accurate, interpretable clinical differential diagnosis. The MixDiagDQN algorithm outperformed the single-method approaches it combines, pointing to a promising direction for AI in medicine and rare disease diagnosis.

Scientific and Practical Significance

RDGuru provides not only a practical AI diagnostic assistant for rare diseases but also a technical blueprint for future autonomous disease Q&A, automated phenotype annotation, and intelligent multi-source diagnostic integration. Its open, modular, and upgradable design offers a stable platform that can absorb future LLM improvements. Clinicians seeking knowledge, patients seeking reliable medical guidance, and researchers exploring new approaches in medical AI can all benefit from it.

Research Highlights and Innovative Features

  1. Technological Fusion Innovation: For the first time, RDGuru deeply integrates Retrieval-Augmented Generation (RAG), multi-source fusion reinforcement learning (DQN), professional knowledge bases, and LLMs into a unified intelligent agent framework.
  2. Best-in-class Multi-Source Diagnostic Performance: The MixDiagDQN fusion model substantially outperforms the single algorithms it combines, achieving the highest diagnostic recall among the methods evaluated.
  3. Evidence Traceability and Clinical Interpretability: Every knowledge answer and diagnostic recommendation can be traced to authoritative databases, with interpretable algorithms ensuring transparency of results.
  4. Openness and Usability: All system modules are open-source, and real case data is openly shared, supporting future research, clinical reuse, and ongoing improvement.
  5. Multi-Round Intelligent Interaction: RDGuru can collect symptoms over multiple rounds and adjust its diagnosis dynamically, making the diagnostic process more adaptive and personalized.

Potential Limitations and Future Directions

The authors acknowledge several limitations. The LangChain framework relies on predefined tools and may not handle unforeseen needs; much of the valuable genetic and multi-omic data for rare diseases is not yet interpreted automatically; large-scale clinical validation is still pending; and the system does not currently support diagnosis of non-rare diseases, which limits its scope. Future work may address genetic variant analysis, real-world clinical deployment, and broader disease coverage.

Summary

The development and validation of RDGuru demonstrate the considerable potential of AI to assist medical diagnosis, especially for rare diseases. By combining “authoritative knowledge traceability,” “high clinical diagnostic accuracy,” and “autonomous interpretability,” it advances medical AI and offers a useful tool for clinicians and patients. As the technology matures and clinical use expands, RDGuru and the ideas behind it may benefit broader areas of medicine and AI-driven health management.