Large-scale plasma proteomic profiling unveils diagnostic biomarkers and pathways for Alzheimer's disease
1. Research Background and Academic Significance
Alzheimer’s Disease (AD) is the most common form of dementia worldwide, accounting for about 60–80% of all dementia cases. The primary affected population is individuals over 65 years old, with characteristic pathological features including the deposition of amyloid-β plaques, neurofibrillary tangles, and widespread neuronal loss. Although recent advances in neuroimaging, cerebrospinal fluid (CSF) testing, and genomics have brought considerable progress to AD research, early diagnosis and objective monitoring of disease progression remain limited by invasive detection methods (such as CSF puncture or brain PET imaging) and the limited repertoire of biomarkers. Plasma, owing to the ease of collection and high patient compliance, is viewed as an ideal matrix for future noninvasive AD diagnosis and dynamic monitoring. However, previous plasma-based proteomic studies have generally had small sample sizes and limited numbers of detected proteins, restricting the systematic discovery and validation of plasma biomarkers.
Several prior studies (such as those by Walker et al., Sung et al., and Guo et al.) have used proteomic approaches to explore AD-related plasma proteins, but most included only several hundred AD or dementia patients and measured a narrow range of proteins. As a result, they identified few biomarkers and offered little external validation. Large-scale, systematic studies with rigorous external validation are therefore needed to build a more comprehensive and robust map of AD-related plasma proteins and to support early screening, diagnosis, and disease tracking.
2. Source of the Paper and Introduction of the Research Team
This study was published in the June 2025 issue of Nature Aging (DOI: https://doi.org/10.1038/s43587-025-00872-8). The research was led by the Knight Alzheimer Disease Research Center at Washington University in St. Louis, in collaboration with multiple centers. The corresponding author can be reached at cruchagac@wustl.edu. The research team brings together multidisciplinary experts in neurodegenerative diseases, proteomics, and machine learning. The results draw on well-known US clinical cohorts of neurodegenerative disease and are integrated with multiple international collaborative datasets, making this one of the largest AD plasma proteomics studies to date.
3. Study Design, Experimental Workflow, and Technical Innovations
1. Three-Stage Study Design and Sample Collection
(1) Stage One: Discovery
- Data source: Knight ADRC cohort, 2,131 plasma samples.
- Composition: 1,381 cognitively normal controls (CO), 750 clinically diagnosed AD patients.
- Detection platform: SomaScan 7k platform, covering 6,905 protein molecules (corresponding to 6,106 unique proteins).
(2) Stage Two: Replication & Meta-analysis
- Replication samples: Joint Knight ADRC and Stanford ADRC cohorts, a total of 1,235 plasma samples (715 controls + 520 AD).
- Purpose: To verify the directionality and statistical significance of candidate proteins identified in the first stage. Only proteins reaching nominal significance (p<0.05) and passing FDR correction in both cohorts moved on to the next round.
(3) Stage Three: Large External Cohort Validation and Cross-comparison
- Dataset one: ROSMAP (Religious Orders Study + Rush Memory and Aging Project), external samples of 322 AD cases and 150 controls.
- Dataset two: Global Neurodegeneration Proteomics Consortium (GNPC), external samples of 1,733 AD cases and 4,833 controls.
- The study also comprehensively compared newly discovered proteins with findings from six previous plasma and three CSF proteomics studies to verify specificity and commonality.
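The replication filter described in Stage Two can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes Benjamini-Hochberg FDR correction applied within each cohort and a 0.05 threshold throughout, and the protein names, effect sizes, and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def replicated(discovery, replication, alpha=0.05):
    """Keep proteins with the same effect direction, p < alpha, and
    FDR-adjusted p < alpha in both cohorts (assumed reading of the criteria)."""
    names = sorted(set(discovery) & set(replication))
    q_disc = dict(zip(names, benjamini_hochberg([discovery[n][1] for n in names])))
    q_rep = dict(zip(names, benjamini_hochberg([replication[n][1] for n in names])))
    kept = []
    for n in names:
        (b1, p1), (b2, p2) = discovery[n], replication[n]
        same_direction = b1 * b2 > 0
        nominal = p1 < alpha and p2 < alpha
        fdr = q_disc[n] < alpha and q_rep[n] < alpha
        if same_direction and nominal and fdr:
            kept.append(n)
    return kept


# Hypothetical (effect size, p-value) pairs per protein.
discovery = {"P1": (0.30, 0.0001), "P2": (-0.20, 0.004),
             "P3": (0.10, 0.03), "P4": (0.05, 0.60)}
replication = {"P1": (0.25, 0.001), "P2": (0.15, 0.01),
               "P3": (0.08, 0.04), "P4": (0.02, 0.70)}
survivors = replicated(discovery, replication)
print(survivors)
```

In this toy input, P2 is dropped for flipping direction and P3 for failing FDR correction in the replication cohort, which mirrors how a list of 1,540 candidates can shrink to a few hundred.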
2. Key Experiments and Data Analysis Workflow
- Protein Quantification: High-throughput SomaScan 7k affinity assay, which uses oligonucleotide aptamers to recognize plasma proteins and reports their relative abundance.
- Statistical and Machine Learning Analysis:
- Multivariate linear regression screening of protein significance (association between clinical diagnosis and protein levels).
- Logistic regression and Cox proportional hazards models assessed associations between proteins and AD progression rate.
- Machine learning models (such as SVM, random forest, and Lasso regression) were used to select proteins with high predictive value, train AD prediction classifiers, and validate them on independent cohorts and different platforms (Olink, Alamar).
- Advanced Analyses:
- Classification statistics of novel and known biomarkers.
- Pathway enrichment analysis (KEGG, GO, Reactome databases) to systematically elucidate the molecular functional networks underlying proteomic aberrations.
- Clinical subgroups/comorbidity benchmarking, testing the model’s ability to distinguish dementia with Lewy bodies (DLB), frontotemporal dementia (FTD), and Parkinson’s disease (PD).
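The per-protein association screen in the workflow above can be illustrated with a small covariate-adjusted linear regression. This is a sketch under stated assumptions: the summary does not list the covariates used, so age is a hypothetical example, the data are synthetic, and significance testing is omitted.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic samples: (diagnosis: 0 = control, 1 = AD; age in years).
samples = [(0, 65), (0, 70), (0, 80), (1, 68), (1, 75), (1, 82)]
# Design matrix columns: intercept, diagnosis, age (assumed covariate).
rows = [[1.0, float(dx), float(age)] for dx, age in samples]
# Synthetic protein level built with a known diagnosis effect of 0.5.
protein = [1.0 + 0.5 * dx + 0.02 * age for dx, age in samples]

beta = ols(rows, protein)
diagnosis_effect = beta[1]  # adjusted difference between AD and controls
print(round(diagnosis_effect, 6))
```

In a real screen this fit would be repeated for each of the 6,905 aptamers, with a p-value for the diagnosis coefficient feeding the multiple-testing correction.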
4. Detailed Main Results
1. Initial Screening and Replication Validation
- In the initial screening among 2,131 samples with 6,905 proteins tested, a total of 1,540 proteins (1,646 aptamers) were found to be associated with clinical AD diagnosis.
- After the second round of replication, 416 proteins (456 aptamers) were consistently directional and met p<0.05 in both cohorts, remaining significant after FDR correction.
- Grouped by protein levels (highest versus lowest tertile), positively associated proteins had a mean AD odds ratio of 1.50, negatively associated proteins had 0.72, with some proteins reaching a maximum OR of 3.58, indicating strong biological effects.
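A tertile-based odds ratio of the kind quoted above can be computed from a simple 2x2 table. The sketch below is an unadjusted illustration with hypothetical data; the study may have used covariate-adjusted models instead.

```python
def tertile_odds_ratio(levels, is_ad):
    """Odds ratio of AD for the highest versus the lowest tertile of a protein."""
    ranked = sorted(zip(levels, is_ad))
    n = len(ranked)
    third = n // 3
    low = [d for _, d in ranked[:third]]
    high = [d for _, d in ranked[n - third:]]
    cases_low, cases_high = sum(low), sum(high)
    controls_low = len(low) - cases_low
    controls_high = len(high) - cases_high
    if controls_high == 0 or cases_low == 0:
        raise ValueError("odds ratio undefined: empty cell in the 2x2 table")
    # Cross-product form of (odds in high tertile) / (odds in low tertile).
    return (cases_high * controls_low) / (controls_high * cases_low)


# Hypothetical protein levels and diagnoses (1 = AD, 0 = control).
levels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
is_ad = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
odds_ratio = tertile_odds_ratio(levels, is_ad)
print(odds_ratio)
```

An odds ratio above 1 corresponds to a positively associated protein and one below 1 to a negatively associated protein, matching the reported means of 1.50 and 0.72.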
2. Validation in Large External Cohorts and Comparison with Previous Studies
- In the ROSMAP and GNPC external cohorts, 99% (453/456) of aptamers were covered. Among the covered aptamers, 78% (353/453) were directionally consistent, 77% were nominally significant (p<0.05), and 74% (333/453) remained significant after multiple testing correction (correlation coefficient r=0.675).
- A systematic comparison with six previous plasma and three CSF studies showed that of the 416 replicated proteins, only 52 (16%) had also been significantly identified previously, and 212 proteins were newly reported (prior studies were limited mainly by sample size and analytical depth).
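External validation statistics such as directional consistency and a correlation coefficient can be computed from paired effect sizes. The sketch below assumes the reported r refers to a Pearson correlation between discovery and external effect sizes, which the summary does not state explicitly; the effect sizes are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def direction_consistency(x, y):
    """Fraction of pairs whose effect sizes share the same sign."""
    same = sum(1 for a, b in zip(x, y) if a * b > 0)
    return same / len(x)


# Hypothetical per-aptamer effect sizes in the two settings.
discovery_effects = [0.4, -0.3, 0.2, -0.1, 0.5]
external_effects = [0.3, -0.2, -0.1, -0.05, 0.4]

consistency = direction_consistency(discovery_effects, external_effects)
r = pearson_r(discovery_effects, external_effects)
print(consistency, round(r, 3))
```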
3. Cross-Body Fluid Proteomics Comparison: Similarities and Differences between Plasma and CSF
- In a cross-comparison with the latest large CSF proteomics study (n=2,286), 445 of the 456 plasma aptamers (416 proteins) were also detectable in CSF, but only 174 remained associated with AD after multiple testing correction. The top-ranked plasma proteins (such as SPC25, CTF1, and ACHE) were also significant markers in CSF, but most significant CSF proteins (such as YWHAG and TMOD2) showed no signal in plasma.
- Only about 8% of the proteins associated with AD in CSF were also associated with AD in plasma. This suggests that protein dysregulation in the two fluids reflects independent or complementary mechanisms, and that the two biomarker sets together give a more complete picture of the disease.
4. AD Progression-Related Risk Proteins and Longitudinal Analysis
- Including 761 samples with longitudinal follow-up (mean follow-up 3.5 years), a total of 625 proteins were nominally associated with clinical AD progression (p<0.05).
- Among the 416 diagnostic-related proteins, 20 were also directionally consistent with AD progression, further confirming that these molecules can reflect both current diagnosis and potentially dynamic disease processes. Two proteins, MIA and COL10A1, had opposite directions, indicating some proteins may participate in disease protection or compensation.
5. Pathway Enrichment and Molecular Network Reveal Conserved and Novel Mechanisms in AD
- Enrichment analysis indicated that the significant proteins fell into five main areas: (1) lipid metabolism (core regulatory proteins such as APOE and CLU showed multiple significant associations for the first time); (2) immune function and hemostasis (complement pathways, platelet activation, etc.); (3) extracellular matrix and blood-brain barrier; (4) neuronal activity and nervous system features (axon guidance, myelination, synaptogenesis, and GABA/dopamine neurotransmitter signaling); (5) general metabolic pathways.
- Multiple signaling pathways (such as 14-3-3 mediation, FoxO transcriptional regulation, GPVI cascade) were identified for the first time as dysregulated in large-scale plasma profiling of AD, expanding our understanding of basic disease mechanisms.
- Many proteins such as SMOC1, SPAR, etc., previously unreported in plasma or CSF, provide rich clues for future pathogenic molecular function research and drug target development.
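Pathway enrichment of this kind is commonly assessed with an over-representation test. The sketch below uses a one-sided hypergeometric test as an assumed stand-in for whatever method the authors applied; the background of 6,106 proteins and the 416 hits come from the text, while the pathway size and overlap are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_hits, n_overlap):
    """P(overlap >= n_overlap) when n_hits proteins are drawn at random from
    n_background proteins, n_pathway of which belong to the pathway."""
    upper = min(n_pathway, n_hits)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# 6,106 unique proteins measured and 416 replicated hits (from the text);
# a pathway of 50 measured proteins with 10 hits is a hypothetical example.
p_value = hypergeom_enrichment_p(6106, 50, 416, 10)
print(p_value < 0.05)
```

With about 3.4 pathway members expected among 416 random hits, an overlap of 10 is unlikely by chance; in practice the resulting p-values would also be corrected across all pathways tested.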
6. High-Predictive Machine Learning Model and Differentiation among Disorders
- A seven-protein panel selected with machine learning models achieved an AUC above 0.72 for classifying clinical AD and up to 0.88 for biomarker-defined AD. The model also performed well when validated in the ROSMAP and GNPC cohorts and on heterogeneous platforms (Olink, Alamar).
- The same model showed specific differentiation between Lewy body dementia, frontotemporal dementia, and Parkinson’s disease, suggesting that plasma protein profiles can serve as a new tool for differential diagnosis of neurodegenerative diseases.
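The AUC values reported for the classifier measure how often a randomly chosen AD case receives a higher model score than a randomly chosen control. A minimal sketch of that calculation follows; the scores and labels are hypothetical and the seven-protein model itself is not reproduced here.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random control
    (ties count as half a win)."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities and true labels (1 = AD, 0 = control).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect separation, so the reported range of 0.72 to 0.88 indicates moderate to good discrimination.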
5. Conclusion, Scientific, and Practical Value
This study systematically charted one of the largest AD plasma proteomic atlases to date, providing solid data and a theoretical foundation for early AD screening, noninvasive diagnosis, disease progression assessment, and novel target discovery.
- Scientific significance: Validated across very large cohorts, multiple platforms, and multiple centers, the study identified a large number of new AD plasma protein biomarkers and several new relevant biological pathways, laying the foundation for research on AD pathogenesis and for cross-body fluid comparison of proteomic biomarkers.
- Application value: The plasma protein profile has translational potential, which can drive the development of new AD early screening or dynamic monitoring assay kits, increase patient benefit, and also aid in precision medicine and drug discovery.
- Methodological innovation: With high-throughput SomaScan, machine learning algorithms, enrichment analysis, and multi-center big data validation, the robustness and clinical generalizability of the study’s findings are greatly improved.
- Summary of highlights: Large-scale, multi-center, cross-body fluid, machine-learning-based screening, multiple external validations, and the discovery of abundant novel biomarkers and mechanistic insights are the key features and academic contributions of this study.
6. Research Outlook and Potential Impact
With the development of plasma proteomic profiling technology and bioinformatics, large-cohort studies with systematic validation are expected to keep advancing the practical application of noninvasive AD diagnosis and to support clinical subtyping and individualized management of multiple neurodegenerative diseases. This study also offers an example of closely integrating proteomics and artificial intelligence in translational medicine, providing a model for future precision medicine research.