Development and Validation of a Proteomic Signature of Healthspan

1. Academic Background: From Lifespan Extension to the Enhancement of Healthspan

With improvements in global medical care and socioeconomic conditions since the 20th century, human lifespan has increased substantially, especially in developing countries. However, healthspan (the number of years an individual lives in good health, free from major chronic diseases and functional impairment) has not grown in parallel with lifespan. The result is a global “healthspan-lifespan gap”: more and more people live longer but spend their later years with chronic disease, disability, and loss of function, which imposes a heavy social, economic, and medical burden.

In response to this challenge, the field of anti-aging biology has seen the emergence of the research paradigm known as “Geroscience”. Unlike the traditional focus on preventing and treating single diseases, Geroscience emphasizes targeting the core biological mechanisms underlying aging (the “hallmarks of aging,” such as inflammation, immune imbalance, metabolic disruption, and cellular dysfunction) in order to simultaneously delay the onset of multiple systemic chronic conditions and fundamentally extend healthspan.

Driven by such research, various biomarkers of aging have been proposed to measure aging status and to predict disease or mortality risk, including clinical physiological parameters, blood biochemical indices, multi-omics features (epigenomics, proteomics, metabolomics), and composite scoring models. However, the academic and medical communities have so far focused mainly on biomarkers of lifespan and predictors of mortality, and biomarkers that directly reflect healthspan are still lacking. There are two main reasons. First, there is not yet a unified definition of healthspan. Second, developing such a biomarker requires long-term, large-sample longitudinal follow-up of initially healthy individuals, with comprehensive data collection until the various endpoints occur, which is far more difficult than predicting lifespan or a single disease.

In this context, developing a biomarker of “healthspan” that can accurately predict changes in individual health status and is applicable across populations, diseases, and interventions has become a key challenge and cutting-edge hotspot in translational and anti-aging research.

2. Source of the Paper and Author Background

This is an original research article titled “A proteomic signature of healthspan,” published in the Proceedings of the National Academy of Sciences (PNAS) in 2025, vol. 122, issue 23. The study was conducted by Chia-Ling Kuo (first and corresponding author, University of Connecticut Health Center) together with Peiran Liu, Gabin Drouard, Eero Vuoksimaa, Jaakko Kaprio, Miina Ollikainen, Zhiduo Chen, Luke C. Pilling, Janice L. Atkins, Richard H. Fortinsky, George A. Kuchel, and Breno S. Diniz. The authors are affiliated with institutions including the University of Connecticut Health Center (USA), the University of Helsinki (Finland), and the University of Exeter (UK).

The manuscript was published on June 6, 2025, as a “direct submission” to PNAS, edited by Ana Maria Cuervo. The main data sources were the UK Biobank and the Finnish Twin Cohort, with proteomic measurements performed on the Olink Explore 3072 high-throughput proteomics platform. The study combines a very large sample size with high-throughput proteomics, which gives it considerable academic value and translational potential.

3. Detailed Study Design and Methodological Workflow

1. Study Objectives and Overall Design

The aim of the study was to develop a large-scale proteomics-based biomarker, the Healthspan Proteomic Score (HPS), that predicts individual healthspan (i.e., the years before a first major health event: cancer, diabetes, heart failure, myocardial infarction, stroke, chronic obstructive pulmonary disease, dementia, or death), and to systematically validate its predictive performance and clinical utility across different populations, health outcomes, and metrics.
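As an illustration only, the healthspan endpoint described above can be sketched in Python as the earliest age at any qualifying event, with censoring when no event has been observed. The function name, the event keys, and the censoring logic are assumptions made for this example; they are not taken from the authors' code.

```python
from typing import Mapping, Optional, Tuple

# Event types named in the healthspan definition summarized above.
HEALTHSPAN_EVENTS = (
    "cancer", "diabetes", "heart_failure", "myocardial_infarction",
    "stroke", "copd", "dementia", "death",
)


def healthspan_years(event_ages: Mapping[str, Optional[float]],
                     censor_age: float) -> Tuple[float, bool]:
    """Return (age at end of healthspan or at censoring, event_observed)."""
    observed = [event_ages[e] for e in HEALTHSPAN_EVENTS
                if event_ages.get(e) is not None]
    if observed and min(observed) <= censor_age:
        return min(observed), True
    # No qualifying event before the end of follow-up: censored.
    return censor_age, False


# Hypothetical participant: diabetes at 63.2, stroke at 70.5, followed to 72.
example = {"diabetes": 63.2, "stroke": 70.5}
result = healthspan_years(example, censor_age=72.0)
```

Here the first event (diabetes) ends the healthspan, so later events do not change the result.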

The specific design included:

  • Developing the HPS model using the large UK Biobank Pharma Proteomics Project (PPP) cohort;
  • Conducting internal validation in an independent test subsample and external validation in the Finnish “Essential Hypertension Epigenetics Study”;
  • Using cross-sectional and longitudinal data for multi-level comparisons with various protein- and epigenetic-based biological aging models;
  • Correlating protein levels, clinical health indicators, and multiple major outcomes to reveal potential biological mechanisms.

2. Main Experimental Workflow and Steps

(1) Sample Sources and Baseline Characteristics

  • UKB PPP Cohort: 53,018 active UK Biobank participants (recruited 2006–2010, aged 40–70) were included. Among them, 43,119 had no diagnosis of any major disease within the healthspan definition at baseline and were the main study population, randomly divided into a 70% training set (30,184) and 30% test set (12,935).
  • Proteomics Measurement: Plasma levels of 2,923 proteins were measured in UKB samples using the Olink Explore 3072 platform, covering cardiovascular, immune, neurological, oncological, and other functional systems; strict batch normalization and missing value imputation (KNN with K=10) were performed.
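The two preprocessing steps above (a random 70/30 split and K-nearest-neighbour imputation with K=10) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' pipeline: the function names, the distance measure, and the use of rounding up for the training-set size are assumptions. Rounding up is chosen only because it reproduces the reported set sizes.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]


def split_train_test(ids: Sequence[int], train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle participant IDs and split them into training and test sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = math.ceil(train_frac * len(shuffled))  # assumption: round up
    return shuffled[:n_train], shuffled[n_train:]


def _distance(a: Row, b: Row) -> float:
    """Root-mean-square difference over features observed in both rows."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


def knn_impute(rows: List[Row], k: int = 10) -> List[Row]:
    """Fill each missing value with the feature mean over the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donor rows must have feature j observed.
            donors = [(_distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if not nearest:
                raise ValueError(f"no donor rows for feature {j}")
            filled[i][j] = sum(nearest) / len(nearest)
    return filled


train_ids, test_ids = split_train_test(range(43119))
toy = [[1.0, 2.0], [1.1, None], [5.0, 9.0], [1.2, 2.2]]
imputed = knn_impute(toy, k=2)
```

In the toy matrix, the missing value in the second row is filled from the two rows closest to it on the observed feature. A production analysis would normally use an established imputation library rather than this quadratic-time sketch.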

(2) Development and Modeling of the HPS

  1. Establishing the Healthspan Definition
    Following prior studies, the authors defined healthspan as the number of years from birth until the first diagnosis of any of the following: malignant tumor (excluding nonmelanoma skin cancer), diabetes (type I, type II, or malnutrition-related), heart failure, myocardial infarction, stroke, chronic obstructive pulmonary disease, dementia, or death.

  2. Feature Selection and Model Building

    • First, a LASSO (Least Absolute Shrinkage and Selection Operator) Cox regression model was used to select the most predictive variables from 2,920 proteins and age, tightly controlling model complexity, ultimately retaining 86 proteins and age as predictors that together maintained nearly optimal deviance.
    • On this basis, a Gompertz survival model was fitted to estimate each individual’s probability of a first healthspan-defining event within the next 10 years (risk Rh). The HPS is defined as 1–Rh, so a lower score reflects higher risk.
  3. Training–Testing and Cross-Validation

    • The model was developed in the training set and independently evaluated in the test set. Performance was compared against various biological age indices (Proteomic Aging Clock, PAC; PhenoAge; BioAge; Protage-EN) and phenotypic traits, testing its predictive power and generalizability for various diseases and mortality.
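The relationship between the 10-year risk Rh and the HPS can be made concrete with a small sketch. It assumes a standard Gompertz proportional hazards form, with baseline hazard rate * exp(shape * t) scaled by exp(linear predictor). The summary does not state the authors' exact parameterization, and every coefficient and parameter value below is hypothetical.

```python
import math
from typing import Mapping


def linear_predictor(coefficients: Mapping[str, float],
                     values: Mapping[str, float]) -> float:
    """Weighted sum of predictors (selected proteins plus age)."""
    return sum(beta * values[name] for name, beta in coefficients.items())


def ten_year_risk(lp: float, rate: float, shape: float,
                  horizon: float = 10.0) -> float:
    """Gompertz probability of a first event within `horizon` years.

    Cumulative hazard: exp(lp) * (rate / shape) * (exp(shape * horizon) - 1).
    """
    cumulative_hazard = math.exp(lp) * (rate / shape) * math.expm1(shape * horizon)
    return -math.expm1(-cumulative_hazard)  # equals 1 - exp(-H)


def hps(lp: float, rate: float, shape: float) -> float:
    """Healthspan Proteomic Score as defined in the text: 1 - Rh."""
    return 1.0 - ten_year_risk(lp, rate, shape)


# Hypothetical coefficients and standardized inputs, for illustration only.
coefs = {"age": 0.05, "protein_a": 0.30, "protein_b": -0.20}
low_risk_person = {"age": -1.0, "protein_a": -0.5, "protein_b": 0.5}
high_risk_person = {"age": 1.0, "protein_a": 1.0, "protein_b": -1.0}

score_low_risk = hps(linear_predictor(coefs, low_risk_person), rate=0.01, shape=0.08)
score_high_risk = hps(linear_predictor(coefs, high_risk_person), rate=0.01, shape=0.08)
```

Because the score is one minus a probability, it always lies between 0 and 1, and the person with the larger linear predictor receives the lower HPS.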

(3) Main Data Analysis and Statistical Approaches

  • Multiple statistical strategies were used, including Spearman correlation, Wilcoxon rank-sum tests, Aalen’s additive hazards model, and Harrell’s C statistic.
  • In addition to standard variables—age, sex, ethnicity, education—further adjustments were made for BMI, smoking status, hypertension, hypercholesterolemia, occupational background, and cohort origin to minimize confounding.
  • Multiple comparisons were rigorously controlled using FDR (Benjamini-Hochberg procedure).
  • All protein signals were normalized and inverse-normally transformed; bioinformatics analyses such as pathway enrichment and functional annotation were performed using tools including FUMA.
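For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, the following is a minimal sketch of how adjusted p-values are computed. The function name and the example p-values are made up for illustration; a real analysis would typically rely on a statistics library.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four protein associations.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in adjusted]
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum guarantees that a smaller raw p-value never ends up with a larger adjusted value.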

(4) External Validation

  • The Finnish Essential Hypertension Epigenetics Study (EH-EPI) included 401 participants from the Finnish Twin Cohort, with blood pressure phenotypes, death outcomes, medium- to long-term follow-up, and complete proteomic and epigenetic data. It was used to assess the applicability of the HPS in another European population and to compare it directly with DNA methylation-based and metabolomics-based biological age indices.

3. Main Results and Logical Progression

(1) Healthspan Outcomes and Sample Phenotypes

  • In the UKB PPP training–test cohort, approximately 28.8% of participants who were free of major diseases at baseline had a first healthspan-defining event over 13.5 years of follow-up; cancer, myocardial infarction, diabetes, and COPD were the main first events, with a mean age at first occurrence of 66.7 years.
  • Participants with chronic disease at baseline and those in at-risk groups (male, older, smokers, obese, hypertensive, or hypercholesterolemic) had significantly lower baseline HPS, which indicates that the score discriminates biological aging status and is sensitive to early disease-related biological changes.

(2) HPS Associations with Biological Age and Clinical Outcomes

  • HPS was strongly negatively correlated with chronological age (r = –0.73), with the proteomic clock PAC (r = –0.87), and with composite phenotypic age measures (PhenoAge and BioAge; r = –0.79 and –0.74, respectively), and weakly negatively correlated with the frailty index.
  • Multivariable regression and survival analyses showed that a lower HPS was associated with a higher risk of a major chronic disease or death within 10 years: every 0.1 decrease in HPS corresponded to 1,600 more new cases per 100,000 person-years. The low-HPS group had poorer survival and health prognosis across multiple disease outcomes and mortality (e.g., distinctly separated Kaplan-Meier curves, and a 55% increase in mortality risk for each 0.1 drop in HPS).
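To make the two effect sizes above easier to read, the sketch below scales them to other HPS differences. It assumes that the excess incidence is additive per 0.1 step (as in an additive hazards model) and that the 55% mortality increase acts multiplicatively per 0.1 step. Both scaling rules are assumptions for illustration, and extrapolating far beyond a 0.1 difference may not be valid.

```python
STEP = 0.1                       # HPS difference the reported effects refer to
EXCESS_CASES_PER_STEP = 1600.0   # extra cases per 100,000 person-years
MORTALITY_RATIO_PER_STEP = 1.55  # 55% higher mortality risk


def excess_cases_per_100k(hps_decrease: float) -> float:
    """Additive scaling: extra cases per 100,000 person-years."""
    return EXCESS_CASES_PER_STEP * hps_decrease / STEP


def relative_mortality_risk(hps_decrease: float) -> float:
    """Multiplicative scaling: risk ratio relative to no decrease."""
    return MORTALITY_RATIO_PER_STEP ** (hps_decrease / STEP)


cases_for_02 = excess_cases_per_100k(0.2)
ratio_for_02 = relative_mortality_risk(0.2)
```

Under these assumptions, an HPS that is 0.2 lower corresponds to 3,200 additional cases per 100,000 person-years and a mortality risk ratio of about 2.40 (1.55 squared).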

(3) Proteomics Biological Mechanisms and Logical Support for HPS

  • 1,398 proteins up- or down-regulated specifically in the low HPS group (Bonferroni-corrected) were highly enriched in 26 hallmark pathways involved in immune response, inflammation, cellular signaling, and metabolic regulation, reinforcing the substantive link between the proteome and the biology of aging.
  • When directly compared with traditional lifespan clocks such as PAC, HPS demonstrated equivalent or superior discriminative power for various organ-specific chronic diseases, mortality, and multiple health outcomes; combining the two yielded even stronger predictive efficacy and broader applicability.
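Discriminative power in comparisons like these is commonly summarized with Harrell's C statistic, which the study lists among its methods. The sketch below is a simple quadratic-time version for illustration. The function name and toy data are made up, and because a lower HPS means higher risk, the example uses 1 minus HPS as the risk score.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[bool],
              risk_scores: Sequence[float]) -> float:
    """Fraction of comparable pairs in which the earlier event has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an observed event before j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up years, event indicator, and hypothetical HPS values.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]
hps_values = [0.6, 0.7, 0.9, 0.8]
c_index = harrell_c(times, events, [1.0 - h for h in hps_values])
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. In this toy example every earlier event has the lower HPS, so the ranking is perfect.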

(4) External Validation and Generalizability

  • In the Finnish EH-EPI cohort, HPS was significantly associated with mortality, diabetes, and metabolomic markers; participants with lower scores had higher mortality rates, and performance was stable. In head-to-head comparisons with DNA methylation clocks and other proteomic indices, HPS showed equal or stronger predictive power, supporting its applicability in a second European population.

(5) Joint and Interaction Analyses

  • Further exploration found that individuals with low HPS and high PAC constituted the “worst biological aging group,” individuals with high HPS and low PAC were the healthiest, and groups with only one abnormal indicator fell in between. Joint stratification can therefore better identify people who are clinically healthy yet biologically at risk, which can help target preventive geroscience interventions.
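The joint stratification described above can be sketched as a simple two-by-two grouping. The summary does not state which cutoffs define "low" HPS or "high" PAC, so the median split used here is an assumption, as are the function name, the group labels, and the toy values.

```python
from statistics import median
from typing import List, Optional, Sequence


def joint_groups(hps: Sequence[float], pac: Sequence[float],
                 hps_cut: Optional[float] = None,
                 pac_cut: Optional[float] = None) -> List[str]:
    """Assign each person to one of four HPS-by-PAC groups."""
    # Assumption: default cutoffs are the sample medians.
    hps_cut = median(hps) if hps_cut is None else hps_cut
    pac_cut = median(pac) if pac_cut is None else pac_cut
    labels = []
    for h, p in zip(hps, pac):
        low_hps = h < hps_cut
        high_pac = p > pac_cut
        if low_hps and high_pac:
            labels.append("low HPS + high PAC")
        elif low_hps:
            labels.append("low HPS only")
        elif high_pac:
            labels.append("high PAC only")
        else:
            labels.append("high HPS + low PAC")
    return labels


# Toy values for four hypothetical participants.
groups = joint_groups(hps=[0.9, 0.5, 0.6, 0.95], pac=[50.0, 70.0, 52.0, 68.0])
```

The two single-indicator groups are kept separate so that each can be compared with both the healthiest and the highest-risk group.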

4. Research Conclusions, Scientific Value, and Application Prospects

1. Main Conclusions

  • As a proteomics-based healthspan score, HPS can predict the risk of major chronic diseases or death (the end of healthspan) in healthy middle-aged and older adults, and it stratifies risk well across subgroups (age, sex, smoking, obesity, baseline chronic disease).
  • HPS reflects the state of systemic biological aging, capturing the dysfunction processes related to the hallmarks of aging, rather than a single organ or event.
  • HPS is not only related to traditional proteomic and epigenetic “lifespan scores”, but also independently provides a new dimension for health risk stratification; using multiple measures together will enhance the prediction of health outcomes and precision medicine stratification.
  • Application of HPS could support population screening, efficacy evaluation, and the use of surrogate endpoints in anti-aging intervention trials, which may shorten follow-up, improve trial efficiency, and facilitate the translation of Geroscience into clinical practice and healthy aging initiatives.

2. Scientific and Application Significance

  • Scientific Value: Fills the gap in healthspan biomarkers, providing a new, precise tool for healthy aging societies and lifelong chronic disease prevention, and promotes deep integration of proteomics and geriatric medicine.
  • Application Prospects: As a next-generation biological aging measure, HPS can be used for disease risk assessment, high-risk population screening, intervention effect monitoring, drug development, insurance, and public health policy, with great clinical and industrial translational potential.

5. Research Highlights and Innovations

  • First longitudinally developed proteomic score for healthspan, based on a large population cohort and strictly defined health outcomes.
  • The model covers a wide proteomic signature set, highly enriched in aging-related inflammation, immunity, and signaling pathways, elucidating shared biological mechanisms of chronic diseases.
  • Large-sample, multidimensional data from two European populations (UK and Finland), supporting generalizability and external applicability.
  • Direct and systematic head-to-head comparison with state-of-the-art proteomic and epigenetic age models, providing high technical reliability and authority.
  • Multi-stratification approach enables more precise biological health classification, delivering robust scientific support for individualized anti-aging interventions, early screening, and precision therapies.

6. Additional Information and Future Prospects

  • Model code and analytic workflow are open-sourced on GitHub, freely available for academic use and extension.
  • The authors work in geriatrics, proteomics, and population cohort research, and the project was supported by scientific and medical research grants in the UK, Finland, and the USA.
  • The study discusses the limitations of its healthspan definition, acknowledging that electronic medical records may not capture all chronic functional impairments; within that limit, the model predicts the common major health outcomes and is widely applicable.
  • Future work should further validate the clinical translational value of HPS in multi-ethnic, multi-age, and broader medical data populations, and promote its integration into international healthy aging policies and precision medicine.

7. Conclusion

As a pioneering proteomic exploration into healthspan, this study not only advances precise classification of biological age and clinical translation “from bench to bedside,” but also provides a solid scientific foundation for the global goal of extending healthspan in aging societies. HPS and its combined strategies are poised to guide future innovations in anti-aging interventions, chronic disease prevention, and health policy formulation, laying the theoretical and technical groundwork for precision health management in the new era.