Exploring Diverse Approaches for Predicting Interferon-Gamma Release: Utilizing MHC Class II and Peptide Sequences
Academic Background and Research Significance
In recent decades, therapeutic proteins have gained prominence as a research focus in the biopharmaceutical industry due to their huge potential in medicine. With their high targeting ability, therapeutic protein drugs are considered to offer solutions for many acute or chronic diseases (such as certain autoimmune diseases and cancers) that were previously difficult to treat. From the discovery of serum therapy in the 1880s to the launch of the first monoclonal antibody drug muromonab-CD3 in 1986, the therapeutic protein market has continued to expand and is expected to reach nearly $47.4 billion by 2032. However, the issue of immunogenicity triggered by therapeutic proteins has long troubled drug developers. Immune responses can bring about harmful side effects or activate therapeutic mechanisms; for example, vaccines provide immune protection by inducing immune responses in the body.
Within the molecular mechanisms of the immune response induced by protein drugs, the antigen presentation pathway of MHC (major histocompatibility complex) class II molecules is vital. MHC-II binds to peptides generated from the cleavage of proteins to form pMHC-II complexes, which are presented to T cells and can trigger immune responses. The binding ability of different MHC-II alleles to different peptides varies greatly, meaning that individual or population genetic differences can significantly affect immune responses. Thus, understanding the interaction between drug-derived peptides and MHC-II, and evaluating their ability to trigger the release of key cytokines such as interferon-gamma (IFNγ), is important for assessing the efficacy and safety of drug candidates.
However, current experimental evaluation methods (such as cytokine release assays and T cell proliferation assays) are costly, time-consuming, and poorly suited to large-scale batch screening, making it difficult to cover the vast number of peptide and allele combinations. Therefore, developing efficient, universal, and interpretable computational prediction models has become an urgent challenge in this field. This research responds to that challenge, aiming to build a computational classification model based on peptide and MHC-II sequences to efficiently predict IFNγ release, and to explore the model’s interpretability and generalization ability.
Source and Author Information
This paper, titled “Exploring diverse approaches for predicting interferon-gamma release: utilizing MHC class II and peptide sequences”, was written by Abir Omran, Alexander Amberg, and Gerhard F. Ecker, whose affiliations are the Department of Pharmaceutical Sciences at the University of Vienna and the Preclinical Safety Department of Sanofi. The paper was published in 2025 in Volume 26, Issue 2 of the Oxford University Press journal Briefings in Bioinformatics (DOI: https://doi.org/10.1093/bib/bbaf101). The article is open access and sits at the intersection of bioinformatics and computational immunology.
Overall Research Workflow and Experimental Details
Data Collection and Preprocessing
The research team first collected human host, MHC-II-related IFNγ release experimental data (including positive/negative assays) from the Immune Epitope Database (IEDB). Each pMHC-II pair (i.e., a combination of a specific peptide sequence and a specific MHC-II allele pseudo-sequence) was labeled based on the majority of assay results in the database. For example, if a pair had 5 measurement records and 3 were negative, it was labeled as “inactive”.
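The majority-vote labeling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the record layout, and the example peptide are hypothetical, and the treatment of ties is an assumption because the summary does not say how they were resolved.

```python
from collections import Counter, defaultdict


def label_pairs(records):
    """Assign one label per (peptide, allele) pair by majority vote.

    `records` is an iterable of (peptide, allele, outcome) tuples, where
    outcome is "positive" or "negative". Ties are returned as None because
    their handling is not described in the text (assumption).
    """
    votes = defaultdict(Counter)
    for peptide, allele, outcome in records:
        votes[(peptide, allele)][outcome] += 1

    labels = {}
    for pair, counts in votes.items():
        pos, neg = counts["positive"], counts["negative"]
        if pos > neg:
            labels[pair] = "active"
        elif neg > pos:
            labels[pair] = "inactive"
        else:
            labels[pair] = None  # tie: left undecided in this sketch
    return labels


# Hypothetical pair with 5 assay records, 3 of them negative.
pair = ("PEPTIDEPEPTIDEP", "HLA-DRB1*01:01")
records = [pair + (outcome,) for outcome in
           ["negative", "positive", "negative", "positive", "negative"]]
labels = label_pairs(records)
print(labels[pair])
```

With three of five records negative, the pair is labeled "inactive", matching the example in the text.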
The study further limited peptide lengths to 12-24 amino acids, as the literature indicates this is the most common length range for MHC-II-bound peptides. Exact duplicates were collapsed so that each identical instance was kept only once, and other types of duplicates were removed. After these filtering steps, the final dataset contained 7,266 pMHC-II pairs, of which inactive samples made up 30%, resulting in a clearly imbalanced dataset.
Dataset Splitting and Processing
At the model development stage, the authors used 10-fold cross-validation (CV). Considering possible confounding factors like class imbalance and peptide length distribution, stratified splitting was used to ensure that the distribution of class and peptide length in the training and test sets of each fold remained consistent. Peptide length distribution analysis showed that 15-mers were dominant, accounting for 70%.
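One simple way to stratify on class and peptide length together is to treat each (label, length) combination as its own stratum and deal its members across the folds in turn. The sketch below uses only the standard library. The function name and the round-robin scheme are assumptions made for illustration, not the procedure used in the paper.

```python
import random
from collections import defaultdict


def stratified_folds(samples, n_folds=10, seed=0):
    """Return one fold index per sample, stratified by (label, peptide length).

    `samples` is a list of (peptide, label) tuples.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for idx, (peptide, label) in enumerate(samples):
        strata[(label, len(peptide))].append(idx)

    folds = [None] * len(samples)
    offset = 0
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        # Deal the members of this stratum across folds in round-robin order.
        for i, idx in enumerate(members):
            folds[idx] = (offset + i) % n_folds
        # Carry the offset over so small strata do not all land in fold 0.
        offset += len(members)
    return folds


# Toy data: 70 active 15-mers, 30 inactive 15-mers, 20 active 12-mers.
samples = [("A" * 15, 1)] * 70 + [("A" * 15, 0)] * 30 + [("A" * 12, 1)] * 20
folds = stratified_folds(samples)
print(sum(1 for i in range(70) if folds[i] == 0))  # active 15-mers in fold 0
```

In this toy data set every fold receives the same number of samples from each stratum, so the class and length distributions are identical across folds.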
Sequence Representation and Feature Engineering
Three descriptors were used for the representation of the peptide and MHC-II sequences:
- LBE (Letter-based encoding): The amino acid sequence is digitized, and sequences shorter than the longest peptide (25 AA) are padded with zeros to facilitate modeling.
- ProtBert embedding features: Uses the ProtBert variant of BERT, trained on 217 million protein sequences, to obtain semantic vectors that capture contextual information, thereby enriching sequence representation.
- Z-scale descriptors: Physicochemical property descriptors used only for fixed-length sequences (so analyzed only for the main 15-mer samples) to reflect properties like hydrophobicity, steric factors, and electronic properties of amino acids.
For each pMHC-II sample, the features of the peptide and MHC-II allele pseudo-sequence were concatenated and used as the model input. The above feature engineering provided a multi-dimensional foundation for the model.
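A minimal sketch of letter-based encoding with zero padding and concatenation is shown here. The letter-to-integer mapping, the function names, and the placeholder pseudo-sequence are assumptions, since the summary does not give the exact mapping. Only the padding length of 25 is taken from the text.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Illustrative mapping (assumption): residues get codes 1..20, 0 is padding.
AA_TO_INT = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def letter_encode(sequence, length):
    """Map each residue to an integer and right-pad with zeros to `length`."""
    if len(sequence) > length:
        raise ValueError("sequence is longer than the target length")
    codes = [AA_TO_INT[aa] for aa in sequence]
    return codes + [0] * (length - len(codes))


def encode_pair(peptide, pseudo_sequence, max_peptide_len=25):
    """Concatenate the padded peptide encoding with the allele encoding."""
    peptide_part = letter_encode(peptide, max_peptide_len)
    # Pseudo-sequences are treated as fixed-length here, so no padding is added.
    allele_part = letter_encode(pseudo_sequence, len(pseudo_sequence))
    return peptide_part + allele_part


print(letter_encode("ACD", 5))  # [1, 2, 3, 0, 0]

# Hypothetical 15-mer peptide and placeholder pseudo-sequence.
features = encode_pair("PEPTIDEPEPTIDEP", "QEFFIASGAAVDAIM")
print(len(features))  # 25 peptide positions + 15 pseudo-sequence positions
```

Because the encoding is positional, each column of the feature vector corresponds to one sequence position, which is what makes per-position feature importance readable later on.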
Integrated Modeling and Algorithm Development
The modeling used traditional machine learning algorithms, specifically:
- Random Forest (RF): A tree-based ensemble whose built-in feature importance scores make it suitable for investigating which features drive predictions.
- Support Vector Machine (SVM)
- Gradient Boosting Machine (GBM)
To address class imbalance, the researchers optimized classification thresholds (testing various probabilities to balance sensitivity and specificity, ultimately choosing 0.65) and used active learning (AL): in each round, the 10 samples with the highest uncertainty were added to the training set to help the model enhance its ability to recognize the minority class. Additionally, to save computing resources, hyperparameter tuning adopted randomized search, and cross-validation was performed separately for each feature representation.
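The two imbalance-handling steps can be illustrated with a short sketch. Both selection criteria are assumptions: the threshold is chosen to minimize the gap between sensitivity and specificity, and uncertainty is measured as the distance of the predicted probability from 0.5. The summary does not state the exact criteria used by the authors, and all names and toy values below are hypothetical.

```python
def sens_spec(y_true, probs, threshold):
    """Sensitivity and specificity when predicting 1 for prob >= threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def pick_threshold(y_true, probs, candidates):
    """Candidate with the smallest |sensitivity - specificity| (assumed rule)."""
    def gap(t):
        sens, spec = sens_spec(y_true, probs, t)
        return abs(sens - spec)
    return min(candidates, key=gap)


def most_uncertain(probs, k=10):
    """Indices of the k pool samples whose probability is closest to 0.5."""
    return sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))[:k]


# Toy validation data: four active (1) and four inactive (0) samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.66, 0.7, 0.6, 0.3, 0.1]
best = pick_threshold(y_true, probs, [0.5, 0.65, 0.8])
print(best)  # 0.65

# Toy unlabeled pool: pick the 2 most uncertain samples.
picked = most_uncertain([0.9, 0.52, 0.1, 0.45, 0.7], k=2)
print(picked)  # [1, 3]
```

In an active learning loop, the indices returned by `most_uncertain` would be labeled, moved from the pool into the training set, and the model retrained before the next round.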
Performance Evaluation and Model Generalization Verification
Key evaluation metrics included balanced accuracy, Matthews correlation coefficient (MCC), precision, sensitivity, and specificity. To further test model generalization, the authors also collected T cell proliferation experimental data from IEDB, excluding any samples overlapping with IFNγ release results (yielding 711 samples: 600 active and 111 inactive), to externally validate the top model’s prediction ability.
Interpretability Analysis and Model Insights
To gain deeper understanding of the model’s decision process, the group conducted interpretability analyses including:
- Feature importance analysis: Based on the RF model, identified the top five most important amino acid positions in the 15-mer peptide, then statistically analyzed the distribution differences of amino acids at these sites between the two classes.
- Virtual single amino acid mutation experiments: For each position in the peptide sequences of the test set, each of the 20 amino acids was substituted in turn and the change in model prediction was recorded (measured as the change in error rate, ΔER), identifying amino acids at specific positions that have the greatest (or smallest) impact on predictions. They also assessed how the effect of these mutations varied under different allele backgrounds, clarifying the influence of MHC background on prediction outcomes.
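The virtual mutation scan can be sketched as a loop over positions and substitutions. This is a simplified illustration under stated assumptions: ΔER is taken as the error rate after mutation minus the baseline error rate, the predictor sees only the peptide (the study's models also take the MHC-II pseudo-sequence), and `toy_predict` is a hypothetical stand-in for a trained classifier.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def error_rate(predict, peptides, labels):
    """Fraction of peptides whose predicted label differs from the true label."""
    wrong = sum(predict(p) != y for p, y in zip(peptides, labels))
    return wrong / len(peptides)


def mutation_scan(predict, peptides, labels):
    """Return {(position, amino_acid): delta_ER}, with 1-based positions.

    All peptides must have the same length. Every peptide is mutated to the
    same residue at the same position, and the error rate is recomputed.
    """
    baseline = error_rate(predict, peptides, labels)
    deltas = {}
    for pos in range(len(peptides[0])):
        for aa in AMINO_ACIDS:
            mutated = [p[:pos] + aa + p[pos + 1:] for p in peptides]
            deltas[(pos + 1, aa)] = error_rate(predict, mutated, labels) - baseline
    return deltas


def toy_predict(peptide):
    """Hypothetical classifier: active (1) if position 2 is tyrosine."""
    return 1 if peptide[1] == "Y" else 0


peptides = ["AYAAA", "AGAAA"]
labels = [1, 0]
deltas = mutation_scan(toy_predict, peptides, labels)
print(deltas[(2, "Y")], deltas[(1, "Y")])  # 0.5 0.0
```

Repeating the scan separately for the samples of each allele would give the per-background comparison described above.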
Main Results and Data Analysis
Model Performance Evaluation
A total of 11 different combined models were built. Among all algorithm/feature combinations, random forest (RF) consistently performed best. The most basic LBE model (using only simple numeric encoding without complex embeddings) was the top performer, with the following key metrics in 10-fold cross-validation:
- Balanced accuracy: 0.78
- MCC: 0.53
- Precision: 0.88
- Sensitivity: 0.78
- Specificity: 0.77
The Z-scale and LBE-15mer models performed almost identically, and the ProtBert model had the lowest sensitivity but the highest specificity. Active learning gave the LBE model only a minor improvement (max MCC 0.51), and performance plateaued after more than 350 iterations. Overall, the more complex, information-rich descriptors did not improve on LBE at the current sample size, likely because high input dimensionality leads to feature sparsity.
External Validation—T Cell Proliferation Experiments
Performance of the top LBE model on the external T cell proliferation dataset:
- Balanced accuracy: 0.61
- MCC: 0.21
- Precision: 0.88
- Sensitivity: 0.87
- Specificity: 0.35
Although the model correctly filtered out only about a third of the inactive samples (specificity 0.35), its ability to identify active samples remained strong (sensitivity 0.87). Given that the external dataset is highly imbalanced, these results indicate the model has some endpoint generalization capability.
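The reported external metrics can be cross-checked from a confusion matrix. The counts below are not taken from the paper. They are approximate values reconstructed from the reported class sizes (600 active, 111 inactive) and the rounded sensitivity and specificity, so small deviations from the published numbers are expected.

```python
from math import sqrt


def classification_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    precision = tp / (tp + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "balanced_accuracy": (sens + spec) / 2,
        "mcc": mcc,
        "precision": precision,
        "sensitivity": sens,
        "specificity": spec,
    }


# Reconstructed counts (assumption): 0.87 * 600 = 522 true positives,
# 0.35 * 111 is roughly 39 true negatives.
metrics = classification_metrics(tp=522, fn=78, tn=39, fp=72)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these reconstructed counts, balanced accuracy and precision round to the reported 0.61 and 0.88, and the MCC comes out at about 0.22, close to the reported 0.21. The example also shows why balanced accuracy and MCC are informative here: precision stays high on a set dominated by active samples even when most inactive samples are misclassified.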
Model Interpretation Analysis
In the 15-mer RF peptide model, the top 5 positions in feature importance ranking were p3, p14, p2, p8, and p13. Among these, p2/p3/p8 are known as TCR binding sites, while p13/p14 are not directly involved in binding, but have been shown to significantly affect pMHC-II complex stability.
Amino acid distribution frequency analysis showed there were no obvious specificity differences at these five positions between the two classes (e.g., leucine was the most common AA in both active and inactive classes), suggesting that model decisions are not based solely on the frequency of a single AA’s appearance but are instead informed by coordinated patterns across multiple positions in the sequence.
Virtual single-point mutation experiments further revealed that p2, p3, p8, p13, and p14 were the most influential positions for predictions. For example, mutating amino acids at p2 or p14 to tyrosine (Y) significantly altered the error rate (max ΔER = 0.017), and some mutations in different MHC backgrounds led to positive, negative, or even reversed predictions (e.g., G→Y in the HLA-DRB1*09:01 background caused a reversal of the activity prediction). These findings suggest that the model has learned part of the complex interplay between MHC background, peptide sequence, and immune response.
Research Conclusions and Scientific/Application Value
This study systematically compared multiple sequence feature-based computational methods, demonstrating that even the simplest letter-based encoding, when combined with algorithms such as RF, can effectively predict the IFNγ release induced by pMHC-II complexes. By integrating active learning and interpretability tools, the researchers obtained not only good predictive performance but also insight into which sequence positions drive the predictions. The external validation suggests that the model generalizes, to a limited extent, to other related T cell functional assay data, providing a theoretical and methodological foundation for high-throughput and universal drug immunogenicity risk assessment.
Research Highlights
- Diverse feature descriptor comparison: The study considered physicochemical, natural language processing (ProtBert), and traditional numeric encoding descriptors, providing practical modeling references for the field.
- Active learning strategy: Active learning was applied to improve model performance and strengthen recognition of the minority class.
- Comprehensive explainability experiments: By combining feature importance and virtual mutation, the study revealed the biological information truly leveraged by the model, enhancing its usability and credibility.
- Endpoint generalization validation: For the first time, the model was applied to different but related T cell functional experiments, laying a foundation for deployment in real-world drug R&D pipelines.
- Open data and code: All data and code were released on GitHub, setting a benchmark for community reproducibility and further improvement.
Existing Challenges and Outlook
- The dataset is heterogeneous in terms of experimental formats and detection method kinetics, which were not fully incorporated into modeling features.
- Class imbalance and uneven allele distribution may affect model generalization to rare genotypes.
- The latest generation of pre-trained protein BERT models may further enhance performance if optimized for this task.
Summary
This study explored and compared approaches for high-throughput prediction of immunogenicity risk in protein drugs, providing a methodological and theoretical foundation for future personalized immunogenicity prediction, drug design optimization, and preclinical screening. Its strengths lie in the systematic comparison of descriptors, the interpretability analyses, and the practical study design. Integrating larger-scale data and multimodal information remains a direction for future work. The work offers useful guidance for bioinformatics, computational immunology, and the broader biopharmaceutical industry.