DockEM: An Enhanced Method for Atomic-Scale Protein–Ligand Docking Refinement Leveraging Low-to-Medium Resolution Cryo-EM Density Maps
Academic Background and Research Motivation
In recent years, protein–ligand docking has rapidly developed as a core technology for virtual drug screening and structure-based drug discovery. Despite improvements in drug discovery efficiency through large-scale high-throughput screening technologies, new drug development still faces high costs, long cycles, and limited conversion rates. Traditional small molecule docking methods primarily rely on the evaluation of protein and ligand 3D structures and energy functions, but how to further improve docking accuracy remains a key technical challenge in the field.
Meanwhile, cryo-electron microscopy (cryo-EM) has become an important tool in structural biology because it can resolve membrane proteins and macromolecular complexes without crystallization. Although some cryo-EM density maps reach atomic resolution, the vast majority of public data (e.g., in the Electron Microscopy Data Bank, EMDB) remain at medium to low resolution (3–10 Å), which makes it hard to use density maps to improve docking accuracy. Integrating density map information into the virtual docking workflow at limited resolution, so as to compensate for how little structural information traditional methods exploit, has therefore become a pressing bottleneck in drug discovery.
Against this backdrop, this study addresses the question of how to improve the accuracy of protein–small-molecule ligand docking using 3–15 Å medium-to-low-resolution cryo-EM density maps. The authors designed and developed DockEM, a method that integrates local density map extraction, physical energy optimization, and advanced sampling algorithms, to overcome the precision limits of traditional docking methods and the limited applicability of current cryo-EM docking workflows at restricted resolution.
Paper Source and Author Introduction
The paper, entitled “DockEM: an enhanced method for atomic-scale protein–ligand docking refinement leveraging low-to-medium resolution cryo-EM density maps”, was published in the internationally renowned journal Briefings in Bioinformatics (2025, Vol. 26, Issue 2, bbaf091). The work was jointly completed by Jing Zou, Wenyi Zhang, Jun Hu, Xiaogen Zhou, and Biao Zhang, with the first two authors as co-first authors and Biao Zhang, Xiaogen Zhou, and Jun Hu serving as corresponding authors.
The research team is mainly affiliated with the College of Information Engineering at Zhejiang University of Technology, Westlake AI Therapeutics Lab, and Chinese Academy of Medical Sciences Suzhou Institute of Systems Medicine. The manuscript was submitted on November 15, 2024, and officially accepted for publication on February 18, 2025.
Overall Research Scheme and Workflow
This is an original methodological study centered on the DockEM algorithm, covering method design, experimental evaluation, and comparative validation. The research workflow can be divided into the following stages:
1. Dataset Construction and Density Map Simulation
- Target Selection: The dataset covers 121 protein–ligand docking targets chosen for structural diversity and coverage of drug target space, with protein structures mainly sourced from the DUD-E and COACH databases. All protein structures were predicted with AlphaFold2, with an average TM-score of 0.983, indicating high structural reliability.
- Simulated Density Map Generation: Using software such as EMAN2 and UCSF Chimera, each protein–ligand complex was used to generate simulated cryo-EM density maps ranging from 3–15 Å resolution, thus covering the mainstream experimental resolution range.
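To make the simulation step concrete, here is a minimal sketch of how a density map at a chosen resolution can be approximated from atomic coordinates by summing one Gaussian per atom. It is a stand-in for illustration only, not the EMAN2 or UCSF Chimera procedure used in the study. The function name `simulate_density`, the grid spacing, the padding, and the resolution-to-width factor 1/(π√2) (the default convention of Chimera's molmap command) are assumptions.

```python
import numpy as np

def simulate_density(coords, resolution, voxel=1.0, padding=5.0):
    """Toy simulated map: sum one isotropic Gaussian per atom on a regular grid.

    coords     : (N, 3) atomic coordinates in Å
    resolution : target resolution in Å
    voxel      : grid spacing in Å (assumed value)
    padding    : empty margin around the atoms in Å (assumed value)
    """
    coords = np.asarray(coords, dtype=float)
    # Assumed convention linking resolution to Gaussian width.
    sigma = resolution / (np.pi * np.sqrt(2.0))
    origin = coords.min(axis=0) - padding
    extent = coords.max(axis=0) + padding - origin
    shape = tuple(int(n) for n in np.ceil(extent / voxel).astype(int) + 1)
    axes = [origin[d] + voxel * np.arange(shape[d]) for d in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    density = np.zeros(shape)
    for x, y, z in coords:
        d2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        density += np.exp(-d2 / (2.0 * sigma ** 2))
    return density, origin

if __name__ == "__main__":
    atoms = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
    for res in (3.0, 10.0, 15.0):
        grid, _ = simulate_density(atoms, res)
        print(res, grid.shape, round(float(grid.max()), 3))
```

At lower resolution the per-atom Gaussians widen and overlap, so individual atoms are no longer separable in the map; this is the regime the 3–15 Å benchmark is meant to probe.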
2. DockEM Energy Function System and Key Algorithm Design
- Innovative Energy Function: The DockEM energy function Etotal is a weighted sum of four terms: (1) density map matching energy (ecc), (2) protein–ligand van der Waals and electrostatic energy (einter), (3) intra-ligand van der Waals energy (eintra), and (4) local density map distance constraint energy (edis). The ecc term introduces a correlation coefficient that measures the fit between the local density map and the ligand, while the edis term keeps the ligand within the well-fitting region of the local map.
- Sampling Strategy: The core search and alignment process adopts Replica Exchange Monte Carlo (REMC) simulation, which broadens conformational sampling and helps avoid getting stuck in local minima.
- Two-step Rigid and Flexible Docking: The method first performs rigid docking (overall ligand translation and rotation) to rapidly locate the binding site, followed by flexible docking refinement where the 20 lowest-energy initial conformations are individually subjected to large-scale rotations based on rotatable bonds.
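The weighted energy and the replica-exchange idea described above can be sketched on a toy problem. The code below is not DockEM's implementation: the term keys, the weights, the one-dimensional "pose", the temperature ladder, and the step size are all hypothetical. It only shows the two generic ingredients, a weighted sum of energy terms and REMC with Metropolis moves plus neighbour swaps.

```python
import math
import random

def total_energy(terms, weights):
    """Weighted sum of named energy terms (keys and weights are hypothetical)."""
    return sum(weights[name] * terms[name] for name in weights)

def remc(energy, x0, temperatures, n_sweeps=200, step=0.5, seed=0):
    """Replica Exchange Monte Carlo on a scalar variable (toy stand-in for a pose)."""
    rng = random.Random(seed)
    states = [x0 for _ in temperatures]
    energies = [energy(x) for x in states]
    best_x, best_e = x0, energies[0]
    for _ in range(n_sweeps):
        # Metropolis move inside every replica at its own temperature.
        for i, temp in enumerate(temperatures):
            trial = states[i] + rng.uniform(-step, step)
            e_trial = energy(trial)
            d_e = e_trial - energies[i]
            if d_e <= 0 or rng.random() < math.exp(-d_e / temp):
                states[i], energies[i] = trial, e_trial
                if e_trial < best_e:
                    best_x, best_e = trial, e_trial
        # Swap attempt between neighbouring temperatures:
        # accept with probability min(1, exp[(1/Ti - 1/Tj) * (Ei - Ej)]).
        for i in range(len(temperatures) - 1):
            delta = (1.0 / temperatures[i] - 1.0 / temperatures[i + 1]) * (
                energies[i] - energies[i + 1]
            )
            if delta >= 0 or rng.random() < math.exp(delta):
                states[i], states[i + 1] = states[i + 1], states[i]
                energies[i], energies[i + 1] = energies[i + 1], energies[i]
    return best_x, best_e

if __name__ == "__main__":
    # Hypothetical term values and weights, for illustration only.
    terms = {"cc": -0.8, "protein_ligand": -3.0, "ligand_internal": 0.5, "dis": 0.2}
    weights = {"cc": 10.0, "protein_ligand": 1.0, "ligand_internal": 1.0, "dis": 2.0}
    print(total_energy(terms, weights))

    # Double-well toy landscape; the search starts in the shallower well.
    def double_well(x):
        return (x * x - 1.0) ** 2 + 0.3 * x

    print(remc(double_well, 1.0, [0.1, 0.5, 2.0]))
```

High-temperature replicas cross barriers easily and low-temperature replicas refine, so swapping states between them lets a cold replica escape a local minimum it could not leave on its own. In a rigid-then-flexible scheme, the same loop would first perturb translation and rotation and later perturb rotatable-bond torsions.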
3. Adaptive Extraction of Local Density Maps and Docking Localization
- The global protein model is first aligned with the overall density map under the guidance of ecc; then, around the known or predicted binding site, a cubic region sized at twice the ligand’s maximum interatomic distance is clipped out to form the local density map, providing the basis for high-precision initial ligand fitting.
- Using 500 steps of Monte Carlo fitting, the conformation with the highest fit (lowest total energy) is selected, and the local density map center and boundaries are updated to further narrow the search space.
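The clipping and fit-scoring steps above can be illustrated with a short sketch. It assumes the map is a NumPy array on a regular grid, reads "twice the ligand's maximum atomic distance" as the cube's edge length, and uses a plain Pearson correlation between two maps as the fit measure. The function names and these interpretations are assumptions, not details confirmed by the paper.

```python
import numpy as np

def crop_local_map(density, origin, voxel, center, ligand_coords, scale=2.0):
    """Clip a cube around `center`; edge = scale * max ligand interatomic distance."""
    ligand_coords = np.asarray(ligand_coords, dtype=float)
    origin = np.asarray(origin, dtype=float)
    center = np.asarray(center, dtype=float)
    diffs = ligand_coords[:, None, :] - ligand_coords[None, :, :]
    max_dist = np.sqrt((diffs ** 2).sum(axis=-1)).max()
    half = 0.5 * scale * max_dist
    # Convert the cube's corners to voxel indices and keep them inside the map.
    lo = np.floor((center - half - origin) / voxel).astype(int)
    hi = np.ceil((center + half - origin) / voxel).astype(int) + 1
    lo = np.clip(lo, 0, density.shape)
    hi = np.clip(hi, 0, density.shape)
    local = density[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    return local, origin + lo * voxel

def correlation_coefficient(map_a, map_b):
    """Pearson correlation between two maps sampled on the same grid."""
    a = map_a - map_a.mean()
    b = map_b - map_b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

if __name__ == "__main__":
    full_map = np.random.default_rng(0).random((20, 20, 20))
    ligand = [[9.0, 9.0, 9.0], [11.0, 9.0, 9.0], [10.0, 11.0, 9.0]]
    local, local_origin = crop_local_map(
        full_map, origin=[0.0, 0.0, 0.0], voxel=1.0,
        center=[10.0, 10.0, 10.0], ligand_coords=ligand,
    )
    print(local.shape, local_origin)
    print(correlation_coefficient(local, local))
```

In an iterative scheme like the one described, the best-fitting pose after a round of Monte Carlo fitting would supply a new `center`, and the cube would be re-clipped around it to narrow the search.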
4. Flexible Protein–Ligand Refinement and Accuracy Evaluation
- In the flexible docking phase, the REMC algorithm rotates the ligand’s rotatable bonds at full-atom level to search for the lowest-energy conformations;
- For each candidate result, DockRMSD is used to calibrate atomic order and calculate precise RMSD (root mean square deviation), thus ensuring unified evaluation criteria.
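As a reference for the evaluation step, the sketch below computes a docking RMSD without superposition, given an explicit atom correspondence. It is not DockRMSD itself: DockRMSD determines the atom mapping for symmetric molecules, whereas here the `mapping` argument is simply supplied by the caller, and the function name is hypothetical.

```python
import numpy as np

def docking_rmsd(pred, ref, mapping=None):
    """RMSD in Å between predicted and reference ligand coordinates, in place.

    mapping[i] is the index of the predicted atom that corresponds to
    reference atom i; None means the two atom orders already agree.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if mapping is not None:
        pred = pred[list(mapping)]
    if pred.shape != ref.shape:
        raise ValueError("predicted and reference coordinates must match in shape")
    return float(np.sqrt(((pred - ref) ** 2).sum(axis=1).mean()))

if __name__ == "__main__":
    reference = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    predicted = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # same pose, atoms listed in swapped order
    print(docking_rmsd(predicted, reference))                  # naive atom order
    print(docking_rmsd(predicted, reference, mapping=[1, 0]))  # corrected atom order
```

The example shows why atom order must be reconciled first: an identical pose with two equivalent atoms listed in a different order yields a nonzero RMSD until the correspondence is fixed.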
5. Performance Comparison and In-depth Analysis
- Large-scale systematic comparisons were conducted against four mainstream protein–ligand docking methods: ChemEM, Emerald, CB-Dock2, and EDock.
- Using professional tools such as RDKit, multiple physical interaction metrics—including electrostatic energy, van der Waals energy, solvation energy, and hydrogen bond count—were comprehensively assessed to dissect the structural and energetic advantages of DockEM.
6. Case Validation and Expansion to Experimental Density Maps
- Representative complexes were selected for visual analysis and detailed comparison of fitting effects among different methods;
- The systematic validation was extended to two groups of real experimental cryo-EM density maps to test methodological adaptability and practical value.
Main Research Results and Findings
1. Significant Overall Performance Enhancement
- Docking Accuracy: DockEM’s average RMSD reached 1.87 Å, significantly outperforming Emerald (2.06 Å), ChemEM (3.75 Å), CB-Dock2 (2.88 Å), and EDock (3.99 Å), representing improvements of 10%–53% over mainstream methods.
- Success Rate: Using an RMSD cutoff (in Å) on the flexible-docking result as the success criterion, DockEM successfully docked 110 of 121 cases, a success rate of 90.9%, well above Emerald (78.5%) and CB-Dock2 (58.7%).
- Statistical Significance: Paired t-tests confirmed that DockEM’s docking accuracy is significantly higher than that of the other methods (p-values ranging from 1.2×10⁻² down to 2.3×10⁻²³).
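The headline figures above can be checked with simple arithmetic. The sketch below uses only the rounded averages quoted in this summary, so its output may differ slightly from values computed on unrounded data; the helper names are hypothetical.

```python
# Mean RMSD values (Å) as quoted in this summary.
reported_mean_rmsd = {
    "DockEM": 1.87,
    "Emerald": 2.06,
    "ChemEM": 3.75,
    "CB-Dock2": 2.88,
    "EDock": 3.99,
}

def relative_improvement(ours, other):
    """Percent reduction in mean RMSD relative to the other method."""
    return 100.0 * (other - ours) / other

def success_rate(n_success, n_total):
    """Percent of targets docked successfully."""
    return 100.0 * n_success / n_total

if __name__ == "__main__":
    ours = reported_mean_rmsd["DockEM"]
    for name, value in reported_mean_rmsd.items():
        if name != "DockEM":
            print(f"{name}: {relative_improvement(ours, value):.1f}% lower mean RMSD")
    print(f"success rate: {success_rate(110, 121):.1f}%")
```

From these rounded inputs the improvements span roughly 9% to 53%, in line with the quoted range to within rounding, and 110 of 121 reproduces the 90.9% success rate.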
2. Advantage in Adaptive Extraction and Docking Based on Local Density Maps
- After rigid docking, the average deviation between the ligand center and the true structure center was only 1.75 Å, a 65.4% reduction compared to the predicted binding site center (5.06 Å).
- After flexible refinement, the deviation between the DockEM-fitted ligand center and the true structure center was only 0.94 Å, a further reduction of 14%–65% compared to traditional methods.
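For clarity on how a center deviation of this kind is typically measured, here is a minimal sketch that takes the distance between the geometric centers of two coordinate sets and expresses an improvement as a percent reduction. Whether the study uses the geometric center or a mass-weighted center is not stated here, so the unweighted mean is an assumption, as are the function names.

```python
import numpy as np

def center_deviation(coords_a, coords_b):
    """Distance (Å) between the unweighted geometric centers of two atom sets."""
    center_a = np.asarray(coords_a, dtype=float).mean(axis=0)
    center_b = np.asarray(coords_b, dtype=float).mean(axis=0)
    return float(np.linalg.norm(center_a - center_b))

def percent_reduction(before, after):
    """Percent by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before

if __name__ == "__main__":
    # Averages quoted above: predicted-site center vs. center after rigid docking.
    print(f"{percent_reduction(5.06, 1.75):.1f}% reduction")
```

With the quoted averages of 5.06 Å and 1.75 Å, the reduction works out to 65.4%, matching the figure reported for rigid docking.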
3. Multi-Energy Cooperative Optimization Improves Physical Credibility
- On multiple energy metrics—including electrostatic energy (elc), van der Waals energy (vdw), hydrogen bond count, and solvation energy (slv)—DockEM’s fitted results ranked first or second overall, balancing ligand conformation rationality and physical interaction credibility.
- The balance between hydrogen bond count (hb) and vdw energy was better than other methods, effectively avoiding overfitting or structural clashes.
4. Excellent Robustness Under Medium/Low Resolution
- Performance was robust under 3–15 Å resolution; even in the extreme 10–15 Å low-resolution condition, DockEM remained comparable to the best reference method CB-Dock2, with average RMSD around 2.39 Å, superior to ChemEM, Emerald, and EDock.
5. Feasibility in Experimental Density Maps
- On two groups of real cryo-EM experimental density maps, DockEM achieved highly accurate ligand docking with RMSDs of 0.90 Å (resolution 7.0 Å) and 0.40 Å (resolution 3.14 Å), providing pivotal technical validation for structure-based drug design.
Research Conclusions and Scientific Value
By efficiently integrating cryo-EM density map information into the protein–ligand docking workflow, this study proposes and implements DockEM, a novel method that combines adaptive focusing on local density maps, multi-energy physical optimization, and REMC global sampling. Systematic evaluation demonstrates that DockEM outperforms current mainstream methods in ligand docking accuracy and physical rationality under 3–15 Å resolution and adapts well to real experimental density maps.
In terms of scientific significance, this approach bridges structural biology and computational drug discovery, substantially advancing the efficient use of medium-to-low-resolution cryo-EM data in drug discovery. It creates conditions for accurate docking in practical scenarios such as large molecular size, inaccurate binding site predictions, or limited resolution. Additionally, the proposed energy function system, local density map trimming and characterization, and the REMC sampling framework provide practical paradigms for spatial optimization of complex biomolecules and large-scale computational screening.
In terms of application value, DockEM is suitable for traditional small molecule drug screening and can be extended to complex protein–peptide docking problems. It also lays a solid foundation for future advanced applications such as deep learning-based parameter optimization and automated high-throughput screening. The software tool has been fully open sourced for industry and academia to validate and further develop.
Research Highlights and Innovative Features
- Innovatively integrates local density map analysis, flexible sampling, and multi-energy physical optimization as a whole scheme to optimize docking accuracy across the workflow;
- Exhibits leading performance under low/medium resolution density map scenarios, setting an example for the in-depth utilization of extensive cryo-EM structural data in drug research;
- Proposes a sharpening strategy for local binding site localization and introduces a fit-based correlation-coefficient energy term (ecc), improving the agreement between docking poses and true structures;
- Builds its sampling and optimization on the REMC algorithm, which helps avoid local-minimum traps and improves the efficiency of exploring conformational space;
- Open source code and detailed evaluation data facilitate reproducibility and collaborative innovation in the community and research sector.
Other Supplementary Information
- DockEM runs without needing a GPU; typical model docking time is around 60 minutes, suitable for most mainstream experimental and industrial scenarios;
- All data, code, and method documentation are open sourced on GitHub, greatly facilitating subsequent customized experiments and cross-validation;
- The research team suggests future work could incorporate deep learning frameworks for parameter selection and energy function optimization, further enhancing DockEM’s suitability and automation across different targets and ligand scenarios.
Conclusion
This study not only broadens the application boundary of protein–ligand docking in cryo-EM low/medium resolution settings but also brings dual theoretical and methodological innovations to the field of molecular docking. Its open platform and comprehensive experimental validation provide a solid foundation for subsequent structure-based drug design, protein interaction mechanism studies, and high-throughput virtual screening. The advent of DockEM will strongly propel the intelligent and automated evolution of drug development pipelines and add new highlights to China’s rising voice in international structural biology and AI-enabled drug development.