MAEST: Accurate Spatial Domain Detection in Spatial Transcriptomics with Graph Masked Autoencoder
Spatial Transcriptomics: Cutting-Edge Technology for Deciphering Spatial Heterogeneity in Tissues
Spatial transcriptomics (ST) is an emerging sequencing technology that has rapidly developed in recent years. Its core advantage lies in the simultaneous acquisition of gene expression and spatial location information at the tissue section level, providing an unprecedented data foundation for revealing the spatial architecture, functional partitioning, and disease microenvironments of multicellular biological tissues. With the maturation of platform technologies such as 10x Visium, Slide-seq, Stereo-seq, seqFISH, and MERFISH, scientists can now obtain high-resolution, spatially traceable, large-scale gene expression data, greatly advancing fields such as developmental biology, neuroscience, and tumor biology.
Spatial domain identification is a central step in the analysis of ST data. Its goal is to group nearby spots (cells) with similar expression patterns into biologically meaningful spatial domains, thus reconstructing the histological structure and functional organization of complex tissues. However, most existing methods either rely too heavily on gene expression and ignore critical spatial neighborhood information, or lack robustness when the raw data are noisy and have high missing rates, which makes it difficult to ensure continuous and accurate domain partitioning.
Paper Source and Author Background
The research team, led by Pengfei Zhu, Han Shu, and Yongtian Wang, includes members from the School of Computer Science and the Key Laboratory of Big Data Storage and Management at Northwestern Polytechnical University, the School of Computer Science and Artificial Intelligence at Zhengzhou University, the School of Computer Science and Engineering at Xi’an University of Technology, and the Affiliated Hospital of Northwest University, an interdisciplinary mix of computing and clinical expertise. The paper was published by Oxford University Press in 2025 in Briefings in Bioinformatics (Volume 26, Issue 2, bbaf086), and the source code is openly available (https://github.com/clearlove2333/maest).
Research Design and Technical Approach
This study proposes MAEST (Masked AutoEncoder for Spatial Transcriptomics), a graph neural network (GNN) based method for spatial domain identification designed to handle the challenges specific to ST data: high missingness, high noise, and complex spatial structure.
1. Overall Workflow
MAEST follows a multi-step, highly integrated analytical workflow:
(1) Data Preprocessing and Graph Construction
- Data Cleaning and Standardization: As in STAGATE and other studies, outliers in the raw ST data are first excluded; the gene expression matrix is then log-transformed and normalized, and the 3,000 most highly variable genes are selected as the features for subsequent analysis.
- Spatial Adjacency Graph Generation: The entire tissue is modeled as a graph G = (V, A, X), where V is the set of nodes, A is the adjacency matrix, and X is the feature matrix. Each node v represents a spot, and its feature vector x consists of the normalized gene expression values. For each node, the k nearest neighbors (k=3, reported as empirically optimal) are defined as its spatial neighbors, and edges are added in both directions to create an undirected spatial adjacency graph.
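The graph construction step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `build_spatial_graph` and the toy grid of coordinates are assumptions made for the example, and the dense distance matrix is only practical for small inputs (a KD-tree would be the usual choice for tens of thousands of spots).

```python
import numpy as np

def build_spatial_graph(coords, k=3):
    """Return a symmetric 0/1 adjacency matrix built from k nearest spatial neighbors.

    Hypothetical helper for illustration; uses a dense O(n^2) distance matrix.
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Pairwise squared Euclidean distances between spot coordinates.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a spot is never its own neighbor
    # Indices of the k closest spots for every spot.
    nn = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=np.int8)
    rows = np.repeat(np.arange(n), k)
    adj[rows, nn.ravel()] = 1
    # Add the reverse of every edge so the graph is undirected.
    return np.maximum(adj, adj.T)

# Toy input (assumed): a 4 x 4 grid of spots.
xs, ys = np.meshgrid(np.arange(4), np.arange(4))
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)
adj = build_spatial_graph(coords, k=3)
```

Because reverse edges are added after the neighbor search, a spot can end up with more than k neighbors; k is a lower bound on the degree, not an exact value.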
(2) Graph Masked Autoencoder Module Construction
This module is the core of MAEST and addresses noise, redundancy, and missingness:
- Random Feature Masking: Some nodes are randomly masked, their features set to all zeros, and then input into the GNN encoder, which attempts to reconstruct the masked node features based on the unmasked nodes and spatial adjacency relations.
- Multiple Random Re-Maskings: To enhance robustness, multiple random masks are applied to the hidden layer, each requiring the decoder to recover the input features—thus significantly improving the model’s resilience against local perturbations.
- Regularization Mechanism: By introducing a projector (an MLP network), the loss function is constrained so that node representations under masked scenarios can closely recover the output from the unmasked scenario, thereby accelerating convergence and improving parameter stability.
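The masking and re-masking steps can be illustrated with a small NumPy sketch. Several things here are assumptions rather than details from the paper: the encoder and decoder are single mean-aggregation layers with untrained random weights, mean squared error stands in for the reconstruction loss (this summary does not state which loss MAEST uses), and the projector-based regularization is omitted. All function names are hypothetical.

```python
import numpy as np

def normalize_adj(adj):
    """Row-normalize A + I so each node averages itself and its neighbors."""
    a = adj + np.eye(adj.shape[0])
    return a / a.sum(axis=1, keepdims=True)

def mask_nodes(x, mask_rate, rng):
    """Zero the rows of a random subset of nodes; return the masked copy and the indices."""
    n = x.shape[0]
    n_mask = max(1, int(round(mask_rate * n)))
    idx = rng.choice(n, size=n_mask, replace=False)
    x_masked = x.copy()
    x_masked[idx] = 0.0
    return x_masked, idx

def masked_reconstruction_loss(x, adj, w_enc, w_dec, mask_rate, n_remask, rng):
    """Mask input nodes, encode, re-mask the hidden layer several times, decode, score."""
    a_hat = normalize_adj(adj)
    x_masked, idx = mask_nodes(x, mask_rate, rng)
    h = np.maximum(a_hat @ x_masked @ w_enc, 0.0)   # encoder: one graph layer + ReLU
    losses = []
    for _ in range(n_remask):
        h_re, _ = mask_nodes(h, mask_rate, rng)     # fresh random mask on the hidden layer
        x_rec = a_hat @ h_re @ w_dec                # decoder: one graph layer
        # Assumed loss: MSE on the originally masked nodes only.
        losses.append(np.mean((x_rec[idx] - x[idx]) ** 2))
    return float(np.mean(losses)), idx

# Toy data (assumed): a ring of 10 spots with 6 features each.
rng = np.random.default_rng(0)
n, d, hidden = 10, 6, 4
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
x = rng.random((n, d))
w_enc = rng.normal(scale=0.1, size=(d, hidden))
w_dec = rng.normal(scale=0.1, size=(hidden, d))
loss, masked_idx = masked_reconstruction_loss(
    x, adj, w_enc, w_dec, mask_rate=0.3, n_remask=2, rng=rng
)
```

In a real model the loss would be minimized with respect to the encoder and decoder weights. The point of the sketch is only that a masked node can be reconstructed solely from information passed along graph edges.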
(3) Graph Contrastive Learning Module
This module compensates for the autoencoder’s focus on local structure and improves recognition of global spatial relationships:
- Positive and Negative Sample Generation: An augmented view of the original attribute graph is produced by randomly shuffling the gene expression vectors across nodes (the edges are kept; only the feature matrix is permuted, giving x’). The same GNN encoder is applied to both views, followed by a shared MLP, to produce the final representations z and z’.
- Feature Discrimination Learning: Using a binary cross-entropy function, the network discriminates between original and augmented (shuffled) graph nodes, thus bringing together positive pairs (matching a node’s original features with its global representation) and differentiating negative pairs (shuffled features vs. original graph global representation), which enforces a more uniform and discriminative feature distribution.
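The shuffle-and-discriminate idea above can be sketched as follows. This is an illustrative approximation under stated assumptions: the global representation is taken to be a sigmoid of the mean node embedding, the discriminator is a bilinear form (as in Deep Graph Infomax), the shared MLP is left out, and the encoder is a single untrained layer. None of these choices is confirmed by the summary, and every name is hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def normalize_adj(adj):
    """Row-normalize A + I so each node averages itself and its neighbors."""
    a = adj + np.eye(adj.shape[0])
    return a / a.sum(axis=1, keepdims=True)

def encode(x, a_hat, w):
    """One mean-aggregation graph layer followed by ReLU."""
    return np.maximum(a_hat @ x @ w, 0.0)

def contrastive_bce(x, a_hat, w_enc, w_disc, rng):
    """Binary cross-entropy between real nodes (label 1) and shuffled nodes (label 0)."""
    perm = rng.permutation(x.shape[0])
    x_shuf = x[perm]                       # feature rows permuted, edges untouched
    z = encode(x, a_hat, w_enc)            # embeddings of the original graph
    z_neg = encode(x_shuf, a_hat, w_enc)   # embeddings of the corrupted graph
    s = sigmoid(z.mean(axis=0))            # assumed global summary of the original graph
    pos = sigmoid(z @ w_disc @ s)          # score of (real node, summary) pairs
    neg = sigmoid(z_neg @ w_disc @ s)      # score of (shuffled node, summary) pairs
    eps = 1e-12
    loss = -(np.log(pos + eps).mean() + np.log(1.0 - neg + eps).mean()) / 2.0
    return float(loss)

# Toy data (assumed): a ring of 10 spots with 6 features each.
rng = np.random.default_rng(0)
n, d, hidden = 10, 6, 4
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
a_hat = normalize_adj(adj)
x = rng.random((n, d))
w_enc = rng.normal(scale=0.1, size=(d, hidden))
w_disc = rng.normal(scale=0.1, size=(hidden, hidden))
loss = contrastive_bce(x, a_hat, w_enc, w_disc, rng)
```

With an all-zero discriminator every score is 0.5 and the loss equals log 2, the chance level for a balanced binary task; training should push the loss below that value.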
(4) Multi-hop Information Integration
- One-hop and Multi-hop Aggregation: To balance local and long-range spatial dependencies, the model’s output fuses features obtained by both one-hop (nearest neighbors only) and three-hop aggregation (through a parameter-free multi-layer aggregation module fn), thereby achieving multi-scale enhancement of node spatial relations.
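A parameter-free multi-hop aggregation can be written as repeated multiplication by a normalized adjacency matrix. The sketch below assumes row-normalized adjacency with self-loops and fuses the one-hop and three-hop results by simple averaging; the summary does not say how MAEST combines the two, so the fusion rule and the function names are assumptions.

```python
import numpy as np

def normalize_adj(adj):
    """Row-normalize A + I so each node averages itself and its neighbors."""
    a = adj + np.eye(adj.shape[0])
    return a / a.sum(axis=1, keepdims=True)

def propagate(z, a_hat, hops):
    """Apply the aggregation `hops` times; there are no trainable parameters."""
    out = z
    for _ in range(hops):
        out = a_hat @ out
    return out

def fuse_hops(z, adj, hops=3):
    """Average one-hop and multi-hop aggregated features (assumed fusion rule)."""
    a_hat = normalize_adj(adj)
    return 0.5 * (propagate(z, a_hat, 1) + propagate(z, a_hat, hops))

# Toy data (assumed): a path of 5 spots, with a signal only on spot 0.
n = 5
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
z = np.zeros((n, 1))
z[0, 0] = 1.0
a_hat = normalize_adj(adj)
one_hop = propagate(z, a_hat, 1)
three_hop = propagate(z, a_hat, 3)
fused = fuse_hops(z, adj, hops=3)
```

On this path graph the signal from spot 0 reaches spot 3 only after three hops, which is the long-range effect the multi-hop branch is meant to capture.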
(5) Clustering and Spatial Domain Assignment
- Mclust Gaussian Mixture Clustering: The integrated output feature matrix is clustered using the Mclust algorithm, assigning spatial domain labels; for datasets with manual annotations, the number of clusters matches the annotations, and for others, the number of clusters is determined with reference to similar methods and histological characteristics.
2. Study Subjects and Dataset Overview
MAEST was systematically validated on five authoritative publicly available ST datasets, comprehensively assessing generalizability across species, anatomical sites, platform technologies, and resolutions, including:
- Human dorsolateral prefrontal cortex (LIBD DLPFC, 10x Visium, 12 slices, 3460–4789 spots/slice, 33,538 genes)
- Mouse olfactory bulb (Stereo-seq, 1 slice, 19,109 spots, 14,376 genes)
- Mouse hippocampus (Slide-seq v2, 1 slice, 52,869 spots)
- Mouse embryo development atlas (Stereo-seq, 4 slices from e11.5–e14.5, 30,124–92,928 spots/slice)
- Mouse brain tissue (10x Genomics, 2 groups of anterior/posterior slices)
3. Algorithm Evaluation and Ablation Experiments
- MAEST’s clustering accuracy was comprehensively assessed using Accuracy, Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI).
- Simulated dropout experiments (dropout from 0 to 0.9) were carried out to test robustness.
- Ablation experiments were designed to incrementally remove/add individual components, quantifying their contribution to the overall performance.
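For readers unfamiliar with the evaluation setup, the following sketch shows how the Adjusted Rand Index is computed from a contingency table and one simple way to simulate dropout. The ARI formula is the standard one. The dropout function is an assumption: it zeroes each matrix entry independently with the given probability, which may differ from the exact procedure used in the paper.

```python
import numpy as np

def comb2(n):
    """Number of unordered pairs among n items (works elementwise on arrays)."""
    return n * (n - 1) / 2.0

def adjusted_rand_index(labels_true, labels_pred):
    """Standard ARI computed from the contingency table of two labelings."""
    _, t = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(table, (t, p), 1)            # count co-occurrences of label pairs
    sum_cells = comb2(table).sum()
    sum_rows = comb2(table.sum(axis=1)).sum()
    sum_cols = comb2(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / comb2(len(t))
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:              # degenerate case, e.g. a single cluster
        return 1.0
    return float((sum_cells - expected) / (max_index - expected))

def simulate_dropout(x, rate, rng):
    """Zero each entry independently with probability `rate` (assumed procedure)."""
    keep = rng.random(x.shape) >= rate
    return x * keep

# Identical partitions with renamed labels, and a partition that splits every true cluster.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_split = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])

rng = np.random.default_rng(0)
x = rng.random((100, 50)) + 0.1            # strictly positive toy expression matrix
x_drop = simulate_dropout(x, 0.5, rng)
zero_fraction = float((x_drop == 0).mean())
```

ARI is invariant to label renaming, so `ari_same` is 1.0, while the cross-cutting partition scores below zero, meaning worse than chance agreement.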
Main Research Outcomes and Data Support
1. Superior Spatial Domain Identification in Human DLPFC
Across 12 prefrontal cortex slices, MAEST achieved the highest median scores on ACC, ARI, and NMI (ACC=0.77, ARI=0.62, NMI=0.71) among the compared methods, including GraphST, STAGATE, and DeepST. Its domains were more spatially continuous and consistent, with much clearer delineation between the cortical layers and white matter, and closely matched the manual annotation. UMAP and PAGA embeddings indicated that the representations learned by MAEST best reflected spatial locations and trajectories.
2. Outstanding Fine-Scale Resolution of Mouse Tissue Substructures in High-Resolution Data
In the Slide-seq v2 mouse hippocampus data, MAEST accurately reconstructed major anatomical regions (forebrain bundle, dentate gyrus, CA pyramidal layers, etc.), down to precise delineation of the third ventricle and neighboring subnuclei. Beyond accurate global partitions, adjusting the cluster number allowed MAEST to separate highly similar subregions (e.g., lateral posterior thalamic nucleus and lateral geniculate nucleus) and to stratify cortical layers. These local results were highly consistent with the expression patterns of spatial marker genes, which supports the biological interpretability of the learned representations.
In the Stereo-seq mouse olfactory bulb data, MAEST finely distinguished anatomical boundaries such as olfactory nerve layer, granule cell layer, external/internal plexiform layers, and rostral migratory stream (RMS)—outperforming GraphST (which produced only coarse partitions) and STAGATE (which failed to resolve micro-layers). Further validation with region-specific gene markers confirmed high overlap between MAEST clustering results and zone-specific gene expression.
3. Robust Modeling of Spatiotemporal Dynamics in Mouse Embryo Development
For four developmental timepoint whole-slice mouse embryo datasets, MAEST accurately reconstructed the main structural domains such as liver, heart, cartilage, muscle, and brain, and at e14.5 uniquely identified two fine-grained functional subdomains in the brain, marked by astrocyte-specific and neuronal growth regulatory genes, respectively, reflecting real functional specialization. Main structure/domain assignments at other timepoints also matched manual annotation closely, significantly contributing to insights into developmental timing and spatial dynamics.
4. Consistent Horizontal Integration Across Slices
In two groups of mouse anterior/posterior brain sections, MAEST achieved seamless structural continuity in the horizontal direction, accurately recovering the five-layer neocortex, dorsal/ventral hippocampal horns, and structural transitions at “slice boundaries”—in contrast to the segmentation artifacts at boundaries observed in STAGATE, and even surpassing GraphST’s alignment-based stitching.
5. Robustness and Parameter Sensitivity Analysis
Across varying dropout rates, MAEST remained robust to missing data, maintaining high accuracy up to a dropout rate of 0.8 and clearly outperforming the compared methods. Ablation experiments validated the independent and combined contributions of the masked autoencoder, regularization, contrastive discrimination, and multi-hop integration modules, each incrementally raising the ARI. Parameter sensitivity analysis identified optimal ranges for the mask rate, the number of integration hops, and the lambda hyperparameters, and showed that appropriate tuning can significantly improve overall performance.
Conclusions, Significance, and Highlights
Scientific and Applied Value
MAEST is tailored to the intrinsic characteristics of ST—high missingness and noise—by leveraging a novel graph masked autoencoder, node contrastive learning, and multi-scale feature integration to overcome the limitations of traditional clustering methods. It achieves precise tissue partitioning from coarse to fine scales, greatly enriching the biomolecular discovery toolkit for spatial omics. Its generalizability is high, supporting multiple platforms, species, tissue types, and horizontal multi-slice integration. MAEST not only serves foundational purposes such as structural/functional annotation, but also provides a robust algorithmic foundation for microenvironment analysis, developmental spatiotemporal dynamics, and tumor heterogeneity research, with broad future application prospects.
Technical Innovations and Distinctiveness
- Innovative Graph Masked Autoencoder: Employs deep neural network self-supervised learning within the spatial adjacency graph for denoising, reconstruction, and effective prevention of feature collapse.
- Node Contrastive Discrimination Module: Complements the local information acquisition of the autoencoder, promotes even distribution in the representation space, and significantly enhances model robustness.
- One-hop + Multi-hop Information Integration: Multi-scale feature aggregation, enabling rich modeling of complex, long-range spatial dependencies.
- Generalizable Unsupervised End-to-End Pipeline: Requires no manual feature design or supervised labels, making it suitable for large-scale, diverse spatial omics projects.
Research Highlights and Special Contributions
- Excellent cross-platform and cross-species generalizability and robustness, adaptable to diverse real-world ST data;
- Leading performance under high missingness and noise, providing strong support for practical complex biological sample scenarios;
- First to achieve consistent natural stitching across multiple horizontal slices, opening new avenues for spatial integration;
- Stable performance across multiple clustering algorithms and 40 random seeds, indicating robustness and reproducibility.
Other Information and Future Prospects
The full source code and procedures have been made publicly available for community use, enabling full pipeline traceability. Although there remains room for improvement in identifying very small boundary domains (e.g., confusion of marginal spots or demarcating extremely small subregions), this work lays a solid foundation for future high-resolution spatial bioinformatics research. The team plans to focus on higher-resolution spatial domain identification with sharper boundaries, and on deeper integration across platforms.
The publication of MAEST highlights the strength of China’s research at the intersection of computational and life sciences in advancing both algorithmic innovation and the application of spatial transcriptomics, and it opens new possibilities for spatially integrated tissue biology, mechanistic disease research, and precision medicine.