MASA-TCN: Multi-Anchor Space-Aware Temporal Convolutional Neural Networks for Continuous and Discrete EEG Emotion Recognition

A Breakthrough in EEG Emotion Recognition: The Proposal and Experimental Analysis of the MASA-TCN Unified Model

Academic Background and Research Motivation

Human emotion recognition has long been a popular research direction in neuroscience, artificial intelligence, and human-computer interaction. Automatically identifying an individual's emotional state can serve psychological health management and intelligent assistive systems, enable more natural human-computer interaction, and support monitoring and intervention for patients with mental disorders such as depression, anxiety, and autism spectrum disorder. However, emotion recognition technology has so far relied mainly on speech and facial-expression signals. Although these signals are easy to collect, they can be voluntarily controlled or concealed by subjects and cannot precisely capture the brain's true emotional state.

In contrast, electroencephalography (EEG), as a non-invasive, low-cost, high-temporal-resolution brain imaging tool, directly reflects the brain's intrinsic emotional neural activity, granting it unique advantages in emotion recognition. EEG-based emotion recognition tasks fall into two categories: Discrete Emotion Classification (DEC) and Continuous Emotion Regression (CER). The former assigns one classification label to each sample, while the latter predicts emotion as a time-continuous signal, more closely capturing the real dynamics of emotional change. Yet despite extensive research on DEC methods, studies and data for CER remain scarce, and EEG-based continuous emotion regression methods are in particularly short supply.

Therefore, the authors of this paper aimed to address two core issues: (1) how to improve the effectiveness of EEG-based continuous emotion regression, especially given the current difficulty of learning the spatial features of EEG signals; and (2) whether a unified model can be proposed that, while accounting for spatial, spectral, and temporal features, applies to both CER and DEC tasks, achieving "integrated" emotion recognition.

Paper Source and Author Information

The paper, titled “MASA-TCN: Multi-Anchor Space-Aware Temporal Convolutional Neural Networks for Continuous and Discrete EEG Emotion Recognition,” was published in the IEEE Journal of Biomedical and Health Informatics (Vol. 28, Issue 7, July 2024). The authors include Yi Ding, Su Zhang, Chuangao Tang, and Cuntai Guan, all well-known scholars in the field of EEG signals and brain-computer interfaces. The authors are affiliated with Nanyang Technological University (Singapore) and Nanjing Institute of Technology (China). The study was supported by A*STAR (Agency for Science, Technology and Research, Singapore) and relevant funding programs.

Research Design and Technical Workflow Details

This work is an original algorithm study that systematically solves spatial feature learning and task fusion problems in EEG emotion recognition by proposing MASA-TCN (Multi-Anchor Space-Aware Temporal Convolutional Neural Networks). The following outlines the detailed technical workflow:

1. Task Definition and Data Annotation Approach

  • CER Task: Each EEG trial is segmented into multiple short time windows, and the labels are values that vary continuously over time (such as emotional valence); a sliding window keeps the EEG segments and labels synchronized (see the sketch after this list).
  • DEC Task: Each EEG trial is assigned one discrete emotion label, and all segments within the same trial share that label.
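
To make the two labeling schemes concrete, here is a minimal NumPy sketch of the sliding-window alignment. The window length, step size, and nearest-label pairing rule are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def segment_trial(eeg, labels, fs=256, label_fs=4, win_sec=2.0, step_sec=0.25):
    """Cut one EEG trial (channels x samples) into overlapping windows and
    pair each window with the annotation active at the window's end.
    All timing parameters here are placeholder assumptions."""
    win, step = int(win_sec * fs), int(step_sec * fs)
    segments, seg_labels = [], []
    for start in range(0, eeg.shape[1] - win + 1, step):
        end = start + win
        segments.append(eeg[:, start:end])
        # index of the continuous label (sampled at label_fs) nearest the window end
        seg_labels.append(labels[min(int(end / fs * label_fs), len(labels) - 1)])
    return np.stack(segments), np.array(seg_labels)
```

For DEC, the same segmentation applies, except every window inherits the trial's single discrete label.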

2. Overall Network Architecture Design

MASA-TCN consists of five main modules:

(1) Feature Extraction Block - After preprocessing the EEG signals, the mean Relative Power Spectral Density (rPSD) in several frequency bands is calculated for each sub-segment, constructing a 192- or 160-dimensional input feature vector (32 channels × 6 or 5 frequency bands, depending on the dataset).
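
As an illustration of this step, the sketch below computes relative PSD per channel and band with SciPy's Welch estimator. The band edges and estimator parameters are placeholder assumptions; the paper's exact configuration is not reproduced here.

```python
import numpy as np
from scipy.signal import welch

# Illustrative band edges; the paper's exact bands may differ.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 45)}

def rpsd_features(segment, fs=256):
    """Relative PSD per channel and band for one window (channels x samples).
    Returns a flat vector of length n_channels * n_bands."""
    freqs, psd = welch(segment, fs=fs, nperseg=min(segment.shape[-1], fs))
    total = psd.sum(axis=-1, keepdims=True)          # total power per channel
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].sum(axis=-1, keepdims=True) / total)
    return np.concatenate(feats, axis=-1).ravel()    # e.g. 32 ch x 5 bands = 160
```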

(2) Space-Aware Temporal Layer (SAT) - The first key innovation, comprising two types of convolution kernels: a) context kernels extract spectral features channel by channel; b) spatial fusion kernels learn spatial patterns across all channels. By setting appropriate strides and dilation rates, convolution along the temporal dimension is made causal, which enlarges the receptive field, improves feature discrimination, and avoids the redundant computation caused by overlapping sliding windows.
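
A minimal PyTorch sketch of such a layer follows, assuming the input is a (batch, 1, channels×bands, time) rPSD map laid out band-major per channel. Kernel shapes and the activation follow the textual description above rather than the authors' released code.

```python
import torch
import torch.nn as nn

class SATLayer(nn.Module):
    """Sketch of a space-aware temporal layer: per-channel context kernels
    followed by an all-channel spatial fusion kernel, causal in time."""
    def __init__(self, n_ch=32, n_bands=5, n_filters=16, k_t=3, dilation=1):
        super().__init__()
        self.pad = (k_t - 1) * dilation           # left-pad => causal in time
        # context kernels: one channel's band vector at a time, over k_t steps
        self.context = nn.Conv2d(1, n_filters, (n_bands, k_t),
                                 stride=(n_bands, 1), dilation=(1, dilation))
        # spatial fusion kernels: mix all EEG channels at each time step
        self.spatial = nn.Conv2d(n_filters, n_filters, (n_ch, 1))
        self.act = nn.LeakyReLU()

    def forward(self, x):                         # x: (B, 1, n_ch*n_bands, T)
        x = nn.functional.pad(x, (self.pad, 0))   # causal padding on time axis
        x = self.act(self.context(x))             # -> (B, F, n_ch, T)
        x = self.act(self.spatial(x))             # -> (B, F, 1, T)
        return x.squeeze(2)                       # -> (B, F, T)
```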

(3) Multi-Anchor Attentive Fusion Block (MAAF) - The second innovation, employing three parallel SAT modules with different kernel lengths (3, 5, 15) to adapt to the multi-scale temporal dynamics of emotion. The outputs of the three branches are concatenated and then fused by a 1×1 convolution ("attentive fusion") that dynamically weights the contribution of each scale, enhancing model robustness.
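
Building on the SATLayer sketch above, the following shows how three anchors could be concatenated and fused with a 1×1 convolution. The anchor lengths (3, 5, 15) come from the paper; the layout details are assumptions.

```python
import torch
import torch.nn as nn

class MAAF(nn.Module):
    """Sketch of multi-anchor attentive fusion: three SAT branches with
    different temporal kernel lengths, concatenated and re-weighted."""
    def __init__(self, n_ch=32, n_bands=5, n_filters=16, anchors=(3, 5, 15)):
        super().__init__()
        self.branches = nn.ModuleList(
            SATLayer(n_ch, n_bands, n_filters, k_t=k) for k in anchors)
        # 1x1 conv learns how much each scale contributes at every time step
        self.fuse = nn.Conv1d(n_filters * len(anchors), n_filters, kernel_size=1)

    def forward(self, x):                          # x: (B, 1, n_ch*n_bands, T)
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        return self.fuse(multi_scale)              # -> (B, n_filters, T)
```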

(4) Temporal Convolutional Network Block (TCN) - Multiple layers of causal convolution are stacked, combined with residual connections and normalization, progressively learning higher-level temporal features. By adjusting depth and width (number of convolutional kernels), the model flexibly controls its temporal receptive field.
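
A generic causal residual block of this kind might look as follows; the normalization and activation choices are assumptions, and the full model would stack several such blocks with growing dilation to widen the temporal receptive field.

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """Sketch of one causal TCN residual block: two dilated 1-D convolutions
    with normalization, plus a skip connection."""
    def __init__(self, c_in, c_out, k=3, dilation=1):
        super().__init__()
        self.pad = (k - 1) * dilation              # left-pad keeps causality
        self.conv1 = nn.Conv1d(c_in, c_out, k, dilation=dilation)
        self.conv2 = nn.Conv1d(c_out, c_out, k, dilation=dilation)
        self.norm1, self.norm2 = nn.BatchNorm1d(c_out), nn.BatchNorm1d(c_out)
        self.act = nn.LeakyReLU()
        self.skip = nn.Conv1d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def forward(self, x):                          # x: (B, C, T)
        y = self.act(self.norm1(self.conv1(nn.functional.pad(x, (self.pad, 0)))))
        y = self.act(self.norm2(self.conv2(nn.functional.pad(y, (self.pad, 0)))))
        return self.act(y + self.skip(x))          # residual connection
```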

(5) Output Regression/Classification Module - For CER tasks, a linear regressor predicts emotion at each time point; for DEC tasks, the mean of all sub-segment outputs serves as the trial-level prediction, converting the regression architecture into a classifier.
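
The regression-to-classification conversion can be summarized in a few lines. Here `model` is a stand-in for the full network producing one output per time step, and the final thresholding for binary DEC labels is left implicit.

```python
def predict(model, x, task="CER"):
    """Sketch of the two output modes: per-step regression for CER,
    mean over sub-segment outputs as the trial-level score for DEC."""
    y = model(x)                 # assumed shape (B, 1, T): one value per step
    if task == "CER":
        return y.squeeze(1)      # (B, T) continuous trace, aligned with labels
    return y.mean(dim=-1)        # (B, 1) trial score, then threshold/argmax
```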

3. Dataset and Preprocessing Pipeline

  • MAHNOB-HCI: Used for CER; 30 subjects in total, of which 239 trials from 24 subjects are used; 32-channel EEG at 256 Hz; continuous valence labels at 4 Hz, obtained by averaging the ratings of multiple expert annotators.
  • DEAP: Used for DEC; 32 subjects, each watching 40 one-minute music-video trials with subjective ratings; 32-channel EEG recorded at 512 Hz and downsampled to 128 Hz. The original continuous 1-9 ratings are binarized into high/low classes.

Preprocessing includes removal of non-stimulus periods, band-pass filtering, re-referencing, sliding-window segmentation, and rPSD computation. The procedures are kept strictly consistent across datasets to guarantee comparability.

4. Experimental and Evaluation Procedures

  • CER Evaluation Metrics: Root Mean Square Error (RMSE), Pearson Correlation Coefficient (PCC), and Concordance Correlation Coefficient (CCC), with the CCC serving as the optimization objective (the training loss is 1 − CCC; see the sketch after this list).
  • DEC Evaluation Metrics: Accuracy (ACC) and F1 score, using 10-fold cross-validation and subject-independent testing strategies.
  • Hyperparameter settings, training strategies, and baseline methods are strictly aligned for fair comparison.
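
As referenced above, the CCC-based objective can be written compactly. This sketch assumes predictions and targets are (batch, time) valence traces and uses the standard 1 − CCC formulation.

```python
import torch

def ccc_loss(pred, target, eps=1e-8):
    """1 - CCC over continuous traces; CCC combines correlation with
    agreement in mean and variance between prediction and label."""
    pm, tm = pred.mean(dim=-1, keepdim=True), target.mean(dim=-1, keepdim=True)
    cov = ((pred - pm) * (target - tm)).mean(dim=-1)
    ccc = 2 * cov / (pred.var(dim=-1, unbiased=False)
                     + target.var(dim=-1, unbiased=False)
                     + (pm - tm).squeeze(-1) ** 2 + eps)
    return (1 - ccc).mean()
```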

Main Research Results and Data Details

1. CER Task Results Analysis

MASA-TCN significantly outperforms all comparison methods on the MAHNOB-HCI dataset, including traditional RNN, LSTM, GRU, TCN, and recently published methods. Key results:

  • Compared to TCN, test-set RMSE drops by 14.29%, PCC increases by 0.043, and CCC increases by 0.046.
  • Compared to the previously reported state of the art [8], RMSE is 9.09% lower, PCC is 0.033 higher, and CCC is 0.04 higher.

2. Component Ablation and Model Analysis

Incrementally adding the SAT and MAAF modules steadily improves CER performance, confirming their effectiveness:

  • SAT only: RMSE drops, PCC increases by 0.022, and CCC increases by 0.023.
  • SAT + MAAF: RMSE further drops to 0.060, PCC rises to 0.507, and CCC rises to 0.417.

3. Effect of Initial Dilation Rate and Kernel Size

  • Initial dilation rate of 2 achieves optimal performance, effectively expanding the temporal receptive field and reducing model redundancy.
  • Increasing kernel length from 3 to 15 continually improves PCC and CCC, indicating that multi-scale modeling is crucial for accurately capturing emotional dynamics.

4. Effect of Depth and Width (Number of Kernels)

  • Performance does not notably improve, and even slightly declines, once depth exceeds 4; the best results are achieved at a width of 64, while a wider model (width 128) is harder to train and performs worse.
  • This reflects an optimal balance between spatial feature learning and sufficiently broad temporal receptive field.

5. Fusion Strategies and Order of Spatial Feature Learning

  • Attentive fusion greatly outperforms simple concatenation or mean fusion, giving MASA-TCN the strongest fusion mechanism among comparable models.
  • "Early" spatial feature learning (performed inside the SAT module) is markedly superior to "late" learning; the performance gap is significant, and late spatial learning cannot achieve comparable results.

6. DEC Task Results and Classifier Structure Analysis

MASA-TCN also achieves the highest accuracy and F1 score in DEC tasks (valence and arousal dimensions) on the DEAP dataset, outpacing the best competitors by 1.63% and 2.7% respectively, and surpassing methods such as SVM, DeepConvNet, EEGNet, TSception, and the recent transformer-based model MEET. The mean-fusion output mechanism further enhances classification robustness and generalization.

Conclusions, Scientific Value, and Application Significance

MASA-TCN successfully breaks through bottlenecks in spatial feature learning and model fusion between CER/DEC tasks for EEG emotion recognition, presenting the first unified modeling solution of its kind. Core scientific merits include:

  • Methodological Innovation: Space-aware temporal convolution and multi-anchor attentive fusion, along with multi-scale feature modeling, effectively address difficulties in feature learning posed by EEG’s complex spatial-temporal-spectral characteristics;
  • Next-generation Unified Model: MASA-TCN supports both continuous regression and discrete classification, helping to resolve long-standing issues of data scarcity and label asynchrony, opening new paths for generalization and practical application;
  • Highly Generalizable Experiments: SOTA results on two public datasets, with reproducible code and evaluation standards provided for future research;
  • Significant Scientific and Application Impact: Societal value in psychological health monitoring, intelligent assistive systems, human-computer interaction, and affective computing, with broad application prospects.

Research Highlights and Future Prospects

This study’s highlights:

  • Introduction of innovative SAT spatial feature module and MAAF multi-anchor fusion, filling a technical gap in spatial learning for EEG emotion recognition;
  • Bold integration of model architecture and task type, overcoming fragmentation and lack of cross-task reuse in the field;
  • Support for high-performance continuous emotion regression, contributing to empirical validation of “dynamic features” and “continuous processes” in emotion cognition theory;
  • In-depth analysis of fusion strategies, dilation rates, and model width, providing scientific basis for future algorithm development and parameter tuning;
  • Full release of code and experimental configuration, promoting standardization of data and methods in the field.

Nevertheless, data for the CER task remain scarce, and continuous annotation is labor-intensive; future work should focus on expanding public datasets. Regarding the mechanism by which early spatial learning outperforms late learning, the authors call for more theoretical analysis and the introduction of interpretable AI methods. On the loss function, exploring joint optimization of multiple metrics may further improve regression of extreme values and subtle dynamics.

Summary

Overall, this research marks a major step forward for EEG emotion recognition. The proposal and thorough experimental validation of the MASA-TCN model lay a solid methodological foundation for subsequent studies in affective computing, cognitive neuroscience, and clinical mental health. The paper not only presents a new paradigm for joint spatial-temporal modeling of EEG signals but also provides a practical guide and an algorithmic cornerstone for industrial applications.