Sample-Cohesive Pose-Aware Contrastive Facial Representation Learning

Enhancing Pose Awareness in Self-Supervised Facial Representation Learning. Research Background and Problem Statement: In the field of computer vision, facial representation learning is a crucial research task. By analyzing facial images, we can extract information such as identity, emotion, and pose, thereby supporting downstream tasks like facial...

A Mutual Supervision Framework for Referring Expression Segmentation and Generation

Research Background and Problem Statement: In recent years, vision-language interaction technology has made remarkable progress in the field of artificial intelligence. Among these advancements, referring expression segmentation (RES) and referring expression generat...

Global and Local Maximum Concept Matching for Zero-Shot Out-of-Distribution Detection

GL-MCM: Global and Local Maximum Concept Matching for Zero-Shot Out-of-Distribution Detection. Research Background and Problem Statement: In real-world applications, machine learning models often face changes in data distribution, such as the emergence of new categories. Inputs that fall outside the training distribution are called out-of-distribution (OOD) data, and identifying them is known as OOD detection. To ensure the...

Lidar-guided Geometric Pretraining for Vision-centric 3D Object Detection

Lidar-Guided Geometric Pretraining Enhances Performance of Vision-Centric 3D Object Detection. Background: In recent years, multi-camera 3D object detection has garnered significant attention in the field of autonomous driving. However, vision-based methods still face challenges in precisely extracting geometric information from RGB imag...

An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-training

Academic Background: In recent years, self-supervised learning (SSL) has made significant progress in the field of computer vision. In particular, the successful application of masked image modeling (MIM) pre-training methods on large-sca...