Selected work
A Unified Masked Jigsaw Puzzle Framework for Vision and Language Models
Consistency-Aware Anchor Pyramid Network for Crowd Localization
H2OT: Hierarchical Hourglass Tokenizer for Efficient Video Pose Transformers
Deep Learning-Based Object Pose Estimation: A Comprehensive Survey
Hallucination Early Detection in Diffusion Models