Selected work
Safe Vision-Language Models via Unsafe Weights Manipulation
Open-World Deepfake Attribution via Confidence-Aware Asymmetric Learning
Wasserstein-Aligned Hyperbolic Multi-View Clustering
ConViS-Bench: Estimating Video Similarity Through Semantic Concepts
ImageNet-trained CNNs are not biased towards texture: Revisiting feature reliance through controlled suppression
Increasing the Utility of Synthetic Images through Chamfer Guidance
Towards a General Attention Framework on Gyrovector Spaces for Matrix Manifolds
AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual Grounding
Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection
FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors
FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models
Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
On Large Multimodal Models as Open-World Image Classifiers
Superpowering Open-Vocabulary Object Detectors for X-ray Vision
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models