Selected work
Continual Vision-Language Action Learning in Robotic Manipulation
From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition
How to Take a Memorable Picture? Empowering Users with Actionable Feedback
PoInit-of-View: Poisoning Initialization of Views Transfers Across Multiple 3D Reconstruction Systems
Specificity-aware reinforcement learning for fine-grained open-world classification
TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation
The Devil Is in Gradient Entanglement: Energy-Aware Gradient Coordinator for Robust Generalized Category Discovery
Token Reduction via Local and Global Contexts Optimization for Efficient Video Large Language Models
Finetune Like You Pretrain: Boosting Zero-shot Adversarial Robustness in Vision-language Models
Large Multimodal Models as General In-Context Classifiers
Organizing Unstructured Image Collections using Natural Language
SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
NullFace: Training-Free Localized Face Anonymization
Fast and Stable Riemannian Metrics on SPD Manifolds via Cholesky Product Geometry
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals
Riemannian High-Order Pooling for Brain Foundation Models