Papers published at top conferences and journals.
AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual Grounding
Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection
FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors
FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models
Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
On Large Multimodal Models as Open-World Image Classifiers
Superpowering Open-Vocabulary Object Detectors for X-ray Vision
Training-Free Personalization via Retrieval and Reasoning on Fingerprints
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
Automatic Benchmarking of Large Multimodal Models via Iterative Experiment Programming
Diversified In-Domain Synthesis with Efficient Fine-Tuning for Few-Shot Classification
Evaluating Attribute Confusion in Fashion Text-to-Image Generation
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers
Compositional Caching for Training-free Open-vocabulary Attribute Detection
Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
3D Part Segmentation via Geometric Aggregation of 2D Visual Features
One VLM to Keep It Learning: Generation and Balancing for Data-Free Continual Visual Question Answering