Research

Publications

Papers published at top conferences and journals.

ACM Multimedia

2025 4 papers
AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual Grounding

Yidan Wang, Chenyi Zhuang, Wutao Liu, Pan Gao, Nicu Sebe
Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection

FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors

Chenxi Li, Weijie Wang, Qiang Li, Nicu Sebe, Bruno Lepri, Weizhi Nie

Unveiling Open-set Noise: Theoretical Insights into Label Noise

Chen Feng, Nicu Sebe, Georgios Tzimiropoulos, Miguel R. D. Rodrigues, Ioannis Patras

ICCV

2025 10 papers
FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models

Mainak Singha, Subhankar Roy, Sarthak Mehrotra, Ankit Jha, Moloud Abdar, Biplab Banerjee, Elisa Ricci

Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery

Xiao Liu, Nan Pu, Haiyang Zheng, Wenjing Li, Nicu Sebe, Zhun Zhong

Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation

Jiahua Dong, Hui Yin, Wenqi Liang, Hanbin Zhao, Henghui Ding, Nicu Sebe, Salman Khan, Fahad Shahbaz Khan
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing

Federico Girella, Davide Talon, Ziyue Liu, Zanxi Ruan, Yiming Wang, Marco Cristani
On Large Multimodal Models as Open-World Image Classifiers

Pseudo-SD: Pseudo Controlled Stable Diffusion for Semi-Supervised and Cross-Domain Semantic Segmentation

Dong Zhao, Qi Zang, Shuang Wang, Nicu Sebe, Zhun Zhong

SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

Yue Li, Qi Ma, Runyi Yang, Huapeng Li, Mengjiao Ma, Bin Ren, Nikola Popovic, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Martin R. Oswald, Danda Pani Paudel
Superpowering Open-Vocabulary Object Detectors for X-ray Vision

Training-Free Personalization via Retrieval and Reasoning on Fingerprints

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models

Lorenzo Baraldi, Davide Bucciarelli, Federico Betti, Marcella Cornia, Lorenzo Baraldi, Nicu Sebe, Rita Cucchiara

ICIAP

2025 3 papers
Automatic benchmarking of large multimodal models via iterative experiment programming

Diversified in-domain synthesis with efficient fine-tuning for few-shot classification

Nicola Dall'Asen, Victor G Turrisi da Costa, Yiming Wang, Nicu Sebe, Elisa Ricci
Evaluating Attribute Confusion in Fashion Text-to-Image Generation

Ziyue Liu, Federico Girella, Yiming Wang, Davide Talon

CVPR

2025 5 papers
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers

Compositional Caching for Training-free Open-vocabulary Attribute Detection (Highlight)

Multi-focal Conditioned Latent Diffusion for Person Image Synthesis

Jiaqi Liu, Jichao Zhang, Paolo Rota, Nicu Sebe
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models (Highlight)

Seeing the abstract: Translating the abstract language for vision language models

Davide Talon, Federico Girella, Ziyue Liu, Marco Cristani, Yiming Wang

WACV

2025 3 papers
3D Part Segmentation via Geometric Aggregation of 2D Visual Features

Marco Garosi, Riccardo Tedoldi, Davide Boscaini, Massimiliano Mancini, Nicu Sebe, Fabio Poiesi
Face Anonymization Made Simple

Hanwei Kung, Tuomas Varanka, Sanjay Saha, Terence Sim, Nicu Sebe
One VLM to keep it learning: Generation and balancing for data-free continual visual question answering