Emerging topics in learning from noisy and missing data

Tutorial course at the ACM Conference on Multimedia 2016
Sunday, October 16th, 2016, from 14:00 to 17:00.
Room C0.02 of the Roeterseiland complex of the University of Amsterdam.



Xavier Alameda-Pineda
received his PhD from INRIA and the University of Grenoble in 2013. He was a post-doctoral researcher at CNRS/GIPSA-Lab. He currently holds a research fellowship at the University of Trento, in the Multimodal Human Understanding Group, working on signal processing and machine learning for scene and behavior understanding using multimodal data. He won the ACM Multimedia 2015 best paper prize and was recently appointed research scientist in the Perception team at INRIA Grenoble Rhône-Alpes.

Timothy Hospedales
is a Senior Lecturer (associate professor) at Queen Mary University of London, where he leads the Applied Machine Learning lab, studying applications of weakly supervised, transfer and cross-modal learning in computer vision and multimedia. He was an area chair of WACV 2016 and won the BMVC 2015 best paper prize.

Elisa Ricci
is a researcher at FBK and an assistant professor at the University of Perugia. She received her PhD from the University of Perugia in 2008. She has been a post-doctoral researcher at Idiap and at FBK, Trento, and a visiting researcher at the University of Bristol. Her research focuses on developing machine learning algorithms for video scene analysis, human behavior understanding and multimedia content analysis. She is an area chair of ACM MM 2016 and ECCV 2016.

Nicu Sebe
is a full professor at the University of Trento, Italy, where he leads research in multimedia information retrieval and human behavior understanding. He was a General Co-Chair of FG 2008 and ACM MM 2013, and a program chair of CIVR 2007 and 2010 and of ACM MM 2007 and 2011. He is a program chair of ECCV 2016 and ICCV 2017. He is a senior member of IEEE and ACM and a fellow of IAPR.

Xiaogang Wang
received the PhD degree in Computer Science from the Massachusetts Institute of Technology in 2009. He is an associate professor in the Department of Electronic Engineering at the Chinese University of Hong Kong. He received the Outstanding Young Researcher in Automatic Human Behavior Analysis Award in 2011, the HK RGC Early Career Award in 2012, and the Young Researcher Award of the Chinese University of Hong Kong. He was an area chair of ICCV 2011 and 2015, ECCV 2014 and 2016, and ACCV 2014. His research interests include computer vision, deep learning, crowd video surveillance, object detection, and face recognition.

Course Description


While vital for most multimedia and computer vision problems, collecting large-scale, fully annotated datasets is a resource-intensive and often unaffordable task. On the one hand, datasets need to be large and varied enough for learning strategies to exploit the variability inherent in real data; on the other hand, they should be small enough to be fully annotated at a reasonable cost. With the overwhelming success of (deep) learning methods, the traditional problem of balancing dataset size against the resources needed for annotation has become a full-fledged dilemma. In this context, methods able to handle partially annotated datasets offer a unique opportunity to strike the right balance between data variability and annotation cost. These include methods that can deal with noisy, weak or partial annotations.

In this tutorial we will present several recent methodologies that address different visual tasks using noisy or weakly annotated datasets. Special emphasis will be given to deep architectures for unsupervised domain adaptation, low-rank models for transductive learning, and zero-shot learning. We will show how these approaches perform well on key tasks such as pedestrian detection and fine-grained visual recognition. Furthermore, we will discuss emerging application domains that are of great interest to the multimedia community and where handling noisy or missing information is essential. For instance, we will present recent work on multimodal analysis of complex scenes using wearable sensors, on estimating physiological signals from face videos under realistic conditions, and on recognizing the emotions elicited by abstract paintings.

References

  1. X. Alameda-Pineda, Y. Yan, E. Ricci, O. Lanz, and N. Sebe, "Analyzing Free-Standing Conversational Groups: A Multimodal Approach," ACM Multimedia, 2015.
  2. S. Tulyakov, X. Alameda-Pineda, E. Ricci, L. Yin, J. F. Cohn, and N. Sebe, "Self-Adaptive Matrix Completion for Heart Rate Estimation from Face Videos under Realistic Conditions," IEEE CVPR, 2016.
  3. X. Alameda-Pineda, E. Ricci, Y. Yan, and N. Sebe, "Recognizing Emotions from Abstract Paintings using Non-Linear Matrix Completion," IEEE CVPR, 2016.
  4. T. Xiao, T. Xia, Y. Yang, C. Huang, and X. Wang, "Learning from Massive Noisy Labeled Data for Image Classification," IEEE CVPR, 2015.
  5. X. Zeng, W. Ouyang, and X. Wang, "Deep Learning of Scene-Specific Classifier for Pedestrian Detection," ECCV, 2014.
  6. Y. Fu, T. M. Hospedales, T. Xiang, and S. Gong, "Transductive Multi-view Zero-Shot Learning," IEEE TPAMI, 2015.