Multimodal human behaviour analysis in the wild

Recent advances and open problems

Tutorial course at IEEE ICPR 2016
Sunday, December 4th, 2016, from 14:00 to 18:00.



Xavier Alameda-Pineda
received his PhD from INRIA and the University of Grenoble in 2013. He was a post-doctoral researcher at CNRS/GIPSA-Lab and at the University of Trento, in the Multimodal Human Understanding Group. He is a research scientist in the Perception team at INRIA, working on signal processing and machine learning for scene and behavior understanding using multimodal data. He won the best paper prize at ACM Multimedia 2015 and is a member of ACM SIGMM and of IEEE. [Homepage]

Elisa Ricci
is a researcher at FBK and an assistant professor at the University of Perugia. She received her PhD from the University of Perugia in 2008. She has been a post-doctoral researcher at Idiap and FBK, Trento, and a visiting researcher at the University of Bristol. Her research focuses on developing machine learning algorithms for video scene analysis, human behavior understanding and multimedia content analysis. She is an area chair of ACM MM 2016 and of ECCV 2016. [Homepage]

Nicu Sebe
is a full professor at the University of Trento, Italy, where he leads research in the areas of multimedia information retrieval and human behavior understanding. He was a general co-chair of FG 2008 and ACM MM 2013, and a program chair of CIVR 2007 and 2010 and of ACM MM 2007 and 2011. He is a program chair of ECCV 2016 and ICCV 2017. He is a senior member of IEEE and ACM and a fellow of IAPR. [Homepage]

Course Description

The automated analysis of human behavior in unstructured scenarios has many potential applications in health care, conflict and people management, sociology, marketing and surveillance. It is therefore unsurprising that many researchers have invested effort in developing computational approaches that automatically describe the behavior of a group or an individual. Generally speaking, extracting high-level information (e.g. emotional states, personality traits) is infeasible unless the underlying low- and mid-level feature extraction methods are robust and accurate. In this tutorial we will describe recent research efforts in the area of unstructured social scene analysis. Special emphasis will be given to recent approaches that combine signal processing and machine learning to robustly extract crucial low- and mid-level information. We will consider tasks such as: (i) head and body pose estimation from multiple sensors, (ii) free-standing conversational group detection, (iii) audio-visual speaker detection, (iv) separation of moving sound sources and (v) estimation of physiological signals from visual data. The session will conclude with an overview of the extent of current research, its limitations and promising directions for future work. The content of the proposed tutorial lies at the intersection of three of the main topic areas of ICPR 2016, namely: pattern recognition and machine learning; computer and robot vision; and image, speech, signal and video processing.