Latest News

Academic talk on September 22, 2017: Towards human behavior understanding from first-person vision


Posted: 2017-09-20 17:18:34



Title: Towards human behavior understanding from first-person vision

Time: 2:00 PM, Friday, September 22, 2017

Speaker: Mariella Dimiccoli

Venue: Room 405, Graduate Teaching Building

 

BIO:

Mariella Dimiccoli received the M.S. degree in computer engineering from the Polytechnic University of Bari, Italy, in 2004, and the M.A.S. and Ph.D. degrees in signal theory and communications from the Technical University of Catalonia, Spain, in 2006 and 2009, respectively. She is currently with the Department of Mathematics and Computer Science at the University of Barcelona, where she is a member of the Consolidated Research Group "Computer Vision at the University of Barcelona" (CVUB). She is also an associate researcher at the Computer Vision Centre, Spain, which she joined with a Marie Curie fellowship in 2014. Her work spans multiple research areas within computational perception, image processing, and computer vision. Her current research focus is on first-person vision and visual lifelogging. She regularly serves on the program committees of several international conferences and workshops in image processing and computer vision, and as a reviewer for international journals such as IEEE Transactions on Human-Machine Systems and IEEE Transactions on Multimedia. She has also served as guest editor for a special issue of the Journal of Visual Communication and Image Representation.

 

Abstract:

In this talk, I will discuss some recent results on the visual understanding of everyday subject- and object-related actions captured by a wearable camera. In particular, I will focus on two problems: (1) activity recognition from egocentric videos, and (2) social pattern characterization from photo-streams.

 

Our proposed approach to activity recognition builds on the observation that egocentric actions are always accompanied by contextual cues, such as the position of the hands, the identity and position of the surrounding objects, and the way the hands interact with those objects. We propose a CNN architecture that exploits both these contextual features and their temporal evolution, without relying on motion information, as sketched below.
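
To make this concrete, here is a minimal sketch (in PyTorch) of one way such an architecture could look: a frame-level CNN encodes contextual appearance cues from RGB frames, and an LSTM models their temporal evolution, with no optical-flow or motion input. The backbone choice, feature sizes, class count, and all names are illustrative assumptions, not the speaker's actual model.

import torch
import torch.nn as nn
from torchvision import models

class EgoActivityNet(nn.Module):
    """Hypothetical sketch: contextual CNN features + temporal LSTM."""
    def __init__(self, num_classes: int, hidden_dim: int = 256):
        super().__init__()
        # Frame-level CNN encodes contextual cues (hands, objects, scene)
        # from RGB appearance alone; no optical flow is computed.
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features          # 512 for resnet18
        backbone.fc = nn.Identity()                 # keep pooled features
        self.backbone = backbone
        # The LSTM models the temporal evolution of the frame features.
        self.temporal = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1))  # (b*t, feat_dim)
        feats = feats.view(b, t, -1)
        _, (h_n, _) = self.temporal(feats)          # final hidden state
        return self.classifier(h_n[-1])             # (batch, num_classes)

# Example: score two 8-frame clips over 10 hypothetical activity classes.
logits = EgoActivityNet(num_classes=10)(torch.randn(2, 8, 3, 224, 224))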

 

Second, we propose a complete pipeline for characterizing the face-to-face interactions of a person from photo-streams captured by a wearable camera over several consecutive weeks. Building on the sociological concept of the F-formation, we detect the people with whom the camera wearer is interacting and classify the interactions as formal or informal, using both environmental and emotional cues in an LSTM framework, as in the sketch below. Additionally, by leveraging a new face clustering technique, we identify the people with whom the camera wearer has been interacting, with the aim of characterizing the frequency and diversity of the individual's social interactions.
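
As an illustration of the formal-vs-informal classification step, the sketch below feeds concatenated per-frame environmental and emotional descriptors to an LSTM and reads the decision from its final hidden state. The feature dimensions, the two-class encoding, and all names are assumptions made for illustration, not the actual pipeline presented in the talk.

import torch
import torch.nn as nn

class InteractionTypeClassifier(nn.Module):
    """Hypothetical sketch: LSTM over environmental + emotional cues."""
    def __init__(self, env_dim: int = 128, emo_dim: int = 32,
                 hidden_dim: int = 64):
        super().__init__()
        # A single LSTM consumes the concatenated cue vectors over time.
        self.lstm = nn.LSTM(env_dim + emo_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # 0 = informal, 1 = formal

    def forward(self, env_feats: torch.Tensor,
                emo_feats: torch.Tensor) -> torch.Tensor:
        # env_feats: (batch, time, env_dim), e.g. scene/context descriptors
        # emo_feats: (batch, time, emo_dim), e.g. facial-expression scores
        x = torch.cat([env_feats, emo_feats], dim=-1)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])              # (batch, 2) logits

# Example: classify one 30-frame photo-stream segment.
model = InteractionTypeClassifier()
logits = model(torch.randn(1, 30, 128), torch.randn(1, 30, 32))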



Author: 宋浩