Latest News

Academic Talk, January 8, 2016: A Quest for Visual Commonsense: Scene Understanding by Functional and Physical Reasoning


Posted: 2016-01-07 14:35:08



Title: A Quest for Visual Commonsense: Scene Understanding by Functional and Physical Reasoning

Time: Friday, January 8, 2016, 10:00 a.m.

Speaker: Yibiao Zhao, MIT

Venue: Room 101, Graduate Building

 

Abstract: Computer vision has made significant progress in locating and recognizing objects in recent decades. However, it still lacks the ability to understand scenes in the way that characterizes human visual experience. Compared with human vision, what is missing in current computer vision? One answer is that human vision is not only for pattern recognition but also supports a rich set of commonsense reasoning about object function, scene physics, social intentions, etc.

I build systems for real-world applications while pursuing the long-term goal of devising a unified framework that can make sense of an image and a scene by reasoning about the functional and physical mechanisms of objects in a 3D world. By bridging advances across stochastic learning, computer vision, and cognitive science, my research tackles the following challenges:

(i) What is the visual representation? I develop stochastic grammar models to characterize the spatiotemporal structures of visual scenes and events. The analogy to human natural language lays a foundation for representing both visual structure and abstract knowledge.

(ii) How to reason about commonsense knowledge? I augment the grammatical representation with commonsense knowledge about functionality and physical stability. Bottom-up and top-down inference algorithms are designed to find the most plausible interpretation of visual stimuli.

(iii) How to acquire commonsense knowledge? I conducted three case studies to acquire different kinds of commonsense knowledge, teaching the computer to learn affordances from observing human actions, to learn tool use from a single (one-shot) demonstration, and to infer containment relations by physical simulation without an explicit training process.

Such a sophisticated understanding of 3D scenes enables computer vision to reason about, predict, and interact with the 3D environment, as well as to hold intelligent dialogues beyond the visible spectrum.

 

Yibiao Zhao received a PhD degree from the University of California, Los Angeles (UCLA). He is currently a postdoctoral research associate at the Massachusetts Institute of Technology (MIT). His research interests include computer vision, cognitive modeling, cognitive robotics, and statistical learning and inference. He works on cognitive robots for understanding 3D physical scenes and collaborating with humans. He is the co-chair of the series of Int'l Workshops on Vision Meets Cognition: Functionality, Physics, Intents and Causality at CVPR 2014, CVPR 2015 and CogSci 2015.



Author: Song Hao (宋浩)