Research Projects

Project List


Activity Analysis and Understanding




Introduction

 

A Hierarchical Video Description for Complex Activity Understanding

 

We propose a hierarchical description of a complex activity video, referring to the "which" of the activity, the "what" of its atomic actions, and the "when" of those atomic actions in the video. In our work, each complex activity is characterized as a composition of simple motion units (called atomic actions), and different atomic actions are explained by different video segments. We develop a latent discriminative structural model that detects the complex activity and its atomic actions while simultaneously learning the temporal structure of the atomic actions.
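To make the latent inference concrete, here is a minimal Python sketch (not our actual implementation) of the decoding step: given per-frame atomic-action scores, it recovers latent segment boundaries and atomic-action labels by dynamic programming. The array names and the fixed number of segments are illustrative assumptions.

    # Minimal sketch, assuming per-frame atomic-action scores are given.
    # Segment boundaries and labels play the role of the latent variables.
    import numpy as np

    def best_segmentation(frame_scores: np.ndarray, num_segments: int):
        """Split T frames into `num_segments` contiguous segments, each labelled
        with one atomic action, maximizing the summed per-frame scores.

        frame_scores: (T, A) array, score of each atomic action at each frame.
        Returns (total_score, [(start, end, action), ...]) with end exclusive.
        """
        T, A = frame_scores.shape
        cum = np.vstack([np.zeros(A), np.cumsum(frame_scores, axis=0)])  # (T+1, A)

        # dp[k][t] = best score of covering frames [0, t) with k segments
        dp = np.full((num_segments + 1, T + 1), -np.inf)
        back = {}
        dp[0][0] = 0.0
        for k in range(1, num_segments + 1):
            for t in range(k, T + 1):
                for s in range(k - 1, t):          # previous segment boundary
                    seg = cum[t] - cum[s]          # per-action segment scores
                    a = int(np.argmax(seg))        # best label for this segment
                    score = dp[k - 1][s] + seg[a]
                    if score > dp[k][t]:
                        dp[k][t] = score
                        back[(k, t)] = (s, a)

        # Trace back the optimal boundaries and labels.
        segments, t = [], T
        for k in range(num_segments, 0, -1):
            s, a = back[(k, t)]
            segments.append((s, t, a))
            t = s
        return dp[num_segments][T], segments[::-1]

    # Toy example: 10 frames, 3 atomic actions.
    rng = np.random.default_rng(0)
    print(best_segmentation(rng.normal(size=(10, 3)), num_segments=3))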

 


 

 

Papers

 

1. Cuiwei Liu, Xinxiao Wu and Yunde Jia. A Hierarchical Video Description for Complex Activity Understanding. International Journal of Computer Vision, 2016, in press. [PDF]

 

 

Cross-View Action Recognition over Heterogeneous Feature Spaces

 

In cross-view action recognition, what is seen in one view differs from what must be recognized in another, since both the data distribution and even the feature space can change from view to view. We address the problem of transferring action models learned in one view (the source view) to a different view (the target view), where action instances from the two views are represented by heterogeneous features. We propose a novel learning method, heterogeneous transfer discriminant-analysis of canonical correlations (HTDCC), which discovers a discriminative common feature space linking the source and target views so that knowledge can be transferred between them. We further propose a weight learning framework for adapting multiple source views, effectively leveraging action knowledge learned from several source views for recognition in the target view.
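The following is a minimal Python sketch of the core idea of linking heterogeneous feature spaces through canonical correlations. HTDCC additionally optimizes discriminative within-class and between-class constraints, which this sketch omits; it assumes paired source/target samples and uses scikit-learn's CCA, with all dimensions and data hypothetical.

    # Sketch only: the canonical-correlation backbone of cross-view transfer.
    import numpy as np
    from sklearn.cross_decomposition import CCA
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    n, d_src, d_tgt, n_classes = 200, 64, 48, 5    # hypothetical sizes
    y = rng.integers(0, n_classes, size=n)
    X_src = rng.normal(size=(n, d_src))            # source-view features
    X_tgt = rng.normal(size=(n, d_tgt))            # heterogeneous target-view features

    # Learn projections that maximally correlate the two feature spaces.
    cca = CCA(n_components=20)
    Z_src, Z_tgt = cca.fit_transform(X_src, X_tgt)

    # Train on projected source-view data, test on projected target-view data:
    # knowledge learned in the source view transfers through the common space.
    clf = LinearSVC().fit(Z_src, y)
    print("target-view accuracy:", clf.score(Z_tgt, y))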

 


 

 

Papers

 

1. Xinxiao Wu, Han Wang, Cuiwei Liu and Yunde Jia. Cross-View Action Recognition over Heterogeneous Feature Spaces. IEEE Transactions on Image Processing, 24(11): 4096-4108, 2015. [PDF]

2. Xinxiao Wu, Han Wang, Cuiwei Liu and Yunde Jia. Cross-View Action Recognition over Heterogeneous Feature Spaces. IEEE International Conference on Computer Vision (ICCV 2013), 2013. [PDF]

3. Xinxiao Wu and Yunde Jia. View-Invariant Action Recognition Using Latent Kernelized Structural SVM. European Conference on Computer Vision (ECCV 2012), 2012. [PDF]

 

 

Action Recognition Using Multilevel Features

 

In this project, we first propose a new low-level visual feature, the spatio-temporal context distribution feature of interest points, to describe human actions. We then propose a novel mid-level class correlation feature that captures the semantic correlations between different action classes. Finally, since human actions often occur in specific natural environments and are highly correlated with particular scene and object classes, we build a high-level co-occurrence relationship among action, scene, and object classes to discover the mutual contextual constraints among them.
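The sketch below illustrates one plausible reading of the low-level feature: each interest point is described by a normalized histogram of its neighbours' (x, y, t) offsets. The binning scheme and radius are our own illustrative assumptions, not the exact descriptor.

    # Sketch of a spatio-temporal context distribution around interest points.
    import numpy as np

    def context_histograms(points: np.ndarray, radius: float, bins: int = 4):
        """points: (N, 3) array of (x, y, t) interest-point coordinates.
        Returns an (N, bins**3) context descriptor per point."""
        N = len(points)
        descs = np.zeros((N, bins ** 3))
        edges = np.linspace(-radius, radius, bins + 1)
        for i in range(N):
            offsets = points - points[i]                 # neighbours' offsets
            mask = (np.abs(offsets) <= radius).all(axis=1)
            mask[i] = False                              # exclude the point itself
            hist, _ = np.histogramdd(offsets[mask], bins=[edges] * 3)
            total = hist.sum()
            if total > 0:
                descs[i] = hist.ravel() / total          # normalized distribution
        return descs

    # Toy usage: 50 random interest points in a 100x100x30 video volume.
    rng = np.random.default_rng(0)
    pts = rng.uniform([0, 0, 0], [100, 100, 30], size=(50, 3))
    print(context_histograms(pts, radius=20.0).shape)    # (50, 64)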

 


 

 

Papers

 

1. Xinxiao Wu, Dong Xu, Lixin Duan, Jiebo Luo and Yunde Jia. Action Recognition Using Multilevel Features and Latent Structural SVM. IEEE Transactions on Circuits and Systems for Video Technology, 23(10): 1422-1431, 2013. [PDF]

2. Xinxiao Wu, Dong Xu, Lixin Duan and Jiebo Luo. Action Recognition Using Context and Appearance Distribution Features. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), 2011. [PDF]

3. Jing Liu, Xinxiao Wu and Yang Feng. Modeling the Relationship of Action, Object and Scene. International Conference on Pattern Recognition (ICPR 2014), 2014.

 

 

Joint Recognition and Localization of Actions in Videos

 

In this project, we develop a novel transfer latent support vector machine for joint recognition and localization of actions using Web images and weakly annotated training videos. The model takes training videos annotated only with action labels as input, alleviating the laborious and time-consuming manual annotation of action locations. Since ground-truth action locations in the videos are unavailable, the locations are modeled as latent variables in our method and are inferred during both the training and testing phases. To improve localization accuracy with prior information about action locations, we collect a number of Web images annotated with both action labels and action locations, and learn a discriminative model by enforcing local similarities between videos and Web images. A structural transformation based on randomized clustering forests maps the Web images to videos, handling the heterogeneous features of the two domains.
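A minimal sketch of the latent-location inference follows: with no ground-truth boxes, the location is treated as a latent variable and set to the candidate window that scores highest under the current model. The dense score map and window scoring are illustrative stand-ins for the model's actual potential terms.

    # Sketch: infer the latent action location as the best-scoring window.
    import numpy as np

    def infer_latent_location(score_map: np.ndarray, win_h: int, win_w: int):
        """score_map: (H, W) per-cell response of the current action model.
        Returns ((top, left), score) of the best win_h x win_w window."""
        H, W = score_map.shape
        # Integral image makes each window score an O(1) lookup.
        ii = np.zeros((H + 1, W + 1))
        ii[1:, 1:] = score_map.cumsum(0).cumsum(1)
        best, best_pos = -np.inf, None
        for top in range(H - win_h + 1):
            for left in range(W - win_w + 1):
                b, r = top + win_h, left + win_w
                s = ii[b, r] - ii[top, r] - ii[b, left] + ii[top, left]
                if s > best:
                    best, best_pos = s, (top, left)
        return best_pos, best

    # Toy usage on a random 20x30 response map.
    rng = np.random.default_rng(0)
    print(infer_latent_location(rng.normal(size=(20, 30)), win_h=8, win_w=10))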

 


 

 

Papers

1. Cuiwei Liu, Xinxiao Wu and Yunde Jia. Transfer Latent SVM for Joint Recognition and Localization of Actions in Videos. IEEE Transactions on Cybernetics, 2015. [PDF]

2. Cuiwei Liu, Xinxiao Wu and Yunde Jia. Weakly Supervised Action Recognition and Localization Using Web Images. Asian Conference on Computer Vision (ACCV 2014), 2014. [PDF]