Wednesday, January 20, 2016 - 01:00 pm
Swearingen 3A75
Candidate: Xiaochuan Fan
Advisor: Dr. Song Wang
Date: January 20, 2016
Time: 1:00 P.M.
Place: Swearingen 3A75
Abstract
In this research, we mainly focus on the problem of estimating 2D and 3D human poses from monocular images. Unlike many previous works, neither our 2D nor our 3D pose estimation approach uses a hand-crafted graphical model. Instead, our approaches learn knowledge of the human body using machine learning techniques.
Reconstructing a 3D human pose from a single set of 2D joint locations is an ill-posed problem unless a model of the human body is taken into account. In this research, we propose a new approach, pose locality constrained representation (PLCR), to model the 3D human body and use it to improve 3D human pose estimation. PLCR uses a block-structural pose dictionary to explicitly encourage pose locality in human-body modeling, and it is then incorporated into a matching-pursuit-based algorithm for 3D human pose estimation. The 2D locations used by our 3D pose estimation approach may come from manual annotation or from estimated 2D poses. To estimate 2D poses, this research also proposes a new learning-based 2D human pose estimation approach based on Dual-Source Deep Convolutional Neural Networks (DS-CNN). The proposed DS-CNN model takes as input a set of category-independent object proposals detected in the image and learns the appearance of each local part by considering its holistic view within the full body. We also develop an algorithm that combines the results from all object proposals to estimate the 2D human pose. Experimental results show that our PLCR-based 3D pose estimation approach outperforms the state-of-the-art algorithm based on standard sparse representation and physiological regularity in reconstructing a variety of 3D human poses from both synthetic data and real images. Furthermore, the proposed DS-CNN model achieves performance superior or comparable to state-of-the-art 2D human pose estimation methods whose pose priors are either estimated from physiologically inspired graphical models or learned from a holistic perspective.
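The idea of restricting a 3D reconstruction to atoms from one local block of a pose dictionary can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the PLCR method itself. It assumes a known orthographic camera (a fixed projection matrix `P`). It uses orthogonal matching pursuit (OMP) inside each block as a stand-in for the matching-pursuit algorithm, and it keeps the block with the smallest 2D reprojection residual. The names `omp`, `plcr_reconstruct`, `D`, `block_ids`, and `P` are hypothetical.

```python
import numpy as np


def omp(A, y, n_atoms):
    """Orthogonal matching pursuit: greedily select columns of A to explain y."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    norms = np.maximum(np.linalg.norm(A, axis=0), 1e-12)
    for _ in range(min(n_atoms, A.shape[1])):
        # Score each atom by its normalized correlation with the current residual.
        scores = np.abs(A.T @ residual) / norms
        if support:
            scores[support] = -np.inf  # never reselect an atom
        support.append(int(np.argmax(scores)))
        # Refit all selected atoms by least squares, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return support, coef, float(np.linalg.norm(residual))


def plcr_reconstruct(D, block_ids, P, y, n_atoms=3):
    """Hypothetical sketch of a locality-constrained 3D pose reconstruction.

    D:         (3J, K) dictionary of 3D basis poses (columns are atoms)
    block_ids: (K,) block label of each atom; a block groups nearby poses
    P:         (2J, 3J) projection matrix, ASSUMED known (orthographic camera)
    y:         (2J,) observed 2D joint locations
    """
    best = None
    for b in np.unique(block_ids):
        idx = np.flatnonzero(block_ids == b)
        # Only atoms from block b may be used: this is the locality constraint.
        support, coef, err = omp(P @ D[:, idx], y, n_atoms)
        if best is None or err < best[0]:
            best = (err, idx[support], coef)
    err, atoms, coef = best
    # Lift back to 3D by applying the same coefficients to the unprojected atoms.
    pose_3d = D[:, atoms] @ coef
    return pose_3d, int(block_ids[atoms[0]]), err
```

Because every selected atom comes from a single block, the recovered 3D pose stays within one local region of pose space. That is the intuition behind pose locality. An unconstrained sparse code could instead mix atoms from very different poses.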
Surprised by the power of CNNs demonstrated in our 2D human pose estimation approach and in many other computer vision tasks, we ask whether new knowledge can be discovered from a CNN model. In this research, we evaluate the impact of every image region and show that different regions have different impacts, and that regions with a large impact can provide an important cue, or signature, for a given computer vision task. Note that this cue is not included in the ground truth of the training samples. We therefore consider these signature regions an interesting representation of new knowledge.
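Occlusion sensitivity is one common way to measure how much each image region affects a CNN's output. You mask a region and record how much the model's score drops. The sketch below uses that technique, which may differ from the evaluation used in this research. `score_fn` is a hypothetical stand-in for a trained CNN that returns a scalar score for the task. `patch`, `stride`, and `fill` are illustrative parameters.

```python
import numpy as np


def region_impact(image, score_fn, patch=8, stride=8, fill=0.0):
    """Occlusion-style impact map: score drop when each region is masked.

    score_fn is ASSUMED to map an image array to a scalar task score
    (e.g. a trained CNN's confidence); it is hypothetical here.
    """
    h, w = image.shape[:2]
    base = score_fn(image)
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    impact = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            r, c = i * stride, j * stride
            occluded[r:r + patch, c:c + patch] = fill  # mask one region
            impact[i, j] = base - score_fn(occluded)   # larger drop = more impact
    return impact


def signature_regions(impact, top_k=3):
    """Return (row, col) grid indices of the top_k highest-impact regions."""
    order = np.argsort(impact, axis=None)[::-1][:top_k]
    return [tuple(int(v) for v in np.unravel_index(k, impact.shape)) for k in order]
```

The regions whose masking causes the largest score drop are candidates for the signature regions described above. No ground-truth annotation of those regions is needed to find them.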