Degree Name

Doctor of Philosophy (PhD)


Robotics Institute


Fernando De la Torre


Many computer vision problems, such as object classification, motion estimation, and shape registration, rely on solving the correspondence problem. Existing spatial and temporal correspondence problems are usually NP-hard and difficult to approximate, and existing algorithms lack flexible models and mechanisms for feature weighting. This thesis addresses the correspondence problem in computer vision, and proposes two new spatio-temporal correspondence problems and three algorithms to solve spatial, temporal, and spatio-temporal matching between video and other sources. The main contributions of the thesis are: (1) Factorial graph matching (FGM). FGM extends existing work on graph matching (GM) by finding an exact factorization of the affinity matrix. Four benefits follow from this factorization: (a) There is no need to compute the pairwise affinity matrix, which is costly in both space and time; (b) It provides a unified framework that reveals commonalities and differences between GM methods. Moreover, the factorization provides a clean connection with other matching algorithms such as iterative closest point; (c) The factorization allows the use of a path-following optimization algorithm, which leads to improved optimization strategies and matching performance; (d) Given the factorization, it becomes straightforward to incorporate geometric transformations (rigid and non-rigid) into the GM problem. (2) Canonical time warping (CTW). CTW is a technique to temporally align multiple multi-dimensional and multi-modal time series. CTW extends dynamic time warping (DTW) by incorporating a feature weighting layer to adapt different modalities and by allowing a more flexible warping as a combination of monotonic functions. CTW has linear complexity, unlike DTW, which has quadratic complexity. We applied CTW to align human motion captured with different sensors (e.g., audio, video, accelerometers). (3) Spatio-temporal matching (STM). Given a video and a 3D motion capture model, STM finds the correspondence between subsets of video trajectories and the motion capture model. STM is efficiently and robustly solved using linear programming. We illustrate the performance of STM on the problem of human detection in video, and show how STM achieves state-of-the-art performance.
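For readers unfamiliar with the baseline that CTW extends, the following is a minimal sketch of classical DTW, not of CTW or of any code from the thesis. The function name `dtw`, the squared Euclidean frame cost, and the three-step transition rule are assumptions chosen for illustration. The nested loop over all frame pairs is what gives DTW the quadratic cost mentioned above.

```python
import numpy as np


def dtw(x, y):
    """Classical dynamic time warping between two sequences.

    x, y: arrays of shape (n,) or (n, d); rows are frames.
    Returns (total_cost, path), where path is a list of (i, j) index pairs.
    Illustrative baseline only; the frame cost is an assumed choice.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)

    # Pairwise squared Euclidean distance between all frames: n x m entries.
    dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)

    # cost[i, j] = best cumulative cost aligning x[:i] with y[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1],  # advance both sequences
                cost[i - 1, j],      # advance x only
                cost[i, j - 1],      # advance y only
            )

    # Backtrack from the end to recover the monotonic warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(cost[n, m]), path
```

For example, `dtw([0, 1, 2], [0, 0, 1, 2])` aligns the repeated leading frame of the second sequence to the first frame of the first sequence at zero total cost. Note that this baseline compares frames directly in a shared feature space, so it has no mechanism for weighting features or relating signals from different modalities.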