Algebraic Approaches and Deep Neural Models for 3D Scene Reconstruction and Camera Pose Estimation in Static and Dynamic Environments
Abstract:
This talk will explore advances in 3D scene reconstruction, focusing on methods for estimating camera poses and scene structure in challenging multiview and dynamic-content scenarios. First, I will outline foundational aspects of my earlier work, in which we characterized the algebraic structure of fundamental and essential matrices in multiview settings and developed deep learning methods for jointly recovering camera parameters and sparse 3D scene structure. The main part of the talk introduces TracksTo4D (NeurIPS 2024), a novel, efficient method for reconstructing dynamic 3D structure and camera motion from casual videos. TracksTo4D uses a dedicated encoder, trained in an unsupervised way on a dataset of casual videos, that takes 2D point tracks as input and infers dynamic 3D structure and camera motion. The architecture accounts for the symmetries of the problem, constrains the reconstruction to be low rank, and models both static and dynamic scene components. The model generalizes well to unseen videos from new categories, achieving accurate 3D reconstruction and camera localization in a single feed-forward pass while drastically reducing running time.