Pixel-Perfect Structure-From-Motion With Featuremetric Refinement

Bibliographic Details
Title: Pixel-Perfect Structure-From-Motion With Featuremetric Refinement
Authors: Sarlin, Paul-Edouard; Lindenberger, Philipp; Larsson, Viktor; Pollefeys, Marc
Other Contributors: Lund University, Faculty of Engineering (LTH), LTH Profile Area: AI and Digitalization (Originator); Lund University, Faculty of Science, Centre for Mathematical Sciences, Mathematical Imaging Group (Originator); Lund University, Faculty of Science, Centre for Mathematical Sciences, Mathematics (Faculty of Engineering), Computer Vision and Machine Learning (Originator); Lund University, LU Profile Area: Natural and Artificial Cognition (Originator); Lund University, Strategic Research Areas (SRA), ELLIIT: the Linköping-Lund initiative on IT and mobile communication (Originator)
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(5):3298-3309
Keywords: Natural Sciences; Computer and Information Sciences; Computer graphics and computer vision
Description: Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per-image once and for all, which can yield poorly-localized features and propagate large errors to the final geometry. In this article, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses as a post-processing. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale.
Access URL: https://doi.org/10.1109/TPAMI.2023.3237269
Database: SwePub
ISSN: 0162-8828
DOI: 10.1109/TPAMI.2023.3237269
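
Note: The abstract describes refining keypoints, 3D points, and camera poses by minimizing a featuremetric error over dense features predicted by a neural network. Purely as an illustration of what such a cost looks like, the PyTorch sketch below samples a dense feature map at each reprojected 3D point and compares it, through a robust loss, against a per-point reference descriptor. All function names, argument names, and shapes here are assumptions of this sketch, not the paper's actual code or API.

import torch
import torch.nn.functional as F

def sample_features(feature_map, pts):
    """Bilinearly sample a dense feature map (C, H, W) at pixel coords (N, 2)."""
    _, H, W = feature_map.shape
    # grid_sample expects (x, y) coordinates normalized to [-1, 1].
    gx = 2.0 * pts[:, 0] / (W - 1) - 1.0
    gy = 2.0 * pts[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).view(1, 1, -1, 2)
    out = F.grid_sample(feature_map[None], grid, align_corners=True)
    return out[0, :, 0].t()  # (N, C)

def featuremetric_cost(points3d, poses, K, feature_maps, observations, ref_descs):
    """Robust featuremetric error summed over all observations (illustrative only).

    points3d:     (P, 3) 3D points
    poses:        list of (R, t) with R (3, 3) rotation and t (3,) translation per image
    K:            (3, 3) shared pinhole intrinsics
    feature_maps: list of (C, H, W) dense CNN feature maps, one per image
    observations: list of (point_idx, image_idx) pairs forming the tracks
    ref_descs:    (P, C) reference descriptor per 3D point
    """
    total = 0.0
    for p_idx, i_idx in observations:
        R, t = poses[i_idx]
        X_cam = R @ points3d[p_idx] + t            # world -> camera frame
        uvw = K @ X_cam                            # pinhole projection
        uv = (uvw[:2] / uvw[2]).unsqueeze(0)       # pixel coordinates, shape (1, 2)
        feat = sample_features(feature_maps[i_idx], uv)[0]
        r = feat - ref_descs[p_idx]                # featuremetric residual
        # Smooth robust penalty (pseudo-Huber-like) to down-weight outliers.
        total = total + torch.sqrt(1.0 + r.pow(2).sum()) - 1.0
    return total

Gradients of this cost with respect to the 3D points, poses, or initial keypoint locations could then drive a standard optimizer; the paper's actual system applies this kind of featuremetric refinement at the scale of large image collections.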