Using @rerun, I established a baseline from the HoCAP dataset and conducted a qualitative comparison among the ground-truth calibrated cameras, Dust3r, and VGGT. The improvements are evident in both the camera parameters and the multi-view depth map/point cloud.
I will eventually add quantitative comparisons, such as Relative Rotation Accuracy (RRA) and Relative Translation Accuracy (RTA).
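For reference, here is a minimal sketch of one common formulation of these two metrics, not the code from the linked repo. It assumes poses are 4x4 world-to-camera matrices; the function names and the 15-degree threshold are illustrative choices. For every camera pair it compares the predicted relative pose against the ground-truth one and reports the fraction of pairs whose angular error falls under the threshold.

```python
import itertools
import numpy as np


def angle_between_rotations_deg(R_a, R_b):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    cos = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def angle_between_vectors_deg(u, v, eps=1e-12):
    """Angle in degrees between two vectors, ignoring their magnitudes."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def rra_rta(gt_poses, pred_poses, tau_deg=15.0):
    """Return (RRA, RTA) at threshold tau_deg.

    Assumption: each pose is a 4x4 world-to-camera matrix, and both lists
    describe the same cameras in the same order.
    """
    rot_errs, trans_errs = [], []
    for i, j in itertools.combinations(range(len(gt_poses)), 2):
        # Relative transform taking camera-i coordinates to camera-j coordinates.
        rel_gt = gt_poses[j] @ np.linalg.inv(gt_poses[i])
        rel_pred = pred_poses[j] @ np.linalg.inv(pred_poses[i])
        rot_errs.append(
            angle_between_rotations_deg(rel_gt[:3, :3], rel_pred[:3, :3])
        )
        # Only the translation direction is compared, so scale does not matter.
        trans_errs.append(
            angle_between_vectors_deg(rel_gt[:3, 3], rel_pred[:3, 3])
        )
    rra = float(np.mean(np.array(rot_errs) < tau_deg))
    rta = float(np.mean(np.array(trans_errs) < tau_deg))
    return rra, rta
```

Because the translation error is measured as an angle between directions, the up-to-scale poses that feed-forward models typically predict can be compared against metric ground truth without a separate alignment step. The function needs at least two cameras, since both metrics are defined over pairs.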
I’m one step closer to developing a pipeline that integrates two iPhones and a Quest 3. Camera calibration was a major hurdle that has now been conquered! I believe that the egocentric (first-person) perspective is critical for collecting datasets at scale, while the exocentric (third-person) perspective will also be crucial for accuracy, especially when addressing the occlusions that can arise during fine-grained hand and object interactions.
You can find the integrated VGGT calibration code here: https://github.com/rerun-io/pi0-lerobot?tab=readme-ov-file#c...