Stitch4D: Sparse Multi-Location 4D Urban Reconstruction via Spatio-Temporal Interpolation

¹Keio University, ²The University of Tokyo
*Equal contribution
Comparison with existing methods. Existing approaches optimize each camera location independently and suffer from blur and geometric inconsistency in sparse multi-location settings (top). In contrast, Stitch4D stitches spatially separated observations into a unified 4D representation (bottom).

Reconstruction Videos

Free-viewpoint Rendering — Urban Area 1

Free-viewpoint Rendering — Urban Area 2

Scalability to Multiple Videos

Stitch4D scales to an arbitrary number of input videos, enabling 4D reconstruction over larger spatial regions by stitching additional panoramic observations.

Free-viewpoint Rendering — Urban Area 1 (Additional)

Free-viewpoint Rendering — Urban Area 2 (Additional View 1)

Free-viewpoint Rendering — Urban Area 2 (Additional View 2)

Abstract

Dynamic urban environments are often captured by cameras placed at spatially separated locations with little or no view overlap. However, most existing 4D reconstruction methods assume densely overlapping views. When applied to such sparse observations, these methods fail to reconstruct intermediate regions and often introduce temporal artifacts. To address this practical yet underexplored sparse multi-location setting, we propose Stitch4D, a unified 4D reconstruction framework that explicitly compensates for missing spatial coverage in sparse observations. Stitch4D (i) synthesizes intermediate bridge views to densify spatial constraints and improve spatial coverage, and (ii) jointly optimizes real and synthesized observations within a unified coordinate frame under explicit inter-location consistency constraints. By restoring intermediate coverage before optimization, Stitch4D prevents geometric collapse and reconstructs coherent geometry and smooth scene dynamics even in sparsely observed environments. To evaluate this setting, we introduce Urban Sparse 4D (U-S4D), a CARLA-based benchmark designed to assess spatiotemporal alignment under sparse multi-location configurations. Experimental results on U-S4D show that Stitch4D surpasses representative 4D reconstruction baselines and achieves superior visual quality.
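As a purely illustrative sketch of what "intermediate bridge views" between two capture locations could mean geometrically, the snippet below places evenly spaced viewpoints on the straight segment between two camera positions. The abstract does not say how Stitch4D chooses bridge viewpoints or how it synthesizes their images, so the linear spacing, the function name `bridge_positions`, and the example coordinates are all assumptions made here for illustration.

```python
def bridge_positions(cam_a, cam_b, num_bridges):
    """Return `num_bridges` evenly spaced 3D points strictly between two cameras.

    Hypothetical helper: linear placement is an assumption for illustration,
    not the placement rule used by Stitch4D.
    """
    if num_bridges < 0:
        raise ValueError("num_bridges must be non-negative")
    positions = []
    for k in range(1, num_bridges + 1):
        # Interpolation weight in (0, 1); the endpoints are the real cameras.
        t = k / (num_bridges + 1)
        positions.append(tuple(a + t * (b - a) for a, b in zip(cam_a, cam_b)))
    return positions


# Two example capture locations 40 m apart at a height of 1.5 m (made-up values).
cam_a = (0.0, 0.0, 1.5)
cam_b = (40.0, 0.0, 1.5)
bridges = bridge_positions(cam_a, cam_b, 3)
```

With three bridge viewpoints the gap between neighbouring viewpoints shrinks from 40 m to 10 m, which is the sense in which added views densify spatial constraints before optimization.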

SP4DR: Sparse Multi-Location 4D Reconstruction

Example of the SP4DR problem.

Method

Overall architecture of Stitch4D.

Overview of MVBM

Multi-View Bridging Module (MVBM).

Structural overview of MVJOM

Multi-Video Joint Optimization Module (MVJOM).

U-S4D Benchmark

Overview of the U-S4D benchmark

Quantitative Results

Full Reconstruction Setting

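| Method | Trajectory Interpolation PSNR [dB] ↑ | Trajectory Interpolation SSIM ↑ | Trajectory Interpolation LPIPS ↓ | Seen-viewpoints PSNR [dB] ↑ | Seen-viewpoints SSIM ↑ | Seen-viewpoints LPIPS ↓ |
|---|---|---|---|---|---|---|
| 4DGS | 11.51 | 0.28 | 0.84 | 15.79 | 0.58 | 0.84 |
| SpacetimeGS | 13.25 | 0.54 | 0.67 | 17.97 | 0.79 | 0.32 |
| FreeTimeGS | 11.90 | 0.50 | 0.76 | 16.77 | 0.71 | 0.42 |
| Stitch4D (Ours) | 15.81 | 0.59 | 0.50 | 25.62 | 0.92 | 0.14 |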
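For readers unfamiliar with the PSNR column, the following is a minimal sketch of the standard peak signal-to-noise ratio definition, 10·log10(MAX² / MSE), applied to two flat lists of pixel intensities. It assumes intensities normalized to [0, 1]; the function name `psnr` and the flat-list image representation are choices made here, and the page does not describe the exact evaluation code used for U-S4D.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    Assumes intensities lie in [0, max_value]; higher values mean the
    rendered image is closer to the reference.
    """
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01, i.e. about 20 dB.
example = psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1])
```

Because the scale is logarithmic, every 10 dB corresponds to a tenfold reduction in MSE for a single image pair. Scores averaged over many frames, as in the table, need not map to an MSE ratio exactly.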

Temporal Split Setting

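| Method | Trajectory Interpolation PSNR [dB] ↑ | Trajectory Interpolation SSIM ↑ | Trajectory Interpolation LPIPS ↓ | Seen-viewpoints PSNR [dB] ↑ | Seen-viewpoints SSIM ↑ | Seen-viewpoints LPIPS ↓ |
|---|---|---|---|---|---|---|
| 4DGS | 10.54 | 0.25 | 0.80 | 13.78 | 0.52 | 0.64 |
| SpacetimeGS | 13.02 | 0.53 | 0.68 | 17.42 | 0.77 | 0.34 |
| FreeTimeGS | 11.94 | 0.50 | 0.76 | 16.22 | 0.69 | 0.43 |
| Stitch4D (Ours) | 15.53 | 0.58 | 0.51 | 24.12 | 0.90 | 0.15 |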
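To make the size of the gap in the Temporal Split table explicit, the sketch below computes, for each metric, the strongest baseline and Stitch4D's margin over it. The numbers are copied from the table above. The dictionary layout, metric labels, and the helper name `margins` are illustrative choices, not part of the benchmark tooling.

```python
# Temporal Split results copied from the table: (PSNR, SSIM, LPIPS) for
# Trajectory Interpolation, followed by (PSNR, SSIM, LPIPS) for Seen-viewpoints.
RESULTS = {
    "4DGS": (10.54, 0.25, 0.80, 13.78, 0.52, 0.64),
    "SpacetimeGS": (13.02, 0.53, 0.68, 17.42, 0.77, 0.34),
    "FreeTimeGS": (11.94, 0.50, 0.76, 16.22, 0.69, 0.43),
    "Stitch4D (Ours)": (15.53, 0.58, 0.51, 24.12, 0.90, 0.15),
}

# (label, higher_is_better) for each column, in the same order as above.
METRICS = [
    ("traj PSNR", True), ("traj SSIM", True), ("traj LPIPS", False),
    ("seen PSNR", True), ("seen SSIM", True), ("seen LPIPS", False),
]


def margins(results, ours="Stitch4D (Ours)"):
    """Map each metric to (best baseline, improvement of `ours` over it).

    The improvement is positive when `ours` is better, for both
    higher-is-better and lower-is-better metrics.
    """
    out = {}
    for i, (label, higher) in enumerate(METRICS):
        baselines = {m: v[i] for m, v in results.items() if m != ours}
        pick = max if higher else min
        best = pick(baselines, key=baselines.get)
        diff = results[ours][i] - baselines[best]
        out[label] = (best, round(diff if higher else -diff, 2))
    return out


summary = margins(RESULTS)
```

In this setting SpacetimeGS is the strongest baseline on every metric, and the margin is larger for seen viewpoints (6.70 dB PSNR) than for trajectory interpolation (2.51 dB PSNR).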

Qualitative Results

Trajectory Interpolation (Full Reconstruction)

Qualitative results for trajectory interpolation

Temporal Split (Seen-viewpoints) — Urban Area 1

Qualitative results for temporal split UA1

Temporal Split (Seen-viewpoints) — Urban Area 3

Qualitative results for temporal split UA3

Additional Qualitative Results

Trajectory Interpolation (Additional)

Additional trajectory interpolation

Seen-viewpoints — Urban Area 1 (Additional)

UA1 seen viewpoints

Seen-viewpoints — Urban Area 2

UA2 seen viewpoints

Free-viewpoint Trajectory — Urban Area 1

Rotateshow UA1

Free-viewpoint Trajectory — Urban Area 2

Rotateshow UA2

Full Trajectory — Urban Area 1

UA1 trajectory

Full Trajectory — Urban Area 2

UA2 trajectory

Three-Input Reconstruction — Rotateshow

3-input rotateshow

Three-Input Reconstruction — LBRF

3-input LBRF

BibTeX

@article{kogure2026stitch4d,
  title={Stitch4D: Sparse Multi-Location 4D Urban Reconstruction via Spatio-Temporal Interpolation},
  author={Kogure, Hina and Katsumata, Kei and Miyanishi, Taiki and Sugiura, Komei},
  journal={arXiv preprint arXiv:2604.07923},
  year={2026}
}