Geometry-Informed Distance Candidate Selection for Omnidirectional Stereo Vision

This is the project page of the ICRA submission, “Geometry-Informed Distance Candidate Selection for Adaptive Lightweight Omnidirectional Stereo Vision with Fisheye Images”. For code, dataset, and the pre-trained models, please refer to the GitHub page.

ICRA '24 Submission Video.

Overview

Depth perception is essential for mobile robots in navigation and obstacle avoidance. LiDAR devices are commonly used due to their accuracy and speed, but they have mechanical complexities and can be costly. Current multiview stereo (MVS) vision methods with fisheye images are either not robust enough or cannot run in real-time. Many learning-based methods are very computationally-expensive due to the cost volume approach.

The study introduces Geometry-Informed (GI) distance candidates which can be integrated into many omnidirectional MVS models that use a fixed number of distance candidates. GI candidates aim to improve depth estimation accuracy and decrease the inference time by reducing the number of candidates needed.

Additionally, GI candidates convey an interesting property onto MVS models. The GI candidate distribution is dependent on baseline distance. So as the baseline distance between reference and query cameras changes, so does the GI candidates chosen. We show that this dependency allows camera locations can be adjusted post-training and the model still can perform well by adjusting the GI candidates to the new distribution.

Sample results on the Scene Flow dataset — Geometry-Informed (GI) Candidate Distribution compared to even candidate spacing in the inverse distance space.

100k-Samples Multiview Stereo Vision Dataset

The dataset consists of 112,344 samples collected from 68 environments. Each sample consists of RGB fisheye images and ground truth depth for each of the three cameras in our specific configuration. Our configuration uses an evaluation board with three fisheye cameras in a triangular configuration. Additionally, each sample has a ground truth RGB and distance map panorama at the reference camera’s location.

The dataset will be released soon on our project github page.

More Results

Point cloud representation of real-world inferencing.

Results show a comparison of synthetically-generated images from unknown environments. The Geometry-Informed (GI) candidates distribution improves depth candidate selection. Using GI candidates leads to:

Better accuracy in distance estimation
Faster model performance

An additional benefit of the GI candidates is their ability to accommodate changes in the distance between cameras after training without re-training.

The following table presents performance metrics across various models:

Model	Candidate Type	Number	MAE	RMSE	SSIM	Time (ms)	GPU Start (MB)	GPU Peak (MB)
RTSS[1]	EV	32	0.053	0.101	0.776	144	330	4240
E8	EV	8	0.013	0.032	0.862	65	790	1030
G8	GI	8	0.012	0.029	0.867	65	790	1030
E16	EV	16	0.011	0.028	0.876	111	790	1230
G16	GI	16	0.010	0.028	0.877	111	790	1230
G16V	GI	16	0.013	0.029	0.861	111	790	1230
G16VV	GI	16	0.012	0.028	0.872	114	800	1090

Pre-trained models will be released soon on our github project page.

Manuscript

Please refer to this link.

@misc{hu2021orstereo,
      title={Geometry-Informed Distance Candidate Selection for Adaptive Lightweight Omnidirectional Stereo Vision with Fisheye Images}, 
      author={Conner Pulling and Je Hon Tan and Yaoyu Hu and Sebastian Scherer},
      year={2023},
      primaryClass={cs.CV}
}

Contact

Conner Pulling: (cpulling [at] andrew [dot] cmu [dot] edu)
Yaoyu Hu: (yaoyuh [at] andrew [dot] cmu [dot] edu)
Sebastian Scherer: (basti [at] cmu [dot] edu)

Acknowledgments

This work was supported by Singapore’s Defence Science and Technology Agency (DSTA).