Robot autonomy in off-road environments presents a number of challenges that make it difficult to handcraft navigation heuristics robust to diverse scenarios. While learned methods using hand labels or self-supervised data improve generalizability, they often require a tremendous amount of data and can be vulnerable to domain shift. To improve generalization in novel environments, recent works have incorporated adaptation and self-supervision to develop autonomous systems that learn from their own experiences online. However, they often rely on significant prior data, for example minutes of human teleoperation, which is difficult to scale across environments and robots. To address these limitations, we propose SALON, a perception-action framework for fast adaptation of traversability estimates with minimal human input. SALON rapidly learns online from experience while avoiding out-of-distribution terrain to produce adaptive, risk-aware cost and speed maps. Within seconds of collected experience, our results demonstrate navigation performance over kilometer-scale courses in diverse off-road terrain comparable to methods trained on 100-1000x more data. We additionally show promising results on significantly different robots in different environments.
Using visual foundation models (such as DINOv2) as feature extractors is key to our approach. By grounding their generalizable features with proprioceptive feedback, robots can quickly adapt their understanding of the world through their own experiences without a human in the loop.
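To make this loop concrete, below is a minimal sketch (not the authors' released code) of how generalizable visual features might be grounded with proprioception: a frozen DINOv2 backbone supplies per-patch features, and a lightweight online regressor maps the features of terrain the robot actually traversed to the roughness it measured there. The model variant, regressor choice, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' released code) of the adaptation loop:
# a frozen DINOv2 backbone supplies per-patch features, and an online
# linear regressor grounds those features in experienced roughness.
# Model variant, regressor, and hyperparameters are assumptions.
import numpy as np
import torch
from sklearn.linear_model import SGDRegressor

# Pretrained DINOv2 ViT-S/14 as a frozen feature extractor.
dino = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
dino.eval()

def patch_features(image_bchw: torch.Tensor) -> np.ndarray:
    """Per-patch features for a (1, 3, H, W) image; H and W must be
    multiples of the 14-pixel patch size. Returns (num_patches, 384)."""
    with torch.no_grad():
        out = dino.forward_features(image_bchw)
    return out['x_norm_patchtokens'].squeeze(0).cpu().numpy()

# Online regressor mapping visual features -> measured roughness cost.
cost_model = SGDRegressor(alpha=1e-4, learning_rate='constant', eta0=1e-2)

def adapt(traversed_feats: np.ndarray, roughness: np.ndarray) -> None:
    """One self-supervised update: fit the features of terrain the robot
    actually drove over against the roughness it felt there."""
    cost_model.partial_fit(traversed_feats, roughness)

def predict_cost(all_feats: np.ndarray) -> np.ndarray:
    """Predict cost for every patch, including terrain never traversed.
    Valid only after at least one adapt() call."""
    return cost_model.predict(all_feats)
```

Because the backbone stays frozen, only the small regressor updates online, which is what makes adaptation within seconds of experience plausible.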
Example of SALON's fast adaptation: Within 10 seconds of experiencing grass for the first time, SALON quickly differentiates key terrains, such as ideal short grass, riskier vegetation, and lethal trees.
We run our autonomy experiments on a Yamaha Viking All-Terrain Vehicle, with two courses shown below. Course 1 consists of waypoints spaced 50m apart, and Course 2 consists of waypoints with varying spacing of up to 200m. The VLAD clusters used for feature generation were computed from sample images collected in the "training data" zone. For each run, the system is initialized with no prior environment-interaction data and a single high-cost tree label from the training data zone.
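For concreteness, the sketch below shows one plausible way such a VLAD vocabulary could be built and applied: k-means over backbone patch features from the sample images, followed by standard VLAD residual encoding. The cluster count and normalization details are assumptions, not necessarily the paper's exact recipe.

```python
# Hypothetical construction of the VLAD vocabulary: k-means over patch
# features from the training-data-zone sample images, plus standard VLAD
# residual encoding. Cluster count and normalization are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def fit_vocabulary(patch_feats: np.ndarray, k: int = 64) -> KMeans:
    """Cluster (N, D) patch features into k visual words."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(patch_feats)

def vlad_encode(feats: np.ndarray, vocab: KMeans) -> np.ndarray:
    """Sum residuals to each feature's assigned center, then
    intra-normalize per cluster and L2-normalize the flat descriptor."""
    centers = vocab.cluster_centers_
    k, d = centers.shape
    assign = vocab.predict(feats)
    desc = np.zeros((k, d))
    for i in range(k):
        members = feats[assign == i]
        if len(members):
            desc[i] = (members - centers[i]).sum(axis=0)
    # Intra-normalization keeps bursty clusters from dominating.
    norms = np.linalg.norm(desc, axis=1, keepdims=True)
    desc = np.divide(desc, norms, out=np.zeros_like(desc), where=norms > 0)
    flat = desc.ravel()
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat
```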
SALON not only avoids lethal vegetation but also distinguishes fine-grained terrain properties. Rough gravel in the middle of the trail below is assigned a higher cost than the smoother areas around it.
Predicting speedmaps allows the system to go faster where appropriate. As seen below, the system predicts that the robot can drive faster on the trail than in the grass.
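A hedged sketch of how such a speedmap could be realized, assuming the same feature-to-scalar regression used for cost: regress the speed the robot actually achieved on each traversed patch, then clip predictions to safe bounds. The regressor and the bounds are illustrative, not the paper's exact formulation.

```python
# Hedged sketch: the speedmap mirrors the costmap pipeline, regressing
# achieved speed instead of roughness. Regressor and clip bounds are
# illustrative assumptions, not the paper's exact formulation.
import numpy as np
from sklearn.linear_model import SGDRegressor

speed_model = SGDRegressor(alpha=1e-4)

def adapt_speed(traversed_feats: np.ndarray, achieved_speed: np.ndarray) -> None:
    """Fit visual features against the speed measured along the path."""
    speed_model.partial_fit(traversed_feats, achieved_speed)

def speedmap(all_feats: np.ndarray, v_min: float = 0.5, v_max: float = 8.0) -> np.ndarray:
    """Per-cell target speed in m/s, clipped to conservative bounds."""
    return np.clip(speed_model.predict(all_feats), v_min, v_max)
```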
We evaluate our method over 5 laps of Course 1 against 4 baselines. With no prior experience except a single hand label, it adapts quickly, and by the second lap it already demonstrates performance comparable to Velociraptor, the prior state of the art.
Evaluation on a wheelchair in an urban environment: After driving over rough cobblestone, the system recognizes within 5 seconds that it is much rougher than the smooth sidewalk.
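As an illustration of the kind of proprioceptive roughness signal that could drive this, the sketch below computes the RMS of bandpass-filtered vertical acceleration over a short window; the filter cutoffs, sample rate, and window length are assumptions rather than the system's exact signal.

```python
# Illustrative proprioceptive roughness signal, assuming a body-frame
# IMU sampled at 100 Hz: RMS of bandpass-filtered vertical acceleration
# over a short window. Cutoffs and window length are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

def roughness_score(accel_z: np.ndarray, fs: float = 100.0) -> float:
    """Roughness over a window (>= ~0.5 s) of vertical accel in m/s^2."""
    # Keep 1-20 Hz: drop gravity/slow pose changes and high-freq noise.
    b, a = butter(2, [1.0, 20.0], btype='band', fs=fs)
    return float(np.sqrt(np.mean(filtfilt(b, a, accel_z) ** 2)))
```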
With the same amount of data as Wild Visual Navigation, our method correctly assigns high cost to lethal objects like trees and walls without incorrectly penalizing short grass. Like WVN, we use only visual features; geometric information serves only to place them in the map.
@misc{sivaprakasam2024salonselfsupervisedadaptivelearning,
      title={SALON: Self-supervised Adaptive Learning for Off-road Navigation},
      author={Matthew Sivaprakasam and Samuel Triest and Cherie Ho and Shubhra Aich and Jeric Lew and Isaiah Adu and Wenshan Wang and Sebastian Scherer},
      year={2024},
      eprint={2412.07826},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2412.07826},
}