EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding

¹CVIU Lab, University of Arkansas     ²Google DeepMind     ³Dep. of BAEG, University of Arkansas     ⁴Carnegie Mellon University     ⁵Mohammed bin Zayed University of AI
⁶Dep. of EECS, University of Arkansas     ⁷Dep. of FDSC, University of Arkansas

🎉 Accepted to NeurIPS 2024 🎉

Highlights

  • We introduce a novel Efficient Adaptive Geometry-based Learning (EAGLE) approach to Unsupervised Cross-view Adaptation that adaptively learns and improves the performance of semantic segmentation models across camera viewpoints.
  • We introduce a new Geodesic Flow-based Metric that measures the structural changes across views via their manifold structures.
  • We introduce a new view-condition prompting mechanism that improves the prompting of the open-vocab segmentation network in cross-view adaptation learning.
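As a rough illustration of what conditioning a text prompt on the camera view could look like, the sketch below builds one prompt per class name with a view label appended. This page does not give the actual prompt format, so the function name `build_view_prompts`, the template wording, and the view labels `"ground"` and `"aerial"` are all hypothetical and should not be read as the authors' implementation.

```python
# Hypothetical sketch: the template and view labels are assumptions,
# not the prompt format used by EAGLE.
def build_view_prompts(class_names, view, template="a photo of a {name}, {view} view"):
    """Return one text prompt per class name, conditioned on a camera-view label."""
    return [template.format(name=name, view=view) for name in class_names]


classes = ["car", "tree", "person"]
ground_prompts = build_view_prompts(classes, "ground")
aerial_prompts = build_view_prompts(classes, "aerial")
```

In an open-vocabulary segmentation pipeline, prompts like these would typically be passed to a text encoder so that the class embeddings differ between the two views.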

Abstract

Unsupervised Domain Adaptation has been an efficient approach to transferring semantic segmentation models across data distributions. Meanwhile, recent Open-vocabulary Semantic Scene Understanding based on large-scale vision-language models is effective in open-set settings because it can learn diverse concepts and categories. However, these prior methods fail to generalize across different camera views because they lack cross-view geometric modeling, and few studies have analyzed cross-view learning. To address this problem, we introduce a novel Unsupervised Cross-view Adaptation Learning approach that models the geometric structural change across views in Semantic Scene Understanding. First, we introduce a novel Cross-view Geometric Constraint on Unpaired Data to model structural changes in images and segmentation masks across cameras. Second, we present a new Geodesic Flow-based Correlation Metric to efficiently measure the geometric structural changes across camera views. Third, we introduce a novel view-condition prompting mechanism to enhance the view-information modeling of the open-vocabulary segmentation network in cross-view adaptation learning. Experiments on different cross-view adaptation benchmarks show the effectiveness of our approach in cross-view modeling, achieving State-of-the-Art (SOTA) performance compared to prior unsupervised domain adaptation and open-vocabulary semantic segmentation methods.
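To make the idea of measuring structural change through manifold structure more concrete, the sketch below computes the standard geodesic distance between two feature subspaces on the Grassmann manifold from their principal angles. It is a generic illustration only: this page does not define the Geodesic Flow-based Correlation Metric, so the use of PCA subspaces, the function names, and the random features standing in for per-view representations are all assumptions.

```python
import numpy as np


def pca_basis(features, dim):
    """Orthonormal basis (D x dim) of the top principal directions of an N x D matrix."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim].T


def principal_angles(basis_a, basis_b):
    """Principal angles (radians) between subspaces given by two orthonormal bases."""
    singular_values = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return np.arccos(np.clip(singular_values, -1.0, 1.0))


def grassmann_geodesic_distance(basis_a, basis_b):
    """Geodesic distance on the Grassmann manifold: the L2 norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(basis_a, basis_b)))


# Random features as placeholders for representations from two camera views.
rng = np.random.default_rng(0)
view_a_features = rng.normal(size=(200, 16))
view_b_features = rng.normal(size=(200, 16))
distance = grassmann_geodesic_distance(
    pca_basis(view_a_features, 4), pca_basis(view_b_features, 4)
)
```

A distance of zero means the two subspaces coincide, and larger values indicate a larger structural gap between the two sets of features.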

Qualitative Results

The Qualitative Results of Cross-View Adaptation (Without Prompt).
The Qualitative Results of Cross-View Adaptation (With Prompt).
Results of Segmenting Cars, Trees, Persons. (A) Input, (B) FreeSeg, and (C) EAGLE.

Experimental Results

Unsupervised Domain Adaptation

Open-vocab Semantic Segmentation

Open-vocab Semantic Segmentation on Unseen Classes

Real-to-Real Cross-view Adaptation Setting

BibTeX

@inproceedings{truong2024eagle,
  title={EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding},
  author={Thanh-Dat Truong and Utsav Prabhu and Dongyi Wang and Bhiksha Raj and Susan Gauch and Jeyamkondan Subbiah and Khoa Luu},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2024}
}