
wz0919 / etpnav


This project forked from marsaki/etpnav


Official Implementation of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"

License: MIT License


ETPNav for VLN-CE

Code for our paper "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments" [Paper]

🔥Winner of the RxR-Habitat Challenge in CVPR 2022. [Challenge Report] [Challenge Certificate]

Vision-language navigation (VLN) is a task that requires an agent to follow instructions to navigate in environments. It has become increasingly important in embodied AI, with potential applications in autonomous navigation, search and rescue, and human-robot interaction. In this paper, we address a more practical yet more challenging setting: vision-language navigation in continuous environments (VLN-CE). To develop a robust VLN-CE agent, we propose a new navigation framework, ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability to perform obstacle-avoiding control in continuous environments. ETPNav performs online topological mapping of environments by self-organizing predicted waypoints along a traversed path, without prior environmental experience. This allows the agent to decompose navigation into high-level planning and low-level control. ETPNav uses a transformer-based cross-modal planner to generate navigation plans from the topological map and the instruction. Each plan is then executed by an obstacle-avoiding controller that uses a trial-and-error heuristic to keep the agent from getting stuck on obstacles. Experimental results demonstrate the effectiveness of the proposed method: ETPNav improves over the prior state of the art by more than 10% on R2R-CE and by more than 20% on RxR-CE.
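Two ideas from the paragraph above can be illustrated in isolation: growing a topological graph online by merging newly predicted waypoints into nearby existing nodes, and planning a route over that graph. The Python sketch below is a toy version under loudly stated assumptions, not the authors' code: waypoints are given directly as 2-D positions (ETPNav predicts them from observations), points within a hypothetical `merge_radius` are merged into one node, edges are weighted by Euclidean distance, and the goal node is chosen by the caller rather than by a cross-modal planner. The class and method names are illustrative only.

```python
"""Toy online topological map with shortest-path planning.

Illustrative sketch only, NOT ETPNav's implementation: the class name,
the merge_radius value and the Euclidean edge weights are assumptions.
"""
import heapq
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class TopoMap:
    def __init__(self, merge_radius: float = 0.5) -> None:
        self.merge_radius = merge_radius  # hypothetical merge threshold (metres)
        self.nodes: List[Point] = []  # node id -> 2-D position
        self.edges: Dict[int, Dict[int, float]] = {}  # adjacency: id -> {neighbour: distance}

    @staticmethod
    def _distance(a: Point, b: Point) -> float:
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def localize(self, p: Point) -> int:
        """Return the nearest node within merge_radius of p, or add p as a new node."""
        best, best_d = None, float("inf")
        for i, q in enumerate(self.nodes):
            d = self._distance(p, q)
            if d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= self.merge_radius:
            return best
        self.nodes.append(p)
        self.edges[len(self.nodes) - 1] = {}
        return len(self.nodes) - 1

    def update(self, position: Point, waypoints: List[Point]) -> int:
        """Add the agent's position and its predicted waypoints; link each waypoint to it."""
        cur = self.localize(position)
        for wp in waypoints:
            nxt = self.localize(wp)
            if nxt != cur:  # a waypoint merged into the current node adds no edge
                w = self._distance(self.nodes[cur], self.nodes[nxt])
                self.edges[cur][nxt] = w
                self.edges[nxt][cur] = w
        return cur

    def shortest_path(self, start: int, goal: int) -> List[int]:
        """Dijkstra from start to goal; returns node ids, or [] if goal is unreachable."""
        dist = {start: 0.0}
        prev: Dict[int, int] = {}
        heap = [(0.0, start)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == goal:
                break
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in self.edges[u].items():
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    prev[v] = u
                    heapq.heappush(heap, (d + w, v))
        if goal not in dist:
            return []
        path = [goal]
        while path[-1] != start:
            path.append(prev[path[-1]])
        return path[::-1]
```

For example, calling `update((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0)])` and then `update((1.0, 0.0), [(2.0, 0.0), (1.1, 0.1)])` yields four nodes, because (1.1, 0.1) lies within the 0.5 radius of the node at (1.0, 0.0) and is merged into it; `shortest_path(2, 3)` then returns `[2, 0, 1, 3]`.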

Leaderboard:

TODOs

  • Tidy and release the R2R-CE fine-tuning code.
  • Tidy and release the RxR-CE fine-tuning code.
  • Release the pre-training code.
  • Release the checkpoints.

Setup

Installation

Follow the Habitat Installation Guide to install habitat-lab and habitat-sim. We use version v0.1.7 in our experiments, the same version used by VLN-CE; please refer to the VLN-CE page for more details. In brief:

  1. Create a virtual environment. We developed this project with Python 3.6.

    conda create -n etpnav python=3.6
    conda activate etpnav
  2. Install habitat-sim for a machine with multiple GPUs or without an attached display (e.g., a cluster):

    conda install -c aihabitat -c conda-forge habitat-sim=0.1.7 headless
  3. Clone this repository and install all requirements for habitat-lab, VLN-CE and our experiments. Note that we pin gym==0.21.0 because the latest gym release is not compatible with habitat-lab v0.1.7.

    git clone git@github.com:MarSaKi/ETPNav.git
    cd ETPNav
    python -m pip install -r requirements.txt
    pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
  4. Clone a stable habitat-lab version from the GitHub repository and install it. The command below installs the core of Habitat Lab as well as habitat_baselines.

    git clone --branch v0.1.7 git@github.com:facebookresearch/habitat-lab.git
    cd habitat-lab
    python setup.py develop --all # install habitat and habitat_baselines
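Because the setup above depends on several pinned versions (Python 3.6, habitat-sim 0.1.7, gym 0.21.0, torch 1.9.1 and torchvision 0.10.1), a quick sanity check after installation can save debugging time. The sketch below is a hypothetical helper, not part of the repository. It assumes each package is visible to Python's package metadata under the name listed in `PINS`; that may not hold for every conda build of habitat-sim, so treat a "not installed" report for it as something to double-check rather than a definite failure. Local version labels such as `+cu111` are ignored when comparing.

```python
import sys
from typing import Dict, Optional

# Versions pinned in the setup steps above. The distribution names are
# assumptions; a conda-installed habitat-sim may not expose pip metadata.
PINS: Dict[str, str] = {
    "habitat-sim": "0.1.7",
    "gym": "0.21.0",
    "torch": "1.9.1",
    "torchvision": "0.10.1",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not found."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:  # Python 3.6/3.7: fall back to setuptools' pkg_resources
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def check_pins(installed: Dict[str, Optional[str]], pins: Dict[str, str]) -> Dict[str, str]:
    """Map each package whose installed version differs from its pin to a message.

    Local labels such as "+cu111" are stripped first, so "1.9.1+cu111" matches "1.9.1".
    """
    problems: Dict[str, str] = {}
    for name, wanted in pins.items():
        found = installed.get(name)
        if found is None:
            problems[name] = "not installed (wanted {})".format(wanted)
        elif found.split("+")[0] != wanted:
            problems[name] = "found {}, wanted {}".format(found, wanted)
    return problems


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 6):
        print("note: this project was developed with Python 3.6")
    report = check_pins({name: installed_version(name) for name in PINS}, PINS)
    for name, message in sorted(report.items()):
        print("{}: {}".format(name, message))
    print("all pins satisfied" if not report else "{} mismatch(es)".format(len(report)))
```

Keeping `check_pins` free of I/O makes the comparison easy to reuse, for example in a CI job that feeds it versions collected some other way.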

Scenes: Matterport3D

Instructions copied from VLN-CE:

Matterport3D (MP3D) scene reconstructions are used. The official Matterport3D download script (download_mp.py) can be accessed by following the instructions on their project webpage. The scene data can then be downloaded:

# requires running with python 2.7
python download_mp.py --task habitat -o data/scene_datasets/mp3d/

Extract such that it has the form scene_datasets/mp3d/{scene}/{scene}.glb. There should be 90 scenes. Place the scene_datasets folder in data/.
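To confirm the extraction step, a small check can count the scene folders that follow the `{scene}/{scene}.glb` layout. This is a hypothetical helper (`find_scenes` is not part of the repository); the expected count of 90 comes from the instructions above, and the root path assumes the `scene_datasets` folder has been placed in `data/` as described.

```python
from pathlib import Path
from typing import List, Tuple

EXPECTED_SCENES = 90  # number of MP3D scenes stated above


def find_scenes(root: Path) -> Tuple[List[str], List[str]]:
    """Split the sub-directories of root into (valid, invalid) scene names.

    A directory named <scene> is valid if it contains the file <scene>.glb.
    Plain files directly under root are ignored.
    """
    valid: List[str] = []
    invalid: List[str] = []
    for d in sorted(p for p in root.iterdir() if p.is_dir()):
        if (d / (d.name + ".glb")).is_file():
            valid.append(d.name)
        else:
            invalid.append(d.name)
    return valid, invalid


if __name__ == "__main__":
    root = Path("data/scene_datasets/mp3d")  # assumed location, relative to the repo root
    if not root.is_dir():
        print("missing directory: {}".format(root))
    else:
        valid, invalid = find_scenes(root)
        print("{} valid scenes (expected {})".format(len(valid), EXPECTED_SCENES))
        for name in invalid:
            print("no {0}.glb in {1}".format(name, root / name))
```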

Running

Training and Evaluation

Use run/release_r2r.bash and run/release_rxr.bash for training, evaluation, and inference on a single GPU or on multiple GPUs within a single node. Adjust the arguments of the bash scripts as needed:

# for R2R-CE
CUDA_VISIBLE_DEVICES=0,1 bash run/release_r2r.bash train 12345  # training
CUDA_VISIBLE_DEVICES=0,1 bash run/release_r2r.bash eval 12345   # evaluation
CUDA_VISIBLE_DEVICES=0,1 bash run/release_r2r.bash inter 12345  # inference
# for RxR-CE
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run/release_rxr.bash train 12345  # training
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run/release_rxr.bash eval 12345   # evaluation
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run/release_rxr.bash inter 12345  # inference
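The commands above differ only in the script, the mode and the visible GPUs. If you prefer to launch runs from Python, a thin wrapper can build the same command line. This is a sketch with hypothetical names (`build_launch`, `launch`) that only reproduces the shell invocation shown above. The second positional argument (`12345` in every example) is passed through unchanged, since its meaning is defined inside the bash scripts and is not documented here.

```python
import os
import subprocess
from typing import Dict, List, Sequence, Tuple

# Script paths and modes as used in the commands above; "inter" runs inference.
SCRIPTS = {"r2r": "run/release_r2r.bash", "rxr": "run/release_rxr.bash"}
MODES = ("train", "eval", "inter")


def build_launch(dataset: str, mode: str, gpus: Sequence[int],
                 extra_arg: str = "12345") -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) matching `CUDA_VISIBLE_DEVICES=<gpus> bash <script> <mode> <extra_arg>`."""
    if dataset not in SCRIPTS:
        raise ValueError("unknown dataset {!r}; expected one of {}".format(dataset, sorted(SCRIPTS)))
    if mode not in MODES:
        raise ValueError("unknown mode {!r}; expected one of {}".format(mode, MODES))
    # Copy the current environment and restrict the GPUs the script can see.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(g) for g in gpus))
    return ["bash", SCRIPTS[dataset], mode, extra_arg], env


def launch(dataset: str, mode: str, gpus: Sequence[int]) -> None:
    """Run the script from the repository root; raises CalledProcessError on failure."""
    argv, env = build_launch(dataset, mode, gpus)
    subprocess.run(argv, env=env, check=True)
```

For example, `launch("rxr", "eval", [0, 1, 2, 3])` should be equivalent to the RxR-CE evaluation command above when run from the repository root.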

Contact Information

Acknowledgements

Our implementations are partially inspired by CWP, Sim2Sim and DUET.

Thanks for their great work!

Citation

If you find this repository useful, please consider citing our work:

@article{an2023etpnav,
  title={ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments}, 
  author={An, Dong and Wang, Hanqing and Wang, Wenguan and Wang, Zun and Huang, Yan and He, Keji and Wang, Liang},
  journal={arXiv preprint arXiv:2304.03047},
  year={2023},
}
