Environment Predictive Coding for Visual Navigation

This repository contains a PyTorch implementation of our ICLR 2022 paper:

Environment Predictive Coding for Visual Navigation
Santhosh Kumar Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman
UT Austin, Meta AI

Project website: https://vision.cs.utexas.edu/projects/epc/

Abstract

We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents. In contrast to prior work on self-supervised learning for individual images, we aim to encode a 3D environment using a series of images observed by an agent moving in it. We learn these representations via a masked-zone prediction task, which segments an agent’s trajectory into zones and then predicts features of randomly masked zones, conditioned on the agent’s camera poses. This explicit spatial conditioning encourages learning representations that capture the geometric and semantic regularities of 3D environments. We learn such representations on a collection of video walkthroughs and demonstrate successful transfer to multiple downstream navigation tasks. Our experiments on the real-world scanned 3D environments of Gibson and Matterport3D show that our method obtains 2 - 6× higher sample-efficiency and up to 57% higher performance over standard image-representation learning.
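
To make the masked-zone prediction task concrete, the PyTorch sketch below shows one way such an objective can be set up: a walkthrough is split into zones, some zones are masked, and a transformer predicts the masked zones' features from the visible zones and the camera poses, trained with a contrastive loss. All names, dimensions, and loss details here are assumptions for illustration; this is not the training code in this repository.

# Conceptual sketch of masked-zone prediction (illustrative only; names,
# dimensions, and the exact loss are assumptions, not this repo's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedZonePredictor(nn.Module):
    """Predicts features of masked zones from visible zones and camera poses."""

    def __init__(self, feat_dim=128, pose_dim=4, hidden_dim=128, num_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim + pose_dim, hidden_dim)
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # A masked zone is queried using its camera pose alone.
        self.query_proj = nn.Linear(pose_dim, hidden_dim)
        self.predictor = nn.Linear(hidden_dim, feat_dim)

    def forward(self, zone_feats, zone_poses, mask):
        # zone_feats: (B, Z, feat_dim) pooled image features per zone
        # zone_poses: (B, Z, pose_dim) camera-pose summary per zone
        # mask:       (B, Z) bool, True for zones whose features are hidden
        visible = zone_feats.masked_fill(mask.unsqueeze(-1), 0.0)
        tokens = self.input_proj(torch.cat([visible, zone_poses], dim=-1))
        # nn.TransformerEncoder expects (sequence, batch, dim).
        context = self.encoder(tokens.transpose(0, 1)).transpose(0, 1)
        queries = self.query_proj(zone_poses)
        return self.predictor(context + queries)  # (B, Z, feat_dim)

def masked_zone_loss(pred, target, mask, temperature=0.1):
    # InfoNCE-style loss: a masked zone's prediction should match its own
    # target feature better than the features of the other masked zones.
    p = F.normalize(pred[mask], dim=-1)
    t = F.normalize(target[mask], dim=-1)
    logits = p @ t.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)

# Toy usage: 2 walkthroughs, 6 zones each, 2 zones masked per walkthrough.
model = MaskedZonePredictor()
feats = torch.randn(2, 6, 128)
poses = torch.randn(2, 6, 4)
mask = torch.zeros(2, 6, dtype=torch.bool)
mask[:, 2] = True
mask[:, 5] = True
loss = masked_zone_loss(model(feats, poses, mask), feats, mask)
loss.backward()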

Installation

Create a conda environment.

conda create -n epc_ssl python=3.8.5

Download this repo and set $EPC_CODE_ROOT to the repository root, as in the example below, then install the pip dependencies.
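
A minimal example, assuming the repository is cloned from GitHub into the current working directory (the clone URL is an assumption based on the upstream fork; adjust it and the paths as needed):

git clone https://github.com/srama2512/epc-ssl.git
export EPC_CODE_ROOT=$PWD/epc-ssl
cd $EPC_CODE_ROOT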

pip install torch==1.8.1+cu101 torchvision==0.9.1+cu101 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt

Update the submodules.

git submodule update --init --recursive

Install habitat-lab and habitat-sim.

cd $EPC_CODE_ROOT/dependencies/habitat-lab
pip install -r requirements.txt
python setup.py develop --all

cd $EPC_CODE_ROOT/dependencies/habitat-sim
pip install -r requirements.txt
python setup.py install --headless --with-cuda

Add the repository to your PYTHONPATH.

export PYTHONPATH=$PYTHONPATH:$EPC_CODE_ROOT
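
As a quick sanity check (an optional step, not part of the original instructions), the core dependencies should now import without errors:

python -c "import torch, habitat, habitat_sim; print('setup OK')"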

Generating video walkthroughs

Download the Gibson GLB files for Habitat from here. Create a symlink to the dataset at $EPC_CODE_ROOT/data/scene_datasets/gibson.

cd $EPC_CODE_ROOT
mkdir -p data/scene_datasets
cd data/scene_datasets
ln -s <GIBSON GLB DIRECTORY> gibson

Download the exploration trajectories for the Gibson scenes.

cd $EPC_CODE_ROOT
mkdir -p data/walkthroughs/strong_exploration
mkdir -p data/walkthroughs/weak_exploration

cd $EPC_CODE_ROOT/data/walkthroughs/strong_exploration
wget -O strong_exploration_trajectories.tar.gz https://utexas.box.com/shared/static/o18q3erc5hoiqm8rnr4e0ogjliseeoig.gz
tar -xzf strong_exploration_trajectories.tar.gz && rm strong_exploration_trajectories.tar.gz

cd $EPC_CODE_ROOT/data/walkthroughs/weak_exploration
wget -O weak_exploration_trajectories.tar.gz https://utexas.box.com/shared/static/5j7m5r8mq28cbc2i1gt2e2bbgzn9ayqe.gz
tar -xzf weak_exploration_trajectories.tar.gz && rm weak_exploration_trajectories.tar.gz

Generate walkthrough videos corresponding to the exploration trajectories.

python epc/walkthrough_generation/generate_walkthroughs.py \
    --infos-root ./data/walkthroughs/strong_exploration/

python epc/walkthrough_generation/generate_walkthroughs.py \
    --infos-root ./data/walkthroughs/weak_exploration/

Training image-level encoders

We first perform self-supervised pre-training of image encoders using MoCo.

Extract images from walkthrough videos.

cd $EPC_CODE_ROOT

python epc/moco_pretraining/convert_video_to_frames.py \
    --walkthroughs-dir ./data/walkthroughs/strong_exploration/

Pre-train image encoders using MoCo. We provide a pre-trained model here.

mkdir -p data/experiments/moco/strong_exploration

python -u epc/moco_pretraining/train_moco.py \
    --config-path epc/moco_pretraining/configs/train_config.yaml \
    --lr 0.03 --batch-size 128 --dist-url 'tcp://localhost:13513' \
    --mlp --moco-t 0.2 --epochs 20000 --schedule 12000 16000 \
    --multiprocessing-distributed --world-size 1 --rank 0 \
    --save-interval 200 \
    --frames-per-video 4 \
    --save-dir data/experiments/moco/strong_exploration \
    data/walkthroughs/strong_exploration/

Training EPC encoder

We then train the environment-level encoder using EPC masked-zone prediction.

Extract pre-trained MoCo image features for all video frames.

cd $EPC_CODE_ROOT

mkdir -p data/walkthrough_features/strong_exploration
mkdir pretrained_models
cd pretrained_models/
wget -O moco_encoder.pth.tar https://utexas.box.com/shared/static/mocu24bxftvj7ricupvzm7skrs32kdmk.tar

cd $EPC_CODE_ROOT
python -u epc/epc_pretraining/extract_video_features.py \
    --exp-config epc/epc_pretraining/configs/extract_features_config.yaml \
    --num-processes-per-gpu 8 \
    --save-root data/walkthrough_features/strong_exploration

Train the EPC model using the video dataset of pre-extracted image features.

cd $EPC_CODE_ROOT
mkdir -p experiments/epc/strong_exploration/

python -u epc/epc_pretraining/train_epc.py \
    SAVE_DIR experiments/epc/strong_exploration/ \
    DATA_ROOT data/walkthrough_features/strong_exploration/ \
    VISUAL_ENCODER.RESNET.ENCODER.pretrained_path "pretrained_models/moco_encoder.pth.tar"

Pre-trained models

We provide the pre-trained models for our primary experiments here: Area coverage, Flee, Object visitation, and RoomNav. These include checkpoints for both the pre-trained encoders (for SMT-video and EPC) and the RL-trained policies (for all methods).

RL training + evaluation

Coming soon.

Acknowledgements

In our work, we used parts of Habitat-Lab and Habitat-Sim.

Citation

If you find this codebase useful, please cite us:

@inproceedings{ramakrishnan2022environment,
    title={Environment Predictive Coding for Visual Navigation},
    author={Santhosh Kumar Ramakrishnan and Tushar Nagarajan and Ziad Al-Halah and Kristen Grauman},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=DBiQQYWykyy}
}

License

This project is released under the MIT license, as found in the LICENSE file.
