This project is forked from philip-bachman/amdim-public.

Public repo for Augmented Multiscale Deep InfoMax representation learning.

License: MIT

Learning Representations by Maximizing Mutual Information Across Views

Introduction

AMDIM (Augmented Multiscale Deep InfoMax) is an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context.

Our paper describing AMDIM is available at: https://arxiv.org/abs/1906.00910.
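The objective can be illustrated with a minimal sketch of a noise-contrastive (InfoNCE-style) bound on mutual information: given critic scores for pairing an anchor feature with a set of candidates, maximizing the log-softmax probability of the true positive pair tightens a lower bound on the mutual information between the two views. This toy function and its inputs are illustrative only, not the repo's actual implementation, which scores feature pairs across multiple locations and scales.

```python
import math

def info_nce(scores, pos_index):
    """NCE-style loss: -log softmax(scores)[pos_index].

    scores[i] is the critic score for pairing the anchor feature with
    candidate i; pos_index marks the true positive pair, i.e. the feature
    extracted from the other view of the same image.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    log_softmax = (scores[pos_index] - m) - math.log(sum(exps))
    return -log_softmax

# One positive pair among three distractors: the positive scores highest,
# so the loss is small and the bound on mutual information is large.
loss = info_nce([5.0, 0.1, -0.3, 0.2], pos_index=0)
```

Minimizing this loss over many anchors pushes features from two views of the same image together while pushing features from different images apart.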

Note

We are in the process of cleaning up and porting code over from our working repo, which is still in flux, so the performance of models from this repo will vary for a little while. We will add checkpoints for some strong models trained on ImageNet via self-supervision. Also, contact me (Phil) for help with adding the 3rd-party data augmentation -- it's pretty quick to get working.

Main Results

Accuracy of AMDIM compared to other methods, measured with a linear classifier trained on top of the representations provided by each self-supervised encoder:

| Method           | ImageNet | Places205 |
|------------------|----------|-----------|
| Rotation [1]     | 55.4     | 48.0      |
| Exemplar [1]     | 46.0     | 42.7      |
| Patch Offset [1] | 51.4     | 45.3      |
| Jigsaw [1]       | 44.6     | 42.2      |
| CPC - big [1]    | 48.7     | n/a       |
| CPC - huge [2]   | 61.0     | n/a       |
| AMDIM            | 68.1     | 55.0      |

[1]: Results from Kolesnikov et al. [2019].
[2]: Results from Henaff et al. [2019].

Self-Supervised Training

You should be able to get some good results on ImageNet if you have access to 4 Tesla V100 GPUs with:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py \
  --ndf 192 \
  --n_rkhs 1536 \
  --batch_size 480 \
  --tclip 20.0 \
  --n_depth 8 \
  --dataset IN128 \
  --amp
```

For GPUs older than Volta, you will need to tweak the model size to fit in the available memory. The command above will take about 15GB of memory on device 0, and slightly less on devices 1-3, when running in mixed precision (--amp). When running in FP32, memory usage will be significantly higher.

Results with the data augmentation implemented in this repo will be 1-2% below our strongest results with an equivalent architecture. Our strongest results use augmentation based on the ImageNet policy from the Fast AutoAugment paper by Lim et al., implemented in the repo available at: https://github.com/kakaobrain/fast-autoaugment.

Using the stronger augmentation and an appropriate learning schedule, the command above should produce a bit over 63% accuracy on ImageNet using the online evaluation classifiers. With the standard torchvision augmentations the result will drop to a bit over 62%, which is still decent (significantly state-of-the-art, makes good coffee, etc.).

Evaluation Classifiers

Example of retraining evaluation classifiers on Places205, using an encoder checkpointed after training via self-supervised learning on ImageNet:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py \
  --classifiers \
  --checkpoint_path ./path/to/imagenet/checkpoint.pth \
  --ndf 192 \
  --n_rkhs 1536 \
  --batch_size 480 \
  --tclip 20.0 \
  --n_depth 8 \
  --dataset Places205 \
  --input_dir /path/to/places205 \
  --amp
```

  • The --classifiers flag switches training to the evaluation classifiers instead of the encoder.
  • When restoring from an encoder checkpoint, the classifiers will be re-initialized before training resumes.
  • This can be used with other datasets if you can endure the pain of figuring out how to load them (one of the greatest challenges in any computer vision project)...
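The evaluation protocol amounts to fitting a linear classifier on frozen encoder features. A self-contained sketch on toy data in plain Python, assuming simple softmax regression trained with SGD (the repo's actual classifiers are PyTorch modules trained with the command above):

```python
import math

def train_linear_probe(features, labels, n_classes, lr=0.5, epochs=200):
    """Linear evaluation: fit a softmax classifier on frozen features.

    The encoder is never updated; only the classifier weights W, b are.
    """
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            logits = [sum(wi * xi for wi, xi in zip(w, x)) + bk
                      for w, bk in zip(W, b)]
            m = max(logits)
            exps = [math.exp(l - m) for l in logits]
            z = sum(exps)
            for k in range(n_classes):
                g = exps[k] / z - (1.0 if k == y else 0.0)  # dCE/dlogit_k
                for i in range(dim):
                    W[k][i] -= lr * g * x[i]
                b[k] -= lr * g
    return W, b

def classify(W, b, x):
    logits = [sum(wi * xi for wi, xi in zip(w, x)) + bk for w, bk in zip(W, b)]
    return max(range(len(logits)), key=logits.__getitem__)

# Toy "frozen features": two linearly separable clusters, two classes.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labs = [0, 0, 1, 1]
W, b = train_linear_probe(feats, labs, n_classes=2)
```

Because the probe is linear, its accuracy directly reflects how linearly separable the classes are in the encoder's representation space, which is what the numbers in the results table measure.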

Enabling Mixed Precision Training (--amp)

If your GPU supports half precision, you can take advantage of it when training by passing the --amp (automatic mixed precision) flag.
We use NVIDIA/apex to enable mixed precision, so you will need to have Apex installed; see the Quick Start instructions at https://github.com/NVIDIA/apex.

Citation

```bibtex
@article{bachman2019amdim,
  author  = {Bachman, Philip and Hjelm, R Devon and Buchwalter, William},
  journal = {arXiv preprint arXiv:1906.00910},
  title   = {Learning Representations by Maximizing Mutual Information Across Views},
  year    = {2019}
}
```

Contact

For questions please contact Philip Bachman at phil.bachman at gmail.com.
