
bioencoder's Introduction

BioEncoder

BioEncoder is a toolkit for supervised metric learning to i) learn and extract features from images, ii) enhance biological image classification, and iii) identify the features most relevant to classification. Designed for diverse and complex datasets, the package and the available metric losses can handle unbalanced classes and subtle phenotypic differences more effectively than non-metric approaches. The package includes taxon-agnostic data loaders, custom augmentation techniques, hyperparameter tuning through YAML configuration files, and rich model visualizations, providing a comprehensive solution for high-throughput analysis of biological images.

Preprint on bioRxiv: https://doi.org/10.1101/2024.04.03.587987

Features

>> Full list of available model architectures, losses, optimizers, schedulers, and augmentations <<

  • Taxon-agnostic dataloaders (making them applicable to any dataset, not just biological ones)
  • Support for timm models and pytorch-optimizer
  • Access to state-of-the-art metric losses, such as SupCon and Sub-center ArcFace
  • Exponential Moving Average (EMA) for stable training, and Stochastic Weight Averaging (SWA) for better generalization and performance
  • LRFinder for the second stage of the training.
  • Easy customization of hyperparameters, including augmentations, through YAML configs (check the config-templates folder for examples)
  • Custom augmentation techniques via albumentations
  • TensorBoard logs and checkpoints (soon to come: WandB integration)
  • Streamlit app with rich model visualizations (e.g., Grad-CAM and timm-vis)
  • Interactive t-SNE and PCA plots using Bokeh
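To make the two weight-averaging ideas from the feature list concrete, here is a minimal pure-Python sketch. It is not BioEncoder's implementation: the function names `ema_update` and `swa_average`, the toy one-parameter "model", and the decay value are all assumptions chosen for illustration.

```python
def ema_update(shadow, params, decay=0.99):
    """Exponential moving average: blend the running (shadow) weights with
    the current weights. A higher decay means a smoother, slower average."""
    return {name: decay * shadow[name] + (1.0 - decay) * value
            for name, value in params.items()}


def swa_average(checkpoints):
    """Stochastic weight averaging in its simplest form: the plain mean of
    the weights stored in several checkpoints."""
    names = checkpoints[0].keys()
    return {name: sum(ckpt[name] for ckpt in checkpoints) / len(checkpoints)
            for name in names}


# Toy "weights" that jump around between training steps (hypothetical values).
steps = [{"w": 1.0}, {"w": 3.0}, {"w": 2.0}]

shadow = dict(steps[0])  # start the running average at the first weights
for params in steps[1:]:
    shadow = ema_update(shadow, params, decay=0.5)

averaged = swa_average(steps)
print("EMA weights:", shadow)
print("SWA weights:", averaged)
```

EMA is updated continuously during training, whereas SWA averages a handful of saved checkpoints after the fact, which is why the quickstart runs a separate SWA step after each training stage.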

Quickstart

>> Comprehensive help files <<

1. Install BioEncoder (into a virtual environment with pytorch/CUDA):

pip install bioencoder

2. Download the example dataset from the data repo: https://zenodo.org/records/10909614/files/BioEncoder-data.zip. This archive contains the images and configuration files needed for steps 3 and 4, as well as the final model checkpoints and a script to reproduce the results and figures presented in the paper. If you only want to play around with the interactive figures and the model explorer, you can skip the training and SWA steps.

3. Start an interactive session (e.g., in Spyder or VS Code) and run the following commands one by one:

## use "overwrite=True" to redo a step

import bioencoder

## global setup
bioencoder.configure(root_dir=r"~/bioencoder_wd", run_name="v1")

## split dataset
bioencoder.split_dataset(image_dir=r"~/Downloads/damselflies-aligned-trai_val", max_ratio=6, random_seed=42, val_percent=0.1, min_per_class=20)

## train stage 1
bioencoder.train(config_path=r"bioencoder_configs/train_stage1.yml")
bioencoder.swa(config_path=r"bioencoder_configs/swa_stage1.yml")

## explore embedding space and model from stage 1
bioencoder.interactive_plots(config_path=r"bioencoder_configs/plot_stage1.yml")
bioencoder.model_explorer(config_path=r"bioencoder_configs/explore_stage1.yml")

## (optional) learning rate finder for stage 2
bioencoder.lr_finder(config_path=r"bioencoder_configs/lr_finder.yml")

## train stage 2
bioencoder.train(config_path=r"bioencoder_configs/train_stage2.yml")
bioencoder.swa(config_path=r"bioencoder_configs/swa_stage2.yml")

## explore model from stage 2
bioencoder.model_explorer(config_path=r"bioencoder_configs/explore_stage2.yml")

## inference (stage 1 = embeddings, stage 2 = classification)
bioencoder.inference(config_path="bioencoder_configs/inference.yml", image="path/to/image.jpg")  ## "image" can be a file path or a numpy array

4. Alternatively, you can directly use the command line interface:

## use the flag "--overwrite" to redo a step

bioencoder_configure --root-dir "~/bioencoder_wd" --run-name v1
bioencoder_split_dataset --image-dir "~/Downloads/damselflies-aligned-trai_val" --max-ratio 6 --random-seed 42
bioencoder_train --config-path "bioencoder_configs/train_stage1.yml"
bioencoder_swa --config-path "bioencoder_configs/swa_stage1.yml"
bioencoder_interactive_plots --config-path "bioencoder_configs/plot_stage1.yml"
bioencoder_model_explorer --config-path "bioencoder_configs/explore_stage1.yml"
bioencoder_lr_finder --config-path "bioencoder_configs/lr_finder.yml"
bioencoder_train --config-path "bioencoder_configs/train_stage2.yml"
bioencoder_swa --config-path "bioencoder_configs/swa_stage2.yml"
bioencoder_model_explorer --config-path "bioencoder_configs/explore_stage2.yml"
bioencoder_inference --config-path "bioencoder_configs/inference.yml" --path "path/to/image.jpg"
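If you prefer to launch the whole command-line sequence from one script, the following sketch may help. It only reuses the commands listed above; the helper name `run_pipeline` and the `dry_run` switch are assumptions made for this example and are not part of BioEncoder. Note that `subprocess` does not expand `~`, and the source does not say whether the BioEncoder commands expand it themselves, so you may need to pass absolute paths.

```python
import shlex
import subprocess

# Commands copied from the quickstart above, in execution order.
PIPELINE = [
    'bioencoder_configure --root-dir "~/bioencoder_wd" --run-name v1',
    'bioencoder_split_dataset --image-dir "~/Downloads/damselflies-aligned-trai_val" --max-ratio 6 --random-seed 42',
    'bioencoder_train --config-path "bioencoder_configs/train_stage1.yml"',
    'bioencoder_swa --config-path "bioencoder_configs/swa_stage1.yml"',
    'bioencoder_interactive_plots --config-path "bioencoder_configs/plot_stage1.yml"',
    'bioencoder_model_explorer --config-path "bioencoder_configs/explore_stage1.yml"',
    'bioencoder_lr_finder --config-path "bioencoder_configs/lr_finder.yml"',
    'bioencoder_train --config-path "bioencoder_configs/train_stage2.yml"',
    'bioencoder_swa --config-path "bioencoder_configs/swa_stage2.yml"',
    'bioencoder_model_explorer --config-path "bioencoder_configs/explore_stage2.yml"',
    'bioencoder_inference --config-path "bioencoder_configs/inference.yml" --path "path/to/image.jpg"',
]


def run_pipeline(commands, dry_run=True):
    """Run the commands in order and stop at the first failure.

    With dry_run=True nothing is executed; the parsed argument lists are
    only printed and returned, so the plan can be inspected first.
    """
    planned = []
    for command in commands:
        argv = shlex.split(command)  # split like a POSIX shell, dropping the quotes
        planned.append(argv)
        if dry_run:
            print("would run:", command)
        else:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return planned


planned = run_pipeline(PIPELINE, dry_run=True)
```

Call `run_pipeline(PIPELINE, dry_run=False)` to actually execute the steps. The interactive steps (plots and model explorer) may open an app or a browser window and block until closed, so you may want to drop them from the list for unattended runs.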

Citation

Please cite BioEncoder as follows:

@UNPUBLISHED{Luerig2024-ov,
  title    = "{BioEncoder}: a metric learning toolkit for comparative
              organismal biology",
  author   = "Luerig, Moritz D and Di Martino, Emanuela and Porto, Arthur",
  journal  = "bioRxiv",
  pages    = "2024.04.03.587987",
  month    =  apr,
  year     =  2024,
  language = "en",
  doi      = "10.1101/2024.04.03.587987"
}

bioencoder's People

Contributors

agporto, mluerig

bioencoder's Issues

missing multi-GPU capability

Currently, bioencoder is optimized for use on a local workstation with a single GPU. The goal is to implement multi-GPU capability using torch.distributed so that users can use all of their GPUs or multiple nodes of a cluster.

Please get in touch with us if you'd like to help - it shouldn't be that hard, and we'd be very happy to support you!
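For anyone picking this up, one core ingredient of data-parallel training is giving each process (rank) its own slice of the dataset. The pure-Python sketch below illustrates that partitioning idea only; it is not BioEncoder code and does not use torch. The function name `shard_indices` is hypothetical, and the pad-then-stride scheme is modelled on how PyTorch's `DistributedSampler` behaves without shuffling, to the best of our understanding.

```python
import math


def shard_indices(num_samples, world_size, rank):
    """Return the sample indices that one process (rank) would work on.

    The index list is padded by wrapping around to the first samples so
    that every rank receives the same number of items, then each rank
    takes every world_size-th index starting at its own rank.
    Assumes num_samples > 0 and 0 <= rank < world_size.
    """
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    padded = [i % num_samples for i in range(total)]
    return padded[rank:total:world_size]


# Example: 10 images split across 4 GPUs (hypothetical numbers).
for rank in range(4):
    print(f"rank {rank}: {shard_indices(10, 4, rank)}")
```

Equal shard sizes matter because all ranks must take the same number of steps per epoch for gradient synchronization to stay in lockstep.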

inference and image embeddings

Currently, bioencoder does not feature an inference script, i.e., a simple way to use a trained model on a single image and predict its class or compute embeddings. We plan to implement this within the next few months.
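As an illustration of what image embeddings can be used for once they are available, the sketch below assigns a query embedding to the class whose reference embedding is most similar by cosine similarity. Everything here is hypothetical: the function names, the three-dimensional vectors, and the class labels are made up for the example, and the format in which BioEncoder returns embeddings is not assumed.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def nearest_class(query, references):
    """Return the label whose reference embedding is most similar to query."""
    return max(references,
               key=lambda label: cosine_similarity(query, references[label]))


# Hypothetical per-class reference embeddings and one query embedding.
references = {"morph_a": [1.0, 0.0, 0.0], "morph_b": [0.0, 1.0, 0.0]}
query = [0.9, 0.1, 0.0]
predicted = nearest_class(query, references)
print("closest class:", predicted)
```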

windows installation version issues

When installing bioencoder into a vanilla Python 3.7 environment on Windows, I get a bunch of errors because no matching version of faiss-gpu can be found, which causes additional issues further downstream:

D:\git-repos\mluerig\BioEncoder>mamba activate bioencoder

(bioencoder) D:\git-repos\mluerig\BioEncoder>pip install --upgrade pip
Requirement already satisfied: pip in c:\miniforge3\envs\bioencoder\lib\site-packages (23.3)

(bioencoder) D:\git-repos\mluerig\BioEncoder>pip install -r requirements.txt
Collecting torch-ema@ git+https://github.com/fadel/pytorch_ema@27afe25b9fb9f0d05a87ae94e4e4ad9e92d70a85 (from -r requirements.txt (line 5))
  Cloning https://github.com/fadel/pytorch_ema (to revision 27afe25b9fb9f0d05a87ae94e4e4ad9e92d70a85) to c:\users\mluerig\appdata\local\temp\pip-install-46gxfqqg\torch-ema_1ca08371f3be4775bf47cac5395af84c
  Running command git clone --filter=blob:none --quiet https://github.com/fadel/pytorch_ema 'C:\Users\mluerig\AppData\Local\Temp\pip-install-46gxfqqg\torch-ema_1ca08371f3be4775bf47cac5395af84c'
  hint: core.useBuiltinFSMonitor will be deprecated soon; use core.fsmonitor instead
  hint: Disable this message with "git config advice.useCoreFSMonitorConfig false"
  Running command git rev-parse -q --verify 'sha^27afe25b9fb9f0d05a87ae94e4e4ad9e92d70a85'
  Running command git fetch -q https://github.com/fadel/pytorch_ema 27afe25b9fb9f0d05a87ae94e4e4ad9e92d70a85
  hint: core.useBuiltinFSMonitor will be deprecated soon; use core.fsmonitor instead
  hint: Disable this message with "git config advice.useCoreFSMonitorConfig false"
  Running command git checkout -q 27afe25b9fb9f0d05a87ae94e4e4ad9e92d70a85
  hint: core.useBuiltinFSMonitor will be deprecated soon; use core.fsmonitor instead
  hint: Disable this message with "git config advice.useCoreFSMonitorConfig false"
  Resolved https://github.com/fadel/pytorch_ema to commit 27afe25b9fb9f0d05a87ae94e4e4ad9e92d70a85
  Preparing metadata (setup.py) ... done
Collecting albumentations==1.3.0 (from -r requirements.txt (line 1))
  Downloading albumentations-1.3.0-py3-none-any.whl (123 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 123.5/123.5 kB 1.2 MB/s eta 0:00:00
Collecting scikit-learn==1.0.2 (from -r requirements.txt (line 2))
  Downloading scikit_learn-1.0.2-cp37-cp37m-win_amd64.whl (7.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.1/7.1 MB 7.2 MB/s eta 0:00:00
Collecting torch-lr-finder (from -r requirements.txt (line 3))
  Downloading torch_lr_finder-0.2.1-py3-none-any.whl (11 kB)
Collecting torch-optimizer==0.3.0 (from -r requirements.txt (line 4))
  Downloading torch_optimizer-0.3.0-py3-none-any.whl (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.9/61.9 kB ? eta 0:00:00
Collecting torch (from -r requirements.txt (line 6))
  Downloading torch-1.13.1-cp37-cp37m-win_amd64.whl (162.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 162.6/162.6 MB 6.5 MB/s eta 0:00:00
Collecting timm==0.6.12 (from -r requirements.txt (line 7))
  Downloading timm-0.6.12-py3-none-any.whl (549 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 549.1/549.1 kB 5.8 MB/s eta 0:00:00
Collecting torchvision (from -r requirements.txt (line 8))
  Downloading torchvision-0.14.1-cp37-cp37m-win_amd64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 6.9 MB/s eta 0:00:00
Requirement already satisfied: numpy==1.21.6 in c:\users\mluerig\appdata\roaming\python\python37\site-packages (from -r requirements.txt (line 9)) (1.21.6)
Collecting pytorch-metric-learning==2.0.1 (from -r requirements.txt (line 10))
  Downloading pytorch_metric_learning-2.0.1-py3-none-any.whl (109 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 109.3/109.3 kB 6.2 MB/s eta 0:00:00
Collecting matplotlib==3.5.3 (from -r requirements.txt (line 11))
  Downloading matplotlib-3.5.3-cp37-cp37m-win_amd64.whl (7.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.2/7.2 MB 7.3 MB/s eta 0:00:00
ERROR: Ignored the following versions that require a different python version: 1.1.0 Requires-Python >=3.8; 1.1.1 Requires-Python >=3.8; 1.1.2 Requires-Python >=3.8; 1.1.3 Requires-Python >=3.8; 1.2.0 Requires-Python >=3.8; 1.2.0rc1 Requires-Python >=3.8; 1.2.1 Requires-Python >=3.8; 1.2.2 Requires-Python >=3.8; 1.3.0 Requires-Python >=3.8; 1.3.0rc1 Requires-Python >=3.8; 1.3.1 Requires-Python >=3.8; 3.6.0 Requires-Python >=3.8; 3.6.0rc1 Requires-Python >=3.8; 3.6.0rc2 Requires-Python >=3.8; 3.6.1 Requires-Python >=3.8; 3.6.2 Requires-Python >=3.8; 3.6.3 Requires-Python >=3.8; 3.7.0 Requires-Python >=3.8; 3.7.0rc1 Requires-Python >=3.8; 3.7.1 Requires-Python >=3.8; 3.7.2 Requires-Python >=3.8; 3.7.3 Requires-Python >=3.8; 3.8.0 Requires-Python >=3.9; 3.8.0rc1 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement faiss-gpu==1.7.2 (from versions: 1.5.3, 1.6.0, 1.6.1, 1.6.3, 1.6.4, 1.6.4.post2, 1.6.5, 1.7.0, 1.7.1, 1.7.1.post1, 1.7.1.post2)
ERROR: No matching distribution found for faiss-gpu==1.7.2
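To see at a glance which versions pip could actually offer for a failed pin, the error line can be parsed with a few lines of Python. This is only a convenience sketch: the helper name `parse_pip_error` and the regular expression are assumptions written against the exact error line shown in the log above, not a general pip log parser.

```python
import re

# The error line copied from the log above.
ERROR_LINE = (
    "ERROR: Could not find a version that satisfies the requirement "
    "faiss-gpu==1.7.2 (from versions: 1.5.3, 1.6.0, 1.6.1, 1.6.3, 1.6.4, "
    "1.6.4.post2, 1.6.5, 1.7.0, 1.7.1, 1.7.1.post1, 1.7.1.post2)"
)

PATTERN = re.compile(
    r"requirement (?P<name>[\w.-]+)==(?P<pinned>\S+) "
    r"\(from versions: (?P<available>[^)]*)\)"
)


def parse_pip_error(line):
    """Extract the package name, the pinned version, and the versions pip
    listed as installable. Returns None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    available = [v.strip() for v in match.group("available").split(",") if v.strip()]
    return match.group("name"), match.group("pinned"), available


name, pinned, available = parse_pip_error(ERROR_LINE)
print(f"{name}: pinned {pinned}, offered {len(available)} versions, "
      f"last listed {available[-1]}")
print("pinned version is available:", pinned in available)
```

Here the pinned version 1.7.2 is not among the versions pip lists for this environment, which is what triggers the final "No matching distribution" error.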
