
novel-species-detection

Goodwin, A., Padmanabhan, S., Hira, S. et al. Mosquito species identification using convolutional neural networks with a multitiered ensemble model for novel species detection. Sci Rep 11, 13656 (2021). https://doi.org/10.1038/s41598-021-92891-9 Algorithm structure can be viewed in the paper.

Requirements and Installation

This code base will function as expected on Ubuntu 20.04/18.04 and is run with Python 3, occasionally with Jupyter notebooks. Training should be done on a computer with appropriate hardware (modern GPUs with >8 GB memory). CUDA support is necessary for accessing such hardware for training with this repository.

Creation of a virtual environment for this work is recommended. The dependencies and versions are listed in the environment.yml file. If you have not installed conda, do so following these guidelines. Create a virtual environment with conda: conda env create -f environment.yml -n <env_name>. Use conda activate <env_name> to activate the environment and begin running experiments. For more instruction on conda environments, please follow the details listed here. Additionally, for Tiers II and III, install any outstanding requirements with pip using the requirements.txt file: pip install -r requirements.txt.

Assuming CUDA is already installed and the GPU drivers for your GPUs are appropriately installed, setup and installation may take up to 2 hours depending on your internet connection. Download the images used in the paper and extract the zip file to your /opt/ directory (another directory may be used, but this requires changing the file paths in the datasplit csv files). Model files as generated by the config files in this repo are available for download here.
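The setup steps above can be summarized as a short shell session; the environment name mos_cnn is an arbitrary placeholder, not a name the repository requires:

```shell
# Create the conda environment from the repository's environment.yml
conda env create -f environment.yml -n mos_cnn

# Activate it before running any experiments
conda activate mos_cnn

# For Tiers II and III, install any remaining dependencies with pip
pip install -r requirements.txt
```

Note that `pip install` takes `-r` (requirements file), not `-i` (which sets the package index URL).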

Details

Folders and Files

  • configs/ in here you will find the configuration files for Xception closed and Xception open. When a classification notebook is run, the file "config/config.py" will be duplicated into a file named after the experiment name in the config file, placed in the folder "configs/old_configs/". So simply modify the config file to change the parameters of the experiment. Old configs (in particular those under paper_redo and bigset) can be used to replicate the results in the paper.

  • data/ in here you will find the datasplits for Tier I Xception and the 39 species classification (referred to as bigset), and the notebooks for generating them. You will also find a smaller test datasplit for verifying the function of the repo more rapidly, and pad.jpg, which is used to help make the images square prior to downsampling.

  • model_weights/ not in the GitHub repo, but will be generated as a location to store the model weights from Tier I

  • models/ location for scripts to be imported which will download the pretrained xception network.

  • modules/ loss functions, optimizers, etc.

  • utils/ miscellaneous functions called for training, testing, evaluation, etc.

  • notebooks/ notebooks for training Tier I components (and producing features and outputs for other tiers), 39 species classification, and figure generation. Every time you run an experiment from classification-explore*.ipynb, it reads the parameters from config/config.py, unless otherwise specified (by pointing to a config file located elsewhere, such as "configs/old_configs"). It then copies the file at config/config.py to 'config/old_configs/{}.py'.format(config.exp_name)

  • subm/ results for each photo are submitted here with probabilities.

  • tierI_output/ features and probabilities for each photo are submitted here for the entire dataset.

  • evaluations.py - helps generate submission files, feature files and probability files

  • test.py - for testing just CNN classification, independent of other elements
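The config-archiving behaviour described for the notebooks (duplicating config/config.py into the old_configs folder under the experiment name) can be sketched roughly as follows. The function name `archive_config` is illustrative, not the repository's actual API, and the default paths merely mirror the ones mentioned above:

```python
import shutil
from pathlib import Path

def archive_config(exp_name,
                   config_path="config/config.py",
                   archive_dir="config/old_configs"):
    """Copy the active config file into the archive folder, named
    after the experiment, so old runs can later be replicated."""
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # create archive dir if absent
    dest = dest_dir / "{}.py".format(exp_name)
    shutil.copy(config_path, dest)               # preserves the file contents
    return dest
```

Pointing an experiment at an archived config (e.g. one under paper_redo or bigset) is then just a matter of loading that copied file instead of config/config.py.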

Tier I components

  • Before training any Tier I component, double check that the configuration file is set to desired parameters
  • All Tier I components will report outputs relevant to Tier II or Tier III in the directory 'tierI_output/'
  • Tier I Closed and Open features and probabilities as generated in the paper are available for download.
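Downstream tiers consume the per-image probabilities written to tierI_output/. A minimal standard-library sketch of reading such a file is shown below; the column layout (an image column followed by one probability column per species) and the species names are assumptions for illustration, not a documented schema:

```python
import csv
import io

# Toy stand-in for a tierI_output probabilities CSV: one image per row,
# one softmax probability per known species (assumed layout).
sample = io.StringIO(
    "image,aedes_aegypti,anopheles_gambiae,culex_pipiens\n"
    "img_001.jpg,0.91,0.05,0.04\n"
    "img_002.jpg,0.10,0.20,0.70\n"
)

predictions = {}
for row in csv.DictReader(sample):
    # Everything except the image column is a class probability
    probs = {k: float(v) for k, v in row.items() if k != "image"}
    predictions[row["image"]] = max(probs, key=probs.get)  # argmax species
```

For a real run you would open a file from tierI_output/ instead of the in-memory sample.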

Xception

  • To train, test, or output features, follow 'notebooks/classification-explore.ipynb'. Training on an RTX 2070 Super GPU takes approximately two hours, but is estimated in the notebook with real-time updates.
  • To toggle between open and closed sets, and between folds, change the configuration file (config/config.py)
  • 'notebooks/classification-explore-bigset.ipynb' will train the 39 species classification. The results of this experiment as produced in the paper are available here.

Tier II Components

  • see ReadMe.md in the unknown/ directory

Tier III Components

  • see ReadMe.md in the unknown/ directory

Results Analysis

All relevant scripts are located in the results_processing/ folder. See the additional readme in that folder. Processing results over the folds indicated in the paper can be done as follows:

  • For averaging confusion matrices for just classification without unknown detection, go to the notebooks/figure_generation.ipynb.
  • For cascading novelty detection with classification, use: 1. cascade_novelty_and_classify.py, 2. prep_cascaded_test_sheets.ipynb, 3. avg_cascades.ipynb
  • For condensing results in preparation for McNemar's test, use comparison.ipynb

Each script or function within these notebooks should complete within about one minute. The expected output for processing the results of the paper is as reported in the abstract: closed-set classification of 16 known species achieved 97.04±0.87% accuracy independently, and 89.07±5.58% when cascaded with novelty detection; closed-set classification of 39 species produces a macro F1-score of 86.07±1.81%. These results are the expected output from processing the results from the model files or the given outputs of the model files (e.g. features.csv), or from processing the outputs of Tier III as given. However, if the system is retrained, variability is expected given the stochastic nature of training neural networks. This zip file contains the outputs of this results processing along with the relevant outputs from Tiers II and III as produced in the paper.
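The cascading step above (novelty detection first, closed-set classification second) can be illustrated with a toy thresholding rule. The single novelty score and the threshold value are simplifying assumptions for illustration; the paper's multitiered ensemble is more involved:

```python
def cascade_predict(novelty_score, class_probs, threshold=0.5):
    """Toy cascade: if the novelty detector flags the image, report
    'unknown'; otherwise fall back to the closed-set argmax."""
    if novelty_score >= threshold:
        return "unknown"
    return max(class_probs, key=class_probs.get)

probs = {"aedes_aegypti": 0.8, "culex_pipiens": 0.2}
print(cascade_predict(0.9, probs))  # flagged as novel -> "unknown"
print(cascade_predict(0.1, probs))  # known -> "aedes_aegypti"
```

This is why cascaded accuracy (89.07±5.58%) is lower than standalone closed-set accuracy (97.04±0.87%): any false alarm by the novelty detector overrides a correct classification.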

Copyright information

Shield: CC BY NC 4.0

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

CC BY NC 4.0
