ampbenchmark's Introduction

AMPBenchmark

AMPBenchmark is part of our initiative to improve benchmarking standards in the field of antimicrobial peptide (AMP) prediction.

How to use the public data?

  1. Download the benchmark sequence data (see the Sequence data section).
  2. Download the training sequence data for all methods and replications (see the Sequence data section).
  3. Train your model on each of the training data sets. The class of a sequence is denoted by AMP=1 for AMPs and AMP=0 for negative samples; see the Sequence data section for details.
  4. Benchmark the trained models against our data. Make sure to use the subset of sequences for the appropriate replication (the replication number is denoted by, e.g., rep1; see the Sequence data section for details).
  5. Submit the results in the format described below to the AMPBenchmark web server.

Data submission format

ID training_sampling AMP_probability
DBAASP_10018_AMP=1_rep1 dbAMP 0.97
DBAASP_3217_AMP=1_rep1 dbAMP 0.61
  • ID: must contain the sequence ID, as provided in the FASTA headers of the input sequences.
  • training_sampling: must contain the negative sampling method used to train the model. Possible values are: AMAP, AmpGram, ampir-mature, AMPlify, AMPScannerV2, CS-AMPPred, dbAMP, Gabere&Noble, iAMP-2L, Wang-et-al, Witten&Witten. Remember that a proper benchmark requires you to train your model on every provided sampling method and to evaluate each trained model on all sampling methods, using the appropriate replication.
  • AMP_probability: has to be in the range between 0 and 1.
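
The format rules above can be checked before uploading. The following is a minimal sketch, not part of AMPBenchmark itself: the helper names `validate_rows` and `write_submission` are hypothetical, and the space delimiter is an assumption based on how the example rows are displayed above, so adjust it if the web server expects a different separator.

```python
import csv

# Sampling method names exactly as listed in this README.
SAMPLING_METHODS = {
    "AMAP", "AmpGram", "ampir-mature", "AMPlify", "AMPScannerV2",
    "CS-AMPPred", "dbAMP", "Gabere&Noble", "iAMP-2L", "Wang-et-al",
    "Witten&Witten",
}


def validate_rows(rows):
    """Return a list of problems found in (ID, training_sampling, AMP_probability) rows."""
    problems = []
    for i, (seq_id, sampling, prob) in enumerate(rows, start=1):
        if not seq_id:
            problems.append(f"row {i}: empty ID")
        if sampling not in SAMPLING_METHODS:
            problems.append(f"row {i}: unknown training_sampling {sampling!r}")
        try:
            value = float(prob)
        except (TypeError, ValueError):
            problems.append(f"row {i}: AMP_probability {prob!r} is not a number")
            continue
        # NaN fails this comparison too, so it is reported as out of range.
        if not 0.0 <= value <= 1.0:
            problems.append(f"row {i}: AMP_probability {value} is outside [0, 1]")
    return problems


def write_submission(rows, handle, delimiter=" "):
    """Write the header line and rows to an open text handle.

    ASSUMPTION: space-delimited output, mirroring the example shown above.
    """
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["ID", "training_sampling", "AMP_probability"])
    writer.writerows(rows)
```

A typical use would be to call `validate_rows(rows)` first and only call `write_submission(rows, open_file)` when the returned list is empty.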

Example data for a random classifier can be downloaded from Dropbox.

Sequence data

The input data is hosted on Dropbox and GitHub. Note that this single file contains data for all replications; each replication should be used separately, with the matching replication of the training sets.
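
Because one file holds every replication, it has to be split before benchmarking. The sketch below shows one way to do that with the standard library only. The function names `read_fasta` and `group_by_replicate` are hypothetical, and the code assumes the sequence ID is the first whitespace-delimited token of each FASTA header and ends with a `_rep<number>` suffix, as in the example IDs in this README.

```python
import re
from collections import defaultdict

# Matches the replicate suffix of the sequence ID, e.g. "_rep1".
REP_RE = re.compile(r"_rep(\d+)(?:\s|$)")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_replicate(records):
    """Group (header, sequence) pairs by the replicate number in the header."""
    groups = defaultdict(list)
    for header, seq in records:
        match = REP_RE.search(header)
        if match is None:
            raise ValueError(f"no replicate suffix in header: {header!r}")
        groups[int(match.group(1))].append((header, seq))
    return dict(groups)
```

With the public file downloaded locally, `group_by_replicate(read_fasta(open("AMPBenchmark_public.fasta")))` would return a dictionary keyed by replicate number, so that each model is evaluated only on the sequences of its own replication.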

The training data sets are hosted on Dropbox and follow the same naming convention.

There are two types of input sequences:

  • positive sequences (e.g., DBAASP_10718_AMP=1_rep1): IDinDBAASP_class_replicateID.
  • negative sequences (e.g., Seq1896_sampling_method=Gabere&Noble_AMP=0_rep4): IDandSamplingMethod_class_replicateID.
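
Both ID layouts can be split into their fields with one regular expression. This is a minimal sketch based only on the two example IDs above; `SeqInfo` and `parse_header` are hypothetical names, and IDs that deviate from these two patterns are rejected rather than guessed at.

```python
import re
from typing import NamedTuple, Optional

# <name>_AMP=<0|1>_rep<number>, where <name> may carry a sampling method.
HEADER_RE = re.compile(r"^(?P<name>.+)_AMP=(?P<label>[01])_rep(?P<rep>\d+)$")


class SeqInfo(NamedTuple):
    name: str                       # e.g. "DBAASP_10718" or "Seq1896"
    sampling_method: Optional[str]  # None for positive (DBAASP) sequences
    is_amp: bool                    # True when AMP=1
    replicate: int                  # number after "rep"


def parse_header(seq_id: str) -> SeqInfo:
    """Parse a sequence ID (with or without the leading '>') into its fields."""
    match = HEADER_RE.match(seq_id.lstrip(">").strip())
    if match is None:
        raise ValueError(f"unrecognised sequence ID: {seq_id!r}")
    # Negative samples embed the sampling method after "_sampling_method=".
    name, sep, sampling = match.group("name").partition("_sampling_method=")
    return SeqInfo(
        name=name,
        sampling_method=sampling if sep else None,
        is_amp=match.group("label") == "1",
        replicate=int(match.group("rep")),
    )
```

For instance, `parse_header("Seq1896_sampling_method=Gabere&Noble_AMP=0_rep4")` yields the name `Seq1896`, the sampling method `Gabere&Noble`, a negative class label, and replicate 4.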

AMP sequences are derived from the DBAASP database.

MD5 checksum of AMPBenchmark_public.fasta: 58f1424c057aaeb64bc632cad6038cad.
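
To confirm that a download is intact, the checksum can be recomputed locally. The sketch below uses Python's standard `hashlib`; the function name `md5_of_file` is hypothetical, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib

# Checksum published in this README for AMPBenchmark_public.fasta.
EXPECTED_MD5 = "58f1424c057aaeb64bc632cad6038cad"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the file was saved in the working directory, `md5_of_file("AMPBenchmark_public.fasta") == EXPECTED_MD5` should evaluate to `True`; a mismatch suggests an incomplete or altered download.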

Citation

Katarzyna Sidorczuk, Przemysław Gagat, Filip Pietluch, Jakub Kała, Dominik Rafacz, Laura Bąkała, Jadwiga Słowik, Rafał Kolenda, Stefan Rödiger, Legana C H W Fingerhut, Ira R Cooke, Paweł Mackiewicz, Michał Burdukiewicz, Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data, Briefings in Bioinformatics, 2022, bbac343, https://doi.org/10.1093/bib/bbac343.

Contact

If you have any questions, suggestions or comments, contact Michal Burdukiewicz.

Changelog

  • 2024/07/29: updated Dropbox links.
  • 2023/01/11: fixed data processing.

ampbenchmark's Issues

Question about training data

Hello, I'm interested in your study and AMP activity prediction using AI.

I want to use AMPBenchmark in my study, but I have a question about the AMPBenchmark training sets.

There are 5 replicate training sets. When I upload benchmark results, should I upload them 5 times, once for each replicate training set, or should I upload the average of the 5 predicted probabilities?

Thank you for reading.
