cherry_crispr_multihost's Introduction

CHERRY-crispr version

CHERRY is a Python library for predicting interactions between viral and prokaryotic genomes. It is based on a deep learning model that consists of a graph convolutional encoder and a link prediction decoder.

This repository provides an extended version of CHERRY that uses the CRISPR information in CHERRY's database for multi-host (host range) prediction.


Input (provided by the user):
    1. phage contigs from your samples (FASTA files)

Output:
    The host range of the given phages (CSV files)

Required Dependencies

  • Python 3.x
  • Pandas
  • Numpy
  • Biopython
  • NCBI BLAST+

An easier way to install

We suggest installing all the packages with conda (either Miniconda or Anaconda works) using the following commands:

conda create --name cherry_crispr_multihost python=3.8
conda activate cherry_crispr_multihost

conda install pandas numpy biopython
conda install blast -c bioconda

Usage

Once the required environment is installed, activate it each time you want to use the program:

conda activate cherry_crispr_multihost

Then, the multi-host extension can be run with:

python PATH_TO_cherry_crispr_multihost/Cherry_multihost.py --infile PATH_TO_FASTA --outfolder PATH_TO_OUTPUT_FOLDER --datasetpth [where you place the dataset folder provided in this GitHub] --threads NUM_OF_THREAD --ident IDENTITY_OF_ALIGNMENT --coverage COVERAGE_OF_ALIGNMENT

# example
python CHERRY_crispr_multihost/Cherry_multihost.py --infile nucl.fasta --outfolder test_out/ --datasetpth CHERRY_crispr_multihost/dataset --ident 75 --coverage 0.75
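If you want to launch the tool from a Python script (for example, to loop over several FASTA files), the command above can be assembled as an argument list. The following is only a minimal sketch: the helper name `build_multihost_command` is hypothetical and not part of this repository, and it uses only the flags shown above.

```python
import shlex


def build_multihost_command(script, infile, outfolder, datasetpth,
                            threads=None, ident=75, coverage=0.75):
    """Return the command line as a list of arguments.

    Hypothetical helper (not part of the repository). It only uses the
    flags documented in this README and their documented defaults.
    """
    cmd = [
        "python", script,
        "--infile", infile,
        "--outfolder", outfolder,
        "--datasetpth", datasetpth,
        "--ident", str(ident),
        "--coverage", str(coverage),
    ]
    # --threads is added only when the caller asks for it
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


cmd = build_multihost_command(
    "CHERRY_crispr_multihost/Cherry_multihost.py",
    "nucl.fasta",
    "test_out/",
    "CHERRY_crispr_multihost/dataset",
)
# Print the command in a copy-and-paste friendly form
print(shlex.join(cmd))
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)` inside the activated conda environment.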

There are two user-adjustable thresholds:

  1. --ident: the identity threshold for the CRISPR alignments (default: 75)
  2. --coverage: the coverage threshold for the CRISPR alignments (default: 0.75)
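To illustrate how the two thresholds interact, here is a minimal sketch of an identity and coverage filter. It is not the code used by Cherry_multihost.py. It assumes that identity is a percentage, that coverage is the aligned length divided by the CRISPR spacer length, and that both comparisons are strict; the function name and its parameters are hypothetical.

```python
def passes_thresholds(pident, align_len, spacer_len, ident=75.0, coverage=0.75):
    """Return True if an alignment clears both thresholds.

    Assumptions (not confirmed by the repository):
      * pident is a percent identity in the range 0-100
      * coverage = aligned length / spacer length
      * both comparisons are strict (>)
    """
    if spacer_len <= 0:
        raise ValueError("spacer_len must be positive")
    return pident > ident and (align_len / spacer_len) > coverage


# A perfect match over 30 of 32 spacer bases (coverage 0.9375)
print(passes_thresholds(100.0, 30, 32))                          # True
# Same alignment under stricter, user-chosen thresholds
print(passes_thresholds(100.0, 30, 32, ident=95, coverage=0.95))  # False
# Good identity but only 20 of 32 bases aligned (coverage 0.625)
print(passes_thresholds(90.0, 20, 32))                           # False
```

Under these assumptions, raising either threshold can only reduce the number of alignments that are kept.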

Outputs

There are two output files in the folder given by --outfolder PATH_TO_OUTPUT_FOLDER:

  1. alignment_result.tab: BLASTN results between the CRISPRs and the phages
  2. prediction.csv: the predictions in CSV format (alignments with identity > --ident IDENTITY_OF_ALIGNMENT and coverage > --coverage COVERAGE_OF_ALIGNMENT)
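A quick way to inspect prediction.csv without assuming anything about its columns is to read the first row and count the remaining rows. The sketch below assumes the first row is a header, which this README does not confirm; the function name `summarize_rows` and the sample column names are made up for illustration.

```python
import csv


def summarize_rows(lines):
    """Return (header, number_of_data_rows) for CSV text.

    `lines` is any iterable of strings, e.g. an open file handle.
    Assumes the first row is a header (not confirmed by the README).
    """
    reader = csv.reader(lines)
    header = next(reader, [])
    # Skip completely empty rows such as a trailing blank line
    n_rows = sum(1 for row in reader if row)
    return header, n_rows


# Stand-in content; the real column names are not documented here
sample = ["col_a,col_b\n", "x1,y1\n", "x2,y2\n"]
header, n_rows = summarize_rows(sample)
print(header, n_rows)
```

For a real run, open the file with `open("test_out/prediction.csv", newline="")` and pass the handle to `summarize_rows`. If the file does have a header, `wc -l` reports one more line than the number of data rows.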

Citation

If you use this program, please cite the following paper:

  • CHERRY:
Jiayu Shang, Yanni Sun, CHERRY: a Computational metHod for accuratE pRediction of virus–pRokarYotic interactions using a graph encoder–decoder model, Briefings in Bioinformatics, 2022, bbac182, https://doi.org/10.1093/bib/bbac182

The original version of CHERRY can be found via: CHERRY

cherry_crispr_multihost's Issues

Difference in the number of CHERRY_crispr_multihost and CHERRY prediction results

Dear KennthShang,

I hope this email finds you well. I am reaching out to you regarding your open-source project and have a question regarding the difference in predicting the number of virus hosts using CHERRY_crispr_multihost.py and Cherry_single.py in PhaBOX.

I have noticed that CHERRY_crispr_multihost.py and Cherry_single.py give different results when run with their default parameters. Specifically, CHERRY_crispr_multihost.py yields fewer host predictions than Cherry_single.py. I only counted the CRISPR-based predictions.

Cherry_single.py result:
$ cut -f5 -d , host_prediction.csv | sort | uniq -c
    112 -
   2989 CRISPR
    170 Predict

CHERRY_crispr_multihost result:
$ wc -l prediction.csv
2566 prediction.csv

Additionally, I would like to inquire whether the values used in Cherry_multihost.py for "--ident" (the identity of the CRISPR alignments) with a default of 75, and "--coverage" (the coverage of the CRISPR alignments) with a default of 0.75, might be too low. Is it necessary for me to increase these threshold values, and if so, should I make adjustments in Cherry_single.py as well?

I truly appreciate your contributions to the open-source community, and I am very interested in your project. I would greatly appreciate any insights you can provide to help me better understand this discrepancy.

Thank you for your time and response.

Best regards,
Robin
