joshpk105 / summonchimera
https://link.springer.com/article/10.1186/s12859-014-0348-4
License: GNU General Public License v3.0
SummonChimera Copyright (C) 2014 Joshua Patrick Katz

This program comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute and/or modify it under the conditions outlined in the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 or later. GNU General Public License version 3 is provided with this program as LICENSE.txt.

SummonChimera version 1.0 by Joshua Patrick Katz

CONTENTS:
    bin/SummonChimera.pl: Perl script used to find viral integrations in NGS alignment data
    lib/Chimera.pm: Perl object used by SummonChimera
    LICENSE.txt: contains version 3 of the GNU General Public License
    README: this file

PREREQS:
    SummonChimera does not require any non-core Perl modules, but its input files are generated by external programs.
    BLAST can be found at ftp://ftp.ncbi.nlm.nih.gov/blast/.
    Any mapping program that generates SAM files can be used, though Bowtie2 and BWA are the two that were tested:
        Bowtie2: http://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.1.0/
        BWA: http://sourceforge.net/projects/bio-bwa/files/

BASH INSTALLATION:
    Add lib to your PERL5LIB environment variable
    Add bin to your PATH environment variable
    Replace the Perl shebang line in SummonChimera.pl with the result of `which perl`

TESTING:
    Run the following command:
        perl TestSummonChimera.pl
    SummonChimera is properly installed if the output of TestSummonChimera is:
        No errors encountered when running SummonChimera.pl

DEVELOPMENT:
    Currently SummonChimera predicts integrations as if the virus genome is circular. For viruses with linear genomes, incorrect predictions may be made. This will soon be rectified.

USAGE:
    This pipeline is an example and is not the fastest method.

    INPUT:
        one.fq - file containing first read in paired-end sequencing
        two.fq - file containing second read in paired-end sequencing
        host.fa - file containing host genome
        virus.fa - file containing virus genome
        virusID - the fasta identifier found in virus.fa

    SCRIPT:
        bowtie2-build host.fa,virus.fa host-virus.bowtieDB
        bowtie2 -x host-virus.bowtieDB -1 one.fq -2 two.fq 1> virus-host.sam
        #add samtools / perl one liner to retrieve unmapped reads as unmapped.fa
        makeblastdb -in host.fa -dbtype nucl -out host.blastDB
        makeblastdb -in virus.fa -dbtype nucl -out virus.blastDB
        blastn -db virus.blastDB -query unmapped.fa -word_size 16 -outfmt 6 1> virus-host.blastn.txt
        blastn -db host.blastDB -query unmapped.fa -word_size 16 -outfmt 6 1>> virus-host.blastn.txt
        SummonChimera.pl -b virus-host.blastn.txt -s virus-host.sam -v virusID > Integrations.tsv
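The script above leaves the step that produces unmapped.fa as a placeholder comment. One way that step might look is sketched below, assuming "unmapped" means SAM records with FLAG bit 0x4 set. The function name `sam_unmapped_to_fasta` and the `/1` and `/2` mate suffixes are illustrative choices, not part of SummonChimera.

```shell
# Hypothetical helper (not part of SummonChimera): print the unmapped reads
# of a SAM file as FASTA on standard output.
sam_unmapped_to_fasta() {
    awk -F '\t' '
        /^@/ { next }                                  # skip SAM header lines
        int($2 / 4) % 2 == 1 {                         # FLAG bit 0x4 set: read is unmapped
            mate = ""
            if (int($2 / 64) % 2 == 1)  mate = "/1"    # FLAG bit 0x40: first read in pair
            if (int($2 / 128) % 2 == 1) mate = "/2"    # FLAG bit 0x80: second read in pair
            print ">" $1 mate                          # FASTA header from the read name
            print $10                                  # read sequence as stored in the SAM record
        }
    ' "$1"
}

# Example use in the pipeline (file names taken from the README):
#   sam_unmapped_to_fasta virus-host.sam > unmapped.fa
```

If samtools is installed, `samtools fasta -f 4 virus-host.sam > unmapped.fa` should do a similar job. Whether the read names either approach produces match what SummonChimera.pl expects is not stated in the README, so check that against the SAM file.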
Hello,
Following the README, I ran the script below on my WGS samples and encountered an issue:
SCRIPT:
bowtie2-build host.fa,virus.fa host-virus.bowtieDB
bowtie2 -x host-virus.bowtieDB -1 one.fq -2 two.fq 1> virus-host.sam
#add samtools / perl one liner to retrieve unmapped
makeblastdb -in host.fa -dbtype nucl -out host.blastDB
makeblastdb -in virus.fa -dbtype nucl -out virus.blastDB
blastn -db virus.blastDB -query unmapped.fa -word_size 16 -outfmt 6 1> virus-host.blastn.txt
blastn -db host.blastDB -query unmapped.fa -word_size 16 -outfmt 6 1>> virus-host.blastn.txt
SummonChimera.pl -b virus-host.blastn.txt -s virus-host.sam -v virusID > Integrations.tsv
The "blastn -db virus.blastDB -query unmapped.fa -word_size 16 -outfmt 6 1> virus-host.blastn.txt" step completed, but the "blastn -db host.blastDB -query unmapped.fa -word_size 16 -outfmt 6 1>> virus-host.blastn.txt" step has now been running for almost a month without finishing, and the output file has grown to 4.8 TB. I'm wondering what might have gone wrong.
Thank you.
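One possible way to shrink the host search is to run the host blastn only on reads that already have a virus hit. This is only a sketch. It rests on an unconfirmed assumption that SummonChimera needs host hits only for reads that also hit the virus, which neither the README nor the author states. The function name `fasta_with_hits` and the file name `unmapped.virus-hit.fa` are hypothetical.

```shell
# Hypothetical helper (not part of SummonChimera): keep only the FASTA records
# whose ID appears in column 1 (qseqid) of a BLAST tabular (-outfmt 6) file.
# Usage: fasta_with_hits hits.tsv reads.fa
fasta_with_hits() {
    awk '
        FILENAME == ARGV[1] { hit[$1] = 1; next }          # first file: collect query IDs with hits
        /^>/ { id = substr($1, 2); keep = (id in hit) }    # header line: decide whether to keep the record
        keep                                               # print header and sequence lines of kept records
    ' "$1" "$2"
}

# Example use, run after the virus blastn step and before the host blastn step
# (assumption: virus-host.blastn.txt holds only virus hits at that point):
#   fasta_with_hits virus-host.blastn.txt unmapped.fa > unmapped.virus-hit.fa
#   blastn -db host.blastDB -query unmapped.virus-hit.fa -word_size 16 -outfmt 6 1>> virus-host.blastn.txt
```

blastn also has `-num_threads`, `-evalue`, and `-max_target_seqs` options that can reduce run time or output size. The README does not say how changing them would affect SummonChimera's predictions.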