summonchimera's Introduction

SummonChimera Copyright (C) 2014 Joshua Patrick Katz
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 or any later version.
Version 3 of the GNU General Public License is provided with this program as LICENSE.txt.

SummonChimera version 1.0 by Joshua Patrick Katz 

CONTENTS:
	bin/SummonChimera.pl: Perl script used to find viral integrations in NGS alignment data
	lib/Chimera.pm: Perl module defining the Chimera object used by SummonChimera
	LICENSE.txt: contains version 3 of the GNU General Public License
	README: this file

PREREQS:
	SummonChimera does not require any non-core Perl modules, but its input files are generated
	by external programs. BLAST can be found at ftp://ftp.ncbi.nlm.nih.gov/blast/. Any mapping
	program that generates SAM files can be used, though Bowtie2 and BWA are the two that
	were tested:
	Bowtie2: http://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.1.0/ 
	BWA: http://sourceforge.net/projects/bio-bwa/files/
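Before running the pipeline, it can help to confirm that the external programs are on PATH. The helper below is a hypothetical sketch (check_prereqs is not part of SummonChimera); it prints each missing command to stderr and returns non-zero if any are missing.

```shell
# Hypothetical helper: report which of the given commands are not on PATH.
# Returns 0 if every command was found, 1 otherwise.
check_prereqs() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage, assuming the Bowtie2 + BLAST+ pipeline from USAGE: check_prereqs perl bowtie2 bowtie2-build makeblastdb blastn (substitute bwa for the Bowtie2 commands if mapping with BWA).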

BASH INSTALLATION:
	Add lib to your PERL5LIB environment variable
	Add bin to your PATH environment variable
	Replace the Perl shebang line in SummonChimera.pl with the path returned by `which perl`

TESTING:
	Run the following command:
		perl TestSummonChimera.pl
	SummonChimera is properly installed if TestSummonChimera.pl prints:
		No errors encountered when running SummonChimera.pl

DEVELOPMENT:
	Currently SummonChimera predicts integrations as if the virus genome were circular, so
	predictions for viruses with linear genomes may be incorrect. This will soon be rectified.

USAGE:
	This pipeline is an example; it is not the fastest method.
	INPUT:
		one.fq - file containing the first read of each pair in paired-end sequencing
		two.fq - file containing the second read of each pair in paired-end sequencing
		host.fa - file containing the host genome
		virus.fa - file containing the virus genome
		virusID - the FASTA identifier of the virus sequence in virus.fa
	SCRIPT:
		bowtie2-build host.fa,virus.fa host-virus.bowtieDB
		bowtie2 -x host-virus.bowtieDB -1 one.fq -2 two.fq 1> virus-host.sam
		# add a samtools or Perl one-liner here that writes the unmapped reads to unmapped.fa
		makeblastdb -in host.fa -dbtype nucl -out host.blastDB
		makeblastdb -in virus.fa -dbtype nucl -out virus.blastDB
		blastn -db virus.blastDB -query unmapped.fa -word_size 16 -outfmt 6 1> virus-host.blastn.txt
		blastn -db host.blastDB -query unmapped.fa -word_size 16 -outfmt 6 1>> virus-host.blastn.txt
		SummonChimera.pl -b virus-host.blastn.txt -s virus-host.sam -v virusID > Integrations.tsv

summonchimera's Issues

Issue with Generating virus-host.blastn.txt File

Hello,

Following the README, I ran the script below on my WGS samples and encountered some issues:

SCRIPT:

bowtie2-build host.fa,virus.fa host-virus.bowtieDB
bowtie2 -x host-virus.bowtieDB -1 one.fq -2 two.fq 1> virus-host.sam
#add samtools / perl one liner to retrieve unmapped
makeblastdb -in host.fa -dbtype nucl -out host.blastDB
makeblastdb -in virus.fa -dbtype nucl -out virus.blastDB
blastn -db virus.blastDB -query unmapped.fa -word_size 16 -outfmt 6 1> virus-host.blastn.txt
blastn -db host.blastDB -query unmapped.fa -word_size 16 -outfmt 6 1>> virus-host.blastn.txt
SummonChimera.pl -b virus-host.blastn.txt -s virus-host.sam -v virusID > Integrations.tsv

After the "blastn -db virus.blastDB -query unmapped.fa -word_size 16 -outfmt 6 1> virus-host.blastn.txt" step completed, the "blastn -db host.blastDB -query unmapped.fa -word_size 16 -outfmt 6 1>> virus-host.blastn.txt" step has been running for almost a month without finishing, and the output file has grown to 4.8 TB. I'm wondering what might have gone wrong.

Thank you.