chrishiv / shiver

Sequences from HIV Easily Reconstructed

License: GNU General Public License v3.0

Shell 22.81% Python 74.61% R 2.58%


shiver's Issues

Restriction on sequence names?

Hi @ChrisHIV

I've run into this error:

shiver_map_reads.sh was called thus:
./shiver/shiver_map_reads.sh MyInitDir config.sh HIV2-043/contigs.fasta HIV2-043 HIV2-043.blast HIV2-043_raw_wRefs.fasta HIV2-043_1.fastq HIV2-043_2.fastq
Info: using 01_AE.JP.11.DE00111JP003.KF859741 (elongated with other longer references if needed) to provide 821 bases to fill in gaps before/between/after contigs. This reference was the best match for the contigs: 95.1076974453% of positions in agreement.
cp: ./HIV2-043_1.fastq and HIV2-043_1.fastq are identical (not copied).
cp: ./HIV2-043_2.fastq and HIV2-043_2.fastq are identical (not copied).
Found at least one read in HIV2-043_1.fastq whose sequence ID does not end in "\1".
Problem with read names. Quitting.

Is there a restriction on sequence IDs, such that sequence IDs have to end in \1 and \2?
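One way to see which read triggers a check like this is to scan the FASTQ headers yourself. The snippet below is only a minimal sketch, not shiver's own check: it assumes four-line FASTQ records, that the read name is the header up to the first whitespace, and that the expected suffix is a forward-slash form such as "/1" (the suffix is a parameter, so any other form can be tried). The function name is made up for illustration.

```python
import io

def first_bad_read_name(fastq_handle, suffix):
    """Return the first read name not ending in `suffix`, or None if all do."""
    for line_number, line in enumerate(fastq_handle):
        # Assumption: four lines per record, so headers are lines 0, 4, 8, ...
        if line_number % 4 != 0:
            continue
        # Assumption: the read name is the header up to the first whitespace,
        # without the leading '@'.
        fields = line[1:].split()
        name = fields[0] if fields else ""
        if not name.endswith(suffix):
            return name
    return None

example = io.StringIO(
    "@read_a/1\nACGT\n+\nIIII\n"
    "@read_b 1:N:0:CTCTCTAC\nACGT\n+\nIIII\n"
)
print(first_bad_read_name(example, "/1"))  # read_b
```

For a real file, open it with `open("HIV2-043_1.fastq")` and pass the handle in place of `example`.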

no consensus fasta file produced

I have been trying to use shiver on a set of Illumina data. SPAdes produced a single good, long HIV contig. shiver_align_contigs.sh produced a _shiverBiuld_raw_wRefs.fasta file but not a _shiverBiuld_cut_wRefs.fasta file, and shiver_map_reads.sh did not produce a consensus fasta file. What could be the problem? Thanks.

Error running with Mafft

Hello,

I am very interested in shiver for my hepatitis B virus work.
I have encountered an error with mafft in the "shiver_align_contigs.sh" step. Please see the commands below. I used the example datasets that come with the program.
I installed mafft with conda, and mafft runs fine from anywhere in the terminal. Do I still have to specify the path for mafft, even when it is installed with conda? Please let us know.
Thanks very much,
Josef

sh shiver_init.sh MyInitDir config_original.sh HIV1_REF_2010_env_DNA.fasta adapters_Illumina.fasta primers_GallEtAl2012.fasta

Building a new DB, current time: 06/19/2018 09:20:55
New DB name: /Users/wagnerj/09_shiver/shiver-master/MyInitDir/ExistingRefsBlastDatabase
New DB title: /Users/wagnerj/09_shiver/shiver-master/MyInitDir/ExistingRefsUngapped.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 170 sequences in 0.0232549 seconds.

Informations-MacBook:shiver-master wagnerj$ sh shiver_align_contigs.sh MyInitDir config_original.sh MysteryHIV_contigs.fasta Mystery
Error running 'mafft'. Are you sure that mafft is installed, and that you chose the right value for the config file variable 'mafft'?
Problem with config_original.sh. Quitting.
Informations-MacBook:shiver-master wagnerj$
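A command that works in an interactive terminal is not necessarily visible to a script, because the script may run with a different PATH (for example when a conda environment is only activated in the interactive shell). The snippet below is a small sketch for checking what a given command name resolves to; the helper name is made up, and it says nothing about how shiver itself looks up the 'mafft' config variable. Run it from the same shell session you start shiver from.

```python
import shutil

def describe_executable(name):
    """Report where `name` resolves on the current PATH, if anywhere."""
    path = shutil.which(name)
    if path is None:
        return "%s: not found on PATH" % name
    return "%s: %s" % (name, path)

# 'mafft' is the default value of the config variable shown in the error.
print(describe_executable("mafft"))
```

If this reports that mafft is not found, giving the full path to the mafft executable in the config file is one thing to try.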

import of maketrans causing errors

Imports of maketrans from string in ConstructBestRef and MergeAlignments are causing import errors in my environment, and they don't seem to be needed. @ChrisHIV
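For background: `string.maketrans` exists in Python 2 but was removed from the `string` module in Python 3, where the equivalent is `str.maketrans`. If the import did turn out to be needed, a version-tolerant form could look like the sketch below. The translation table here is purely illustrative and is not taken from the shiver scripts.

```python
try:
    # Python 2: maketrans lives in the string module.
    from string import maketrans
except ImportError:
    # Python 3: it is a static method on str instead.
    maketrans = str.maketrans

# Illustrative use only: a table mapping each base to its complement.
complement_table = maketrans("ACGT", "TGCA")
print("AACG".translate(complement_table))  # TTGC
```

If the name is never used in those scripts, simply deleting the import is the simpler fix.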

Suggestion to rename FindSeqsInFasta.py within shiver and/or phyloscanner 'tools' folders

Hi Chris! Hope you are doing great. This is just a kind suggestion to rename FindSeqsInFasta.py within the shiver and/or phyloscanner 'tools' folders. The two scripts have the same name but are different, which can cause a lot of confusion if one wants to pipeline both tools (shiver & phyloscanner) by merging their 'tools' folders into a common one within a bin. Without noticing at first, I overwrote shiver's FindSeqsInFasta.py with phyloscanner's and got an error that took me quite some time to fix, as I could not work out what was going on. I had to go through the relevant code chunks before I found the root cause. Thanks, Vera.

TranslateSeqForGlobalAln.py Error

Hi Chris,

Thanks for developing this great tool for virus genome reconstruction. When I used TranslateSeqForGlobalAln.py to generate a consensus for global alignment, I got the following error:

Traceback (most recent call last):
  File "/home/user/shiver/tools/TranslateSeqForGlobalAln.py", line 47, in <module>
    if seq == None:
  File "/usr/local/lib/python2.7/dist-packages/Bio/SeqRecord.py", line 748, in __eq__
    raise NotImplementedError(_NO_SEQRECORD_COMPARISON)
NotImplementedError: SeqRecord comparison is deliberately not implemented. Explicitly compare the attributes of interest.

I found something similar on this website: https://www.rosettacommons.org/node/10197
So I changed if seq == None: to if seq is None:, then TranslateSeqForGlobalAln.py seems to work well. I was wondering if this is the right way to correct this error?

Best wishes,
Michael
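The difference between the two forms can be shown without Biopython. `==` calls the object's `__eq__` method, which the traceback shows SeqRecord defines to raise NotImplementedError, whereas `is` compares object identity and never calls `__eq__`. The class below is a hypothetical stand-in for illustration, not Biopython's SeqRecord.

```python
class Record(object):
    """Hypothetical stand-in for an object that refuses equality comparison."""

    def __eq__(self, other):
        raise NotImplementedError("comparison is deliberately not implemented")

seq = Record()

try:
    seq == None  # calls Record.__eq__, which raises
    equality_raised = False
except NotImplementedError:
    equality_raised = True

# `is` compares identity only, so __eq__ is never called.
identity_result = seq is None

print(equality_raised, identity_result)  # True False
```

Testing for None with `is None` is also the form recommended by PEP 8, so the change is a reasonable one regardless of this error.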

"alignment" to "global alignment" in global alignment base freqs header

convert python to python3?

Problem with read names that end in /1 & /2

My original read names look like:
@M03972:191:000000000-CJT9T:1:1101:12698:6609 1:N:0:CTCTCTAC+GCGTAAGA

I have amended them to look like:
@M03972:191:000000000-CJT9T:1:1101:12698:6609/1:N:0:CTCTCTAC+GCGTAAGA

I have also removed the information after the /1 /2 that was introduced in Casava 1.8, giving me:
@M03972:191:000000000-CJT9T:1:1101:12698:6609/1

But I still get the Problem with read names. Quitting. error. Do you have any advice?
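For reference, the renaming described above (dropping the Casava 1.8 comment field and appending the read number after a slash) can be sketched as below. This is only an illustration of that transformation under the assumption that the read number is the first colon-separated item of the comment field; the function name is made up, and it is not confirmed here that this is the only thing the read-name check looks at.

```python
def old_style_name(header_line):
    """Convert '@name 1:N:0:INDEX' into '@name/1'."""
    header_line = header_line.rstrip("\n")
    fields = header_line.split(None, 1)
    if not fields:
        return header_line  # empty line: nothing to convert
    name = fields[0]
    if len(fields) < 2:
        return name  # no comment field to take a read number from
    # Assumption: the read number is the first item of the comment field.
    read_number = fields[1].split(":", 1)[0]
    return name + "/" + read_number

print(old_style_name(
    "@M03972:191:000000000-CJT9T:1:1101:12698:6609 1:N:0:CTCTCTAC+GCGTAAGA"
))
# @M03972:191:000000000-CJT9T:1:1101:12698:6609/1
```

The function would need to be applied to every header line (every fourth line) of both the forward and reverse files.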

create a release

@ChrisHIV Could you create a named release? I want to add a bioconda recipe for shiver, but the standard process for that is based on having a github release. Thanks a lot!

Simplify deployment - provide Docker container?

A colleague and I had some trouble installing shiver and getting it to work, going through the classic dependency hell. The VirtualBox image works nicely, but we cannot use it in our cluster setup: it seems we cannot run 64-bit VirtualBox machines nested inside our private-cloud virtual machines, as nested VT-x appears to be hard to do.

The setup instructions at https://github.com/ChrisHIV/shiver/blob/master/info/InstallationNotes.sh were very helpful for installing shiver from scratch. However, they were incomplete in our environment, possibly because some expected tools were not preinstalled. Also, I don't see any reason not to set up trimmomatic as a system-wide program on the PATH like all the other programs, so the script should just do (2) (or something similar) for us instead of asking us to choose for that one program (it doesn't for any of the others). I was able to successfully set up shiver on a bare Ubuntu 16.04 64-bit virtual machine using the following steps:

sudo apt update
sudo apt install python-pip -y
pip install --upgrade pip
sudo apt install python3-pip -y
sudo apt install git -y
sudo apt install wget -y
sudo apt install unzip -y
sudo apt install bc -y
cd ~

python --version
python2 --version
python3 --version
bc --version
git --version
unzip --hh
pip --version
pip2 --version
pip3 --version
wget --version

# fastaq
pip3 install pyfastaq

# biopython (import Bio - fix ImportError: No module named Bio)
#sudo pip install biopython
sudo pip2 install biopython
sudo pip3 install biopython


# blast
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.7.1+-x64-linux.tar.gz
tar -xzf ncbi-blast-2.7.1+-x64-linux.tar.gz
echo 'PATH=$PATH:~/ncbi-blast-2.7.1+/bin/' >> ~/.bashrc; source ~/.bashrc
blastx -h

# samtools
# 
# dependency ncurses
sudo apt install -y libncurses5-dev

# 
sudo apt-get install -y zlib1g-dev libbz2-dev liblzma-dev
wget https://github.com/samtools/samtools/releases/download/1.6/samtools-1.6.tar.bz2
tar -xjf samtools-1.6.tar.bz2 
cd ~/samtools-1.6/
./configure
make
sudo make install
echo 'PATH=$PATH:~/samtools-1.6/' >> ~/.bashrc; source ~/.bashrc
cd ~

samtools --version


# mafft
wget https://mafft.cbrc.jp/alignment/software/mafft-7.313-without-extensions-src.tgz
tar -xzf mafft-7.313-without-extensions-src.tgz
cd mafft-7.313-without-extensions/core/
make clean
make
sudo make install
cd ~

mafft --version


# trimmomatic... needs more configuration to be used
sudo apt-get install -y default-jre
wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.36.zip
unzip Trimmomatic-0.36.zip

#echo "alias trimmomatic='java -jar $HOME/Trimmomatic-0.36/trimmomatic-0.36.jar'" >> ~/.bashrc; source ~/.bashrc
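# Aliases are not available outside of interactive bash, so create a wrapper script instead
# (~/bin must be on the PATH for the trimmomatic command below to be found):
mkdir -p ~/bin
echo '#!/bin/bash' > ~/bin/trimmomatic
echo "java -jar $HOME/Trimmomatic-0.36/trimmomatic-0.36.jar" '"$@"' >> ~/bin/trimmomatic
chmod a+x ~/bin/trimmomatic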

echo trimmomatic -version
trimmomatic -version

cd ~

# smalt
wget https://sourceforge.net/projects/smalt/files/latest/download -O smalt.tgz
tar -xzf smalt.tgz
cd smalt-0.7.6/
./configure
make
sudo make install
cd ~

smalt help

# iva (https://github.com/sanger-pathogens/iva http://sanger-pathogens.github.io/iva/)
# its dependencies (that are missing up to this point)
sudo apt install kmc -y
sudo apt install mummer -y
pip3 install iva
cd ~

kmc
#kmc_dump --help
iva --version
nucmer --version

# BWA
git clone https://github.com/lh3/bwa.git
cd bwa
make
echo 'PATH=$PATH:~/bwa/' >> ~/.bashrc; source ~/.bashrc
cd ~

#bwa --version # does not exist

# bowtie2
# isn't that the easiest possible software installation?
wget https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.3.3.1/bowtie2-2.3.3.1-linux-x86_64.zip/download -O bowtie2.zip 
unzip bowtie2.zip
echo 'PATH=$PATH:~/bowtie2-2.3.3.1-linux-x86_64/' >> ~/.bashrc; source ~/.bashrc
cd ~

bowtie2 --version

# shiver
git clone https://github.com/ChrisHIV/shiver

This has worked nicely on our private cloud setup's Ubuntu 16.04 image and on the Ubuntu 16.04 image from https://www.osboxes.org/ubuntu/ and even inside a docker container starting from the ubuntu:xenial image.

In order to save future researchers from having to figure out this mundane stuff and let them focus on doing research instead, it would be nice if you could give us a more self-contained software package. It would be cool if you guys could provide a more lightweight Docker container which can easily run within another virtualization environment in addition to the VM. The above steps should provide a basis for creating a corresponding Dockerfile.

Issue with Initialization using ExampleInput

Hi,

I have been having trouble with using the code and trying to get it to run with python 3.10.0. I initially installed just python 3.10 but encountered errors saying that py2 was needed. I then installed python 2.7 and tried to run initialization again but have now encountered this error:

Traceback (most recent call last):
  File "/home/kkolsun/shiver/tools/RemoveBlankColumns.py", line 15, in <module>
    from Bio import AlignIO
ImportError: No module named Bio
Problem removing pure-gap columns from HIV1_ALL_2020_genome_DNA.fasta. Quitting.

I have tried a few online solutions but with no success. If there is a way to fix this I would love guidance as I really want to try to use this pipeline but am having a tough time getting it to actually run.
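The wording "ImportError: No module named Bio" is the Python 2 form of this message, which suggests that the interpreter running the script has no Biopython installed, even if another interpreter on the machine does. A quick way to see which interpreters can import a module is sketched below. The helper name is made up, and the interpreter names in the loop are just common candidates, not a statement of what shiver is configured to call.

```python
import subprocess
import sys

def can_import(interpreter, module):
    """Return True if `interpreter -c 'import module'` exits successfully."""
    try:
        result = subprocess.call(
            [interpreter, "-c", "import " + module],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False  # the interpreter itself could not be started
    return result == 0

# Candidate interpreter names; adjust to whatever your config file uses.
for candidate in ("python2", "python3", sys.executable):
    print(candidate, can_import(candidate, "Bio"))
```

Biopython then needs to be installed for whichever interpreter shiver actually calls, using that interpreter's own pip.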

error if too many (contaminant) contigs

Hi Chris,

I'd like to try out shiver with some metagenomic read sets and assemblies, and reference alignments for various viruses (each done in turn).

Trying one sample with a large number of contigs (>100000, most of which "contaminant"), I get the following error:
./shiver-1.3.3/shiver_map_reads.sh: line 347: ./shiver-1.3.3/tools/FindSeqsInFasta.py: Argument list too long
.. because the shell can't deal with so many arguments. It results in the temporary RefAndContaminantContigs blast database being built with only the reference, but no contaminant contigs.

I know metagenomic assemblies are not what your tool was designed for, but I wonder whether you would be willing to address this issue anyway (if there is an easy way around it)?

Many thanks,

Carlijn
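A generic workaround for "Argument list too long" is to split the names into batches that each fit within the operating system's limit and to run the command once per batch. The sketch below shows only the batching step. The function name, the byte limit, and the contig names are illustrative assumptions, and it is not confirmed here that the outputs of separate FindSeqsInFasta.py runs can simply be concatenated.

```python
def chunk_arguments(arguments, max_bytes):
    """Yield lists of arguments whose combined size stays within max_bytes."""
    batch, batch_bytes = [], 0
    for argument in arguments:
        # Count one extra byte per argument for its terminating NUL.
        cost = len(argument.encode("utf-8")) + 1
        if batch and batch_bytes + cost > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(argument)
        batch_bytes += cost
    if batch:
        yield batch

# Hypothetical contig names, standing in for a large metagenomic assembly.
contig_names = ["contig_%d" % i for i in range(100000)]
batches = list(chunk_arguments(contig_names, max_bytes=100000))
print(len(batches), sum(len(batch) for batch in batches))
```

On Linux the actual limit can be queried with `getconf ARG_MAX`; a value well below it leaves room for the environment variables, which count towards the same limit.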

Problem aligning

I run the command:
cd SID && ../../shiver/shiver_align_contigs.sh ../ShiverInitDir ../../shiver/config.sh SID/contigs.fasta SID && cd ..

I get the error below, but it does not give any indication of how to resolve it.

Problem aligning temp_HIVcontigs_uncut1.fasta using mafft.
Problem aligning the raw contigs to refs. Quitting.

Please advise.

Issue with shiver_map_reads.sh - temp_ContaminantReads_1.txt contains reads but 0 in temp_reads1trim2.fastq

Hi Chris,

I'm encountering issues running the "shiver_map_reads.sh" using my own dataset of HIV sequences. After successfully running "shiver_align_contigs.sh", the "shiver_map_reads.sh" function has always quit on my own dataset (it has worked when I use files from the ExampleInput folder).

Before quitting it returns "temp_ContaminantReads_1.txt contains (e.g. 275913) reads but we only found 0 in temp_reads1trim2.fastq."
The last file created in the output directory is always the 'T1076_PreMapping_1.fastq' file. Typically, it runs for quite a while (several hours) before I come across this issue. Below I have pasted my input command and the verbose output.

I am unsure what the issue is or what next steps to take. I'd greatly appreciate any help or advice you may have!

Thanks,
Stephanie

Info: shiver_map_reads.sh was called thus:
/mnt/c/Users/smelnych/Desktop/shiver/shiver_map_reads.sh /mnt/c/Users/smelnych/Desktop/shiver/MyInitDir_hiv /mnt/c/Users/smelnych/Desktop/shiver/config_original.sh T1076prrt_contigs.fasta T1076 T1076.blast T1076_cut_wRefs.fasta T1076-PRRT_R_1_renamed.fastq T1076-PRRT_R_2_renamed.fastq
With these config file parameter values:
MaxContigGappiness=0.05
MinContigHitFrac=0.9
python2='python2'
BlastDBcommand='makeblastdb'
BlastNcommand='blastn'
smalt='smalt'
bwa='bwa'
bowtie2='bowtie2'
bowtie2_build='bowtie2-build'
samtools='samtools'
mafft='mafft'
fastaq='fastaq'
trimmomatic="trimmomatic"
MafftArgsForPairwise='--maxiterate 1000 --localpair'
MinContigLength=300
MinGapSizeToSplitGontig=160
MinContigFragmentLength=80
TrimToKnownGenome=true
BlastTasks='megablast'
ContigBlastArgs="-max_target_seqs 1 -word_size 17"
ContigMinBlastOverlapToMerge='2'
MafftTestingStrategy="MinAlnLength"
TrimReadsForAdaptersAndQual=true
IlluminaClipParams='2:10:7:1:true'
BaseQualityParams='MINLEN:50 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20'
NumThreadsTrimmomatic=1
TrimReadsForPrimers=true
TrimPrimerWithOneSNP=false
CleanReads=true
mapper="smalt"
smaltIndexOptions="-k 15 -s 3"
smaltMapOptions="-x -y 0.7 -j 0 -i 2000"
bowtieOptions="--local --maxins 2000 --no-discordant --no-unal --quiet"
bwaOptions='-v 2'
samtoolsReadFlags='-f 3 -F 4'
mpileupOptions='--no-BAQ --min-BQ 5 --max-depth 1000000'
MinCov1=15
MinCov2=30
MinBaseFrac=-1
deduplicate=false
DeduplicationCommand="picard MarkDuplicates"
remap=true
MapContaminantReads=false
GiveHXB2coords=true
AlignContigsToConsensus=false
KeepPreMappingReads=false
###############################################

Info: using CON_B(1295)_Pol (elongated with other longer references if needed) to provide 1722 bases to fill in gaps before/between/after contigs. This reference was the best match for the contigs: 94.1466854725% of positions in agreement.
Now trimming reads - typically a slow step.
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGATCTTACGCGTGTAGATCTCGGTGGTCGCCGTATCATT'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACCGTACTAGATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACTAAGGCGAATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACTCCTGAGCATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACAGGCAGAAATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACTAGGCATGATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACGGACTCCTATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACCTCTCTACATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACGTAGAGGAATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACAAGAGGCAATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'CCGAGCCCACGAGACCGAGGCTGATCTCGTATGCCGTCTTCTGCTTG'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGATACTCCTTGTGTAGATCTCGGTGGTCGCCGTATCATT'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGAAGGCTTAGGTGTAGATCTCGGTGGTCGCCGTATCATT'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGACTCCTTACGTGTAGATCTCGGTGGTCGCCGTATCATT'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGATATGCAGTGTGTAGATCTCGGTGGTCGCCGTATCATT'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGAAGAGGATAGTGTAGATCTCGGTGGTCGCCGTATCATT'
Skipping duplicate Clipping Sequence: 'GACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATT'
fastaq completed successfully.

Building a new DB, current time: 01/18/2023 13:46:57
New DB name: /mnt/c/Users/smelnych/Desktop/shiver/hiv/fastq/T1076/temp_BlastDB
New DB title: temp_RefAndContaminantContigs.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 3000000000B
Adding sequences from FASTA; added 45 sequences in 0.093262 seconds.

Now blasting the reads - typically a slow step.
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Error: temp_ContaminantReads_1.txt contains 275913 reads but we only found 0 in temp_reads1trim2.fastq.
Problem extracting the non-contaminant reads using /mnt/c/Users/smelnych/Desktop/shiver/tools/FindNamedReadsInSortedFastq.py. Quitting.
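One way to narrow down a mismatch like this is to compare the names in the list file with the read names in the FASTQ file directly. The snippet below is a diagnostic sketch only. It assumes that temp_ContaminantReads_1.txt holds one read name per line, that FASTQ records span four lines, and that a read's name is its header up to the first whitespace without the leading '@'. None of these assumptions is confirmed by the output above, and the function name is made up.

```python
import io

def count_names_found(name_lines, fastq_lines):
    """Return (number of distinct listed names, number found in the FASTQ)."""
    wanted = set(line.strip() for line in name_lines if line.strip())
    found = 0
    for line_number, line in enumerate(fastq_lines):
        if line_number % 4 == 0:  # header lines of four-line FASTQ records
            fields = line[1:].split()
            if fields and fields[0] in wanted:
                found += 1
    return len(wanted), found

# Toy data; for real files pass open("temp_ContaminantReads_1.txt") and
# open("temp_reads1trim2.fastq") instead.
names = io.StringIO("read_a/1\nread_b/1\n")
fastq = io.StringIO("@read_a/1\nACGT\n+\nIIII\n@read_c/1\nACGT\n+\nIIII\n")
print(count_names_found(names, fastq))  # (2, 1)
```

If nothing matches, printing the first few names from each file side by side usually shows how the two naming schemes differ.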

Inquiry about Using Shiver Software Output for HIV Drug Resistance Prediction

Hello,

I'm interested in using the output results from the Shiver software for HIV drug resistance prediction. Could you please advise if it's possible to utilize Shiver's output for this purpose? Additionally, how can I convert the output results into a format that is recognizable by the HIVdb, such as a Codon Frequency (CodFreq) format file?

Thank you.

Xiangchen

Shiver for HTLV-1?

Hi Chris,

Do you think Shiver could also be used for HTLV-1?
