andersen-lab / ivar

This project forked from gkarthik/ivar

iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Home Page: https://andersen-lab.github.io/ivar/html/

License: GNU General Public License v3.0

Makefile 1.66% Shell 0.78% M4 0.66% Python 5.95% C++ 90.44% Dockerfile 0.50%

variant-calling amplicon-sequencing

ivar's Introduction

iVar

iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing. Additional tools for metagenomic sequencing are actively being incorporated into iVar. While each of these functions can be accomplished using existing tools, iVar contains an intersection of functionality from multiple tools that are required to call iSNVs and consensus sequences from viral sequencing data across multiple replicates. We implemented the following functions in iVar: (1) trimming of primers and low-quality bases, (2) consensus calling, (3) variant calling - both iSNVs and insertions/deletions, and (4) identifying mismatches to primer sequences and excluding the corresponding reads from alignment files.

An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar

Genome Biology 2019 20:8

Nathan D Grubaugh, Karthik Gangavarapu, Joshua Quick, Nathaniel L Matteson, Jaqueline Goes De Jesus, Bradley J Main, Amanda L Tan, Lauren M Paul, Doug E Brackney, Saran Grewal, Nikos Gurfield, Koen KA Van Rompay, Sharon Isern, Scott F Michael, Lark L Coffey, Nicholas J Loman, Kristian G Andersen

bioRxiv doi: https://doi.org/10.1101/383513

Manual

Manual for iVar is available here.

Installation

Dependencies

HTSlib
GCC, any version after v5.0. Support for the C++11 standard is required.

Note:

It is highly recommended that samtools also be installed alongside iVar. iVar uses the output of samtools mpileup to call variants and generate consensus sequences. In addition, the samtools sort and index commands are very useful for setting up a pipeline using iVar.

Installing via conda

iVar is available on bioconda. To install conda, please use the miniconda package. After installing conda, please add the following channels,

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

To install iVar,

conda install ivar

Installing via homebrew

iVar can be installed using Homebrew.

brew install brewsci/bio/ivar

Installing on Mac

Installing build tools

Xcode from Apple is required to compile iVar (and other tools) from source. If you don't want to install the full Xcode package from the App Store, you can install the Xcode command line tools,

xcode-select --install

GNU Autotools is required to compile iVar from source.

To install Autotools using homebrew please use the command below,

brew install autoconf automake libtool

HTSlib installed using conda

HTSlib can be installed with conda using the command,

conda install -c bioconda htslib

The conda binary is by default installed at /opt/. You can check the installation location by running the following command,

which conda

The output of the command will be in this format - /opt/conda/bin/conda or /opt/anaconda2/bin/conda or /opt/anaconda3/bin/conda depending on whether you installed miniconda or anaconda.

If the output is, for example, /opt/conda/bin/conda, then you can add the path to the lib folder to $LD_LIBRARY_PATH using the command below. You can add this to your ~/.bash_profile or ~/.bashrc to avoid rerunning the command every time a new bash session starts.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/conda/lib

HTSlib installed from source

Installation instructions and downloads for HTSlib can be found at http://www.htslib.org/download/.

If HTSlib is installed in a non-standard location, please add the following to your .bash_profile so that iVar can find the HTSlib dynamic libraries at runtime.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/hts/lib/folder

Installing iVar

To install iVar, run the following commands.

./autogen.sh
./configure
make
make install

If HTSlib was installed using conda, please run the following commands by supplying the prefix to the bin folder of the conda binary.

The prefix to the bin folder can be found using the command which conda. The output of the command will be in this format - /opt/conda/bin/conda or /opt/anaconda2/bin/conda or /opt/anaconda3/bin/conda depending on whether you installed miniconda or anaconda. For example, if the output of the command is /opt/conda/bin/conda, the prefix to the htslib bin folder will be /opt/conda. This can be supplied to ./configure --with-hts=/opt/conda.

./autogen.sh
./configure --with-hts=/prefix/to/bin/folder/with/HTSlib
make
make install
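As a rough illustration of how the prefix is derived from the output of which conda, the Python sketch below strips the trailing bin/conda from the path. The function name hts_prefix_from_conda is hypothetical and is not part of iVar or conda.

```python
from pathlib import PurePosixPath


def hts_prefix_from_conda(conda_binary: str) -> str:
    """Return the install prefix for a conda binary path such as /opt/conda/bin/conda."""
    # The binary lives at <prefix>/bin/conda, so the prefix is two levels up.
    return str(PurePosixPath(conda_binary).parents[1])


if __name__ == "__main__":
    # In practice the argument would be the output of `which conda`.
    prefix = hts_prefix_from_conda("/opt/conda/bin/conda")
    print(f"./configure --with-hts={prefix}")
```

The same two-levels-up rule applies to the /opt/anaconda2 and /opt/anaconda3 layouts mentioned above.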

If HTSlib was installed in a non-standard location, please run the following commands,

./autogen.sh
./configure --with-hts=/prefix/to/bin/folder/with/HTSlib
make
make install

To test the installation, run ivar version. You should get the following output,

iVar version 1.0

Please raise issues and bug reports at https://github.com/andersen-lab/ivar/

Installing on Linux

Installing build tools

GNU Autotools is required to compile iVar from source.

To install Autotools using APT please use the command below,

apt-get install autotools-dev

HTSlib installed using conda

HTSlib can be installed with conda using the command,

conda install -c bioconda htslib

The conda binary is by default installed at /opt/. You can check the installation location by running the following command,

which conda

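The output of the command will be in this format - /opt/conda/bin/conda or /opt/anaconda2/bin/conda or /opt/anaconda3/bin/conda depending on whether you installed miniconda or anaconda.

If the output is, for example, /opt/conda/bin/conda, then you can add the path to the lib folder to $LD_LIBRARY_PATH using the command below. You can add this to your ~/.bash_profile or ~/.bashrc to avoid rerunning the command every time a new bash session starts.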

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/conda/lib

HTSlib installed from source

Installation instructions and downloads for HTSlib can be found at http://www.htslib.org/download/.

If HTSlib is installed in a non-standard location, please add the following to your .bash_profile so that iVar can find the HTSlib dynamic libraries at runtime.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/hts/lib/folder

Installing iVar

To install iVar, run the following commands.

./autogen.sh
./configure
make
make install

If HTSlib was installed using conda, please run the following commands by supplying the prefix to the bin folder of the conda binary.

./autogen.sh
./configure --with-hts=/prefix/to/bin/folder/with/HTSlib
make
make install

If HTSlib was installed in a non-standard location, please run the following commands,

./autogen.sh
./configure --with-hts=/prefix/to/bin/folder/with/HTSlib
make
make install

To test the installation, run ivar version. You should get the following output,

iVar version 1.0

Please raise issues and bug reports at https://github.com/andersen-lab/ivar/

Running from Docker

iVar can also be run via Docker. Pull the docker image from Docker Hub using the following command,

docker pull andersenlabapps/ivar

This docker image contains all the required dependencies to run iVar and the pipelines developed using iVar. You will have to attach a docker volume to get data into the docker container. Instructions to do so are in the Docker docs.

iVar on Docker Hub

Contact

For bug reports please email gkarthik[at]scripps.edu or raise an issue on Github.

Acknowledgements

This work was supported in part by NIH grants U19AI135995, R21AI137690, and UL1TR002550.

ivar's Issues

Multiple primer sequences at same start site

Make ivar primer sequence aware instead of trimming just based on positions. This will help differentiate between reads that originate at the same start site but with different primer sequences. This will allow ivar to handle cases where the virus has very high diversity.

[E::bam_read1] CIGAR and query sequence lengths differ

Describe the bug

samtools sort is failing on the output of ivar trim with v1.2.1 of iVar on Bioconda. This wasn't an issue with v1.2. Apparently, Conda will resolve the environment to contain v1.2.1 even though I had explicitly specified v1.2. Since this environment is automatically built within a Docker container whenever we merge to dev/master, it took a bit of work figuring out why our pipeline CI tests were failing 😅

Caused by:
  Process `IVAR_TRIM (SAMPLE3_SE)` terminated with an error exit status (1)

Command executed:

  samtools view -b -F 4 SAMPLE3_SE.sorted.bam > SAMPLE3_SE.mapped.bam
  samtools index SAMPLE3_SE.mapped.bam

  ivar trim \
      -i SAMPLE3_SE.mapped.bam \
      -e \
      -b nCoV-2019.artic.V1.bed \
      -p SAMPLE3_SE.trim > SAMPLE3_SE.trim.ivar.log

  samtools sort -@ 2 -o SAMPLE3_SE.trim.sorted.bam -T SAMPLE3_SE.trim SAMPLE3_SE.trim.bam
  samtools index SAMPLE3_SE.trim.sorted.bam
  samtools flagstat SAMPLE3_SE.trim.sorted.bam > SAMPLE3_SE.trim.sorted.bam.flagstat
  samtools idxstats SAMPLE3_SE.trim.sorted.bam > SAMPLE3_SE.trim.sorted.bam.idxstats
  samtools stats SAMPLE3_SE.trim.sorted.bam > SAMPLE3_SE.trim.sorted.bam.stats

Command exit status:
  1

Command output:
  (empty)

Command error:
  [E::bam_read1] CIGAR and query sequence lengths differ for M03352:174:000000000-J3R29:1:1119:12450:3784
  samtools sort: truncated file. Aborting

The CIGAR string for that read is indeed astronomically large, and definitely not a product of long-read sequencing!

> samtools view SAMPLE3_SE.trim.bam | grep "M03352:174:000000000-J3R29:1:1119:12450:3784"
M03352:174:000000000-J3R29:1:1119:12450:3784    16      NC_045512.2     4967    22      48S173246631M   *       0       0       CTTCTTTCTTTGAGAGAAGTGAGGACTTGTTCTTACCTTCTTTTCCA      GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC AS:i:54 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:27 YT:Z:UU XA:i:32
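The mismatch that samtools reports can be checked by hand. The sketch below, in Python, sums the CIGAR operations that consume read bases (M, I, S, = and X in the SAM specification) and compares the total with the sequence length. The helper name cigar_query_length is made up for this illustration; it is not an iVar or samtools function.

```python
import re

# CIGAR operations that consume bases of the read (query), per the SAM specification.
QUERY_CONSUMING = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar: str) -> int:
    """Sum the lengths of all CIGAR operations that consume query bases."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_CONSUMING)


if __name__ == "__main__":
    # CIGAR and sequence copied from the record shown above.
    cigar = "48S173246631M"
    seq = "CTTCTTTCTTTGAGAGAAGTGAGGACTTGTTCTTACCTTCTTTTCCA"
    print("bases implied by CIGAR:", cigar_query_length(cigar))
    print("bases in SEQ:", len(seq))
```

A valid record needs both numbers to be equal, which is clearly not the case for this read.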

To Reproduce
Steps to reproduce the behavior:

All files required to reproduce the error are in the attached tar archive. Please see command.sh for the commands used.
ivar_bug.tar.gz

Typo


           -t    Minimum fration of files required to contain the same variant. Specify value within [0,1]. (Default: 1)

fration ⚔️

is ivar getmasked broken at the current version?

Hi,
I just tested ivar getmasked at versions 1.2.2 and 1.2.1 and the tool seems completely broken!
Instead of reporting primer pairs, it simply lists primers with mismatches.

I checked against version 1.0.1 and that one works as expected. Seems something bad has happened to the tool in between?

Can you confirm that observation?

Document altered primer trimming rules for PE data in v1.2.2

@gkarthik after noticing drastic changes in the number of trimmed primer seqs between ivar 1.2.1 and 1.2.2, I came across your comment in #42 (comment) stating that version 1.2.2 is now taking primer strand information into account.

Given how big the impact of this change seems to be, I would say it deserves some proper documentation. At least it should be highlighted in the release notes.

Also, just to make sure I understand the motivation behind this change: the idea is that you want to trim the primers from reads derived from the corresponding amplicon, but not from reads derived from overlapping (second-pool) amplicons that happen to end at those primer binding sites?
So a substantially reduced number of trimmed primer sequences upon a version upgrade (a bit alarming at first glance) should actually be considered an improvement?

Generate diversity metrics

ivar can accept a .gtf file to identify CDS and then generate diversity metrics.

Suggestion by: Dr. James Weger (Lab website)

Print out minimum depth

When running ivar consensus with the minimum depth option -m and the threshold (-t), the output to stdout does not provide the minimum depth.

samtools mpileup -B -A -d 1000000 -Q 0 input.bam | ivar consensus -p output_consensus -n N -t 0.6 -m 10
[mpileup] 1 samples in 1 input files
Minimum Quality: 20
Threshold: 0.6
Minimum depth: 

Regions with depth less than minimum depth covered by: N

The program does filter on depth, so it is working properly, but at first I thought it was not filtering on depth because the value is missing from stdout.

Combine consensus and variants

It is inefficient to run both of these, when they do nearly the same thing, and both use the same mpileup.

Can you merge them?

Invalid subcommands aren't reported clearly

As you can see, I made a typo in the subcommand.
I would have expected to see an error such as: Unknown command 'consenus'
Eventually I figured it out and posted this :)

% ivar consenus

Usage:  ivar [command <trim|variants|filtervariants|consensus|getmasked|removereads|version|help>]

        Command       Description
           trim       Trim reads in aligned BAM file
       variants       Call variants from aligned BAM file
 filtervariants       Filter variants across replicates or samples
      consensus       Call consensus from aligned BAM file
      getmasked       Detect primer mismatches and get primer indices for the amplicon to be masked
    removereads       Remove reads from trimmed BAM file
        version       Show version information

To view detailed usage for each command type `ivar <command>`

Insertions called as "+N" instead of the corresponding nucleotide

I observed that some insertions in .tsv files (called as "+T" in this example), in consensus files appear as "+N". This happens even if providing the reference genome to mpileup. The command line I used with ivar 1.2:

samtools mpileup -aa -A --reference Sars-Cov-2_REFERENCE.fasta -d 0 -B -Q 0 COV000517.trim.sort.bam | ivar consensus -p COV000517 -q 20 -t 0.7 -m 30 -n N

I thought that maybe it could be related to low depths, so I repeated using - as low-depth character but again, a +N appeared.

I thought that maybe it could be related to quality filters, so I repeated setting -q 0 and then something even funnier happened: this and other indels disappeared.
I attach an image of the consensus genomes obtained with the different options used:

The .tsv file for that position using our ivar variants parameters (-q 20 -t 0.05 -m 20):

MN908947.3      9604    A       +T      2608    973     67      1313    0       20      0.277766        4727

And for (-q 0 -t 0):

MN908947.3      9604    A       T       4718    2355    37      8       4       36      0.00169241      4727
MN908947.3      9604    A       C       4718    2355    37      1       1       0       0.000211551     4727
MN908947.3      9604    A       +T      4718    2355    37      1313    0       0       0.277766        4727

-Werror=maybe-uninitialized errors in `conda build` (GCC 7.3) of test_unpaired_trim.cpp

conda build currently uses GCC 7.3.0. This causes an error when running make check, as follows:

test_unpaired_trim.cpp: In function 'int main()':
test_unpaired_trim.cpp:31:10: error: 't.cigar_::nlength' may be used uninitialized in this function [-Werror=maybe-uninitialized]
   cigar_ t;
          ^
test_unpaired_trim.cpp:128:9: error: 't.cigar_::cigar' may be used uninitialized in this function [-Werror=maybe-uninitialized]
       t = condense_cigar(t.cigar, t.nlength);
       ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I think this is because t is assigned a value inside a conditional (e.g. at line 81) but used outside that conditional (e.g. at line 113 and elsewhere).

Update version number to 1.0.1

ivar version
iVar version 1.0

Please raise issues and bug reports at https://github.com/andersen-lab/ivar/

Read artic "BED" format

iVar seg faults (!!) when reading artic BED format.

Solution: Add artic BED format reader. Refer to artic-network/artic-ncov2019#12

missing #include <algorithm>

I had to add
#include <algorithm>
to primer_bed.h
in order to avoid the compilation error:
‘all_of’ is not a member of ‘std’

ivar 1.1 is now in homebrew

brew install brewsci/bio/ivar

feel free to add this to the docs for installing

Strand bias filter.

ivar already stores forward and reverse allele counts. Compute Fisher's exact test on them.

Suggested by @mpauthner
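A minimal sketch of what such a filter could compute, written in Python with only the standard library. It is not iVar code: the function name, the argument order, and the choice of a two-sided test are assumptions made for illustration. The 2x2 table is reference/alternate allele by forward/reverse strand.

```python
from math import exp, lgamma


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Two-sided Fisher's exact test p-value for a 2x2 allele-by-strand table."""
    row1 = ref_fwd + ref_rev           # reads supporting the reference allele
    col1 = ref_fwd + alt_fwd           # reads on the forward strand
    total = row1 + alt_fwd + alt_rev
    log_denom = _log_comb(total, col1)

    def log_pmf(k: int) -> float:
        # Hypergeometric probability of k forward reference reads given the margins.
        return _log_comb(row1, k) + _log_comb(total - row1, col1 - k) - log_denom

    lowest = max(0, col1 - (total - row1))
    highest = min(row1, col1)
    observed = log_pmf(ref_fwd)
    # Sum every table that is no more likely than the observed one.
    p_value = sum(
        exp(log_pmf(k))
        for k in range(lowest, highest + 1)
        if log_pmf(k) <= observed + 1e-9
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: the alternate allele is seen only on the forward strand.
    print(fisher_exact_two_sided(ref_fwd=40, ref_rev=38, alt_fwd=25, alt_rev=0))
```

A small p-value indicates that the alternate allele is distributed across strands differently from the reference allele. Which cutoff to apply is left open here.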

ivar not trimming overlapping sets of primers correctly

Hi,
in #49 (comment) the intended behavior of ivar trim for reads amplified using overlapping sets of primers is described as:

Starting with v1.2, ivar handles overlapping primers by taking minimum of start pos of primers for reverse reads and maximum of end position of primers for forward reads. So it ensures that in case of overlapping primers, the innermost sequence is taken.
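The rule quoted above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the iVar implementation. The Primer type and the function name are hypothetical, and the coordinates are example ARTIC V3 entries for amplicon 14.

```python
from typing import List, NamedTuple


class Primer(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" for forward (LEFT) primers, "-" for reverse (RIGHT) primers


def innermost_trim_boundary(primers: List[Primer], strand: str) -> int:
    """Return the trim boundary for a set of overlapping primers on one strand."""
    same_strand = [p for p in primers if p.strand == strand]
    if not same_strand:
        raise ValueError(f"no primers on strand {strand!r}")
    if strand == "+":
        # Forward reads: take the maximum of the primer end positions.
        return max(p.end for p in same_strand)
    # Reverse reads: take the minimum of the primer start positions.
    return min(p.start for p in same_strand)


if __name__ == "__main__":
    primers = [
        Primer("nCoV-2019_14_LEFT", 4054, 4077, "+"),
        Primer("nCoV-2019_14_LEFT_alt4", 4044, 4068, "+"),
        Primer("nCoV-2019_14_RIGHT", 4428, 4450, "-"),
        Primer("nCoV-2019_14_RIGHT_alt2", 4402, 4424, "-"),
    ]
    print(innermost_trim_boundary(primers, "+"), innermost_trim_boundary(primers, "-"))
```

Under this reading, a read starting at either LEFT primer would be trimmed up to the larger of the two end coordinates, so only the innermost sequence remains.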

I don't think this is what happens with version 1.2.2, though. Instead, it seems that only the outer primer sequence gets trimmed. Here's a screenshot of an alignment after trimming for illustration (soft-clipped bases are highlighted to show the trimmed parts of each read):

As you can see reads that have the 18_LEFT primer sequence trimmed never show trimming of 18_LEFT_alt2.

In fact, when running ivar trim a second time on the already trimmed BAM, these are the stats it outputs:

Results: 
Primer Name	Read Count
nCoV-2019_1_LEFT	0
nCoV-2019_1_RIGHT	0
nCoV-2019_2_LEFT	0
nCoV-2019_2_RIGHT	0
nCoV-2019_3_LEFT	0
nCoV-2019_3_RIGHT	0
nCoV-2019_4_LEFT	0
nCoV-2019_4_RIGHT	0
nCoV-2019_5_LEFT	0
nCoV-2019_5_RIGHT	0
nCoV-2019_6_LEFT	0
nCoV-2019_6_RIGHT	0
nCoV-2019_7_LEFT	195
nCoV-2019_7_LEFT_alt0	0
nCoV-2019_7_RIGHT	0
nCoV-2019_7_RIGHT_alt5	6
nCoV-2019_8_LEFT	0
nCoV-2019_8_RIGHT	0
nCoV-2019_9_LEFT	0
nCoV-2019_9_LEFT_alt4	0
nCoV-2019_9_RIGHT	0
nCoV-2019_9_RIGHT_alt2	4
nCoV-2019_10_LEFT	0
nCoV-2019_10_RIGHT	0
nCoV-2019_11_LEFT	0
nCoV-2019_11_RIGHT	0
nCoV-2019_12_LEFT	0
nCoV-2019_12_RIGHT	0
nCoV-2019_13_LEFT	0
nCoV-2019_13_RIGHT	0
nCoV-2019_14_LEFT	324
nCoV-2019_14_LEFT_alt4	0
nCoV-2019_14_RIGHT	0
nCoV-2019_14_RIGHT_alt2	0
nCoV-2019_15_LEFT	0
nCoV-2019_15_LEFT_alt1	1
nCoV-2019_15_RIGHT	0
nCoV-2019_15_RIGHT_alt3	16
nCoV-2019_16_LEFT	0
nCoV-2019_16_RIGHT	0
nCoV-2019_17_LEFT	0
nCoV-2019_17_RIGHT	0
nCoV-2019_18_LEFT	0
nCoV-2019_18_LEFT_alt2	42
nCoV-2019_18_RIGHT	0
nCoV-2019_18_RIGHT_alt1	0
nCoV-2019_19_LEFT	0
nCoV-2019_19_RIGHT	0
nCoV-2019_20_LEFT	0
nCoV-2019_20_RIGHT	0
nCoV-2019_21_LEFT	0
nCoV-2019_21_LEFT_alt2	0
nCoV-2019_21_RIGHT	0
nCoV-2019_21_RIGHT_alt0	15
nCoV-2019_22_LEFT	0
nCoV-2019_22_RIGHT	0
nCoV-2019_23_LEFT	0
nCoV-2019_23_RIGHT	0
nCoV-2019_24_LEFT	0
nCoV-2019_24_RIGHT	0
nCoV-2019_25_LEFT	0
nCoV-2019_25_RIGHT	0
nCoV-2019_26_LEFT	0
nCoV-2019_26_RIGHT	0
nCoV-2019_27_LEFT	0
nCoV-2019_27_RIGHT	0
nCoV-2019_28_LEFT	0
nCoV-2019_28_RIGHT	0
nCoV-2019_29_LEFT	0
nCoV-2019_29_RIGHT	0
nCoV-2019_30_LEFT	0
nCoV-2019_30_RIGHT	0
nCoV-2019_31_LEFT	0
nCoV-2019_31_RIGHT	0
nCoV-2019_32_LEFT	0
nCoV-2019_32_RIGHT	0
nCoV-2019_33_LEFT	0
nCoV-2019_33_RIGHT	0
nCoV-2019_34_LEFT	0
nCoV-2019_34_RIGHT	0
nCoV-2019_35_LEFT	0
nCoV-2019_35_RIGHT	0
nCoV-2019_36_LEFT	0
nCoV-2019_36_RIGHT	0
nCoV-2019_37_LEFT	0
nCoV-2019_37_RIGHT	0
nCoV-2019_38_LEFT	0
nCoV-2019_38_RIGHT	0
nCoV-2019_39_LEFT	0
nCoV-2019_39_RIGHT	0
nCoV-2019_40_LEFT	0
nCoV-2019_40_RIGHT	0
nCoV-2019_41_LEFT	0
nCoV-2019_41_RIGHT	0
nCoV-2019_42_LEFT	0
nCoV-2019_42_RIGHT	0
nCoV-2019_43_LEFT	0
nCoV-2019_43_RIGHT	0
nCoV-2019_44_LEFT	0
nCoV-2019_44_LEFT_alt3	27
nCoV-2019_44_RIGHT	0
nCoV-2019_44_RIGHT_alt0	877
nCoV-2019_45_LEFT	713
nCoV-2019_45_LEFT_alt2	0
nCoV-2019_45_RIGHT	0
nCoV-2019_45_RIGHT_alt7	86
nCoV-2019_46_LEFT	0
nCoV-2019_46_LEFT_alt1	0
nCoV-2019_46_RIGHT	0
nCoV-2019_46_RIGHT_alt2	0
nCoV-2019_47_LEFT	0
nCoV-2019_47_RIGHT	0
nCoV-2019_48_LEFT	0
nCoV-2019_48_RIGHT	0
nCoV-2019_49_LEFT	0
nCoV-2019_49_RIGHT	0
nCoV-2019_50_LEFT	0
nCoV-2019_50_RIGHT	0
nCoV-2019_51_LEFT	0
nCoV-2019_51_RIGHT	0
nCoV-2019_52_LEFT	0
nCoV-2019_52_RIGHT	0
nCoV-2019_53_LEFT	0
nCoV-2019_53_RIGHT	0
nCoV-2019_54_LEFT	0
nCoV-2019_54_RIGHT	0
nCoV-2019_55_LEFT	0
nCoV-2019_55_RIGHT	0
nCoV-2019_56_LEFT	0
nCoV-2019_56_RIGHT	0
nCoV-2019_57_LEFT	0
nCoV-2019_57_RIGHT	0
nCoV-2019_58_LEFT	0
nCoV-2019_58_RIGHT	0
nCoV-2019_59_LEFT	0
nCoV-2019_59_RIGHT	0
nCoV-2019_60_LEFT	0
nCoV-2019_60_RIGHT	0
nCoV-2019_61_LEFT	0
nCoV-2019_61_RIGHT	0
nCoV-2019_62_LEFT	0
nCoV-2019_62_RIGHT	0
nCoV-2019_63_LEFT	0
nCoV-2019_63_RIGHT	0
nCoV-2019_64_LEFT	0
nCoV-2019_64_RIGHT	0
nCoV-2019_65_LEFT	0
nCoV-2019_65_RIGHT	0
nCoV-2019_66_LEFT	0
nCoV-2019_66_RIGHT	0
nCoV-2019_67_LEFT	0
nCoV-2019_67_RIGHT	0
nCoV-2019_68_LEFT	0
nCoV-2019_68_RIGHT	0
nCoV-2019_69_LEFT	0
nCoV-2019_69_RIGHT	0
nCoV-2019_70_LEFT	0
nCoV-2019_70_RIGHT	0
nCoV-2019_71_LEFT	0
nCoV-2019_71_RIGHT	0
nCoV-2019_72_LEFT	0
nCoV-2019_72_RIGHT	0
nCoV-2019_73_LEFT	0
nCoV-2019_73_RIGHT	0
nCoV-2019_74_LEFT	0
nCoV-2019_74_RIGHT	0
nCoV-2019_75_LEFT	0
nCoV-2019_75_RIGHT	0
nCoV-2019_76_LEFT	0
nCoV-2019_76_LEFT_alt3	1
nCoV-2019_76_RIGHT	0
nCoV-2019_76_RIGHT_alt0	69
nCoV-2019_77_LEFT	0
nCoV-2019_77_RIGHT	0
nCoV-2019_78_LEFT	0
nCoV-2019_78_RIGHT	0
nCoV-2019_79_LEFT	0
nCoV-2019_79_RIGHT	0
nCoV-2019_80_LEFT	0
nCoV-2019_80_RIGHT	0
nCoV-2019_81_LEFT	0
nCoV-2019_81_RIGHT	0
nCoV-2019_82_LEFT	0
nCoV-2019_82_RIGHT	0
nCoV-2019_83_LEFT	0
nCoV-2019_83_RIGHT	0
nCoV-2019_84_LEFT	0
nCoV-2019_84_RIGHT	0
nCoV-2019_85_LEFT	0
nCoV-2019_85_RIGHT	0
nCoV-2019_86_LEFT	0
nCoV-2019_86_RIGHT	0
nCoV-2019_87_LEFT	0
nCoV-2019_87_RIGHT	0
nCoV-2019_88_LEFT	0
nCoV-2019_88_RIGHT	0
nCoV-2019_89_LEFT	0
nCoV-2019_89_LEFT_alt2	5
nCoV-2019_89_RIGHT	0
nCoV-2019_89_RIGHT_alt4	211
nCoV-2019_90_LEFT	0
nCoV-2019_90_RIGHT	0
nCoV-2019_91_LEFT	0
nCoV-2019_91_RIGHT	0
nCoV-2019_92_LEFT	0
nCoV-2019_92_RIGHT	0
nCoV-2019_93_LEFT	0
nCoV-2019_93_RIGHT	0
nCoV-2019_94_LEFT	0
nCoV-2019_94_RIGHT	0
nCoV-2019_95_LEFT	0
nCoV-2019_95_RIGHT	0
nCoV-2019_96_LEFT	0
nCoV-2019_96_RIGHT	0
nCoV-2019_97_LEFT	0
nCoV-2019_97_RIGHT	0
nCoV-2019_98_LEFT	0
nCoV-2019_98_RIGHT	0

Clearly, further trimming occurs, and occurs exclusively for amplicons with complex primer sets.

terminate called after throwing an instance of 'std::invalid_argument'

using 1.1 and htslib 1.10.2

% ivar trim -i ivar.bam -b nCoV-2019.scheme.bed -p out

terminate called after throwing an instance of 'std::invalid_argument'
 what():  stoi
Aborted (core dumped)

Maybe it's the BED:

% cut -f1-3 nCoV-2019.scheme.bed > new.bed

ivar trim -i ivar.bam -b nww.bed -p out
Number of references: 1
Reference Name: MN908947
Reference Length: 29903
Using Region: MN908947
Sorted By Coordinate
Sorted By Query Name
Segmentation fault (core dumped)

Any ideas?

Compilation warnings

I'm getting a lot of compilation warnings (with g++ 8.1.0) mostly concerning comparison of signed with unsigned variables, along with unused variables.

If I were to add -Werror to CXXFLAGS and fix the warnings, would you consider a pull request?

feeding ivar pre-stitched PE reads

@mcroxen and I were wondering:
if you pre-stitch your PE reads (e.g. with FLASH, bbmerge.sh, or fastp --merge),
then align these longer SE reads and feed them to ivar,
will it still work in terms of primer trimming etc.?

-m 1 and code m=0 do not match


  g_args.min_depth = 0;

           -m    Minimum depth to call consensus(Default: 1)

Trim all primers in cases with overlapping primer sequences

Artic V3 scheme has overlapping primers. In these cases iVar will trim the first primer it encounters.

Change behavior to allow trimming of all primers at positions of overlapping primers.

Export variants to VCF format

ivar variants outputs results only in custom .tsv format so unusable with other tools.

Allow option to export variants in VCF v4.3 format.
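To make the request concrete, here is a hedged Python sketch of how a single tsv record might map onto a VCF data line. It assumes that an insertion "+T" or a deletion "-T" refers to the bases immediately following the reference base at POS. That reading of the tsv format is an assumption, and the function name is hypothetical.

```python
def tsv_variant_to_vcf_line(region: str, pos: int, ref: str, alt: str) -> str:
    """Convert one (REGION, POS, REF, ALT) record to a minimal VCF data line."""
    if alt.startswith("+"):
        # Insertion: the inserted bases follow the reference base at POS.
        vcf_ref, vcf_alt = ref, ref + alt[1:]
    elif alt.startswith("-"):
        # Deletion (assumed): the deleted bases follow the reference base at POS.
        vcf_ref, vcf_alt = ref + alt[1:], ref
    else:
        vcf_ref, vcf_alt = ref, alt
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO, with "." for missing values.
    return "\t".join([region, str(pos), ".", vcf_ref, vcf_alt, ".", ".", "."])


if __name__ == "__main__":
    print(tsv_variant_to_vcf_line("MN908947.3", 9604, "A", "+T"))
```

A complete exporter would also need the VCF header lines and would have to decide how depth and frequency fields map to INFO or FORMAT tags, which this sketch leaves out.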

missing data file for tests

tests/check_primer_trim fails because we're missing the data file
"../data/test.unmapped.sorted.bam"

Indels that do not meet minimum requirements are called in consensus

Describe the bug
Hi, I think iVar (v1.2) is adding indels to the consensus that do not meet the requirements set by the user.

In a sample with an insertion in the consensus genome, that insertion has a frequency of 52% in the tsv file. However, I have set a minimum frequency of 70% for consensus calling.

This happened for several other samples.

However not all insertions are added to the consensus.
For example, in another sample I saw the expected behavior. That sample had many indels in the tsv file, with the PASS flag, but only one, with 82% freq (thus passing the 70% threshold) was included in the consensus.

Document handling of overlapping primer pairs

I tried to find best-practice instructions for dealing with overlapping primer pairs like the ones found in the Artic V3 primer scheme for SARS-CoV-2, but couldn't really work it out.

In a case like this:

MN908947.3	4054	4077	nCoV-2019_14_LEFT	60	+
MN908947.3	4044	4068	nCoV-2019_14_LEFT_alt4	60	+
MN908947.3	4428	4450	nCoV-2019_14_RIGHT	60	-
MN908947.3	4402	4424	nCoV-2019_14_RIGHT_alt2	60	-

which primers will the current version of ivar (1.2.2) trim? All of them? The first it encounters in the BED file? The inner/outermost pair?
I found #18, but I don't understand how a change in the ivar tests (f274d3a) is supposed to affect this.

Similarly for ivar getmasked and ivar removereads: how should a set of primers like the above be translated into a proper tsv of primer pairs? My feeling would be that you want to remove reads for which binding of any of the primers involved in their generation was affected by a variant, but what would be the closest approximation of expressing this in terms of primer pairs?

Any hints would be appreciated ❤️

[warning] samtools mpileup option `F` is functional, but deprecated. Please switch to using bcftools mpileup in future.

Following the manual, these warnings are seen:

samtools mpileup -A -d 300000 -Q 0 -F 0 ivar.sorted.bam | ivar consensus -p ivar -m 1
[warning] samtools mpileup option `F` is functional, but deprecated. Please switch to using bcftools mpileup in future.

Deciding on overlapping variants for consensus

Testing out iVar with some tougher ONT samples, I've run into an issue with overlapping annotations. Here is the ivar variants output for this region:

When using ivar consensus to apply the consensus, the one-base deletion is applied at thresholds 0, 0.2, and 0.5, and an 'N' is applied at thresholds 0.9 and 1. Despite the DP information, I know that the deletion is an ONT sequencing artifact (the G->T mutation creates a long string of Ts), and that there is a true SNV there.

How are overlapping indels decided upon as the "winning" variant? Can this be adjusted, particularly for ONT variants like this?

Also, thanks for all your hard work on this tool!

GFF file downloaded from NCBI not recognized as GFF3 format

I was using ivar to call variants in a bam file and used a GFF file downloaded from NCBI for feature information.

My command (with my sample BAM, using MN908947.3 as the reference with the corresponding GFF file from NCBI):

samtools mpileup -A -d 600000 -B -Q 0 sample.primertrim.sorted.bam | ivar variants -p sample.variants -q 20 -t 0 -r MN908947.reference.fasta -g GCF_009858895.2_ASM985889v3_genomic.gff

The command executed, but I got the following output:

Tue Mar 31 16:44:33 MDT 2020
[mpileup] 1 samples in 1 input files
GFF file is not in GFF3 file format!
GFF file is not in GFF3 file format!
GFF file is not in GFF3 file format!
GFF file is not in GFF3 file format!

OS: Centos7
ivar version: 1.1 (conda install)

filtervariants -t 1 default?

-t    Minimum fration of files required to contain the same variant. Specify value within [0,1]. (Default: 1)

Does this mean you only keep a variant if it is in ALL (100%) of files?
If it's the same in all then it's not a variant?
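One way to read the help text is sketched below in Python. Each input file is treated as a set of variants, and a variant is kept when the fraction of files containing it is at least the threshold. This is an interpretation of the option description, not iVar's code, and the names are made up for the example.

```python
from collections import Counter
from typing import List, Set, Tuple

Variant = Tuple[int, str, str]  # (position, reference allele, alternate allele)


def variants_in_min_fraction(files: List[Set[Variant]], min_fraction: float = 1.0) -> Set[Variant]:
    """Keep variants present in at least `min_fraction` of the input files."""
    if not 0.0 <= min_fraction <= 1.0:
        raise ValueError("min_fraction must be within [0, 1]")
    if not files:
        return set()
    # Count in how many files each variant occurs.
    counts = Counter(variant for variants in files for variant in variants)
    return {variant for variant, n in counts.items() if n / len(files) >= min_fraction}


if __name__ == "__main__":
    # Hypothetical variant calls from two replicates of the same sample.
    replicate_a = {(241, "C", "T"), (3037, "C", "T")}
    replicate_b = {(241, "C", "T")}
    print(variants_in_min_fraction([replicate_a, replicate_b], 1.0))
    print(variants_in_min_fraction([replicate_a, replicate_b], 0.5))
```

Under this reading, a variant found in every file is still a variant relative to the reference. With a threshold of 1 it is simply one that every replicate agrees on.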

Refactor awk script into C++

Refactor get_common_variants.sh.

Two reasons:

Not cool.
Package into a single binary
Causes issues if mawk implementation of awk is used - issue

ivar variants -m 0 default?

Is this correct?
(-m 0 doesn't make sense, but maybe it means no min depth)

ivar variants
           -m    Minimum read depth to call variants (Default: 0)

ivar trim accept stdin ?

ideally one should be able to do this:

read_aligner ref R1.fq R2.fq | ivar trim - - | samtools sort | something > answer.out

can ivar trim -i /dev/stdin work or is random access needed?
i think a raw SAM (and the REF + BED) has all the info you need to trim appropriately

REF_DP + ALT_DP != TOTAL_DP

I've noticed that in the tsv variant file REF_DP + ALT_DP is usually not equal to TOTAL_DP, and sometimes is quite a bit off. For example, in the attached tsv file at position 8376
TOTAL_DP = 1527
REF_DP + ALT_DP = 2621
ALT_FREQ = 0.717092
ALT_DP / TOTAL_DP = 0.717092
ALT_DP / (REF_DP + ALT_DP) = 0.417779

variants.txt
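The reported numbers can be cross-checked with a little arithmetic, shown here as a Python sketch. ALT_DP and REF_DP are not quoted in the report. They are derived below on the assumption that ALT_FREQ equals ALT_DP / TOTAL_DP, as the report itself indicates. The function name is hypothetical.

```python
from typing import Tuple


def implied_depths(total_dp: int, ref_plus_alt_dp: int, alt_freq: float) -> Tuple[int, int]:
    """Recover (ALT_DP, REF_DP) from the three values quoted in the report."""
    # Assumption: ALT_FREQ was computed as ALT_DP / TOTAL_DP.
    alt_dp = round(alt_freq * total_dp)
    ref_dp = ref_plus_alt_dp - alt_dp
    return alt_dp, ref_dp


if __name__ == "__main__":
    alt_dp, ref_dp = implied_depths(total_dp=1527, ref_plus_alt_dp=2621, alt_freq=0.717092)
    print("ALT_DP:", alt_dp, "REF_DP:", ref_dp)
    print("ALT_DP / TOTAL_DP:", round(alt_dp / 1527, 6))
    print("ALT_DP / (REF_DP + ALT_DP):", round(alt_dp / 2621, 6))
```

The implied REF_DP (1526) is almost as large as TOTAL_DP (1527) on its own, so at this position the reference and alternate depths cannot be disjoint subsets of the total.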

Link to manual is not obvious

For quite a while i thought this tool had no documentation.

But then I realised the word Manual in the README.md was not a subheading but a link to a wonderful set of detailed information!

Would it be possible to make this more prominent?
Maybe a sentence under the word "Manual" saying where all the docs are?

Isolate quality trimming

For item 1 in broadinstitute/viral-pipelines#28

Minimum read threshold for variant calling

Suggested by @mpauthner

Corrupt primer-trimmed bam file

Describe the bug
ivar trim creates a corrupt primer-trimmed bam file:

samtools view 178Ct21-C74045848_S178.primertrim.sorted.bam
[E::sam_format1] Corrupted aux data for read M02568:512:000000000-J52NT:1:1103:25289:20956
samtools view: writing to standard output failed: Invalid argument

To Reproduce
Command to generate the bam file:
ivar trim -i $sampleID.sorted.bam -b /seq/Development/Projects/covid/References/ARTIC-1k-primer.bed -p $sampleID.primertrim
samtools sort -@ 8 $sampleID.primertrim.bam -o $sampleID.primertrim.sorted.bam

Expected behavior
samtools view should behave normally

Desktop (please complete the following information):
Redhat linux 7

uname -a
Linux sjlsd0018.us.qdx.com 3.10.0-1062.7.1.el7.x86_64 #1 SMP Wed Nov 13 08:44:42 EST 2019 x86_64 x86_64 x86_64 GNU/Linux

"Could not retrieve index file" but still works?

I did not index prim.bam and got the error, but it all still worked?
Does it need to be indexed?
Related to #28 stdin.

ivar  trim -i prim.bam -b ARTIC-V1.bed -p prim-ivar
Found 196 primers in BED file
[E::idx_find_and_load] Could not retrieve index file for 'prim.bam'

Number of references in file: 1
MN908947.3
Using Region: MN908947.3

Found 1178203 mapped reads
Found 5122 unmapped reads
Sorted By Coordinate
-------
Processed 10% reads ... 
Processed 20% reads ... 
Processed 30% reads ... 
Processed 40% reads ... 
Processed 50% reads ...

ivar trim to stdout ?

currently need to do ivar trim -i in.bam -p out to get out.bam
it seems to only write 1 file, the bam?
can you support stdout please so I can pipe it to the next stage?

eg. if no -p do this?

ivar trim -i in.bam | samtools sort -n | samtools fastq > reads.fq

Allow naming of consensus ID

% ivar consensus -i myseqID  -p out ...

% head -n 1 out.fa
>myseqID
ATGCTATGCGATATATAT
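Until such an option exists, the header can be rewritten after the fact. The Python sketch below is a workaround under the assumption that the consensus FASTA contains a single record. The function name and the placeholder header are hypothetical.

```python
def rename_first_fasta_record(fasta_text: str, new_id: str) -> str:
    """Replace the header of the first FASTA record with `new_id`."""
    lines = fasta_text.splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not start with a FASTA header")
    lines[0] = ">" + new_id
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # ">old_header" is a placeholder, not the header iVar actually writes.
    original = ">old_header\nATGCTATGCGATATATAT\n"
    print(rename_first_fasta_record(original, "myseqID"), end="")
```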

Check BED format

When reading a BED file whose format is off, iVar throws an error and exits. Refer to #7

Solution: Add format check for BED file and show useful error message.
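A format check of this kind might look like the Python sketch below. It assumes a six-column, tab-separated primer BED (chrom, start, end, name, score, strand), as in the ARTIC files. The column count, the error wording, and the function name are all assumptions for illustration, not iVar's behavior.

```python
from typing import Tuple


def parse_primer_bed_line(line: str, line_number: int) -> Tuple[str, int, int, str, str]:
    """Validate one primer BED line and return (chrom, start, end, name, strand)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(
            f"line {line_number}: expected 6 tab-separated columns, found {len(fields)}"
        )
    chrom, start_text, end_text, name, _score, strand = fields[:6]
    try:
        start, end = int(start_text), int(end_text)
    except ValueError:
        raise ValueError(f"line {line_number}: start and end must be integers") from None
    if start < 0 or end <= start:
        raise ValueError(f"line {line_number}: need 0 <= start < end")
    if strand not in ("+", "-"):
        raise ValueError(f"line {line_number}: strand must be '+' or '-'")
    return chrom, start, end, name, strand


if __name__ == "__main__":
    print(parse_primer_bed_line("MN908947.3\t4054\t4077\tnCoV-2019_14_LEFT\t60\t+", 1))
```

Reporting the line number and the offending column gives the user something to act on, in contrast to an uncaught exception from a string-to-integer conversion.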

Document indel format for ivar variants

Indels are shown as [+-][ATGC]+. Make this format clear in the ivar variants docs.
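As an illustration of that pattern, the Python sketch below classifies an ALT value with the same regular expression. The function name and the labels it returns are made up for this example.

```python
import re
from typing import Tuple

# A sign followed by one or more bases, as described above.
INDEL = re.compile(r"^([+-])([ATGC]+)$")


def classify_alt(alt: str) -> Tuple[str, str]:
    """Return (kind, bases) for an ALT value from the variants tsv."""
    match = INDEL.match(alt)
    if match is None:
        # Anything that does not match the indel pattern is reported as-is.
        return ("substitution", alt)
    kind = "insertion" if match.group(1) == "+" else "deletion"
    return (kind, match.group(2))


if __name__ == "__main__":
    for alt in ["+T", "-AT", "C"]:
        print(alt, classify_alt(alt))
```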

consensus: "Minimum depth:" value never printed

It isn't printing the -m 1 value I provided?

samtools mpileup -A -d 300000 -Q 0 -F 0 ivar.sorted.bam | ivar consensus -p ivar -m 1

Minimum Quality: 20
Threshold: 0
Minimum depth: 
Regions with depth less than minimum depth covered by: -

Read alignment off by one after primer trimming

Describe the bug

Some read alignments are shifted after ivar trim (v1.2) compared to untrimmed read alignments from Minimap2 (v2.17-r941) and NGMLR (v0.2.7).

To Reproduce

I aligned the Nanopore reads to the reference genome (MN908947.3) with Minimap2 (and NGMLR), then trimmed primer sequences and low-quality bases:

$ minimap2 -ax map-ont -t16 ref.fa reads.fq | samtools sort -@16 | samtools view -F4 -b -o reads.sort.bam
$ ivar trim -b artic-ncov2019/primer_schemes/nCoV-2019/V2/nCoV-2019.bed -p reads.trim -i reads.sort.bam -q 1 -m 30 -s 4

Note: for some reason many of the reads had very low average quality scores for this run and we were mostly interested in trimming the primer sequences.

I viewed and compared alignments in IGV (v2.8.2 installed from BioConda) and noticed some big differences between the untrimmed and trimmed alignments of some reads in certain regions.

Top alignment is untrimmed Minimap2 and bottom is iVar trimmed alignment:

For example, here's the same read at position MN908947.3:29,383 where in the original alignment the base is G (QV=27) and in the trimmed alignment the base is A (QV=11):

Before ivar trim read alignment information in IGV

Read name = 04853829-6653-42bb-b303-351fbd160119
Read length = 515bp
----------------------
Mapping = Primary @ MAPQ 60
Reference span = MN908947.3:29,363-29,646 (+) = 284bp
Cigar = 89S15M1D11M1I3M1D20M1I14M5I3M2I2M2I18M1I33M1D7M1I7M2I12M1I28M2I8M5I3M1D12M2I3M4I12M1I6M2D5M1D23M1D31M120S
Clipping = Left 89 soft; Right 120 soft
----------------------
s1 = 79
s2 = 0
NM = 51
AS = 314
de = 0.1145
rl = 0
cm = 9
nn = 0
tp = P
ms = 314
Location = MN908947.3:29,383
Base = G @ QV 27

After ivar trim

Read name = 04853829-6653-42bb-b303-351fbd160119
Read length = 515bp
----------------------
Mapping = Primary @ MAPQ 60
Reference span = MN908947.3:29,380-29,647 (+) = 268bp
Cigar = 104S11M1I3M1D20M1I14M5I3M2I2M2I18M1I33M1D7M1I7M2I12M1I28M2I8M5I3M1D12M2I3M4I12M1I6M2D5M1D23M1D31M120S
Clipping = Left 104 soft; Right 120 soft
----------------------
s1 = 79
s2 = 0
NM = 51
AS = 314
de = 0.1145
rl = 0
cm = 9
nn = 0
tp = P
ms = 314
Hidden tags: XA
Location = MN908947.3:29,383
Base = A @ QV 11

It looks like the iVar trimmed alignment has been shifted when I zoom in on the read in both untrimmed and trimmed alignments:

Note that the position is incremented by 1 in the trimmed read alignment relative to the untrimmed alignment.
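The shift can be checked by hand from the two CIGAR strings quoted above. The Python sketch below is plain arithmetic on the reported values, not iVar code. It sums the reference-consuming operations (M, D, N, =, X) and assumes that trimming should leave the alignment end unchanged, so the expected new start is the old start plus the number of reference bases removed from the left.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_OPS = set("MDN=X")    # operations that consume reference positions
QUERY_OPS = set("MIS=X")  # operations that consume read bases


def spans(cigar):
    """Return (reference length, read length) implied by a CIGAR string."""
    ref = query = 0
    for count, op in CIGAR_OP.findall(cigar):
        if op in REF_OPS:
            ref += int(count)
        if op in QUERY_OPS:
            query += int(count)
    return ref, query


# The part of the CIGAR that is identical before and after trimming.
TAIL = ("11M1I3M1D20M1I14M5I3M2I2M2I18M1I33M1D7M1I7M2I12M1I28M2I8M5I3M1D"
        "12M2I3M4I12M1I6M2D5M1D23M1D31M120S")
before = "89S15M1D" + TAIL   # untrimmed alignment
after = "104S" + TAIL        # alignment after ivar trim

ref_before, query_before = spans(before)
ref_after, query_after = spans(after)

start_before = 29363                 # 1-based start reported by IGV
removed = ref_before - ref_after     # reference bases trimmed from the left
expected_start = start_before + removed
expected_end = expected_start + ref_after - 1

print(ref_before, ref_after, removed)  # 284 268 16
print(expected_start, expected_end)    # 29379 29646
```

Under that assumption the trimmed read should span 29,379-29,646, while IGV reports 29,380-29,647, which is consistent with the one-base shift described. The 16 removed positions are the 15M that became soft-clipped plus the 1D that followed it.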

Expected behavior

I expected that the alignments should remain the same aside from primer and quality trimming.

Workstation:

OS: Arch Linux x86_64 (Kernel Release: 5.5.9-arch1-2)
CPU: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
iVar Version 1.2 from BioConda

Additional context

Conda env YAML ($ conda env export) for running iVar, Minimap2, NGMLR and samtools:

channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=0_gnu
  - bzip2=1.0.8=h516909a_2
  - ca-certificates=2020.4.5.1=hecc5488_0
  - curl=7.69.1=h33f0ec9_0
  - htslib=1.9=h4da6232_3
  - krb5=1.17.1=h2fd8d38_0
  - libcurl=7.69.1=hf7181ac_0
  - libdeflate=1.5=h516909a_0
  - libedit=3.1.20170329=hf8c457e_1001
  - libgcc-ng=9.2.0=h24d8f2e_2
  - libgomp=9.2.0=h24d8f2e_2
  - libssh2=1.8.2=h22169c7_2
  - libstdcxx-ng=9.2.0=hdf63c60_2
  - minimap2=2.17=h8b12597_1
  - ncurses=6.1=hf484d3e_1002
  - ngmlr=0.2.7=he860b03_1
  - openssl=1.1.1f=h516909a_0
  - samtools=1.9=h10a08f8_12
  - sniffles=1.0.11=hdbcaa40_1
  - tclap=1.2.1=h470a237_1
  - tk=8.6.10=hed695b0_0
  - xz=5.2.5=h516909a_0
  - zlib=1.2.11=h516909a_1006

consensus with ambiguities

Hi @gkarthik,

I was using ivar consensus with a -t set to 0.5 and I increased it to 0.75 to only have the major variants within my consensus but it created ambiguous bases (like Y or K for instance). If I don't want ambiguous bases should I, instead, lower the -t to 0.25 (and still have the variants with a support of at least 75%) or is there another way?

Thanks,
Paul
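To make the effect of -t concrete, here is a Python sketch of one reading of the threshold rule: bases are added in order of decreasing frequency until their combined frequency reaches the threshold, and if more than one base is needed the matching IUPAC ambiguity code is emitted. This is an assumption about the behaviour, not iVar's implementation, and the function name call_base is hypothetical.

```python
# IUPAC nucleotide codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def call_base(counts, threshold):
    """Call one consensus base from per-base read counts at a single position."""
    total = sum(counts.values())
    if total == 0:
        return "N"
    chosen, cumulative = set(), 0
    # Add bases from most to least frequent until the threshold is reached.
    for base, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if count == 0:
            break
        chosen.add(base)
        cumulative += count
        if cumulative / total >= threshold:
            break
    return IUPAC[frozenset(chosen)]
```

Under this reading, a position with 60% C and 40% T gives C at thresholds of 0.25 or 0.5, but Y at 0.75, because C alone does not reach 75% of the depth. Raising the threshold therefore produces more ambiguity codes rather than fewer.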

ivar trim doesn't fail when providing an improper BED file

Describe the bug
Previously, in version 1.2, ivar trim would fail if an improper BED file was provided. In 1.2.2, however, ivar trim does not fail: it skips trimming but still outputs a BAM file, as if no -b option had been provided, so the error can go unnoticed. Related to #35.

To Reproduce
Steps to reproduce the behavior:

Download the BED file from https://github.com/artic-network/artic-ncov2019/blob/master/primer_schemes/nCoV-2019/V3/nCoV-2019.bed. The 5th column is not an integer so ivar trim will consider it invalid.
Run ivar trim using this bed file. ivar trim will complain about the BED file format:

iVar uses the standard 6 column BED format as defined here - https://genome.ucsc.edu/FAQ/FAQformat.html#format1.
It requires the following columns delimited by a tab: chrom, chromStart, chromEnd, name, score, strand

But it then proceeds anyway, resulting in no reads being trimmed:

Trimmed primers from 0% (0) of reads.
0.73% (10132) of reads were quality trimmed below the minimum length of 30 bp and were not writen to file.
99.27% (1376097) of reads started outside of primer regions. Since the -e flag was given, these reads were written to file.

GFF3 translate in reverse for -ve strand

Current aa translation in variants does not account for translation on -ve strand.
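For reference, translating a feature on the -ve strand means taking the reverse complement of the reference sequence over the feature's coordinates before reading codons. The Python sketch below illustrates that step with the standard genetic code. It is not iVar's code, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, strand="+"):
    """Translate a feature given as forward-strand reference sequence."""
    if strand == "-":
        # Minus-strand features are read from the opposite strand, 5' to 3'.
        seq = reverse_complement(seq)
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))
```

For example, the forward-strand sequence GGCCAT on a - strand feature is read as ATGGCC and translates to MA, the same protein as ATGGCC on the + strand.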

handle unmapped reads in BAM

It would be great if unmapped reads could be left alone and ignored.

% ivar trim -i ivar.bam
Segmentation fault

# remove unmapped reads
% samtools view -b -F 4 -o ivar2.bam ivar.bam
% ivar trim -i ivar2.bam
# works!
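The -F 4 filter drops records whose SAM FLAG has bit 0x4 (segment unmapped) set. A minimal Python sketch of the same test on SAM text lines, for illustration only; the function names are hypothetical and this is not how iVar reads BAM files.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def is_unmapped(flag):
    """True if the SAM FLAG marks the read as unmapped."""
    return bool(flag & UNMAPPED)


def mapped_only(sam_lines):
    """Yield header lines and mapped records, skipping unmapped records."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if not is_unmapped(flag):
            yield line
```

A tool that wants to tolerate unmapped reads could apply this kind of check per record and skip or pass through the read instead of processing it.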

variants & consensus "doing the same thing" ?

Are the variants and consensus tools doing a lot of similar work, or am I missing something?

At first I assumed it was trim -> variants -> consensus, but it seems they are independent?

Are indels handled ok?

PS. trim and consensus are exactly the tools I needed, and I am so grateful this exists!
