Comments (3)
The problem lies within the custom script ivar_variants_to_vcf.py from the viralrecon pipeline.
The script ivar_variants_to_vcf.py converts the original variant table (TSV output from iVar):
REGION POS REF ALT REF_DP REF_RV REF_QUAL ALT_DP ALT_RV ALT_QUAL ALT_FREQ TOTAL_DP PVAL PASS GFF_FEATURE REF_CODON REF_AA ALT_CODON ALT_AA POS_AA
LVE00096_cl5_it1.consensus_bcftools 10324 A -NNNNNNNNNNNNNNNNNNNNNNN 149 97 33 147 0 20 0.986577 149 3.98648e-32 TRUE NA NA NA NA NA NA
LVE00096_cl5_it1.consensus_bcftools 10354 G -NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 188 119 33 147 0 20 0.777778 189 5.86768e-34 TRUE NA NA NA NA NA NA
to:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT LVE00096_cl5_it1
LVE00096_cl5_it1.consensus_bcftools 10324 . ANNNNNNNNNNNNNNNNNNNNNNN A . PASS DP=149 GT:REF_DP:REF_RV:REF_QUAL:ALT_DP:ALT_RV:ALT_QUAL:ALT_FREQ 1:149:97:33:147:0:20:0.986577
LVE00096_cl5_it1.consensus_bcftools 10354 . GNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN G . PASS DP=189 GT:REF_DP:REF_RV:REF_QUAL:ALT_DP:ALT_RV:ALT_QUAL:ALT_FREQ 1:188:119:33:147:0:20:0.777778
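The allele rewrite visible in the two tables above can be sketched as follows. This is a minimal illustration of the pattern seen in the rows shown (a deletion written as `-` plus the deleted bases in the iVar ALT column ends up appended to REF in the VCF), not the actual code of ivar_variants_to_vcf.py. The function name `ivar_alleles_to_vcf` is hypothetical, and the handling of `+` for insertions is an assumption based on iVar's usual notation.

```python
from typing import Tuple


def ivar_alleles_to_vcf(ref: str, alt: str) -> Tuple[str, str]:
    """Return (REF, ALT) in VCF style for one iVar variant row.

    Hypothetical helper: mirrors the conversion seen in the tables above,
    not the real script.
    """
    if alt.startswith("-"):
        # Deletion: iVar lists the deleted bases after '-'.
        # In the VCF they are appended to REF, and ALT is the anchor base.
        return ref + alt[1:], ref
    if alt.startswith("+"):
        # Insertion (assumed notation): inserted bases follow '+'.
        # In the VCF they are appended to ALT instead.
        return ref, ref + alt[1:]
    # Plain substitution: alleles are passed through unchanged.
    return ref, alt


if __name__ == "__main__":
    # A shortened version of the deletion rows shown above.
    print(ivar_alleles_to_vcf("A", "-NNN"))
```

Seen this way, the long runs of N in the VCF REF column come straight from the iVar table; the converter only moves them from ALT to REF.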
from viralgenie.
Update: the problem isn't the script. Instead, samtools mpileup intermittently ignores the reference. It runs as if no reference were supplied, even though one is given, which results in:
SRR11140748_MT192765.1 1 N 3 ^]G^]G^]G FHG
SRR11140748_MT192765.1 2 N 5 TTT^]T^]t FHHHG
SRR11140748_MT192765.1 3 N 5 TTTTt GGHHH
SRR11140748_MT192765.1 4 N 5 TTTTt HHHHH
SRR11140748_MT192765.1 5 N 5 AAAAa HHHHH
SRR11140748_MT192765.1 6 N 5 TTTTt HHHHH
SRR11140748_MT192765.1 7 N 5 AAAAa HGHHH
SRR11140748_MT192765.1 8 N 5 CCCCc HHHHG
SRR11140748_MT192765.1 9 N 5 CCCCc HHHHG
SRR11140748_MT192765.1 10 N 5 TTTTt HHHHH
Running the same command a second time (literally just bash .command.run), the output becomes:
SRR11140748_MT192765.1 1 G 3 ^].^].^]. FHG
SRR11140748_MT192765.1 2 T 5 ...^].^], FHHHG
SRR11140748_MT192765.1 3 T 5 ...., GGHHH
SRR11140748_MT192765.1 4 T 5 ...., HHHHH
SRR11140748_MT192765.1 5 A 5 ...., HHHHH
SRR11140748_MT192765.1 6 T 5 ...., HHHHH
SRR11140748_MT192765.1 7 A 5 ...., HGHHH
SRR11140748_MT192765.1 8 C 5 ...., HHHHG
SRR11140748_MT192765.1 9 C 5 ...., HHHHG
SRR11140748_MT192765.1 10 T 5 ...., HHHHH
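Since the failure is intermittent, one way to spot an affected run is to count how many pileup rows carry N in the reference column (the third field). The sketch below assumes whitespace- or tab-separated pileup text like the excerpts above; the function name `count_n_reference_rows` is hypothetical. A reference that genuinely contains N bases would also be counted, so the result is a hint rather than proof.

```python
from typing import Tuple


def count_n_reference_rows(pileup_text: str) -> Tuple[int, int]:
    """Return (rows with reference base N, total rows) for pileup text."""
    n_rows = 0
    total = 0
    for line in pileup_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            # Skip blank or truncated lines.
            continue
        total += 1
        # Column 3 of the pileup format is the reference base.
        if fields[2].upper() == "N":
            n_rows += 1
    return n_rows, total


if __name__ == "__main__":
    # First row of each of the two runs shown above.
    sample = (
        "SRR11140748_MT192765.1 1 N 3 ^]G^]G^]G FHG\n"
        "SRR11140748_MT192765.1 1 G 3 ^].^].^]. FHG\n"
    )
    n_rows, total = count_n_reference_rows(sample)
    print(f"{n_rows} of {total} rows have N as the reference base")
```

If every row of a run reports N, the reference was most likely not used for that run.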
I'm uncertain how to avoid this behaviour. I'll remove -B (which disables the base alignment quality, BAQ, recalculation) from the arguments.
The iVar documentation suggests including -B:
Note: Please use the -B options with samtools mpileup to call variants and generate consensus. When a reference sequence is supplied, the quality of the reference base is reduced to 0 (ASCII: !) in the mpileup output. Disabling BAQ with -B seems to fix this. This was tested in samtools 1.7 and 1.8.
It has also been shown that BAQ greatly reduces the number of false positives (FP) but also increases the number of false negatives (FN). For our hyperdiverse populations, I don't think we can afford the increase in FN, so I'll keep -B in the default settings.