Comments (13)
I think it's difficult to include this in the final report, at least. Insertions can also come from sequencing errors, the protocols used, ...
Another question is what the newest Nextstrain output looks like in this context.
My last idea was to at least add a column for the totalInsertions so that one can easily spot weirdos for further checks
from porecov.
In the upcoming release (today or tomorrow) Nextclade/align will output aa insertions. Maybe this is still relevant?
Well we are reporting amino acid changes, so these columns:
"aaSubstitutions"
"aaDeletions"
but there is no "aaInsertions" at this time.
Also the "insertions" column is empty in all the runs I just checked
Yes, I think insertions are rare. Let me check if I can find an example Nextclade output w/ insertions.
Okay, so it seems insertions are only reported on nucleotide level. E.g. I just checked some sequences and selected some different outputs:
totalInsertions insertions
17 21534:A,27373:CTTTCGATCTCT,27380:C,27384:CTC
16 27373:CTTTCGATCTCT,27380:C,27384:CTC
10 9627-9631,20528-20531,22492,28967-28969
28 1569:C,1574:AGAGCTAG,27373:CTTTCGATCTCT,27380:C,27384:CTC,28250:CTG
248 7010-7049,9543-9561,14331-14515,29760-29767
1 28250:G
17 39,19286,29602,29605,29631,29706,29752,29772,29779,29785,29792-29794,29797,29806,29866,29868-29870
It seems the output format can vary a lot, so I am not sure it is that meaningful to add this. Maybe just report whether there are insertions or not (totalInsertions)?
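As a rough illustration of the two formats seen in the rows above (`pos:SEQ` entries and bare positions or `start-end` ranges), here is a minimal Python sketch that parses such a field and derives a simple yes/no flag from totalInsertions. The function names `parse_insertions` and `has_insertions` are hypothetical and not part of porecov or Nextclade, and the accepted formats are only inferred from the examples in this comment.

```python
import re

# One entry is either "pos:SEQ", a bare "pos", or a "start-end" range.
# These shapes are inferred from the example rows only (assumption).
_ENTRY = re.compile(r"^(\d+)(?:-(\d+))?(?::([ACGTN]+))?$")


def parse_insertions(field):
    """Split a comma-separated insertions field into (start, end, seq) tuples.

    seq is None when the entry carries no inserted bases.
    """
    entries = []
    for item in (part.strip() for part in field.split(",")):
        if not item:
            continue  # an empty column yields no entries
        match = _ENTRY.match(item)
        if match is None:
            raise ValueError(f"unrecognised insertion entry: {item!r}")
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        entries.append((start, end, match.group(3)))
    return entries


def has_insertions(total_insertions):
    """Return True if the totalInsertions value is greater than zero."""
    return int(total_insertions or 0) > 0
```

For the report, `has_insertions(row["totalInsertions"])` would be enough to mark a sample for a closer look, with the parsed entries kept for the follow-up check.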
@replikation @RaverJay actually this is now becoming important with Omicron, especially because of a larger insertion in the Spike:
S:R214REPE
Currently, this is just not shown in the report which can lead to misleading results.
Not sure what's best to solve this, I think currently we could still only extract such information from Nextclade.
Another way would be to use our tool covsonar to basically call all substitutions, insertions and deletions with one tool and include this into the report... https://gitlab.com/s.fuchs/covsonar
To do so, we would need a process that generates the covsonar database for all genomes and then we could easily extract all information.
Yeah this is bad
Problem is, Nextclade output is still missing an 'aaInsertions' column
(there is only: substitutions deletions insertions aaSubstitutions aaDeletions)
We could of course calculate it from 'insertions' though
@replikation what do you think, add or switch to covsonar?
Ahh, we would then need a converter from nt insertions reported by Nextclade to aa insertions. Something like
https://codon2nucleotide.theo.io/
CovSonar might solve that, but the weak point here is that the tool is still under development (people are currently working on a CovSonar2 version), so there might be changes. It is also not tested as extensively as Nextclade, etc...
Nextclade might take too long to add this: nextstrain/nextclade#319
Maybe we should implement it ourselves at this time - just 'borrow' the code from codon2nucleotide and translate to AAs
Yes, seems fair to me. Then we still rely on Nextclade, which runs anyway, and just translate the nt insertions for the report. I think having the code from https://github.com/theosanderson/codon2nucleotide as a small script in bin should do the trick?
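To make the idea concrete, here is a minimal Python sketch of such a translation for the simplest case only: an insertion whose length is a multiple of three and that falls exactly between two codons. It is not the codon2nucleotide code. It assumes that the position in Nextclade's insertions column is the 1-based reference position after which the bases are inserted, and it uses the Spike coordinates 21563..25384 of the Wuhan-Hu-1 reference (NC_045512.2). The function name `translate_insertion` and the `GENE:pos:AAs` output format are hypothetical. Frameshifts and insertions inside a codon need the surrounding reference sequence and are skipped here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate_insertion(pos, seq, gene, gene_start, gene_end):
    """Translate an in-frame, codon-aligned nt insertion to an aa insertion.

    pos is assumed to be the 1-based reference position after which seq is
    inserted. Returns a string like "S:214:EPE", or None when the insertion
    is outside the gene, not a multiple of three, or not on a codon boundary.
    """
    if not gene_start <= pos < gene_end:
        return None
    offset = pos - gene_start + 1  # coding bases up to and including pos
    if offset % 3 != 0 or len(seq) % 3 != 0:
        return None  # needs reference context, not handled in this sketch
    codons = (seq[i:i + 3].upper() for i in range(0, len(seq), 3))
    amino_acids = "".join(CODON_TABLE.get(codon, "X") for codon in codons)
    return f"{gene}:{offset // 3}:{amino_acids}"


# Omicron Spike insertion, written on nucleotide level as 22204:GAGCCAGAA
print(translate_insertion(22204, "GAGCCAGAA", "S", 21563, 25384))  # S:214:EPE
```

Anything that returns None here would still have to be shown on nucleotide level in the report, or handled with the reference sequence at hand.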
It's a little bit more complicated than that, see PR
@corneliusroemer thanks for informing us