Comments (3)
Idea for the workflow:
- start from CDS without annotations: either no hits at all or still hypothetical
- re-search pseudogene candidates against the Bakta DB using the following thresholds:
  - identity >= 80%
  - query coverage >= 80%
  - subject coverage >= 40%
- align the subject AA sequence against the translated CDS region plus N bp up-/downstream
- look for frameshifts or stop codons
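The threshold step in the list above can be sketched as a simple filter. This is only an illustration of the idea, not Bakta's actual code: the `Hit` record, its field names, and the function names are hypothetical, and the values are assumed to be percentages as a tabular aligner would report them.

```python
from dataclasses import dataclass

# Thresholds taken from the workflow idea above (all in percent).
MIN_IDENTITY = 80.0
MIN_QUERY_COV = 80.0
MIN_SUBJECT_COV = 40.0


@dataclass
class Hit:
    """Hypothetical container for one alignment hit (not a Bakta class)."""
    query_id: str
    subject_id: str
    identity: float     # percent identity of the alignment
    query_cov: float    # percent of the query covered by the alignment
    subject_cov: float  # percent of the subject covered by the alignment


def is_candidate_hit(hit: Hit) -> bool:
    """Return True if the hit passes all three relaxed thresholds."""
    return (
        hit.identity >= MIN_IDENTITY
        and hit.query_cov >= MIN_QUERY_COV
        and hit.subject_cov >= MIN_SUBJECT_COV
    )


def select_candidate_hits(hits: list[Hit]) -> list[Hit]:
    """Keep only hits that qualify a CDS as a pseudogene candidate."""
    return [hit for hit in hits if is_candidate_hit(hit)]


example_hits = [
    Hit("cds_1", "ref_A", identity=92.0, query_cov=95.0, subject_cov=55.0),
    Hit("cds_2", "ref_B", identity=75.0, query_cov=99.0, subject_cov=90.0),
    Hit("cds_3", "ref_C", identity=88.0, query_cov=85.0, subject_cov=30.0),
]
candidates = select_candidate_hits(example_hits)
```

Here only `cds_1` would be kept: `cds_2` fails the identity threshold and `cds_3` covers too little of the subject.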
from bakta.
Hi there,
So, how does the latest version of Bakta deal with pseudogenes?
Hi @amvarani, thanks for reaching out on this. We're actively working on a first pseudogene detection/annotation feature as a default step in the Bakta workflow. We're currently fixing the last little things and look forward to releasing a new version very soon (i.e. within the next few weeks).
There are different strategies for detecting pseudogenes. The most promising would be to use a closely related genome; however, as such a genome is often not available in a de novo assembly/annotation workflow, we address this without external genome information.
At this point, Bakta looks for gene remnants that could not be annotated (hypothetical proteins) because they only produce Diamond hits with less than 80% subject coverage. In a second, more relaxed search, Bakta looks for decent hits against the database with subject coverages of less than 80%. These reference sequences are then blasted against the six-frame translations of the hypotheticals' CDS regions, extended by 300 bp upstream and downstream. If Bakta finds conserved homology, it tries to detect indels/mutations and start/stop codon events. If any are found, the hypothetical gene is annotated as a pseudogene. Later, we'll extend this approach by taking into account spare genomic regions (w/o annotations), using some information to detect and annotate translational exceptions, and so on.
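To make the extension and six-frame translation step more concrete, here is a minimal pure-Python sketch. It is not Bakta's implementation: all function names are hypothetical, coordinates are assumed to be 0-based and end-exclusive, sequences are assumed to be uppercase ACGT, and the standard codon assignments are used (the bacterial table 11 assigns the same amino acids and differs only in permitted start codons).

```python
from itertools import product

# Standard codon assignments in NCBI order (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def extended_region(contig: str, start: int, end: int, flank: int = 300) -> str:
    """Return the CDS region plus up to `flank` bp on each side, clipped to the contig."""
    return contig[max(0, start - flank):min(len(contig), end + flank)]


def six_frame_translations(region: str) -> dict[tuple[str, int], str]:
    """Map (strand, frame offset) to the translated protein sequence."""
    frames = {}
    for strand, seq in (("+", region), ("-", reverse_complement(region))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def internal_stop_positions(protein: str) -> list[int]:
    """Positions of stop codons ('*') that are not the final residue."""
    return [i for i, aa in enumerate(protein[:-1]) if aa == "*"]


# Toy contig: a short ORF with a premature stop, flanked by 2 bp on each side.
contig = "CC" + "ATGAAATAGGGGTAA" + "GG"
region = extended_region(contig, start=2, end=17, flank=2)
frames = six_frame_translations(region)
premature = internal_stop_positions(frames[("+", 2)])
```

In a real workflow the translated frames would be aligned against the reference protein, and a frameshift would show up as the alignment switching between frames; the internal-stop check here only covers the simplest case.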
I hope this answers your question. Best regards!