Comments (6)
Hi @oschwengers ,
I can answer my own question here: BAKTA's ORF prediction preserves the gene neighborhood of the CRISPR array even though PILER-CR cannot find the CRISPR.
Thanks for a great annotation software!
from bakta.
Hi,
Got some more updates. I compared the RS_GCF_000279285.1 results from BAKTA and CRISPRCasFinder. BAKTA finds one CRISPR array in this sample (start=602511, stop=603618), while CRISPRCasFinder finds two (screenshot below).
I think BAKTA is applying a stringent cutoff for CRISPR detection, and this is probably leading to missed annotations. I examined the CRISPR array at start=92144, stop=92347, which is missed by BAKTA, and this CRISPR is associated with the Tn7 system!
Any suggestions to decrease/remove the stringency in BAKTA for CRISPR annotation?
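The comparison described above can be sketched as a simple interval check. This is only an illustration under stated assumptions: the Bakta array and the missed array use the coordinates quoted in this comment, while the second CRISPRCasFinder array is assumed, purely for illustration, to coincide with Bakta's call (its coordinates are not given here). The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, stop), inclusive, same contig


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def missed_arrays(reference: List[Interval], candidate: List[Interval]) -> List[Interval]:
    """Return arrays in `reference` that no array in `candidate` overlaps."""
    return [r for r in reference if not any(overlaps(r, c) for c in candidate)]


# Coordinates quoted in this thread.
bakta_arrays = [(602511, 603618)]

# First entry is quoted in this thread; the second is an ASSUMPTION
# (taken to coincide with Bakta's call, for illustration only).
crisprcasfinder_arrays = [(92144, 92347), (602511, 603618)]

missed = missed_arrays(crisprcasfinder_arrays, bakta_arrays)
print(missed)
```

Run the other way round (swap the two arguments), the same function would list arrays reported only by Bakta.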
from bakta.
Hi @Jigyasa3, thanks a lot for reaching out and this deeper comparison!
Bakta only executes PILER-CR with default parameters and accepts whatever it predicts. Hence, there are no CRISPR-related filters within Bakta. According to its usage message, the default parameters of PILER-CR are:
Criteria for CRISPR detection, defaults in parentheses:
    -minarray <N>        Must be at least <n> repeats in array (3).
    -mincons <F>         Minimum conservation (0.9).
                         At least N repeats must have identity
                         >= F with the consensus sequence.
                         Value is in range 0 .. 1.0.
                         It is recommended to use a value < 1.0
                         because using 1.0 may suppress true
                         arrays due to boundary misidentification.
    -minrepeat <L>       Minimum repeat length (16).
    -maxrepeat <L>       Maximum repeat length (64).
    -minspacer <L>       Minimum spacer length (8).
    -maxspacer <L>       Maximum spacer length (64).
    -minrepeatratio <R>  Minimum repeat ratio (0.9).
    -minspacerratio <R>  Minimum spacer ratio (0.75).
                         'Ratios' are defined as minlength / maxlength,
                         thus a value close to 1.0 requires lengths to
                         be similar, 1.0 means identical lengths.
                         Spacer lengths sometimes vary significantly, so
                         the default ratio is smaller. As with -mincons,
                         using 1.0 is not recommended.
Parameters for creating local alignments:
    -minhitlength <L>    Minimum alignment length (16).
    -minid <F>           Minimum identity (0.94).
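To make the 'ratio' criteria in the usage text above a bit more concrete, here is a minimal sketch of the minlength / maxlength check as the help text describes it. It is not PILER-CR's actual code; the function names and the example spacer and repeat lengths are hypothetical.

```python
from typing import Sequence


def length_ratio(lengths: Sequence[int]) -> float:
    """Return minlength / maxlength, the 'ratio' described in the usage text."""
    if not lengths or min(lengths) <= 0:
        raise ValueError("lengths must be non-empty and positive")
    return min(lengths) / max(lengths)


def passes_ratio(lengths: Sequence[int], min_ratio: float) -> bool:
    """True if the lengths are similar enough for the given ratio threshold."""
    return length_ratio(lengths) >= min_ratio


# Hypothetical lengths, for illustration only.
spacers = [32, 33, 36, 30]   # ratio 30 / 36
repeats = [28, 28, 29]       # ratio 28 / 29

print(passes_ratio(spacers, 0.75))  # default -minspacerratio
print(passes_ratio(spacers, 0.90))  # a stricter threshold
print(passes_ratio(repeats, 0.90))  # default -minrepeatratio
```

With these made-up numbers the spacers pass the default 0.75 threshold but would fail at 0.9, which is the situation the help text alludes to when it says spacer lengths can vary significantly.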
Of course, the various prediction tools (PILER-CR, CRISPRCasFinder, MinCED, etc.) produce slightly different results. Sometimes one is better than another and vice versa. Hence, it's hard to tell which one is right.
I have the feeling that CRISPRCasFinder is better, and certainly better maintained, than PILER-CR, which makes it a very interesting candidate for Bakta. However, we'd need to add a bunch of additional dependencies, and CRISPRCasFinder also requires its own database that we would need to handle somehow. I'm not saying this is a no-go, but currently I'm a bit reluctant to add all this extra complexity to Bakta.
from bakta.
Hi @Jigyasa3,
This does not solve your issue, but it is somewhat related: I just merged a new PR #249 improving the CRISPR information. Now, Bakta also provides information on the CRISPR spacer sequences. Maybe this is of interest to you.
from bakta.
So, having thought about this a bit longer, I think this is a regular case of different tools producing varying outputs. Hence, I guess we cannot do anything about that in principle. As explained, neither PILER-CR nor MinCED is actively maintained anymore, but newer tools are too complex and bring too many dependencies of their own.
Hence, it might be best to run these tools in a dedicated external analysis and use their results outside of the actual genome annotation process.
I'm sorry that I cannot provide any more help here. Thus, I'd close this for now. But please do not hesitate to re-open this or open a new issue in any case.
Thanks again and best regards!
from bakta.
Hi @oschwengers,
Thank you for replying and explaining the background of the PILER-CR CRISPR annotation.
I am specifically interested in examining the genetic background of CRISPR arrays. For example, some CRISPR arrays are associated with tns proteins (https://pubmed.ncbi.nlm.nih.gov/34845024/). But if the CRISPR array is missed by BAKTA, then the region is annotated with protein-coding genes instead.
My question is: if I use CRISPRCasFinder to annotate CRISPR arrays, can I still use BAKTA to examine the genomic neighborhood of the region of interest? Would the ORFs be correct?
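One way to look at this in practice is to take an externally predicted array and split the annotated features around it into those that overlap the array (candidates for spurious CDS calls) and those that merely flank it. The sketch below assumes linear coordinates on a single contig; the `Feature` class, the `classify_neighbors` function, the 5000 bp flank, and all feature coordinates are hypothetical. Only the array coordinates (92144..92347) come from this thread.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Feature:
    name: str
    start: int
    stop: int


def classify_neighbors(
    array_start: int,
    array_stop: int,
    features: List[Feature],
    flank: int = 5000,
) -> Tuple[List[str], List[str]]:
    """Split features into those overlapping the array and those within `flank` bp."""
    overlapping: List[str] = []
    flanking: List[str] = []
    for f in features:
        if f.start <= array_stop and array_start <= f.stop:
            overlapping.append(f.name)  # overlaps the array itself
        elif f.stop >= array_start - flank and f.start <= array_stop + flank:
            flanking.append(f.name)     # lies inside the flanking window
    return overlapping, flanking


# Hypothetical annotated features, for illustration only.
features = [
    Feature("cds_A", 90000, 91900),
    Feature("cds_B", 92100, 92400),
    Feature("cds_C", 92500, 93800),
    Feature("cds_D", 120000, 121000),
]

overlapping, flanking = classify_neighbors(92144, 92347, features)
print(overlapping, flanking)
```

Features that land in the overlapping list are the ones worth a second look, since a CDS called on top of an undetected array may not be a real gene.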
from bakta.