Comments (6)

Jigyasa3 commented on July 17, 2024

Hi @oschwengers ,

I can answer my own question here: Bakta's ORF prediction preserves the gene neighborhood of the CRISPR array, even though PILER-CR cannot find the array itself.

Thanks for the great annotation software!

Jigyasa3 commented on July 17, 2024

Hi,

Got some more updates. I compared the RS_GCF_000279285.1 results from Bakta and CRISPRCasFinder. Bakta finds one CRISPR array in this genome (start=602511, stop=603618), while CRISPRCasFinder finds two (screenshot below).
I think Bakta applies a stringent cutoff for CRISPR detection, and this probably leads to missed annotations. I examined the CRISPR array at start=92144, stop=92347, which Bakta misses, and this CRISPR array is associated with the Tn7 system!
Any suggestions for decreasing or removing the stringency of Bakta's CRISPR annotation?

CRISPRCasFinder results:
[Screenshot 2023-09-22 at 6:49:30 PM; image not available]
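A minimal sketch of how such a comparison between two tools' array calls could be automated, using only the coordinates quoted above. It assumes both arrays lie on the same sequence record (a multi-contig comparison would also need to match sequence IDs), and it counts any overlap as "found"; the function names are hypothetical.

```python
from typing import List, Tuple

# (start, stop), 1-based and inclusive, as in the coordinates quoted above.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def missed_arrays(reference: List[Interval], query: List[Interval]) -> List[Interval]:
    """Arrays in `reference` that no array in `query` overlaps."""
    return [r for r in reference if not any(overlaps(r, q) for q in query)]


# Bakta's single array for RS_GCF_000279285.1, as reported in the thread.
bakta_arrays = [(602511, 603618)]
# Only the CRISPRCasFinder array that Bakta missed; the thread does not give
# the exact coordinates of the second CRISPRCasFinder array.
crisprcasfinder_arrays = [(92144, 92347)]

print(missed_arrays(crisprcasfinder_arrays, bakta_arrays))  # [(92144, 92347)]
```

Treating any overlap as a match is deliberately lenient, since tools often disagree on the exact array boundaries; a stricter check could require a minimum overlap fraction instead.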

oschwengers commented on July 17, 2024

Hi @Jigyasa3, thanks a lot for reaching out and this deeper comparison!

Bakta only executes PILER-CR with default parameters and accepts whatever it predicts. Hence, there are no CRISPR-related filters within Bakta. According to its usage message, PILER-CR's default parameters are:

Criteria for CRISPR detection, defaults in parentheses:
   -minarray <N>          Must be at least <n> repeats in array (3).
   -mincons <F>           Minimum conservation (0.9).
                            At least N repeats must have identity
                            >= F with the consensus sequence.
                            Value is in range 0 .. 1.0.
                            It is recommended to use a value < 1.0
                            because using 1.0 may suppress true
                            arrays due to boundary misidentification.
   -minrepeat <L>         Minimum repeat length (16).
   -maxrepeat <L>         Maximum repeat length (64).
   -minspacer <L>         Minimum spacer length (8).
   -maxspacer <L>         Maximum spacer length (64).
   -minrepeatratio <R>    Minimum repeat ratio (0.9).
   -minspacerratio <R>    Minimum spacer ratio (0.75).
                            'Ratios' are defined as minlength / maxlength,
                            thus a value close to 1.0 requires lengths to
                            be similar, 1.0 means identical lengths.
                            Spacer lengths sometimes vary significantly, so
                            the default ratio is smaller. As with -mincons,
                            using 1.0 is not recommended.

Parameters for creating local alignments:
   -minhitlength <L>      Minimum alignment length (16).
   -minid <F>             Minimum identity (0.94).
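Since Bakta does not expose these options, loosening them would mean running PILER-CR on its own. Below is a minimal sketch that assembles such a command line. It assumes the `pilercr` binary is on `PATH` and takes its input and report files via `-in` and `-out`; the option names come from the usage text above, but the relaxed values are purely illustrative, not recommendations, and whether they would recover the missed array is untested. `pilercr_command` is a hypothetical helper.

```python
import shlex
from typing import Dict, List

# Defaults as listed in PILER-CR's usage message above.
PILERCR_DEFAULTS: Dict[str, float] = {
    "minarray": 3,
    "mincons": 0.9,
    "minrepeat": 16,
    "maxrepeat": 64,
    "minspacer": 8,
    "maxspacer": 64,
    "minrepeatratio": 0.9,
    "minspacerratio": 0.75,
    "minhitlength": 16,
    "minid": 0.94,
}


def pilercr_command(fasta: str, report: str, **overrides: float) -> List[str]:
    """Build a PILER-CR argument list, passing only the overridden options."""
    unknown = set(overrides) - set(PILERCR_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown PILER-CR option(s): {sorted(unknown)}")
    cmd = ["pilercr", "-in", fasta, "-out", report]
    for name, value in overrides.items():
        cmd += [f"-{name}", str(value)]
    return cmd


cmd = pilercr_command("genome.fna", "crispr.txt", minarray=2, mincons=0.8)
print(shlex.join(cmd))  # pilercr -in genome.fna -out crispr.txt -minarray 2 -mincons 0.8
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`. Validating option names against the documented defaults catches typos before a long run.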

Of course, the various prediction tools (PILER-CR, CRISPRCasFinder, MinCED, etc.) produce slightly different results. Sometimes one is better, sometimes another, so it's hard to tell which one is right.

I have the feeling that CRISPRCasFinder performs better and is certainly better maintained than PILER-CR, which makes it a very interesting candidate for Bakta. However, we'd need to add a bunch of additional dependencies, and CRISPRCasFinder also requires its own database that we would need to handle somehow. I'm not saying this is a no-go, but currently I'm a bit reluctant to add all this extra complexity to Bakta.

oschwengers commented on July 17, 2024

Hi @Jigyasa3,
this doesn't solve your issue, but it's somewhat related: I just merged PR #249, which improves the CRISPR information. Bakta now also provides the CRISPR spacer sequences. Maybe this is of interest to you.

oschwengers commented on July 17, 2024

So, having thought about this a bit longer, I think this is a regular case of different tools producing varying outputs. Hence, I guess we cannot do anything about it in principle. As explained, neither PILER-CR nor MinCED is actively maintained anymore, but newer tools are too complex, with too many dependencies of their own.

Hence, it might be best to run these tools in a dedicated external analysis and use their results outside of the actual genome annotation process?

I'm sorry that I cannot provide more help here, so I'll close this for now. But please do not hesitate to re-open this issue or open a new one in any case.
Thanks again and best regards!

Jigyasa3 commented on July 17, 2024

Hi @oschwengers,

Thank you for replying and explaining the background of PILER-CR's CRISPR annotation.
I am specifically interested in examining the genetic background of CRISPR arrays. For example, some CRISPR arrays are associated with Tns proteins (https://pubmed.ncbi.nlm.nih.gov/34845024/). But if Bakta misses a CRISPR array, the region gets annotated with protein-coding genes instead.
My question is: if I use CRISPRCasFinder to annotate CRISPR arrays, can I still use Bakta to examine the genomic neighborhood of the region of interest? Would the ORFs be correct?
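One way to check this is to take the CRISPRCasFinder coordinates and inspect Bakta's annotation around them. A minimal sketch, assuming Bakta's GFF3 output (tab-separated, nine columns, possibly followed by a `##FASTA` section) and an arbitrary 10 kb flank: it lists the features in the window and flags any CDS overlapping the array, which could hint at an ORF called across repeat/spacer sequence. The helper names, sequence IDs, and features in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Feature:
    seqid: str
    ftype: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int
    strand: str
    attributes: str


def read_gff3(lines: Iterable[str]) -> List[Feature]:
    """Parse feature rows of a GFF3 file, stopping at an embedded ##FASTA section."""
    features: List[Feature] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # only sequences follow from here on
        if not line or line.startswith("#"):
            continue  # directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows
        features.append(Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], cols[8]))
    return features


def neighborhood(features: List[Feature], seqid: str, start: int, end: int,
                 flank: int = 10_000) -> Tuple[List[Feature], List[Feature]]:
    """Features within `flank` bp of an array, plus any CDS overlapping the array itself."""
    lo, hi = start - flank, end + flank
    near = [f for f in features if f.seqid == seqid and f.start <= hi and f.end >= lo]
    inside = [f for f in near if f.ftype == "CDS" and f.start <= end and f.end >= start]
    return near, inside


# Hypothetical example data: sequence IDs and features are invented for illustration.
rows = [
    ("contig_1", 85000, 86000, "+", "hyp_1"),
    ("contig_1", 92100, 92400, "-", "hyp_2"),    # overlaps the array
    ("contig_1", 95000, 96000, "+", "hyp_3"),
    ("contig_1", 120000, 121000, "+", "hyp_4"),  # outside the 10 kb window
    ("contig_2", 92000, 93000, "+", "hyp_5"),    # different sequence
]
gff_lines = ["##gff-version 3"]
gff_lines += ["\t".join([s, ".", "CDS", str(a), str(b), ".", st, "0", f"ID={i}"])
              for s, a, b, st, i in rows]
gff_lines += ["##FASTA", ">contig_1", "ACGT"]

near, inside = neighborhood(read_gff3(gff_lines), "contig_1", 92144, 92347)
print([f.attributes for f in near])    # ['ID=hyp_1', 'ID=hyp_2', 'ID=hyp_3']
print([f.attributes for f in inside])  # ['ID=hyp_2']
```

For a real annotation, `read_gff3` can be given an open file handle directly (e.g. `with open("annotation.gff3") as fh: feats = read_gff3(fh)`, with a hypothetical file name).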
