Valid CDS identifier? (clinker issue, CLOSED)

aberaslop commented on August 25, 2024
Valid CDS identifier?

from clinker.

Comments (8)

gamcil commented on August 25, 2024

Hey everyone,

I think this is because clinker currently only checks for protein_id, locus_tag and ID qualifiers to use as gene names. In @aberaslop's file for example, the features have these instead:

/Name="input.path1.gene38"
/gene="input.path1.gene38"

The quick fix would then be to do a search and replace on the problematic files (i.e. change Name= to protein_id=). When I get some time I'll add some extra qualifiers so you shouldn't have this problem.
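As a rough illustration of that search and replace, here is a minimal Python sketch. It is not part of clinker; the function name `rename_name_qualifier` and the sample text are made up for this example, and it assumes the affected features do not already carry a /protein_id qualifier (otherwise you would end up with two).

```python
import re

# Matches the start of a qualifier line such as '    /Name="input.path1.gene38"'.
NAME_LINE = re.compile(r"^([ \t]*)/Name=", flags=re.MULTILINE)


def rename_name_qualifier(genbank_text):
    """Rewrite every /Name= qualifier as /protein_id=, leaving the value untouched."""
    return NAME_LINE.sub(r"\1/protein_id=", genbank_text)


# Hypothetical feature block modelled on the qualifiers quoted above.
sample = (
    "     CDS             1..300\n"
    '                     /Name="input.path1.gene38"\n'
    '                     /gene="input.path1.gene38"\n'
)
fixed = rename_name_qualifier(sample)
print(fixed)
```

To apply it to a real file, read the file into a string, pass it through the function, and write the result to a new file so the original stays untouched.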

As far as features go, clinker only looks for CDS, so you shouldn't need any gene/mRNA etc.
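To find which CDS features would trip this up before running clinker, something like the following could help. It is only a sketch: it scans the feature table as plain text rather than using clinker's own parser, assumes the standard GenBank layout (feature keys indented five spaces, qualifiers starting with "/"), and the names `parse_features` and `cds_without_name` are made up for this example.

```python
import re

# Qualifiers this thread says clinker checked before 0.0.7.
NAME_QUALIFIERS = {"protein_id", "locus_tag", "ID"}

FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S+)")   # e.g. "     CDS             1..300"
QUALIFIER_RE = re.compile(r"^\s+/([^=\s]+)")     # e.g. '        /locus_tag="X"'


def parse_features(genbank_text):
    """Return (key, location, qualifier names) for each feature in the table."""
    features = []
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif in_features and line[:1] not in (" ", ""):
            in_features = False  # ORIGIN or // ends the feature table
        elif in_features:
            qualifier = QUALIFIER_RE.match(line)
            feature = FEATURE_RE.match(line)
            if qualifier and features:
                features[-1][2].add(qualifier.group(1))
            elif feature:
                features.append((feature.group(1), feature.group(2), set()))
    return features


def cds_without_name(genbank_text):
    """List locations of CDS features carrying none of NAME_QUALIFIERS."""
    return [
        location
        for key, location, qualifiers in parse_features(genbank_text)
        if key == "CDS" and not qualifiers & NAME_QUALIFIERS
    ]


# Hypothetical two-feature record for illustration.
SAMPLE = """\
FEATURES             Location/Qualifiers
     CDS             1..300
                     /Name="input.path1.gene38"
                     /gene="input.path1.gene38"
     CDS             400..900
                     /locus_tag="ABC_0002"
ORIGIN
//
"""
print(cds_without_name(SAMPLE))
```

A wrapped qualifier value whose continuation line happens to begin with "/" would be misread as a new qualifier, so treat the result as a hint rather than a definitive check.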

hungenlai90 commented on August 25, 2024

First of all thanks so much for writing this program, it's really user-friendly to non-bioinformatics experts like me. I managed to get it running in just a few steps. I really like how easy it is to use this and the visualisation is good enough for publication.
However, I get the same error message as above when running the script against 49 RiPP cluster gbk files. When I removed the problematic files, the script ran fine (three of the 49 files were problematic). I'm not sure what causes the issue in these three files. I have tried trimming down the length of the DNA sequence and removing extra features annotated by antiSMASH that weren't important to my gene cluster, but still had no luck getting the script to run the alignment for these files.
As the gbk sequences/files have not been publicly released, I can't share them here unfortunately.

hungenlai90 commented on August 25, 2024

Did more troubleshooting and I found that removing three of the CDS features in one file (doesn't work if you only remove any one or two of them) allowed the script to run without error. I tried removing another set of three CDS features but that didn't work either. It's very weird indeed...

rob2go commented on August 25, 2024

I also got the same error when using GenBank files generated by SnapGene.
If I download directly from NCBI, it works perfectly.
The thing is, the file has to have both the gene and CDS annotations:
gene 1..1716
/locus_tag=
/note=
CDS 1..1716
/locus_tag=

SnapGene was not generating the gene features because I had not annotated them...
But I am also worried about what to do with the genomes I have that are not public yet. I have to generate the files somehow, and the error will probably still be there. I don't know what else to do. I'm just trying to figure it out by comparing the gbk files I generate with those from NCBI.

hungenlai90 commented on August 25, 2024

None of my gbk files have gene annotations (just CDS, misc_feature and primer_bind). They ran fine without error, so I'm not sure that is the cause of this issue.

aberaslop commented on August 25, 2024

Hi Cameron,

Thank you for such a quick answer and for solving my issue! Changing /Name to /protein_id totally fixed the problem.

Thank you so much!!

L.

sjmoore505 commented on August 25, 2024

This is a cool tool and straightforward to use. Thanks to @hungenlai90 for the heads up.

gamcil commented on August 25, 2024

This should now be fixed in 0.0.7. clinker should now save most common name qualifiers, though if it's missing some/erroring feel free to re-open this issue.
