
RDS is always 0.0 (platon, open issue)

oschwengers commented on May 27, 2024
RDS is always 0.0

from platon.

Comments (7)

oschwengers commented on May 27, 2024

Hi @barbaracania,
thanks for reaching out. There are a couple of things going on here, so I'll try to address them in order:

  1. For some reason, the first 4 contigs are all named NODE_1. Have you merged contigs from different assemblies, potentially from different strains or species? If so, this could cause severe issues for Prodigal's gene prediction, which in turn would make it hard for Platon to detect its marker protein sequences (MPS).
  2. --characterize leads to a full characterization of all contigs and therefore deactivates any filtering. Hence, this option can be used to gain information on any contig, no matter whether it's chromosome- or plasmid-borne.
  3. The last contig indeed has 2 rRNAs detected; however, in --characterize mode, Platon doesn't classify contigs but characterizes all of them.
  4. It depends on the data; sometimes the sensitivity and accuracy modes provide the same results. Also, in specificity mode Platon uses very strict classification rules for the RDS, and since your contigs' RDS is below the specificity threshold, it refuses to classify any of them as plasmid. So yes, this is expected.
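Point 4 can be illustrated with a minimal Python sketch, assuming that the sensitivity and specificity modes each boil down to a single RDS cut-off. The threshold values are hypothetical placeholders, not Platon's real parameters; they were simply chosen so that an RDS of 0.0 lies above one cut-off and below the other. The accuracy mode, which also weighs other contig characteristics, is not modelled.

```python
from typing import NamedTuple

# Hypothetical cut-offs for illustration only; these are NOT Platon's values.
# They are placeholders chosen so that an RDS of 0.0 falls between them.
RDS_THRESHOLDS = {"sensitivity": -5.0, "specificity": 5.0}


class Contig(NamedTuple):
    contig_id: str
    rds: float


def classify_by_rds(contigs, mode):
    """Split contigs into (plasmid, chromosome) ID lists using one RDS cut-off."""
    threshold = RDS_THRESHOLDS[mode]
    plasmid = [c.contig_id for c in contigs if c.rds >= threshold]
    chromosome = [c.contig_id for c in contigs if c.rds < threshold]
    return plasmid, chromosome


# Every contig discussed in this thread has an RDS of 0.0.
contigs = [Contig("NODE_1", 0.0), Contig("NODE_2", 0.0), Contig("NODE_3", 0.0)]
for mode in ("sensitivity", "specificity"):
    plasmid, chromosome = classify_by_rds(contigs, mode)
    print(f"{mode}: plasmid={plasmid} chromosome={chromosome}")
```

With a uniform RDS of 0.0, the same contigs land on opposite sides depending on the mode's cut-off. The sketch does not explain every detail of the results reported in this thread (e.g. which contigs end up in contigs.chromosome.fasta in sensitivity mode), since Platon may apply further rules.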

Could you provide some information on your data: Metagenome or isolate? Merged assemblies?
Best regards!

barbaracania commented on May 27, 2024

Thank you for your answer. My data is metagenomic, but the samples were treated with a plasmid-safe DNase, so they should contain mostly plasmid reads. I ran SPAdes on it with the --metaplasmid option, and afterwards I only modified the contig names by removing all the information after the coverage, as otherwise Platon was not able to read the coverage correctly from them. Without the modification, the names look like this: >NODE_1_length_63294_cov_26.832935_cutoff_20_type_circular. The data was not modified in any other way. Since it is recommended that contigs produced by metaplasmidSPAdes should still be confirmed as plasmids by additional means, I thought of including Platon in my pipeline for this purpose.
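The header trimming described above could be scripted along these lines. This is a minimal Python sketch, assuming SPAdes-style headers of the form NODE_<n>_length_<len>_cov_<cov> followed by an optional suffix; the function names are hypothetical and not part of SPAdes or Platon. Because identical IDs after trimming could make records ambiguous for downstream tools, it also warns about duplicates.

```python
import re
import sys
from collections import Counter

# Keep "NODE_<n>_length_<len>_cov_<cov>" and drop anything after the coverage
# value, e.g. the "_cutoff_20_type_circular" suffix quoted in this thread.
HEADER_RE = re.compile(r"^>(NODE_\d+_length_\d+_cov_\d+(?:\.\d+)?)")


def trim_spades_header(line):
    """Return the trimmed header line; lines that don't match are returned unchanged."""
    match = HEADER_RE.match(line)
    return f">{match.group(1)}\n" if match else line


def trim_fasta(lines):
    """Yield FASTA lines with trimmed headers, warning on stderr about duplicate IDs."""
    seen = Counter()
    for line in lines:
        if line.startswith(">"):
            line = trim_spades_header(line)
            seen[line] += 1
            if seen[line] == 2:
                print(f"warning: duplicate contig ID {line.strip()}", file=sys.stderr)
        yield line


def trim_file(src_path, dst_path):
    """Write a copy of src_path with trimmed headers to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(trim_fasta(src))
```

For example, trim_file("scaffolds.fasta", "contigs.fasta") would write the renamed contigs (both file names are placeholders). Sequence lines pass through untouched, so only the headers change.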

Just to make this clear, I understand that using the --characterize option for Platon gives only info about contigs. I used it only to get an idea about my data and also to show it to you. When I was testing the three different modes, I was not using this option. For example, when I used

platon contigs.fasta --db ~/Databases/db --output platon_accu --mode accuracy --threads 8

my contigs.tsv file starts like this:

ID                                  Length  Coverage  # ORFs  RDS  Circular  Inc Type(s)  # Replication  # Mobilization  # OriT  # Conjugation  # AMRs  # rRNAs  # Plasmid Hits
NODE_1_length_63165_cov_26.834275   63165   26.8      48      0.0  yes       0            0              0               0       0              0       0        0
NODE_1_length_51546_cov_2.360878    51546   2.4       74      0.0  yes       0            0              0               0       0              0       0        0
NODE_2_length_32011_cov_1.484036    32011   1.5       39      0.0  yes       0            0              0               0       0              0       0        0
NODE_3_length_19747_cov_141.934964  19747   141.9     3       0.0  yes       0            0              0               0       0              0       2        0
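To get a quick overview of a table like this one, here is a minimal Python sketch, assuming contigs.tsv is tab-separated and uses exactly the column names shown above; the helper names are hypothetical. It lists the contigs whose RDS is 0.0 and whose per-feature count columns are all zero.

```python
import csv

# Count columns as they appear in the contigs.tsv header quoted in this thread.
COUNT_COLUMNS = ["# Replication", "# Mobilization", "# OriT",
                 "# Conjugation", "# AMRs", "# rRNAs", "# Plasmid Hits"]


def read_platon_tsv(lines):
    """Parse tab-separated contigs.tsv lines (a file handle or list of strings) into dicts."""
    return list(csv.DictReader(lines, delimiter="\t"))


def contigs_without_hits(rows):
    """Return IDs of contigs with RDS 0.0 and a zero in every count column."""
    return [
        row["ID"]
        for row in rows
        if float(row["RDS"]) == 0.0
        and all(int(row[col]) == 0 for col in COUNT_COLUMNS)
    ]
```

Applied to the four rows above, this would return the first three contigs. NODE_3 is excluded only because of its 2 rRNAs, which point toward a chromosome rather than a plasmid.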

My contigs.chromosome.fasta contains only the first two contigs from my previous post that were not identified by Platon as circular, and the contigs.plasmid.fasta has everything else, including the contig on which the rRNA genes were found. When I try the sensitivity mode, I get the same results as with the accuracy mode, but the specificity mode gives me empty contigs.tsv and contigs.plasmid.fasta files, while all the contigs are found in the contigs.chromosome.fasta. From what I understood, the accuracy mode should take all the contig characteristics into consideration when deciding whether a contig comes from a plasmid or a chromosome, while the other two modes rely only on the RDS values. Since all my RDS values are 0.0, I am confused why I am getting the results described above...

oschwengers commented on May 27, 2024

Hi,
could you repeat your analysis using the --meta option? This is not yet available in the latest official release (v1.6) but is available in the main branch. You can install it into your environment via:
git clone https://github.com/oschwengers/platon.git
python -m pip install --no-deps --ignore-installed platon/
Without further information I cannot figure out what is causing this behaviour, but Prodigal will certainly not work optimally without the meta option set, as it then treats the input as a single genome.
Another reason could be that Platon simply cannot detect any marker proteins within your metagenome contigs. To check this, I'd need the <prefix>.json file.

from platon.

barbaracania commented on May 27, 2024

Hi,
Thank you very much for trying to help me with my issue! I tried the --meta option, but the results seem to be all the same. Here is the .json file produced with the command:

platon contigs.fasta --db ~/Databases/db --output platon_accu_meta --meta --mode accuracy --threads 8

contigs.json.zip

oschwengers commented on May 27, 2024

Hi,
indeed, not a single marker protein could be detected on your contigs, which is odd/interesting and hasn't happened before, at least not for an entire dataset. However, we do not have much experience with metagenome data so far.

So in principle, there are 2 different reasons that I can think of:

  1. Platon's marker protein sequences are actually not encoded on these contigs. In this case, Platon's database wouldn't cover the protein space encoded in your data. We're currently compiling an updated DB which could help here.
  2. There could be an error occurring. To check that, may I ask you to also provide the contigs.log file?

barbaracania commented on May 27, 2024

Good morning,
Sure! Here is the .log file from the same run:
contigs.log

oschwengers commented on May 27, 2024

I took a look at the logs and from a technical perspective, everything is just fine.
However, there is indeed not a single BLAST (DIAMOND) hit against the marker protein database, which so far has not occurred (at least not that I know of). This is very interesting and helpful to know in terms of metagenome analysis with Platon!

As mentioned above, I'm currently computing and compiling a database update which could help here; of course, this would require further investigation. As of today, it seems that Platon is not the right tool for your dataset. May I refer you to PlasFlow? Since Platon was initially developed with single isolates in mind, PlasFlow might provide better results, as it specifically addresses metagenome data.

I'll leave this issue open until we've released the new database version and Platon v1.7, just to let you know.
Again, thanks for trying Platon and reporting this!
Best regards!
