Comments (8)

vaulot commented on June 18, 2024

Hi Soluna and Peter

I have been using dada2 version 1.8.0 in R and did not have this problem (16 GB of memory). I just saw that a new version, 1.10.0, is available on Bioconductor (https://bioconductor.org/packages/release/bioc/html/dada2.html). Soluna, which version are you using (Peter is using 1.10.0)? Maybe memory allocation changed between the two versions.

Soluna, I saw that you also posted on the dada2 GitHub (benjjneb/dada2#691). I think selecting planktonic organisms is OK.

You could also use mothur pcr.seqs, or another program, to extract the V9 region from the pr2 database with the primers you are using (allowing 1 or 2 mismatches on each side). You will need to do this with the mothur format of the pr2 database and then convert back to the dada2 format. This will very likely reduce the memory needed.

Same thing for Peter, but with the V4 region?

I will be working on pr2 in the coming month and this is something I could provide: one dada2 database limited to the V4 region and another limited to the V9 region.

Cheers. Daniel
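
For readers who want to see what primer-based trimming with a mismatch tolerance amounts to, here is a minimal Python sketch. It is not mothur's pcr.seqs and makes no claim to reproduce its behaviour; it only illustrates the idea of locating a forward primer and the reverse complement of a reverse primer, allowing a few mismatches, and keeping what lies between them. The function names and the `max_mismatches` parameter are assumptions made for this illustration.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTURYSWKMBDHVN", "TGCAAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a primer, keeping degenerate codes meaningful."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(primer, window):
    """Count positions where the window base is not allowed by the primer code."""
    return sum(base not in IUPAC.get(code, "") for code, base in zip(primer, window))


def find_primer(primer, seq, max_mismatches, from_end=False):
    """Return the start index of the first acceptable primer hit, or None."""
    positions = range(len(seq) - len(primer) + 1)
    if from_end:
        positions = reversed(positions)
    for start in positions:
        if mismatches(primer, seq[start:start + len(primer)]) <= max_mismatches:
            return start
    return None


def extract_region(seq, forward, reverse, max_mismatches=2):
    """Return the sequence between the two primers (primers excluded), or None."""
    seq = seq.upper().replace("U", "T")
    forward = forward.upper()
    reverse_rc = reverse_complement(reverse)
    fwd_start = find_primer(forward, seq, max_mismatches)
    if fwd_start is None:
        return None
    region_start = fwd_start + len(forward)
    # Search for the reverse primer site from the 3' end of what remains.
    rev_start = find_primer(reverse_rc, seq[region_start:], max_mismatches, from_end=True)
    if rev_start is None:
        return None
    return seq[region_start:region_start + rev_start]
```

Reference sequences for which `extract_region` returns `None` do not contain both primer sites within the tolerance and would be dropped from a region-limited database. The primers passed in are whatever pair was used for the amplicons; none are hard-coded here.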

from pr2database.

vaulot commented on June 18, 2024

Hi Soluna

Do you know exactly where the program gets stuck? Are you using dada2 under R?

Cheers. Daniel

pdcountway commented on June 18, 2024

I'm having a similar problem with assignTaxonomy using dada2 v. 1.10, the pr2 v.4.72 reference database, and the associated data for the metabarcode tutorial described here: https://github.com/vaulot/metabarcodes_tutorials/tree/master/R_dada2. Everything worked perfectly until the assignTaxonomy step.
In my case assignTaxonomy failed after about 10 seconds with:
Error in C_assign_taxonomy(seqs, rc(seqs), refs, ref.to.genus, tax.mat.int, :
Memory allocation failed.
I'm running this in RStudio under R 3.5.3, on a Windows PC with 16 GB of memory. At least 10 GB of memory was available when the process failed.

soluna1 commented on June 18, 2024

Hi Daniel,

Yes, I'm using dada2 with the pr2_version_4.11.1_dada2.fasta database. The command is:

taxa <- assignTaxonomy(seqtab, "pr2_version_4.11.1_dada2.fasta", multithread=T, minBoot=80, tryRC=T, taxLevels = c("Kingdom","Supergroup","Division","Class","Order","Family","Genus","Species"))

The run does not finish even after 10 min. It is true that the database is very large compared to the silva 16 one (rdp_train_set_16), so I'm thinking about creating a shorter version of pr2. I've noticed that some sequences are repeated, that is, same taxon and same sequence, e.g.

Eukaryota;Archaeplastida;Streptophyta;Embryophyceae;Embryophyceae_X;Embryophyceae_XX;Zea;Zea_mays;

Do you think I can reduce the size of the database by removing those repetitions and selecting mostly planktonic organisms? Would it still be valid for assigning taxonomy to 18S V9 region samples?
Thanks for your help.
Best,

Soluna
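
As an illustration of the "same taxon, same sequence" clean-up described above, here is a minimal Python sketch that reads FASTA lines and keeps only the first record for each (header, sequence) pair. It assumes the dada2-style layout in which the FASTA header is the taxonomy string; `read_fasta` and `drop_exact_duplicates` are names invented for this example, not functions from dada2 or pr2. Whether removing such repeats changes the resulting assignments is a separate question that this sketch does not address.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_exact_duplicates(records):
    """Keep the first record for each (taxonomy header, sequence) pair."""
    seen = set()
    unique = []
    for header, seq in records:
        key = (header, seq.upper())  # compare sequences case-insensitively
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique
```

With a file on disk, the two functions can be chained as `drop_exact_duplicates(read_fasta(open(path)))`, where `path` is the location of the reference FASTA. Records that share a sequence but carry different taxonomy strings are deliberately kept.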

soluna1 commented on June 18, 2024

Hi Daniel,

Thank you so much for your reply. I'm using dada2 version 1.10.1. So, as you said (and as was also suggested on the dada2 GitHub), perhaps I was too optimistic about my computer's capabilities, or I should let it run for longer than 10 minutes. I'm going to move the process to a cluster next week.

I've managed to compile a shorter version of pr2 by selecting the groups that I think could appear in the samples. I also added 16S sequences from the RDP training set (https://zenodo.org/record/801828#.XKNX4y2B3OQ), and it works fine. It took a couple of minutes to finish, so I'm now including more groups in the database (I think something close to 80000 sequences could work, as most sequences in my samples have now been assigned).

Regarding your comment about a pr2 version limited to the V4 and/or V9 region, it would be fantastic. In fact, I'm working with V9 region sequences, and at the beginning I thought that it wouldn't work with a complete 18S database, but fortunately I was wrong: the taxonomy assignment worked fine.

Yes, thanks for the idea. I'll try to use the mothur function and the primers to create a V9 database based on pr2. It will certainly be easier for my computer to handle.
I'll let you know if this works.

thanks a lot!!

Soluna

vaulot commented on June 18, 2024

Hi Soluna

In the end, I decided not to release a V4 or V9 version of PR2. I used fairly universal primers to trim the PR2 sequences to V4 and then used dada2 to assign some metabarcoding datasets. To my surprise, many sequences became unassigned even though the corresponding reference sequence was in the PR2 V4 sequence dataset. So I will need to explore this further before I provide such data, so that users do not run into the same problem.

soluna1 commented on June 18, 2024

Hi Daniel,
Thank you for the information. It is not exactly the same problem, but I also got "weird" results from the dada2 assignTaxonomy function. I reported this on the dada2 GitHub: https://github.com/benjjneb/dada2/issues/750. The issue was that dada2 returned different results depending on the reference database used, assigning the same sequence to two different taxa with 100% bootstrap support in both cases.
Testing a little more, this time with the EukDiv w2_v9 reference database, I found that the dada2 taxonomic assignment algorithm seems to depend on the number of times a particular taxon is repeated in the database. A particular ASV was assigned to the most frequent taxon (C. closterium or B. paxillifer) depending on whether I included more copies of one or the other. Once all the options were balanced (no repetitions), the assignment was made at a higher taxonomic level, i.e. the correct assignment.
However, with pr2 the same ASV was correctly assigned to a higher taxonomic level on the first try, even though there are more "copies" of C. closterium than of the other options (N. fonticola, B. paxillifer, C. fusiformis, etc.). I suppose that in the case of pr2 the proportion of nucleotides matching the reference sequence played a role in the assignment, although I haven't checked this yet.

So I really don't know which reference database is better in my case, since I only have V9 (very short) ASVs. I get a higher proportion of assigned ASVs with EukDiv w2 v9 than with pr2, but they could contain errors like the one mentioned for some very common sequences. So...

Thanks again for your feedback.
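
One way to check a reference database for the situation described above (the same sequence present under several taxa, with unequal copy numbers) is sketched below in Python. It is only a diagnostic helper under stated assumptions: records are (taxonomy, sequence) pairs already parsed from the FASTA file, and the function names are invented for this example. It does not reproduce or explain the dada2 classifier.

```python
from collections import Counter, defaultdict


def taxa_per_sequence(records):
    """Map each distinct sequence to a Counter of the taxonomy strings carrying it."""
    groups = defaultdict(Counter)
    for taxonomy, seq in records:
        groups[seq.upper()][taxonomy] += 1
    return groups


def ambiguous_sequences(records):
    """Return only the sequences that are shared by more than one taxonomy string."""
    return {
        seq: dict(counts)
        for seq, counts in taxa_per_sequence(records).items()
        if len(counts) > 1
    }


# Placeholder records: the sequences are made up and far shorter than a real V9 region.
example = [
    ("C_closterium", "ACGT"),
    ("C_closterium", "ACGT"),
    ("B_paxillifer", "acgt"),
    ("N_fonticola", "TTTT"),
]
shared = ambiguous_sequences(example)
```

Each entry of `shared` lists the taxa attached to one sequence together with their copy counts, which makes unbalanced cases easy to spot before deciding how to handle them.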

vaulot commented on June 18, 2024

In the next version of PR2, there will be a link to EukRibo, which has annotations for the V9 region, so it will be possible to select only sequences that contain the full V9 region. However, such sequences represent only 10% of the full PR2 database, which means that taxonomic sampling will be reduced compared to the full PR2 database.
