
make nt database index about centrifuge (OPEN, 8 comments)

jrherr commented on September 18, 2024
make nt database index

from centrifuge.

Comments (8)

fconstancias commented on September 18, 2024

I am also having an issue building the nt index. I followed the manual and got this error:

centrifuge-build -p 16 --bmax 1342177280 --conversion-table gi_taxid_nucl.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp nt.fa nt

...
Warning: taxomony id doesn't exists for NM_001348188.1!
Warning: taxomony id doesn't exists for NM_001348191.1!
Warning: taxomony id doesn't exists for NM_001348189.1!
Warning: taxomony id doesn't exists for NM_001348193.1!
Warning: taxomony id doesn't exists for NM_001348190.1!
Warning: taxomony id doesn't exists for NM_001348194.1!
Warning: taxomony id doesn't exists for NM_001348192.1!
Warning: taxomony id doesn't exists for NM_003831.4!
Warning: taxomony id doesn't exists for NG_001488.3!
Warning: taxomony id doesn't exists for NG_052639.1!
Warning: taxomony id doesn't exists for NG_052638.1!
Warning: taxomony id doesn't exists for NR_145473.1!
Warning: taxomony id doesn't exists for NR_145472.1!
Warning: taxomony id doesn't exists for NR_145477.1!
Warning: taxomony id doesn't exists for NR_145465.1!
Warning: taxomony id doesn't exists for NR_145474.1!
Warning: taxomony id doesn't exists for NR_145475.1!
Warning: taxomony id doesn't exists for NR_103441.2!
Warning: taxomony id doesn't exists for NR_145476.1!
Warning: taxomony id doesn't exists for NR_145471.1!
Error: taxonomy/nodes.dmp doesn't exist!
Total time for call to driver() for forward index: 02:09:20
Error: Encountered internal Centrifuge exception (#1)
Command: centrifuge-build-bin --wrapper basic-0 -p 16 --bmax 1342177280 --conversion-table gi_taxid_nucl.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp nt.fa nt
Deleting "nt.1.cf" file written during aborted indexing attempt.
Deleting "nt.2.cf" file written during aborted indexing attempt.
Deleting "nt.3.cf" file written during aborted indexing attempt.

Do you have any idea what is wrong with my command?
Thanks a lot.

Flo
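The log above shows centrifuge-build spending about two hours (02:09:20) on the forward index before reporting that taxonomy/nodes.dmp doesn't exist, after also warning that many NM_, NG_ and NR_ accessions have no taxonomy ID in gi_taxid_nucl.map. A cheap check before launching a long build can surface both kinds of problem up front. The sketch below is a hypothetical helper, not part of Centrifuge; it assumes the conversion table is tab-separated with the sequence ID in the first column, and that the sequence ID is the first whitespace-delimited token of each FASTA header.

```python
import itertools
import os


def missing_inputs(paths):
    """Return the paths that do not exist as regular files (relative to the cwd)."""
    return [p for p in paths if not os.path.isfile(p)]


def fasta_ids(fasta_path):
    """Yield the first whitespace-delimited token of each FASTA header."""
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    yield fields[0]


def unmapped_ids(fasta_path, table_path, sample=1000):
    """Return sampled FASTA IDs that have no entry in the conversion table.

    Only the first `sample` IDs are checked, so the (possibly huge) table is
    streamed instead of loaded into memory.
    ASSUMPTION: tab-separated table with the sequence ID in column 1.
    """
    wanted = set(itertools.islice(fasta_ids(fasta_path), sample))
    with open(table_path) as fh:
        for line in fh:
            wanted.discard(line.split("\t", 1)[0].strip())
            if not wanted:
                break
    return sorted(wanted)


def preflight(fasta, table, nodes, names, sample=1000):
    """Collect problems that would otherwise only surface late in the build."""
    absent = missing_inputs([fasta, table, nodes, names])
    problems = [f"missing file: {p}" for p in absent]
    if fasta not in absent and table not in absent:
        problems += [f"no taxonomy ID for {i}" for i in unmapped_ids(fasta, table, sample)]
    return problems
```

For example, `preflight("nt.fa", "gi_taxid_nucl.map", "taxonomy/nodes.dmp", "taxonomy/names.dmp")` run from the same directory as the build command resolves the relative paths the same way the build does. An empty list means no problems were found in the sample; it does not guarantee every sequence is mapped.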


xgnusr commented on September 18, 2024

Same issue here. Any news?


feltzmc commented on September 18, 2024

@xgnusr Building the nt index takes a lot of memory. I managed to build it successfully on an AWS server with 488 GB of memory, but it fails on a server with 196 GB.

Command used:
centrifuge-build -p 16 --conversion-table acc2tax.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp nt.fa nt
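The two data points above (success with 488 GB, failure with 196 GB) only bracket the memory requirement for the nt release used at the time. A minimal sketch for checking where a Linux host falls before starting a multi-hour build, assuming `/proc/meminfo` is available and loosely treating the thread's GB figures as GiB; the constant and function names are hypothetical.

```python
# Data points reported in this thread for building the nt index at the time.
FAILED_GB = 196     # build failed on a server with this much memory
SUCCEEDED_GB = 488  # build succeeded on an AWS server with this much memory


def mem_total_gb(meminfo_text):
    """Parse the MemTotal line of Linux /proc/meminfo text and return GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # /proc/meminfo reports kB (units of 1024 bytes)
            return kib / 1024 ** 2
    raise ValueError("no MemTotal line found")


def read_total_gb(path="/proc/meminfo"):
    """Read total memory on a Linux host."""
    with open(path) as fh:
        return mem_total_gb(fh.read())


def memory_verdict(total_gb):
    """Place a machine relative to the two reported data points."""
    if total_gb <= FAILED_GB:
        return "likely insufficient"
    if total_gb >= SUCCEEDED_GB:
        return "likely sufficient"
    return "untested"
```

Usage: `memory_verdict(read_total_gb())`. MemTotal is usually a little below the installed RAM because the kernel reserves some memory, so a machine sold as 488 GB may land in the "untested" band.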


GastonViarengo commented on September 18, 2024

Is the memory requirement similar for the RefSeq database? I'm trying to make it work but have been unsuccessful in creating a bacterial index (#199 (comment)). How can I get access to an AWS machine with that amount of RAM? Thanks for any help!


feltzmc commented on September 18, 2024

@GastonViarengo The memory required for RefSeq could be even higher. I believe complete RefSeq is around 120 GB at the moment, whereas nt is around 94 GB in BLAST db form. To get access to an AWS machine with a large amount of RAM, you would first need an account on Amazon Web Services (https://aws.amazon.com/). From there, I would recommend building a machine image on a free micro instance with Centrifuge already installed, then creating an AWS storage drive containing the concatenated RefSeq files in FASTA format. Once you are set up and ready to build, reserve an AWS VM with a large amount of memory, load the image, mount the drive, and begin the build process. It is a somewhat complicated process, so you may want someone familiar with AWS to walk you through it. Please note that running AWS machines and reserving storage is not free, and you will need to pay for any resources you use. Good luck!
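One step in that workflow is preparing a single concatenated FASTA file for the storage drive. A minimal sketch, assuming the downloaded RefSeq files are plain or gzip-compressed (`.gz`) FASTA; the function names are hypothetical and not part of Centrifuge.

```python
import gzip
from pathlib import Path


def open_fasta(path):
    """Open a FASTA file in binary mode, transparently decompressing *.gz files."""
    return gzip.open(path, "rb") if Path(path).suffix == ".gz" else open(path, "rb")


def concatenate_fasta(inputs, output, chunk_size=1 << 20):
    """Concatenate FASTA files into one, adding a newline after any input lacking one."""
    with open(output, "wb") as out:
        for path in inputs:
            last = b""
            with open_fasta(path) as src:
                while True:
                    chunk = src.read(chunk_size)  # stream; never load a whole file
                    if not chunk:
                        break
                    out.write(chunk)
                    last = chunk[-1:]
            if last and last != b"\n":
                out.write(b"\n")  # keep the next header on its own line
```

Appending a newline after inputs that lack one avoids gluing the next `>` header onto the previous sequence line, which a plain `cat` of such files would do.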


GastonViarengo commented on September 18, 2024

Thanks, Matthew, for the quick response! I downloaded the bacterial RefSeq database (through centrifuge-download) and it's around 70 GB in FASTA form, which is why I was wondering why building the index takes so much memory. I'll check out AWS, but it's hard for us to pay for such services. Do you (or anyone) know of a FREE server with that amount of memory? Another approach I'm considering is to split the bacterial DB and create multiple indexes, but that's annoying for downstream analysis! Thanks so much for your help!
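Splitting the database means each centrifuge-build run only sees part of the sequence data. A hypothetical sketch that splits one FASTA file into N parts of roughly equal size without breaking any record: each record goes to whichever part has received the fewest bytes so far. Whether this lowers peak build memory enough on a given machine is an assumption not tested in this thread, and results would still have to be combined across indexes afterwards.

```python
import heapq


def read_records(path):
    """Yield (header_line, sequence_lines) for each record of a FASTA file, as bytes."""
    header, seq = None, []
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b">"):
                if header is not None:
                    yield header, seq
                header, seq = line, []
            elif header is not None:
                seq.append(line)
    if header is not None:
        yield header, seq


def split_fasta(path, prefix, n_parts):
    """Split a FASTA file into n_parts files of similar size, keeping records whole.

    Returns the list of output paths: prefix.0.fa, prefix.1.fa, ...
    """
    if n_parts < 1:
        raise ValueError("n_parts must be >= 1")
    paths = [f"{prefix}.{i}.fa" for i in range(n_parts)]
    outs = [open(p, "wb") for p in paths]
    heap = [(0, i) for i in range(n_parts)]  # (bytes written so far, part index)
    try:
        for header, seq in read_records(path):
            size, i = heapq.heappop(heap)  # part with the fewest bytes so far
            outs[i].write(header)
            outs[i].writelines(seq)
            size += len(header) + sum(len(s) for s in seq)
            heapq.heappush(heap, (size, i))
    finally:
        for fh in outs:
            fh.close()
    return paths
```

Each output file could then be passed to its own `centrifuge-build` run with the same `--conversion-table`, `--taxonomy-tree`, and `--name-table` options. Keeping records whole means each sequence lands in exactly one index.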


feltzmc commented on September 18, 2024

@GastonViarengo Ah, 70 GB should be much more manageable than 120 GB. I don't know of any free resources with that amount of memory, but if you are in academia you might be able to reach out to nearby universities and find one with a high-performance computing cluster that has a "fat node" (a node with a large amount of memory). Alternatively, it looks like this site (https://genexa.ch/sars2-bioinformatics-resources/) has some free Centrifuge databases built with recent (March 2020) RefSeq data. Best of luck!


GastonViarengo commented on September 18, 2024

OK, Matthew, thanks again! Yes, I downloaded those DBs, but I wanted to work with a bacteria-only RefSeq index (or at least a human plus bacteria one). Anyway, I'll keep trying and looking for options. Best!
