Comments (8)
I am also having an issue building the nt index. I followed the manual and got this error:
centrifuge-build -p 16 --bmax 1342177280 --conversion-table gi_taxid_nucl.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp nt.fa nt
...
Warning: taxomony id doesn't exists for NM_001348188.1!
Warning: taxomony id doesn't exists for NM_001348191.1!
Warning: taxomony id doesn't exists for NM_001348189.1!
Warning: taxomony id doesn't exists for NM_001348193.1!
Warning: taxomony id doesn't exists for NM_001348190.1!
Warning: taxomony id doesn't exists for NM_001348194.1!
Warning: taxomony id doesn't exists for NM_001348192.1!
Warning: taxomony id doesn't exists for NM_003831.4!
Warning: taxomony id doesn't exists for NG_001488.3!
Warning: taxomony id doesn't exists for NG_052639.1!
Warning: taxomony id doesn't exists for NG_052638.1!
Warning: taxomony id doesn't exists for NR_145473.1!
Warning: taxomony id doesn't exists for NR_145472.1!
Warning: taxomony id doesn't exists for NR_145477.1!
Warning: taxomony id doesn't exists for NR_145465.1!
Warning: taxomony id doesn't exists for NR_145474.1!
Warning: taxomony id doesn't exists for NR_145475.1!
Warning: taxomony id doesn't exists for NR_103441.2!
Warning: taxomony id doesn't exists for NR_145476.1!
Warning: taxomony id doesn't exists for NR_145471.1!
Error: taxonomy/nodes.dmp doesn't exist!
Total time for call to driver() for forward index: 02:09:20
Error: Encountered internal Centrifuge exception (#1)
Command: centrifuge-build-bin --wrapper basic-0 -p 16 --bmax 1342177280 --conversion-table gi_taxid_nucl.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp nt.fa nt
Deleting "nt.1.cf" file written during aborted indexing attempt.
Deleting "nt.2.cf" file written during aborted indexing attempt.
Deleting "nt.3.cf" file written during aborted indexing attempt.
Do you have any idea what is wrong with my command?
Thanks a lot.
Flo
from centrifuge.
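The log above ends with "Error: taxonomy/nodes.dmp doesn't exist!" after more than two hours of indexing, so it may be worth checking the input paths before starting the build. A minimal sketch, assuming the file names from the command above; `missing_inputs` is a hypothetical helper and not part of centrifuge:

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the paths that are not regular files or are empty."""
    missing = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(str(path))
    return missing


# File names copied from the centrifuge-build command in the comment above.
inputs = [
    "gi_taxid_nucl.map",
    "taxonomy/nodes.dmp",
    "taxonomy/names.dmp",
    "nt.fa",
]

for path in missing_inputs(inputs):
    print(f"missing or empty: {path}")
```

Run it from the directory where centrifuge-build will be started, since the paths in the command are relative. This only catches missing or empty files; it says nothing about the taxonomy ID warnings.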
I have the same issue.
Any news?
from centrifuge.
@xgnusr Building the nt index takes a lot of memory. I managed to build it successfully on an AWS server with 488 GB of memory, but it fails on a server with 196 GB.
Command used:
centrifuge-build -p 16 --conversion-table acc2tax.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp nt.fa nt
from centrifuge.
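The two data points above (success at 488 GB, failure at 196 GB) can be turned into a quick check of the machine you plan to build on. This is only a sketch: the real minimum lies somewhere between those figures and is not stated in the thread, `build_outlook` is a hypothetical helper, and the `os.sysconf` names used here are assumed to be available (they are on Linux).

```python
import os

GB = 10 ** 9


def total_memory_gb():
    """Total physical memory in GB, read via os.sysconf (assumes Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB


def build_outlook(total_gb, failed_gb=196, succeeded_gb=488):
    """Compare a machine against the two nt build reports in this thread."""
    if total_gb >= succeeded_gb:
        return "likely enough"
    if total_gb <= failed_gb:
        return "likely too little"
    # Nobody in the thread reported a build in this range.
    return "untested range"


if __name__ == "__main__":
    total = total_memory_gb()
    print(f"{total:.0f} GB RAM: {build_outlook(total)} for the nt index")
```

The thresholds apply to the nt index only; other databases would need their own figures.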
Is this also the memory requirement for the RefSeq database? I'm trying to make it work but have been unsuccessful in creating a bacterial index (#199 (comment)). How can I get access to an AWS server with that amount of RAM? Thanks for any help!
from centrifuge.
@GastonViarengo The memory required for RefSeq could be even higher. I believe RefSeq complete is around 120GB at the moment, whereas nt is around 94GB in BLAST db form. As for accessing an AWS machine with a large amount of RAM, you would first need an account on Amazon Web Services (https://aws.amazon.com/). From there I would recommend building a machine image on a free micro instance with centrifuge already installed, then creating an AWS storage drive with the concatenated RefSeq files in FASTA format. Once you are set up and ready to build, reserve an AWS VM with a large amount of memory, load the image, mount the drive, and begin the build process. It is a somewhat complicated process, so you may want to have someone familiar with AWS walk you through it. Please note that running AWS machines and reserving storage is not free, and you will need to pay for any resources you use. Good luck!
from centrifuge.
Thanks Matthew for the quick response! I have downloaded the bacterial RefSeq database (through centrifuge-download) and it's around 70GB in FASTA form, which is why I was wondering why it takes so much memory to build the index. I'll check out AWS, but it's complicated for us to pay for such services. Do you (or anyone) know of a free server with that amount of memory? Another approach I am considering is to split the bacterial DB and create multiple indexes, but that's annoying for further analysis! Thanks so much for your help!
from centrifuge.
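The splitting idea mentioned above could look roughly like this: cut the FASTA file into a fixed number of parts of similar size, only ever breaking at a record header. This is a sketch under stated assumptions; `split_fasta` and the `<prefix>.<n>.fa` output naming are hypothetical, and the thread does not confirm how much memory each smaller index would need.

```python
import os


def split_fasta(src, out_prefix, parts):
    """Split src into up to `parts` files of similar size without cutting a record."""
    target = os.path.getsize(src) / parts  # bytes aimed for per output file
    outputs = []
    out = None
    written = 0
    try:
        with open(src, "rb") as handle:
            for line in handle:
                new_record = line.startswith(b">")
                # Open the first part, or roll over at a header once the
                # current part is full and more parts are still allowed.
                if out is None or (new_record and written >= target
                                   and len(outputs) < parts):
                    if out is not None:
                        out.close()
                    path = f"{out_prefix}.{len(outputs) + 1}.fa"
                    out = open(path, "wb")
                    outputs.append(path)
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return outputs
```

For example, `split_fasta("bacteria.fa", "bacteria_part", 4)` would write `bacteria_part.1.fa` through `bacteria_part.4.fa` (the input name is made up here). Each part would then need its own centrifuge-build run, and combining classification results across several indexes is not covered by this sketch.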
@GastonViarengo Ah, 70GB should be much more manageable than 120GB. I don't know of any free resources with that amount of memory, but if you are in academia you might be able to reach out to nearby universities and find one that has a high-performance compute cluster with a "fat node". Alternatively, it looks like this site (https://genexa.ch/sars2-bioinformatics-resources/) has some free centrifuge databases built with recent (March 2020) RefSeq data. Best of luck!
from centrifuge.
OK Matthew, thanks again! Yes, I downloaded those DBs, but I wanted to work with a bacteria-only RefSeq index (or at least human+bacteria). Anyway, I'll keep trying and looking for options. Best!
from centrifuge.