Memory Requirements about alignqc (closed)

martynakgajos commented on August 12, 2024
Memory Requirements

from alignqc.

Comments (4)

jason-weirather commented on August 12, 2024

Hi @martynakgajos, the memory requirements of AlignQC are something I unfortunately have not had time to revisit. Sorry for the off-the-cuff guesses, but if you have modest memory available (e.g. 20 GB), you should be able to handle small batches of long reads (e.g. a few thousand reads) fine. With more than 100k reads, the memory requirements may be considerably higher, e.g. 100 GB or more. This makes large sequencing runs, such as Illumina HiSeq, especially memory-prohibitive. If you have a large batch of reads and memory issues, I would recommend downsampling your reads prior to processing if you want to look at things like the error profile.

Also, multiprocessing in AlignQC is not implemented very nicely: it does not use shared memory objects, so each additional process you add needs that additional amount of memory available. So my main recommendations are to a) explicitly set the number of threads used for multiprocessing and keep this number small, or b) downsample the input reads.
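
For anyone who wants to try the downsampling route, here is a minimal sketch of random subsampling, assuming the reads are available as an uncompressed 4-line-per-record FASTQ file. The function names (`read_fastq`, `downsample_fastq`) and the `fraction` and `seed` parameters are hypothetical and are not part of AlignQC. If the input is already an aligned BAM, `samtools view -s` offers comparable subsampling.

```python
import io
import random
from typing import Iterator, List, TextIO


def read_fastq(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped FASTQ)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # empty string means end of file
            return
        yield record


def downsample_fastq(src: TextIO, dst: TextIO, fraction: float, seed: int = 0) -> int:
    """Copy each record from src to dst with probability `fraction`.

    Returns the number of records written. A fixed seed makes the
    selection reproducible across runs.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    kept = 0
    for record in read_fastq(src):
        if rng.random() < fraction:
            dst.writelines(record)
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory demonstration; real use would open files instead.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n")
    out = io.StringIO()
    n = downsample_fastq(demo, out, fraction=0.5, seed=1)
    print(f"kept {n} of 2 reads")
```

Because each read is kept independently, the output size is only approximately `fraction` times the input size; the file is streamed, so the sampler itself needs very little memory.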

martynakgajos commented on August 12, 2024

So I guess that, with almost 3 million long reads, subsampling is my only option.

jason-weirather commented on August 12, 2024

That's the easiest approach, @martynakgajos. The next best option would be to run with a single thread on a machine with a lot of memory and see how it goes, but this would probably take days to run.
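
Since memory is not shared between worker processes, a rough way to choose a thread count is to divide the available memory by the footprint of a single process. The sketch below does only that arithmetic. The helper `max_workers` and all numbers in it are hypothetical; the per-process figure has to be measured on your own data, for example from a single-threaded run on a small subsample.

```python
def max_workers(available_gb: float, per_worker_gb: float, requested: int) -> int:
    """Return how many worker processes fit into the available memory.

    Assumes every worker needs roughly `per_worker_gb` of its own memory
    (no sharing between processes). Always returns at least 1, because a
    single-process run is the fallback even when memory is tight.
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    if requested < 1:
        raise ValueError("requested must be at least 1")
    affordable = int(available_gb // per_worker_gb)
    return max(1, min(requested, affordable))


if __name__ == "__main__":
    # Hypothetical figures: 100 GB free, about 35 GB per process, 8 cores.
    print(max_workers(available_gb=100, per_worker_gb=35, requested=8))
```

A result of 1 does not guarantee that the run fits in memory; it only means that adding more processes would certainly not help.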

martynakgajos commented on August 12, 2024

I was finally able to run it in a reasonable time (74 minutes, 35 GB) for 1% of the reads. For 10% of the reads, I wasn't seeing any progress after 3 days (max memory usage: 410 GB), and traverse_preprocessed.py seemed to be the problematic part for the bigger sample.
However, I really love the insight into my data that the reports give me, thank you!
