Comments (4)
Hi @martynakgajos, the memory requirements of AlignQC are something I unfortunately have not had time to revisit. Sorry for the off-the-cuff guesses, but if you have modest memory available (e.g. 20 GB), you should be able to handle small batches of long reads fine, i.e. a few thousand reads. With more than 100k reads, the memory requirements may be considerably higher, e.g. 100 GB or more. This makes large sequencing runs such as Illumina HiSeq especially memory-prohibitive. If you have a large batch of reads and memory issues, I would recommend downsampling your reads prior to processing if you want to look at things like the error profile. Also, multiprocessing in AlignQC is not implemented very nicely: it does not use shared-memory objects, so each additional process you add needs that additional amount of memory available. So my main recommendations are to a) explicitly set the number of threads it uses for multiprocessing and keep that number small, or b) downsample the input reads.
from alignqc.
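The downsampling suggested above could be sketched roughly as follows. This is a minimal illustration, not part of AlignQC: it assumes the reads are available as an uncompressed four-line-per-record FASTQ file, and the names `read_fastq_records` and `reservoir_sample` are hypothetical helpers. It uses reservoir sampling, so the file is streamed once and only the `k` kept records are held in memory.

```python
import random
from itertools import islice


def read_fastq_records(handle):
    """Yield FASTQ records as 4-tuples of lines (header, sequence, '+', quality)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield record


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random from an iterable of unknown length.

    Standard reservoir sampling (Algorithm R): memory use is proportional to k,
    not to the total number of reads.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


def downsample_fastq(in_path, out_path, k, seed=0):
    """Write a random subsample of k reads from in_path to out_path (hypothetical paths)."""
    with open(in_path) as src:
        kept = reservoir_sample(read_fastq_records(src), k, seed)
    with open(out_path, "w") as dst:
        for record in kept:
            dst.writelines(record)
```

The subsample would then need to be aligned before being passed to AlignQC. If the reads are already aligned, subsampling the BAM directly (for example with the `-s` option of `samtools view`) is likely simpler than re-aligning.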
So I guess that, with almost 3 million long reads, subsampling is my only option.
That's the easiest approach, @martynakgajos. The next best option would be to run with a single thread on a machine with a lot of memory and see how it goes, but this would probably take days to run.
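Because each extra process needs its own full copy of the data, the thread count can be budgeted with simple arithmetic. The helper below is a hypothetical sketch, not an AlignQC function: `per_worker_gb` is an assumption you would have to measure yourself (for example from the peak memory of a single-threaded run on your own data), and the numbers in the usage lines are purely illustrative.

```python
def max_workers(available_gb, per_worker_gb):
    """Return how many worker processes fit in memory when nothing is shared.

    With no shared-memory objects, total memory is roughly
    workers * per_worker_gb, so the count is a floor division.
    A result of 0 means even a single process does not fit, and the
    input should be downsampled instead.
    """
    if available_gb < 0 or per_worker_gb <= 0:
        raise ValueError("available_gb must be >= 0 and per_worker_gb must be > 0")
    return int(available_gb // per_worker_gb)


# Illustrative numbers only: a 128 GB machine and a measured 35 GB per process.
workers = max_workers(128, 35)

# A 20 GB machine cannot hold even one such process.
too_small = max_workers(20, 35)
```

When the result is 0, adding threads cannot help and downsampling is the remaining option.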
I was finally able to run it in reasonable time (74 minutes, 35 GB) for 1% of the reads. For 10% of the reads, I wasn't seeing any progress after 3 days (max memory usage: 410 GB), and traverse_preprocessed.py seemed to be the problematic part for the bigger sample.
However, I really love the insight into my data that the reports give me, thank you!