Comments (10)
Actually, I forgot what it was doing. It makes a system call to samtools, which converts the cram to bam, then parses the bam. So the sampling stuff (the parts that you notice are filled out by covstats) works fine with cram. But the mapped-read count is taken from the bam index and is not present in the cram index, so it can't estimate coverage.
from goleft.
P.S. I think I've figured out how to calculate median coverage from the mosdepth output. It's not as fast as covstats, but still faster than most other tools out there.
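The comment above doesn't say how the median is derived, so the following is only one possible approach, not necessarily the one used. It assumes the rows come from mosdepth's `*.mosdepth.global.dist.txt` file and that each row is tab-separated as chromosome (or `total`), depth, and the proportion of bases covered at that depth or higher. The function name `median_depth_from_dist` is made up for illustration.

```python
def median_depth_from_dist(lines, chrom="total"):
    """Return the largest depth that at least half of the bases reach.

    Assumes each row is "<chrom>\t<depth>\t<proportion of bases at >= depth>".
    Returns None if no row for `chrom` reaches a proportion of 0.5.
    """
    best = None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or fields[0] != chrom:
            continue  # skip other chromosomes and malformed rows
        depth, proportion = int(fields[1]), float(fields[2])
        # The median is the highest depth still covered by >= 50% of bases.
        if proportion >= 0.5 and (best is None or depth > best):
            best = depth
    return best
```

In practice the rows would be read with something like `open("sample.mosdepth.global.dist.txt")`, where `sample` is whatever prefix was passed to mosdepth.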
I think it should be straightforward to support CRAM in covstats; I'll just do a system call to samtools view. I'll have a look at this next week if not sooner.
Much appreciated, thank you!
I just had a look at this and the cram index does not store the total number of mapped reads, so it's not possible to estimate the coverage from it as it is from the bam index. We could iterate over a few cram slices, count reads, and note the byte offsets in the index, then use that rate and the total file size to estimate. This will be relatively accurate, but less so than even the bam index estimate.
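The arithmetic behind that idea can be sketched as follows. This is not covstats code; the function name, its parameters, and the use of a single fixed read length are assumptions made for illustration, and it treats the whole file as read data (ignoring header and container overhead).

```python
def estimate_coverage(sampled_reads, sampled_bytes, file_size,
                      read_length, genome_length):
    """Extrapolate mean coverage from a few sampled slices.

    sampled_reads / sampled_bytes is the reads-per-byte rate seen in the
    sampled slices; scaling it by the total file size gives an estimated
    read count for the whole file.
    """
    if sampled_bytes <= 0 or genome_length <= 0:
        raise ValueError("sampled_bytes and genome_length must be positive")
    estimated_total_reads = sampled_reads * file_size / sampled_bytes
    # Mean coverage = total sequenced bases / genome length.
    return estimated_total_reads * read_length / genome_length
```

The estimate is only as good as the assumption that the sampled slices compress about as well as the rest of the file, which is one reason it would be less accurate than the bam index approach.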
Are there any updates on this? I've been trying to implement a workflow involving covstats, and when it runs on CRAM files, the coverage reports as zero. Other stats appear to be accurate.
Given the lack of a cram parser in go, I don't plan to support this. It's possible to get actual coverage for a 30X WGS cram in < 5 minutes using mosdepth.
Thanks for your reply. I've noticed that covstats seems to run slowly on crams, even small ones. When you say there's no cram parser in go... what is it doing for the other stats? Apologies if this is a silly question, I'm very new to go.
In other words -- is covstats running on crams supported, just not for coverage? Or should I consider crams as not supported, period?
What would you recommend to very quickly extract mean/median coverage (+the insert size info) for CRAMs?
5 mins with mosdepth is pretty good but say we don't want that much info, just the median depth.
Would it make sense to slice the CRAM to a few random regions, convert to a small BAM and then use covstats?
We'd need to do this manually outside covstats, right? I see a "Regions" option but I'm not sure how it's used in the code, including if the full CRAM would be converted to BAM before computing coverage on these regions or not.
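A rough sketch of the manual route described above, run outside covstats. It only builds the command lines: `samtools view -b -T <ref> -o <out>` with region arguments, `samtools index`, and `goleft covstats` are existing commands, but the helper name, file names, and regions are placeholders, and the CRAM is assumed to already have a `.crai` index so region queries work.

```python
import subprocess


def build_slice_commands(cram, reference, regions, out_bam):
    """Return the argv lists for slicing a CRAM to a small BAM and
    running covstats on it. Nothing is executed here."""
    return [
        # Extract only the requested regions and write them as BAM.
        ["samtools", "view", "-b", "-T", reference, "-o", out_bam, cram,
         *regions],
        # covstats reads the BAM index, so index the small BAM.
        ["samtools", "index", out_bam],
        ["goleft", "covstats", out_bam],
    ]


def run_slice_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Since the thread says covstats takes the mapped-read count from the bam index, the coverage it reports for a sliced BAM would presumably reflect only the sliced reads and would need rescaling to the sampled region length. That is an assumption about its behavior, not something confirmed here.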
I suppose I could make it calculate the mean for the first part of the chromosome with covstats.
As soon as you start converting to bam, you're going to be going up in time, even if it's only part of the file.