Comments (2)
Actually, I am going to kick off another project to build a comprehensive mapping-file QC tool that will work with BAM/SAM/CRAM.
But this mapping QC tool will be designed mainly for mapped BAM/SAM/CRAM, not for unmapped BAM.
As I recall, unmapped BAM needs more space and discards some information (part of the first line of the 4-line FASTQ record). What is the biggest benefit of using unmapped BAM to store FASTQ data?
Although it is designed and promoted by the Broad Institute, I have still seen a lot of debate about it, and I would like to hear your considerations before we start on unmapped BAM support.
Thanks
Shifu
from fastp.
Our primary concern is to keep metadata specific to the Illumina sequencing run, such as the RTA software version, instrument model, SBS kit version, etc. You can add this to the BAM header, but you can't do that with FASTQ files. What is more, we can archive the multiplexed BAM files and delete the Illumina run folders. If we experience a problem with barcodes, it is much easier to rerun demultiplexing on BAM files than on .tar.gz archives of BCL files. BAM files might take a bit more space, but we don't think that matters considering the added benefit.
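To make the header point concrete, here is a minimal Python sketch that assembles SAM/BAM header text carrying run metadata. The `@RG` fields `PL` and `PM` and the free-text `@CO` lines are standard SAM header fields; the function name `build_unmapped_bam_header`, the `key=value` layout inside `@CO`, and all example values are assumptions for illustration only, not a convention prescribed by any tool.

```python
def build_unmapped_bam_header(read_group_id, sample, instrument_model, run_metadata):
    """Return SAM header text for an unmapped BAM.

    Hypothetical helper: the key=value layout of the @CO lines is an
    assumption, since @CO comments are free text in the SAM format.
    """
    # Unmapped data has no reference, so no @SQ lines are needed.
    lines = ["@HD\tVN:1.6\tSO:unsorted"]

    # @RG carries the platform (PL) and the instrument model (PM).
    lines.append("\t".join([
        "@RG",
        f"ID:{read_group_id}",
        f"SM:{sample}",
        "PL:ILLUMINA",
        f"PM:{instrument_model}",
    ]))

    # Anything without a dedicated field goes into one-line comments.
    for key, value in sorted(run_metadata.items()):
        lines.append(f"@CO\t{key}={value}")

    return "\n".join(lines) + "\n"


# Placeholder values only; substitute the details of the real run.
header = build_unmapped_bam_header(
    read_group_id="rg1",
    sample="sample1",
    instrument_model="EXAMPLE_MODEL",
    run_metadata={"rta_version": "EXAMPLE_RTA", "sbs_kit": "EXAMPLE_KIT"},
)
```

A FASTQ file has no equivalent place for this information, which is the trade-off being described here.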
IMHO, you can simply use HTSlib to read BAM files and process the records just like the reads you get from FASTQ files. You can check this unmapped BAM file as an example:
https://drive.google.com/file/d/1j1Yjy1zU1F8dPpf-lQVkoP3FkK--FOqL/view?usp=sharing
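As a rough illustration of why unmapped records can be handled like FASTQ reads, the sketch below converts one SAM text line (for example, as printed by `samtools view`) into a 4-line FASTQ record. It works on text rather than calling HTSlib itself, and the function name `sam_line_to_fastq` and the `/1`, `/2` suffix convention are assumptions for illustration. It also assumes the read is unmapped, so no reverse-complementing is needed.

```python
def sam_line_to_fastq(line):
    """Convert one unmapped SAM text record into a 4-line FASTQ record.

    Hypothetical helper for illustration; a real tool would read the
    BAM through HTSlib instead of parsing text.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM record needs at least 11 tab-separated fields")

    name = fields[0]          # QNAME
    flag = int(fields[1])     # FLAG bit field
    seq = fields[9]           # SEQ
    qual = fields[10]         # QUAL, Phred+33 ASCII as in FASTQ

    if qual == "*":
        raise ValueError("record has no stored base qualities")

    # 0x40 marks the first read of a pair, 0x80 the second.
    if flag & 0x40:
        name += "/1"
    elif flag & 0x80:
        name += "/2"

    return f"@{name}\n{seq}\n+\n{qual}\n"


# FLAG 77 = paired, read unmapped, mate unmapped, first in pair.
example = "read1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tRG:Z:rg1"
record = sam_line_to_fastq(example)
```

Optional tags such as `RG:Z:rg1` are dropped in this conversion, so the BAM record keeps information that the FASTQ record does not.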
I am not sure what information we would lose by using BAM files. Could you please explain a bit more?
Best,
Bekir