oicr-gsi / bam-qc-metrics
Metrics for BAM file QC
License: GNU General Public License v3.0
In new versions of the bam-qc workflow, filtering and downsampling will be done upstream by other workflow tasks. So, the filtering/downsampling capabilities of bam-qc-metrics itself will no longer be used. We could:
Low-priority, but could be useful. Should we do it?
If the read is reversed by the aligner (flag 16), the CIGAR string will not match the cycle of the machine.
The iterator needs to be reversed if flag 16 is set.
bam-qc-metrics/bam_qc_metrics/bam_qc.py
Line 80 in f923853
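A minimal sketch of the reversal described above, not the code at the referenced line. It assumes the CIGAR is available as a list of (operation, length) tuples in the order stored in the BAM record; the function names `cigar_in_cycle_order` and `ops_by_cycle` are hypothetical.

```python
# SAM flag 16 (0x10): the read is aligned to the reverse strand.
REVERSE_STRAND = 0x10

# CIGAR operations that consume bases of the read (query).
QUERY_CONSUMING = {"M", "I", "S", "=", "X"}


def cigar_in_cycle_order(cigar_ops, flag):
    """Return CIGAR operations in sequencing-cycle order.

    cigar_ops: list of (operation, length) tuples as stored in the record,
               i.e. in reference (forward-strand) order.
    flag:      the SAM bitwise FLAG of the read.
    """
    if flag & REVERSE_STRAND:
        # The aligner stored the read reverse-complemented, so the last
        # CIGAR operation corresponds to the first machine cycle.
        return list(reversed(cigar_ops))
    return list(cigar_ops)


def ops_by_cycle(cigar_ops, flag):
    """Expand the CIGAR to one operation code per machine cycle."""
    per_cycle = []
    for op, length in cigar_in_cycle_order(cigar_ops, flag):
        if op in QUERY_CONSUMING:
            per_cycle.extend([op] * length)
    return per_cycle
```

With this, a read with CIGAR 2S3M reports soft clipping at cycles 1-2 when it maps to the forward strand, and at cycles 4-5 when flag 16 is set.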
This variable is used
bam-qc-metrics/bam_qc_metrics/bam_qc.py
Line 132 in f923853
where it means -q, --trim-quality INT The BWA trimming parameter
(https://sourceforge.net/p/bio-bwa/mailman/message/25597301/)
The same variable is compared to the read MAPQ value
bam-qc-metrics/bam_qc_metrics/bam_qc.py
Line 74 in f923853
These two uses are incompatible.
samtools uses the -q flag differently depending on context. In samtools view: -q INT Skip alignments with MAPQ smaller than INT [0].
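One way to resolve the conflict would be to carry two separate parameters, one per meaning of -q. The sketch below only builds argument lists; the function and parameter names (`samtools_view_args`, `samtools_stats_args`, `min_mapq`, `trim_quality`) are hypothetical and are not taken from bam_qc.py.

```python
def samtools_view_args(bam_path, min_mapq):
    """Arguments for samtools view, where -q is a minimum MAPQ filter."""
    return ["samtools", "view", "-b", "-q", str(min_mapq), bam_path]


def samtools_stats_args(bam_path, trim_quality):
    """Arguments for samtools stats, where -q is the BWA trimming parameter."""
    return ["samtools", "stats", "-q", str(trim_quality), bam_path]
```

Keeping the two values apart means a MAPQ threshold chosen for filtering can no longer leak into the trimming parameter of samtools stats.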
The instance variable is always set to 1 at
bam-qc-metrics/bam_qc_metrics/bam_qc.py
Line 44 in 2f72712
regardless of parameter. I think this needs changing to self.sample_rate = sample_rate
If sequences in the BAM file do not appear in the given alignment reference, analysis dies (see below for error).
Make a more informative error message, or (if possible) prevent the error from happening.
(bamqc) ibancarz@ld5312-ibanca:~/playground/bamqc_test_data/20180816/A00469_0047/test$ run_bam_qc.py -b ../../../SWID_14343630_TGL41_0004_nn_R_PE_320_CM_HMC_4_190531_M00146_0054_000000000-D6CW8_GTTACGCA-ATCGCCAT_L001_001.annotated.bam -o test.json -t ../../../hg19_random.genome.sizes.bed -r ../../../hg19.fa
[E::faidx_adjust_position] The sequence "chr1_gl000192_random" not found
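A more informative message could come from comparing sequence names before any analysis starts. The following is a sketch under assumptions: the reference names are read from a samtools faidx index (.fai), whose first tab-separated column is the sequence name, and the BAM sequence names are supplied by the caller (for example from the `references` attribute of a pysam `AlignmentFile`). Both function names are hypothetical.

```python
def read_fai_names(fai_lines):
    """Sequence names from the lines of a .fai index (first column)."""
    return [line.split("\t", 1)[0].strip() for line in fai_lines if line.strip()]


def check_sequences_in_reference(bam_names, reference_names):
    """Raise ValueError naming every BAM sequence absent from the reference."""
    known = set(reference_names)
    missing = [name for name in bam_names if name not in known]
    if missing:
        raise ValueError(
            "%d sequence(s) in the BAM header are not in the reference: %s"
            % (len(missing), ", ".join(missing))
        )
```

An open file handle on the .fai can be passed directly as `fai_lines`, since the function only iterates over lines.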
samtools stats might allow you to use just one chromosome in the fasta file. If you pick one of the runt chromosomes (outside of chr1-22, X), it would add less than 1MB to the repo.
Originally posted by @slazicoicr in https://github.com/_render_node/MDIzOlB1bGxSZXF1ZXN0UmV2aWV3VGhyZWFkMTkyMjc3NTU2OnYy/pull_request_review_threads/discussion
Small utility script to update the workflow version number, in JSON files of expected test data.
Analysis crashes if the --target option is not specified, as follows:
(bamqc) ibancarz@ld5312-ibanca:~/playground/bamqc_test_data/20180816/A00469_0047/test$ run_bam_qc.py -b ../../../SWID_14343630_TGL41_0004_nn_R_PE_320_CM_HMC_4_190531_M00146_0054_000000000-D6CW8_GTTACGCA-ATCGCCAT_L001_001.annotated.bam -o test.json
Traceback (most recent call last):
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/bin/run_bam_qc.py", line 138, in <module>
    main()
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/bin/run_bam_qc.py", line 134, in main
    qc = bam_qc(config)
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/bam_qc_metrics/bam_qc.py", line 122, in __init__
    fast_finder.read_length_summary())
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/bam_qc_metrics/bam_qc.py", line 625, in __init__
    self.metrics = self.evaluate_all_metrics()
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/bam_qc_metrics/bam_qc.py", line 630, in evaluate_all_metrics
    self.evaluate_bedtools_metrics(),
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/bam_qc_metrics/bam_qc.py", line 642, in evaluate_bedtools_metrics
    metrics['number of targets'] = targetBedTool.count()
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/pybedtools/bedtool.py", line 2507, in count
    return sum(1 for _ in iter(self))
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/pybedtools/bedtool.py", line 2507, in <genexpr>
    return sum(1 for _ in iter(self))
  File "pybedtools/cbedtools.pyx", line 754, in pybedtools.cbedtools.IntervalIterator.__next__
TypeError: NoneType object is not an iterator
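The traceback suggests the target path is None when the count is attempted. A guard along the following lines would avoid the crash. This is a sketch only: `count_targets` is a hypothetical helper that counts BED records in plain Python rather than through pybedtools, and returning None when no target file is given is an assumed convention.

```python
def count_targets(target_path):
    """Count BED records, or return None when no target file was given."""
    if target_path is None:
        # No --target option: skip target metrics instead of crashing.
        return None
    count = 0
    with open(target_path) as bed:
        for line in bed:
            # Skip blank lines and BED header lines.
            if line.strip() and not line.startswith(("#", "track", "browser")):
                count += 1
    return count
```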
bam-qc-metrics/bam_qc_metrics/bam_qc.py
Line 237 in f923853
line will be a string, not a digit
Example output for a MiSeq analysis
GLCS_0001_Lv_R_PE_279_WG|1|190305_M00146_0024_000000000-D5N29.txt
bam-qc-metrics/bam_qc_metrics/bam_qc.py
Line 215 in f923853
Won't match ## METRICS CLASS picard.sam.DuplicationMetrics
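A more tolerant match could key on the class name and accept any package prefix. The pattern below is a sketch, not the expression used at the referenced line; the idea that only the package prefix differs between Picard versions is an assumption, and `is_duplication_metrics_header` is a hypothetical name.

```python
import re

# Matches "## METRICS CLASS", whitespace, an optional dotted package
# prefix, then the class name DuplicationMetrics.
DUPLICATION_HEADER = re.compile(
    r"^## METRICS CLASS\s+(?:\S+\.)?DuplicationMetrics\s*$"
)


def is_duplication_metrics_header(line):
    """True if the line is a Picard DuplicationMetrics class header."""
    return DUPLICATION_HEADER.match(line) is not None
```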
For low coverage runs, ESTIMATED_LIBRARY_SIZE is left empty. Unfortunately, the Picard gods did not see fit to add a \t to signify an empty field, so the error below is raised
bam-qc-metrics/bam_qc_metrics/bam_qc.py
Line 240 in f923853
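A parser can tolerate the missing field by padding the value row to the header length. The sketch below assumes the empty field is a trailing column, so that the row is simply shorter than the header; `parse_metrics_row` is a hypothetical helper, and mapping empty fields to None is an assumed convention.

```python
def parse_metrics_row(header_line, value_line):
    """Pair tab-separated values with their column names.

    Trailing fields missing from the value row, and fields present
    but empty, are both returned as None.
    """
    keys = header_line.rstrip("\n").split("\t")
    values = value_line.rstrip("\n").split("\t")
    # Pad a short row so every key receives a value.
    values += [""] * (len(keys) - len(values))
    return {key: (value if value != "" else None)
            for key, value in zip(keys, values)}
```

Values are left as strings; converting them to numbers is a separate step that then has to handle None explicitly.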
Tests are a little slow now at ~60s. Speed up by using a smaller input dataset for non-critical tests.
Using the version update script, or even editing the VERSION file without updating test data, causes apparently unrelated errors. See JIRA ticket: https://jira.oicr.on.ca/browse/GP-2242
Until such time as this issue is resolved, the script update_test_data_version.py is deprecated.