
oicr-gsi / bam-qc-metrics


Metrics for BAM file QC

License: GNU General Public License v3.0

Python 100.00%
bioinformatics bioinformatics-pipeline samtools


bam-qc-metrics's Issues

Remove downsampling and filtering?

In new versions of the bam-qc workflow, filtering and downsampling will be done upstream by other workflow tasks. So, the filtering/downsampling capabilities of bam-qc-metrics itself will no longer be used. We could:

  • Remove the downsampling functionality from bam-qc-metrics to simplify code
  • Remove output fields referring to filtering/downsampling results in bam-qc-metrics, to simplify output and reduce potential confusion (e.g. "total reads", "unmapped reads").

Low-priority, but could be useful. Should we do it?

trim_quality usage

This variable is passed to samtools stats as the -q argument:

result = pysam.stats("-q", str(self.trim_quality), self.bam_path)

where -q means: -q, --trim-quality INT The BWA trimming parameter (https://sourceforge.net/p/bio-bwa/mailman/message/25597301/)

The same variable is also compared to the read MAPQ value:

if self.trim_quality != None and read.mapping_quality < self.trim_quality:

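These two uses are incompatible: the first is a base-quality trimming parameter, the second a mapping-quality threshold.

samtools gives the -q flag a different meaning depending on the subcommand. In samtools view: -q INT Skip alignments with MAPQ smaller than INT [0].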
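One way to untangle the two meanings is to keep them as separate settings. The following is only a minimal sketch: the names build_stats_args, passes_mapq_filter and min_mapq are hypothetical and do not appear in bam-qc-metrics.

```python
def build_stats_args(bam_path, trim_quality=None):
    """Build the argument list for samtools stats.

    Here -q is the BWA trimming parameter, not a MAPQ threshold.
    """
    args = []
    if trim_quality is not None:
        args.extend(["-q", str(trim_quality)])
    args.append(bam_path)
    return args


def passes_mapq_filter(mapping_quality, min_mapq=None):
    """Return True if a read's MAPQ meets the separately configured threshold.

    min_mapq is a hypothetical second setting, independent of trim_quality.
    """
    return min_mapq is None or mapping_quality >= min_mapq
```

The first list could then be unpacked into the existing call, as in pysam.stats(*build_stats_args(self.bam_path, self.trim_quality)), while the per-read check would use its own threshold.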

Failure with incompatible reference file

If sequences in the BAM file do not appear in the given alignment reference, analysis dies (see below for error).

Make a more informative error message, or (if possible) prevent the error from happening.

(bamqc) ibancarz@ld5312-ibanca:~/playground/bamqc_test_data/20180816/A00469_0047/test$ run_bam_qc.py -b ../../../SWID_14343630_TGL41_0004_nn_R_PE_320_CM_HMC_4_190531_M00146_0054_000000000-D6CW8_GTTACGCA-ATCGCCAT_L001_001.annotated.bam -o test.json -t ../../../hg19_random.genome.sizes.bed -r ../../../hg19.fa
[E::faidx_adjust_position] The sequence "chr1_gl000192_random" not found
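A check along the following lines could run before analysis starts and report every missing sequence at once. This is a sketch under assumptions: the function names are hypothetical, the reference is assumed to have a .fai index alongside it (tab-separated, with the sequence name in the first column), and the BAM sequence names are assumed to be available as a list, for example from the references attribute of a pysam AlignmentFile.

```python
def read_fai_names(fai_path):
    """Return the set of sequence names listed in a FASTA index (.fai) file."""
    names = set()
    with open(fai_path) as fai:
        for line in fai:
            if line.strip():
                # The first tab-separated column of a .fai line is the name
                names.add(line.split("\t")[0].strip())
    return names


def check_sequences_in_reference(bam_names, reference_names):
    """Raise an informative error if any BAM sequence is absent from the reference."""
    missing = sorted(set(bam_names) - set(reference_names))
    if missing:
        raise ValueError(
            "%d sequence(s) not found in the reference: %s"
            % (len(missing), ", ".join(missing))
        )
```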

Version update script

Add a small utility script to update the workflow version number in the JSON files of expected test data.
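Such a script might look roughly like the sketch below. The function name update_version and the key name "workflow version" are assumptions made for illustration, as is the choice of a top-level key; the real key and its location in the expected-output files would need to be checked.

```python
import json


def update_version(json_path, new_version, key="workflow version"):
    """Rewrite one JSON file in place with an updated version string.

    The default key name is hypothetical; pass the key the files really use.
    """
    with open(json_path) as in_file:
        data = json.load(in_file)
    data[key] = new_version
    with open(json_path, "w") as out_file:
        json.dump(data, out_file, indent=4, sort_keys=True)
        out_file.write("\n")
```

Calling it once per expected-output file, for example in a loop over the test data directory, would update them all.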

Failure if --target not given

Analysis crashes if the --target option is not specified, as follows:

(bamqc) ibancarz@ld5312-ibanca:~/playground/bamqc_test_data/20180816/A00469_0047/test$ run_bam_qc.py -b ../../../SWID_14343630_TGL41_0004_nn_R_PE_320_CM_HMC_4_190531_M00146_0054_000000000-D6CW8_GTTACGCA-ATCGCCAT_L001_001.annotated.bam -o test.json
Traceback (most recent call last):
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/bin/run_bam_qc.py", line 138, in <module>
    main()
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/bin/run_bam_qc.py", line 134, in main
    qc = bam_qc(config)
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/bam_qc_metrics/bam_qc.py", line 122, in __init__
    fast_finder.read_length_summary())
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/bam_qc_metrics/bam_qc.py", line 625, in __init__
    self.metrics = self.evaluate_all_metrics()
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/bam_qc_metrics/bam_qc.py", line 630, in evaluate_all_metrics
    self.evaluate_bedtools_metrics(),
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/bam_qc_metrics/bam_qc.py", line 642, in evaluate_bedtools_metrics
    metrics['number of targets'] = targetBedTool.count()
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/pybedtools/bedtool.py", line 2507, in count
    return sum(1 for _ in iter(self))
  File "/home/ibancarz/playground/bam-qc-metrics-v0.1.6/pybedtools/bedtool.py", line 2507, in <genexpr>
    return sum(1 for _ in iter(self))
  File "pybedtools/cbedtools.pyx", line 754, in pybedtools.cbedtools.IntervalIterator.__next__
TypeError: NoneType object is not an iterator
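The crash comes from counting targets when no target file exists, so one possible fix is to guard on a missing path before any target-based metric is computed. The sketch below illustrates that guard with plain file reading rather than pybedtools; the function name count_targets is hypothetical, and returning None for "no targets given" is an assumed convention.

```python
def count_targets(target_path=None):
    """Count intervals in a BED file, or return None if no file was given."""
    if target_path is None:
        # No --target option: skip the metric instead of crashing
        return None
    count = 0
    with open(target_path) as bed:
        for line in bed:
            stripped = line.strip()
            # Skip blank lines and BED header or comment lines
            if not stripped or stripped.startswith(("#", "track", "browser")):
                continue
            count += 1
    return count
```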

read_mark_duplicates_metrics

msg = "Failed to parse duplicate metrics path %s, section %d, line %d" % params

Here line will be a string, not an integer, so the final %d format specifier fails; it should be %s.
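A minimal sketch of the corrected message follows. The wrapper function duplicate_metrics_error is hypothetical, and it assumes that the section value is an integer while line is the raw text of the offending line.

```python
def duplicate_metrics_error(input_path, section, line):
    """Format the parse-failure message with %s for the string-valued line."""
    params = (input_path, section, line)
    return "Failed to parse duplicate metrics path %s, section %d, line %s" % params
```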

Example output for a MiSeq analysis

GLCS_0001_Lv_R_PE_279_WG|1|190305_M00146_0024_000000000-D5N29.txt

if re.match('## METRICS CLASS\s+net\.sf\.picard\.sam\.DuplicationMetrics', line):

This won't match the header ## METRICS CLASS picard.sam.DuplicationMetrics, which lacks the net.sf. prefix.
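Making the net.sf. prefix optional lets one pattern accept both forms of the header. This is a sketch only; the names METRICS_CLASS_PATTERN and is_metrics_class_line are hypothetical.

```python
import re

# Raw string, so \s and \. reach the regex engine unchanged;
# (?:net\.sf\.)? makes the older package prefix optional
METRICS_CLASS_PATTERN = re.compile(
    r'## METRICS CLASS\s+(?:net\.sf\.)?picard\.sam\.DuplicationMetrics'
)


def is_metrics_class_line(line):
    """Return True if the line is a DuplicationMetrics class header."""
    return METRICS_CLASS_PATTERN.match(line) is not None
```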

For low-coverage runs, ESTIMATED_LIBRARY_SIZE is left empty. Unfortunately, the Picard gods did not see fit to add a \t to signify an empty field, so the error below is raised:

raise ValueError("Key and value lists from %s are of unequal length" % input_path)
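One tolerant approach is to pad the values with a single empty string when exactly one field is missing and the last key is ESTIMATED_LIBRARY_SIZE. The sketch below assumes the header and value rows are tab-separated and that ESTIMATED_LIBRARY_SIZE is the final column; the function name pair_metrics is hypothetical.

```python
def pair_metrics(keys_line, values_line, optional_last_key="ESTIMATED_LIBRARY_SIZE"):
    """Pair a tab-separated header row with its value row.

    Tolerates a missing final field when the last key is optional_last_key.
    """
    # Strip only line endings, so a trailing tab (empty field) is preserved
    keys = keys_line.rstrip("\r\n").split("\t")
    values = values_line.rstrip("\r\n").split("\t")
    if len(values) == len(keys) - 1 and keys[-1] == optional_last_key:
        values.append("")
    if len(keys) != len(values):
        raise ValueError(
            "Key and value lists are of unequal length: %d keys, %d values"
            % (len(keys), len(values))
        )
    return dict(zip(keys, values))
```

Any other length mismatch still raises the error, so genuinely malformed files are not silently accepted.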

Speed up tests

Tests are a little slow now at ~60s. Speed up by using a smaller input dataset for non-critical tests.
