bccdc-phl / tbprofiler-nf

Nextflow Wrapper for TBProfiler

Python 60.15% Nextflow 39.85%
bioinformatics-pipeline microbial-genomics tuberculosis-classification

tbprofiler-nf's People

Contributors

dfornika, sherrie9, taranewman

tbprofiler-nf's Issues

Coverage plotting script fails on negative controls

If a negative control (containing reads but few/zero TB reads) is put through the pipeline, it crashes:

plot_coverage.py", line 66, in plot_coverage 
sum(1 if x > threshold else 0 for x in depths.depth)
ZeroDivisionError: division by zero
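The traceback does not show the denominator, but the crash presumably comes from dividing by the number of depth records, which is zero when no reads map to the reference. A minimal guard might look like the sketch below. It assumes the depths arrive as a plain sequence of integers (the real script reads them from `depths.depth`), and `percent_above_threshold` is a hypothetical helper name.

```python
from typing import Optional, Sequence


def percent_above_threshold(depths: Sequence[int], threshold: int) -> Optional[float]:
    """Return the percentage of positions with depth > threshold.

    Returns None instead of raising ZeroDivisionError when there are no
    depth records at all (e.g. a negative control with no TB reads).
    """
    if len(depths) == 0:
        return None
    above = sum(1 for x in depths if x > threshold)
    return 100.0 * above / len(depths)
```

Returning `None` rather than `0.0` lets the caller skip the plot or report "NA" for a negative control instead of presenting a misleading percentage.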

Coverage summary

Generate a depth-of-coverage summary for each isolate, including:

  • Configurable depth threshold
  • Percent of genome that is below the depth threshold
  • .bed file showing low-coverage regions

...and possibly also

  • A .pdf plot of depth of coverage across the genome
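Both the percentage and the low-coverage regions could be computed in a single pass over per-position depths. The sketch below assumes input shaped like `samtools depth -a` output for one reference sequence (1-based positions, zero-depth positions included); the function name and return shape are hypothetical. BED intervals use 0-based starts with exclusive ends.

```python
from typing import Iterable, List, Optional, Tuple

# (chrom, 0-based start, exclusive end), as in a 3-column BED file
BedInterval = Tuple[str, int, int]


def low_coverage_summary(
    chrom: str, depths: Iterable[Tuple[int, int]], threshold: int
) -> Tuple[Optional[float], List[BedInterval]]:
    """Return (% of positions below threshold, low-coverage BED intervals).

    `depths` must be (1-based position, depth) pairs sorted by position with
    no gaps, e.g. parsed from `samtools depth -a` for a single reference.
    """
    intervals: List[BedInterval] = []
    total = below = 0
    run_start: Optional[int] = None  # 0-based start of the current low run
    last_pos = 0
    for pos, depth in depths:
        total += 1
        last_pos = pos
        if depth < threshold:
            below += 1
            if run_start is None:
                run_start = pos - 1  # convert 1-based position to 0-based start
        elif run_start is not None:
            # The run covered up to 1-based position pos - 1, which is also
            # the exclusive 0-based end in BED coordinates.
            intervals.append((chrom, run_start, pos - 1))
            run_start = None
    if run_start is not None:
        intervals.append((chrom, run_start, last_pos))  # run reaches the end
    percent_below = 100.0 * below / total if total else None
    return percent_below, intervals
```

Each interval can then be written as one tab-separated BED line (chrom, start, end). Returning `None` for an empty input also sidesteps the division-by-zero problem seen with negative controls.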

Correct snpit result-checking script

In #21 I accidentally merged before committing my last update to the check_snpit_against_tbprofiler.py script.

We switched from our initial strategy of looking for H37Rv among the TBProfiler lineages to looking for lineage4 as the main lineage together with >98% of reads mapped. The initial approach worked on our positive controls (which are H37Rv) but not on some samples that are close to H37Rv yet don't have H37Rv in their TBProfiler reports.

Update that script to reflect the new strategy.
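A sketch of the revised check, assuming the TBProfiler JSON report stores the main lineage under `main_lineage` and the mapping rate under `qc.pct_reads_mapped`. Those key names are assumptions to verify against the TBProfiler version in use, and `passes_lineage4_check` is a hypothetical name.

```python
from typing import Any, Dict


def passes_lineage4_check(report: Dict[str, Any], min_pct_reads_mapped: float = 98.0) -> bool:
    """Hypothetical check: main lineage is lineage4 and >98% of reads mapped.

    The "main_lineage" and report["qc"]["pct_reads_mapped"] keys are
    assumptions about the TBProfiler report layout, not confirmed here.
    """
    main_lineage = report.get("main_lineage", "")
    pct_mapped = report.get("qc", {}).get("pct_reads_mapped")
    if pct_mapped is None:
        # Missing QC data: treat as failing rather than guessing
        return False
    return main_lineage == "lineage4" and float(pct_mapped) > min_pct_reads_mapped
```

An exact match on `"lineage4"` deliberately rejects mixed calls that list more than one lineage.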

add provenance

We'd like to capture as much 'data provenance' information as possible in our production analysis pipelines. Information such as tool versions, input file checksums, analysis timestamps and database versions should be captured in some persistent form so they can be used to track and reproduce analyses.
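One possible shape for a per-run provenance record, covering input checksums, a timestamp, tool versions and a database version. The schema and function names are hypothetical, and collecting the tool versions (e.g. by parsing each tool's version output) is left to the caller.

```python
import datetime
import hashlib
import json
from pathlib import Path
from typing import Dict, Iterable


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 checksum without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(
    input_files: Iterable[Path], tool_versions: Dict[str, str], database_version: str
) -> Dict[str, object]:
    """Build a JSON-serializable provenance record (hypothetical schema)."""
    return {
        # Timezone-aware UTC timestamp, e.g. "2024-01-01T12:00:00+00:00"
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "tools": dict(tool_versions),
        "database_version": database_version,
        "inputs": [
            {"filename": str(p), "sha256": sha256sum(Path(p))} for p in input_files
        ],
    }
```

The record can be serialized with `json.dumps(record, indent=2)` and written alongside the analysis outputs.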

Transpose resistance report

TBProfiler generates a resistance report that lists each drug on a separate row. This makes it a bit difficult to gather together resistance info from multiple samples. It would be convenient to have a resistance report that is structured to be one row per library.
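Restructuring the report to one row per library amounts to grouping the per-drug rows by sample. The column names `sample_id`, `drug` and `resistance` below are placeholders, not necessarily the ones in TBProfiler's report, and `transpose_resistance` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def transpose_resistance(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Pivot one-row-per-drug records into one row per library.

    Each input row is assumed to have hypothetical keys "sample_id", "drug"
    and "resistance"; adjust them to the real report's column names.
    """
    by_sample: Dict[str, Dict[str, str]] = {}
    drugs: List[str] = []  # drug columns, in first-seen order
    for row in rows:
        sample, drug = row["sample_id"], row["drug"]
        if drug not in drugs:
            drugs.append(drug)
        by_sample.setdefault(sample, {})[drug] = row["resistance"]
    # Fill drugs missing for a sample with "" so every row has the same columns
    return [
        {"sample_id": sample, **{d: calls.get(d, "") for d in drugs}}
        for sample, calls in by_sample.items()
    ]
```

Because every output row has the same keys in the same order, a non-empty result can be written with `csv.DictWriter` using the first row's keys as `fieldnames`.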

tbprofiler error

This is not a bug in tbprofiler-nf, but some samples cause TBProfiler to fail for unclear reasons.
Could we add an option to ignore errors for those samples so that the pipeline can still run on the rest of the samples?

Pretty-print tbprofiler_full_report.json file when writing file

Our tbprofiler_full_report.json files don't have any newlines or indentation. They're fairly large, complex documents, which makes them difficult to read directly. They can be reformatted with a text editor, but it would be simpler and more convenient to write the files in "pretty-printed" form (with newlines and indentation).

One way to do this is to stream the file through Python's json.tool module before writing it to disk:

On this line:

cp results/${sample_id}.results.json ${sample_id}_tbprofiler_full_report.json

We could instead do:

cat results/${sample_id}.results.json | python -m json.tool > ${sample_id}_tbprofiler_full_report.json
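The same result can be produced from Python with the standard `json` module, which may be convenient if this step ever moves into a script. A minimal sketch; `pretty_print_json` is a hypothetical helper, and `indent=4` matches the default indentation of `python -m json.tool`.

```python
import json
from pathlib import Path


def pretty_print_json(src: Path, dst: Path, indent: int = 4) -> None:
    """Rewrite a compact JSON file with newlines and indentation.

    Roughly equivalent to `python -m json.tool src > dst`.
    """
    with open(src) as f:
        data = json.load(f)
    with open(dst, "w") as f:
        json.dump(data, f, indent=indent)
        f.write("\n")  # end the file with a newline, as json.tool does
```

For example, `pretty_print_json(Path("results/S1.results.json"), Path("S1_tbprofiler_full_report.json"))` mirrors the shell command above for a sample named S1.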

Add snpit for subspecies calling

We'd like to incorporate subspecies calling into our analysis using philipfowler/snpit.

To do this, we'll need to collect the whole-genome .vcf files from TBProfiler and feed them into snpit. Previously we had only been collecting the target-specific .vcf files from TBProfiler.
