Comments (5)

brentp commented on June 16, 2024

Hi, I tagged a release here: https://github.com/brentp/goleft/releases/tag/v0.2.6

Let me know if you have any further suggestions. I see what you mean about increasing the number of sampled reads.

from goleft.

brentp commented on June 16, 2024

Thanks for the clear report and diagnosis.
I have pushed a fix. Would you try the attached binary (gunzip, chmod +x, and ./goleft_linux64 covstats ...)
and verify it looks good for you? If so, I will make a new release.
goleft_linux64.gz

aofarrel commented on June 16, 2024

Updated version

coverage  insert_mean  insert_sd  insert_5th  insert_95th  template_mean  template_sd  pct_unmapped  pct_bad_reads  pct_duplicate  pct_proper_pair  read_length  bam                        sample
4.34      -0.44        17.25      -29         29           299.57         17.24        95.33         0              0              4.6              150          SAMN18146222_to_H37Rv.bam  SAMN18146222
3.93      -0.5         17.25      -28         28           299.47         17.47        97.35         0              0              2.6              150          SAMN18146202_to_H37Rv.bam  SAMN18146202
18.61     -0.55        17.35      -29         29           299.45         17.36        0.2           0              0              98.7             150          SAMN18146198_to_H37Rv.bam  SAMN18146198

Previous version

coverage  insert_mean  insert_sd  insert_5th  insert_95th  template_mean  template_sd  pct_unmapped  pct_bad_reads  pct_duplicate  pct_proper_pair  read_length  bam                        sample
4.34      -0.44        17.25      -29         29           299.57         17.24        2043.06       0              0              98.6             150          SAMN18146222_to_H37Rv.bam  SAMN18146222
3.93      -0.5         17.25      -28         28           299.47         17.47        3671.93       0              0              98.9             150          SAMN18146202_to_H37Rv.bam  SAMN18146202
18.61     -0.55        17.35      -29         29           299.45         17.36        0.2           0              0              98.9             150          SAMN18146198_to_H37Rv.bam  SAMN18146198
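
The pct_unmapped values above 100 in the previous version are what gave the bug away. A minimal sketch of a sanity check that would catch this kind of output is below. It assumes whitespace-separated covstats rows with the header shown above; the function name out_of_range and the choice of which columns to check are illustrative assumptions, not part of goleft.

```python
# Columns assumed (for this sketch) to be percentages in the range 0..100.
PCT_COLUMNS = ("pct_unmapped", "pct_bad_reads", "pct_duplicate", "pct_proper_pair")

# Header as printed in the tables in this thread.
HEADER = (
    "coverage insert_mean insert_sd insert_5th insert_95th template_mean "
    "template_sd pct_unmapped pct_bad_reads pct_duplicate pct_proper_pair "
    "read_length bam sample"
).split()

# Rows copied from the "Previous version" table in this thread.
PREVIOUS = """\
4.34 -0.44 17.25 -29 29 299.57 17.24 2043.06 0 0 98.6 150 SAMN18146222_to_H37Rv.bam SAMN18146222
3.93 -0.5 17.25 -28 28 299.47 17.47 3671.93 0 0 98.9 150 SAMN18146202_to_H37Rv.bam SAMN18146202
18.61 -0.55 17.35 -29 29 299.45 17.36 0.2 0 0 98.9 150 SAMN18146198_to_H37Rv.bam SAMN18146198
"""


def out_of_range(table_text):
    """Return (sample, column, value) for every percentage outside 0..100."""
    problems = []
    for line in table_text.strip().splitlines():
        row = dict(zip(HEADER, line.split()))
        for col in PCT_COLUMNS:
            value = float(row[col])
            if not 0.0 <= value <= 100.0:
                problems.append((row["sample"], col, value))
    return problems


if __name__ == "__main__":
    for sample, col, value in out_of_range(PREVIOUS):
        print(f"{sample}: {col} = {value} is not a valid percentage")
```

Run on the previous-version rows, this flags pct_unmapped for SAMN18146222 and SAMN18146202 and nothing else.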

TBProfiler

TBProfiler was also run on the FASTQs before variants were called. Because it uses a different aligner, its results can be expected to differ somewhat from covstats, but they should be similar, since both align to the same reference genome (H37Rv).

sample        % mapped  100 - % mapped  median coverage (always rounds to integer)
SAMN18146222  18.45%    81.55%          4
SAMN18146202  16.8%     83.2%           4
SAMN18146198  99.83%    0.17%           18

The fact that TBProfiler reports 81.55% of a sample as unmapped while neo-covstats reports 95.33% is worth noting, but I don't think it's unreasonable, especially considering these are samples specifically designed to be ornery.
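
To put numbers on that comparison, here is a small sketch that subtracts TBProfiler's "100 - % mapped" from the updated covstats pct_unmapped for each sample. The values are copied from the two tables above; the function name unmapped_gap is made up for illustration.

```python
# pct_unmapped from the "Updated version" covstats table.
covstats_unmapped = {
    "SAMN18146222": 95.33,
    "SAMN18146202": 97.35,
    "SAMN18146198": 0.2,
}

# "100 - % mapped" from the TBProfiler table.
tbprofiler_unmapped = {
    "SAMN18146222": 81.55,
    "SAMN18146202": 83.2,
    "SAMN18146198": 0.17,
}


def unmapped_gap(covstats, tbprofiler):
    """Percentage-point difference (covstats minus TBProfiler) per sample."""
    return {s: round(covstats[s] - tbprofiler[s], 2) for s in covstats}


if __name__ == "__main__":
    for sample, gap in unmapped_gap(covstats_unmapped, tbprofiler_unmapped).items():
        print(f"{sample}: {gap:+.2f} percentage points")
```

The two low-mapping samples differ by roughly 14 percentage points (13.78 and 14.15), while the well-mapped sample differs by only 0.03.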

brentp commented on June 16, 2024

Hi, thanks for following up. You could try increasing the number of sampled reads, e.g.:

goleft covstats -n 10000000 ...

to see if it converges to match TBProfiler a bit better.
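
As a rough illustration of how much a larger -n could tighten the estimate, the sketch below computes the standard error of a sampled proportion. It assumes reads are sampled independently and uniformly at random, which may not match how covstats actually samples. The value 0.8155 is TBProfiler's unmapped fraction for SAMN18146222, and the sample sizes other than 10000000 are arbitrary illustrative choices.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of a proportion p estimated from n independent draws."""
    return math.sqrt(p * (1.0 - p) / n)


# Illustrative sample sizes; only 10_000_000 appears in this thread.
for n in (10_000, 1_000_000, 10_000_000):
    se = proportion_standard_error(0.8155, n)
    print(f"n={n:>10,}: standard error = {100 * se:.4f} percentage points")
```

Under that assumption the error shrinks as 1/sqrt(n) and is already well under one percentage point at n = 10,000, so a gap that persists at large n would more likely come from the different aligners or from non-uniform sampling than from random sampling noise.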

aofarrel commented on June 16, 2024

Sorry for the slow response, this fell off my radar. The samples I'm processing vary hugely in size -- we're running this on almost every Illumina-processed tuberculosis sample on SRA, and some of that data is in a bit of a rough state -- so we're concerned that adjusting the number of sampled reads may cause issues with the smaller samples.

In any case, this is definitely much more accurate than what we were seeing earlier. Would it be appropriate to make a release?
