brentp / goleft

goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary

License: MIT License

Go 89.81% Shell 5.28% Python 4.91%
genomics bioinformatics coverage golang depth

goleft's Introduction

goleft

goleft is a collection of bioinformatics tools written in Go and distributed together as a single binary under a liberal (MIT) license.

Running the goleft binary prints a list of subcommands, each with a short description. Running any subcommand without arguments prints the full help for that command.

Installation

The easiest way to install goleft is to download the latest binary from the releases and make sure to chmod +x the resulting binary.

If you are using go, you can build from source with:

go get -u github.com/brentp/goleft/...
go install github.com/brentp/goleft/cmd/goleft

goleft is also available in bioconda.

Commands

  • covstats : estimate coverage and insert-size statistics on bams by sampling
  • depth : parallelize calls to samtools in user-defined windows
  • depthwed : matricize output from depth to n-sites * n-samples
  • indexcov : quick coverage estimate using only the bam index
  • indexsplit : generate regions of even data across a cohort (for parallelization)
  • samplename : report samplename(s) from a bam's SM tag

goleft's People

Contributors

arq5x, brentp, chapmanb, colindaven, timtribu

goleft's Issues

How to generate the X-Y plots?

Hi all,
I have some bam files and corresponding bai files (of unknown gender).
I ran indexcov but did not get any X-Y plot. Which arguments need to be supplied to generate the X-Y plot? I get the following output:
ngslab@ngslab-OptiPlex-3050:~/Downloads/goleft-master/indexcov/input$ goleft indexcov --directory /home/ngslab/Downloads/goleft-master/indexcov/ '/home/ngslab/Downloads/goleft-master/indexcov/input/NA-180.bam'
2018/10/19 16:18:02 indexcov: running on 1 indexes
2018/10/19 16:18:04 indexcov: found chromosome "chrX", wanted "X" please use exact chromosome names for --sex.
2018/10/19 16:18:04 indexcov: found chromosome "chrY", wanted "Y" please use exact chromosome names for --sex.
(WARNING) indexcov: expected 2 sex chromosomes, found: 0.
you can set the expected with --sex ''
2018/10/19 16:18:04 sex chromosomes not found.
2018/10/19 16:18:04 got: 1 principal components
2018/10/19 16:18:04 indexcov: 1 principal components, not plotting
indexcov finished: see /home/ngslab/Downloads/goleft-master/indexcov//index.html for overview of output
Please help me to resolve the issue.
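
The log above reports that the BAM header uses "chrX" and "chrY" while indexcov was looking for "X" and "Y", and asks for exact names via --sex. The following is a minimal Go sketch of exact-name matching, which shows why the unprefixed names find nothing. It is not indexcov's code, and `matchSex` is a hypothetical helper. The exact value syntax for --sex is not shown in this thread, so check the indexcov help before passing the header's own names.

```go
package main

import "fmt"

// matchSex returns the requested names that appear verbatim among the
// reference names. Hypothetical helper, for illustration only.
func matchSex(refs, wanted []string) []string {
	have := make(map[string]bool, len(refs))
	for _, r := range refs {
		have[r] = true
	}
	var found []string
	for _, w := range wanted {
		if have[w] {
			found = append(found, w)
		}
	}
	return found
}

func main() {
	refs := []string{"chr1", "chr2", "chrX", "chrY"}
	fmt.Println(len(matchSex(refs, []string{"X", "Y"})))       // 0
	fmt.Println(len(matchSex(refs, []string{"chrX", "chrY"}))) // 2
}
```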

Interactive indexcov-depth html's missing (and no warning given for 100 input files)

Hi Brent,

I've installed version 0.1.17 as a binary and ran indexcov on 100 BAM files (and, for testing, also on the 100 corresponding crai files). I get all the expected files except the interactive depth HTML files. indexcov completed without error or warning. Digging in the code, it seems that maxSamples=100 is a hardcoded cutoff for these files, which is fine (a command-line option would be nice), but there must be an off-by-one error, because with exactly 100 samples no warning is printed and no HTML files are created.

Thanks a lot for goleft!

Andreas
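
The kind of off-by-one suspected here can be sketched in Go as follows. This is a hypothetical reconstruction, not goleft's actual code: only the name maxSamples and the value 100 come from the report, and `gappedChecks` and `pairedChecks` are invented for illustration. Two independent strict comparisons leave a gap at exactly the cutoff, while deriving the warning from the same condition does not.

```go
package main

import "fmt"

const maxSamples = 100 // cutoff value quoted in the report

// gappedChecks uses two independent strict comparisons, so neither
// branch fires when n == maxSamples.
func gappedChecks(n int) (writeHTML, warn bool) {
	return n < maxSamples, n > maxSamples
}

// pairedChecks derives the warning from the same condition, so every
// sample count either gets HTML output or a warning.
func pairedChecks(n int) (writeHTML, warn bool) {
	writeHTML = n < maxSamples
	return writeHTML, !writeHTML
}

func main() {
	for _, n := range []int{99, 100, 101} {
		h1, w1 := gappedChecks(n)
		h2, w2 := pairedChecks(n)
		fmt.Printf("n=%d gapped: html=%t warn=%t | paired: html=%t warn=%t\n",
			n, h1, w1, h2, w2)
	}
}
```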

indexcov: incorrect results on chrM / small contig

Wanted to check coverage on hg19/chrM. Ran the following command w/ latest master goleft:
~/code/goleft/bin/goleft indexcov --directory possorted_bam possorted_bam.bam

chrM isn't in the HTML output, even though it doesn't appear to match the default --excludepatt.
chrM does appear in the bed file, but the results seem to be incorrect. Is there a problem getting good results on small chromosomes?

$ zmore possorted_bam/possorted_bam-indexcov.bed.gz | grep chrM
chrM 0 16384 6

$ samtools view possorted_bam.bam chrM:0-20000 | wc -l
212317980
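
The bed line above ends at 16384, which is the window size of the BAI linear index (2^14 bp, per the SAM/BAM specification). A contig as short as chrM (16,571 bp in hg19, as in the region chrM:1-16571 quoted elsewhere on this page) is covered by only two such windows, so an index-based estimate has very little to work with. The Go sketch below only counts windows; how indexcov turns them into depth values is not described here, and `tiles` is a hypothetical helper.

```go
package main

import "fmt"

// tileSize is the BAI linear-index window size in base pairs (2^14).
const tileSize = 16384

// tiles returns how many linear-index windows cover a contig of the
// given length. Hypothetical helper, for illustration only.
func tiles(length int) int {
	return (length + tileSize - 1) / tileSize
}

func main() {
	fmt.Println(tiles(16571))     // chrM in hg19: 2
	fmt.Println(tiles(249250621)) // chr1 in hg19: many thousands of windows
}
```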

call to unintended version of samtools

Testing goleft on Ubuntu 16.04 under zsh shell 5.1.1.

I was very confused with this error:

depth: invalid option -- 'd'
open: No such file or directory
depth: invalid option -- 'd'
open: No such file or directory
ERROR with command: Command('echo 'chrM:1-16571'; samtools depth -Q 1 -d 2500 -r 'chrM:1-16571' '/data/...', , stdout[:20]: '/data/', exit-code: -1, error: signal: segmentation fault (core dumped), run-time: 1.616101091s)

After some checking, I found that I have several versions of samtools lying around in various paths. I want to run goleft with the latest samtools from bioconda, as follows:

export PATH="/path/to/bioconda/samtools:$PATH"
then call goleft.

goleft, however, does not know about the samtools in "/path/to/bioconda/samtools". It keeps using the samtools in /usr/local/bin. When I removed the samtools in /usr/local/bin, it used the one in /usr/bin instead. I had to remove all the other old versions of samtools.

Should there be an option to specify full samtools path?
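
One thing worth ruling out: PATH entries are directories, so if "/path/to/bioconda/samtools" is the samtools binary itself rather than the directory that holds it, that entry can never match and the lookup falls through to the next directory. The Go sketch below prints what a PATH lookup resolves to in the current environment. It uses the standard `exec.LookPath`; `resolveTool` is a hypothetical wrapper, and this is not how goleft itself locates samtools.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// resolveTool reports the path that a PATH lookup yields for name.
// Hypothetical wrapper around exec.LookPath, for illustration only.
func resolveTool(name string) (string, error) {
	return exec.LookPath(name)
}

func main() {
	fmt.Println("PATH =", os.Getenv("PATH"))
	p, err := resolveTool("samtools")
	if err != nil {
		fmt.Println("samtools not found on PATH:", err)
		return
	}
	fmt.Println("samtools resolves to", p)
}
```

Running this in the same shell session used to start goleft shows which samtools a child process would pick up.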

indexcov: Unexpectedly long run time

I interrupted goleft indexcov after 43 minutes, whereas samtools depth completed in 2 minutes.

❯❯❯ time goleft indexcov foo.bam
^C interrupt
real	42m56.588s
user	44m41.414s
sys	0m26.671s
❯❯❯ time samtools depth -a foo.bam
real	2m11.683s
user	2m3.448s
sys	0m3.460s
❯❯❯ du -h foo.bam
3.0G	foo.bam

The BAM file is of reads aligned to a de novo assembled draft genome with 205 Mbp in 1.6 million contigs. The files reside on an NFS file system. Any thoughts?

indexcov error

Hi,

I'm trying to run indexcov on my WGS samples. Here is the error I got when I execute

panic: parsing time "2010-10-19T00:00:00.000+00:00": extra text: +00:00: line 88: "@RG\tID:DKFZ:100630_SN143_0256_A15006043_5\tPL:ILLUMINA\tCN:DKFZ\tPI:353\tDT:2010-10-19T00:00:00.000+00:00\tLB:WGS:DKFZ:ICGC_BL12\tSM:52c198b4-7bda-4f81-8101-a322787a10a6\tPU:DKFZ:100630_SN143_0256_A15006043_5\tPG:fastqtobam"

goroutine 1 [running]:
github.com/brentp/goleft/indexcov.RefsFromBam(0x7fff9331bb5e, 0x48, 0x0, 0x0, 0x1, 0x0, 0xc420174e10)
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:294 +0x509
github.com/brentp/goleft/indexcov.getReferences(0x7fff9331bb52, 0xb, 0xc420177001)
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:311 +0xb9
github.com/brentp/goleft/indexcov.Main()
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:340 +0x20e
main.main()
        /home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:68 +0x17f

Is something wrong with the BAM files, or are any dependencies missing on my machine? Help would be greatly appreciated.

Thank you.

indexcov returns "no reference stats found"

Hi Brent,
I tried to run goleft indexcov on two small BAM files that we store on GitHub here.
Unfortunately I get this error (I trimmed it; the first line below is repeated for the 0th through 85th references):

2018/10/10 15:07:33 no reference stats found for 79th reference
2018/10/10 15:07:33 no reference stats found for 80th reference
2018/10/10 15:07:33 no reference stats found for 81th reference
2018/10/10 15:07:33 no reference stats found for 82th reference
2018/10/10 15:07:33 no reference stats found for 83th reference
2018/10/10 15:07:33 no reference stats found for 84th reference
2018/10/10 15:07:33 no reference stats found for 85th reference
2018/10/10 15:07:33 indexcov: running on 1 indexes
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x8568f6]

goroutine 1 [running]:
github.com/brentp/goleft/indexcov/crai.(*Index).Sizes(0x0, 0x0, 0x0, 0x0)
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/crai/crai.go:46 +0x26
github.com/brentp/goleft/indexcov.(*Index).init(0xc0001a6410)
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:83 +0x471
github.com/brentp/goleft/indexcov.(*Index).NormalizedDepth(0xc0001a6410, 0x0, 0x30d40, 0xc00038e000, 0x0)
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:124 +0x1ed
github.com/brentp/goleft/indexcov.run(0xc0002ac000, 0x56, 0x80, 0xc0000de0d0, 0x1, 0x1, 0xc00019e6a0, 0x1, 0x1, 0xc00034c580, ...)
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:559 +0x829
github.com/brentp/goleft/indexcov.Main()
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:402 +0x51e
main.main()
	/home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:68 +0x179

I have no idea about what is happening here...

indexcov: limit to standard chromosomes

Brent;
Would it be possible to add a general option to limit goleft indexcov output to standard chromosomes (1-22 + gender)? The current GL removal works for GRCh37 but not hg19 or hg38 with chr prefixes and won't handle other non-standard alt contigs. I'd be happy to pass a list of chromosomes to the command line so the tool itself doesn't need to hard code these, but it seems like the cur --chr option is meant for specifying a single chromosome only. Thanks for any thoughts/suggestions.
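
A name filter of the kind requested can be sketched in Go as below. This is not an existing indexcov option: the regular expression and `isStandard` are assumptions made for illustration, covering chromosomes 1-22, X and Y with an optional "chr" prefix.

```go
package main

import (
	"fmt"
	"regexp"
)

// standardChrom matches 1-22, X and Y, with or without a "chr" prefix.
// The pattern is an assumption for this sketch, not a goleft default.
var standardChrom = regexp.MustCompile(`^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y)$`)

// isStandard reports whether name is one of the standard chromosomes.
func isStandard(name string) bool {
	return standardChrom.MatchString(name)
}

func main() {
	names := []string{"1", "chr22", "chrX", "GL000207.1", "chr1_KI270706v1_random", "chrM"}
	for _, n := range names {
		fmt.Printf("%-24s %t\n", n, isStandard(n))
	}
}
```

Anchoring the pattern at both ends is the design choice that matters: it keeps alt and random contigs, whose names begin with a standard chromosome name, from slipping through.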

indexcov support for csi indexes

Hi @brentp. Any chance to add support for CSI indexes for indexcov? Generating lots of assemblies, and many have larger scaffolds, which require the CSI rather than BAI...
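
For context, the BAI format can index individual references of up to 512 Mbp (2^29 bases); longer scaffolds need a CSI index. A small Go sketch for checking reference lengths against that limit follows. `needsCSI` is a hypothetical helper and the 600 Mbp scaffold is a made-up value.

```go
package main

import "fmt"

// baiMaxLen is the longest reference a BAI index can address (2^29 bases).
const baiMaxLen = 1 << 29

// needsCSI reports whether any reference is too long for a BAI index.
// Hypothetical helper, for illustration only.
func needsCSI(lengths []int) bool {
	for _, l := range lengths {
		if l > baiMaxLen {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(needsCSI([]int{225584828, 204787373})) // false
	fmt.Println(needsCSI([]int{600000000}))            // true (made-up scaffold length)
}
```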

indexcov: producing .html but no other files

I'm running indexcov on three test .bam files. My code is
goleft indexcov -d ./ *.bam --sex ''

I get the following output
2017/08/09 16:52:39 indexcov: running on 3 indexes
2017/08/09 16:52:42 sex chromosomes not found.
2017/08/09 16:52:43 got: 3 principal components
indexcov finished: see .//index.html for overview of output

No other files except index.html file are produced and the coverage plots are missing from the .html output.

The .bam.bai files can be found here https://osf.io/c4bdx/

Any help would be greatly appreciated.

Feature request: option to ignore sex chromosomes?

Great tool - we've been using it to look at coverage on some plant genome samples. One minor issue is that right now we have to specify dummy sex chromosomes. It would be great to have a flag to just ignore that option.

coverage of the customized genome

Dear Brent;

I built a customized genome sequence by adding a new sequence "VectorA" (~10 kb) to hg19, and I would like to get the coverage of VectorA with your tool indexcov. I have tried different combinations of command arguments, including --includegl and -c. However, none of them reports the VectorA depth, either in the plot or in the file indexcov-indexcov.bed.gz. The vector is shorter than 16 kb, so it is reasonable that there is no plot. But could you tell me how to get the result in the *-indexcov.bed.gz file?

Thanks,

Wei

index out of range

I ran into this error:
$ goleft_linux64 indexcov --prefix merged/30J.rmdup.bam
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/brentp/goleft/indexcov.Main()
/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:218 +0xe70
main.main()
/home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:65 +0x191

Could this be due to the fact that our chromosome names are not in "chr" format? Is there any way to change the defaults so that we can use genomes without chromosomal scaffolds? Here is the output of samtools idxstats on the "30J.rmdup.bam" alignment file:
CM002977.3 225584828 8338981 465136
CM002980.3 204787373 7537439 390998
CM002984.2 185818997 6517117 332638
CM002983.2 172585720 6076609 306025
CM002981.2 190429646 6355739 298694
CM002982.3 180051392 6521339 342746
CM002991.3 169600520 6305670 350299
CM002985.3 144306982 5341132 285889
CM002987.3 129882849 4826300 276518
CM002992.3 92844088 3560246 217738
CM002989.3 133663169 4869039 254070
CM002979.2 125506784 4406793 224347
CM002978.2 108979918 4114634 231117
CM002988.2 127894412 4823764 274264
CM002986.2 111343173 4088611 220325
CM002994.2 77216781 2713115 147784
CM002990.2 95684472 3447325 168426
CM002995.2 70235451 2527018 132779
CM002996.3 53671032 1790224 86788
CM002993.2 74971481 2857223 168165
CM002997.3 149150640 5750773 315248
CM003438.1 11753682 38388 3424

Thanks,
Noah

new tool: vcfeval

should take a vcf and a .ped and show transmission rates, rates of denovos for various cutoffs.

a cutoff can be variant quality, genotype quality, depth, etc.

should output in a way that makes it possible to make something like a ROC curve.

panic: bam: magic number mismatch

Hi, I am getting this error while running indexcov on a folder of BAM files. Is there a way to find out which BAM has a problem?

goleft indexcov --directory indexCovQGP/ indexQGP/*.bam
panic: bam: magic number mismatch

goroutine 12 [running]:
github.com/brentp/goleft/indexcov.readIndex(0x7fff6666d84a, 0x27, 0x129, 0xc454a61101, 0xc45995ab20, 0x12, 0x125)
/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:313 +0x2d5
github.com/brentp/goleft/indexcov.Main.func1(0xc420032840, 0xc4202e8000, 0xcef, 0xcef, 0xc42017f500, 0xcef, 0xcef, 0xc420213c10)
/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:270 +0xaa
created by github.com/brentp/goleft/indexcov.Main
/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:275 +0x5e4
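
Until the tool names the offending file, one way to narrow it down is to check each file's magic bytes directly. Per the SAM/BAM specification, a .bai index starts with "BAI\1", and a BAM file is BGZF (gzip-compatible) data whose decompressed content starts with "BAM\1". The Go sketch below applies that check to the paths given on the command line. It is a hypothetical stand-alone checker, not part of goleft, and it treats any path not ending in .bai as a BAM file, so CRAM input would be reported as bad.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"strings"
)

var (
	bamMagic = []byte("BAM\x01") // first bytes of decompressed BAM data
	baiMagic = []byte("BAI\x01") // first bytes of a .bai index
)

// hasMagic reports whether head starts with the given magic bytes.
func hasMagic(head, magic []byte) bool {
	return bytes.HasPrefix(head, magic)
}

// checkFile returns nil when path starts with the magic expected for its type.
func checkFile(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	var r io.Reader = f
	magic := baiMagic
	if !strings.HasSuffix(path, ".bai") {
		// BAM: the magic is inside the first BGZF (gzip) block.
		gz, err := gzip.NewReader(f)
		if err != nil {
			return fmt.Errorf("not BGZF/gzip data: %w", err)
		}
		defer gz.Close()
		r, magic = gz, bamMagic
	}
	head := make([]byte, 4)
	if _, err := io.ReadFull(r, head); err != nil {
		return err
	}
	if !hasMagic(head, magic) {
		return fmt.Errorf("magic number mismatch: got %q, want %q", head, magic)
	}
	return nil
}

func main() {
	for _, path := range os.Args[1:] {
		if err := checkFile(path); err != nil {
			fmt.Printf("%s\tBAD\t%v\n", path, err)
		} else {
			fmt.Printf("%s\tok\n", path)
		}
	}
}
```

Passing both the BAM files and their .bai files as arguments covers either source of the mismatch.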

indexcov: select or view individual sample tracks

When dealing with many samples, the bam coverage plots in indexcov quickly become rather noisy. Currently I can try and highlight tracks to get the sample, but this doesn't always distinguish the track from all the others. Curious if there might be a way to deselect sample tracks or rather just select the desired samples that you want to view for more individualized inspection? In my case I was hoping to view specific samples for potential large-scale duplications or deletions. Or maybe some kind of zoom in feature or something like that would suffice.

Synchronizing depth callable output with bcbio

Brent;
Thanks again for this implementation. I've got it integrated into bcbio and working on checking differences with current callable calculation output. I found a couple of differences I wasn't able to resolve with my poor golang skills. An example run is here:

wget https://s3.amazonaws.com/chapmanb/testcases/goleft_callable.tar.gz

When comparing against the current bcbio output I get two major differences. The first is that the initial blocks defined by the input BED file have an extra base relative to the input regions (99 instead of 100 for the first block here):

@@ -1,6 +1,6 @@
-chrM   100     1000    CALLABLE
-chrM   2000    5000    CALLABLE
-chr22  14250   14257   LOW_COVERAGE
+chrM   99      1000    CALLABLE
+chrM   1999    5000    CALLABLE
+chr22  14249   14257   LOW_COVERAGE
 chr22  14257   14258   NO_COVERAGE
 chr22  14258   14270   LOW_COVERAGE
 chr22  14270   14277   CALLABLE

This is caused by the -1 when retrieving from the cache, but I couldn't figure out the right way to push these initial blocks onto the cache with a +1 for the start.

The second is that the depth output is missing some NO_COVERAGE regions. I guess this is due to not using -a for depth:

@@ -15,6 +15,5 @@
 chr22  14487   14495   LOW_COVERAGE
 chr22  14495   14496   CALLABLE
 chr22  14496   14595   LOW_COVERAGE
-chr22  14595   15068   NO_COVERAGE
 chr22  15068   15128   LOW_COVERAGE
 chr22  15128   15500   CALLABLE

Thanks for any thoughts and suggestions.
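
The first difference has the shape of a coordinate-convention mismatch: samtools depth reports 1-based positions, while BED intervals are 0-based and half-open. The Go sketch below shows the conversion and how subtracting 1 a second time produces the 99 seen in the diff. It illustrates the conventions only and is not goleft's code; `toBED` is a hypothetical helper.

```go
package main

import "fmt"

// toBED converts a run of 1-based inclusive positions [first, last]
// into a 0-based half-open BED interval [start, end).
func toBED(first, last int) (start, end int) {
	return first - 1, last
}

func main() {
	start, end := toBED(101, 1000)
	fmt.Println(start, end) // 100 1000

	// Subtracting 1 from a start that is already 0-based shifts the
	// block one base to the left, as in the diff above.
	fmt.Println(start-1, end) // 99 1000
}
```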

goleft depth sliding window analysis

Hi there,
I'm using "goleft depth" to estimate the sequencing depth in sliding windows (10 kb window and 2 kb step) across the genome. I tried passing a bed file with the sliding-window coordinates via "--bed"; however, the result looks like this:

Scaffold_3 0 10000 1
Scaffold_3 2000 10000 0
Scaffold_3 4000 10000 0
Scaffold_3 6000 10000 0
Scaffold_3 8000 10000 0
Scaffold_3 10000 12000 0
Scaffold_3 10000 14000 0
Scaffold_3 10000 16000 0

So I was wondering whether I am using the bed file properly. My command is as follows:
goleft_linux64 depth --reference genome.fasta --prefix sample1 sample1_sorted.bam --processes 20 --windowsize 10000 --mincov 0 --bed genome.windows.bed &

Thank you for any advice on this!

YY
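
For comparison with the rows above, the windows described in the question (10 kb wide, 2 kb step) can be generated with the Go sketch below. Whether goleft depth accepts overlapping --bed intervals is not stated in this thread. `slidingWindows` is a hypothetical helper and the 16,000 bp scaffold length is a made-up value.

```go
package main

import "fmt"

// window is one BED interval (0-based, half-open).
type window struct {
	chrom      string
	start, end int
}

// slidingWindows returns windows of the given size, advancing by step,
// with the final window clipped to the contig length.
func slidingWindows(chrom string, length, size, step int) []window {
	var out []window
	if length <= 0 || size <= 0 || step <= 0 {
		return out
	}
	for start := 0; start < length; start += step {
		end := start + size
		if end > length {
			end = length
		}
		out = append(out, window{chrom, start, end})
		if end == length {
			break // the contig end has been reached
		}
	}
	return out
}

func main() {
	for _, w := range slidingWindows("Scaffold_3", 16000, 10000, 2000) {
		fmt.Printf("%s\t%d\t%d\n", w.chrom, w.start, w.end)
	}
}
```

With these inputs the sketch emits four rows: 0-10000, 2000-12000, 4000-14000 and 6000-16000, which differ from the 2000-10000 and 10000-12000 rows in the reported output.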

Sambamba Comparison

Have you compared goleft to Sambamba?

There are a few small issues with it in terms of flexibility, but would you say goleft performs better or provides orthogonal functionality? (For example, one issue is that sambamba view can't take bitwise flags, unlike samtools view.)

covmed overestimates coverage?

I've noticed that covmed estimates higher median coverage than other tools. For example for a particular whole genome covmed estimates 33.4, while Picard CollectWgsMetrics estimates 27.
I've performed similar calculations on exomes where I get median coverage of 199.71 with covmed (using the region argument) compared with 189 using bedtools (take the median of counts per base over target region). I've found consistently higher results from covmed compared with picard and bedtools across a number of exomes and genomes. The size of the difference is variable.

I wonder if you have any idea why this is occurring?

One possibility that springs to mind for exomes in particular is that reads outside the target region could be counted and so cause it to overestimate the coverage.

indexcov --sex ''

Is it somehow possible to disable the sex chromosome plotting or reporting ? Having worked well on mammals I'd like to apply it to bacteria (for global missing regions detection).

Is that in scope ?

Thanks
Colin

Running indexcov on small input BAMs

Brent;
I'm running into a goleft indexcov error when running small integration tests with goleft indexcov. I'm guessing this is due to the file being too small to have reasonable bins but it would be great if it didn't fail for these edge cases so I don't have to add checks about when to run it.

Here is a small test case with a run.sh to reproduce the problem:

wget https://s3.amazonaws.com/chapmanb/testcases/goleft_indexcov_small.tar.gz

Let me know if I'm doing anything wrong on my side, looking forward to having these coverage estimates integrated.

2017/01/18 14:23:13 indexcov: running on 1 indexes
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/brentp/goleft/indexcov.(*Index).init(0xc4200177d0)
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:70 +0x3df
github.com/brentp/goleft/indexcov.(*Index).NormalizedDepth(0xc4200177d0, 0x0, 0x0, 0x40bb, 0x0, 0x30d40, 0x1)
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:86 +0x304
github.com/brentp/goleft/indexcov.run(0xc42010a420, 0x2, 0x2, 0xc42000c0d8, 0x1, 0x1, 0xc42010a650, 0x1, 0x1, 0x0, ...)
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:315 +0x73c
github.com/brentp/goleft/indexcov.Main()
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:263 +0x99d
main.main()
        /home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:65 +0x191

covstats supporting cram

Any chance of covstats supporting cram format? Unsurprisingly, when I tried I got an error:

$ goleft covstats input.cram

panic: gzip: invalid header

goroutine 1 [running]:
github.com/brentp/goleft/covstats.pcheck(0xa8ae60, 0xc4200d8410)
	/home/brentp/go/src/github.com/brentp/goleft/covstats/covstats.go:28 +0x4a
github.com/brentp/goleft/covstats.Main()
	/home/brentp/go/src/github.com/brentp/goleft/covstats/covstats.go:231 +0x913
main.main()
	/home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:68 +0x17f

indexcov Unable to run on contigs without a sex chr

Hi,

I wanted to test indexcov on a sample I have, but it seems this tool does not function without a sex chromosome present. It wasn't clear in the documentation that sex chromosomes need to be present for the tool to function, and it's also not clear how to provide an argument to the --sex parameter.

$ goleft_linux64_v0.1.18 indexcov --directory VirusA_results VirusA_bwa_alignment.bam
2018/03/27 10:48:12 indexcov: running on 1 indexes
(WARNING) indexcov: expected 2 sex chromosomes, found: 0.
you can set the expected with --sex ''
2018/03/27 10:48:12 sex chromosomes not found.
2018/03/27 10:48:12 got: 0 principal components
2018/03/27 10:48:12 indexcov: 0 principal components, not plotting
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x8fe719]

goroutine 1 [running]:
github.com/brentp/goleft/indexcov.plotBins(0xc42000c148, 0x1, 0x1, 0xc4200672f0, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, ...)
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/plot.go:194 +0x5c9
github.com/brentp/goleft/indexcov.writeIndex(0xc420075e60, 0xc42000c148, 0x1, 0x1, 0xc4200f06c0, 0x2, 0x2, 0xc4200672f0, 0x1, 0x1, ...)
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:697 +0x2b1
github.com/brentp/goleft/indexcov.Main()
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:396 +0x82f
main.main()
	/home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:68 +0x17f
$ goleft_linux64_v0.1.18 indexcov --sex 0 --directory VirusA_results VirusA_bwa_alignment.bam
2018/03/27 10:48:30 indexcov: running on 1 indexes
2018/03/27 10:48:30 (FATAL) indexcov: expected 1 sex chromosomes, found: 0.
you can set the expected with --sex ''
$ goleft_linux64_v0.1.18 indexcov --sex false --directory VirusA_results VirusA_bwa_alignment.bam
2018/03/27 10:49:07 indexcov: running on 1 indexes
2018/03/27 10:49:07 (FATAL) indexcov: expected 1 sex chromosomes, found: 0.
you can set the expected with --sex ''

Regards,
Mahesh.

indexcov: panic parsing time

Seems like +00:00 is breaking the parsing of the time RG tag. Not sure if + is allowed here, I would assume it is.

$ goleft indexcov --directory goleft_output/ my.bam
panic: parsing time "2017-08-28T14:53:35.804417+00:00": extra text: +00:00: line 88: "@RG\tID:MY_SAMPLE\tSM:MY_SAMPLE\tLB:MY_SAMPLE\tPL:ILLUMINA\tPU:HNNKFAFXX-L004\tCN:GL\tDT:2017-08-28T14:53:35.804417+00:00"

goroutine 1 [running]:
github.com/brentp/goleft/indexcov.RefsFromBam(0x7fff82d67923, 0x38, 0x0, 0x0, 0x1, 0x0, 0xc42011ee10)
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:294 +0x509
github.com/brentp/goleft/indexcov.getReferences(0x7fff82d67914, 0xe, 0xc420120f01)
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:311 +0xb9
github.com/brentp/goleft/indexcov.Main()
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:340 +0x20e
main.main()
	/home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:68 +0x17f
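
On whether the offset is allowed: the SAM specification defines the @RG DT field as an ISO 8601 date or date/time, and ISO 8601 permits a UTC offset such as +00:00. The "extra text" error is what Go's `time.Parse` returns when the layout has no zone field. The sketch below reproduces that and shows a fallback list of layouts that accepts the value. The layouts and the helpers `parseNoZone` and `parseDT` are assumptions for illustration, not the code used by goleft or its BAM library.

```go
package main

import (
	"fmt"
	"time"
)

// parseNoZone parses with a layout that has no zone offset, so any
// trailing "+00:00" is reported as extra text.
func parseNoZone(s string) (time.Time, error) {
	return time.Parse("2006-01-02T15:04:05", s)
}

// parseDT tries RFC 3339 (which includes an offset) first and then
// falls back to zone-less date/time and plain date layouts.
func parseDT(s string) (time.Time, error) {
	var firstErr error
	for _, layout := range []string{time.RFC3339, "2006-01-02T15:04:05", "2006-01-02"} {
		parsed, err := time.Parse(layout, s)
		if err == nil {
			return parsed, nil
		}
		if firstErr == nil {
			firstErr = err
		}
	}
	return time.Time{}, firstErr
}

func main() {
	dt := "2017-08-28T14:53:35.804417+00:00"

	_, err := parseNoZone(dt)
	fmt.Println("zone-less layout:", err)

	parsed, err := parseDT(dt)
	fmt.Println("with RFC 3339:", parsed.UTC(), err)
}
```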

0.1.9 release linux binary is actually 0.1.6?

$ wget https://github.com/brentp/goleft/releases/download/v0.1.9/goleft_linux64 -O goleft
$ chmod +x goleft
$ goleft -h
goleft Version: 0.1.6

depth : parallelize calls to samtools in user-defined windows

indexcov: excluding chromosome: NC_031965 because of exclude-pattern: ^chrEBV$|^NC|_random$|Un_|^HLA\-|_alt$|hap\d

Hi Brent
I tried to run indexcov on my bam files but it excludes all my chromosomes but one which has a different naming.
I am working with an NCBI reference genome and chromosomes are named NC_12344. How can I overwrite the default setting which excludes this pattern?
I tried this way:
./goleft_linux64 indexcov --chrom NC_031983 --directory /scicore/home/salzburg/boehne/test/ /scicore/home/salzburg/boehne/*realn.bam
which did not work. I also tried the same command without the --chrom flag.
Any help would be great
Thanks
Astrid
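
The pattern in the log line contains the alternative `^NC`, which matches every NCBI-style chromosome name. The Go sketch below checks names against that pattern and against a variant with `^NC` removed. The variant is an assumption for illustration. An --excludepatt flag appears in other reports on this page, but whether the version used here accepts it is not confirmed in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// defaultExclude is the pattern copied from the log line in this report.
var defaultExclude = regexp.MustCompile(`^chrEBV$|^NC|_random$|Un_|^HLA\-|_alt$|hap\d`)

// withoutNC is a hypothetical replacement that drops the ^NC alternative.
var withoutNC = regexp.MustCompile(`^chrEBV$|_random$|Un_|^HLA\-|_alt$|hap\d`)

// excluded reports whether name matches the exclude pattern.
func excluded(re *regexp.Regexp, name string) bool {
	return re.MatchString(name)
}

func main() {
	for _, n := range []string{"NC_031965", "NC_031983", "chr1"} {
		fmt.Printf("%s\tdefault=%t\twithoutNC=%t\n",
			n, excluded(defaultExclude, n), excluded(withoutNC, n))
	}
}
```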

indexsplit for 2000 samples fail

Hi, I am trying to run indexsplit for around 2000 samples with cram indices as input
The command line works perfectly fine for 10 samples, but even with 20 samples I get the following error.
Is there a limitation for number of samples? Thanks.

goleft indexsplit --n 5000 --fai Homo_sapiens_assembly38.fasta.fai 1.crai 2.crai 3.crai.......2000.crai

panic: runtime error: index out of range

goroutine 19 [running]:
github.com/brentp/goleft/indexsplit.Split.func1(0xc4201c2200, 0x14, 0x20, 0xc4203aa000, 0xd26, 0xd26, 0xc4201c4060, 0x1388, 0x0)
/home/brentp/go/src/github.com/brentp/goleft/indexsplit/indexsplit.go:101 +0x1058
created by github.com/brentp/goleft/indexsplit.Split
/home/brentp/go/src/github.com/brentp/goleft/indexsplit/indexsplit.go:84 +0xc3

Regards

panic in crai.makeSizes

Trying to run indexcov (v0.1.18 from bioconda) on long read WGS (fai and crai attached) but keep getting an obscure error message:

$ goleft indexcov  --excludepatt '[a-zA-VZ]'  --fai hs37d5_viral.fa.fai --directory test 9370NK.filtered.sorted.cram.crai  
2018/04/26 09:39:33 -19749 16384 {2355931 707216 13182576 1395 3515035} 2355931 2375680
panic: logic error

goroutine 16 [running]:
github.com/brentp/goleft/indexcov/crai.makeSizes(0xc420211500, 0x13c, 0x220, 0xc420686000, 0xd5e, 0xd5e)
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/crai/crai.go:83 +0x958
github.com/brentp/goleft/indexcov/crai.(*Index).Sizes(0xc420184060, 0x7feb6fc45070, 0x450920, 0x7feb6fc9f000)
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/crai/crai.go:48 +0xc4
github.com/brentp/goleft/indexcov.(*Index).init(0xc420260050)
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:81 +0x465
github.com/brentp/goleft/indexcov.readIndex(0x7ffc2b92815b, 0x2d, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:431 +0x5a9
github.com/brentp/goleft/indexcov.Main.func1(0xc420256000, 0xc4201641a0, 0x1, 0x1, 0xc42025a000, 0x1, 0x1, 0xc4204ac140)
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:365 +0x8a
created by github.com/brentp/goleft/indexcov.Main
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:363 +0x386

bug_fai_crai.zip

Suggestions : paths ; case/control

Hi Brent , here are two suggestions for indexcov

  • using a file containing the path to the bams (to avoid something like xargs )

  • if we could include the fact that some samples are 'cases' or 'controls', would it improve your algorithm ?

thanks

goleft_linux64 covstats error: panic: EOF

Hi,

I got the following error messages when running goleft_linux64 covstats.

$ goleft_linux64 covstats CN000245.marked.realigned.recal.bam 
coverage	insert_mean	insert_sd	insert_5th	insert_95th	template_mean	template_sd	pct_unmapped	pct_bad_reads	pct_duplicate	pct_proper_pair	read_length	bam	sample
panic: EOF

goroutine 1 [running]:
github.com/brentp/goleft/covstats.pcheck(...)
	/home/brentp/go/src/github.com/brentp/goleft/covstats/covstats.go:30
github.com/brentp/goleft/covstats.Main()
	/home/brentp/go/src/github.com/brentp/goleft/covstats/covstats.go:249 +0xdd1
main.main()
	/home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:68 +0x179

Could you please help me with this? Please tell me if you need other information.

Bests,
Yiwei Niu

indexcov: "panic: sam: malformed header line"

Hi,
an issue similar to #33 ?

I got the following message for indexcov :

panic: sam: malformed header line: line 89: "@PG\tID:bwa\tPN:bwa\tVN:0.7.12-r1039\tCL:/commun/data/packages/bwa/bwa-0.7.12/bwa mem -t 10 -M -H @CO\t20180608.isidor.: Mapping de bams pour BI/ CHU-Nantes. Les Bams viennent de [email protected]  -R @RG\\tID:18D0609\\tLB:18D0609\\tSM:18D0609\\tPL:illumina\\tCN:Nantes /commun/data/pubdb/broadinstitute.org/bundle/1.5/b37/index-bwa-0.7.12/human_g1k_v37.fasta /mnt/beegfs/lindenb/WORK/2018/20180607.XXX.YY/FASTQS/18D0609_S1_R1_001.fastq.gz /mnt/beegfs/lindenb/WORK/2018/20180607.XX.YY/FASTQS/18D0609_S1_R2_001.fastq.gz"

goroutine 24 [running]:
github.com/brentp/goleft/indexcov.readIndex(0x7fffffffda38, 0x76, 0x39, 0xc4204d81e0, 0xc421e88590, 0x5, 0x35)
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:460 +0x6ba
github.com/brentp/goleft/indexcov.Main.func1(0xc4201627e0, 0xc4200c1c00, 0x3a, 0x3a, 0xc42028a000, 0x3a, 0x3a, 0xc420229340)
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:365 +0x8a
created by github.com/brentp/goleft/indexcov.Main
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:363 +0x386

$ ./goleft -v
goleft Version: 0.1.19

samtools view -c 18D0609.bam
works fine

Support for more than one RG?

Hi @brentp,
Love the indexcov tool! It would be very useful for cancer BAMs, but the limitation of allowing only one @RG record is a blocker for this. We tag each sequencing lane with its own @RG. Is this a restriction that can be lifted?

panic: bam reagroup: more than one RG for tumour.bam

cheers,
Mark

indexcov: panic: runtime error: index out of range

❯❯❯ goleft_linux64 indexcov -d indexcov foo.bam 
2017/05/18 15:33:16 indexcov: running on 1 indexes
(WARNING) indexcov: expected 2 sex chromosomes, found: 0.
you can set the expected with --sex ''
2017/05/18 15:33:17 sex chromosomes not found.
2017/05/18 15:33:17 got: 1 principal components
2017/05/18 15:33:17 indexcov: 1 principal components, not plotting
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/brentp/goleft/indexcov.plotSex(0xc4203f8840, 0xc4201591a0, 0x2, 0x2, 0xc420294600, 0x1, 0x1, 0x0, 0x0, 0x0, ...)
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/plot.go:371 +0xc22
github.com/brentp/goleft/indexcov.writeIndex(0xc4203f8840, 0xc42000c130, 0x1, 0x1, 0xc4201591a0, 0x2, 0x2, 0xc420294600, 0x1, 0x1, ...)
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:742 +0x1cda
github.com/brentp/goleft/indexcov.Main()
	/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:382 +0x8ca
main.main()
	/home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:68 +0x191
❯❯❯ goleft_linux64 --version
goleft Version: 0.1.16

The BAM is of reads mapped to an assembly of those reads. I can send along the BAI file if it helps troubleshooting.

indexcov 0.1.11 failing on tiny test bam

Running the latest release of indexcov on this test bam fails:
tiny_bam.zip

user@d8fb9fcad6fb:/data$ goleft indexcov --directory  tmp/muster/NA12878_TinyTest out/NA12878_TinyTest.dupmarked.realigned.recalibrated.bam
2017/01/22 04:51:01 indexcov: running on 1 indexes
2017/01/22 04:51:02 got: 1, principal components
2017/01/22 04:51:02 indexcov: 1 principal components, not plotting
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/brentp/goleft/indexcov.writeIndex(0xc42023cd50, 0xc420258030, 0x1, 0x1, 0xebfc00, 0x2, 0x2, 0xc4202e0190, 0x1, 0x1, ...)
    /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:541 +0x29f5
github.com/brentp/goleft/indexcov.Main()
    /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:288 +0x857
main.main()
    /home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:64 +0x191
user@d8fb9fcad6fb:/data$ goleft --version
goleft Version: 0.1.11

covmed   : calculate median coverage on a bam by sampling
depth    : parallelize calls to samtools in user-defined windows
depthwed : matricize output from depth to n-sites * n-samples
indexcov : quick coverage estimate using only the bam index

thanks!

Incorporate indexcov into MultiQC

Brent;
The new indexcov command and outputs are fabulous, and a brilliant way to get a quick overview of coverage across chromosomes. What would you think about outputting raw data we could incorporate into MultiQC (https://github.com/ewels/MultiQC)?

MultiQC consolidates all the output reporting in bcbio into a single HTML file and allows displays like tabbing, which we could use to provide the multiple chromosome plots. It has built-in interactive charts, customized hiding of samples and other nice features to scale up for bigger sample sizes.

Practically this would require dumping the outputs you currently plot into tab delimited files we could suck up as a MultiQC module for indexcov (http://multiqc.info/docs/#plotting-functions). I haven't compiled and run the latest indexcov but looking at the code it looks like you might already do some of this. Thanks for considering this approach for making indexcov outputs available.

indexcov: Handles at most 31 input files (bigger inputs result in nasty error message)

31 samples work fine but with 32 the output ends in:

2017/11/06 10:57:13 indexcov: running on 32 indexes
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/brentp/goleft/indexcov.(*Index).NormalizedDepth(0xc4200b6820, 0x19, 0x0, 0xc488f646c0, 0x46)
/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:113 +0x1c6
github.com/brentp/goleft/indexcov.run(0xc4200f2840, 0x56, 0x56, 0xc4200d4300, 0x20, 0x20, 0xc4200d6600, 0x20, 0x20, 0xc48801d710, ...)
/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:520 +0x8fb
github.com/brentp/goleft/indexcov.Main()
/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:364 +0x529
main.main()
/home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:68 +0x17f

Feature request: New option --exclude for indexsplit

Hi Brent,

I would like to propose a new option for goleft indexsplit (and would have implemented it myself if my Go was any good): --exclude to be able to exclude chromosomes. A typical use case would be unplaced contigs, decoys etc. They are part of the BAM header, but usually you don't want to call variants on them. Sure, I can always remove them after running indexsplit, but then I cannot control the number of splits N properly.

Thanks,
Andreas

PS: Thanks for Goleft!
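
For anyone wanting to experiment with the idea in this request, a minimal Go sketch of name-based exclusion follows. It only shows the filtering step, assuming the user supplies a regular expression. The `filterRefs` helper and the example pattern are hypothetical and are not part of goleft. Filtering before the split is what would let the requested number of regions N apply to the remaining contigs only.

```go
package main

import (
	"fmt"
	"regexp"
)

// filterRefs returns the reference names that do NOT match the exclude
// pattern. An empty pattern excludes nothing.
func filterRefs(names []string, exclude string) ([]string, error) {
	if exclude == "" {
		return names, nil
	}
	re, err := regexp.Compile(exclude)
	if err != nil {
		return nil, err
	}
	kept := make([]string, 0, len(names))
	for _, n := range names {
		if !re.MatchString(n) {
			kept = append(kept, n)
		}
	}
	return kept, nil
}

func main() {
	// Example reference names as they might appear in an hg38 BAM header.
	refs := []string{
		"chr1", "chr2", "chrUn_KI270302v1", "chrEBV",
		"chr1_KI270706v1_random", "HLA-A*01:01:01:01",
	}
	// Hypothetical pattern: unplaced, random, EBV and HLA contigs.
	kept, err := filterRefs(refs, `^chrUn_|_random$|^chrEBV$|^HLA-`)
	if err != nil {
		panic(err)
	}
	fmt.Println(kept) // prints [chr1 chr2]
}
```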

indexcov : won't work with multiple crais

indexcov works fine with a single crai but appears to fail when run on multiple crais. Has anyone seen this error before?
~/src/go/bin/goleft indexcov -d SEQCAP_WGS_GDAP_Uganda/goleft --sex "chrX,chrY" --fai fasta/Homo_sapiens.GRCh38_full_analysis_set_plus_decoy_hla.fa.fai 21601_6.cram.crai 21722_8.cram.crai
panic: runtime error: makeslice: cap out of range

goroutine 23 [running]:
github.com/brentp/goleft/indexcov/crai.makeSizes(0xc421132f00, 0x1, 0x10, 0xc4204224d8, 0x0, 0x1)
/nfs/users/nfs_j/jpm/src/go/src/github.com/brentp/goleft/indexcov/crai/crai.go:64 +0x13d
github.com/brentp/goleft/indexcov/crai.(*Index).Sizes(0xc4202a5b20, 0xc4201340f0, 0x50, 0x50)
/nfs/users/nfs_j/jpm/src/go/src/github.com/brentp/goleft/indexcov/crai/crai.go:48 +0xa6
github.com/brentp/goleft/indexcov.(*Index).init(0xc4201340f0)
/nfs/users/nfs_j/jpm/src/go/src/github.com/brentp/goleft/indexcov/indexcov.go:80 +0x4cb
github.com/brentp/goleft/indexcov.readIndex(0x7ffd139db8b1, 0x33, 0x0, 0x1, 0x0, 0x0, 0x0)
/nfs/users/nfs_j/jpm/src/go/src/github.com/brentp/goleft/indexcov/indexcov.go:417 +0x67d
github.com/brentp/goleft/indexcov.Main.func1(0xc42012a3c0, 0xc4202a5ae0, 0x2, 0x2, 0xc4201299f0, 0x2, 0x2, 0xc420129a00)
/nfs/users/nfs_j/jpm/src/go/src/github.com/brentp/goleft/indexcov/indexcov.go:351 +0xaa
created by github.com/brentp/goleft/indexcov.Main
/nfs/users/nfs_j/jpm/src/go/src/github.com/brentp/goleft/indexcov/indexcov.go:356 +0x3b9

crai -- panic: runtime error: index out of range

I am trying to run goleft indexcov on crai files but it doesn't seem to work. Has anyone else found the same problem?

jpm@farm3-head3> goleft_linux64 indexcov --sex "chrX,chrY" -d goleft/ --fai fasta/Homo_sapiens.GRCh38_full_analysis_set_plus_decoy_hla.fa.fai 21772_6.cram.crai
panic: runtime error: index out of range

goroutine 22 [running]:
github.com/brentp/goleft/indexcov/crai.ReadIndex(0xc80c80, 0xc420160580, 0xc420160580, 0x0, 0x0)
/home/brentp/go/src/github.com/brentp/goleft/indexcov/crai/crai.go:159 +0x5f5
github.com/brentp/goleft/indexcov.readIndex(0x7ffe764c7913, 0x11, 0x0, 0x1, 0x0, 0x0, 0x0)
/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:412 +0x5e1
github.com/brentp/goleft/indexcov.Main.func1(0xc4201203c0, 0xc42011fa40, 0x1, 0x1, 0xc4201240a8, 0x1, 0x1, 0xc42011fa50)
/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:351 +0xaa
created by github.com/brentp/goleft/indexcov.Main
/home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:356 +0x3b9

goleft indexcov error parsing header on hg38 BAMs

Brent;
Thanks for the new goleft version. I'm testing on some hg38 runs and getting an issue parsing the BAM reference headers:

bash run_header_problem.sh
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/biogo/hts/sam.equalRefs(0xc42028d5e0, 0xc420464bd0, 0xc420466260)
        /home/brentp/go/src/github.com/biogo/hts/sam/reference.go:292 +0x4ad
github.com/biogo/hts/sam.(*Header).AddReference(0xc420132c60, 0xc420464bd0, 0x0, 0x0)
        /home/brentp/go/src/github.com/biogo/hts/sam/header.go:400 +0xbf
github.com/biogo/hts/sam.(*Header).DecodeBinary(0xc420132c60, 0xc640c0, 0xc42015a200, 0x0, 0x0)
        /home/brentp/go/src/github.com/biogo/hts/sam/parse_header.go:72 +0x49b
github.com/biogo/hts/bam.NewReader(0xc64e80, 0xc420108098, 0x2, 0x0, 0x0, 0x110)
        /home/brentp/go/src/github.com/biogo/hts/bam/reader.go:50 +0x115
github.com/brentp/goleft/indexcov.Main()
        /home/brentp/go/src/github.com/brentp/goleft/indexcov/indexcov.go:245 +0x302
main.main()
        /home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:64 +0x191

Normally the naming on the HLA contigs is the issue, since the names contain both colons and asterisks, which can break different parsing assumptions:

@SQ     SN:HLA-A*01:01:01:01    LN:3503 AH:*

Here is a reproducible test case:

https://s3.amazonaws.com/chapmanb/testcases/goleft_indexcov_hg38.tar.gz

Thanks much for looking at this. I need to get a biogo development environment set up so I can provide fixes directly.
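
To illustrate the kind of parsing assumption that HLA names can break, here is a small Go sketch. It does not claim to be the cause of the panic above or to reflect how biogo parses headers. The `parseSQ` helper is hypothetical. It relies on the SAM convention that header fields are tab-separated `TAG:VALUE` pairs, and splits each field on its first colon only, so the colons inside the contig name are preserved.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSQ parses one @SQ header line into a tag -> value map.
// Each tab-separated field is split on its FIRST colon only, so values
// that themselves contain colons (such as HLA contig names) stay intact.
func parseSQ(line string) (map[string]string, error) {
	fields := strings.Split(line, "\t")
	if fields[0] != "@SQ" {
		return nil, fmt.Errorf("not an @SQ line: %q", line)
	}
	tags := make(map[string]string, len(fields)-1)
	for _, f := range fields[1:] {
		kv := strings.SplitN(f, ":", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed field %q", f)
		}
		tags[kv[0]] = kv[1]
	}
	return tags, nil
}

func main() {
	line := "@SQ\tSN:HLA-A*01:01:01:01\tLN:3503\tAH:*"
	tags, err := parseSQ(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(tags["SN"], tags["LN"], tags["AH"])
	// prints: HLA-A*01:01:01:01 3503 *
}
```

A parser that instead split every field on all colons would see five pieces for the `SN` field and either truncate the name to `HLA-A*01` or fail.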

Can it be used for NIPT samples?

I have some BAM files from sequenced maternal blood cfDNA (containing both maternal and fetal DNA). Can this method be used to predict fetal sex?

Differential header stringency depending on file format

I'm observing differential error checking depending on whether the input file is a CRAM or a BAM. The files in question have multiple sample names listed (a different one for each @RG line). When the input is a CRAM, no error is thrown. When the input is a BAM, I see: panic: bam reagroup: more than one RG for /build/test.bam

At the moment, it seems as if indexcov doesn't check CRAM headers? i.e. https://github.com/brentp/goleft/blob/master/indexcov/indexcov.go#L202-L231

I assume this error is thrown because the assumption is that there is a single sample for the whole file and there isn't handling of multiple samples. What is being reported when these problem CRAMs are provided? Stats for all the samples pooled together?
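
One way to check for this situation independently of the file format is to collect the distinct SM values from the header text, for example the output of `samtools view -H`. The Go sketch below is an illustration only. The `sampleNames` helper is hypothetical and is not how indexcov reads headers. It assumes tab-separated @RG fields, as the SAM format defines.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sampleNames returns the sorted, distinct SM values found on the @RG
// lines of a SAM header given as plain text.
func sampleNames(header string) []string {
	seen := map[string]bool{}
	for _, line := range strings.Split(header, "\n") {
		if !strings.HasPrefix(line, "@RG\t") {
			continue
		}
		for _, f := range strings.Split(line, "\t")[1:] {
			if strings.HasPrefix(f, "SM:") {
				seen[strings.TrimPrefix(f, "SM:")] = true
			}
		}
	}
	names := make([]string, 0, len(seen))
	for n := range seen {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Made-up header with three read groups covering two samples.
	header := "@HD\tVN:1.5\n@RG\tID:a\tSM:s1\n@RG\tID:b\tSM:s2\n@RG\tID:c\tSM:s1\n"
	names := sampleNames(header)
	if len(names) > 1 {
		fmt.Printf("warning: %d samples in one file: %v\n", len(names), names)
	}
	// prints: warning: 2 samples in one file: [s1 s2]
}
```

Running the same check on both BAM and CRAM headers would at least make the behaviour consistent, whether the chosen policy is to warn or to fail.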
