gabaldonlab / jloh
A tool to extract LOH blocks from VCF, BAM and FASTA data
Home Page: http://jloh.readthedocs.io
License: GNU General Public License v3.0
Hey,
I ran into this error and do not know how to proceed:
jloh extract --vcf /Users/emmannaemeka/Documents/cglabarata/cglbarata_analaysis/cglabara_analysis/ger_bam/jloh/b9_jlohcall_con.vcf --ref /Users/emmannaemeka/Documents/cglabarata/cglbarata_analaysis/cglabara_analysis/cbs138.fasta --bam /Users/emmannaemeka/Documents/cglabarata/cglbarata_analaysis/cglabara_analysis/ger_bam/b9.bam
[Thu Apr 13 15:33:29 2023] Preparing workspace...
[Thu Apr 13 15:33:29 2023] Running in default mode...
[Thu Apr 13 15:33:29 2023] Extracting heterozygous and homozygous SNPs...
[Thu Apr 13 15:33:29 2023] Found 3410 heterozygous SNPs and 59669 homozygous SNPs
[Thu Apr 13 15:33:29 2023] Getting hetero- and homozygous SNP density...
Traceback (most recent call last):
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 1464, in <module>
main(args, tmp_bams)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 1450, in main
run_in_default_mode(args, tmp_bams)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 1164, in run_in_default_mode
hetero_div, homo_div = hetero_and_homo_snp_densities(Vcfs["hetero"][0], Vcfs["homo"][0], args.ref)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 390, in hetero_and_homo_snp_densities
Het_snp_densities = calculate_chrom_snp_densities(Hetero_lines, ref)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 372, in calculate_chrom_snp_densities
Snps_by_chrom[snp[0]].append(snp[1])
KeyError: 'chr_A'
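A possible cause, judging only from the traceback: the VCF contains a chromosome name (`chr_A`) that is not among the names read from the `--ref` FASTA. The sketch below is not part of jloh. It is a minimal stand-alone check, with hypothetical helper names and toy data, that lists sequence names present in a VCF but absent from a reference.

```python
def fasta_names(lines):
    """Return sequence names from FASTA headers (text up to the first whitespace)."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def vcf_chroms(lines):
    """Return the CHROM values (first column) of all VCF record lines."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def missing_from_reference(vcf_lines, fasta_lines):
    """Chromosome names used in the VCF that the FASTA does not define."""
    return sorted(vcf_chroms(vcf_lines) - fasta_names(fasta_lines))


# Toy data standing in for the real files (hypothetical content).
fasta = [">chrA some description\n", "ACGT\n", ">chrB\n", "ACGT\n"]
vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr_A\t10\t.\tA\tG\n",
    "chrB\t20\t.\tC\tT\n",
]
missing = missing_from_reference(vcf, fasta)
print(missing)
```

With real data, both arguments can be open file handles, since the helpers only iterate over lines. An empty result would rule out this explanation.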
Hello @MatteoSchiavinato,
Following a successful installation of the tool, I did a test run with jloh sim --fasta S_para.chrXII.fa, which completed without errors.
However, the stats option failed. Please see the command and error below. Please advise.
Kind regards,
(base) abdurahman@Abdul-Rahmans-MacBook-Pro jloh % ./jloh stats --vcf test_data/out.ff.vcf
[Fri Oct 13 12:52:22 2023] Reading SNPs
[Fri Oct 13 12:52:22 2023] found 19 het SNPs and 114 homo SNPs
[Fri Oct 13 12:52:22 2023] Reading chrom lengths from VCF header
[Fri Oct 13 12:52:22 2023] Read 1 chromosome names and their lengths
[Fri Oct 13 12:52:22 2023] Calculating heterozygous SNP densities
Traceback (most recent call last):
File "/Users/abdurahman/jloh/src/stats", line 331, in <module>
main(args)
File "/Users/abdurahman/jloh/src/stats", line 263, in main
Het, Het_quant = calculate_chrom_snp_densities(Hetero_lines, Chrom_lengths, args)
File "/Users/abdurahman/jloh/src/stats", line 244, in calculate_chrom_snp_densities
df_quant = df.quantile(q=[0.05, 0.10, 0.15, 0.50, 0.85, 0.90, 0.95])
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/frame.py", line 10927, in quantile
res = data._mgr.quantile(qs=q, axis=1, interpolation=interpolation)
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 1587, in quantile
blocks = [
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 1588, in <listcomp>
blk.quantile(axis=axis, qs=qs, interpolation=interpolation)
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 1461, in quantile
result = quantile_compat(self.values, np.asarray(qs._values), interpolation)
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/array_algos/quantile.py", line 37, in quantile_compat
return quantile_with_mask(values, mask, fill_value, qs, interpolation)
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/array_algos/quantile.py", line 95, in quantile_with_mask
result = _nanpercentile(
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/array_algos/quantile.py", line 216, in _nanpercentile
return np.percentile(
File "<__array_function__ internals>", line 180, in percentile
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4166, in percentile
return _quantile_unchecked(
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4424, in _quantile_unchecked
r, k = _ureduce(a,
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 3725, in _ureduce
r = func(a, **kwargs)
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4593, in _quantile_ureduce_func
result = _quantile(arr,
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4710, in _quantile
result = _lerp(previous,
File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4527, in _lerp
diff_b_a = subtract(b, a)
TypeError: unsupported operand type(s) for -: 'str' and 'str'
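The last frame shows NumPy subtracting two strings while interpolating a quantile, which suggests the density values reached `df.quantile` as text rather than numbers. The following is a plain-Python sketch of that interpolation step (it is neither the pandas nor the jloh code, and `lerp_quantile` is a hypothetical helper) showing the failure and the effect of converting to `float` first.

```python
def lerp_quantile(values, q):
    """Quantile by linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    a, b = ordered[lo], ordered[hi]
    # The subtraction below is the step that fails when a and b are strings.
    return a + (b - a) * (pos - lo)


densities_as_text = ["0.5", "1.5", "2.5", "3.5"]

try:
    lerp_quantile(densities_as_text, 0.5)
    error_message = None
except TypeError as err:
    error_message = str(err)

# Converting to float before computing the quantile avoids the error.
densities = [float(v) for v in densities_as_text]
median = lerp_quantile(densities, 0.5)

print(error_message)
print(median)
```

If this is indeed the cause, casting the affected column to a numeric type before calling `quantile` would be the equivalent step in pandas.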
Hi,
I managed to get jloh to run, but this error came up. The error message is below.
/Users/emmannaemeka/Documents/bioiformatics_program/jloh/jloh extract --vcf b9_jlohcall_con.vcf --ref /Users/emmannaemeka/Documents/cglabarata/cbs138.fasta --bam /Users/emmannaemeka/Documents/cglabarata/cglbarata_analaysis/cglabara_analysis/ger_bam/b9.bam
[Mon Mar 27 20:31:58 2023] Preparing workspace...
[Mon Mar 27 20:31:58 2023] Running in default mode...
[Mon Mar 27 20:31:58 2023] Extracting heterozygous and homozygous SNPs...
[Mon Mar 27 20:31:59 2023] Found 3410 heterozygous SNPs and 59669 homozygous SNPs
[Mon Mar 27 20:31:59 2023] Getting hetero- and homozygous SNP density...
[Mon Mar 27 20:31:59 2023] Homozygous SNPs/kbp: 4.73
[Mon Mar 27 20:31:59 2023] Heterozygous SNPs/kbp: 0.31
[Mon Mar 27 20:31:59 2023] Creating a file with chromosome lengths...
[Mon Mar 27 20:31:59 2023] Done
[Mon Mar 27 20:31:59 2023] Creating temporary bam files by chromosome ...
[Mon Mar 27 20:32:52 2023] Done
[Mon Mar 27 20:32:52 2023] Clustering heterozygous and homozygous SNPs into blocks...
Traceback (most recent call last):
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 1436, in <module>
main(args, tmp_bams)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 1423, in main
tmp = run_in_default_mode(args, tmp_bams)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 1151, in run_in_default_mode
Het_blocks, Homo_blocks_REF, Homo_blocks_ALT = snps_to_bed_blocks(args, Vcfs["hetero"][0], Vcfs["homo"][0], genome_file, args.min_snps_kbp)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 525, in snps_to_bed_blocks
Het_bed_blocks = BedTool(Het_bed_blocks).merge(d=merge_len, c=4, o="sum")
File "/Users/emmannaemeka/mambaforge/lib/python3.9/site-packages/pybedtools/bedtool.py", line 923, in decorated
result = method(self, *args, **kwargs)
File "/Users/emmannaemeka/mambaforge/lib/python3.9/site-packages/pybedtools/bedtool.py", line 402, in wrapped
stream = call_bedtools(
File "/Users/emmannaemeka/mambaforge/lib/python3.9/site-packages/pybedtools/helpers.py", line 460, in call_bedtools
raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:
bedtools merge -o sum -i /var/folders/qr/x2s0dbd137d0pg3c4sfcfhhm0000gn/T/pybedtools.huqezd4e.tmp -d 99 -c 4
Error message was:
Error: Unable to open file /var/folders/qr/x2s0dbd137d0pg3c4sfcfhhm0000gn/T/pybedtools.huqezd4e.tmp. Exiting.
Hi!
I'm trying to run jloh using docker, but I am running into some errors when following the instructions (https://jloh.readthedocs.io/en/latest/usage/run_test_data.html) for the test dataset provided.
For the "jloh stats" command, I get:
[Wed Jan 24 17:05:59 2024] Reading SNPs
[Wed Jan 24 17:05:59 2024] found 19 het SNPs and 114 homo SNPs
[Wed Jan 24 17:05:59 2024] Reading chrom lengths from VCF header
[Wed Jan 24 17:05:59 2024] Read 1 chromosome names and their lengths
[Wed Jan 24 17:05:59 2024] Calculating heterozygous SNP densities
Traceback (most recent call last):
File "/root/src/jloh/src/stats", line 331, in <module>
main(args)
File "/root/src/jloh/src/stats", line 263, in main
Het, Het_quant = calculate_chrom_snp_densities(Hetero_lines, Chrom_lengths, args)
File "/root/src/jloh/src/stats", line 244, in calculate_chrom_snp_densities
df_quant = df.quantile(q=[0.05, 0.10, 0.15, 0.50, 0.85, 0.90, 0.95])
File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 11834, in quantile
res = data._mgr.quantile(qs=q, interpolation=interpolation)
File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 1507, in quantile
blocks = [
File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 1508, in <listcomp>
blk.quantile(qs=qs, interpolation=interpolation) for blk in self.blocks
File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1587, in quantile
result = quantile_compat(self.values, np.asarray(qs._values), interpolation)
File "/usr/local/lib/python3.9/site-packages/pandas/core/array_algos/quantile.py", line 39, in quantile_compat
return quantile_with_mask(values, mask, fill_value, qs, interpolation)
File "/usr/local/lib/python3.9/site-packages/pandas/core/array_algos/quantile.py", line 97, in quantile_with_mask
result = _nanpercentile(
File "/usr/local/lib/python3.9/site-packages/pandas/core/array_algos/quantile.py", line 218, in _nanpercentile
return np.percentile(
File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4283, in percentile
return _quantile_unchecked(
File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4555, in _quantile_unchecked
return _ureduce(a,
File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3823, in _ureduce
r = func(a, **kwargs)
File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4721, in _quantile_ureduce_func
result = _quantile(arr,
File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4840, in _quantile
result = _lerp(previous,
File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4655, in _lerp
diff_b_a = subtract(b, a)
TypeError: unsupported operand type(s) for -: 'str' and 'str'
For the "jloh extract", everything runs as expected.
And for the "jloh plot", I get:
INFO: Pandarallel will run on 12 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
[Wed Jan 24 17:07:47 2024] Reading input information
[Wed Jan 24 17:07:47 2024] Quantizing heterozygosity in windows of 10000 bp
Parsing rows: 100.0%
[Wed Jan 24 17:07:47 2024] Sorting by genome coordinate
[Wed Jan 24 17:07:47 2024] Quantizing intervals in windows of 10000 bp
Parsing rows: 100.0%
[Wed Jan 24 17:07:47 2024] Sorting by genome coordinate
[Wed Jan 24 17:07:48 2024] Writing table to output
[Wed Jan 24 17:07:48 2024] Plotting
Plot command that was run:
Rscript /root/src/jloh/src/scripts/loh-bin-plots_one-ref.Rscript jloh_out/plot.LOH_rate.tsv jloh_out/plots by_chromosome /input/jloh.LOH_blocks.tsv 0.35,2000,750,250 REF,ALT \#F7C35C,\#EF6F6C,\#64B6AC,\#ffffff no plot max
Error in library(reshape2) : there is no package called ‘reshape2’
Execution halted
Can you please advise?
Thanks in advance
Hello, I'm getting numerous errors when running jloh on the test data.
version: 1.0.2
Command:
./jloh extract --vcf test_data/out.ff.vcf --bam test_data/out.fs.bam --ref test_data/S_para.chrXII.fa --threads 40
I have attached the full log:
jloh_error.log
Thanks in advance!
Hello @MatteoSchiavinato ,
Thanks for resolving the previous issues.
Using the new version, I was able to successfully run the stats command /jloh stats --vcf out.ff.vcf on the test data.
I then tried the command on my own SNP data from C. albicans WGS, called with GATK.
This gave the error below:
Traceback (most recent call last):
File "/Users/abdurahman/JLOH_1/jloh/src/stats", line 313, in <module>
main(args)
File "/Users/abdurahman/JLOH_1/jloh/src/stats", line 256, in main
Hetero_lines, Homo_lines = hetero_and_homo_snps(args.vcf)
File "/Users/abdurahman/JLOH_1/jloh/src/stats", line 93, in hetero_and_homo_snps
dict = { annotations[i]:values[i] for i in range(0, len(annotations)) }
File "/Users/abdurahman/JLOH_1/jloh/src/stats", line 93, in <dictcomp>
dict = { annotations[i]:values[i] for i in range(0, len(annotations)) }
IndexError: list index out of range
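The failing line pairs FORMAT keys with sample values by index, so it raises `IndexError` whenever a sample column has fewer fields than the FORMAT column. The VCF specification allows trailing fields to be dropped (for example a bare `./.` for an uncalled genotype). Whether that is what happens in this GATK file is an assumption. A tolerant pairing could look like this sketch, where `pair_format_fields` is a hypothetical name and not a jloh function:

```python
def pair_format_fields(format_col, sample_col, missing="."):
    """Map FORMAT keys to sample values, padding dropped trailing fields."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    # Pad with the VCF missing-value marker when the sample column is shorter.
    values += [missing] * (len(keys) - len(values))
    return dict(zip(keys, values))


full = pair_format_fields("GT:AD:DP", "0/1:12,9:21")
short = pair_format_fields("GT:AD:DP:GQ:PL", "./.")
print(full)
print(short)
```

Printing the records where the two columns differ in length would confirm or rule out this explanation before changing anything.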
I tried with different samples and each gave the same error.
Any suggestions on this?
Kind regards,
Abdul-Rahman
Hello,
I was able to successfully pull the docker image using
docker pull cgenomics/jloh
However, when I run the test command docker run --rm -t -i cgenomics/jloh --help, I get the error below:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "--help": executable file not found in $PATH: unknown.
Please advise,
Regards,
Abdul-Rahman
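The daemon message says it tried to execute `--help` itself, which is what happens when an image defines no entrypoint and the first argument after the image name is taken as the program to run. Assuming the executable inside the image is called `jloh` and is on the container's `PATH` (not confirmed here), naming it explicitly before the flag may help. Sketched as a small Python helper with a hypothetical name:

```python
def docker_jloh_argv(*jloh_args, image="cgenomics/jloh", executable="jloh"):
    """Build a docker run command that names the program before its flags."""
    # `executable` is an assumption about the image layout, not a confirmed path.
    return ["docker", "run", "--rm", "-t", "-i", image, executable, *jloh_args]


argv = docker_jloh_argv("--help")
print(" ".join(argv))

# To actually run it (requires Docker and the pulled image):
#   import subprocess
#   subprocess.run(argv, check=True)
```

If `jloh` is not on the `PATH` inside the container, the same pattern applies with the full path to the executable.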
Hello, I am getting an error when running jloh plot on my dataset.
The command I used was:
./jloh plot --one-ref --loh jloh/jloh.LOH_blocks.tsv --het jloh/jloh.exp.het_blocks.bed --output-dir jloh/plot/
The error that I'm getting is:
[Tue Oct 31 15:32:29 2023] Quantizing heterozygosity in windows of 10000 bp
Parsing rows: 100.0%
[Tue Oct 31 15:32:45 2023] Sorting by genome coordinate
[Tue Oct 31 15:32:46 2023] Quantizing intervals in windows of 10000 bp
Parsing rows: 100.0%
[Tue Oct 31 15:33:41 2023] Sorting by genome coordinate
Traceback (most recent call last):
File "/home/igib/jloh/jloh-1.0.2/src/plot", line 974, in <module>
main(args)
File "/home/igib/jloh/jloh-1.0.2/src/plot", line 964, in main
run_oneref_mode(args)
File "/home/igib/jloh/jloh-1.0.2/src/plot", line 905, in run_oneref_mode
df = fill_missing_windows(df, "one_ref", args)
File "/home/igib/jloh/jloh-1.0.2/src/plot", line 699, in fill_missing_windows
df_new.columns = ["Sample", "Chromosome", "W_start", "W_end", "Het_pos", "Het_ratio", "LOH_pos", "LOH_ratio"]
File "/home/igib/miniforge3/envs/jloh/lib/python3.7/site-packages/pandas/core/generic.py", line 5500, in __setattr__
return object.__setattr__(self, name, value)
File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__
File "/home/igib/miniforge3/envs/jloh/lib/python3.7/site-packages/pandas/core/generic.py", line 766, in _set_axis
self._mgr.set_axis(axis, labels)
File "/home/igib/miniforge3/envs/jloh/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 216, in set_axis
self._validate_set_axis(axis, new_labels)
File "/home/igib/miniforge3/envs/jloh/lib/python3.7/site-packages/pandas/core/internals/base.py", line 58, in _validate_set_axis
f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 0 elements, new values have 8 elements
Hi @MatteoSchiavinato,
I am trying to run jloh plot, but I am getting an error.
The stderr is below:
INFO: Pandarallel will run on 48 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
[Fri Oct 13 15:25:50 2023] Reading input information
[Fri Oct 13 15:25:50 2023] Quantizing heterozygosity in windows of 10000 bp
Parsing rows: 100.0%
[Fri Oct 13 15:26:28 2023] Sorting by genome coordinate
[Fri Oct 13 15:26:31 2023] Quantizing intervals in windows of 10000 bp
Parsing rows: 100.0%
[Fri Oct 13 15:28:14 2023] Sorting by genome coordinate
[Fri Oct 13 15:28:48 2023] Writing table to output
[Fri Oct 13 15:28:49 2023] Plotting
Plot command that was run:
Rscript /gpfs/projects/bsc40/project/pipelines/JLOH/src/scripts/loh-bin-plots_one-ref.Rscript /gpfs/projects/bsc40/current/vdelolmo/JLOH/CANOR/JLOH_plot/plot.LOH_rate.tsv /gpfs/projects/bsc40/current/vdelolmo/JLOH/CANOR/JLOH_plot/plots by_chromosome MCO456.LOH_blocks.A.tsv,s1799.LOH_blocks.A.tsv,s423.LOH_blocks.A.tsv,s434.LOH_blocks.A.tsv,B8323.LOH_blocks.A.tsv,s425.LOH_blocks.A.tsv,s426.LOH_blocks.A.tsv,s427.LOH_blocks.A.tsv,s435.LOH_blocks.A.tsv,s436.LOH_blocks.A.tsv,s504.LOH_blocks.A.tsv,s831.LOH_blocks.A.tsv,B8274.LOH_blocks.A.tsv,s151.LOH_blocks.A.tsv,s1540.LOH_blocks.A.tsv,s1825.LOH_blocks.A.tsv,s185.LOH_blocks.A.tsv,s421.LOH_blocks.A.tsv,s422.LOH_blocks.A.tsv,s433.LOH_blocks.A.tsv,s437.LOH_blocks.A.tsv,s599.LOH_blocks.A.tsv,s282.LOH_blocks.A.tsv,s320.LOH_blocks.A.tsv,s424.LOH_blocks.A.tsv,s498.LOH_blocks.A.tsv,s748.LOH_blocks.A.tsv 0.6,2000,750,250 REF,ALT #F7C35C,#EF6F6C,#64B6AC,#ffffff no plot off
hash-2.2.6.2 provided by Decision Patterns
Error in if (loh_contrast == "off") { :
missing value where TRUE/FALSE needed
Execution halted
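One thing that stands out in the printed command is that the colour list starts with an unescaped `#`. If that string is handed to a shell, everything from `#F7C35C` onwards would be treated as a comment, the trailing `no plot off` arguments would never reach the R script, and `loh_contrast` would be `NA`, which matches the error. This is a guess from the log, not a confirmed diagnosis. The sketch below uses Python's `shlex` with `comments=True` as an approximation of shell comment handling, and shows how quoting each argument avoids the truncation. The script name and argument list are shortened stand-ins.

```python
import shlex

# Shortened stand-in for the arguments passed to the plotting script.
argv = ["Rscript", "plot.R", "REF,ALT", "#F7C35C,#EF6F6C", "no", "plot", "off"]

# Joining without quoting: the word starting with '#' begins a comment.
unquoted = " ".join(argv)
seen_unquoted = shlex.split(unquoted, comments=True)

# Quoting every argument keeps the colour list and the trailing arguments.
quoted = " ".join(shlex.quote(arg) for arg in argv)
seen_quoted = shlex.split(quoted, comments=True)

print(seen_unquoted)
print(quoted)
print(seen_quoted == argv)
```

Passing the argument list directly to `subprocess.run` without `shell=True` sidesteps the shell altogether, so no quoting is needed in that case.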
Hello,
I was wondering if there is a way to add labels when plotting LOH blocks using jloh plot? I can successfully plot the LOH blocks, but the y-axis label that should contain the sample name is shown as NA. Also, is there a way to display the genome size on the x-axis when using the --merge flag?