gabaldonlab / jloh

A tool to extract LOH blocks from VCF, BAM and FASTA data

Home Page: http://jloh.readthedocs.io

License: GNU General Public License v3.0

Python 78.37% Nextflow 12.55% Shell 0.60% R 6.55% Dockerfile 1.92%
heterozygosity bioinformatics genomics

jloh's People

Contributors

dfupa, matteoschiavinato

jloh's Issues

Error at calculate_chrom_snp_densities step

Hey,
I ran into this error and do not know how to proceed.

jloh extract --vcf /Users/emmannaemeka/Documents/cglabarata/cglbarata_analaysis/cglabara_analysis/ger_bam/jloh/b9_jlohcall_con.vcf --ref /Users/emmannaemeka/Documents/cglabarata/cglbarata_analaysis/cglabara_analysis/cbs138.fasta --bam /Users/emmannaemeka/Documents/cglabarata/cglbarata_analaysis/cglabara_analysis/ger_bam/b9.bam
[Thu Apr 13 15:33:29 2023] Preparing workspace...
[Thu Apr 13 15:33:29 2023] Running in default mode...
[Thu Apr 13 15:33:29 2023] Extracting heterozygous and homozygous SNPs...
[Thu Apr 13 15:33:29 2023] Found 3410 heterozygous SNPs and 59669 homozygous SNPs
[Thu Apr 13 15:33:29 2023] Getting hetero- and homozygous SNP density...
Traceback (most recent call last):
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 1464, in
main(args, tmp_bams)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 1450, in main
run_in_default_mode(args, tmp_bams)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 1164, in run_in_default_mode
hetero_div, homo_div = hetero_and_homo_snp_densities(Vcfs["hetero"][0], Vcfs["homo"][0], args.ref)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 390, in hetero_and_homo_snp_densities
Het_snp_densities = calculate_chrom_snp_densities(Hetero_lines, ref)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 372, in calculate_chrom_snp_densities
Snps_by_chrom[snp[0]].append(snp[1])
KeyError: 'chr_A'
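
A `KeyError` at `Snps_by_chrom[snp[0]]` would be consistent with the VCF containing a chromosome name (`chr_A`) that is not among the names the dictionary was built from, for example if the sequence names in the reference FASTA differ from those in the VCF. That cause is an assumption; the log does not confirm it. A minimal stdlib sketch for checking it follows. The helper names and the sample records are hypothetical and are not part of jloh.

```python
def fasta_sequence_names(fasta_lines):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def vcf_chromosomes(vcf_lines):
    """Collect the CHROM column of every non-header, non-empty VCF line."""
    chroms = set()
    for line in vcf_lines:
        if line.strip() and not line.startswith("#"):
            chroms.add(line.split("\t", 1)[0])
    return chroms


def chromosomes_missing_from_reference(vcf_lines, fasta_lines):
    """Return VCF chromosome names that have no matching FASTA header."""
    return sorted(vcf_chromosomes(vcf_lines) - fasta_sequence_names(fasta_lines))


# Hypothetical records, only to show the check in action.
fasta_lines = [">ChrA_example description", "ACGTACGT"]
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr_A\t100\t.\tA\tG\t50\tPASS\t.",
]

missing = chromosomes_missing_from_reference(vcf_lines, fasta_lines)
print(missing)
```

With real files, the same functions can be fed open file handles instead of lists, since both only iterate over lines. An empty result would rule this explanation out.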

Error running the jloh stats command

Hello @MatteoSchiavinato ,

Following a successful installation of the tool, I was able to do a test run with jloh sim --fasta S_para.chrXII.fa, which produced no errors.

However, the stats command failed. Please see the command and error below. Please advise.
Kind regards,

(base) abdurahman@Abdul-Rahmans-MacBook-Pro jloh % ./jloh stats --vcf test_data/out.ff.vcf      
[Fri Oct 13 12:52:22 2023] Reading SNPs
[Fri Oct 13 12:52:22 2023] found 19 het SNPs and 114 homo SNPs
[Fri Oct 13 12:52:22 2023] Reading chrom lengths from VCF header
[Fri Oct 13 12:52:22 2023] Read 1 chromosome names and their lengths
[Fri Oct 13 12:52:22 2023] Calculating heterozygous SNP densities
Traceback (most recent call last):
  File "/Users/abdurahman/jloh/src/stats", line 331, in <module>
    main(args)
  File "/Users/abdurahman/jloh/src/stats", line 263, in main
    Het, Het_quant = calculate_chrom_snp_densities(Hetero_lines, Chrom_lengths, args)
  File "/Users/abdurahman/jloh/src/stats", line 244, in calculate_chrom_snp_densities
    df_quant = df.quantile(q=[0.05, 0.10, 0.15, 0.50, 0.85, 0.90, 0.95])
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/frame.py", line 10927, in quantile
    res = data._mgr.quantile(qs=q, axis=1, interpolation=interpolation)
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 1587, in quantile
    blocks = [
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 1588, in <listcomp>
    blk.quantile(axis=axis, qs=qs, interpolation=interpolation)
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 1461, in quantile
    result = quantile_compat(self.values, np.asarray(qs._values), interpolation)
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/array_algos/quantile.py", line 37, in quantile_compat
    return quantile_with_mask(values, mask, fill_value, qs, interpolation)
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/array_algos/quantile.py", line 95, in quantile_with_mask
    result = _nanpercentile(
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/array_algos/quantile.py", line 216, in _nanpercentile
    return np.percentile(
  File "<__array_function__ internals>", line 180, in percentile
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4166, in percentile
    return _quantile_unchecked(
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4424, in _quantile_unchecked
    r, k = _ureduce(a,
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 3725, in _ureduce
    r = func(a, **kwargs)
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4593, in _quantile_ureduce_func
    result = _quantile(arr,
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4710, in _quantile
    result = _lerp(previous,
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4527, in _lerp
    diff_b_a = subtract(b, a)
TypeError: unsupported operand type(s) for -: 'str' and 'str'
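
The last frames show the quantile interpolation subtracting two values that are both `str`, which suggests the density column reached `df.quantile` as text rather than numbers. Why that happens inside jloh is not visible in the log. The failure itself can be reproduced without pandas; the sketch below uses a hypothetical `linear_quantile` helper and made-up density values, and is not the library's implementation.

```python
def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    # The subtraction below is the step that fails when values are strings.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


raw = ["0.5", "1.0", "1.5", "2.0"]  # made-up densities read as text

try:
    linear_quantile(raw, 0.5)
    failed = False
except TypeError:
    failed = True

# Converting to float before computing quantiles avoids the error.
densities = [float(v) for v in raw]
median = linear_quantile(densities, 0.5)
print(failed, median)
```

In pandas, the analogous conversion would be `pd.to_numeric` on the affected column before calling `quantile`.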

Bedtool error

Hi

I managed to get jloh to run, but this error came up. The error message is below.

/Users/emmannaemeka/Documents/bioiformatics_program/jloh/jloh extract --vcf b9_jlohcall_con.vcf --ref /Users/emmannaemeka/Documents/cglabarata/cbs138.fasta --bam /Users/emmannaemeka/Documents/cglabarata/cglbarata_analaysis/cglabara_analysis/ger_bam/b9.bam
[Mon Mar 27 20:31:58 2023] Preparing workspace...
[Mon Mar 27 20:31:58 2023] Running in default mode...
[Mon Mar 27 20:31:58 2023] Extracting heterozygous and homozygous SNPs...
[Mon Mar 27 20:31:59 2023] Found 3410 heterozygous SNPs and 59669 homozygous SNPs
[Mon Mar 27 20:31:59 2023] Getting hetero- and homozygous SNP density...
[Mon Mar 27 20:31:59 2023] Homozygous SNPs/kbp: 4.73
[Mon Mar 27 20:31:59 2023] Heterozygous SNPs/kbp: 0.31
[Mon Mar 27 20:31:59 2023] Creating a file with chromosome lengths...
[Mon Mar 27 20:31:59 2023] Done
[Mon Mar 27 20:31:59 2023] Creating temporary bam files by chromosome ...
[Mon Mar 27 20:32:52 2023] Done
[Mon Mar 27 20:32:52 2023] Clustering heterozygous and homozygous SNPs into blocks...
Traceback (most recent call last):
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 1436, in
main(args, tmp_bams)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 1423, in main
tmp = run_in_default_mode(args, tmp_bams)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 1151, in run_in_default_mode
Het_blocks, Homo_blocks_REF, Homo_blocks_ALT = snps_to_bed_blocks(args, Vcfs["hetero"][0], Vcfs["homo"][0], genome_file, args.min_snps_kbp)
File "/Users/emmannaemeka/Documents/bioiformatics_program/jloh/src/extract", line 525, in snps_to_bed_blocks
Het_bed_blocks = BedTool(Het_bed_blocks).merge(d=merge_len, c=4, o="sum")
File "/Users/emmannaemeka/mambaforge/lib/python3.9/site-packages/pybedtools/bedtool.py", line 923, in decorated
result = method(self, *args, **kwargs)
File "/Users/emmannaemeka/mambaforge/lib/python3.9/site-packages/pybedtools/bedtool.py", line 402, in wrapped
stream = call_bedtools(
File "/Users/emmannaemeka/mambaforge/lib/python3.9/site-packages/pybedtools/helpers.py", line 460, in call_bedtools
raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:

bedtools merge -o sum -i /var/folders/qr/x2s0dbd137d0pg3c4sfcfhhm0000gn/T/pybedtools.huqezd4e.tmp -d 99 -c 4

Error message was:
Error: Unable to open file /var/folders/qr/x2s0dbd137d0pg3c4sfcfhhm0000gn/T/pybedtools.huqezd4e.tmp. Exiting.

Errors running test dataset in docker

Hi!
I'm trying to run jloh using docker, but I am running into some errors when following instructions (https://jloh.readthedocs.io/en/latest/usage/run_test_data.html) for the test dataset provided.

For the "jloh stats" command, I get:

[Wed Jan 24 17:05:59 2024] Reading SNPs
[Wed Jan 24 17:05:59 2024] found 19 het SNPs and 114 homo SNPs
[Wed Jan 24 17:05:59 2024] Reading chrom lengths from VCF header
[Wed Jan 24 17:05:59 2024] Read 1 chromosome names and their lengths
[Wed Jan 24 17:05:59 2024] Calculating heterozygous SNP densities
Traceback (most recent call last):
  File "/root/src/jloh/src/stats", line 331, in <module>
    main(args)
  File "/root/src/jloh/src/stats", line 263, in main
    Het, Het_quant = calculate_chrom_snp_densities(Hetero_lines, Chrom_lengths, args)
  File "/root/src/jloh/src/stats", line 244, in calculate_chrom_snp_densities
    df_quant = df.quantile(q=[0.05, 0.10, 0.15, 0.50, 0.85, 0.90, 0.95])
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 11834, in quantile
    res = data._mgr.quantile(qs=q, interpolation=interpolation)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 1507, in quantile
    blocks = [
  File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 1508, in <listcomp>
    blk.quantile(qs=qs, interpolation=interpolation) for blk in self.blocks
  File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1587, in quantile
    result = quantile_compat(self.values, np.asarray(qs._values), interpolation)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/array_algos/quantile.py", line 39, in quantile_compat
    return quantile_with_mask(values, mask, fill_value, qs, interpolation)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/array_algos/quantile.py", line 97, in quantile_with_mask
    result = _nanpercentile(
  File "/usr/local/lib/python3.9/site-packages/pandas/core/array_algos/quantile.py", line 218, in _nanpercentile
    return np.percentile(
  File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4283, in percentile
    return _quantile_unchecked(
  File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4555, in _quantile_unchecked
    return _ureduce(a,
  File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3823, in _ureduce
    r = func(a, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4721, in _quantile_ureduce_func
    result = _quantile(arr,
  File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4840, in _quantile
    result = _lerp(previous,
  File "/usr/local/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4655, in _lerp
    diff_b_a = subtract(b, a)
TypeError: unsupported operand type(s) for -: 'str' and 'str'

For the "jloh extract", everything runs as expected.

And for the "jloh plot", I get:

INFO: Pandarallel will run on 12 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
[Wed Jan 24 17:07:47 2024] Reading input information
[Wed Jan 24 17:07:47 2024] Quantizing heterozygosity in windows of 10000 bp
Parsing rows: 100.0%
[Wed Jan 24 17:07:47 2024] Sorting by genome coordinate
[Wed Jan 24 17:07:47 2024] Quantizing intervals in windows of 10000 bp
Parsing rows: 100.0%
[Wed Jan 24 17:07:47 2024] Sorting by genome coordinate
[Wed Jan 24 17:07:48 2024] Writing table to output
[Wed Jan 24 17:07:48 2024] Plotting

Plot command that was run:
Rscript /root/src/jloh/src/scripts/loh-bin-plots_one-ref.Rscript jloh_out/plot.LOH_rate.tsv jloh_out/plots by_chromosome /input/jloh.LOH_blocks.tsv 0.35,2000,750,250 REF,ALT \#F7C35C,\#EF6F6C,\#64B6AC,\#ffffff no plot max

Error in library(reshape2) : there is no package called ‘reshape2’
Execution halted

Can you please advise?
Thanks in advance

Error running jloh extract

Hello, I'm getting numerous errors when running jloh on the test data.
version: 1.0.2
Command:
./jloh extract --vcf test_data/out.ff.vcf --bam test_data/out.fs.bam --ref test_data/S_para.chrXII.fa --threads 40

I have attached the full log:
jloh_error.log

Thanks in advance!

Error running the jloh stats command

Hello @MatteoSchiavinato ,

Thanks for resolving the previous issues.
Using the new version, I was able to successfully run the stats command /jloh stats --vcf out.ff.vcf on the test data.

I then tried the command on my own SNP data, called with GATK from C. albicans WGS data.

This gave the error below:

Traceback (most recent call last):
  File "/Users/abdurahman/JLOH_1/jloh/src/stats", line 313, in <module>
    main(args)
  File "/Users/abdurahman/JLOH_1/jloh/src/stats", line 256, in main
    Hetero_lines, Homo_lines = hetero_and_homo_snps(args.vcf)
  File "/Users/abdurahman/JLOH_1/jloh/src/stats", line 93, in hetero_and_homo_snps
    dict = { annotations[i]:values[i] for i in range(0, len(annotations)) }
  File "/Users/abdurahman/JLOH_1/jloh/src/stats", line 93, in <dictcomp>
    dict = { annotations[i]:values[i] for i in range(0, len(annotations)) }
IndexError: list index out of range

I tried with different samples and each gave the same error.
Any suggestions on this?

Kind regards,
Abdul-Rahman
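
The failing line pairs each FORMAT key with a sample value by index, so it raises `IndexError` whenever a sample column holds fewer colon-separated values than FORMAT has keys. The VCF specification allows trailing sample fields to be dropped, so this may be what the GATK records here look like; that is an assumption, since the VCF is not shown. A tolerant pairing could be sketched as follows (`parse_sample_field` is a hypothetical helper, not jloh's code, and the record is made up):

```python
def parse_sample_field(format_field, sample_field, missing="."):
    """Map FORMAT keys to sample values, padding dropped trailing fields."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # Pad with the VCF missing-value marker when the sample has fewer values.
    values += [missing] * (len(keys) - len(values))
    return dict(zip(keys, values))


# Made-up record: FORMAT declares five keys, the sample carries only three.
parsed = parse_sample_field("GT:AD:DP:GQ:PL", "0/1:12,9:21")
print(parsed)
```

Padding with `.` keeps every declared key present, so downstream lookups such as `parsed["GT"]` still work and absent fields can be detected explicitly.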

Error running Jloh from docker

Hello,
I was able to successfully pull the docker image using
docker pull cgenomics/jloh

However, when I run the test command docker run --rm -t -i cgenomics/jloh --help, I get the error below:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "--help": executable file not found in $PATH: unknown.

Please advise,
Regards,
Abdul-Rahman

Error running jloh plot

Hello, I am getting an error when running jloh plot on my dataset.
The command I used was:
./jloh plot --one-ref --loh jloh/jloh.LOH_blocks.tsv --het jloh/jloh.exp.het_blocks.bed --output-dir jloh/plot/

The error that I'm getting is:

[Tue Oct 31 15:32:29 2023] Quantizing heterozygosity in windows of 10000 bp
Parsing rows: 100.0%        
[Tue Oct 31 15:32:45 2023] Sorting by genome coordinate
[Tue Oct 31 15:32:46 2023] Quantizing intervals in windows of 10000 bp
Parsing rows: 100.0%        
[Tue Oct 31 15:33:41 2023] Sorting by genome coordinate
Traceback (most recent call last):
  File "/home/igib/jloh/jloh-1.0.2/src/plot", line 974, in <module>
    main(args)
  File "/home/igib/jloh/jloh-1.0.2/src/plot", line 964, in main
    run_oneref_mode(args)
  File "/home/igib/jloh/jloh-1.0.2/src/plot", line 905, in run_oneref_mode
    df = fill_missing_windows(df, "one_ref", args)
  File "/home/igib/jloh/jloh-1.0.2/src/plot", line 699, in fill_missing_windows
    df_new.columns = ["Sample", "Chromosome", "W_start", "W_end", "Het_pos", "Het_ratio", "LOH_pos", "LOH_ratio"]
  File "/home/igib/miniforge3/envs/jloh/lib/python3.7/site-packages/pandas/core/generic.py", line 5500, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__
  File "/home/igib/miniforge3/envs/jloh/lib/python3.7/site-packages/pandas/core/generic.py", line 766, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/home/igib/miniforge3/envs/jloh/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 216, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/home/igib/miniforge3/envs/jloh/lib/python3.7/site-packages/pandas/core/internals/base.py", line 58, in _validate_set_axis
    f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 0 elements, new values have 8 elements

Error with JLOH plot

Hi @MatteoSchiavinato,
I am trying to run jloh plot, but I am getting a coding error.
I am attaching the stderr:

INFO: Pandarallel will run on 48 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
[Fri Oct 13 15:25:50 2023] Reading input information
[Fri Oct 13 15:25:50 2023] Quantizing heterozygosity in windows of 10000 bp
Parsing rows: 100.0%        
[Fri Oct 13 15:26:28 2023] Sorting by genome coordinate
[Fri Oct 13 15:26:31 2023] Quantizing intervals in windows of 10000 bp
Parsing rows: 100.0%        
[Fri Oct 13 15:28:14 2023] Sorting by genome coordinate
[Fri Oct 13 15:28:48 2023] Writing table to output
[Fri Oct 13 15:28:49 2023] Plotting

Plot command that was run:
Rscript /gpfs/projects/bsc40/project/pipelines/JLOH/src/scripts/loh-bin-plots_one-ref.Rscript /gpfs/projects/bsc40/current/vdelolmo/JLOH/CANOR/JLOH_plot/plot.LOH_rate.tsv /gpfs/projects/bsc40/current/vdelolmo/JLOH/CANOR/JLOH_plot/plots by_chromosome MCO456.LOH_blocks.A.tsv,s1799.LOH_blocks.A.tsv,s423.LOH_blocks.A.tsv,s434.LOH_blocks.A.tsv,B8323.LOH_blocks.A.tsv,s425.LOH_blocks.A.tsv,s426.LOH_blocks.A.tsv,s427.LOH_blocks.A.tsv,s435.LOH_blocks.A.tsv,s436.LOH_blocks.A.tsv,s504.LOH_blocks.A.tsv,s831.LOH_blocks.A.tsv,B8274.LOH_blocks.A.tsv,s151.LOH_blocks.A.tsv,s1540.LOH_blocks.A.tsv,s1825.LOH_blocks.A.tsv,s185.LOH_blocks.A.tsv,s421.LOH_blocks.A.tsv,s422.LOH_blocks.A.tsv,s433.LOH_blocks.A.tsv,s437.LOH_blocks.A.tsv,s599.LOH_blocks.A.tsv,s282.LOH_blocks.A.tsv,s320.LOH_blocks.A.tsv,s424.LOH_blocks.A.tsv,s498.LOH_blocks.A.tsv,s748.LOH_blocks.A.tsv 0.6,2000,750,250 REF,ALT #F7C35C,#EF6F6C,#64B6AC,#ffffff no plot off

hash-2.2.6.2 provided by Decision Patterns

Error in if (loh_contrast == "off") { : 
  missing value where TRUE/FALSE needed
Execution halted
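
One detail in the printed command is that the colour list appears as `#F7C35C,...` without `\#` escapes. If the command is handed to a shell as a single string (an assumption; the log does not show how it is launched), a word starting with an unquoted `#` begins a comment, so the trailing `no plot off` arguments would never reach the R script and `loh_contrast` would be missing. The effect can be illustrated with Python's `shlex`, using a shortened, hypothetical command line:

```python
import shlex

# Shortened, hypothetical stand-ins for the printed plot command.
unescaped = "Rscript plot.Rscript in.tsv REF,ALT #F7C35C,#EF6F6C no plot off"
escaped = r"Rscript plot.Rscript in.tsv REF,ALT \#F7C35C,\#EF6F6C no plot off"

# comments=True makes shlex treat '#' as a comment character.
unescaped_args = shlex.split(unescaped, comments=True)
escaped_args = shlex.split(escaped, comments=True)

print(unescaped_args)  # everything from the first '#' onward is dropped
print(escaped_args)    # the colour list and trailing arguments survive

# Quoting each argument before joining keeps '#' literal.
colours = "#F7C35C,#EF6F6C"
safe_command = " ".join(
    shlex.quote(arg) for arg in ["Rscript", "plot.Rscript", colours, "off"]
)
print(safe_command)
```

Passing the arguments as a list to `subprocess.run` (without `shell=True`) sidesteps the question entirely, because no shell parses the `#`.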

Adding labels in jloh plot

Hello,
I was wondering if there was a way to add labels when plotting LOH blocks using jloh plot? I can successfully plot the LOH blocks but the y-axis label that should contain the sample name is being labelled as NA. Also, is there a way to display the genome size on the x-axis when using the --merge flag?

[Attached image: P1_132 merged loh]
