0xtcg / aldy

Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes

Home Page: http://aldy.csail.mit.edu

License: Other

Python 38.22% Cython 19.04% C 40.90% Jupyter Notebook 1.84%
sequencing pgrnseq illumina cyp2d6 adme genotype allele bioinformatics

aldy's People

Contributors: inumanag, lhon, twesigomwedavid

aldy's Issues

Issues calling long read whole genome sequencing

Hello,
I am trying to run Aldy to call CYP2D6 on a set of samples that were sequenced with PacBio long-read whole-genome sequencing at 8x coverage. The profiles currently available for Aldy are based on targeted long-read sequencing, which does not work for these samples because their coverage is low. I tried creating my own profile from one of the samples, but I get the error below:

/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/python/bin/python3.9 -m aldy profile --param sam_long_reads=true --genome hg38 123456.bam > long_read.profile
🐿  Aldy v4.4 (Python 3.9.15 on Linux 5.10.0-0.deb10.16-amd64-x86_64-with-glibc2.31)
   (c) 2016-2023 Aldy Authors. All rights reserved.
   Free for non-commercial/academic use only.
Ignoring utr3 in 5.001
Ignoring utr3 in 7.001
Ignoring utr3 in 8.001
Ignoring utr3 in 10.001
Ignoring utr3 in 19.001
Ignoring utr3 in 24.001
Ignoring utr3 in 24.002
Ignoring utr3 in 28.001
Ignoring utr3 in 28.002
Ignoring utr3 in 35.001
Ignoring utr3 in 35.002
Ignoring utr3 in 36.001
Ignoring utr3 in 37.001
Scanning chr1:59890307-109696745...
Scanning chr2:233583743-233776299...
Scanning chr4:69050374-69114287...
Scanning chr6:18125310-18161143...
Scanning chr7:977198-117718971...
/bin/bash: line 1: 16913 Killed                  /home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/python/bin/python3.9 -m aldy profile --param sam_long_reads=true --genome hg38 123456.bam > long_read.profile

Thanks,
Andrew

CYP3A4 and rs2740574 handling

Hello,

We have been evaluating Aldy 4.4 alongside other PGx callers and noticed that rs2740574 is handled a little strangely. We've seen two buckets of similar errors:

  1. Despite being a homozygous call, Aldy appears to treat the variant as heterozygous/assign both copies to one diplotype
  2. Or, the correct suballele call is not being made when rs2740574 is present

Aldy command used in both examples:

aldy genotype \
    --profile illumina \
    --gene cyp3a4 \
    --output test.default.cyp3a4.aldy \
    --genome hg19 \
    SAMPLE_1.bam \
    -v T \
    -l test.default.cyp3a4.aldylog

*1B/*1A, should be *1A/*1A

Full debug log: *1B/*1 test.default.cyp3a4.aldylog.txt

Aldy correctly infers the copy number solution

[cn] result= CNSol[0.00; sol=(2x*1); cn=22222222222222222222222222222] (provided)
Potential CYP3A4 gene structures for SAMPLE_1:
   1: 2x*1 (confidence: 100%)

And then correctly detects a copy number of nearly 2x for rs2740574

rs2740574    99382096.C>T    -390G>A    (cov= 344, cn= 1.9;

But in the final solution, instead of two *1A alleles, one *1A and one *1B are called:

    Minor: [*1.001] / [*1.002]
    Legacy notation: [*1B] / [*1A]

and both copies of rs2740574 are assigned to allele 1 (*1.002) instead of being split evenly.

*1.002/*(36.001 +rs2740574) should be *1.002, *36.002

Full debug log: sample_2.default.cyp3a4.aldylog.txt

Aldy correctly infers the cn solution:

[cn] result= CNSol[0.00; sol=(2x*1); cn=22222222222222222222222222222] (provided)
Potential CYP3A4 gene structures for SAMPLE_2:
   1: 2x*1 (confidence: 100%)

And correctly detects the rs2740574 cn

rs2740574    99382096.C>T    -390G>A    (cov= 401, cn= 2.0; 

But ultimately calls

    Minor: [*1.002] / [*36.001 +rs2740574]
    Legacy notation: [*1A] / [*36.001 +rs2740574]

*36.002 is in fact *36.001 + rs2740574?


I'm happy to provide subsetted and anonymized BAMs privately if it will help.

ERROR: gene= CYP1A2

Dear sir,
I had trouble executing Aldy 3.1 on the CYP1A2 gene.
The error does not appear when the BAM file is aligned to the GRCh37 human genome; it only appears with GRCh38.

hg38

Update databases to hg38 and PharmVar

Genotype Interpretation and Possible bugs

Hello,
I have several questions about Aldy genotypes and a possible bug report. I am using the latest version of Aldy with the default ILP solver. I have WGS samples that I am using to call CYP2D6.

I have a few genotype results that I need help interpreting.

  1. *2/*41+rs368858603 - what does the + rsID mean?
  2. *2/*68:2 - what does the ":2" mean?
  3. *4+*4/*4.021.ALDY - what does the ".ALDY" mean?
  4. *1+*94/*10 and *1+*2/*17 - For these last two my question is about the "*1+other" haplotype. *1 is generally defined as the lack of a variant/allele. In these two scenarios, is Aldy saying that one of the suballeles within *1 was detected plus *94 on the same haplotype?

I have a few samples where the stdout call from Aldy is *5/*5. However, the output .tsv file is blank. Why would this be the case? Example is shown below.

🐿  Aldy v4.3.1 (Python 3.9.15 on Linux 5.15.65+-x86_64-with-glibc2.31)
   (c) 2016-2022 Aldy Authors. All rights reserved.
   Free for non-commercial/academic use only.
Genotyping sample 1004207.bam...
Potential CYP2D6 gene structures for 1004207:
   1:  (confidence: 100%)
Potential major CYP2D6 star-alleles for 1004207:
   1:  (confidence: 100%)
Best CYP2D6 star-alleles for 1004207:
   1: *5 / *5 (confidence=100%)
      Minor alleles: 
CYP2D6 results:
  - *5 / *5
    Minor: [*5] / [*5]
    Legacy notation: [*5] / [*5]
Preparing debug archive...

I can provide a debug report for anything shown here if needed.

Multiple BAM files?

Is it possible to merge BAM files and have Aldy read them at the same time? Or is it necessary to read BAM files one by one? Thank you in advance.
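
One way to process several BAM files without merging them is to run Aldy once per file. The sketch below only builds the command lines; it assumes the `aldy genotype -p PROFILE -g GENE FILE` form quoted in other reports on this page. The helper name `build_commands`, the file names, and the `illumina` profile are placeholders, and the sketch says nothing about whether Aldy can read a merged BAM.

```python
import shlex
from itertools import product


def build_commands(bams, genes, profile="illumina"):
    """Return one `aldy genotype` argument list per (BAM, gene) pair."""
    return [
        ["aldy", "genotype", "-p", profile, "-g", gene, bam]
        for bam, gene in product(bams, genes)
    ]


if __name__ == "__main__":
    # Placeholder inputs; replace with real paths and gene names.
    for cmd in build_commands(["sample1.bam", "sample2.bam"], ["cyp2d6", "cyp3a4"]):
        print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to launch the runs one after another.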

Enquiry

I've read the Nature and bioRxiv papers and have the following questions:

  1. Could we use exome instead of genome sequencing data in BAM format?
  2. Output file will be in star-allele nomenclature (with major and minor alleles), correct?
  3. Will variant-specific guidelines be available in the output, or does one need to refer to the individual variant guidelines in CPIC?
  4. Is downloading and using the software free for research purpose?
    Thank you very much!

aldy test failed

Hi,
I ran aldy test and got the following failure:
(screenshot not preserved: Screen Shot 2565-11-07 at 11 14 25)

Environment:
(screenshot not preserved)

After digging into it, I think this part might be involved:
(screenshot not preserved)

Some gene cannot be accessed

Dear sir,
I tried aldy genotype -p illumina -g UGT1A1 my.bam
and got the following output:
🐿 Aldy v3.1 (Python 3.8.10 on Linux 5.4.0-121-generic-x86_64-with-glibc2.29)
(c) 2016-2022 Aldy Authors. All rights reserved.
Free for non-commercial/academic use only.
Genotyping sample my.bam...
ERROR: ugt1a1 cannot be accessed
The same message appears for the genes IFNL3, NAT2, UGT1A1, and VKORC1.

RecursionError: maximum recursion depth exceeded in comparison

Hi, I'm trying Aldy for genotyping PGx genes. On some runs, I encounter this error:
maximum recursion depth exceeded in comparison

Traceback (most recent call last):
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/aldy/__main__.py", line 117, in main
    _genotype(gene, output, args)
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/aldy/__main__.py", line 433, in _genotype
    run(None)
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/aldy/__main__.py", line 392, in run
    debug=debug,
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/aldy/genotype.py", line 185, in genotype
    debug=debug,
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/aldy/major.py", line 75, in estimate_major
    gene, alleles, coverage, cn_solution, solver, gap, identifier, debug
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/aldy/major.py", line 259, in solve_major_model
    for status, opt, sol in model.solutions(gap):
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/aldy/lpinterface.py", line 268, in solutions
    yield from self.solutions(gap, best_obj, limit, iteration + 1, init)
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/aldy/lpinterface.py", line 268, in solutions
    yield from self.solutions(gap, best_obj, limit, iteration + 1, init)
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/aldy/lpinterface.py", line 268, in solutions
    yield from self.solutions(gap, best_obj, limit, iteration + 1, init)
  [Previous line repeated 972 more times]
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/aldy/lpinterface.py", line 267, in solutions
    self.addConstr(self.quicksum(vv.values()) <= len(vv) - 1)
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/aldy/lpinterface.py", line 397, in quicksum
    return self.model.Sum(expr)
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/ortools/linear_solver/pywraplp.py", line 468, in Sum
    result = SumArray(expr_array)
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/ortools/linear_solver/linear_solver_natural_api.py", line 209, in __init__
    self.__array = [CastToLinExp(elem) for elem in array]
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/ortools/linear_solver/linear_solver_natural_api.py", line 209, in <listcomp>
    self.__array = [CastToLinExp(elem) for elem in array]
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/site-packages/ortools/linear_solver/linear_solver_natural_api.py", line 53, in CastToLinExp
    if isinstance(v, numbers.Number):
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/abc.py", line 190, in __instancecheck__
    subclass in cls._abc_negative_cache):
  File "/home/nguyen/anaconda3/envs/aldy/lib/python3.6/_weakrefset.py", line 75, in __contains__
    return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison

I found that the error comes from reaching the maximum recursion depth and that the limit can be raised with sys.setrecursionlimit(1500), but I have also heard that doing so is not safe. Can this be improved?
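
The traceback shows a generator that calls itself once per solution (`yield from self.solutions(gap, best_obj, limit, iteration + 1, init)`), so the stack grows with the number of solutions enumerated. A general alternative to raising the limit is to turn that recursion into a loop. The sketch below is not Aldy's code: `enumerate_solutions` and `make_toy_solver` are hypothetical, and the toy solver merely stands in for an ILP solve that excludes previously found solutions.

```python
def enumerate_solutions(solve, limit=10000):
    """Yield solutions one by one, excluding each found solution, without recursion."""
    excluded = set()
    for _ in range(limit):
        solution = solve(excluded)
        if solution is None:  # nothing left to enumerate
            return
        yield solution
        excluded.add(solution)  # plays the role of an added exclusion constraint


def make_toy_solver(n):
    """Hypothetical stand-in for a solver: returns the smallest non-excluded integer below n."""
    def solve(excluded):
        for candidate in range(n):
            if candidate not in excluded:
                return candidate
        return None
    return solve


if __name__ == "__main__":
    # 2000 solutions is more than Python's default recursion limit of 1000.
    print(len(list(enumerate_solutions(make_toy_solver(2000)))))
```

Because the loop keeps its state in local variables, the stack depth stays constant no matter how many solutions are produced.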

I also wonder about genotyping multiple genes at once. Is that possible?

interpreting aldy outputs

Hi,

We've been running aldy on our local cohort and are seeing that some alleles have the suffix .ALDY, e.g.:

cat example.aldy
# #Sample Gene    SolutionID      Major   Minor   Copy    Allele  Location        Type    Coverage        Effect  dbSNP   Code    Status
# #Solution 1: *10.002 -rs28371738, *10.004 -rs28371738 -rs28735595, *36.1001 -rs28735595
# example        CYP2D6  1       *10/*36.ALDY+*10        10.002;10.004;36.1001   0       10.002  42126610        C>G     -1      S486T   rs1135840
# ...

How shall we interpret those outputs? I.e. what's the difference between *10/*36.ALDY+*10 and *10/*36+*10?
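
To list which calls in a cohort carry the suffix, the output can be scanned programmatically. The sketch below assumes the file is tab-separated, that the `#Sample` line names the columns, and that other `#` lines (such as `#Solution 1: ...`) are comments; `read_aldy_rows` and `has_aldy_suffix` are hypothetical helpers. It only finds the affected rows and does not explain what the suffix means.

```python
HEADER_PREFIX = "#Sample"


def read_aldy_rows(lines):
    """Yield one dict per data row, keyed by the column names in the '#Sample' line."""
    columns = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith(HEADER_PREFIX):
            columns = line.lstrip("#").split("\t")
            continue
        if line.startswith("#") or columns is None:
            continue  # comment lines such as '#Solution 1: ...'
        # zip stops at the shorter sequence, so rows with fewer fields still parse.
        yield dict(zip(columns, line.split("\t")))


def has_aldy_suffix(major):
    """True if the Major call contains an allele marked with '.ALDY'."""
    return ".ALDY" in major


if __name__ == "__main__":
    with open("example.aldy") as handle:  # file name taken from the report above
        for row in read_aldy_rows(handle):
            if has_aldy_suffix(row["Major"]):
                print(row["Sample"], row["Gene"], row["Major"])
```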

Thanks

AttributeError("'NoneType' object has no attribute 'startswith'") for gene= UGT1A1

Hi there,

Aldy has been working fine for all the other genes; only when I try UGT1A1 do I get this error.

  Aldy v4.4 (Python 3.8.10 on Linux 5.15.0-1042-azure-x86_64-with-glibc2.29)
   (c) 2016-2023 Aldy Authors. All rights reserved.
   Free for non-commercial/academic use only.
Genotyping sample P000205.hc.vqsr.hs38.vcf.gz...
WARNING: Cannot detect genome, defaulting to hg19.
WARNING: Using VCF file. Copy-number calling is not available.
Using VCF sample P000205
ERROR: gene= UGT1A1, file= /dbfs/mnt/s3/s3/vcf/LKCGP-P000205-251422-02-04-07-G1/hs38/haplotypecaller/v4_2_5_0/P000205.hc.vqsr.hs38.vcf.gz
AttributeError("'NoneType' object has no attribute 'startswith'")
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/aldy/__main__.py", line 122, in main
    _genotype(args.gene, output, args)
  File "/usr/local/lib/python3.8/dist-packages/aldy/__main__.py", line 443, in _genotype
    run(None)
  File "/usr/local/lib/python3.8/dist-packages/aldy/__main__.py", line 393, in run
    _ = genotype(
  File "/usr/local/lib/python3.8/dist-packages/aldy/genotype.py", line 176, in genotype
    sample = sam.Sample(gene, profile, sam_path, debug=debug)
  File "/usr/local/lib/python3.8/dist-packages/aldy/sam.py", line 116, in __init__
    self._make_coverage(norm, muts)
  File "/usr/local/lib/python3.8/dist-packages/aldy/sam.py", line 482, in _make_coverage
    self.coverage = Coverage(
  File "/usr/local/lib/python3.8/dist-packages/aldy/coverage.py", line 50, in __init__
    if not (indel_coverage and op.startswith("ins")):
AttributeError: 'NoneType' object has no attribute 'startswith'

wondering if anyone else has run into this issue?

Alignment to PGx regions only

Hello,

Do you think it would be a good idea to subset the human reference genome file so that it includes only PGx regions plus some regions for normalization? Is this feasible? Will Aldy work with an approach like that?

I want to speed up the process using WGS data.

multiple-warn-level argument for genotype isn't being converted to int

When attempting to use a non-default value for multiple-warn-level, the following error is generated:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/aldy/__main__.py", line 120, in main
    _genotype(gene, output, args)
  File "/usr/local/lib/python3.6/dist-packages/aldy/__main__.py", line 450, in _genotype
    run(None)
  File "/usr/local/lib/python3.6/dist-packages/aldy/__main__.py", line 412, in run
    min_cov=args.min_coverage,
  File "/usr/local/lib/python3.6/dist-packages/aldy/genotype.py", line 211, in genotype
    if multiple_warn_level >= 3 and len(cn_sols) > 1:
TypeError: '>=' not supported between instances of 'str' and 'int'

It looks like an int cast is needed in the genotype call in __main__.py
I'd be happy to add this in a PR if you're open to external entities contributing to the project.
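
A minimal sketch of the kind of fix described, assuming the option is parsed with `argparse`: declaring `type=int` converts the value before it reaches the comparison. The parser below is a stand-alone illustration, not Aldy's actual parser, and the default of 1 is an assumption.

```python
import argparse


def build_parser():
    """Build a tiny illustrative parser with an integer-valued warning level."""
    parser = argparse.ArgumentParser(prog="genotype-demo")
    # type=int makes argparse convert the command-line string to an int.
    parser.add_argument("--multiple-warn-level", type=int, default=1)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--multiple-warn-level", "3"])
    # With an int, the comparison from the traceback no longer raises TypeError.
    print(args.multiple_warn_level >= 3)
```

An explicit `int(args.multiple_warn_level)` at the call site would have the same effect; `type=int` additionally rejects non-numeric input with a usage error.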

G6PD genotype

Hi,

I would like to report an error in the genotyping of G6PD. If I am not mistaken, G6PD is located on the X chromosome, and therefore we can have homozygous or heterozygous females, but males will only be hemizygous. I think Aldy interprets this gene as diploid in all cases, because when running it on patient exomes it does not distinguish between those with XX or XY.
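
The expectation described above can be written down as a small rule. This is a hypothetical helper, not part of Aldy, and it deliberately ignores pseudoautosomal regions and sex-chromosome aneuploidies other than what the karyotype string states.

```python
def expected_gene_copies(chromosome, sex_chromosomes):
    """Baseline copy number of a gene given its chromosome and a karyotype such as 'XX' or 'XY'."""
    chrom = chromosome[3:] if chromosome.startswith("chr") else chromosome
    chrom = chrom.upper()
    karyotype = sex_chromosomes.upper()
    if chrom == "X":
        return karyotype.count("X")  # 1 for XY (hemizygous), 2 for XX
    if chrom == "Y":
        return karyotype.count("Y")
    return 2  # autosomes


if __name__ == "__main__":
    print(expected_gene_copies("chrX", "XY"))  # G6PD in a male sample
    print(expected_gene_copies("chrX", "XX"))  # G6PD in a female sample
```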

Thanks

Question about low coverage sequencing samples

Hello,
We have a number of short-read whole-genome sequencing samples (~200) where Aldy is unable to assign a genotype due to low coverage. I was able to run Cyrius and PyPGx successfully on most of these samples, and most have some kind of structural variant as part of a tandem arrangement, e.g. *1/*68+*4 or *68/*68+4. Coverage across the full dataset is roughly 35x; however, these samples may have lower coverage than the rest. Is there a way to force Aldy to attempt a call on these samples, or would that result in too much uncertainty in the call? Attached is the debug file from one of the samples.

🐿  Aldy v4.5 (Python 3.9.15 on Linux 5.15.133+-x86_64-with-glibc2.31)
   (c) 2016-2023 Aldy Authors. All rights reserved.
   Free for non-commercial/academic use only.
Genotyping sample 1020075.bam...
Potential CYP2D6 gene structures for 1020075:
   1: 2x*68 (confidence: 100%)
Potential major CYP2D6 star-alleles for 1020075:
   1: 2x*68 & rs1135840 (confidence: 100%)
ERROR: gene= CYP2D6, profile= wgs, file= filtered_bams/1020075.bam
Aldy could not phase any major solution.
Possible solutions:
 - Check the coverage. Extremely low coverage prevents Aldy from calling star-alleles.
 - Run with --debug parameter and notify the authors of Aldy.
Preparing debug archive...

debug.info.tar.gz

Thanks,
Andrew

Add CES1 genes

Hi,

I would like to add the CES1 and CES2 genes for variant and CNV detection. Is it possible? Do I just need to create a new YAML file?

Issue on some runs

Hello,

I'm running Aldy on some samples I have and almost all of them worked flawlessly for all the gene profiles that are available. However, I have two samples that are both failing to run DPYD and they're receiving an identical error message:

Gurobi not found. Please install Gurobi and gurobipy Python package.
No module named 'gurobipy'
*** Aldy v1.2 (Python 3.7.0) ***
(c) 2017 SFU, MIT & IUB. All rights reserved.
Arguments:
  Gene:      DPYD
  Profile:   illumina
  Threshold: 50%
  Input:     /data_mount/sample.bam
  Output:    /data_output/DPYD.aldy
  Log:       /data_output/DPYD.aldy.log
  Phasing:   False
Gurobi not found. Please install Gurobi and gurobipy Python package.
No module named 'gurobipy'
Gurobi not found. Please install Gurobi and gurobipy Python package.
No module named 'gurobipy'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aldy/genotype.py", line 59, in genotype
    score, init_sol = protein.get_initial_solution(gene, sample, cn_sol, solver)
  File "/usr/local/lib/python3.7/site-packages/aldy/protein.py", line 44, in get_initial_solution
    structure
  File "/usr/local/lib/python3.7/site-packages/aldy/protein.py", line 235, in solve_ilp
    status, opt, solutions = c.solveAll(objective, dict(list(A.items()) + [((a, m), M[a][m]) for a in M for m in M[a]]))
  File "/usr/local/lib/python3.7/site-packages/aldy/lpinterface.py", line 157, in solveAll
    solutions = [tuple(sorted(y for y in x)) for x in solutions]
  File "/usr/local/lib/python3.7/site-packages/aldy/lpinterface.py", line 157, in <listcomp>
    solutions = [tuple(sorted(y for y in x)) for x in solutions]
TypeError: '<' not supported between instances of 'tuple' and 'str'

For setup, I'm running inside Docker using SCIP (I can send more if necessary). Thoughts?
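
The last frame fails because Python 3 refuses to order a `tuple` against a `str` inside `sorted`. A generic workaround is an explicit sort key that gives every element a comparable form. The sketch is illustrative only: what the mixed elements represent in Aldy 1.2 is not known from this report, and `sort_mixed` is a hypothetical helper.

```python
def sort_mixed(items):
    """Sort a collection that may mix str and tuple elements without raising TypeError."""
    # Group by type name first, then by repr, so only strings are ever compared.
    return tuple(sorted(items, key=lambda item: (type(item).__name__, repr(item))))


if __name__ == "__main__":
    print(sort_mixed([("a", 1), "b", ("a", 0), "a"]))
```

Ordering by `repr` is lexicographic, so it gives a stable, deterministic order rather than a numerically meaningful one.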

Update Aldy gene databases to latest PharmVar Version

Could you update Aldy's gene databases that are based on PharmVar? The current PharmVar database is version 5.1.6, while the ones in the current Aldy v3.3 are from 4.1.7. Also, a new gene, SLCO1B1, has been added to PharmVar. In Aldy v3.3, the SLCO1B1 allele definitions are based on PharmGKB. Could you also update the SLCO1B1 allele definitions to be based on PharmVar instead of PharmGKB?

-Best,

Reynold

Aldy test Assertion Error for the new year :)

I have a working version of Aldy 1.2 but wanted to try 3.0. When I tested version 3.0 with aldy test, I received 16 failed tests because of an assertion error in the test script test_full.py.
Changing the date in test_full.py from 2016-2020 to 2016-2021 resolved those errors.
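
One way to keep such a test from breaking every January is to derive the expected year range from the clock instead of hard-coding it. This is a sketch under the assumption that the banner's end year follows the current year, which the report suggests but does not confirm; `expected_year_range` is a hypothetical helper and 2016 is the start year quoted above.

```python
from datetime import date


def expected_year_range(first_year=2016, today=None):
    """Return the 'first-current' year range a banner test would expect."""
    current = (today or date.today()).year
    return f"{first_year}-{current}"


if __name__ == "__main__":
    print(expected_year_range())
```

A test would then compare the banner against `expected_year_range()` rather than a literal string.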

Using Aldy with nanopore data - error for some genes

Dear

Thanks for the great tool. We know that the combination of Aldy with Nanopore data is not fully supported yet, however, could you help me out with this error?

EDIT: I added the debug .tar file for CYP2A6.
DEBUG.tar.gz

I created a specific profile for our nanopore data by using

aldy profile /data/projects/pass_sorted.bam --genome hg38 > nanopore.profile

Next, I used the code to get the diplotypes for my data:

aldy genotype -p nanopore.profile --genome hg38 pass_sorted.bam

For some genes (e.g. COMT), Aldy runs successfully. For others, it gives an error ('IndexError: string index out of range'), as shown below. Sometimes a result is provided despite the error (e.g. CFTR), sometimes not (e.g. CYP2A6). Examples are given below. Any advice is much appreciated!

🐿 Aldy v4.4 (Python 3.7.12 on Linux 3.10.0-1062.12.1.el7.x86_64-x86_64-with-centos-7.7.1908-Core)
   (c) 2016-2023 Aldy Authors. All rights reserved.
   Free for non-commercial/academic use only.
Genotyping sample pass_sorted.bam...

Gene CFTR
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
[Previous 3 lines repeated 60 more times]
Potential CFTR gene structures for pass_sorted:
1: 2x*1 (confidence: 100%)
Potential major CFTR star-alleles for pass_sorted:
1: 2x*WT (confidence: 100%)
Best CFTR star-alleles for pass_sorted:
1: *WT / *WT (confidence=100%)
Minor alleles: *WT, *WT
CFTR results:

  • *WT / *WT
    Minor: [*WT] / [*WT]
    Legacy notation: [*WT] / [*WT]
    Estimated activity for *WT: unknown
    Estimated activity for *WT: unknown

Gene COMT
Potential COMT gene structures for pass_sorted:
1: 2x*1 (confidence: 100%)
Potential major COMT star-alleles for pass_sorted:
1: 1x*Met, 1x*ValA (confidence: 100%)
Best COMT star-alleles for pass_sorted:
1: *Met / *ValA (confidence=100%)
Minor alleles: *(Met +rs174699 +rs165599 +rs165728), *(ValA +rs2020917 +rs13306278 +rs737866 +rs737865 +rs737864 +rs5746849 +rs740603 +rs4646312 +rs2239393 +rs174699 +rs9332377 +rs165728)
COMT results:

  • *Met / *ValA
    Minor: [*Met +rs174699 +rs165599 +rs165728] / [*ValA +rs2020917 +rs13306278 +rs737866 +rs737865 +rs737864 +rs5746849 +rs740603 +rs4646312 +rs2239393 +rs174699 +rs9332377 +rs165728]
    Legacy notation: [*Met +rs174699 +rs165599 +rs165728] / [*ValA +rs2020917 +rs13306278 +rs737866 +rs737865 +rs737864 +rs5746849 +rs740603 +rs4646312 +rs2239393 +rs174699 +rs9332377 +rs165728]
    Estimated activity for *Met: unknown
    Estimated activity for *ValA: unknown

Gene CYP2A6
Ignoring utr3 in 5.001
Ignoring utr3 in 7.001
Ignoring utr3 in 8.001
Ignoring utr3 in 10.001
Ignoring utr3 in 19.001
Ignoring utr3 in 24.001
Ignoring utr3 in 24.002
Ignoring utr3 in 28.001
Ignoring utr3 in 28.002
Ignoring utr3 in 35.001
Ignoring utr3 in 35.002
Ignoring utr3 in 36.001
Ignoring utr3 in 37.001
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
[Previous 3 lines repeated 18 more times]
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
IndexError: string index out of range
Exception ignored in: 'aldy.indelpost.utilities.count_lowqual_non_ref_bases'
IndexError: string index out of range
ERROR: gene= all, file= pass_sorted.bam
IndexError('string index out of range')
Traceback (most recent call last):
File "/home/kdserran/miniconda3/envs/PGx/lib/python3.7/site-packages/aldy/main.py", line 122, in main
_genotype(args.gene, output, args)
File "/home/kdserran/miniconda3/envs/PGx/lib/python3.7/site-packages/aldy/main.py", line 443, in _genotype
run(None)
File "/home/kdserran/miniconda3/envs/PGx/lib/python3.7/site-packages/aldy/main.py", line 403, in run
**{k: v for k, v in params.items() if v is not None},
File "/home/kdserran/miniconda3/envs/PGx/lib/python3.7/site-packages/aldy/genotype.py", line 140, in genotype
**params,
File "/home/kdserran/miniconda3/envs/PGx/lib/python3.7/site-packages/aldy/genotype.py", line 186, in genotype
sample = sam.Sample(gene, profile, sam_path, reference, debug)
File "/home/kdserran/miniconda3/envs/PGx/lib/python3.7/site-packages/aldy/sam.py", line 111, in init
norm, muts = self._load_sam(path, reference, debug)
File "/home/kdserran/miniconda3/envs/PGx/lib/python3.7/site-packages/aldy/sam.py", line 162, in _load_sam
self._realign_indels(tmp, sam, reference)
File "/home/kdserran/miniconda3/envs/PGx/lib/python3.7/site-packages/aldy/sam.py", line 395, in _realign_indels
exact_match_for_shiftable=exact_match_for_shiftable,
File "aldy/indelpost/varaln.pyx", line 169, in aldy.indelpost.varaln.VariantAlignment.cinit
File "aldy/indelpost/varaln.pyx", line 205, in aldy.indelpost.varaln.VariantAlignment.__parse_pileup
File "aldy/indelpost/gappedaln.pyx", line 34, in aldy.indelpost.gappedaln.find_by_normalization
File "aldy/indelpost/gappedaln.pyx", line 130, in aldy.indelpost.gappedaln.is_target_by_normalization
File "aldy/indelpost/localn.pyx", line 123, in aldy.indelpost.localn.findall_mismatches
IndexError: string index out of range

Aldy profile Creation

I want to run Aldy on targeted sequencing data, so I am planning to create a profile from the BAM file I have (the same BAM file I will genotype). However, I don't know the copy-number-neutral region. Is there a way to find a copy-number-neutral region from a BAM file? Also, will Aldy make wrong calls if I run it with a profile created by aldy profile with the default (CYP2D8) copy-number-neutral region?

Docker container

Hi,

thanks for this useful tool!

In case you would consider also providing this as a docker container, please see my fork: https://github.com/ikmb/aldy

All you need to do is change the Docker Hub repo in the respective workflow files (.github/workflows) and add the relevant repo secrets (DOCKERHUB_USERNAME and DOCKERHUB_PASS), and you are basically good to go. It should then auto-build on commits to master and on any release you do.

If you do not allow others to build public Docker containers of your code, please let me know. I actually need that for my compute environment...

For testing, you can simply do:
docker pull ikmb/aldy:latest
or
singularity pull docker://ikmb/aldy:latest

Cheers,
Marc

Failed pytest after installation

Hello! I just installed aldy without any errors using 'pip install aldy', but when I do 'aldy test' a few of the tests failed. Is this expected?

image

ERROR: gene= ifnl3, profile= wgs, file= 1_HB00001.cram The average coverage of the sample is too low (1.3).

Hi there,

I have tried genotyping a CRAM file with Aldy, but I have run into an issue.

$ aldy genotype -p wgs -g ifnl3 1_HB00001.cram --reference GRCh38.p12.genome.fa -o test_1_HB00001-ifnl3.aldy
🐿 Aldy v4.2.1 (Python 3.9.13 on Linux 3.10.0-1062.4.1.el7.x86_64-x86_64-with-glibc2.17)
(c) 2016-2023 Aldy Authors. All rights reserved.
Free for non-commercial/academic use only.
Genotyping sample 1_HB00001.cram...
ERROR: gene= ifnl3, profile= wgs, file= 1_HB00001.cram
The average coverage of the sample is too low (1.3).

Is there any way to ignore the average coverage of the input file and just get the output when genotyping with Aldy?
Please help me.

Thank you

Incorrect label for CYP2E1*7.005?

I was looking through the allele definitions for CYP2E1 and I believe the label for one of the suballeles is incorrect. The label for CYP2E1*7.005 is the same as for CYP2E1*7.004, but the first SNP is not the same as for the *7A allele:

CYP2E1*7.005:
  label: CYP2E1*7A_1B
  mutations:
    - [4963, G>T, rs6413420, 5'UTR]
    - [15271, G>C, rs2070676]

Can someone double check this or explain it to me?

Error: The average coverage of the sample is too low

I have a set of 4 exomes and am running Aldy 4.4 on them like so:

aldy genotype -g cyp2e1 -p wxs ${bam_location}/*/${sample_name}.bam -o ${output}/${sample_name}.cyp2d6.aldy -l ${output}/${sample_name}.cyp2e1.aldylog

For 3 out of the 4 exomes I am getting an error that the average coverage of the sample is too low:

[main] arguments= subparser=genotype verbosity=INFO file=/home/ryan/NGS_Data/Exome_9-14-23/dragen_analysis/CYP-028/CYP-028.bam gene=cyp2e1 profile=wxs reference=None genome=None cn_neutral_region=None output=/home/ryan/NGS_Data/Exome_9-14-23/vcf/haplotype_aldy4_results_June_2023/CYP-028.CYP2E1.aldy solver=any debug=None cn=None log=/home/ryan/NGS_Data/Exome_9-14-23/vcf/haplotype_aldy4_results_June_2023/CYP-028.CYP2E1.aldylog multiple_warn_level=1 simple=False param=None
Genotyping sample CYP-028.bam...
WARNING: Copy-number calling is not available for exome data.
WARNING: Aldy will NOT be able to detect gene duplications, deletions and fusions.
WARNING: Calling of alleles that are defined by non-exonic mutations is not available. Results might not be biologically relevant!
[genotype] gene=cyp2e1; start=2023-09-27 18:52:17.572901
[lp] solver= cbc
[genotype] reference= hg19
[params] neutral_value=786.0; cn_parsimony=1.0; min_coverage=5.0
[sam] path= /home/ryan/NGS_Data/Exome_9-14-23/dragen_analysis/CYP-028/CYP-028.bam
[sam] Read SAM took 1.22s
[coverage] scale_ratio: 1.5
[sam] avg_coverage= 43.8x
ERROR: gene= cyp2e1, profile= wxs, file= /home/ryan/NGS_Data/Exome_9-14-23/dragen_analysis/CYP-028/CYP-028.bam
The average coverage of the sample is too low (1.9).

I have looked at the metrics for coverage for this particular sample and dragen reports 113X coverage over the target bed for the IDT exome capture probes. How should I resolve this? Thanks!

Edit: adding the debug info for one of these samples:
debuginfo.tar.gz

chr tag in bam header gives error when creating profile

As the title says, I think that Aldy needs chromosome names in the BAM header without the 'chr' prefix. At least that is the case for my files.

aldy profile out.bam > my_profile
*** Aldy v2.2.6 (Python 3.7.6, linux) ***
*** (c) 2016-2020 Aldy Authors & Indiana University Bloomington. All rights reserved.
*** Free for non-commercial/academic use only.
Generating profile for DPYD (1:97541297-98388616)
Cannot fetch gene DPYD (1:97541297-98388616)
Generating profile for CYP2C19 (10:96444999-96615001)
Cannot fetch gene CYP2C19 (10:96444999-96615001)
Generating profile for CYP2C9 (10:96690999-96754001)
Cannot fetch gene CYP2C9 (10:96690999-96754001)
Generating profile for CYP2C8 (10:96795999-96830001)
Cannot fetch gene CYP2C8 (10:96795999-96830001)
Generating profile for CYP4F2 (19:15618999-16009501)
Cannot fetch gene CYP4F2 (19:15618999-16009501)
Generating profile for CYP2A6 (19:41347499-41400001)
Cannot fetch gene CYP2A6 (19:41347499-41400001)
Generating profile for CYP2D6 (22:42518899-42553001)
Cannot fetch gene CYP2D6 (22:42518899-42553001)
Generating profile for TPMT (6:18126540-18157375)
Cannot fetch gene TPMT (6:18126540-18157375)
Generating profile for CYP3A5 (7:99244999-99278001)
Cannot fetch gene CYP3A5 (7:99244999-99278001)
Generating profile for CYP3A4 (7:99353999-99465001)
Cannot fetch gene CYP3A4 (7:99353999-99465001)

Thus, I transformed the header using something like

 samtools view -H input.bam > header.sam
 sed "s/chr//" header.sam > header_corrected.sam
 samtools reheader  header_corrected.sam input.bam > out.bam

Then I was able to run

samtools index out.bam
aldy profile out.bam > my_profile
*** Aldy v2.2.6 (Python 3.7.6, linux) ***
*** (c) 2016-2020 Aldy Authors & Indiana University Bloomington. All rights reserved.
*** Free for non-commercial/academic use only.
Generating profile for DPYD (1:97541297-98388616)
Generating profile for CYP2C19 (10:96444999-96615001)
Generating profile for CYP2C9 (10:96690999-96754001)
Generating profile for CYP2C8 (10:96795999-96830001)
Generating profile for CYP4F2 (19:15618999-16009501)
Generating profile for CYP2A6 (19:41347499-41400001)
Generating profile for CYP2D6 (22:42518899-42553001)
Generating profile for TPMT (6:18126540-18157375)
Generating profile for CYP3A5 (7:99244999-99278001)
Generating profile for CYP3A4 (7:99353999-99465001)
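
One caveat with the sed step above: `s/chr//` removes the first "chr" on every header line, so it can also alter unrelated lines (for example a file path recorded in an @PG line). A minimal Python sketch of a narrower rewrite that only touches the SN: field of @SQ lines is shown below. The function name is hypothetical and this is not part of Aldy; the result would still be passed to samtools reheader as in the post.

```python
def strip_chr_from_header(header_text: str) -> str:
    """Drop the 'chr' prefix from sequence names on @SQ lines only."""
    fixed_lines = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            # SAM header fields are tab-separated; rewrite only SN:chr* fields.
            fields = line.split("\t")
            fields = [
                "SN:" + field[len("SN:chr"):] if field.startswith("SN:chr") else field
                for field in fields
            ]
            line = "\t".join(fields)
        fixed_lines.append(line)
    return "\n".join(fixed_lines) + "\n"


if __name__ == "__main__":
    example = (
        "@HD\tVN:1.6\n"
        "@SQ\tSN:chr22\tLN:50818468\n"
        "@PG\tID:bwa\tCL:bwa mem /data/chr_refs/ref.fa\n"
    )
    print(strip_chr_from_header(example), end="")
```

Note that this only renames the sequences; names that differ by more than the prefix between conventions (such as the mitochondrial contig) are left for the user to handle.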

Aldy does not recognize [23936, A>G, rs7088784]

Hi!

When running Aldy, the tool does not recognize [23936, A>G, rs7088784], a variant characteristic for *3.002. I checked my input file manually and found 5 variants also present in PharmVar. However, Aldy only picks up 4 of those 5 and does not return [23936, A>G, rs7088784].
I checked the cyp2c19.yml file of the Aldy resources;
image
The variant is there, so that is not the problem.
Did someone maybe encounter the same?

Aldy test error

Dear sir,
I ran aldy test and it shows the following error:
image
It seems that a file is missing.
image

Command argument -n --cn-neutral-region not updating neutral region

Hello, hope all is well. I have noticed that the -n and --cn-neutral-region arguments do not update my neutral region. Is there a way I can provide an updated neutral region to the algorithm?

Example commands:

[ceisenhart@localhost foo]$ aldy genotype --cn-neutral-region chr1:10000-20000 -p illumina -g final.bam
🐿 Aldy v4.4 (Python 3.9.16 on Linux 5.14.0-284.25.1.el9_2.x86_64-x86_64-with-glibc2.34)
(c) 2016-2023 Aldy Authors. All rights reserved.
Free for non-commercial/academic use only.
Genotyping sample POC_LAB_1_S19_L00N_R1_001.true.final.bam...
ERROR: gene= cyp2d6, profile= illumina, file= POC_LAB_1_S19_L00N_R1_001.true.final.bam
CN-neutral region 22:42151472-42152258 has no reads. Double check your input file for CYP2D8 (are you using hg19?), or pass an alternative CN-neutral region via -n parameter.

[ceisenhart@localhost foo]$ aldy genotype -n chr1:10000-20000 -p illumina -g final.bam
🐿 Aldy v4.4 (Python 3.9.16 on Linux 5.14.0-284.25.1.el9_2.x86_64-x86_64-with-glibc2.34)
(c) 2016-2023 Aldy Authors. All rights reserved.
Free for non-commercial/academic use only.
Genotyping sample POC_LAB_1_S19_L00N_R1_001.true.final.bam...
ERROR: gene= cyp2d6, profile= illumina, file= POC_LAB_1_S19_L00N_R1_001.true.final.bam
CN-neutral region 22:42151472-42152258 has no reads. Double check your input file for CYP2D8 (are you using hg19?), or pass an alternative CN-neutral region via -n parameter.

I provide an updated copy number region (chr1:10000-20000) and the algorithm still looks for the internally defined region (22:42151472-42152258) then fails when it is not found.

PS: I've noticed similar behavior when trying to specify hg38 as the genome using --genome hg38. The algorithm still runs with hg19 as if I did not specify.

cn neutral region option does not work properly

Hi, I am trying to use Aldy with CRAM files.
When I try to use a different CN-neutral region, Aldy seems to fall back to the CYP2D8 region.
aldy genotype --genome hg38 -r GRCh38_full_analysis_set_plus_decoy_hla.fa -p wes -o example.wes_cram.out -l example.wes_cram.log example.cram -n chr7:55019016-55211628

First part of the log looks like this:
🐿 Aldy v4.4 (Python 3.6.5 on Linux 5.15.0-1031-aws-x86_64-with-debian-buster-sid)
(c) 2016-2023 Aldy Authors. All rights reserved.
Free for non-commercial/academic use only.
Genotyping sample example.cram...

Gene CFTR
Failed gene CFTR
Message: CN-neutral region 22:42151472-42152258 has no reads. Double check your input file for CYP2D8 (are you using hg19?), or pass an alternative CN-neutral region via -n parameter.

Any help on this will be appreciated.

Providing --phase parameter for Aldy

Hi, I am working with Aldy and need to provide the --phase parameter for more accurate calling of novel star-alleles.
I ran HapCUT2 following the related documentation and obtained both "haplotype_output_file" and "haplotype_output_file.phased.VCF".
But when I pass either of them as the phase parameter, I get the following errors, respectively:

ERROR: gene= CYP2D6, file= A2_3_11_s53_S48.recal.bam
ValueError("invalid literal for int() with base 10: 'POS'")

and

ERROR: gene= cyp2d6, profile= illumina, file= A2_3_11_s53_S48.recal.bam
Invalid phasing line 1 in haplotype_output_file.phased.VCF (less than 7 columns)

Can anybody tell me why this is happening and what I should do about it?

phase error

Hi,
When I try to run with --phase, I get the following error:
Error
Please advise on how to fix it.

Aldy: next steps

The following things are to be implemented for Aldy 2.0:

Improvements:

  • VCF output
    • Basic VCF support
    • Fusion VCF support
  • User-provided copy-number
  • WXS support (needs profiles)
  • Project 1 (@faridrashidi)
  • Confidence scores (note: hard)
    • Ad-hoc scores (should be easy)
  • Move to hg38
  • Explore free solvers (e.g. lpSolve)

Integration:

  • API
    • API notebook tutorial
  • Documentation
    • Fix LaTeX in Sphinx
    • Integrate with ReadTheDocs
    • Integrate into pip
  • Tests
    • Test gene reading
    • Test SAM reading
    • Test CN model on toy data
    • Test CN model on real data
    • Test major model on toy data
    • Test major model on real data
    • Test minor model on toy data
    • Test minor model on real data
    • Integrate with pip
    • Integrate with Travis
  • Repositories
    • Full pip
    • Bioconda

Misc:

  • Fix the license
  • Provide scripts for the database creation

Bug fixes:

  • #1
  • minor detection returns no solutions although there are some major solutions found by the previous stages (needs confirmation: might be fixed?)

Multiple Bam Files Into a Single Output File?

Is there a way to run multiple BAM files in Aldy and have the results from each BAM file contained in a single output file? When I try running each BAM file at a time to the same output file, it overwrites the data from the previous one.
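
One possible workaround, sketched here in Python, is to write one output file per BAM and concatenate them afterwards. The flags (-p, -g, -o) are the ones used elsewhere on this page; the helper names and the output naming scheme are hypothetical, and the merge is a plain concatenation, so any header lines in the individual files will repeat.

```python
import subprocess
from pathlib import Path


def build_command(bam, gene, profile, out_dir):
    """Return the aldy command line and a per-sample output path for one BAM."""
    sample = Path(bam).stem
    out_path = Path(out_dir) / f"{sample}.{gene}.aldy"
    cmd = ["aldy", "genotype", "-p", profile, "-g", gene, str(bam), "-o", str(out_path)]
    return cmd, out_path


def merge_outputs(paths, merged_path):
    """Concatenate per-sample result files into one file, in the given order."""
    with open(merged_path, "w") as merged:
        for path in paths:
            text = Path(path).read_text()
            merged.write(text)
            # Keep files separated even if one lacks a trailing newline.
            if text and not text.endswith("\n"):
                merged.write("\n")


def genotype_all(bams, gene, profile, out_dir, merged_path):
    """Run aldy once per BAM (requires aldy on PATH), then merge the outputs."""
    outputs = []
    for bam in bams:
        cmd, out_path = build_command(bam, gene, profile, out_dir)
        subprocess.run(cmd, check=True)  # stops at the first failing sample
        outputs.append(out_path)
    merge_outputs(outputs, merged_path)
```

Writing to separate files first avoids the overwrite described above and keeps each sample's result available if one run fails.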

ERROR: truncated file cannot be accessed

Hi there,

I've tried genotyping my sample using Aldy, but I ran into the issue below.

$ aldy genotype -p wgs -g cftr 1_HB00001.cram -o test_1_HB00001-cftr.aldy
🐿 Aldy v4.2.1 (Python 3.9.13 on Linux 3.10.0-1062.4.1.el7.x86_64-x86_64-with-glibc2.17)
(c) 2016-2023 Aldy Authors. All rights reserved.
Free for non-commercial/academic use only.
Genotyping sample 1_HB00001.cram...
ERROR: truncated file cannot be accessed

I cannot find a solution for this error message.
Could you help me to handle this error?

Thank you.

ERROR: gene= cyp2d6, file= S000029_S4842Nr1.bam KeyError('hg38')

Hi there,

I've generated my own profile using:
aldy profile S000029_S4842Nr1.bam > panel.profile

But when I attempt to run this with:
aldy genotype -p panel.profile -g cyp2d6 S000029_S4842Nr1.bam

It returns:
🐿 Aldy v4.4 (Python 3.10.6 on Linux 5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.35)
(c) 2016-2023 Aldy Authors. All rights reserved.
Free for non-commercial/academic use only.
Genotyping sample S000029_S4842Nr1.bam...
ERROR: gene= cyp2d6, file= S000029_S4842Nr1.bam
KeyError('hg38')
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/aldy/main.py", line 122, in main
_genotype(args.gene, output, args)
File "/usr/local/lib/python3.10/dist-packages/aldy/main.py", line 443, in _genotype
run(None)
File "/usr/local/lib/python3.10/dist-packages/aldy/main.py", line 393, in run
_ = genotype(
File "/usr/local/lib/python3.10/dist-packages/aldy/genotype.py", line 183, in genotype
profile = Profile.load(gene, profile_name, cn_region, **params)
File "/usr/local/lib/python3.10/dist-packages/aldy/profile.py", line 262, in load
GRange(*prof["neutral"][gene.genome]),
KeyError: 'hg38'

Just wondering if this is an error on my side and if it can be rectified?

Thank you.
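
The last frame of the traceback indexes prof["neutral"][gene.genome], which suggests the generated profile has no entry for hg38 under its "neutral" key. A minimal sketch of a pre-check is shown below. It assumes the profile has already been parsed into a Python dict; the function name and the example values are hypothetical and are not Aldy's API.

```python
def neutral_region_for(profile: dict, genome: str):
    """Return the CN-neutral region stored for `genome`, with a readable error if absent."""
    neutral = profile.get("neutral", {})
    if genome not in neutral:
        available = ", ".join(sorted(neutral)) or "none"
        raise ValueError(
            f"profile has no CN-neutral region for {genome} (available: {available})"
        )
    return neutral[genome]


if __name__ == "__main__":
    # Hypothetical profile content; the region is the one quoted in Aldy's
    # "CN-neutral region 22:42151472-42152258 has no reads" message.
    example_profile = {"neutral": {"hg19": ["22", 42151472, 42152258]}}
    print(neutral_region_for(example_profile, "hg19"))
    try:
        neutral_region_for(example_profile, "hg38")
    except ValueError as err:
        print(err)
```

A check like this only makes the mismatch between the profile's genome build and the genotyping run visible; it does not decide which build is the right one for the sample.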

Import Error for Conftest

Hi,

May I ask how to fix the "ImportError while loading conftest" issue? I have tried installing all of Aldy's components in a separate Anaconda environment, but it keeps giving me this error. How do I fix this?

Problem in generating profile on our exome samples

Hi,

I have exome samples prepared with an Illumina exome kit, and I ran into a problem generating a profile; I tried a few different samples. Running the bundled test cases works fine.

-bash-4.2$ aldy profile /home/ramsar1971/project/data/RAP028/RAP028-FAT.sorted.markdup.realign.recal.bam > aldy.asd.profile
*** Aldy v2.2.3 (Python 3.6.4, linux) ***
*** (c) 2016-2019 Aldy Authors & Indiana University Bloomington. All rights reserved.
*** Free for non-commercial/academic use only.
Generating profile for DPYD (1:97541297-98388616)
Cannot fetch gene DPYD (1:97541297-98388616)
Generating profile for CYP2C19 (10:96444999-96615001)
Cannot fetch gene CYP2C19 (10:96444999-96615001)
Generating profile for CYP2C9 (10:96690999-96754001)
Cannot fetch gene CYP2C9 (10:96690999-96754001)
Generating profile for CYP2C8 (10:96795999-96830001)
Cannot fetch gene CYP2C8 (10:96795999-96830001)
Generating profile for CYP4F2 (19:15618999-16009501)
Cannot fetch gene CYP4F2 (19:15618999-16009501)
Generating profile for CYP2A6 (19:41347499-41400001)
Cannot fetch gene CYP2A6 (19:41347499-41400001)
Generating profile for CYP2D6 (22:42518899-42553001)
Cannot fetch gene CYP2D6 (22:42518899-42553001)
Generating profile for TPMT (6:18126540-18157375)
Cannot fetch gene TPMT (6:18126540-18157375)
Generating profile for CYP3A5 (7:99244999-99278001)
Cannot fetch gene CYP3A5 (7:99244999-99278001)
Generating profile for CYP3A4 (7:99353999-99465001)
Cannot fetch gene CYP3A4 (7:99353999-99465001)

Do you have any advice?

Thanks!

Regards,
Mullin

Copy number neutral region

Hi,

I tried Aldy using 'vdr' gene region as -n instead of CYP2D8 default region. Below is the command I ran.

$aldy genotype -p illumina -g cyp2d6 -n 12:47811535-47945000 -o $out $bam

My BAM files are aligned to GRCh38. Aldy works well when I use the default CYP2D8 as the CN-neutral region. But for the VDR gene (I also tried EGFR; both are used as CN-neutral regions in the Stargazer tool) I get the error 'coverage for CYP2D6 is too low for copy number calling', even though my BAM files have ~30X coverage. Any comments on why I am getting this error?

Thanks
Best
Sumudu

IndexError: string index out of range for Nanopore data

I am trying to genotype the CYP2D6 gene from Nanopore data (targeted sequencing), but Aldy keeps giving me this error:

Traceback (most recent call last): ... /aldy/sam.py", line 389, in _realign_indels
valn = VariantAlignment( # type: ignore
IndexError: string index out of range
IndexError: string index out of range

I tried everything but couldn't fix it.

Would you please help me?

Thank you

ALDY's detection of insertion and deletion variants

Hello,

I am currently using ALDY for genotyping my samples, and I have come across an issue regarding the detection of insertion and deletion variants in the samples. It seems that ALDY is unable to identify variants with insertions or deletions, which leads to incorrect star allele assignments.

Examples of the problem are as follows:
(1) In the case of CYP2C9*6, the variant rs9332131 is a deletion of 'A'. However, ALDY fails to detect this deletion and wrongly assigns *6 as *1.
(rs9332131 recorded in cyp2c9.yml as [16126, delA, rs9332131, K273fs])

(2) For CYP3A5*7.001, one of its variants (rs41303343) contains an insertion 'T' that ALDY cannot detect, resulting in misidentification of *7 as 1.002.

(3) UGT1A1 *28 , UGT1A1 *36 , and UGT1A1 *37 all have rs3064744 with insertions 'TA' or deletions 'TA,' and ALDY is unable to detect these, failing to identify the correct alleles.

(4) Additionally, in the case of CYP2C19 *39, one of the ten variants is 4193delT, causing ALDY to produce a result of (CYP2C19 *39.001 - rs17880036).

Is there any specific configuration or setting that needs to be adjusted in ALDY to enable the correct detection of insertion and deletion variants? Your guidance and support would be highly appreciated.

aldy genotype -p illumina --gene cyp2c9 --genome hg38 my.vcf -o my.cyp2c9.aldy

Thank you for your attention and support.

'utf-8' codec can't decode byte 0x8b

When I run the following command :

python -m aldy genotype -p 1813PG.bam -g CYP2D6 1856PG.bam

I get the following error

'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Traceback (most recent call last):
  File "/mydata/miniconda/envs/gatk/lib/python3.6/site-packages/aldy/__main__.py", line 116, in main
    _genotype(gene, output, args)
  File "/mydata/miniconda/envs/gatk/lib/python3.6/site-packages/aldy/__main__.py", line 434, in _genotype
    run(None)
  File "/mydata/miniconda/envs/gatk/lib/python3.6/site-packages/aldy/__main__.py", line 393, in run
    debug=debug,
  File "/mydata/miniconda/envs/gatk/lib/python3.6/site-packages/aldy/genotype.py", line 125, in genotype
    debug=debug,
  File "/mydata/miniconda/envs/gatk/lib/python3.6/site-packages/aldy/sam.py", line 98, in __init__
    self.detect_cn(gene, profile, cn_region)
  File "/mydata/miniconda/envs/gatk/lib/python3.6/site-packages/aldy/sam.py", line 512, in detect_cn
    prof = self._load_profile(profile)
  File "/mydata/miniconda/envs/gatk/lib/python3.6/site-packages/aldy/sam.py", line 571, in _load_profile
    for line in f:
  File "/mydata/miniconda/envs/gatk/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I am investigating...
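
One likely reading of this error: 0x1f 0x8b are the gzip magic bytes, and BAM files are BGZF (gzip-compatible) compressed, so byte 0x8b at position 1 is what a BAM looks like when opened as text. The command above passes 1813PG.bam to -p, and the traceback shows _load_profile iterating over it line by line. A small sketch of a guard that flags this before decoding is shown below; the function names are hypothetical and not part of Aldy.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip/BGZF stream


def looks_gzip_compressed(first_bytes: bytes) -> bool:
    """True if the given leading bytes start with the gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC


def check_text_profile(path: str) -> None:
    """Raise a readable error if `path` is compressed rather than plain text."""
    with open(path, "rb") as handle:
        if looks_gzip_compressed(handle.read(2)):
            raise ValueError(
                f"{path} looks gzip/BGZF-compressed (e.g. a BAM file), "
                "not a plain-text profile"
            )
```

If that is the cause, the profile passed to -p would need to be a text profile (for example one produced by aldy profile, as in other posts on this page) rather than the BAM itself.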

Aldy v4.4 incorrectly calls GeT-RM sample NA21781 that was correctly called by Aldy v2.2.6

Hello there,

While testing Aldy v4.4 for CYP2D6 calling on samples from GeT-RM, I encountered something unexpected. In a publication that used Aldy v2.2.6, sample NA21781 is correctly genotyped as *2x2/*68+*4 (supplementary materials); however, Aldy v4.4 calls this sample as *4/*63+*65.

The BAM file for this sample was downloaded from the ENA website and used as is.

Aldy was run as:

aldy genotype -p wgs -g cyp2d6 path_to_file.bam

and the output was:

🐿  Aldy v4.4 (Python 3.10.6 on Linux 5.15.0-58-generic-x86_64-with-glibc2.35)
   (c) 2016-2023 Aldy Authors. All rights reserved.
   Free for non-commercial/academic use only.
Genotyping sample NA21781.bam...
Potential CYP2D6 gene structures for NA21781:
   1: 2x*1,1x*141.1001 (confidence: 100%)
   2: 2x*1,1x*61 (confidence: 100%)
   3: 2x*1,1x*63 (confidence: 100%)
Potential major CYP2D6 star-alleles for NA21781:
   1: 1x*2, 1x*4.021.ALDY, 1x*65 (confidence: 100%)
   2: 1x*4.021, 1x*63, 1x*65 (confidence: 100%)
Best CYP2D6 star-alleles for NA21781:
   1: *4.021 / *63 + *65 (confidence=100%)
      Minor alleles: *4.021, *63.001, *(65.001 +rs28371701 +rs28735595)
CYP2D6 results:
  - *4.021 / *63 + *65
    Minor: [*4.021] / [*63.001] + [*65.001 +rs28371701 +rs28735595]
    Legacy notation: [*4.021] / [*63] + [*65 +rs28371701 +rs28735595]
    Estimated activity for *65: uncertain function (evidence: L); see https://www.pharmvar.org/haplotype/187 for details
    Estimated activity for *4.021: no function (evidence: D); see https://www.pharmvar.org/haplotype/652 for details
    Estimated activity for *63: unknown

Could this be due to an error on my end?

Thank you.
