wglab / intervar

A bioinformatics software tool for clinical interpretation of genetic variants by the 2015 ACMG-AMP guideline

Python 100.00%

intervar's Introduction

InterVar

A bioinformatics software tool for clinical interpretation of genetic variants by the ACMG-AMP 2015 guidelines

SYNOPSIS

Intervar.py [options]

WHAT DOES IT DO

InterVar is a Python script for interpreting the clinical significance of genetic variants.

PREREQUISITE

You need to install Python >= 2.6.6.
You need to install ANNOVAR version >= 2016-02-01.
You need to download other files, such as mim2gene.txt, from OMIM.
Please use up-to-date OMIM files (generated 2016-09 or later); outdated files will cause problems in InterVar.

OPTIONS

-h, --help
show this help message and exit
--version
show program's version number and exit
--config=config.ini Load your config file. The config file contains all options.

If you use this option, you can ignore all the other options below.

-i INPUTFILE, --input=INPUTFILE
input file of variants for analysis
--input_type=AVinput The input file type; it can be AVinput (ANNOVAR's format) or VCF
-o OUTPUTFILE, --output=OUTPUTFILE
prefix of the output file (default: output)
-b BUILDVER, --buildver=BUILDVER
version of reference genome: hg18, hg19(default)
-t intervardb, --database_intervar=intervardb The database location/dir for the InterVar dataset files
-s your_evidence_file, --evidence_file=your_evidence_file

This option is for a user-specified evidence file for each variant.

How to add your own evidence for each variant:

Prepare your own evidence file as tab-delimited text, one variant per line, in the format shown below.

(The codes for additional evidence should be: PS5/PM7/PP6/BS5/BP8;

The format for upgrading/downgrading a criterion should be like:

grade_PS1=2;           1 for Strong; 2 for Moderate; 3 for Supporting)

 Chr Pos Ref_allele Alt_allele  evidence_list

 1 123456 A G PM1=1;BS2=1;BP3=0;PS5=1;grade_PM1=1

--table_annovar=./table_annovar.pl The Annovar perl script of table_annovar.pl
--convert2annovar=./convert2annovar.pl The Annovar perl script of convert2annovar.pl
--annotate_variation=./annotate_variation.pl The Annovar perl script of annotate_variation.pl
-d humandb, --database_locat=humandb The database location/dir for the Annovar annotation datasets
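
The evidence-file format described under --evidence_file above can be read with a few lines of Python. This is a minimal sketch, not InterVar's own parser: the function name parse_evidence_line is hypothetical, and it assumes whitespace-separated columns and a ';'-separated evidence list, as in the example line.

```python
def parse_evidence_line(line):
    """Parse one line: Chr Pos Ref_allele Alt_allele evidence_list."""
    chrom, pos, ref, alt, evidence_list = line.split()[:5]
    evidence = {}  # criterion -> 0/1, e.g. {"PM1": 1}
    grades = {}    # criterion -> 1 (Strong), 2 (Moderate), 3 (Supporting)
    for item in evidence_list.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        name, _, value = item.partition("=")
        if name.startswith("grade_"):
            grades[name[len("grade_"):]] = int(value)
        else:
            evidence[name] = int(value)
    return (chrom, int(pos), ref, alt), evidence, grades


key, evidence, grades = parse_evidence_line(
    "1 123456 A G PM1=1;BS2=1;BP3=0;PS5=1;grade_PM1=1"
)
print(key, evidence, grades)
```

Parsing a file this way before a run is a quick check that every line has five columns and that each evidence item has the form name=value.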

EXAMPLE

    ./InterVar.py -c config.ini  # Run the examples in config.ini
    ./InterVar.py  -b hg19 -i your_input  --input_type=VCF  -o your_output

HOW DOES IT WORK

InterVar takes either pre-annotated files or unannotated input files in VCF or ANNOVAR input format, where each line corresponds to one genetic variant; if the input files are unannotated, InterVar calls ANNOVAR to generate the necessary annotations. The execution of InterVar consists of two major steps: 1) automatic interpretation of the 28 evidence codes; and 2) manual adjustment by the user to re-interpret the clinical significance. However, users can specify their own evidence codes and import them into InterVar with the argument "--evidence_file=your_evidence_file", so that a single step is sufficient to generate the final results. In the output, based on all 28 evidence codes that are either automatically generated or supplied by the user, each variant is classified as "pathogenic", "likely pathogenic", "uncertain significance", "likely benign" or "benign" by the rules specified in the ACMG 2015 guidelines.
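
To make the final classification step concrete, the combining rules published in the ACMG-AMP 2015 guideline can be sketched as a function of how many criteria of each strength are met. This is an illustration of the published rules only, not InterVar's code: the function name and arguments are hypothetical, and InterVar's own implementation (for example its handling of upgraded or downgraded criteria) may differ.

```python
def classify(pvs1, ps, pm, pp, ba1, bs, bp):
    """Combine counts of met criteria into a five-tier classification."""
    pathogenic = (
        (pvs1 >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs1 >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba1 >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Contradictory pathogenic and benign evidence stays uncertain.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"


# A single met BS criterion and nothing else is not enough for any benign tier.
print(classify(pvs1=0, ps=0, pm=0, pp=0, ba1=0, bs=1, bp=0))
```

The stronger tier is tested first, so a variant that satisfies both a "pathogenic" and a "likely pathogenic" rule is reported as pathogenic.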

We also developed a web server of InterVar called wInterVar, which can be accessed at http://wintervar.wglab.org. The user can directly input their missense variants in wInterVar by chromosomal position, by dbSNP identifier, or by gene name with nucleic acid change information. The wInterVar server will provide full details on the variants, including all the evidence codes for the variants. The user then has the ability to manually adjust these evidence codes and resubmit to the server to perform re-interpretation. Since all evidence codes for all possible non-synonymous variants have been pre-computed by us, the execution of wInterVar is very fast, typically less than 1 second to obtain the results. However, the wInterVar server cannot process other types of variants (such as indels), and the user will need to use InterVar instead.

Web server

wInterVar: http://wintervar.wglab.org

LICENSE

InterVar is free for non-commercial use without warranty. Users need to obtain licenses such as OMIM and ANNOVAR by themselves. Please contact the authors for commercial use.

REFERENCE

Quan Li and Kai Wang. InterVar: Clinical interpretation of genetic variants by ACMG-AMP 2015 guideline. The American Journal of Human Genetics 100(2):267-280, 2017. http://dx.doi.org/10.1016/j.ajhg.2017.01.004

The ACMG 2015 guideline: Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine 17, 405-424 (2015).


intervar's Issues

Can I use ExAC database for PM2?

Hi.

I appreciate your efforts to make this amazing tool.

I want to use ExAC database for detecting PM2.

But in InterVar, 1000g is default.

Can I add ExAC database? How?

I added "exac03nonntcga" to "database_names" in config.ini, but that only seems to check whether the ExAC data exist in humandb.

Is that right?

Thanks

Oh.

Download and unzip error for ljb26_all and 1000g2015aug.

Hi,

I'm trying the following command
perl annotate_variation.pl -downdb -webfrom annovar --buildver hg19 1000g2015aug humandb/
perl annotate_variation.pl -downdb -webfrom annovar --buildver hg19 ljb26_all humandb/

However, I get the following error.

NOTICE: Uncompressing downloaded files
unzip: cannot find or open hg19_1000g2015aug.zip, hg19_1000g2015aug.zip.zip or hg19_1000g2015aug.zip.ZIP.
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory Done
NOTICE: Downloading annotation database http://www.openbioinformatics.org/annovar/download/hg19_ljb26_all.txt.idx.gz ... Done
NOTICE: Uncompressing downloaded files
gunzip: hg19_ljb26_all.txt.gz: unexpected end of file
gunzip: hg19_ljb26_all.txt.gz: uncompress failed
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory

It seems that the unzip error is caused by a failed download.
The "hg19_ljb26_all.txt.gz" file is 1.08 GB, which seems smaller than expected.
I also read issue #2, but still can't figure out how to solve it.
Other files, such as esp6500siv2_all and exac03, were downloaded successfully with a similar command.

I'm using macOS High Sierra, version 10.13.6.
I downloaded the latest version of Annovar on May 14.

Any help would be appreciated.

Best regards,
Hiroyuki

-c config.ini

OMIM databases update

Hello,
The InterVar databases seem to need an update for the OMIM data.
For some variants, the corresponding phenotypes are well annotated in the current OMIM but are missing from the mim_pheno.txt file.
Currently I'm working around this by updating the mim_pheno.txt file myself, but it would be nice if you could do a systematic update.
Thank you.
Best,
Seomgin

check_BS2()

In my understanding, if the SNP is heterozygous, it doesn't matter whether the disorder is dominant or recessive, and if the SNP is homozygous, the disorder should be recessive. So I don't understand why your code treats dominant disorders as the heterozygous case.

        try:
            if mim_domin_dict[mim1] == "1": # means dominant disorder: check snps as heter
                BS2=0
                try:
                    if BS2_snps_domin_dict[ keys ]=="1":  # key as snp info
                        BS2=1
                except KeyError:
                    pass
                else:
                    pass

How can I add gnomAD database?

InterVar uses the default ANNOVAR output for the selected frequency databases, which are ExAC, ESP6500 and 1000 Genomes. In 2017, ANNOVAR announced that it also supports the gnomAD database.

How can I integrate the gnomAD database (gnomad_exome) into InterVar?

check_PVS()

question1:

The ACMG paper says "in the last 50 base pairs of the penultimate exon".
Why does InterVar check the last 50 bases of the last exon?

        try:
            if (float(knownGeneCanonical_ed_dict[trans_id])-float( cls[Allels_flgs['Start']]  ))<50: # means close  3' of gene 50 bp.
                PVS=0
        except ValueError:
            pass
        else:
            pass

question2:
Does the ACMG paper mean the last base? Why does InterVar exclude exon 1 as well as the last exon?

        if exon==exon_lth or exon =="exon1": # not 1 or last exon
            PVS=0

For PS1 rule

Hi~
When I looked at check_PS1(), it seems that a variant with the same allele change and the same amino acid change as an entry in the pathogenic databank cannot be assigned PS1; it will be PM5 instead.

I am a bit confused about this.

The ACMG guideline for PS1:
In most cases, when one missense variant is known to be pathogenic, a different nucleotide change that results in the same amino acid (e.g., c.34G>C (p.Val12Leu) and c.34G>T (p.Val12Leu)) can also be assumed to be pathogenic, particularly if the mechanism of pathogenicity occurs through altered protein function. However, it is important to assess the possibility that the variant may act directly through the specific DNA change (e.g., through splicing disruption as assessed by at least computational analysis) instead of through the amino acid change, in which case the assumption of pathogenicity may no longer be valid.

According to the ACMG guideline, I thought that a variant with the same allele change and the same amino acid change as an entry in the pathogenic databank should be assigned PS1.

If I have any misunderstanding, please let me know. Looking forward to your reply.

May

Error downloading from Annovar

I am getting the following notices when trying to run the tool:

NOTICE: Downloading annotation database http://www.openbioinformatics.org/annovar/download/hg19_ALL.sites.2015_08.txt.gz ... Failed
NOTICE: Downloading annotation database http://www.openbioinformatics.org/annovar/download/hg19_ALL.sites.2015_08.txt.idx.gz ... Failed

These then result in an error:

Error: the required database file ../annovar/humandbm/hg19_ALL.sites.2015_08.txt does not exist.

When I go to the above addresses I get a URL not found error.

Please can you advise how to fix this?

Many thanks,
Nicky

MemoryError

Hi
I get this error when I run the program, both with and without ANNOVAR:

Traceback (most recent call last):
  File "Intervar.py", line 2051, in <module>
    main()
  File "Intervar.py", line 2013, in main
    read_datasets()
  File "Intervar.py", line 364, in read_datasets
    BS2_snps_recess_dict[ keys ]=cls2[4] # key as snp info
MemoryError

PM1 scoring

I'm trying to figure out why PM1 is scored as 1 for a variant. The variant is:
(hg19) 11:36615436C>T - RAG2 NM_001243786:exon3:c.G283A:p.G95R

I checked the PM1 check function:

def check_PM1(line,Funcanno_flgs,Allels_flgs,domain_benign_dict):
    '''
    Located in a mutational hot spot and/or critical and well-established functional domain (e.g., active site of
    an enzyme) without benign variation
    '''
    PM1=0
    PM1_t1=0
    PM1_t2=0
    cls=line.split('\t')
    funcs_tmp=["missense","nonsynony"]
    line_tmp=cls[Funcanno_flgs['Func.refGene']]+" "+cls[Funcanno_flgs['ExonicFunc.refGene']]
    for fc in funcs_tmp:
        if line_tmp.find(fc)>=0 :
            PM1_t1=1;
        # need to wait to check whether in hot spot  or  functional domain/without benign variation
    if cls[Funcanno_flgs['Interpro_domain']]!= '.' :
        keys_tmp2=cls[Allels_flgs['Chr']]+"_"+cls[Funcanno_flgs['Gene']]+": "+cls[Funcanno_flgs['Interpro_domain']]
        try:
            if domain_benign_dict[keys_tmp2] =="1":
                PM1_t2=0
        except KeyError:
            PM1_t2=1
        else:
            pass

    if PM1_t1==1 and PM1_t2==1 :
        PM1=1

    return(PM1)

If I'm interpreting the code correctly, PM1_t1 checks that there is at least one exonic missense annotation; then, if 'Interpro_domain' has any annotations, PM1_t2 checks that they are not domains with benign variants.

The example variant has the following domain in PM1_domains_with_benigns.hg19 :
AG2 Galactose oxidase/kelch, beta-propeller

Interpro_domains annotated by annovar:
Galactose oxidase/kelch, beta-propeller;Kelch-type beta propeller

Is it the Kelch-type beta propeller domain that is causing PM1_t2 to be 1? If so, I'm not sure it should be.
https://www.ebi.ac.uk/interpro/protein/P55895

Both domains overlap almost completely. Can you provide any information on how you generate PM1_domains_with_benigns.hg19?

Thanks,
Bob
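
One possible reading of the behaviour described above: the quoted check_PM1 builds a single lookup key from the whole Interpro_domain string, so a variant annotated with two ';'-separated domains would not match a dictionary entry for just one of them. The sketch below checks each domain separately. It is an assumption-laden illustration, not a confirmed fix: the function name is hypothetical, the key format is copied from the quoted code, and it assumes the dictionary holds one key per domain.

```python
def pm1_domain_without_benign(chrom, gene, interpro_domain, domain_benign_dict):
    """Return True if the variant is in an annotated domain and none of the
    ';'-separated domains is listed as a domain with benign variants."""
    if interpro_domain == ".":
        return False  # no domain annotation at all
    for domain in interpro_domain.split(";"):
        key = chrom + "_" + gene + ": " + domain.strip()
        if domain_benign_dict.get(key) == "1":
            return False
    return True


# Hypothetical dictionary entry, written in the key format used by the quoted code.
benign_domains = {"11_RAG2: Galactose oxidase/kelch, beta-propeller": "1"}
annotation = "Galactose oxidase/kelch, beta-propeller;Kelch-type beta propeller"
print(pm1_domain_without_benign("11", "RAG2", annotation, benign_domains))
```

Whether a variant inside two overlapping domains, only one of which has benign variants, should count for PM1 is a judgment call for the maintainers; the sketch takes the conservative option.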

Intervar.py didn't use database_names for annovar section in config.ini

Hi, I customized database_names in config.ini and just realized that Intervar.py doesn't pass database_names when it calls table_annovar.pl to annotate variants.

I checked Intervar.py and it appears that the ANNOVAR input arguments are hard-coded:
....paras['outfile']+" -protocol refGene,esp6500siv2_all,1000g2015aug_all,avsnp147,dbnsfp33a,clinvar_20190305,gnomad_genome,dbscsnv11,dbnsfp31a_interpro,rmsk,ensGene,knownGene -operation g,f,f,f,f,f,f,f,f,r,g,g -nastring ."+annovar_options

Would it be possible to use database_names instead of hard-coding the command-line parameters in the script?

Thanks!
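
As a rough sketch of what this request could look like, the -protocol and -operation arguments can be derived from a space-separated database_names string. The helper name and the operation mapping are assumptions made for illustration: gene-based ("g") for refGene, ensGene and knownGene, region-based ("r") for rmsk, and filter-based ("f") for everything else, mirroring the hard-coded command quoted above.

```python
# Assumed operation type per database; names not listed here are treated as
# filter-based ("f"), as in the hard-coded command quoted in this issue.
OPERATION_BY_DB = {"refGene": "g", "ensGene": "g", "knownGene": "g", "rmsk": "r"}


def annovar_protocol_args(database_names):
    """Build the -protocol/-operation part of a table_annovar.pl command line."""
    names = database_names.split()
    operations = [OPERATION_BY_DB.get(name, "f") for name in names]
    return "-protocol " + ",".join(names) + " -operation " + ",".join(operations)


print(annovar_protocol_args("refGene gnomad_exome rmsk"))
```

Note that the names in database_names may not always equal the protocol names: in a log posted in another issue on this page, config.ini lists 1000g201508 while the command uses 1000g2015aug_all, so a real implementation would probably need a translation step as well.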

If a VCF file has already been annotated, how do I skip the ANNOVAR annotation step?

We have already annotated the VCF, and then we ran "python /opt/Intervar.py -b hg19 -i /opt/annovar.hg19_multianno.vcf --input_type=VCF -o out.vcf ". It shows an error and doesn't work.
I really need to know how to use InterVar with an already annotated VCF.

PS1

http://wintervar.wglab.org/ site is down

Hello, I've been trying to access the InterVar website since last Thursday/Friday. The site seems to be down. Could you let me know if a fix is in progress and the timeline for it to come back online?

TabError

There are 2 errors with "TabError: inconsistent use of tabs and spaces in indentation", at lines 942 and 1781.

Taking too long to run on exome

It takes around 4-5 hours to run on one exome VCF of around 90K variants
(running on a 32-core Xeon workstation with 120 GB RAM).
Is this the run time you would normally expect?

When running the example, Intervar gets "stuck" at reading annotation database humandb/hg19_rmsk.txt

Hello,

I am a newbie trying to run Intervar in an Amazon Linux AMI. I downloaded and installed ANNOVAR and then Intervar, and I've followed the instructions to run it. Everything goes well until it is time to read the rmsk file. It's been stuck at this point for a few hours and it seems excessive since I am just running the 18 SNP example "ex1.avinput".
Here is where it's at:

...
...
NOTICE: Scanning filter database humandb/hg19_dbnsfp31a_interpro.txt...Done

NOTICE: Processing operation=r protocol=rmsk

NOTICE: Running with system command <annotate_variation.pl -regionanno -dbtype rmsk -buildver hg19 -outfile example/myanno example/ex1.avinput humandb>
NOTICE: Output file is written to example/myanno.hg19_rmsk
NOTICE: Reading annotation database humandb/hg19_rmsk.txt ...

What could be the problem? The humandb folder contains the right file, so it should be accessible.
Thank you in advance for your help

about PS1

Dear InterVar team,

Greetings

I have a question.

In intervardb, does "PS1.AA.change.patho.hg19 file" mean "pathogenic database" ?

Thanks

intervar_20170202

I really appreciate your work; thanks for sharing it.
I also see that ANNOVAR has the intervar_20170202 database, but it only
has clinical interpretations of missense variants. I wonder why insertions/deletions
are not included in it; with them, annotation would be very fast.
Thanks again!

Dear Lee, where can I download "the annotation result file from ANNOVAR"? (See the last two ERROR lines.)

wdeMacBook:InterVar-1.0.8 wj$ python Intervar.py -i example/ex1.avinput -o example/myanno --evidence_file=evdience.txt

InterVar
Interpretation of Pathogenic/Benign for variants using python scripts of InterVar.

%prog 0.1.7 20170608
Written by Quan LI,[email protected].
InterVar is free for non-commercial use without warranty.
Please contact the authors for commercial use.
Copyright (C) 2016 Wang Genomic Lab

Notice: Your command of InterVar is ['Intervar.py', '-i', 'example/ex1.avinput', '-o', 'example/myanno', '--evidence_file=evdience.txt']
Warning: You provided your own evidence file [ evdience.txt ] for the InterVar.
Warning: Your specified evidence file [ evdience.txt ] is not here,please check the path of your evidence file.
Your analysis will begin without your specified evidence.
INFO: The options are {'pp2_genes': 'intervardb/PP2.genes', 'inputfile': 'example/ex1.avinput', 'exclude_snps': 'intervardb/ext.variants.hg19', 'annotate_variation': './annotate_variation.pl', 'ps4_snps': 'intervardb/PS4.variants.hg19', 'mim_domin': 'intervardb/mim_domin.txt', 'current_version': 'Intervar_20170228', 'bs2_snps': 'intervardb/BS2_hom_het.hg19', 'evidence_file': 'evdience.txt', 'public_dev': 'https://github.com/WGLab/InterVar/releases', 'otherinfo': 'TRUE', 'database_names': 'refGene esp6500siv2_all 1000g201508 avsnp144 dbnsfp30a clinvar_20160302 exac03 dbscsnv11 dbnsfp31a_interpro rmsk ensGene knownGene', 'mim_pheno': 'intervardb/mim_pheno.txt', 'table_annovar': './table_annovar.pl', 'buildver': 'hg19', 'inputfile_type': 'AVinput', 'onetranscript': 'FALSE', 'mim2gene': 'intervardb/mim2gene.txt', 'orpha': 'intervardb/orpha.txt', 'ps1_aa': 'intervardb/PS1.AA.change.patho.hg19', 'mim_adultonset': 'intervardb/mim_adultonset.txt', 'knowngenecanonical': 'intervardb/knownGeneCanonical.txt', 'outfile': 'example/myanno', 'convert2annovar': './convert2annovar.pl', 'database_locat': 'humandb', 'database_intervar': 'intervardb', 'lof_genes': 'intervardb/PVS1.LOF.genes', 'disorder_cutoff': '0.01', 'mim_recessive': 'intervardb/mim_recessive.txt', 'pm1_domain': 'intervardb/PM1_domains_with_benigns', 'mim_orpha': 'intervardb/mim_orpha.txt', 'bp1_genes': 'intervardb/BP1.genes'}
Warning: the folder of humandb is already created!
Warning: The Annovar dataset file of 1000g201508 is not in humandb,begin to download this humandb/hg19_1000g201508.txt ...
perl ./annotate_variation.pl -buildver hg19 -downdb -webfrom annovar 1000g201508 humandb
NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://www.openbioinformatics.org/annovar/download/hg19_1000g201508.txt.gz ... Failed
NOTICE: Downloading annotation database http://www.openbioinformatics.org/annovar/download/hg19_1000g201508.txt.idx.gz ... Failed
WARNING: Some files cannot be downloaded, including http://www.openbioinformatics.org/annovar/download/hg19_1000g201508.txt.gz, http://www.openbioinformatics.org/annovar/download/hg19_1000g201508.txt.idx.gz
perl ./table_annovar.pl example/ex1.avinput humandb -buildver hg19 -remove -out example/myanno -protocol refGene,esp6500siv2_all,1000g2015aug_all,avsnp144,dbnsfp30a,clinvar_20160302,exac03,dbscsnv11,dbnsfp31a_interpro,rmsk,ensGene,knownGene -operation g,f,f,f,f,f,f,f,f,r,g,g -nastring . --otherinfo
Error: the required database file humandb/hg19_ALL.sites.2015_08.txt does not exist.
Warning: The InterVar seems not run correctly, please check your inputs , options and configure file!
ERROR: The InterVar did not find the annotation result file from ANNOVAR!
ERROR: The name of annotation result file should be like example/myanno*.hg19__multianno.txt

Thanks for using InterVar!
Report bugs to [email protected];
InterVar homepage: http://wInterVar.wglab.org

Use of hash to reduce BS2_hom_het loading time

Small suggestion that would improve loading of BS2_hom_het - instead of calling flip_ACGT, create a hash, say flip_ACGT={'A':'T','T':'A','C':'G','G':'C','N':'N','X':'X'}, and use it. Shaves a few seconds off.

PS1 and PM5 also can't be assigned

I ran some SNPs through InterVar, but PS1 and PM5 can't be assigned.

function check_PS1():
        try:
            if aa_changes_dict[keys_tmp2]:
                PS1_t2=0
        except KeyError:
            for nt in ACGTs:
                if nt != cls[Allels_flgs['Alt']]:
                    keys_tmp3=cls[Allels_flgs['Chr']]+"_"+cls[Allels_flgs['Start']]+"_"+cls[Allels_flgs['End']]+"_"+nt
                    try:
                        if aa_changes_dict[keys_tmp3] == aa_last:
                            PS1_t2=1
                    except KeyError:
                        pass

If keys_tmp2 is in aa_changes_dict, why isn't PS1_t2 set to 1?

Is an update coming for gnomAD v2.1.1 recently released by ANNOVAR?

Hi, just as the title says, I was wondering whether Intervar will work with the gnomAD v2.1.1 files from ANNOVAR, because I've seen changes in the headers of the new files.

Thanks!

Intervar Database not available for download

Hi,

First, thanks for all the hard work.

When I try to download the InterVar database using:
./annotate_variation.pl -build hg19 -downdb intervar_20180118 humandb/

the screen output is (no logfile written).
NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/intervar_20180118.txt.gz ... Failed
WARNING: Some files cannot be downloaded, including http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/intervar_20180118.txt.gz

I visited the UCSC web page, and indeed, there is no intervar file at all (i.e., no variant by date).

Since the instructions to download intervar are on the ANNOVAR site, I am reporting this here so that it might be useful to others.

I have also sent an e-mail with the same information.

The evidence file doesn't work, even with the example shown in the manual

Hi, there:
I put evidence.txt into config.ini: evidence_file = evidence.txt
and the evidence file contains one line, as in your manual:
1 67705958 G A PS3=1;PM6=1;grade_PM6=1;

The result for the variant (#72: 1 | 67705958 | 67705958 | G | A) is always the same in the InterVar column:
InterVar: Uncertain significance PVS1=0 PS=[0, 0, 0, 0, 0] PM=[0, 0, 0, 0, 0, 0, 0] PP=[0, 0, 0, 0, 0, 0] BA1=0 BS=[1, 0, 0, 0, 0] BP=[0, 0, 0, 0, 0, 0, 0, 0]

I also tried another way, --evidence file=evidence.txt, but it still doesn't work.

thanks,

WARNING: A total of 43 sequences will be ignored due to lack of correct ORF annotation

number of PS bigger than 4

As the title says: in the code, why does PS=[0,0,0,0,0] have more than 4 entries? Any response will be appreciated. Thanks.
Best regards,
lianlin

--skip_annovar option

I have already used ANNOVAR to annotate my mutations, so when I run InterVar I add the --skip_annovar parameter. But it doesn't work normally. The error is below:
Warning: The InterVar seems not run correctly, please check your inputs , options and configure file!
ERROR: The InterVar did not find the annotation result file from ANNOVAR!
ERROR: The name of annotation result file should be like 8Y7038*.hg38__multianno.txt

My command is: python Intervar.py -b hg38 --input_type=AVinput -i samplename.hg38__multianno.txt -o samplename --skip_annovar

Can you help me solve this problem? Thanks!

ValueError: could not convert string to float: X

Hello, I am trying to run your tool with this command:
./InterVar.py -b hg19 --table_annovar='/annovar/table_annovar.pl' --convert2annovar='/annovar/convert2annovar.pl' --annotate_variation='/annovar/annotate_variation.pl' -i '/vcf_wes/sample.vcf' --input_type=VCF -o '/vcf_wes/sample_output'

After a few moments I get this error:

NOTICE: Multianno output file is written to /home/lab/galaxy-Win-Lin/David/vcf_wes/6788_output.hg19_multianno.txt
Notice: Begin the variants interpretation by InterVar

Traceback (most recent call last):
  File "./InterVar.py", line 2022, in <module>
    main()
  File "./InterVar.py", line 1996, in main
    sum2=my_inter_var(annovar_outfile)
  File "./InterVar.py", line 1771, in my_inter_var
    intervar_bp=assign(BP,line,Freqs_flgs,Funcanno_flgs,Allels_flgs)
  File "./InterVar.py", line 1599, in assign
    PM2=check_PM2(line,Freqs_flgs,Allels_flgs,Funcanno_flgs,mim2gene_dict,mim2gene_dict2)
  File "./InterVar.py", line 1055, in check_PM2
    if(cls[Freqs_flgs[key]]!='.' and float(cls[Freqs_flgs[key]])>=cutoff_maf):
ValueError: could not convert string to float: X

In attachment you can see few lines of my vcf,
error_intervar.txt

Can you help? Thank you in advance.
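
The traceback shows float() being applied to a frequency field that contains "X" rather than a number or ".". A defensive conversion along the following lines would avoid the crash. This is a sketch with hypothetical helper names (to_float, exceeds_cutoff), not code from InterVar.

```python
def to_float(value):
    """Return float(value), or None for '.', empty or other non-numeric text."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def exceeds_cutoff(value, cutoff_maf):
    """True only if the field is numeric and at or above the cutoff."""
    freq = to_float(value)
    return freq is not None and freq >= cutoff_maf


for field in ["0.05", "0.001", ".", "X"]:
    print(field, exceeds_cutoff(field, 0.01))
```

A non-numeric value such as "X" in a frequency column may also mean that the columns of the annotated file are shifted, so silently skipping it could hide a real problem in the input; logging such fields is probably worthwhile.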

Specific transcript specification

Is there a way to specify using annotation for specific transcripts? The genes we target contain multiple alternative transcripts, but we only want to annotate/interpret clinical significance for a subset of the transcripts (LRG if available, specific refseq transcripts when LRG is not available).

Thanks,
Bob

check_PP3()

Why does it only use MetaSVM_score, dbscSNV_RF_SCORE and GERP++_RS?

What about M-CAP, SIFT, REVEL or PolyPhen? Why are they not added to the prediction?

Could someone help me with this error message: "ValueError: could not convert string to float: 'X'"

I need to obtain a standardized matrix but I'm having some problems. Here is my code:

import pandas as pd
df = pd.read_excel(r'C:/Users/DESICHRIS/Desktop/INPUT.xlsx')
df.as_matrix()
print (df)
from sklearn.preprocessing import StandardScaler

# Get column names first
names = df.columns

# Create the Scaler object
scaler = StandardScaler()

# Fit your data on the scaler object

scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=names)
print (scaled_df)

Python shows an error message, as above:

  File "c:/Users/DESICHRIS/Desktop/ATUAL TESE DESIREE/pesquisa Desiree/python/mteste.py", line 11, in <module>
    scaled_df = scaler.fit_transform(df)
  File "C:\Users\DESICHRIS\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\base.py", line 464, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "C:\Users\DESICHRIS\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\preprocessing\data.py", line 645, in fit
    return self.partial_fit(X, y)
  File "C:\Users\DESICHRIS\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\preprocessing\data.py", line 669, in partial_fit
    force_all_finite='allow-nan')
  File "C:\Users\DESICHRIS\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\validation.py", line 527, in check_array
    array = np.asarray(array, dtype=dtype, order=order)
  File "C:\Users\DESICHRIS\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'X'

Can someone help me?

input file format?

Hi! I'd like to use InterVar with VEP annotation, so I need to "restyle" the VEP output so that it becomes compatible with InterVar. As far as I can tell from "Intervar.py", the column names of the input file are hard-coded in the code body ("Ref.Gene", etc.), so the way to go is to change the names/order of the columns of the VEP tab-delimited output.

My question is: do you have a complete list of the columns used by InterVar? (ANNOVAR can generate different sets of columns depending on the databases used, and not all columns in the example file "myanno.hg38_multianno.txt" seem to be used by InterVar.)

PS If you know an easier way to run InterVar on VEP output, I'd be happy to hear it.

PPS Basically I'd like to do what is written in the paper: "Users can generate this input file themselves by using an in-house variant analysis workflow;" and I need some help with it.

Best, Eugene

Otherinfo1.. columns not seen

How can I see the Otherinfo1, Otherinfo2, Otherinfo3 ... Otherinfo13 columns in the multianno.intervar file?
Thanks.

Adding database

I need the gnomad_exome database, which I managed to download with "perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar gnomad_exome humandb/"

How do I also include it when I run InterVar?

ANNOVAR new version is available ... Failed

Hello,

I have the latest version of ANNOVAR (2018Apr16) on my computer; however, the web-based check for a new ANNOVAR version reports "Failed".
Database also cannot be downloaded.
How to solve this problem?
Thanks

Tsai

why 'not 1 or last exon' in PVS1

hello,
In the check_PVS1 function, I found that you exclude "1 or last exon", but the guideline says "in the last exon or in the last 50 base pairs of the penultimate exon". So I would like to know why you exclude the first exon.

rs369055904 can't assign PVS1

rs369055904
ExonicFunc.refGene is None
ExonicFunc.knownGene is stopgain
In InterVar:
line_tmp=cls[Funcanno_flgs['Func.refGene']]+" "+cls[Funcanno_flgs['ExonicFunc.refGene']]
So PVS1 can't be assigned.
But if line_tmp=cls[Funcanno_flgs['Func.refGene']]+" "+cls[Funcanno_flgs['ExonicFunc.refGene']] +cls[Funcanno_flgs['ExonicFunc.knownGene']]
the SNP can be assigned PVS1.
Do we need to combine ExonicFunc.knownGene and ExonicFunc.refGene?

Error

I get the error below when running this command.
Can you help? Thanks.

./Intervar.py -c config.ini

  File "./Intervar.py", line 1631, in <module>
    main()
  File "./Intervar.py", line 1601, in main
    read_datasets()
  File "./Intervar.py", line 210, in read_datasets
    morbidmap_dict2[ cls2[2] ]='1' # key as mim number
IndexError: list index out of range

Error with skip_annovar

I had a VCF already annotated by AnnoVar that I wanted to run Intervar on. I used the following command:
python Intervar.py -b hg38 --input_type=VCF -i xaa.vcf -o output --skip_annovar
It successfully converted the VCF to an AVinput file of the same name but then resulted in the following errors and did not produce an output file:
ERROR: The InterVar did not find the annotation result file from ANNOVAR!
ERROR: The name of annotation result file should be like output*.hg38__multianno.txt

Is this an issue with my command, the input file, or something else?

check_PS1() won't return 1

check_PS1() never returns 1, so PS1 always stays 0.

def check_PS1(line, Funcanno_flgs, Allels_flgs, aa_changes_dict):
    '''
    PS1 Same amino acid change as a previously established pathogenic variant regardless of nucleotide change
    Example: Val->Leu caused by either G>C or G>T in the same codon
    AAChange.refGene
    NOD2:NM_001293557:exon3:c.C2023T:p.R675W,NOD2:NM_022162:exon4:c.C2104T:p.R702W
    '''

    PS1 = 0
    PS1_t1 = 0
    PS1_t2 = 0
    PS1_t3 = 0
    dbscSNV_cutoff = 0.6  # either score(ada and rf) >0.6 as splicealtering
    cls = line.split('\t')
    funcs_tmp = ["missense", "nonsynony"]
    ACGTs = ["A", "C", "G", "T"]
    line_tmp = cls[Funcanno_flgs['Func.refGene']] + " " + cls[Funcanno_flgs['ExonicFunc.refGene']]

    for fc in funcs_tmp:
        if line_tmp.find(fc) >= 0:
            PS1_t1 = 1
            # need to wait to check Same amino acid change as a previously pathogenic variant
            line_tmp2 = cls[Funcanno_flgs['AAChange.refGene']]
            cls0 = line_tmp2.split(',')
            cls0_1 = cls0[0].split(':')
            aa = cls0_1[4]
            aa_last = aa[len(aa) - 1:]
            keys_tmp2 = cls[Allels_flgs['Chr']] + "_" + cls[Allels_flgs['Start']] + "_" + cls[
                Allels_flgs['End']] + "_" + cls[Allels_flgs['Alt']]
            try:
                if aa_changes_dict[keys_tmp2]:
                    PS1_t2 = 0
            except KeyError:
                for nt in ACGTs:
                    if nt != cls[Allels_flgs['Alt']]:
                        keys_tmp3 = cls[Allels_flgs['Chr']] + "_" + cls[Allels_flgs['Start']] +\
                                    "_" + cls[Allels_flgs['End']] + "_" + nt
                        try:
                            if aa_changes_dict[keys_tmp3] == aa_last:
                                PS1_t2 = 1
                        except KeyError:
                            pass
                        else:
                            pass
            else:
                pass
    try:
        if float(cls[Funcanno_flgs['dbscSNV_RF_SCORE']]) > dbscSNV_cutoff or float(
                cls[Funcanno_flgs['dbscSNV_ADA_SCORE']]) > dbscSNV_cutoff:  # means alter the splicing
            PS1_t3 = 1
    except ValueError:
        pass
    else:
        pass

    if PS1_t1 != 0 and PS1_t2 != 0:
        if PS1_t3 == 1:  # remove the splicing affect
            PS1 = 0
    return (PS1)
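In the code as quoted, PS1 is initialised to 0 and the only later assignment sets it to 0 again, so the function cannot return 1. A minimal sketch of the final step with an assignment added, assuming the intent is "PS1 applies when the variant is missense (t1) and matches a known pathogenic amino acid change (t2), unless a splicing effect is predicted (t3)". `combine_ps1` is a hypothetical helper used only to isolate that logic.

```python
def combine_ps1(ps1_t1, ps1_t2, ps1_t3):
    """Combine the three intermediate PS1 flags into the final PS1 value.

    ps1_t1: 1 if the variant is missense/nonsynonymous
    ps1_t2: 1 if the same amino acid change is a known pathogenic one
    ps1_t3: 1 if dbscSNV predicts a splicing effect
    """
    ps1 = 0
    if ps1_t1 != 0 and ps1_t2 != 0:
        ps1 = 1  # the assignment that the quoted code lacks
        if ps1_t3 == 1:
            ps1 = 0  # a predicted splicing effect cancels PS1
    return ps1
```

With this change, the case t1=1, t2=1, t3=0 yields 1, which the quoted version can never produce.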

How can I use my own config.ini?

I created my own config.ini outside the InterVar directory and ran: python /path/to/InterVar.py -c /path/to/my_own_config.ini
This fails with: Error: The default configure file of [ config.ini ] is not here, exit! Please redownload the InterVar.
How can I make this work?

InterVar Website is Down

Hello,

The website (http://wintervar.wglab.org/) has been down for a couple of days. Is there an issue or is there another upgrade occurring? Could you let me know when it is back up so I can communicate it to my lab?

Thanks,
Nikita

ValueError: could not convert string to float: '.'

Hi, when I run the example in config.ini with this command:
python InterVar.py -c config.ini
I get the following error. The versions of Python and ANNOVAR on my computer are 3.5.2 and 2017-07-17, respectively.

Notice: Begin the variants interpretation by InterVar
Traceback (most recent call last):
File "Intervar.py", line 2083, in <module>
main()
File "Intervar.py", line 2052, in main
sum2=my_inter_var(annovar_outfile)
File "Intervar.py", line 1805, in my_inter_var
intervar_bp=assign(BP,line,Freqs_flgs,Funcanno_flgs,Allels_flgs)
File "Intervar.py", line 1680, in assign
BP7=check_BP7(line,Funcanno_flgs,Allels_flgs)
File "Intervar.py", line 1601, in check_BP7
if float(cls[Funcanno_flgs['GERP++_RS']]) <= float(cutoff_conserv) or cls[Funcanno_flgs['GERP++_RS']] == '.' :
ValueError: could not convert string to float: '.'

Can you help me? Thanks.
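The traceback points at the condition on line 1601: `float(...)` is evaluated on the left side of the `or` before the `== '.'` comparison is ever reached, so a missing GERP++ score ('.') raises instead of being handled. A minimal sketch of the same test with the placeholder handled first; `to_float_or_none` and `is_not_conserved` are hypothetical names, and the cutoff is whatever `cutoff_conserv` holds in Intervar.py.

```python
def to_float_or_none(value):
    """Return float(value), or None for '.' and other non-numeric text."""
    try:
        return float(value)
    except ValueError:
        return None


def is_not_conserved(gerp_value, cutoff_conserv):
    """True when the score is missing or at/below the cutoff.

    Mirrors the intent of the original condition:
    float(score) <= cutoff or score == '.'
    """
    score = to_float_or_none(gerp_value)
    return score is None or score <= float(cutoff_conserv)
```

Swapping the two operands of the original `or` (testing `== '.'` first) would have the same effect, since Python short-circuits.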

Error reading bs2_snps with python version 2.17.3 or higher

Here is the error message that occurs:

Traceback (most recent call last):
File "/home/jeske/tools/InterVar/Intervar.py", line 2066, in <module>
main()
File "/home/jeske/tools/InterVar/Intervar.py", line 2028, in main
read_datasets()
File "/home/jeske/tools/InterVar/Intervar.py", line 358, in read_datasets
for line2 in strs.split('\n'):
TypeError: a bytes-like object is required, not 'str'
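This TypeError is what Python 3 raises when `split('\n')` is called on a bytes object, which is what you get from a file opened in binary mode. A minimal sketch of a workaround, assuming `strs` holds the raw file contents and that the file is UTF-8 (or plain ASCII) text; `split_lines` is a hypothetical helper.

```python
def split_lines(data, encoding="utf-8"):
    """Accept bytes or str and return a list of text lines."""
    if isinstance(data, bytes):
        # Python 3: content read in binary mode must be decoded first
        data = data.decode(encoding)
    return data.split("\n")
```

The loop in read_datasets could then iterate over `split_lines(strs)` instead of `strs.split('\n')`.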

FR - Read annovar database names from config.ini

Hi, thanks for the great program!
It would be great if the database_names for annovar were read from the config.ini file instead of hard-coded into Intervar.py. This would allow use of the updated databases avsnp147, dbnsfp33a, clinvar_20170130 and incorporation of gnomad_exome, mcap, revel.
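A rough sketch of what reading the list from the config could look like, using the standard configparser module. The section name `Annovar` is an assumption made for illustration; the option name `database_names` and its space-separated format match the options dump shown in the log output of this issue.

```python
import configparser

# Hypothetical config fragment; the section name is an assumption.
SAMPLE = """
[Annovar]
database_names = refGene esp6500siv2_all 1000g2015aug avsnp147 dbnsfp33a clinvar_20170130
"""


def read_database_names(config_text, section="Annovar", option="database_names"):
    """Return the ANNOVAR database names listed in the config as a list."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get(section, option).split()


if __name__ == "__main__":
    print(read_database_names(SAMPLE))
```

Any code that currently relies on a fixed database list would also need to cope with renamed output columns, which this sketch does not address.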

InterVar always downloads 1000g2015aug. Says it is not in the provided directory.

For now, I have just removed Intervar's check for the file and subsequent call to Annovar's Perl script. Perhaps you or I could review this section of Intervar code at some point.

Notice: Your command of InterVar is ['/home/tools/InterVar/Intervar.py', '-c', 'config.ini', '--table_annovar=/home/tools/annovar/table_annovar.pl', '--convert2annovar=/home/tools/annovar/convert2annovar.pl', '--annotate_variation=/home/tools/annovar/annotate_variation.pl', '--database_locat=/home/tools/annovar/humandb', '--input_type=VCF', '--input=/home/ngs-projects/ThuPerfTest/vcf/ThuPerfTest.35genes_woIntron.QCed.SplitMultiAllelic.vcf', '--output=/home/ngsprojects/ThuPerfTest/vcf/intervar/ThuPerfTest.35genes_woIntron.QCed.SplitMultiAllelic']

INFO: The options are {'pp2_genes': 'intervardb/PP2.genes', 'inputfile': '/home/ngs-projects/ThuPerfTest/vcf/ThuPerfTest.35genes_woIntron.QCed.SplitMultiAllelic.vcf', 'exclude_snps': 'intervardb/ext.variants.hg19', 'annotate_variation': '/home/tools/annovar/annotate_variation.pl', 'ps4_snps': 'intervardb/PS4.variants.hg19', 'mim_recessive': 'intervardb/mim_recessive.txt', 'current_version': 'Intervar_20151116', 'bs2_snps': 'intervardb/BS2_hom_het.hg19', 'evidence_file': 'None', 'public_dev': 'https://github.com/WGLab/InterVar/releases', 'otherinfo': 'FALSE', 'database_names': 'refGene esp6500siv2_all 1000g2015aug avsnp144 dbnsfp30a clinvar_20160302 exac03 dbscsnv11 dbnsfp31a_interpro rmsk ensGene knownGene', 'mim_pheno': 'intervardb/mim_pheno.txt', 'table_annovar': '/home/tools/annovar/table_annovar.pl', 'buildver': 'hg19', 'inputfile_type': 'VCF', 'onetranscript': 'FALSE', 'mim2gene': 'intervardb/mim2gene.txt', 'orpha': 'intervardb/orpha.txt', 'ps1_aa': 'intervardb/PS1.AA.change.patho.hg19', 'mim_adultonset': 'intervardb/mim_adultonset.txt', 'morbidmap': 'intervardb/morbidmap', 'outfile': '/home/ngs-projects/ThuPerfTest/vcf/intervar/ThuPerfTest.35genes_woIntron.QCed.SplitMultiAllelic', 'csvout': 'FALSE', 'knowngenecanonical': 'intervardb/knownGeneCanonical.txt', 'dot2underline': 'TRUE', 'convert2annovar': '/home/tools/annovar/convert2annovar.pl', 'database_locat': '/home/tools/annovar/humandb', 'database_intervar': 'intervardb', 'lof_genes': 'intervardb/PVS1.LOF.genes', 'disorder_cutoff': '0.01', 'mim_domin': 'intervardb/mim_domin.txt', 'pm1_domain': 'intervardb/PM1_domains_with_benigns', 'mim_orpha': 'intervardb/mim_orpha.txt', 'bp1_genes': 'intervardb/BP1.genes'}

Warning: the folder of /home/tools/annovar/humandb is already created!

Warning: The Annovar dataset file of 1000g2015aug is not in /home/tools/annovar/humandb,begin to download this /home/tools/annovar/humandb/hg19_1000g2015aug.txt ...
perl /home/tools/annovar/annotate_variation.pl -buildver hg19 -downdb -webfrom annovar 1000g2015aug /home/tools/annovar/humandb

My humandb directory contains these files:

GRCh37_MT_ensGeneMrna.fa        hg19_AMR.sites.2015_08.txt.idx  hg19_SAS.sites.2015_08.txt.idx  hg19_dbnsfp31a_interpro.txt.idx  hg19_exac03.txt.idx          hg19_refGeneVersion.txt
annovar_downdb.log              hg19_EAS.sites.2015_08.txt      hg19_avsnp144.txt               hg19_dbscsnv11.txt               hg19_example_db_generic.txt  hg19_rmsk.txt
genometrax-sample-files-gff     hg19_EAS.sites.2015_08.txt.idx  hg19_avsnp144.txt.idx           hg19_dbscsnv11.txt.idx           hg19_example_db_gff3.txt
hg19_AFR.sites.2015_08.txt      hg19_EUR.sites.2015_08.txt      hg19_clinvar_20160302.txt       hg19_ensGene.txt                 hg19_kgXref.txt
hg19_AFR.sites.2015_08.txt.idx  hg19_EUR.sites.2015_08.txt.idx  hg19_clinvar_20160302.txt.idx   hg19_ensGeneMrna.fa              hg19_knownGene.txt
hg19_ALL.sites.2015_08.txt      hg19_MT_ensGene.txt             hg19_dbnsfp30a.txt              hg19_esp6500siv2_all.txt         hg19_knownGeneMrna.fa
hg19_ALL.sites.2015_08.txt.idx  hg19_MT_ensGeneMrna.fa          hg19_dbnsfp30a.txt.idx          hg19_esp6500siv2_all.txt.idx     hg19_refGene.txt
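The listing contains hg19_ALL.sites.2015_08.txt but no hg19_1000g2015aug.txt, and the warning shows the latter is the name being checked. ANNOVAR unpacks the 1000g2015aug download into per-population `*.sites.2015_08.txt` files, so a check for `<buildver>_<dbname>.txt` would never succeed for this database. A minimal sketch of a check that accounts for this, assuming only this one database needs an alias; `ALIASES`, `expected_db_file`, and `missing_databases` are hypothetical names.

```python
import os

# Assumption: 1000g2015aug is the only database whose on-disk file name
# differs from <buildver>_<dbname>.txt; extend the map if others do too.
ALIASES = {"1000g2015aug": "ALL.sites.2015_08"}


def expected_db_file(humandb, buildver, dbname):
    """Return the path ANNOVAR is expected to use for this database."""
    stem = ALIASES.get(dbname, dbname)
    return os.path.join(humandb, "{0}_{1}.txt".format(buildver, stem))


def missing_databases(humandb, buildver, dbnames):
    """Return the database names whose files are absent from humandb."""
    return [name for name in dbnames
            if not os.path.isfile(expected_db_file(humandb, buildver, name))]
```

Only the names returned by `missing_databases` would then need to be passed to annotate_variation.pl -downdb.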

Unable to see Orphanet Disease Title

As you know, the output of Intervar.py contains two columns related to Orphanet:

OrphaNumber: Contains the Orpha IDs which are related with the gene of interest.
Orpha: Contains detailed information about the Orphanet record such as synonyms, prevalence etc.

However, neither of these columns contains the exact title of the disorder. For example, for the SNP with Orpha ID #319646, the Orpha column looks like this:

"319646|CDG syndrome type It<br>CDG-It<br>CDG1T|<1 / 1 000 000|Autosomal recessive|Neonatal<br>Infancy|614921 ~"

and the OrphaNumber column is:
319646;

The name seen in the record (CDG syndrome type It) is a synonym of the actual disorder, which is PGM1-CDG. How can I extract the real disorder name with InterVar instead of the synonyms?
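For reference, the Orpha column can be split into its parts as sketched below. The field labels are inferred from the example record above and are assumptions, not documented InterVar names; `parse_orpha_record` is a hypothetical helper. Note that this only separates what is already in the column: the preferred title (PGM1-CDG) does not appear in the record, so it would have to be looked up in Orphanet by ID.

```python
def parse_orpha_record(record):
    """Split one Orpha column entry into labelled parts.

    Assumed layout, inferred from the example:
    id|names (<br>-separated)|prevalence|inheritance|onset (<br>-separated)|OMIM ~
    """
    fields = record.rstrip(" ~").split("|")
    return {
        "orpha_id": fields[0],
        "names": fields[1].split("<br>"),
        "prevalence": fields[2],
        "inheritance": fields[3],
        "onset": fields[4].split("<br>"),
        "omim": fields[5],
    }


if __name__ == "__main__":
    example = ("319646|CDG syndrome type It<br>CDG-It<br>CDG1T|<1 / 1 000 000|"
               "Autosomal recessive|Neonatal<br>Infancy|614921 ~")
    print(parse_orpha_record(example)["names"])
```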

What are CADD_raw and SIFT_score for?

The *.intervar result includes SIFT_score and CADD_raw, but they are not used to assign the ACMG category. Why are they in the result?


In my understanding, if a SNP is heterozygous, it doesn't matter whether it is dominant or recessive, and if a SNP is homozygous, it should be recessive. So I don't understand why your code treats dominant as heterozygous.

... ... NOTICE: Scanning filter database humandb/hg19_dbnsfp31a_interpro.txt...Done

wdeMacBook:InterVar-1.0.8 wj$ python Intervar.py -i example/ex1.avinput -o example/myanno --evidence_file=evdience.txt

InterVar
Interpretation of Pathogenic/Benign for variants using python scripts of InterVar.

%prog 0.1.7 20170608
Written by Quan LI,[email protected].
InterVar is free for non-commercial use without warranty.
Please contact the authors for commercial use.
Copyright (C) 2016 Wang Genomic Lab

Thanks for using InterVar!
Report bugs to [email protected];
InterVar homepage: http://wInterVar.wglab.org
