jianhua-wang / easyfinemap

A user-friendly pipeline for GWAS fine-mapping

Home Page: https://Jianhua-Wang.github.io/easyfinemap

License: MIT License

Languages: Python 70.19%, Jupyter Notebook 29.53%, Makefile 0.28%
Topics: fine-mapping, gwas, python, conditional, gcta, ld, paintor, caviarbf, clumping

easyfinemap's Introduction

Jianhua Wang 王建华

Bioinformatician
Tianjin Medical University, Tianjin, China

  • 🔬 I’m currently working on functional genomics and population genetics.
  • 🔭 I am also very interested in applying machine learning methods to solve biological problems.

easyfinemap's Issues

Where can I find the binary for model_search?

  • easy_finemap version:
  • Python version:
  • Operating System:

Description

Dear Author,
I am trying to run easyfinemap on my own summary statistics, but it fails to find model_search when I run the example data.
Could you please tell me where I can find the model_search program?
The error is "ValueError: model_search is not installed. Please install it first and make sure it is in your PATH."
I am using the Ubuntu subsystem on Windows 10.
Thank you,
Yask Gupta
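
The error message says the program must be on PATH, so a quick way to confirm what the shell can actually resolve is a small check like the one below. This is only a sketch: the helper name find_missing_tools is made up for illustration, and the only executable name taken from this issue is model_search (which, as far as I know, is one of the programs distributed with CAVIARBF; treat that as an assumption). It does not reproduce how easyfinemap itself performs the check.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be resolved on the current PATH."""
    # shutil.which returns the full path of an executable, or None if not found.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Only model_search is named in the error message; add other tools as needed.
    missing = find_missing_tools(["model_search"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found on PATH.")
```

If the name is reported as missing, the directory that contains the compiled binary probably still needs to be added to PATH in the shell that launches easyfinemap.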

Summary statistics file format issue

  • easy_finemap version: 0.4.1
  • Python version: Python 3.11.5
  • Operating System: Linux RedHat version 7.9

Description

I am getting an error, which I think comes from my summary statistics file, and I have not been able to fix it. Even after I munge the summary statistics with the smunger package, I get the same error. Here are all my files and the error message:

COMMAND:
easyfinemap fine-mapping \
  -m all \
  --ldref /broad/sankaranlab/arora/tools/ldsc/resources/1000G_Phase3_plinkfiles/1000G_EUR_Phase3_plink/1000G.EUR.QC.1-22 \
  --use-ref-eaf \
  --credible-threshold 0.95 \
  --credible-method susie \
  -n 1000 \
  /broad/sankaranlab/arora/mpn/GWAMA_METAL_combined_GWAMA-SS_easyfinemap_hg19.smunged.txt.gz \
  /broad/sankaranlab/arora/mpn/COJO/for_easyfinemap/cojo.5e-8.jma.hg19.conditional.loci.txt \
  /broad/sankaranlab/arora/mpn/COJO/for_easyfinemap/cojo.5e-8.jma.hg19.conditional.leadsnp.txt \
  /broad/sankaranlab/arora/mpn/COJO/for_easyfinemap/cojo.5e-8.jma.hg19.ALL

SUMMARY STATISTICS FILE
zcat GWAMA_METAL_combined_GWAMA-SS_easyfinemap_hg19.smunged.txt.gz | head
CHR BP rsID EA NEA EAF MAF BETA SE P
1 70352 rs555652149 T A 0.004428 0.004428 -0.997573 0.448486 0.026149
1 85988 rs531531651 T C 0.001328 0.001328 -0.198839 0.841732 0.81325
1 88370 rs185487977 A G 0.003638 0.003638 0.34252 0.515456 0.506388
1 98608 rs548107800 A G 0.002558 0.002558 0.065282 0.597614 0.912993
1 528642 rs537054240 A G 0.003415 0.003415 -0.225 0.5011 0.653444
1 533574 rs544131969 T C 0.001046 0.001046 -1.10173 0.832018 0.18543
1 541399 rs552127086 T A 0.002411 0.002411 -1.09891 0.614338 0.073659
1 544743 rs562172865 T C 0.002918 0.002918 0.7424 0.5141 0.148704
1 546344 rs780596509 A G 0.001886 0.001886 -0.427 0.6659 0.521387

LOCI FILE:
head cojo.5e-8.jma.hg19.conditional.loci.txt
CHR START END LEAD_SNP LEAD_SNP_P LEAD_SNP_BP
5 11419701 11419701 5-11419701.0-C-A 0.201063 11419701
5 17206608 17206608 5-17206608.0-G-A 0.00558534 17206608
9 4145624 4145624 9-4145624.0-T-C 0.779905 4145624
9 13379241 13379241 9-13379241.0-T-C 0.490938 13379241
5 11081654 11081654 5-11081654.0-T-C 0.113209 11081654
5 10630659 10630659 5-10630659.0-A-G 0.00639208 10630659
5 9624355 9624355 5-9624355.0-C-G 0.0170136 9624355
9 2007878 2007878 9-2007878.0-T-C 0.00973056 2007878
9 12780670 12780670 9-12780670.0-C-T 0.000438603 12780670

LEADSNP FILE:
head cojo.5e-8.jma.hg19.conditional.leadsnp.txt
SNPID CHR BP rsID EA NEA EAF MAF BETA SE P
5-11419701-C-A 5 11419701 rs10053037 A C 0.022854 0.977146 -0.100211 0.07838 0.201063
5-17206608-G-A 5 17206608 rs10076008 A G 0.237728 0.762272 -0.072253 0.026073 0.00558534
9-4145624-T-C 9 4145624 rs10117067 C T 0.65091 0.34909 -0.006423 0.022985 0.779905
9-13379241-T-C 9 13379241 rs10118696 C T 0.016663 0.983337 -0.067668 0.098238 0.490938
5-11081654-T-C 5 11081654 rs1015622 C T 0.597973 0.402027 0.035629 0.022494 0.113209
5-10630659-A-G 5 10630659 rs1026914 G A 0.234234 0.765766 -0.07137 0.026172 0.00639208
5-9624355-C-G 5 9624355 rs10513027 G C 0.2326 0.7674 -0.063092 0.026438 0.0170136
9-2007878-T-C 9 2007878 rs10738556 C T 0.531571 0.468429 -0.057809 0.022361 0.00973056
9-12780670-C-T 9 12780670 rs10809843 T C 0.880128 0.119872 0.120504 0.034276 0.000438603

──────────────────────────────────────────────────────────────────── EasyFinemap ─────────────────────────────────────────────────────────────────────
Version: 0.4.1
Author: Jianhua Wang
Email: [email protected]
[20:31:15] INFO io - Loading summary statistics from /broad/sankaranlab/arora/mpn/GWAMA_METAL_combined_GWAMA-SS_easyfinemap_hg19.smunged.txt.gz
for 5:11419701-11419701
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/unix/arora/.conda/envs/easyfinemap-dev/lib/python3.11/site-packages/easyfinemap/cli.py:170 │
│ in fine_mapping │
│ │
│ 167 │ │ # sumstats = pd.read_csv(sumstats_path, sep="\t") │
│ 168 │ │ loci = pd.read_csv(loci_path, sep="\t") │
│ 169 │ │ lead_snps = pd.read_csv(lead_snps_path, sep="\t") │
│ ❱ 170 │ │ EasyFinemap().finemap_all_loci( │
│ 171 │ │ │ sumstats=sumstats_path, │
│ 172 │ │ │ loci=loci, │
│ 173 │ │ │ lead_snps=lead_snps, │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ cond_snps_wind_kb = 10000 │ │
│ │ conditional = False │ │
│ │ credible_method = 'susie' │ │
│ │ credible_threshold = 0.95 │ │
│ │ ldref = '/broad/sankaranlab/arora/tools/ldsc/resources/1000G_Phase3_plinkfiles… │ │
│ │ lead_snps = │ │ │ │ SNPID CHR BP rsID EA NEA EAF │ │
│ │ MAF BETA SE P │ │
│ │ 0 5-11419701-C-A 5 11419701 rs10053037 A C 0.022854 │ │
│ │ 0.977146 -0.100211 0.078380 2.010630e-01 │ │
│ │ 1 5-17206608-G-A 5 17206608 rs10076008 A G 0.237728 │ │
│ │ 0.762272 -0.072253 0.026073 5.585340e-03 │ │
│ │ 2 9-4145624-T-C 9 4145624 rs10117067 C T 0.650910 │ │
│ │ 0.349090 -0.006423 0.022985 7.799050e-01 │ │
│ │ 3 9-13379241-T-C 9 13379241 rs10118696 C T 0.016663 │ │
│ │ 0.983337 -0.067668 0.098238 4.909380e-01 │ │
│ │ 4 5-11081654-T-C 5 11081654 rs1015622 C T 0.597973 │ │
│ │ 0.402027 0.035629 0.022494 1.132090e-01 │ │
│ │ .. ... ... ... ... .. .. ... │ │
│ │ ... ... ... ... │ │
│ │ 191 16-33432030-G-T 16 33432030 rs8053513 T G 0.428571 │ │
│ │ 0.571429 -0.039137 0.038911 3.145080e-01 │ │
│ │ 192 9-1053465-A-C 9 1053465 rs869939 C A 0.051109 │ │
│ │ 0.948891 0.071603 0.050573 1.568240e-01 │ │
│ │ 193 9-3390735-C-T 9 3390735 rs9298953 T C 0.947515 │ │
│ │ 0.052485 -0.097088 0.059911 1.051170e-01 │ │
│ │ 194 5-6377832-A-G 5 6377832 rs9313147 G A 0.480425 │ │
│ │ 0.519575 0.063812 0.022189 4.029550e-03 │ │
│ │ 195 12-54596827-C-T 12 54596827 rs9739733 T C 0.099889 │ │
│ │ 0.900111 0.223902 0.038443 5.736650e-09 │ │
│ │ │ │
│ │ [196 rows x 11 columns] │ │
│ │ lead_snps_path = '/broad/sankaranlab/arora/mpn/COJO/for_easyfinemap/cojo.5e-8.jma.hg19.… │ │
│ │ loci = │ CHR START END LEAD_SNP LEAD_SNP_P │ │
│ │ LEAD_SNP_BP │ │
│ │ 0 5 11419701 11419701 5-11419701-C-A 2.010630e-01 │ │
│ │ 11419701 │ │
│ │ 1 5 17206608 17206608 5-17206608-G-A 5.585340e-03 │ │
│ │ 17206608 │ │
│ │ 2 9 4145624 4145624 9-4145624-T-C 7.799050e-01 │ │
│ │ 4145624 │ │
│ │ 3 9 13379241 13379241 9-13379241-T-C 4.909380e-01 │ │
│ │ 13379241 │ │
│ │ 4 5 11081654 11081654 5-11081654-T-C 1.132090e-01 │ │
│ │ 11081654 │ │
│ │ .. ... ... ... ... ... │ │
│ │ ... │ │
│ │ 191 16 33432030 33432030 16-33432030-G-T 3.145080e-01 │ │
│ │ 33432030 │ │
│ │ 192 9 1053465 1053465 9-1053465-A-C 1.568240e-01 │ │
│ │ 1053465 │ │
│ │ 193 9 3390735 3390735 9-3390735-C-T 1.051170e-01 │ │
│ │ 3390735 │ │
│ │ 194 5 6377832 6377832 5-6377832-A-G 4.029550e-03 │ │
│ │ 6377832 │ │
│ │ 195 12 54596827 54596827 12-54596827-C-T 5.736650e-09 │ │
│ │ 54596827 │ │
│ │ │ │
│ │ [196 rows x 6 columns] │ │
│ │ loci_path = '/broad/sankaranlab/arora/mpn/COJO/for_easyfinemap/cojo.5e-8.jma.hg19.… │ │
│ │ max_causal = 1 │ │
│ │ methods = [<FinemapMethod.all: 'all'>] │ │
│ │ outfile = '/broad/sankaranlab/arora/mpn/COJO/for_easyfinemap/cojo.5e-8.jma.hg19.… │ │
│ │ prior_file = None │ │
│ │ sample_size = 1000 │ │
│ │ sumstats_path = '/broad/sankaranlab/arora/mpn/GWAMA_METAL_combined_GWAMA-SS_easyfinema… │ │
│ │ threads = 1 │ │
│ │ use_ref_EAF = True │ │
│ │ var_prior = 0.2 │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /home/unix/arora/.conda/envs/easyfinemap-dev/lib/python3.11/site-packages/easyfinemap/easyfinema │
│ p.py:770 in finemap_all_loci │
│ │
│ 767 │ │ │ credible_method = methods[0] │
│ 768 │ │ kwargs_list = [] │
│ 769 │ │ for chrom, start, end, lead_snp in loci[[ColName.CHR, ColName.START, ColName.END │
│ ❱ 770 │ │ │ locus_sumstats = sg.export_sumstats(sumstats, chrom, start, end) │
│ 771 │ │ │ locus_sumstats = sg.make_SNPID_unique(locus_sumstats, ColName.CHR, ColName.B │
│ 772 │ │ │ kwargs = { │
│ 773 │ │ │ │ "sumstats": locus_sumstats, │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ chrom = 5 │ │
│ │ cond_snps_wind_kb = 10000 │ │
│ │ conditional = False │ │
│ │ credible_method = 'susie' │ │
│ │ credible_threshold = 0.95 │ │
│ │ end = 11419701 │ │
│ │ kwargs_list = [] │ │
│ │ ldref = '/broad/sankaranlab/arora/tools/ldsc/resources/1000G_Phase3_plinkfiles… │ │
│ │ lead_snp = '5-11419701.0-C-A' │ │
│ │ lead_snps = │ │ │ │ SNPID CHR BP rsID EA NEA EAF │ │
│ │ MAF BETA SE P │ │
│ │ 0 5-11419701-C-A 5 11419701 rs10053037 A C 0.022854 │ │
│ │ 0.977146 -0.100211 0.078380 2.010630e-01 │ │
│ │ 1 5-17206608-G-A 5 17206608 rs10076008 A G 0.237728 │ │
│ │ 0.762272 -0.072253 0.026073 5.585340e-03 │ │
│ │ 2 9-4145624-T-C 9 4145624 rs10117067 C T 0.650910 │ │
│ │ 0.349090 -0.006423 0.022985 7.799050e-01 │ │
│ │ 3 9-13379241-T-C 9 13379241 rs10118696 C T 0.016663 │ │
│ │ 0.983337 -0.067668 0.098238 4.909380e-01 │ │
│ │ 4 5-11081654-T-C 5 11081654 rs1015622 C T 0.597973 │ │
│ │ 0.402027 0.035629 0.022494 1.132090e-01 │ │
│ │ .. ... ... ... ... .. .. ... │ │
│ │ ... ... ... ... │ │
│ │ 191 16-33432030-G-T 16 33432030 rs8053513 T G 0.428571 │ │
│ │ 0.571429 -0.039137 0.038911 3.145080e-01 │ │
│ │ 192 9-1053465-A-C 9 1053465 rs869939 C A 0.051109 │ │
│ │ 0.948891 0.071603 0.050573 1.568240e-01 │ │
│ │ 193 9-3390735-C-T 9 3390735 rs9298953 T C 0.947515 │ │
│ │ 0.052485 -0.097088 0.059911 1.051170e-01 │ │
│ │ 194 5-6377832-A-G 5 6377832 rs9313147 G A 0.480425 │ │
│ │ 0.519575 0.063812 0.022189 4.029550e-03 │ │
│ │ 195 12-54596827-C-T 12 54596827 rs9739733 T C 0.099889 │ │
│ │ 0.900111 0.223902 0.038443 5.736650e-09 │ │
│ │ │ │
│ │ [196 rows x 11 columns] │ │
│ │ loci = │ CHR START END LEAD_SNP LEAD_SNP_P │ │
│ │ LEAD_SNP_BP │ │
│ │ 0 5 11419701 11419701 5-11419701-C-A 2.010630e-01 │ │
│ │ 11419701 │ │
│ │ 1 5 17206608 17206608 5-17206608-G-A 5.585340e-03 │ │
│ │ 17206608 │ │
│ │ 2 9 4145624 4145624 9-4145624-T-C 7.799050e-01 │ │
│ │ 4145624 │ │
│ │ 3 9 13379241 13379241 9-13379241-T-C 4.909380e-01 │ │
│ │ 13379241 │ │
│ │ 4 5 11081654 11081654 5-11081654-T-C 1.132090e-01 │ │
│ │ 11081654 │ │
│ │ .. ... ... ... ... ... │ │
│ │ ... │ │
│ │ 191 16 33432030 33432030 16-33432030-G-T 3.145080e-01 │ │
│ │ 33432030 │ │
│ │ 192 9 1053465 1053465 9-1053465-A-C 1.568240e-01 │ │
│ │ 1053465 │ │
│ │ 193 9 3390735 3390735 9-3390735-C-T 1.051170e-01 │ │
│ │ 3390735 │ │
│ │ 194 5 6377832 6377832 5-6377832-A-G 4.029550e-03 │ │
│ │ 6377832 │ │
│ │ 195 12 54596827 54596827 12-54596827-C-T 5.736650e-09 │ │
│ │ 54596827 │ │
│ │ │ │
│ │ [196 rows x 6 columns] │ │
│ │ max_causal = 1 │ │
│ │ methods = [<FinemapMethod.all: 'all'>] │ │
│ │ outfile = '/broad/sankaranlab/arora/mpn/COJO/for_easyfinemap/cojo.5e-8.jma.hg19.… │ │
│ │ prior_file = None │ │
│ │ sample_size = 1000 │ │
│ │ self = <easyfinemap.easyfinemap.EasyFinemap object at 0x2ac0c19ebb50> │ │
│ │ start = 11419701 │ │
│ │ sumstats = '/broad/sankaranlab/arora/mpn/GWAMA_METAL_combined_GWAMA-SS_easyfinema… │ │
│ │ threads = 1 │ │
│ │ use_ref_EAF = True │ │
│ │ var_prior = 0.2 │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /home/unix/arora/.conda/envs/easyfinemap-dev/lib/python3.11/site-packages/smunger/io.py:144 in │
│ export_sumstats │
│ │
│ 141 │ │ else: │
│ 142 │ │ │ tb = tabix.open(filename) │
│ 143 │ │ │ indf = pd.DataFrame(columns=ColName.OUTCOLS, data=tb.query(str(chrom), start │
│ ❱ 144 │ │ │ indf = munge(indf) │
│ 145 │ else: │
│ 146 │ │ logger.info(f'Loading summary statistics from {filename}') │
│ 147 │ │ indf = load_sumstats(filename) │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ bgzipped = True │ │
│ │ chrom = 5 │ │
│ │ end = 11419701 │ │
│ │ filename = '/broad/sankaranlab/arora/mpn/GWAMA_METAL_combined_GWAMA-SS_easyfinemap_hg… │ │
│ │ indf = Empty DataFrame │ │
│ │ Columns: [CHR, BP, rsID, EA, NEA, EAF, MAF, BETA, SE, P] │ │
│ │ Index: [] │ │
│ │ out_filename = None │ │
│ │ rename_headers = None │ │
│ │ start = 11419701 │ │
│ │ tb = <tabix │ │
│ │ fn="/broad/sankaranlab/arora/mpn/GWAMA_METAL_combined_GWAMA-SS_easyfinemap… │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /home/unix/arora/.conda/envs/easyfinemap-dev/lib/python3.11/site-packages/smunger/smunger.py:134 │
│ in munge │
│ │
│ 131 │ │ after_n = outdf.shape[0] │
│ 132 │ │ logger.debug(f"Remove {pre_n - after_n} duplicated SNPs.") │
│ 133 │ else: │
│ ❱ 134 │ │ raise ValueError("Missing CHR, BP, EA or NEA column.") │
│ 135 │ │
│ 136 │ # outdf = munge_rsid(outdf) │
│ 137 │ if ColName.BETA in outdf.columns and ColName.SE in outdf.columns: │
│ │
│ ╭───────────────────────────── locals ─────────────────────────────╮ │
│ │ df = Empty DataFrame │ │
│ │ Columns: [CHR, BP, rsID, EA, NEA, EAF, MAF, BETA, SE, P] │ │
│ │ Index: [] │ │
│ │ outdf = Empty DataFrame │ │
│ │ Columns: [] │ │
│ │ Index: [] │ │
│ ╰──────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Missing CHR, BP, EA or NEA column.
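
A note on reading the traceback: the tabix query for 5:11419701-11419701 returned an empty DataFrame, and munge raised only after that, so the column names themselves look fine. Two things in the posted loci file stand out and may be worth checking, although nothing in this thread confirms them as the cause: START equals END in every row shown, so each query window covers a single position, and the LEAD_SNP values carry a float-formatted position (for example 5-11419701.0-C-A). The sketch below flags both patterns using only the standard library. The function name check_loci is hypothetical, and the tab delimiter is assumed from the pd.read_csv(loci_path, sep="\t") call visible in the traceback.

```python
import csv
import io


def check_loci(handle):
    """Return (line_number, problem) pairs for suspicious rows in a loci file."""
    problems = []
    # Line 1 is the header, so the first data row is line 2.
    for line_no, row in enumerate(csv.DictReader(handle, delimiter="\t"), start=2):
        start, end = int(float(row["START"])), int(float(row["END"]))
        if end <= start:
            problems.append((line_no, f"zero-width window {row['CHR']}:{start}-{end}"))
        if ".0-" in row["LEAD_SNP"]:
            problems.append(
                (line_no, f"float-formatted position in LEAD_SNP {row['LEAD_SNP']}")
            )
    return problems


# First row of the loci file posted above, rebuilt as tab-separated text.
sample = (
    "CHR\tSTART\tEND\tLEAD_SNP\tLEAD_SNP_P\tLEAD_SNP_BP\n"
    "5\t11419701\t11419701\t5-11419701.0-C-A\t0.201063\t11419701\n"
)
problems = check_loci(io.StringIO(sample))
for line_no, problem in problems:
    print(f"line {line_no}: {problem}")
```

The posted head of the summary statistics file only shows chromosome 1, so it is also worth confirming separately that the indexed file contains any variants at the queried positions.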
