
Comments (14)

omerwe commented on August 31, 2024

Hi,

Can you please let me know whether you managed to run the tl;dr example from the main GitHub page? If you can, we need to figure out what differs between my example data and the data that you're using. Can you please send me the exact command that you used to generate this output?

Thanks,

Omer

from s-pcgc.

lianyunhuang commented on August 31, 2024

Hey Omer,

Thanks a lot for your reply. The toy example from the main page runs fine. How should I show you my data structure? Would the head of each file be enough? And which files do you need to see?

As for the command, the error occurs in the step that calculates h2 and genetic correlation (between Case/Control of phenotype A2 in my case). The main command is:

python $dir_softpcgc/pcgc_main.py \
    --annot-chr $dir_data/baselineLD. \
    --sync $dir_data/baselineLD. \
    --sumstats-chr $dir_data/Case_A2.chr,$dir_data/Control_A2.chr \
    --prodr2-chr $dir_data/baselineLD.goodSNPs. \
    --out $wdir/pcgc

Thanks!
Lianyun


omerwe commented on August 31, 2024

Hi Lianyun,

Since the example data works well, there must be something off in your input files. Could you send me a small sample (just the first few lines) of each of these files, so that I can try to figure out what's wrong? I'll also update the code to give a more meaningful error message if this happens in the future. If it's ok, please send these to [email protected]

Thanks,

Omer


lianyunhuang commented on August 31, 2024

Hey Omer,

I've sent you the email. Thanks! :))

Lianyun


omerwe commented on August 31, 2024

Hi Lianyun,

Thanks for sending me the files. It looks like there's a problem in the .prodr2 files: some of the annotations are missing from the header line (e.g. FetalDHS_Trynka). I also see some annotations that are only in the .prodr2 files (e.g. FetalDHS_TrynkaFetalDHS).

Do you have any idea how this happened? Maybe you used slightly different annotation files in different parts of the pipeline? If you're sure you haven't, can you please send me a small reproducible example that I can run from scratch (using e.g. small/fake files)?
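
One way to check for this kind of mismatch is to compare the annotation names in the two header lines directly. The following is only a rough sketch, not part of S-PCGC: the function names are hypothetical, the `base` annotation in the example is a placeholder, and `read_header` assumes an uncompressed, whitespace-delimited file whose first line holds the column names, which may not match your actual files.

```python
# Columns that describe the SNP itself rather than an annotation
# (CHR, BP, SNP and CM, as listed elsewhere in this thread).
META_COLUMNS = {"CHR", "BP", "SNP", "CM"}


def compare_annotation_names(annot_header, prodr2_header):
    """Return (only_in_annot, only_in_prodr2) as two sorted lists of names."""
    annot = set(annot_header) - META_COLUMNS
    prodr2 = set(prodr2_header) - META_COLUMNS
    return sorted(annot - prodr2), sorted(prodr2 - annot)


def read_header(path):
    """Read the column names from the first line of a text file.

    Assumption: the file is plain text and whitespace-delimited.
    """
    with open(path) as handle:
        return handle.readline().split()


# Made-up headers that mimic the mismatch described above.
annot_header = ["CHR", "BP", "SNP", "CM", "base", "FetalDHS_Trynka"]
prodr2_header = ["base", "FetalDHS_TrynkaFetalDHS"]

only_in_annot, only_in_prodr2 = compare_annotation_names(annot_header, prodr2_header)
print("only in annotation file:", only_in_annot)
print("only in .prodr2 file:", only_in_prodr2)
```

If both lists come back empty for every chromosome, the annotation names agree and the problem is more likely elsewhere.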

Thanks,

Omer


lianyunhuang commented on August 31, 2024

Hey Omer,

I see, that is quite interesting. I will check the whole procedure and maybe re-run it before sending you an example, which might take a while. I will let you know how it goes.

Thanks!

Best,
Lianyun


lianyunhuang commented on August 31, 2024

Hey Omer,

  1. I checked the annotations and they look fine. The prodr2 file is 97 x 97. The annotations match between the annot file and the prodr2 file, apart from 4 extra columns in the annot file (CHR, BP, SNP and CM). Maybe the difference you saw comes from a formatting problem in the files I sent.
  2. I re-ran step2 to generate the prodr2 file on another cluster and got the same prodr2 file as before.
  3. I'm now re-running step3 to generate the sumstats files, which might take a long time.
  4. If I send you a small example to run, how should I subset the data to make sure it includes all the necessary information?

Thanks!

Best,
Lianyun


omerwe commented on August 31, 2024

Hi Lianyun,

Thanks for the update. To help me understand, can you please say which of these annotations appeared in the original annotation files: (1) FetalDHS_Trynka; (2) FetalDHS_TrynkaFetalDHS; or (3) both?

I think the simplest option for you is to subset a small number of SNPs (e.g. 5000) and run the pipeline on only these SNPs. If you can reproduce the problem, I can work on files derived from these small files.
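
As a rough sketch of the subsetting step (not something shipped with S-PCGC; the function name and the fake `rs` IDs are made up for illustration), you could pick a reproducible random set of SNP positions and keep the IDs at those positions. Sampling by position rather than by ID leaves the IDs exactly as they appear in the input, so any oddities in the original files are not silently cleaned up.

```python
import random


def sample_positions(n_snps, n_keep=5000, seed=0):
    """Return sorted row positions of a reproducible random subset of SNPs."""
    if n_snps <= n_keep:
        return list(range(n_snps))
    rng = random.Random(seed)  # fixed seed so the same subset can be regenerated
    return sorted(rng.sample(range(n_snps), n_keep))


# Fake SNP IDs standing in for the SNP column of a real file.
ids = [f"rs{i}" for i in range(20000)]

positions = sample_positions(len(ids), n_keep=5000)
subset = [ids[i] for i in positions]  # original order is preserved
print(len(subset))
```

The resulting IDs can then be written one per line to a text file and used to restrict the genotype and annotation files before re-running each step.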

Thanks,

Omer


lianyunhuang commented on August 31, 2024

Hey Omer,

I've sent you the details of the annotations as well as the data link by email. Please check. Thanks!

Best,
Lianyun


lianyunhuang commented on August 31, 2024

Hey Omer,

Also, I just got a new error in step3 (creating the sumstats files):

Traceback (most recent call last):
  File "/softwares/spcgc/pcgc_sumstats_creator.py", line 590, in <module>
    sumstats_creator.compute_all_sumstats(args.chunk_size)
  File "/softwares/spcgc/pcgc_sumstats_creator.py", line 271, in compute_all_sumstats
    self.set_locus(snp1, snp2)
  File "/softwares/spcgc/pcgc_sumstats_creator.py", line 318, in set_locus
    snp_maf = self.mafs[snp1+j]
  File "/anaconda3/envs/xyb/lib/python3.8/site-packages/pandas/core/series.py", line 821, in __getitem__
    return self._values[key]
IndexError: index 116914 is out of bounds for axis 0 with size 116914

Best,
Lianyun


omerwe commented on August 31, 2024

Hi,

Apparently the problem was due to duplicate rsids in the input files. I modified the code to allow better handling of this situation. Can you please git pull the latest code and try again?
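
To check for duplicate rsids before running the pipeline, something along these lines may help. This is a minimal sketch, not the change that went into the repository: the function names are hypothetical, and it assumes you have already read the SNP ID column of a file into a Python list.

```python
from collections import Counter


def find_duplicate_ids(snp_ids):
    """Return a dict mapping each repeated SNP ID to its number of occurrences."""
    counts = Counter(snp_ids)
    return {snp: n for snp, n in counts.items() if n > 1}


def drop_duplicated_ids(snp_ids):
    """Keep only IDs that occur exactly once, in their original order."""
    counts = Counter(snp_ids)
    return [snp for snp in snp_ids if counts[snp] == 1]


# Made-up IDs for illustration.
ids = ["rs1", "rs2", "rs2", "rs3", "rs1", "rs4"]

print(find_duplicate_ids(ids))
print(drop_duplicated_ids(ids))
```

Note that `drop_duplicated_ids` removes every copy of a repeated ID rather than keeping the first one, which is the more conservative choice when it is unclear which copy is the correct one.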


lianyunhuang commented on August 31, 2024

Hi Omer,

Thanks a lot! I will try and let you know. :))

Lianyun


lianyunhuang commented on August 31, 2024

Hi Omer,

A quick update: the new code seems to be working well. I get the final result files, although with a lot of warning messages. I'm now running everything again on the data excluding duplicated rsids. I will let you know if there is any news.

Thanks a lot!

Lianyun


lianyunhuang commented on August 31, 2024

Hi Omer,

I've finished a new run on the same data and still get some weird results. I've emailed you the details. Please check.
Thanks a lot for your help!

Lianyun
