diazale / 1kgp_dimred

Interactive demonstration of how to use PCA, t-SNE, and UMAP on genotype data from the 1000 Genomes Project (1KGP).


1kgp_dimred's Introduction

1KGP_dimred

For the related manuscript, see Diaz-Papkovich, Alex, Luke Anderson-Trocmé, Chief Ben-Eghan, and Simon Gravel. "UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts." PLoS Genetics 15.11 (2019): e1008432. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008432

1kgp_dimred's Issues

Number of samples in 1000 genomes

Your preprint says

The 1KGP contains genotype data of 3,450 individuals from 26 relatively distinct labeled populations

and when I run your script that's what I get too. At the same time, https://www.nature.com/articles/nature15393 says

Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations

Why do the two counts differ (2,504 vs. 3,450)?
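
One way to check which of the two figures a particular file supports is to count the sample columns in its header. The following is a minimal sketch, assuming a standard tab-delimited VCF in which the `#CHROM` line carries nine fixed columns followed by one column per sample. The function names are illustrative and do not come from the repository.

```python
import gzip
import io


def vcf_sample_ids(handle):
    """Return the sample IDs listed on the #CHROM header line of a VCF."""
    for line in handle:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed (CHROM through FORMAT);
            # every column after them names one sample.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def count_samples(path):
    """Count samples in a plain or gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return len(vcf_sample_ids(handle))


# Tiny in-memory VCF used only to illustrate the call.
demo = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00096\tHG00097\n"
    "1\t100\trs1\tA\tG\t.\t.\t.\tGT\t0/0\t0/1\n"
)
demo_ids = vcf_sample_ids(demo)
```

Running `count_samples` on the file the notebook loads, and again on the file behind the other publication, would show whether the two counts come from different sample sets.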

Adding a personal genome omni-chip individual

Hi,

I read the tweet about 1KGP_dimred (https://twitter.com/adp_diaz/status/1044235174718951425) and would like to know how difficult it would be to do the following:

(1) Use the Omni file, since the personal genome I want to add is from MyHeritageDNA, which uses Illumina's 700K Omni chip. I presume it is the file below:
http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/hd_genotype_chip/ALL.chip.omni_broad_sanger_combined.20140818.snps.genotypes.vcf.gz
Would that file load if substituted for the one you point to in the ipynb?
(2) Add more individuals to the .vcf.gz file:
Given the format needed for the ipynb, what would be the best way to add extra individuals to the file, re-index, append the file to the .panel and tabular info files, and run the ipynb again?
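
The second step can be sketched in plain Python. This is a rough illustration, not the repository's method: it assumes the VCF's FORMAT column is `GT` only, that the new individual's genotypes are already encoded against the same REF/ALT alleles and strand as the reference file, and that the panel file is tab-separated with the columns sample, pop, super_pop, gender. The names `add_sample` and `panel_row`, and the sample ID `ME01`, are hypothetical.

```python
def add_sample(vcf_lines, sample_id, genotypes):
    """Yield VCF lines with one extra sample column appended.

    `genotypes` maps (chrom, pos) to a genotype string such as "0/1".
    Sites missing from the mapping are written as "./." (no call).
    Assumes every record uses FORMAT = GT.
    """
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
        elif line.startswith("#CHROM"):
            yield line + "\t" + sample_id  # register the new sample
        elif line:
            fields = line.split("\t")
            key = (fields[0], int(fields[1]))
            yield line + "\t" + genotypes.get(key, "./.")


def panel_row(sample_id, pop, super_pop, gender):
    """Build one tab-separated panel row (column order is an assumption)."""
    return "\t".join([sample_id, pop, super_pop, gender])


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00096"
example = [
    "##fileformat=VCFv4.1",
    header,
    "1\t100\trs1\tA\tG\t.\t.\t.\tGT\t0/0",
    "1\t200\trs2\tC\tT\t.\t.\t.\tGT\t0/1",
]
merged = list(add_sample(example, "ME01", {("1", 100): "1/1"}))
```

The output would still need to be compressed and indexed before a reader that expects an indexed `.vcf.gz` can load it. For real data, a dedicated tool such as `bcftools merge` is likely a safer choice than hand-editing columns, since it handles allele matching and multi-field FORMAT entries.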

What are the parameters for Figure S2

I tried to reproduce the results in Figure S2 (number of principal components = 15). However, my results differ from those shown in the supporting materials. Could you provide the specific parameters you used for UMAP?

Below is my experiment configuration:
I first applied PCA to the transposed_genotype_matrix to get the proj_pca. Then I applied UMAP to proj_pca[:,:15] with n_components=2, n_neighbors=15, min_dist=0.5.
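
The configuration described above can be written out as follows. This is a sketch of the setup as stated in the issue, not the authors' confirmed parameters for Figure S2. It assumes scikit-learn and umap-learn are installed and that the input matrix has one row per sample. The `random_state` value is a hypothetical addition: UMAP is stochastic, so two runs with identical parameters can produce visibly different layouts unless the seed is fixed.

```python
# Parameters quoted in the issue; the seed is an added assumption.
SEED = 42  # hypothetical value, chosen only to make runs repeatable
PCA_COMPONENTS = 15
UMAP_PARAMS = {
    "n_components": 2,
    "n_neighbors": 15,
    "min_dist": 0.5,
    "random_state": SEED,
}


def pca_then_umap(genotypes, n_pcs=PCA_COMPONENTS, umap_params=None):
    """Project samples onto the top principal components, then embed with UMAP.

    `genotypes` is a (samples x variants) array, matching the
    transposed genotype matrix mentioned in the issue.
    """
    # Imported here so the parameter block above can be inspected
    # without the heavier dependencies being installed.
    from sklearn.decomposition import PCA
    import umap

    params = dict(UMAP_PARAMS if umap_params is None else umap_params)
    # Keeping only n_pcs components matches slicing proj_pca[:, :n_pcs].
    proj_pca = PCA(n_components=n_pcs, random_state=SEED).fit_transform(genotypes)
    return umap.UMAP(**params).fit_transform(proj_pca)
```

Calling `pca_then_umap(transposed_genotype_matrix)` returns an array with one 2D coordinate per sample. Differences from a published figure may also come from library versions or from how the genotype matrix was filtered, so matching the seed alone does not guarantee an identical plot.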
