Ancestor: Simple ancestry analysis

Ancestor is a very simple library for doing ancestry analysis of a genotype dataset (e.g. 23andMe).

Users can run Ancestor either as a standalone command line tool or by importing it into their own Python code.

Requirements

The ancestry analysis relies on the 1000 Genomes dataset. The dataset has to be provided as an HDF5 file in a specific format.

To play around, PCs files, a test genotype, and the weights in HDF5 format (calculated from the above 1000 Genomes dataset) can be found in the https://github.com/TheHonestGene/ancestor/tree/master/tests/data folder.

How-To

There are 3 steps in the ancestry pipeline, the first of which is optional. Some of the steps only have to be done once, or once per genotyping platform and dataset. Each step maps to a subcommand of the command line script. To get information about the various subcommands, the user can run:

ancestor -h

Step 1 (Optional) - Convert weights file from CSV format to HDF5 format

This step is not required but can speed up the rest of the pipeline a bit. It also only has to be done once.

ancestor convert weights_file weights_file.hdf5

Step 2 - Calculating PC projections and admixture decomposition information for a reference panel

To run the admixture analysis and test whether an individual genotype belongs to a specific population, the PC projections and admixture decomposition for a reference genotype panel have to be calculated first. This has to be done once per genotyping platform/version and weights file. Any reference genotype panel can be used, but the most comprehensive one is the 1000 Genomes reference panel.

ancestor prepare 1000genomes.hdf5 weights.hdf5 1000_ref_pcs_file.hdf5 --ntmap 23andme_v4_nt_map.pickled

The --ntmap argument specifies the nucleotide map, which is specific to the genotyping platform/version. This file can be created with the imputor library.
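To illustrate what a PC projection is conceptually, the sketch below projects a single genotype onto principal components using precomputed per-SNP weights. This is not Ancestor's actual implementation: the function name `project_genotype`, the in-memory layout (one mean and one row of weights per SNP), and the toy numbers are all assumptions made for illustration.

```python
from typing import List, Sequence


def project_genotype(
    genotype: Sequence[float],
    snp_means: Sequence[float],
    snp_weights: Sequence[Sequence[float]],
) -> List[float]:
    """Project one genotype onto principal components (illustrative only).

    genotype    -- allele counts (0, 1 or 2), one per SNP
    snp_means   -- mean allele count per SNP in the reference panel (assumed layout)
    snp_weights -- one row per SNP, one column per PC (assumed layout)
    """
    if not (len(genotype) == len(snp_means) == len(snp_weights)):
        raise ValueError("genotype, means and weights must cover the same SNPs")

    num_pcs = len(snp_weights[0]) if snp_weights else 0
    pcs = [0.0] * num_pcs
    for count, mean, weights in zip(genotype, snp_means, snp_weights):
        centered = count - mean  # center on the reference panel mean
        for k in range(num_pcs):
            pcs[k] += centered * weights[k]
    return pcs


# Toy data: 3 SNPs, 2 PCs (made-up numbers, not real weights).
toy_genotype = [0, 1, 2]
toy_means = [1.0, 1.0, 1.0]
toy_weights = [[0.5, 0.0], [0.25, 1.0], [0.5, -1.0]]
toy_pcs = project_genotype(toy_genotype, toy_means, toy_weights)
print(toy_pcs)
```

Each PC value is a weighted sum of the centered allele counts, which is why the same weights file has to be used for both the reference panel and the individual genotype: only then do the two sets of projections live in the same coordinate system.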

Step 3 - Membership test and admixture analysis of individual genotype & plotting

To calculate the PC projections and admixture decomposition of an individual genotype, three files have to be provided: the individual's imputed genotype (imputed using the imputor library), the PCs file for the reference genotype panel that was generated in Step 2, and the weights file. Optionally, this step can also test membership in a population and plot the PCs.

ancestor pcs genome_imputed.hdf5 weights.hdf5 1000_ref_pcs_file.hdf5 --plot pc_plot.png --check GBR

The parameters --check and --plot are optional and are used for testing membership in a population and for plotting, respectively.
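The three steps can also be chained from a Python script by calling the command line tool. The sketch below only uses the subcommands and options shown above; the helper names `build_pipeline_commands` and `run_pipeline` and all file names in the example call are placeholders chosen for illustration, not part of Ancestor itself.

```python
import subprocess
from typing import List, Optional


def build_pipeline_commands(
    weights_csv: str,
    weights_hdf5: str,
    reference_panel: str,
    reference_pcs: str,
    nt_map: str,
    genotype: str,
    plot_file: Optional[str] = None,
    check_population: Optional[str] = None,
) -> List[List[str]]:
    """Return the argument lists for Step 1, Step 2 and Step 3, in order."""
    convert = ["ancestor", "convert", weights_csv, weights_hdf5]
    prepare = [
        "ancestor", "prepare", reference_panel, weights_hdf5, reference_pcs,
        "--ntmap", nt_map,
    ]
    pcs = ["ancestor", "pcs", genotype, weights_hdf5, reference_pcs]
    if plot_file is not None:
        pcs += ["--plot", plot_file]  # optional: plot the PCs
    if check_population is not None:
        pcs += ["--check", check_population]  # optional: membership test
    return [convert, prepare, pcs]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Build (but do not run) the commands; all file names are placeholders.
commands = build_pipeline_commands(
    weights_csv="weights_file",
    weights_hdf5="weights.hdf5",
    reference_panel="reference_panel.hdf5",
    reference_pcs="1000_ref_pcs_file.hdf5",
    nt_map="23andme_v4_nt_map.pickled",
    genotype="genome_imputed.hdf5",
    plot_file="pc_plot.png",
    check_population="GBR",
)
for command in commands:
    print(" ".join(command))
```

Since Step 1 and Step 2 only have to be done once, a script like this would typically call `run_pipeline(commands[:2])` one time and then rerun only the last command for each new genotype.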

Test

The test suite can be run with:

$ python setup.py test

Installation

Of course, the recommended installation method is pip:

$ pip install ancestor

Thank You

Thanks for checking this library out! We hope you find it useful.

Of course, there's always room for improvement. Feel free to open an issue so we can make it better.
