roshchupkin / pybgen

Light and fast pure python parser for bgen format (v.1.1; 1.2; 1.3)

pybgen's Introduction

A small Python library for reading the bgen format.

This parser is part of the HASE framework for fast HD GWAS analysis and provides only a basic API for reading and manipulating bgen data files. The examples below show how to load the data into Python for further analysis.

Support

  • bgen v1.1; 1.2; 1.3
  • Layout 1,2

Suitable for UK Biobank data

Want to convert to a more efficient data format? Check out HASE

Does not support

  • Ploidy > 2
  • Number of alleles > 2
  • Phased data

coming soon ...

Installation

  1. git clone https://github.com/roshchupkin/pybgen.git
  2. Add the path to the cloned repository to your Python search path:
    • export PYTHONPATH=$PYTHONPATH:{path to pybgen folder}
    • Or inside Python:
>> import sys
>> sys.path.append('{path to pybgen folder}')
>> import pybgen

Requirements

Python libraries:

  1. numpy
  2. bitarray
  3. zstd (needed for bgen v1.2). Many thanks to Sergey for the simple Python zstd library!
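
Before importing pybgen it can help to confirm that the libraries listed above are importable. The following is a minimal sketch, not part of pybgen: the helper name `missing_modules` is hypothetical, and it assumes each library is imported under the same name as it is listed here (`numpy`, `bitarray`, `zstd`).

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Import names are assumed to match the requirement list above.
REQUIRED = ['numpy', 'bitarray', 'zstd']
print(missing_modules(REQUIRED))
```

An empty list means all three libraries were found; any name that is printed still needs to be installed.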

Usage

You do not need bgen.bgi files. The parser works directly on plain bgen files and can build its own small index files.

Overview

>> import pybgen
>> B_test=pybgen.Bgen('example.bgen')
File zise is 665108 bytes
There are 199 variants
There are 500 individuals
Genotype block layout 2

>> B_test.info()
Name:example.bgen; N samples:500; N probes:199; Compression:zlib; Layout:2

>> B_test.get_indices()
>> B_test.probes_info.keys()[:10]
[u'RSID_2',
 u'RSID_3',
 u'RSID_4',
 u'RSID_5',
 u'RSID_6',
 u'RSID_7',
 u'RSID_8',
 u'RSID_9',
 u'RSID_10',
 u'RSID_11']

>> probe=B_test.read_probe(rsid='RSID_2')
>> probe.info()
Iden: SNPID_2, RSID: RSID_2, CHR: 1, POS: 2000, Alleles: OrderedDict([(1, [u'A']), (2, [u'G'])])

>> probe.get_genotypes(genotypes=True)
>> print probe.prob[:10]
[ 0.          0.          0.02780236  0.00863674  0.01736504  0.04968414
  0.02487179  0.93283081  0.03460688  0.01919559]

>> print probe.genotypes[:10]
[ 0.          0.06424146  0.08441421  0.9825744   0.08840936  0.14108266
  1.07330097  0.05413817  0.10858148  0.12307751]
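
The calls shown above can be repeated for several variants at once. The sketch below is one possible way to do that, under stated assumptions: the helper `collect_genotypes` is hypothetical and not part of pybgen, it uses only the `read_probe`, `get_genotypes` and `genotypes` names that appear in the session above, and it assumes `get_indices()` (or `load_indices()`) has already been called on the `Bgen` object.

```python
from collections import OrderedDict


def collect_genotypes(bgen, rsids):
    """Map each rsid to the genotype values of its probe.

    `bgen` is expected to behave like the pybgen.Bgen object in the
    session above, with its indices already built or loaded.
    """
    result = OrderedDict()
    for rsid in rsids:
        probe = bgen.read_probe(rsid=rsid)
        # Same call as in the overview; it fills probe.genotypes.
        probe.get_genotypes(genotypes=True)
        result[rsid] = probe.genotypes
    return result
```

For example, `collect_genotypes(B_test, ['RSID_2', 'RSID_3'])` would return one entry per requested rsid, in the order given. If the values are numpy arrays of equal length, they can then be stacked into a variants-by-samples matrix with `numpy.vstack`.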

Make indices file

>> B_test.get_indices()
>> B_test.save_indices('/home/username/bgen/')

This saves the index file 'example.bgen_ind.npy' to the chosen folder. Next time you can load the indices directly:

>> B_test=pybgen.Bgen('example.bgen')
>> B_test.load_indices('/home/username/bgen/example.bgen_ind.npy')

Running get_indices() does not take long, but saved indices can be quite useful when the same bgen files are used intensively.
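
The save and load steps can be combined so that the indices are built only once. This is a minimal sketch rather than a pybgen feature: the helper `load_or_build_indices` is hypothetical, and it assumes the index file is named after the bgen file with an `_ind.npy` suffix, as in the 'example.bgen_ind.npy' example, and that `save_indices` expects a folder path ending in a separator, as in the call shown earlier.

```python
import os


def load_or_build_indices(bgen, bgen_path, index_dir):
    """Load saved indices if present, otherwise build and save them.

    Returns 'loaded' or 'built' so the caller can see which path ran.
    """
    # Assumed naming convention: <bgen file name>_ind.npy
    index_file = os.path.join(index_dir,
                              os.path.basename(bgen_path) + '_ind.npy')
    if os.path.isfile(index_file):
        bgen.load_indices(index_file)
        return 'loaded'
    bgen.get_indices()
    # Keep a trailing separator, matching the save_indices() call above.
    bgen.save_indices(os.path.join(index_dir, ''))
    return 'built'
```

With the objects from the examples above, the call would be `load_or_build_indices(B_test, 'example.bgen', '/home/username/bgen/')`.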

Contacts

If you have any questions, suggestions, comments, or problems, do not hesitate to contact me!
