
slowkow / proxysnps

28 stars, 5 watchers, 9 forks, 191 KB

Get SNP proxies from the 1000 Genomes Project.

License: MIT License

Languages: R 98.98%, Shell 1.02%
Topics: bioinformatics, linkage-disequilibrium, snps, statistics, rstats

proxysnps's Introduction

proxysnps

proxysnps is an R package that implements functions to get proxy SNPs in linkage disequilibrium (LD) with a query SNP, using data from the 1000 Genomes Project.

library(proxysnps)
d <- get_proxies(query = "rs42")
plot(d$POS, d$R.squared, main="rs42", xlab="Position", ylab=bquote("R"^2))

[Figure: R² versus position for proxies of rs42]
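The data frame returned by get_proxies() includes at least the POS and R.squared columns used in the plot above. A minimal sketch for keeping only strong proxies, assuming nothing about the output beyond those two columns; filter_proxies() is a hypothetical helper (not part of proxysnps), the 0.8 threshold is a common but arbitrary choice, and the toy data frame stands in for real get_proxies() output.

```r
# Hypothetical helper (not part of proxysnps): keep rows whose R.squared
# meets a threshold, sorted from strongest to weakest LD.
filter_proxies <- function(d, min_r2 = 0.8) {
  stopifnot(is.data.frame(d), all(c("POS", "R.squared") %in% names(d)))
  keep <- !is.na(d$R.squared) & d$R.squared >= min_r2  # drop missing values
  out <- d[keep, , drop = FALSE]
  out[order(-out$R.squared, out$POS), , drop = FALSE]
}

# Toy stand-in for get_proxies() output (made-up values, illustration only)
d <- data.frame(
  POS       = c(100, 250, 400, 550),
  R.squared = c(1.00, 0.35, 0.92, NA)
)
proxies <- filter_proxies(d, min_r2 = 0.8)
print(proxies)
```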

Usage

See the vignette for more usage examples.

Installation

install.packages("devtools")
devtools::install_github("slowkow/proxysnps")

Data

This package provides easy access to 1000 Genomes Project VCF files that have been filtered by Brian Browning, available here.

Contributing

Please submit an issue to report bugs or ask questions.

Please contribute bug fixes or new features with a pull request to this repository.

Related work

HaploReg

HaploReg is a tool for exploring annotations of the noncoding genome at variants on haplotype blocks, such as candidate regulatory SNPs at disease-associated loci. Using LD information from the 1000 Genomes Project, linked SNPs and small indels can be visualized along with chromatin state and protein binding annotation from the Roadmap Epigenomics and ENCODE projects, sequence conservation across mammals, the effect of SNPs on regulatory motifs, and the effect of SNPs on expression from eQTL studies.

LDheatmap

Produces a graphical display, as a heat map, of measures of pairwise linkage disequilibria between SNPs. Users may optionally include the physical locations or genetic map distances of each SNP on the plot.

Also see the Statistical Genetics CRAN Task View for additional R packages.

LocusZoom

LocusZoom is a tool to plot regional association results from genome-wide association scans or candidate gene studies.

SNAP

SNAP is a computer program and web-based service for the rapid retrieval of linkage disequilibrium proxy SNP results given input of one or more query SNPs and based on empirical observations from the International HapMap Project and the 1000 Genomes Project.

SNPsnap

The SNPsnap Web server enables SNP-based enrichment analysis by providing matched sets of SNPs that can be used to calibrate background expectations. Specifically, SNPsnap efficiently identifies sets of randomly drawn SNPs that are matched to a set of query SNPs based on allele frequency, number of SNPs in LD, distance to nearest gene and gene density.

Tagger

Tagger is a tool for the selection and evaluation of tag SNPs from genotype data such as that from the International HapMap Project. It combines the simplicity of pairwise tagging methods with the efficiency benefits of multimarker haplotype approaches.

proxysnps's People

Contributors

slowkow


proxysnps's Issues

Feature request: signed r

Hi,

Would it be possible to include the signed r values along with r-squared? That would make it easier to plot LD heat maps.

Thanks!

Sander
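For two biallelic variants, the signed r is D / sqrt(pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB, and r² is its square. It equals the Pearson correlation of the allele indicators across haplotypes, and its sign depends on which allele is coded 1. A minimal base-R sketch, assuming phased haplotypes are available as 0/1 vectors with one entry per haplotype; ld_r() is a hypothetical helper, not a proxysnps function.

```r
# Hypothetical helper: signed LD r between two biallelic variants.
# h1, h2: 0/1 allele codes, one entry per phased haplotype.
# Returns NaN if either variant is monomorphic (zero denominator).
ld_r <- function(h1, h2) {
  stopifnot(length(h1) == length(h2), all(c(h1, h2) %in% c(0, 1)))
  p1  <- mean(h1)                 # frequency of allele coded 1 at variant 1
  p2  <- mean(h2)                 # frequency of allele coded 1 at variant 2
  p12 <- mean(h1 == 1 & h2 == 1)  # frequency of the 1-1 haplotype
  D   <- p12 - p1 * p2            # LD coefficient D
  D / sqrt(p1 * (1 - p1) * p2 * (1 - p2))
}

h1 <- c(1, 1, 0, 0, 1, 0, 1, 0)
h2 <- c(1, 1, 0, 0, 0, 1, 1, 0)
r  <- ld_r(h1, h2)   # signed r
r2 <- r^2            # the unsigned r-squared
```

Recoding one variant (1 - h1) flips the sign of r but leaves r² unchanged, which is why r² alone cannot recover the sign.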

Feature request: add in alternative references

Hi,

It would be great to have a detailed description of what is needed to get this package to work with alternative reference panels, for instance in-house references, HRC, or UK Biobank.

Thanks and best,

Sander

Memory usage

Hi,

I need to find proxies (r2 > 0.8) for about 8000 SNPs from about 300 genomic regions (each region defined so that the distance between consecutive SNPs is less than 1,000,000 bases). Calling get_proxies for each SNP in a for-loop or with apply requires a huge amount of memory (10 GB is reached after about 50 SNPs). Apparently get_proxies calls get_vcf, which downloads large data files from the web.

Is there any way to free memory after each SNP? Or should I download all the required data in advance and store it locally? If so, how would I then run get_proxies?

Or would you suggest a better way of finding the proxies?
SNAP proxy search only has the 1000 Genomes pilot data.
LDlink does not appear suitable for this many SNPs.
Both restrict the width of the search region.

Best wishes

/tm
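The issue above defines a region as a run of SNPs in which consecutive SNPs are less than 1,000,000 bases apart. One possible way to cut repeated downloads, assuming the data for each region is then fetched once (for example, downloaded and stored locally, as the issue suggests) instead of once per SNP, is to assign region IDs first. A base-R sketch; assign_regions() and the toy SNP table are hypothetical.

```r
# Hypothetical helper: group SNPs into regions where consecutive SNPs
# (sorted by chromosome, then position) are less than max_gap bases apart.
assign_regions <- function(chrom, pos, max_gap = 1e6) {
  stopifnot(length(chrom) == length(pos))
  ord <- order(chrom, pos)
  ch  <- chrom[ord]
  ps  <- pos[ord]
  # A new region starts at the first SNP, on a chromosome change,
  # or when the gap to the previous SNP is >= max_gap.
  new_region <- c(TRUE, ch[-1] != ch[-length(ch)] | diff(ps) >= max_gap)
  region <- integer(length(pos))
  region[ord] <- cumsum(new_region)  # map IDs back to the input order
  region
}

# Toy SNP table (made-up positions)
snps <- data.frame(
  chrom = c("1", "1", "1", "2", "1"),
  pos   = c(1000000, 1500000, 3500000, 1200000, 2400000)
)
snps$region <- assign_regions(snps$chrom, snps$pos)
print(snps)
```

Here the first chromosome 1 SNPs at 1.0, 1.5 and 2.4 Mb share a region, the SNP at 3.5 Mb starts a new one (gap of 1.1 Mb), and the chromosome 2 SNP gets its own region.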

missing value

Hello,

I get this error when checking proxies for one SNP:

Error in if (!is.numeric(start) || start < 1) { :
missing value where TRUE/FALSE needed

Any suggestion?

Thanks
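In R, if() fails with this message whenever its condition evaluates to NA. If start is a numeric NA (one plausible cause is a position that could not be resolved for the query SNP), then start < 1 is NA and FALSE || NA is also NA. A hedged sketch of a stricter check; is_valid_start() is hypothetical and is not the package's code.

```r
# Hypothetical guard: TRUE only for a single, non-missing number >= 1.
# && short-circuits, so an NA never reaches the comparison.
is_valid_start <- function(start) {
  is.numeric(start) && length(start) == 1 && !is.na(start) && start >= 1
}

start <- NA_real_
# The check from the error message evaluates to NA, which makes if() fail:
original_check <- !is.numeric(start) || start < 1   # NA
valid <- is_valid_start(start)                      # FALSE
```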

Selecting correct query info when multiple hits returned

In main function:

v <- myvariant::queryVariant(query)
if (v$total > 0) {
  chrom <- v$hits$dbsnp$chrom[1]
  pos <- v$hits$dbsnp$hg19$start[1]
}

Querying SNP 'rs6025' returns 4 results: the first is "NA" in all fields, and the third is the correct entry. Instead of selecting v$hits$dbsnp$chrom[1], it would probably be better to select the first entry that is not "NA"; otherwise the program cannot return the correct information.
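The suggestion above can be sketched as picking the first hit whose chromosome and position are both present. A minimal base-R version, assuming v$hits$dbsnp$chrom and v$hits$dbsnp$hg19$start can be extracted as parallel vectors as in the snippet; first_complete_hit() and the toy vectors are hypothetical, not real myvariant output.

```r
# Hypothetical helper: index of the first hit with both chrom and pos present,
# or NA_integer_ if no hit is complete.
first_complete_hit <- function(chrom, pos) {
  ok  <- !is.na(chrom) & !is.na(pos)
  idx <- which(ok)
  if (length(idx) == 0) NA_integer_ else idx[1]
}

# Toy stand-ins for v$hits$dbsnp$chrom and v$hits$dbsnp$hg19$start
chroms <- c(NA, NA, "1", "1")
starts <- c(NA, NA, 1000, 2000)
i     <- first_complete_hit(chroms, starts)
chrom <- chroms[i]
pos   <- starts[i]
```

Returning NA when no hit is complete lets the caller stop with a clear message instead of passing an NA position further down.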

Source files lost?

2018-04-25 14.15 EEST
Running:
d <- get_proxies(chrom = "12", pos = 583090, window_size = 1e5, pop = "AFR")
gives this error message:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input

get_proxies() worked fine about 24 hours earlier.

error : no package called 'myvariant'

Hi,

I'm getting the following error when using proxysnps in RStudio:

d <- get_proxies(query="rs17216707")
#> Error in loadNamespace(name) : there is no package called ‘myvariant’

Any thoughts?

Thank you!

Cassy

INDELs are not queried properly

Hi,

I noticed that when I query an indel (e.g. get_proxies(query = "rs148722713")), get_proxies() queries the wrong SNP (rs1467204) and still returns a set of variants and LD values for that wrong variant.

Thanks,
Yang

Window size

Hi,

Just for clarification: does the window size mean ± xxxx kb around the index variant?

Thanks!

Sander

Could not resolve host: tabix.iobio.io

Hi,

When I ran your example, I got an error message. Please see below:

library(proxysnps)
d <- get_proxies(query = "rs42")
Error in function (type, msg, asError = TRUE)  : 
  Could not resolve host: tabix.iobio.io

Error with tabix.iobio.io

Hi,

I am getting this error...

Could not resolve host: tabix.iobio.io

... when running this code:

d <- get_proxies(chrom = "12", pos = 583090, window_size = 1e5, pop = "AFR")

How can I fix this?

Best,

Sander
