ghm17 / carrier-stat

License: MIT License

R 100.00%

Carrier statistic

Carrier statistic is a statistical framework to prioritize disease-related rare variants by integrating gene expression data.

Step 1. Computing the carrier statistic

Rscript step1_carrier_stat.R \
--genotype=GENOTYPE_PREFIX \
--variants=VARIANTS_PREFIX \
--rna=RNA_PREFIX \
--gene=GENE_FILE \
--variants_gene_pair=VARIANTS_GENE_PAIR_FILE \
--outfile=OUTFILE_PREFIX

where the inputs are

  • GENOTYPE_PREFIX (required): The prefix for the genotype files. This prefix should correspond to GENOTYPE_PREFIX_case.txt for the case group and GENOTYPE_PREFIX_ctrl.txt for the control group.
  • VARIANTS_PREFIX (required): The prefix for the variant information files accompanying the genotype files. This prefix should correspond to VARIANTS_PREFIX_case.txt for the case group and VARIANTS_PREFIX_ctrl.txt for the control group.
  • RNA_PREFIX (required): The prefix for the gene expression data files. This prefix should correspond to RNA_PREFIX_case.txt for the case group and RNA_PREFIX_ctrl.txt for the control group.
  • GENE_FILE (required): The full path to the gene information file accompanying the gene expression data files.
  • VARIANTS_GENE_PAIR_FILE (required): The full path to the variant-gene pair information file.
  • OUTFILE_PREFIX (required): The prefix for the output carrier statistic files. Two files will be generated: OUTFILE_PREFIX_case.txt for the case group and OUTFILE_PREFIX_ctrl.txt for the control group.
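
The prefix convention above can be checked before launching Step 1. The following is a minimal sketch, not part of the repository: `prefix_paths` and `missing_inputs` are hypothetical helper names, and the only behavior assumed is the `_case.txt` / `_ctrl.txt` suffix rule described in the list.

```r
# Hypothetical helper: expand a prefix into the case/control file paths
# described above (PREFIX_case.txt and PREFIX_ctrl.txt).
prefix_paths <- function(prefix) {
  c(case = paste0(prefix, "_case.txt"),
    ctrl = paste0(prefix, "_ctrl.txt"))
}

# Hypothetical helper: list every Step 1 input file that does not exist.
# Returns an empty character vector when all inputs are present.
missing_inputs <- function(genotype, variants, rna, gene, variants_gene_pair) {
  paths <- c(prefix_paths(genotype), prefix_paths(variants), prefix_paths(rna),
             gene, variants_gene_pair)
  unname(paths[!file.exists(paths)])
}
```

Calling `missing_inputs()` with the same values passed to `--genotype`, `--variants`, `--rna`, `--gene`, and `--variants_gene_pair` returns the paths that still need to be created.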

A concrete example

cd carrier-stat

Rscript ./step1_carrier_stat.R \
--genotype=./example/genotype \
--variants=./example/variants \
--rna=./example/rna \
--gene=./example/gene.txt \
--variants_gene_pair=./example/variants_gene_pair.txt \
--outfile=./example/carrier_stat

Input format

Genotype file (GENOTYPE_PREFIX_case.txt and GENOTYPE_PREFIX_ctrl.txt): Allelic dosage file (number of ALT alleles, only 0/1/2 are supported) without a header line, one row per sample and one column per variant. The number of columns (i.e., the number of variants) must be equal to the number of rows in the variant information file. The number of rows (i.e., the number of samples) must be equal to the number of columns in the gene expression data file.

Variant information file (VARIANTS_PREFIX_case.txt and VARIANTS_PREFIX_ctrl.txt): A text file with a header line (CHROM: chromosome; POS: position; ID: variant name; REF: reference allele; ALT: alternative allele). The number of rows (i.e., the number of variants) must be equal to the number of columns in the genotype file.

Gene expression data file (RNA_PREFIX_case.txt and RNA_PREFIX_ctrl.txt): RNA read count file without a header line, one row per gene and one column per sample. The number of columns (i.e., the number of samples) must be equal to the number of rows in the genotype file. The number of rows (i.e., the number of genes) must be equal to the number of rows in the gene information file.

Gene information file (GENE_FILE): A text file with a header line (CHROM: chromosome; MINBP: start position of the gene; MAXBP: end position of the gene; GENE: gene name). The number of rows (i.e., the number of genes) must be equal to the number of rows in the gene expression data file.

Variant-gene pair information file (VARIANTS_GENE_PAIR_FILE): A text file with a header line (CHROM: chromosome; POS: position; ID: variant name; REF: reference allele; ALT: alternative allele; GENE: gene name).
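
The dimension rules above can be verified for one group (case or control) before running Step 1. This is a minimal sketch rather than code from the repository: `check_group_dims` is a hypothetical helper, and it assumes the files are whitespace-delimited so that `read.table` can parse them with its default separator.

```r
# Hypothetical helper: check the documented dimension rules for one group.
# Assumption: files are whitespace-delimited; set `sep` in read.table otherwise.
check_group_dims <- function(genotype_file, variants_file, rna_file, gene_file) {
  geno     <- read.table(genotype_file, header = FALSE)  # samples x variants
  variants <- read.table(variants_file, header = TRUE)   # one row per variant
  rna      <- read.table(rna_file, header = FALSE)       # genes x samples
  gene     <- read.table(gene_file, header = TRUE)       # one row per gene

  problems <- character(0)
  if (ncol(geno) != nrow(variants))
    problems <- c(problems, "genotype columns != variant rows")
  if (nrow(geno) != ncol(rna))
    problems <- c(problems, "genotype rows != expression columns")
  if (nrow(rna) != nrow(gene))
    problems <- c(problems, "expression rows != gene rows")
  # Only dosages 0, 1, and 2 are supported; NA values are flagged as well.
  if (!all(as.matrix(geno) %in% c(0, 1, 2)))
    problems <- c(problems, "genotype contains values other than 0/1/2")
  problems
}
```

An empty result means the four files are mutually consistent; otherwise each element names one violated rule.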

Output format

Output carrier statistic file (OUTFILE_PREFIX_case.txt and OUTFILE_PREFIX_ctrl.txt): A text file with a header line (CHROM: chromosome; POS: position; ID: variant name; REF: reference allele; ALT: alternative allele; GENE: gene name; n_carrier: number of samples carrying the variant; carrier_stat: carrier statistic value).
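
For downstream inspection, the two Step 1 output files can be loaded into a single table. The sketch below is illustrative only: `read_carrier_stat` is a hypothetical helper, the added `group` column is not produced by the script, and whitespace-delimited output is an assumption.

```r
# Hypothetical helper: read both Step 1 outputs and stack them,
# adding a `group` column ("case" or "ctrl") that is not in the files.
read_carrier_stat <- function(prefix) {
  # Column names taken from the output description above.
  expected <- c("CHROM", "POS", "ID", "REF", "ALT", "GENE",
                "n_carrier", "carrier_stat")
  read_one <- function(group) {
    # Assumption: whitespace-delimited text with a header line.
    x <- read.table(paste0(prefix, "_", group, ".txt"), header = TRUE,
                    stringsAsFactors = FALSE)
    missing <- setdiff(expected, names(x))
    if (length(missing) > 0)
      stop("missing columns: ", paste(missing, collapse = ", "))
    x$group <- group
    x
  }
  rbind(read_one("case"), read_one("ctrl"))
}
```

With the concrete example above, the prefix would be `./example/carrier_stat`.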

Step 2. Prioritizing rare variant-gene pairs with extreme carrier statistics

Rscript step2_analysis.R \
--carrier_stat=CARRIER_STAT_PREFIX \
--outfile=OUTFILE_PREFIX \
--fdr_thre=FDR_THRESHOLD

where the inputs are

  • CARRIER_STAT_PREFIX (required): The prefix for the carrier statistic files output by Step 1. This prefix should correspond to CARRIER_STAT_PREFIX_case.txt for the case group and CARRIER_STAT_PREFIX_ctrl.txt for the control group.
  • OUTFILE_PREFIX (required): The prefix for the files containing significant variant-gene pairs. Two files will be generated: OUTFILE_PREFIX_downregulated_fdr_FDR_THRESHOLD.txt for significant variant-gene pairs with negative carrier statistics and OUTFILE_PREFIX_upregulated_fdr_FDR_THRESHOLD.txt for significant variant-gene pairs with positive carrier statistics.
  • FDR_THRESHOLD (optional): FDR cutoff. Default is 0.05.

A concrete example

cd carrier-stat

Rscript ./step2_analysis.R \
--carrier_stat=./example/carrier_stat \
--outfile=./example/carrier_stat \
--fdr_thre=0.2

Output format

Output files containing significant variant-gene pairs (OUTFILE_PREFIX_downregulated_fdr_FDR_THRESHOLD.txt and OUTFILE_PREFIX_upregulated_fdr_FDR_THRESHOLD.txt): A text file with a header line (CHROM: chromosome; POS: position; ID: variant name; REF: reference allele; ALT: alternative allele; GENE: gene name; n_carrier: number of samples carrying the variant; carrier_stat: carrier statistic value; fdr: FDR).
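
The relationship between the two output files can be illustrated with a small sketch. It is not the logic of step2_analysis.R: `split_by_direction` is a hypothetical helper that takes a table already containing `carrier_stat` and `fdr` columns, and the inclusive comparison (`fdr <= cutoff`) is an assumption, since the script may use a strict inequality.

```r
# Hypothetical helper: split a table with `carrier_stat` and `fdr` columns
# into down- and up-regulated sets at a given FDR cutoff.
# Assumption: a pair is kept when fdr <= fdr_thre; rows with NA fdr are dropped.
split_by_direction <- function(pairs, fdr_thre = 0.05) {
  keep <- !is.na(pairs$fdr) & pairs$fdr <= fdr_thre
  significant <- pairs[keep, , drop = FALSE]
  list(
    downregulated = significant[significant$carrier_stat < 0, , drop = FALSE],
    upregulated   = significant[significant$carrier_stat > 0, , drop = FALSE]
  )
}
```

Raising `fdr_thre` (for example to 0.2, as in the concrete example) keeps more pairs in both sets, at the cost of a higher expected proportion of false discoveries.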
