This project forked from joonan-lab/cwas

Category-wide association study (CWAS) (Werling et al., 2018; An et al., 2018)

License: MIT License

Python 100.00%

Category-wide association study (CWAS)

CWAS (Category-Wide Association Study) is a data analysis tool that performs stringent association tests to find non-coding loci associated with autism spectrum disorder (ASD). CWAS runs category-based burden tests using de novo variants from whole genome sequencing data and diverse annotation data sets.

CWAS was used in the following papers: Werling et al., 2018; An et al., 2018.

Here is the original CWAS repository: sanderslab/cwas

Quickstart

Data requirements

You must prepare the following data yourself, because CWAS requires it and cannot generate it automatically. Details are given below.

1. Input VCF data (De novo variant list)

#CHROM  POS        ID  REF  ALT  QUAL  FILTER  INFO
chr1    3747728    .   T    C    .     .       SAMPLE=11000.p1;BATCH=P231
chr1    38338861   .   C    A    .     .       SAMPLE=11000.p1;BATCH=P231
chr1    117942118  .   T    G    .     .       SAMPLE=11000.p1;BATCH=P231
  • The input VCF data must follow the VCF specification.
  • The INFO field of each variant must contain its sample ID in the format SAMPLE={sample_id}.
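
The SAMPLE={sample_id} requirement can be checked before running CWAS. The following is a minimal sketch, not part of CWAS itself; the function names parse_info and sample_id_from_vcf_line are hypothetical, and the sketch assumes tab-separated VCF data lines with INFO in the eighth column.

```python
from typing import Dict, Optional


def parse_info(info: str) -> Dict[str, str]:
    """Split an INFO string such as 'SAMPLE=11000.p1;BATCH=P231' into a dict."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def sample_id_from_vcf_line(line: str) -> Optional[str]:
    """Return the SAMPLE value of one VCF data line, or None if it is missing."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 8:  # INFO is the eighth VCF column
        return None
    return parse_info(columns[7]).get("SAMPLE")


example = "chr1\t3747728\t.\tT\tC\t.\t.\tSAMPLE=11000.p1;BATCH=P231"
print(sample_id_from_vcf_line(example))  # prints 11000.p1
```

A line for which this function returns None does not satisfy the format described above.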

2. List of samples

SAMPLE FAMILY PHENOTYPE
11000.p1 11000 case
11000.s1 11000 ctrl
11002.p1 11002 case
11002.s1 11002 ctrl
  • CWAS requires a file like the one above that lists each sample ID with its family ID and phenotype (Case=case, Control=ctrl).
  • The required format is as follows.
    • Tab separated
    • 3 essential columns: SAMPLE, FAMILY, and PHENOTYPE
    • Each value in the PHENOTYPE column must be case or ctrl.
  • The values in the SAMPLE column must match the sample IDs of the variants in the input VCF file.
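
The format rules above can be sketched as a small pre-flight check. This is an illustrative helper rather than CWAS code; check_sample_list and unlisted_vcf_samples are hypothetical names, and the file content is passed in as a string for simplicity.

```python
import csv
import io
from typing import Iterable, List

REQUIRED_COLUMNS = ("SAMPLE", "FAMILY", "PHENOTYPE")
ALLOWED_PHENOTYPES = {"case", "ctrl"}


def check_sample_list(text: str) -> List[str]:
    """Return a list of problems found in a tab-separated sample list."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]
    problems = []
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        if row["PHENOTYPE"] not in ALLOWED_PHENOTYPES:
            problems.append(f"line {line_no}: invalid PHENOTYPE {row['PHENOTYPE']!r}")
    return problems


def unlisted_vcf_samples(vcf_sample_ids: Iterable[str], text: str) -> List[str]:
    """Return VCF sample IDs that do not appear in the SAMPLE column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    listed = {row["SAMPLE"] for row in reader}
    return sorted(set(vcf_sample_ids) - listed)
```

An empty result from both functions means that the sample list passes these basic checks; it does not guarantee that CWAS will accept the file.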

3. List of adjustment factors (Optional)

SAMPLE AdjustFactor
11000.p1 0.932
11000.s1 1.082
11002.p1 0.895
11002.s1 1.113
  • A file like the one above is required if you want to adjust the number of variants for each sample in CWAS.
  • The required format is as follows.
    • Tab separated
    • 2 essential columns: SAMPLE and AdjustFactor
    • Each value in the AdjustFactor column must be a float.
  • The values in the SAMPLE column must match the sample IDs of the variants in the input VCF file.
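
As a rough illustration of the format above, the adjustment factors can be loaded into a dictionary and validated as floats. The function name read_adjust_factors is hypothetical and this sketch is not how CWAS itself reads the file.

```python
import csv
import io
from typing import Dict


def read_adjust_factors(text: str) -> Dict[str, float]:
    """Map each sample ID to its adjustment factor, failing on bad rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    factors = {}
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        try:
            factors[row["SAMPLE"]] = float(row["AdjustFactor"])
        except (KeyError, TypeError, ValueError) as err:
            raise ValueError(f"line {line_no}: bad SAMPLE/AdjustFactor entry") from err
    return factors


example = "SAMPLE\tAdjustFactor\n11000.p1\t0.932\n11000.s1\t1.082\n"
print(read_adjust_factors(example))
```

A missing column or a non-numeric AdjustFactor value raises ValueError with the offending line number.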

You can get examples of the data described above from joonan-lab/cwas-input-example

4. CWAS annotation files

You can download these files from this repository: joonan-lab/cwas-dataset

git clone https://github.com/joonan-lab/cwas-dataset.git

Because the BigWig files for conservation scores are large, you must install them manually. Please follow these instructions.

Installation

CWAS uses a conda virtual environment. Run the following commands in your shell.

# In the directory where you want to install CWAS
git clone https://github.com/mwjjeong/cwas.git
cd cwas
conda env create -f environment.yml -n cwas
conda activate cwas
python setup.py install

In addition, you must install Variant Effect Predictor (VEP).

CWAS Execution

1. Start

Run this command.

cwas start

This command creates the CWAS workspace in your home directory. The path is $HOME/.cwas. $HOME/.cwas/configuration.txt is also generated.

.cwas
└── configuration.txt

2. Configuration

Write the following information in $HOME/.cwas/configuration.txt.

ANNOTATION_DATA_DIR=/path/to/your/dir
GENE_MATRIX=/path/to/your/file
ANNOTATION_KEY_CONFIG=/path/to/your/file
BIGWIG_CUTOFF_CONFIG=/path/to/your/file
VEP=/path/to/your/vep

ANNOTATION_DATA_DIR is a directory that contains all the BED files and BigWig files from joonan-lab/cwas-dataset.
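
Before running the next command, it can help to confirm that every key has a value. The sketch below is an assumption-laden illustration and not the parser CWAS uses: parse_config and missing_keys are hypothetical names, and treating lines that start with # as comments is an assumption of this example.

```python
from typing import Dict, List

# Keys listed in the configuration example above.
EXPECTED_KEYS = [
    "ANNOTATION_DATA_DIR",
    "GENE_MATRIX",
    "ANNOTATION_KEY_CONFIG",
    "BIGWIG_CUTOFF_CONFIG",
    "VEP",
]


def parse_config(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; blank lines and '#' lines are skipped (assumption)."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def missing_keys(config: Dict[str, str]) -> List[str]:
    """Return the expected keys that are absent or have an empty value."""
    return [key for key in EXPECTED_KEYS if not config.get(key)]


text = "ANNOTATION_DATA_DIR=/path/to/your/dir\nVEP=/path/to/your/vep\n"
print(missing_keys(parse_config(text)))
```

For the two-line example, the three keys that were left out are reported as missing.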

After writing the above file, run this command.

cwas configuration

The following files will be generated in your home directory.

.cwas
├── annotation-data
├── annotation_cutoff_bw.yaml
├── annotation_key_bed.yaml
├── annotation_key_bw.yaml
├── category_domain.yaml
├── configuration.txt
├── gene_matrix.txt
└── redundant_category.txt
.cwas_env

3. Preparation

This step merges the BED files to annotate variants. Run the following command.

cwas preparation -p 4

Here, 4 is the number of worker processes; adjust it to suit your machine.

After running this, a merged BED file and its index will be generated in your CWAS workspace.

.cwas
...
├── merged_annotation.bed.gz
├── merged_annotation.bed.gz.tbi
...

4. Annotation

This step annotates your VCF file using VEP. Run this command.

cwas annotation -v /path/to/your/vcf

Here is the result file.

.cwas
...
├── {Your VCF filename}.annotated.vcf
...

5. Categorization

This step categorizes your variants using the annotation datasets. Run this command.

cwas categorization -p 4

Here, 4 is the number of worker processes; adjust it to suit your machine.

After running this, the following file will be generated.

.cwas
...
├── {Your VCF filename}.categorization_result.txt
...

6. Burden Test (Binomial Test)

This step runs category-based burden tests using the categorization result. The burden test used here is a binomial test. Run this command.

cwas binomial_test -s /path/to/your/samples [-a /path/to/your/adj_factors]

[] means that the argument is optional. If the -a option is not specified, this step skips the adjustment.
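
To give a feel for what a binomial burden test computes for a single category, here is a minimal sketch of a one-sided exact binomial test on case and control variant counts. It is not the CWAS implementation: the function name binomial_test_greater is hypothetical, the null probability of 0.5 assumes equal numbers of cases and controls, counts are taken as integers, and the adjustment step is omitted.

```python
from math import exp, lgamma, log


def binomial_test_greater(successes: int, trials: int, p: float = 0.5) -> float:
    """One-sided exact p-value P(X >= successes) for X ~ Binomial(trials, p)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    total = 0.0
    for k in range(successes, trials + 1):
        # Work in log space so large counts do not overflow.
        log_pmf = (
            lgamma(trials + 1) - lgamma(k + 1) - lgamma(trials - k + 1)
            + k * log(p) + (trials - k) * log(1.0 - p)
        )
        total += exp(log_pmf)
    return min(total, 1.0)


# Hypothetical counts for one category: 8 variants in cases, 2 in controls.
case_count, ctrl_count = 8, 2
p_value = binomial_test_greater(case_count, case_count + ctrl_count)
print(round(p_value, 4))
```

In this toy example the p-value is the probability of seeing 8 or more of the 10 variants in cases if cases and controls were equally likely to carry them.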

After running this, the following file will be generated.

.cwas
...
├── {Your VCF filename}.burden_test.txt
...
