CWAS (Category-Wide Association Study) is a data analytic tool to perform stringent association tests to find non-coding loci associated with autism spectrum disorder (ASD). CWAS runs category-based burden tests using de novo variants from whole genome sequencing data and diverse annotation data sets.
CWAS was used in the following papers.
- An analytical framework for whole genome sequence association studies and its implications for autism spectrum disorder (Werling et al., 2018)
- Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder (An et al., 2018)
Here is the original CWAS repository: sanderslab/cwas
Users must prepare following data for CWAS because it is very essential but cannot be generated automatically. Here are details.
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 3747728 . T C . . SAMPLE=11000.p1;BATCH=P231
chr1 38338861 . C A . . SAMPLE=11000.p1;BATCH=P231
chr1 117942118 . T G . . SAMPLE=11000.p1;BATCH=P231
- The input VCF data must follow the specification of VCF.
- The INFO field must contain a sample ID of each variant with this format
SAMPLE={sample_id}
.
SAMPLE | FAMILY | PHENOTYPE |
---|---|---|
11000.p1 | 11000 | case |
11000.s1 | 11000 | ctrl |
11002.p1 | 11002 | case |
11002.s1 | 11002 | ctrl |
- CWAS requires the file like above listing sample IDs with its family IDs and phenotypes (Case=case, Control=ctrl).
- Here are details of the required format.
- Tab separated
- 3 essential columns: SAMPLE, FAMILY, and PHENOTYPE
- A value in the PHENOTYPE must be case or ctrl.
- The values in the SAMPLE must be matched with the sample IDs of variants in the input VCF file.
SAMPLE | AdjustFactor |
---|---|
11000.p1 | 0.932 |
11000.s1 | 1.082 |
11002.p1 | 0.895 |
11002.s1 | 1.113 |
- The file like above is required if you want to adjust the number of variants for each sample in CWAS.
- Here are details of the required format.
- Tab separated
- 2 essential columns: SAMPLE and AdjustFactor
- A value in the AdjustFactor must be a float.
- The values in the SAMPLE must be matched with the sample IDs of variants in the input VCF file.
You can get the examples of the above data requirements from joonan-lab/cwas-input-example
You can install those file from this repository: joonan-lab/cwas-dataset
git clone https://github.com/joonan-lab/cwas-dataset.git
Due to the sizes of BigWig files for conservation scores, you must install them manually. Please follow this instruction.
CWAS uses conda virtual environment to build environment for CWAS. Run the following statements in your shell.
# In your directory where CWAS is installed
git clone https://github.com/mwjjeong/cwas.git
cd cwas
conda env create -f environment.yml -n cwas
conda activate cwas
python setup.py install
In addition, you must install Variant Effect Predictor (VEP).
Run this command.
cwas start
This command creates CWAS workspace in your home directory. The path is $HOME/.cwas
. $HOME/.cwas/configuration.txt
has also generated.
.cwas
└── configuration.txt
Write the following information in the $HOME/.cwas/configuration.txt
.
ANNOTATION_DATA_DIR=/path/to/your/dir
GENE_MATRIX=/path/to/your/file
ANNOTATION_KEY_CONFIG=/path/to/your/file
BIGWIG_CUTOFF_CONFIG=/path/to/your/file
VEP=/path/to/your/vep
The ANNOTATION_DATA
is a directory that contains all the BED files and BigWig files from joonan-lab/cwas-dataset.
After writing the above file, run this command.
cwas configuration
Following files will be generated in your home directory.
.cwas
├── annotation-data
├── annotation_cutoff_bw.yaml
├── annotation_key_bed.yaml
├── annotation_key_bw.yaml
├── category_domain.yaml
├── configuration.txt
├── gene_matrix.txt
└── redundant_category.txt
.cwas_env
This step merges the BED files to annotate variants. Run the following command.
cwas preparation -p 4
4
is the number of worker processes. You can adjust this.
After running this, Merged BED file and its index will be generated in your CWAS workspace.
.cwas
...
├── merged_annotation.bed.gz
├── merged_annotation.bed.gz.tbi
...
This step annotate your VCF file using VEP. Run this command.
cwas annotation -v /path/to/your/vcf
Here is the result file.
.cwas
...
├── {Your VCF filename}.annotated.vcf
...
This step categorize your variants using the annotation datasets. Run this command.
cwas categorization -p 4
4
is the number of worker processes. You can adjust this.
After running this, you will get...
.cwas
...
├── {Your VCF filename}.categorization_result.txt
...
This step runs category-based burden tests using the categorization result. A type of this burden test is binomial test. Run this command.
cwas binomial_test -s /path/to/your/samples [-a /path/to/your/adj_factors]
[]
means that this is optional. If -a
option does not specified, this step will bypass the adjustment step.
After running this, you will get...
.cwas
...
├── {Your VCF filename}.burden_test.txt
...