statgen-mi

CircleCI · codecov

License: GNU General Public License v3.0

Brief Summary

multiple imputation for association studies
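In a multiple imputation analysis, the association test is run independently on each of several simulated genotype draws, and the per-draw estimates are then pooled. As a minimal sketch of the standard pooling step (Rubin's rules; this illustrates the general technique, not necessarily this workflow's exact implementation):

    import math

    def pool_rubins_rules(estimates, variances):
        # estimates: per-draw effect estimates (e.g. regression betas)
        # variances: per-draw squared standard errors
        m = len(estimates)
        q_bar = sum(estimates) / m      # pooled point estimate
        u_bar = sum(variances) / m      # within-imputation variance
        # between-imputation variance
        b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
        t = u_bar + (1.0 + 1.0 / m) * b # total variance
        return q_bar, math.sqrt(t)      # pooled estimate and standard error

    # hypothetical betas and squared SEs from three imputation draws
    beta, se = pool_rubins_rules([0.12, 0.10, 0.15], [0.040, 0.052, 0.045])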

Authors

  • Lightning Auriga (@lightning-auriga)

Usage

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).

Step 1: Obtain a copy of this workflow

  1. Clone this repository to your local system, into the place where you want to perform the data analysis.
    git clone https://github.com/lightning-auriga/statgen-mi.git

Step 2: Configure workflow

Configure the workflow according to your needs by editing the files in the config/ folder. Adjust config.yaml to configure the workflow execution, and manifest.tsv to specify your sample setup.

Configuration settings

The following settings are available in the primary user configuration file, config/config.yaml (a sketch example follows this list):

  • manifest: location of primary run manifest file; defaults to config/manifest.tsv
  • tools: configuration options for association tools supported by the workflow
    • bcftools: configuration options specific to bcftools
      • executable: command to launch bcftools (see note below)
      • plugin_path: path to plugin libraries for this version of bcftools
      • note: this workflow uses bcftools +setGT to randomize genotypes from imputed probabilities. This functionality is not actually present in upstream +setGT; it has been modded in locally (see this repo for the code). Eventually, this will hopefully be migrated somewhere more useful, such as conda, but a local build suffices for this early stage. It is recommended to place the bcftools repo at ../bcftools relative to this workflow
    • plink2: configuration options specific to plink2 --glm methods
      • executable: command to launch plink2. If using conda, this should remain the default plink2
      • maxthreads: maximum number of threads to deploy in a plink2 task
      • maxmem: maximum RAM (in MB) supplied to a plink2 task
      • mi_draws: number of simulated imputation draws to generate for plink2 MI runs
  • imputed_datasets: user-defined sets of imputed data that can be selected for analysis
    • each tag under imputed_datasets should be unique, and can be used to refer to the dataset in the manifest
    • each tag should contain under it:
      • type: descriptor of the imputed file type. Currently the only accepted value is minimac4
      • filename: full path to and name of the imputed data file(s). For minimac4, these are the dose.vcf.gz file(s). If multiple paths are specified in an array, the files will each be processed in turn and concatenated (in order) after run completion
  • regression_models: user-defined sets of phenotypes and covariates that can be selected for analysis
    • each tag under regression_models should be unique, and can be used to refer to the model in the manifest
    • each tag should contain under it:
      • filename: full path to and name of a plink-format phenotype file containing the relevant variables. Other variables may also be present
      • phenotype: primary outcome for this regression model, as the corresponding header entry in the phenotype file
      • covariates: (optional) list of covariates for this regression model, as the corresponding header entry or entries in the phenotype file
      • model: descriptor of the association type. Currently recognized options are linear or logistic
      • vif: (optional) for tools that support it (primarily plink): variance inflation factor cap above which a model is suppressed
  • queue: user-defined configuration data for compute queue
    • small_partition: slurm partition (or equivalent for other cluster profiles) for jobs with the following restrictions:
      • max RAM will never exceed 3500M
      • max time will be less than 10 minutes
      • this setting is exposed to reduce compute cost, but can simply be set to the same value as the partition below if desired
    • large_partition: slurm partition (or equivalent for other cluster profiles) for jobs using maximum per-tool analysis settings, with the following additional restrictions:
      • at least 8000M RAM should be available for a task
      • jobs on the partition should be permitted to run at least four hours before being killed
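As a sketch, a minimal config/config.yaml consistent with the descriptions above might look like the following; all paths, tags, and partition names are illustrative placeholders, and the exact key layout should be checked against the shipped default config:

    manifest: "config/manifest.tsv"
    tools:
      bcftools:
        executable: "../bcftools/bcftools"
        plugin_path: "../bcftools/plugins"
      plink2:
        executable: "plink2"
        maxthreads: 4
        maxmem: 16000
        mi_draws: 10
    imputed_datasets:
      example_dataset:
        type: "minimac4"
        filename:
          - "/path/to/chr1.dose.vcf.gz"
          - "/path/to/chr2.dose.vcf.gz"
    regression_models:
      example_model:
        filename: "/path/to/phenotypes.tsv"
        phenotype: "pheno1"
        covariates:
          - "age"
          - "sex"
        model: "linear"
    queue:
      small_partition: "short"
      large_partition: "standard"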

Run manifest

Each desired MI run should be configured as a row of the run manifest, by default at config/manifest.tsv. The following entries are required for each run (an example manifest follows the list):

  • analysis: unique identifier for this particular run
  • imputed_dataset: tag for desired imputed dataset to use, as enumerated in config/config.yaml
  • tool: supported association tool for analysis
  • regression_model: tag for desired regression model to use, as enumerated in config/config.yaml
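For example, a one-run config/manifest.tsv using the placeholder tags from the configuration sketch above would be (columns tab-separated):

    analysis    imputed_dataset    tool      regression_model
    run1        example_dataset    plink2    example_model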

Step 3: Install Snakemake

Install Snakemake using conda:

conda create -c bioconda -c conda-forge -n snakemake snakemake

For installation details, see the instructions in the Snakemake documentation.

Step 4: Execute workflow

Activate the conda environment:

conda activate snakemake

Test your configuration by performing a dry-run via

snakemake --use-conda -n

Execute the workflow locally via

snakemake --use-conda --cores $N

using $N cores, or run it in a cluster environment via

snakemake --use-conda --profile /path/to/slurm-profile --jobs 100

See the Snakemake documentation for further details.

Step 5: Investigate results

More information will be added here as it becomes available. For now, draft results are populated under results/{analysis}/{imputed_dataset}/{tool}/{regression_model}.
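For example, using the placeholder manifest above, the plink2 results for run1 would appear under:

    results/run1/example_dataset/plink2/example_model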

Step 6: Commit changes

Whenever you change something, don't forget to commit the changes back to your GitHub copy of the repository:

git commit -a
git push

Testing

Tests for the embedded Snakemake Python scripts live in workflow/scripts/tests and are handled with pytest. snakemake_unit_tests integration is TBD.
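For example, to run the script tests locally (assuming pytest is installed in the active environment):

    pytest workflow/scripts/tests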

Version History

  • See the changelog for details.
  • Note that this project was originally written under the name "Cameron Palmer".


statgen-mi's Issues

resolve bcftools +setGT modding issue

I had tried modding some randomization support into bcftools +setGT, but the solution was rather ephemeral and was not kept in sync with upstream. I'm not sure how I want to approach this, but since this project is unlikely to ever be used, it's not a rush :)

CI

Once pytest is operational, get this running with CircleCI.

add snptest back in

Depending on how interested I am in this, I may add the old snptest frequentist options back in for testing purposes.

plink2 logistic regression

While in theory this is supported, no actual baked-in tests are currently configured, so something is almost certainly broken in there somewhere.

LMMs

This package was once intended to be used in part with LMM tools. The tools have entirely changed in the intervening time, but the idea still stands.

pytest

Pipeline scripts are mostly structured in readiness for pytest, but the tests themselves still need to be implemented.
