Before executing this pipeline, create a comma-separated file that lists each sample name and its FASTQ file paths, with the columns in the order Sample ID, Fastq1, Fastq2. No header row is needed. An example is /Variant_Discovery/cohort_info.txt. You also need a config file that specifies the paths of the tools and data, such as /Variant_Discovery/myConfigFile.config
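As a quick sanity check before launching the pipeline, the cohort file layout described above can be validated with a few lines of Python. This is a minimal sketch, not part of the pipeline itself: the function name `read_cohort` and the sample IDs and paths in the example are invented for illustration, and the only rule it enforces is the three-column, headerless layout stated above.

```python
import csv
import io


def read_cohort(handle):
    """Parse a headerless, comma-separated cohort file.

    Returns a list of (sample_id, fastq1, fastq2) tuples and raises
    ValueError on any row that does not have exactly three non-empty fields.
    """
    samples = []
    for line_no, row in enumerate(csv.reader(handle), start=1):
        # Skip completely blank lines (for example, a trailing empty line).
        if not row or all(not field.strip() for field in row):
            continue
        fields = [field.strip() for field in row]
        if len(fields) != 3 or not all(fields):
            raise ValueError(
                f"line {line_no}: expected 3 non-empty fields, got {row!r}"
            )
        samples.append(tuple(fields))
    return samples


# Hypothetical two-sample cohort; IDs and paths are made up for illustration.
example = io.StringIO(
    "S001,/data/S001_R1.fastq.gz,/data/S001_R2.fastq.gz\n"
    "S002,/data/S002_R1.fastq.gz,/data/S002_R2.fastq.gz\n"
)
cohort = read_cohort(example)
```

To check a real file, pass an open file handle instead of the in-memory example, e.g. `read_cohort(open("cohort_info.txt", newline=""))`.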
sh /Variant_Discovery/run_00_fastqc.sh myConfigFile.config
This step uses Sentieon, a commercial alternative to the BWA + GATK workflow, to perform sequence alignment and variant detection.
sh /Variant_Discovery/run_01_sentieon.sh myConfigFile.config
sh /Variant_Discovery/run_02_joint_calling.sh myConfigFile.config
sh /Variant_Discovery/run_03_QC.sh myConfigFile.config
sh /Variant_Discovery/run_04_Annotation.sh myConfigFile.config
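If you prefer to launch the Variant_Discovery steps from a single script rather than one at a time, something like the following could work. It is only a sketch: it assumes the scripts are run in the order listed above, each with the config file as its sole argument, and that a non-zero exit status means the step failed. The names `STEP_SCRIPTS`, `build_commands`, and `run_pipeline` are hypothetical.

```python
import subprocess

# Step scripts in the order they are listed above.
STEP_SCRIPTS = [
    "/Variant_Discovery/run_00_fastqc.sh",
    "/Variant_Discovery/run_01_sentieon.sh",
    "/Variant_Discovery/run_02_joint_calling.sh",
    "/Variant_Discovery/run_03_QC.sh",
    "/Variant_Discovery/run_04_Annotation.sh",
]


def build_commands(config_path):
    """Return one `sh <script> <config>` command per pipeline step."""
    return [["sh", script, config_path] for script in STEP_SCRIPTS]


def run_pipeline(config_path, runner=subprocess.run):
    """Run every step in order, stopping at the first failure.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on a non-zero exit status, so later steps never
    start on incomplete output.
    """
    for command in build_commands(config_path):
        runner(command, check=True)
```

Calling `run_pipeline("myConfigFile.config")` would then execute the five steps back to back.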
Rare variants often lack statistical support on their own, so we recommend a cascade (stepwise) filtering strategy. This pipeline uses the annotation results to keep variants that are rare and likely deleterious. Downstream analyses depend on the individual study, but examining loci in known disease-causing genes may be a good starting point.
python ./Variant_Priorization/Main.py
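To illustrate what a cascade filter looks like in practice, here is a small Python sketch. It is not the logic of Main.py: the field names (`allele_frequency`, `impact`, `gene`), the 1% frequency cutoff, the `HIGH`/`MODERATE` impact labels, and the choice to treat a missing frequency as rare are all assumptions made for the example, and the variants are invented.

```python
def cascade_filter(variants, max_af=0.01, impacts=("HIGH", "MODERATE"), genes=None):
    """Apply filters one after another, each on the survivors of the last.

    variants: list of dicts with hypothetical keys
              'allele_frequency' (float or None), 'impact', and 'gene'.
    genes:    optional set of known disease genes for a final restriction.
    """
    # Stage 1: keep rare variants. A missing frequency is treated as rare
    # here (assumption: the variant is absent from the reference database).
    rare = [
        v for v in variants
        if v["allele_frequency"] is None or v["allele_frequency"] <= max_af
    ]
    # Stage 2: keep variants annotated as likely deleterious.
    deleterious = [v for v in rare if v["impact"] in impacts]
    # Stage 3 (optional): restrict to known disease-causing genes.
    if genes is not None:
        deleterious = [v for v in deleterious if v["gene"] in genes]
    return deleterious


# Invented example records, for illustration only.
variants = [
    {"id": "v1", "gene": "BRCA2", "allele_frequency": 0.0002, "impact": "HIGH"},
    {"id": "v2", "gene": "TTN", "allele_frequency": 0.15, "impact": "HIGH"},
    {"id": "v3", "gene": "CFTR", "allele_frequency": 0.001, "impact": "LOW"},
    {"id": "v4", "gene": "TTN", "allele_frequency": None, "impact": "MODERATE"},
]
kept = cascade_filter(variants)
kept_in_panel = cascade_filter(variants, genes={"BRCA2"})
```

Ordering the stages from cheapest and broadest (frequency) to most specific (gene panel) keeps each stage easy to inspect, and the cutoffs can be tuned per study.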