A comprehensive pipeline for population genetic analysis, covering read trimming, read mapping, variant calling, and downstream population genetic analyses
-
Raw read trimming
- Trim Galore!
-
Read mapping (Two options)
- BWA
- Bowtie2
-
Variant calling (Three options)
- GATK3
- GATK4
- BCFtools call
-
Postprocessing
-
11 popular population genetic analyses
- Principal component analysis (Plink 1.9)
- PCA projection analysis (Plink 2)
- Phylogenetic analysis (Snphylo)
- Treemix analysis (Treemix2)
- Population structure analysis (Structure)
- Linkage disequilibrium decay analysis (PopLDdecay)
- Selective sweep finding analysis (SweepFinder2)
- Population admixture analysis (Admixtools)
- Pairwise sequentially Markovian coalescent analysis (PSMC)
- Multiple sequentially Markovian coalescent analysis (MSMC)
- Fixation index analysis (Fst)
git clone https://github.com/jkimlab/PAPipe.git
-
You can prepare a local environment with the commands below:
cd ./Programs/
bash ./set_local_env.sh
These commands automatically install the requirements and print the tool paths, which can be used directly in the parameter file.
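Before starting a run, it can help to confirm that the main tools are reachable. The helper below is a minimal sketch that assumes the tools are exposed on `PATH`; the function name `check_tools` and the executable names in the usage line are assumptions, and `set_local_env.sh` may instead install tools elsewhere and only report their paths for the parameter file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PAPipe): report executables missing from PATH.
# Prints one "missing: NAME" line per absent tool and returns the number missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Exit statuses are limited to 0-255, so cap the count.
    return $(( missing > 255 ? 255 : missing ))
}
```

Example (executable names are assumed and may differ in your installation): `check_tools trim_galore bwa bowtie2 bcftools plink plink2 || echo "some tools are missing"`.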
-
Alternatively, you can run PAPipe with Docker without preparing the environment yourself. → How to use PAPipe on Docker
-
See Requirements for details.
Using a local environment
/Path_to_PAPipe/Programs/bin/main.py -p main.param.txt -s main.sample.txt - -o OUTDIR
Using Docker
# Change to the directory containing the Dockerfile
cd ./Programs
# Build Docker image
docker build -t [docker image name] ./ &> log_image_build
# Run the container with a local data directory mounted
docker run -v [Local directory containing data]:[Path of connecting directory on container] -it [docker image name]
# Inside the container, run PAPipe
/Path_to_PAPipe/Programs/bin/main.py -p main.param.txt -s main.sample.txt - -o OUTDIR
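The placeholders in the `docker run` line can be filled in as sketched below. The function `papipe_docker_run`, the image name `papipe`, and the container path `/data` are hypothetical; the `DRY_RUN` switch belongs to this helper only, not to Docker or PAPipe.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start the PAPipe image with a host data directory mounted.
# Usage: papipe_docker_run IMAGE HOST_DIR CONTAINER_DIR
# With DRY_RUN=1 the command is printed instead of executed.
papipe_docker_run() {
    local image=$1 host_dir=$2 container_dir=$3
    # Resolve to an absolute path: docker -v treats a bare name as a named
    # volume rather than a host directory.
    host_dir=$(cd "$host_dir" && pwd) || return 1
    local cmd=(docker run -v "${host_dir}:${container_dir}" -it "$image")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, `DRY_RUN=1 papipe_docker_run papipe ./data /data` only prints the command; without `DRY_RUN`, it opens an interactive container in which the `main.py` command above can be run on files under `/data`.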
→ You can easily generate the parameter file here: PAPipe Parameter generator
→ For more details about the parameter files, see: Tutorial
-
Trimmed read data
-
Trimmed read data for all samples
/Path_to_out_directory/00_ReadQC/TrimmedData/[sample]_1_val_1.fq.gz
/Path_to_out_directory/00_ReadQC/TrimmedData/[sample]_2_val_2.fq.gz
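Given the file layout above, a quick completeness check over trimmed read pairs might look like the sketch below. The function names `trimmed_pair` and `check_trimmed` are hypothetical; only the directory and file-name pattern come from the listing.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: build and check the trimmed FASTQ paths for samples.

# Print the two trimmed FASTQ paths for one sample.
trimmed_pair() {
    local outdir=$1 sample=$2
    local dir="$outdir/00_ReadQC/TrimmedData"
    printf '%s\n' "$dir/${sample}_1_val_1.fq.gz" "$dir/${sample}_2_val_2.fq.gz"
}

# Usage: check_trimmed OUTDIR SAMPLE...
# Report any mate file that is missing or empty; return 1 if any was found.
check_trimmed() {
    local outdir=$1 sample f status=0
    shift
    for sample in "$@"; do
        while read -r f; do
            [ -s "$f" ] || { echo "missing or empty: $f"; status=1; }
        done < <(trimmed_pair "$outdir" "$sample")
    done
    return "$status"
}
```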
-
FastQC results for all samples before and after trimming
/Path_to_out_directory/00_ReadQC/QC_Report_Before_Trimming/[population]/[sample]_1_fastqc.html
/Path_to_out_directory/00_ReadQC/QC_Report_Before_Trimming/[population]/[sample]_2_fastqc.html
-
MultiQC summaries of QC results per population, before and after trimming
/Path_to_out_directory/00_ReadQC/QC_Report_Before_Trimming/[population]/multiqc_report.html
-
-
Read alignment data
-
Read mapping files for all samples
/Path_to_out_directory/01_readMapping/04ReadRegrouping/[population]_[sample].addRG.marked.sort.bam
-
-
Variant call data
-
Variant calls generated from the sequencing data of all populations
/Path_to_out_directory/02_VariantCalling/VariantCalling/[].All.variant.combined.g.vcf.gz
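To get a rough size of the combined variant call file above without extra tools, the records can be counted directly. This is a minimal sketch with a hypothetical function name; it relies on bgzip output being gzip-compatible. Because the file is a gVCF, the count may also include non-variant reference blocks.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count data records (non-header lines) in a
# gzip- or bgzip-compressed VCF/gVCF.
count_vcf_records() {
    # Header lines start with '#'; grep -c prints 0 when nothing matches.
    gzip -dc "$1" | grep -vc '^#'
}
```

If bcftools is installed, `bcftools view -H FILE | wc -l` gives the same count.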
-
-
Post-processed data
-
Variant calls after HapMap format conversion and Plink filtering
/Path_to_out_directory/03_Postprocessing/Hapmap/variant.combined.GT.SNP.flt.hapmap
/Path_to_out_directory/03_Postprocessing/plink/[prefix].*
-
-
Population analysis
-
Principal component analysis (Plink 1.9)
-
PCA results
/Path_to_out_directory/04_Population/[running datetime]/PCA/PCs.info
-
PCA plots for every available pair of PCs
/Path_to_out_directory/04_Population/[running datetime]/PCA/all.PCA.pdf
-
-
PCA projection analysis (Plink 2)
-
PCA results
/Path_to_out_directory/04_Population/[running datetime]/PCA/PCs.info
-
-
Phylogenetic analysis (Snphylo)
-
Phylogenetic tree in Newick format
/Path_to_out_directory/04_Population/[running datetime]/SNPhylo/snphylo.ml.txt
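Since the tree above is stored in Newick format, its tip labels can be listed with standard text tools, for example to confirm that every sample appears in the tree. This is a simplified sketch with a hypothetical function name: it assumes unquoted labels and no Newick comments, and it skips internal node labels such as support values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list leaf (tip) labels of a Newick tree, one per line.
# A leaf label directly follows "(" or "," and ends at ":", ",", ")" or ";".
# Labels after ")" are internal node labels (e.g. support) and are skipped.
newick_leaves() {
    tr -d '\n' < "$1" | grep -oE '[(,][^(),:;]+' | sed 's/^[(,]//'
}
```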
-
Visualized phylogenetic tree
/Path_to_out_directory/04_Population/[running datetime]/SNPhylo/snphylo.ml.png
-
-
Treemix analysis (Treemix2)
-
Treemix results in a single PDF file
/Path_to_out_directory/04_Population/[running datetime]/Treemix/Treemix.results.pdf
-
-
Population structure analysis (Structure)
-
STRUCTURE results per K in .PNG files
/Path_to_out_directory/04_Population/[running datetime]/STRUCTURE/CLUMPAK/K=[n].MajorCluster.png
-
STRUCTURE results for all K in a single .PDF file
/Path_to_out_directory/04_Population/[running datetime]/STRUCTURE/CLUMPAK/pipeline_summary.pdf
-
-
Linkage disequilibrium decay analysis (PopLDdecay)
-
Linkage disequilibrium decay results for each maximum distance parameter
/Path_to_out_directory/04_Population/[running datetime]/LdDecay/[maxDist]/Plot/out.pdf
-
-
Selective sweep finding analysis (SweepFinder2)
-
Selective sweep results for all chromosomes as point plots, generated per population
/Path_to_out_directory/04_Population/[running datetime]/SweepFinder2/[population]/SweepFinderOut.pdf
-
Selective sweep results per population and per chromosome
/Path_to_out_directory/04_Population/[running datetime]/SweepFinder2/[population]/[population].[chromosome].SF2out
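To pick out the strongest candidate sweeps from one of the per-chromosome files above, the rows can be ranked by likelihood ratio. This sketch assumes the usual SweepFinder2 output layout (a header line, then tab-separated `location`, `LR`, `alpha` columns); please check it against your files. The function name `top_sweeps` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the N rows with the highest LR from a
# SweepFinder2 output file (assumed columns: location, LR, alpha).
# Usage: top_sweeps FILE [N]   (default N = 10)
top_sweeps() {
    local file=$1 n=${2:-10}
    # Skip the header, then sort on column 2 with general-numeric order
    # (handles values like 3e1), highest first. LC_ALL=C fixes the decimal point.
    tail -n +2 "$file" | LC_ALL=C sort -t "$(printf '\t')" -k2,2gr | head -n "$n"
}
```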
-
-
Population admixture analysis (Admixtools)
-
Admixture analysis results
/Path_to_out_directory/04_Population/[running datetime]/ADMIXTOOLS/admixtools_3pop/result.out
/Path_to_out_directory/04_Population/[running datetime]/ADMIXTOOLS/admixtools_4diff/result.out
/Path_to_out_directory/04_Population/[running datetime]/ADMIXTOOLS/admixtools_f4stat/result.out
/Path_to_out_directory/04_Population/[running datetime]/ADMIXTOOLS/admixtools_Dstat/result.out
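To confirm which of the four ADMIXTOOLS analyses above produced output in a given run, something like the following sketch can be used. The function name is hypothetical; `RUN_DIR` stands for the `04_Population/[running datetime]` directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print existing ADMIXTOOLS result files for one run
# and report absent ones on stderr. Returns 1 if any result is absent.
# Usage: admixtools_results RUN_DIR
admixtools_results() {
    local run_dir=$1 sub f status=0
    for sub in admixtools_3pop admixtools_4diff admixtools_f4stat admixtools_Dstat; do
        f="$run_dir/ADMIXTOOLS/$sub/result.out"
        if [ -f "$f" ]; then
            echo "$f"
        else
            echo "absent: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```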
-
-
Pairwise sequentially Markovian coalescent analysis (PSMC)
-
Effective population size plot
/Path_to_out_directory/04_Population/[running datetime]/EffectiveSize/psmc_plot.pdf
-
-
Multiple sequentially Markovian coalescent analysis (MSMC)
-
Effective population size plot
/Path_to_out_directory/04_Population/[running datetime]/MSMC/MSMC.pdf
-
-
Fixation index analysis (Fst)
-
Fixation index results visualized as Manhattan plots
/Path_to_out_directory/04_Population/[running datetime]/Fst/[pair information]/Fst_result.pdf
-
Significant regions identified by the Fst analysis
/Path_to_out_directory/04_Population/[running datetime]/Fst/[comparing pair information]/[comparing pair information].sig.region.txt
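For a quick overview across population pairs, the significant-region files above can be summarized by line count. This is a sketch with a hypothetical function name. It counts raw lines, so if the files carry a header line (not confirmed here), subtract one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<pair>\t<line count>" for every
# *.sig.region.txt file under RUN_DIR/Fst/*/.
# Usage: fst_sig_counts RUN_DIR
fst_sig_counts() {
    local run_dir=$1 f
    for f in "$run_dir"/Fst/*/*.sig.region.txt; do
        # An unmatched glob stays literal; skip it.
        [ -e "$f" ] || continue
        printf '%s\t%s\n' "$(basename "$f" .sig.region.txt)" "$(wc -l < "$f" | tr -d ' ')"
    done
}
```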
-
-