The home page of "Polygenic Basis Seedless Grapes" can be accessed by clicking here.
Input Data
- WGS data: .R1.fastq.gz and .R2.fastq.gz (paired-end reads)
- VCF file
Dependencies
Details of each tool are available on its official website. Most of the tools can be installed quickly with Anaconda.
You can check how the raw VCF file was obtained in this repository (Population Genetics Analysis Pipeline).
If needed, you can separate the VCF file into SNPs and InDels.
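```bash
# File prefix without the .vcf extension (the commands below append .vcf)
vcf="all_miss0.8GQ20maf0.0001.id"
# SNPs only (InDels removed); output: ${vcf}.rmindel.recode.vcf
#vcftools --vcf ${vcf}.vcf --remove-indels --recode --out ${vcf}.rmindel
# InDels only (SNPs removed); output: ${vcf}.rmsnp.recode.vcf
#vcftools --vcf ${vcf}.vcf --keep-only-indels --recode --out ${vcf}.rmsnp
```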
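After splitting, it can help to check the split by counting both variant classes. The sketch below is not part of the original pipeline, and the helper name `count_variant_types` is hypothetical. It treats a record as a SNP when REF and every ALT allele are one base long. vcftools defines an InDel as any variant that changes the REF allele length, so the counts are a rough check and can differ slightly for MNPs or symbolic alleles.

```bash
# Hypothetical helper: rough SNP vs. InDel/other count for an uncompressed VCF.
# A record is counted as a SNP when REF and all ALT alleles are one base long;
# everything else (InDels, MNPs, symbolic alleles such as <DEL>) is "other".
count_variant_types() {
  awk -F'\t' '
    /^#/ { next }                      # skip meta-information and header lines
    {
      n = split($5, alt, ",")          # ALT may list several alleles
      snp = (length($4) == 1)
      for (i = 1; i <= n; i++)
        if (length(alt[i]) != 1) snp = 0
      if (snp) s++; else o++
    }
    END { printf "SNPs: %d\tInDels/other: %d\n", s, o }
  ' "$1"
}
```

For example, `count_variant_types all_miss0.8GQ20maf0.0001.id.vcf` can be compared with the record counts that vcftools reports for the two recoded files.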
Here, we filter variants with PLINK and retain only the samples that have phenotype values (listed in keep444.txt).
```bash
plink --vcf ./all_miss0.8GQ20maf0.0001.id.vcf \
--vcf-half-call m \
--geno 0.2 \
--maf 0.05 \
--hwe 1e-5 \
--keep keep444.txt \
--recode vcf-iid \
--allow-extra-chr \
--const-fid \
--threads 5 \
--make-bed \
--out 444_geno0.2gq20maf0.005hwe1e-5.id
```
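The keep file ./keep444.txt lists one sample per line as FID and IID. FID is 0 because --const-fid sets every family ID to 0:
```
0 B1
0 B2
0 B3
...
```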
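If the phenotypes are already in a text table, the keep file can be generated from it instead of being typed by hand. This is a sketch under stated assumptions. The helper name `make_keep_file` and the input layout are hypothetical: no header, whitespace-separated, sample ID in column 1, phenotype in column 2, and missing values written as NA or -9.

```bash
# Hypothetical helper: build a PLINK --keep file (FID IID) from a phenotype table.
# Assumed input: no header, sample ID in column 1, phenotype in column 2,
# missing phenotypes written as NA or -9 (those samples are dropped).
# FID is written as 0 to match the IDs produced by --const-fid.
make_keep_file() {
  awk 'NF >= 2 && $2 != "NA" && $2 != "-9" { print 0, $1 }' "$1" > "$2"
}
```

Usage with a hypothetical input file: `make_keep_file phenotype.txt keep444.txt`. Running `wc -l keep444.txt` afterwards should report the expected number of phenotyped samples.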
Next, replace the sixth column (the phenotype column) of the 444_geno0.2gq20maf0.005hwe1e-5.id.fam
file with each sample's phenotype value. The Excel VLOOKUP
function is a convenient way to do this.
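As a scripted alternative to VLOOKUP, an awk join can fill in column 6 by matching the sample ID (IID, column 2 of the .fam file). This is a minimal sketch. The helper name `update_fam_pheno` and the phenotype-table layout are assumptions: whitespace-separated, IID in column 1, value in column 2. Samples with no match get NA. Check that your GEMMA version reads NA as a missing phenotype.

```bash
# Hypothetical helper: write a copy of a .fam file whose 6th column holds
# the phenotype looked up by IID (column 2), similar to an Excel VLOOKUP.
# Assumed phenotype file: IID in column 1, value in column 2, no header.
# Unmatched samples get "NA". Output is space-separated.
update_fam_pheno() {
  local pheno="$1" fam="$2" out="$3"
  awk 'NR == FNR { ph[$1] = $2; next }               # first file: load IID -> value
       { $6 = ($2 in ph) ? ph[$2] : "NA"; print }' \
      "$pheno" "$fam" > "$out"
}
```

Write to a new file and then move it, because redirecting onto the input .fam would truncate it before awk reads it. For example: `update_fam_pheno phenotype.txt 444_geno0.2gq20maf0.005hwe1e-5.id.fam tmp.fam && mv tmp.fam 444_geno0.2gq20maf0.005hwe1e-5.id.fam` (phenotype.txt is a hypothetical name).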
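We fitted a univariate linear mixed model (LMM) with GEMMA and ran all three association tests (Wald test, likelihood ratio test, and score test) with the -lmm 4 option. The first command below computes a standardized relatedness matrix (-gk 2, written to ./output/${name}.sXX.txt). The second command fits the LMM to the first phenotype (-n 1), which is column 6 of the .fam file. Detailed documentation for the GEMMA software is available here. The official GEMMA GitHub is https://github.com/genetics-statistics.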
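For reference, here is the univariate LMM as described in the GEMMA manual. This summarizes GEMMA's documented model rather than anything specific to this pipeline:

$$
\mathbf{y} = \mathbf{W}\boldsymbol{\alpha} + \mathbf{x}\beta + \mathbf{u} + \boldsymbol{\epsilon},
\qquad
\mathbf{u} \sim \mathrm{MVN}_n\!\left(\mathbf{0},\, \lambda \tau^{-1} \mathbf{K}\right),
\qquad
\boldsymbol{\epsilon} \sim \mathrm{MVN}_n\!\left(\mathbf{0},\, \tau^{-1} \mathbf{I}_n\right)
$$

The terms are:

- $\mathbf{y}$ is the vector of $n$ phenotypes.
- $\mathbf{W}$ is the covariate matrix. Here it contains only the intercept, because no -c covariate file is given.
- $\mathbf{x}$ holds the genotypes of the tested SNP, and $\beta$ is its effect size.
- $\mathbf{K}$ is the relatedness matrix estimated with -gk 2.
- $\lambda$ is the ratio of the two variance components.
- $\tau$ is the residual precision.

With -lmm 4, the Wald, likelihood ratio, and score tests all evaluate $H_0\!: \beta = 0$ against $H_1\!: \beta \neq 0$.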
```bash
name='444_geno0.2gq20maf0.005hwe1e-5.id'
gemma -bfile ${name} -gk 2 -miss 0.2 -maf 0.05 -hwe 1e-5 -o ${name}
gemma -bfile ${name} -n 1 -miss 0.2 -maf 0.05 -hwe 1e-5 -k ./output/${name}.sXX.txt -lmm 4 -o ${name}
```
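Before plotting, a quick look at the strongest associations can be useful. This sketch is not part of the original pipeline, and the helper name `top_hits` is hypothetical. It locates the p-value column by its header name (p_wald, p_lrt or p_score when -lmm 4 is used) instead of a hard-coded position. It assumes chr, rs and ps are the first three columns of the .assoc.txt file, as in GEMMA's output.

```bash
# Hypothetical helper: print the n smallest p-values from a GEMMA .assoc.txt.
# The p-value column is found by header name; output columns: chr rs ps p.
top_hits() {
  local assoc="$1" col="$2" n="${3:-10}"
  awk -v col="$col" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    { print $1, $2, $3, $c }
  ' "$assoc" | sort -k4,4g | head -n "$n"   # -g handles values like 1e-08
}
```

For example: `top_hits ./output/${name}.assoc.txt p_wald 20`.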
Manhattan and QQ plots are drawn from the Wald test p-values:
```bash
name='444_geno0.2gq20maf0.005hwe1e-5.id'
# GEMMA writes ${name}.assoc.txt to ./output/; copy it to the working directory (or adjust the path) first.
# Chromosome IDs must contain only numbers. If yours are named Chr01 ... Chr19, this loop strips the prefix:
# for i in {01..19}; do sed -i "s/Chr${i}/${i}/g" ${name}.assoc.txt; done
Rscript ./manhattan.R ${name}.assoc.txt p_wald
# pdftoppm is provided by poppler-utils (e.g. sudo yum install poppler-utils)
pdftoppm -png -singlefile ${name}.assoc.txtp_wald_manhattan.pdf ./manhattan.plot
pdftoppm -png -singlefile ${name}.assoc.txtp_wald_qq.pdf ./qq.plot
```
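The sed loop rewrites every match of ChrNN anywhere in the file, including SNP IDs in the rs column. A narrower alternative edits only the chromosome column. This is a sketch. The helper name `strip_chr_prefix` is hypothetical, and it assumes GEMMA's tab-delimited .assoc.txt with a header line and the chromosome in column 1.

```bash
# Hypothetical helper: drop a "Chr"/"chr" prefix from column 1 only,
# leaving SNP IDs (column 2) untouched; writes a new file.
strip_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 { print; next }            # keep the header line as-is
       { sub(/^[Cc]hr/, "", $1); print }' "$1" > "$2"
}
```

For example: `strip_chr_prefix ${name}.assoc.txt ${name}.numchr.assoc.txt`. Note that manhattan.R will then name its PDFs after the new file.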
For LD block visualization, see the LDBlockShow tool. For other population analyses, see the Population Genetics Analysis Pipeline.
All scripts can be found in this repository. If you have any questions, please do not hesitate to contact us: Xu Wang ([email protected]).
Wang, Xu, et al. "Integrative genomics reveals the polygenic basis of seedlessness in grapevine." Current Biology (2024). doi: https://doi.org/10.1016/j.cub.2024.07.022