xihaoli / staarpipeline

An R package for performing association analysis of whole-genome/whole-exome sequencing (WGS/WES) studies using STAARpipeline

License: GNU General Public License v3.0

Languages: R 93.25%, C 0.04%, C++ 6.71%

statistical-genetics whole-genome-sequencing whole-exome-sequencing functional-annotation rare-variant-analysis staar-pipeline

staarpipeline's Issues

Forced relatedness in null model despite NULL kins

Hi all,

From reading the code and trying to run the pipeline, it appears that the relatedness element is set to TRUE when the null model object is created, irrespective of whether kins is NULL or a matrix (cf. line 159 in fit_nullmodel.R).

As a direct consequence, STAAR will call STAAR_O_SMMAT (since sparse_kins is set to TRUE, cf. line 81 in fit_nullmodel.R) rather than STAAR_O. From the outside, it therefore appears that a function designed to account for population structure/relatedness is called even though no information about population structure/relatedness is provided, which is surprising.

Was this intended? Or should the relatedness element of obj_nullmodel be set to FALSE, such that STAAR_O would be called?

Preinstall packages

I would suggest adding the following lines to install the prerequisite packages. These packages seem to need to be installed manually, in this order (for example, STAAR depends on SeqArray).

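# BiocManager and devtools are needed to install the packages below
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
BiocManager::install("SeqArray")
BiocManager::install("SeqVarTools")
devtools::install_github("hanchenphd/GMMAT")
BiocManager::install("GENESIS")
devtools::install_github("xihaoli/STAAR",ref="main")
BiocManager::install("TxDb.Hsapiens.UCSC.hg38.knownGene")
BiocManager::install("GenomicFeatures")
devtools::install_github("zilinli1988/SCANG")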

I am also confused about the Intel Math Kernel Library.

STAARPipeline for imbalanced binary phenotypes

Hello, and first of all, many thanks for developing such a nice and useful pipeline!

I've seen that you recently extended STAAR to handle imbalanced binary phenotypes. Is this already integrated into STAARpipeline?

Many thanks in advance!

Best
Andi

Minimum number of samples?

I have a dataset with ~700 samples, which will eventually increase to ~1000, but I'm testing STAARpipeline on what I have so far. I know that is very small for a human GWAS, but I thought the gene-centric and/or sliding window analysis, combined with the annotation weights used in STAARpipeline, might boost the power enough to be worthwhile.

I noticed that my results files from the STAARpipeline_Gene_Centric_Coding.R script just contained a list of NULL values. I went back and stepped through the script manually, and got the following message printed for the first gene:

# of selected samples: 721
# of selected variants: 103
# of selected samples: 721
# of selected variants: 12
# of selected samples: 721
# of selected variants: 0
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff,  : 
  genotype is not a matrix!
# of selected samples: 721
# of selected variants: 0
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff,  : 
  genotype is not a matrix!
# of selected samples: 721
# of selected variants: 3
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff,  : 
  Number of rare variant in the set is less than 2!
# of selected samples: 721
# of selected variants: 9
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff,  : 
  Number of rare variant in the set is less than 2!
# of selected samples: 721
# of selected variants: 0
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff,  : 
  genotype is not a matrix!
# of selected samples: 721
# of selected variants: 3,724,472

If I do str(results):

List of 5
 $ plof               : NULL
 $ plof_ds            : NULL
 $ missense           : NULL
 $ disruptive_missense: NULL
 $ synonymous         : NULL

Is my dataset just too small, or is there some other issue to be fixed? I am running 0.9.6 from the Docker container because I was working on this back in October, but can change to 0.9.7 if you think that would help.
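The log suggests that STAAR() is called for every functional category even when it holds zero variants ("genotype is not a matrix!") or fewer than two rare variants, and that each such error leaves a NULL entry in the results list. A minimal base-R sketch of a pre-check, assuming the genotype is a samples-by-variants matrix of 0/1/2 dosages; the helper name count_rare_variants and the cutoff value are hypothetical and not part of STAARpipeline:

```r
# Hypothetical helper (not part of STAARpipeline): count variants whose
# in-sample minor allele frequency lies strictly between 0 and the cutoff.
# Assumes geno is a samples x variants matrix of 0/1/2 dosages.
count_rare_variants <- function(geno, rare_maf_cutoff = 0.01) {
  if (is.null(geno) || !is.matrix(geno) || ncol(geno) == 0) {
    return(0L)  # empty category: nothing to test
  }
  af <- colMeans(geno, na.rm = TRUE) / 2  # alternate allele frequency
  maf <- pmin(af, 1 - af)                 # fold to the minor allele
  sum(maf > 0 & maf < rare_maf_cutoff, na.rm = TRUE)
}

# Toy example: 4 samples, 3 variants; the first two are rare at a 0.2 cutoff
geno <- cbind(c(1, 0, 0, 0), c(0, 0, 0, 1), c(1, 1, 2, 0))
n_rare <- count_rare_variants(geno, rare_maf_cutoff = 0.2)
if (n_rare < 2) {
  message("Skipping category: fewer than 2 rare variants")
}
```

With only ~700 samples, many gene/category combinations may simply fall below such a threshold, so skipping them explicitly makes it easier to tell "too few variants" apart from a genuine failure.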

Account for population frequency

Hi Dr. Li

When determining whether a variant is rare in the gene-centric coding pipeline, does STAARpipeline use allele frequency information from population databases such as gnomAD, 1000 Genomes or ESP6500 to check whether an exonic variant is also rare in all populations? Also, how do you define "rare" in the other pipelines: rare within the cohort, or rare in the population databases?

Thank you very much

Feature request: add to Biocontainers

Hi all,

It would be nice if the Dockerfile containing all the STAAR-related packages were added to biocontainers. Any interest?

How can the list of variants to be annotated be generated for other species?

Hi,

I want to use STAARpipeline to analyze rare variants in plants. I have my own VCF file and its annotation information. How can I generate the list of variants to be annotated? I looked at FAVORdatabase_chrsplit.csv in the example, but I don't understand what it contains, so I don't know where to start.

Looking forward to your reply!

Ayn

Burden effects for all genes of a category

Hello,

I am trying to extract burden effect estimates for all genes in the RVAS results, not just the significant ones, in order to perform a meta-analysis. Is this possible, and if so, how could it be done?

Regards
Akhil

Does this pipeline support sex chromosomes?

When I run the step STAARpipeline/0.2.1Varinfo_gds.R and provide a sex chromosome, it is automatically recognized as chrNA. I would like to know whether this procedure supports sex chromosomes.
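The issue does not show how 0.2.1Varinfo_gds.R parses the chromosome label. One plausible cause, stated here only as an assumption, is a conversion with as.numeric(), which returns NA (with a warning) for labels such as "X" or "Y". A hedged base-R sketch of an explicit mapping; chrom_to_number is a hypothetical helper, and the coding X = 23, Y = 24 is a common convention (used for example by PLINK), not something confirmed for STAARpipeline:

```r
# Hypothetical helper: map labels such as "chr1", "22", "X", "chrY" to integers.
# Assumes the coding X = 23 and Y = 24; any other non-numeric label stays NA.
chrom_to_number <- function(chrom) {
  label <- toupper(sub("^chr", "", as.character(chrom), ignore.case = TRUE))
  out <- suppressWarnings(as.integer(label))  # "X"/"Y" become NA here
  out[label == "X"] <- 23L
  out[label == "Y"] <- 24L
  out
}

as.numeric("X")                                # NA, with a coercion warning
chrom_to_number(c("chr1", "22", "X", "chrY"))  # 1 22 23 24
```

Whatever coding is chosen, it would need to be used consistently in every downstream step that builds file names or variant IDs from the chromosome number.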

docker for STAARpipeline

Hello,

I would like to know whether this pipeline has been implemented on Google Cloud Platform, or whether a Docker image is available to run it there, as I have seen for the UK Biobank on the DNAnexus platform.

Regards
Akhil

Running docker image

Hi everyone!
Could you provide a short tutorial on how to use the STAARpipeline Docker image?
Without any directions, I'm not sure which steps to run with it or how to proceed. I'm setting up STAARpipeline on our cluster, and the Docker image would be the best way to do so.
Thanks

Results are not consistent with what we got before

Hello,

I performed an RVAS with STAARpipeline on a TOPMed dataset and obtained results. However, when I repeated the analysis on the updated dataset, I did not get the same results: genes reported as significant in the original analysis are either no longer significant or missing from the results files. Can you help me with this?

Thank you so much for help.

SCANG error in class(genotype)=="dgCMatrix" (R version 4.2.0)

When using the Dynamic_Window_SCANG.r function, I run into an error caused by the comparison class(genotype)=="dgCMatrix". Looking into it further, I found that the genotype object (= Geno) has two classes, "matrix" and "array", so the comparison returns two values instead of one and the check cannot be evaluated. Matrices have carried these two classes since R version 4.0.0 (https://cran.r-project.org/doc/manuals/r-release/NEWS.html).
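For context: since R 4.0.0, class() of a base matrix returns c("matrix", "array"), so class(genotype) == "dgCMatrix" yields a length-2 logical, and since R 4.2.0 an if() condition of length greater than one is an error rather than a warning. A common, version-robust alternative is inherits(). The sketch below uses a hypothetical wrapper is_sparse_genotype() to illustrate this; it is not SCANG's own code:

```r
# Hypothetical helper illustrating a version-robust class check.
# inherits() always returns a single TRUE/FALSE, whatever the class vector.
is_sparse_genotype <- function(genotype) {
  inherits(genotype, "dgCMatrix")
}

geno <- matrix(c(0, 1, 2, 0), nrow = 2)
print(class(geno))                 # "matrix" "array"
print(class(geno) == "dgCMatrix")  # FALSE FALSE (length 2, unusable in if())

if (is_sparse_genotype(geno)) {
  message("sparse genotype")
} else {
  message("dense genotype")
}
```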

Integration of annotations

Hello, I have a question about the STAAR code and its underlying publications.

In the code it says:
#For each noncoding functional category, the conditional STAAR-B p-value is a p-value from an omnibus test
#' that aggregated conditional Burden(1,25) and Burden(1,1),
#' together with conditional p-values of each test weighted by each annotation using Cauchy method.
--> To me this reads as computing separate burden tests (e.g. weighted by each annotation, by allele-frequency weights, etc.) and combining them afterwards.

But from your publication, it seems that you combined the Beta allele-frequency weights directly with the functional variant annotation and the variant score to construct, e.g., QBurden.

Which one is correct?

Best
Andi
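The "Cauchy method" named in the code comment refers to combining several, possibly correlated, p-values by mapping each one to a standard Cauchy variate and averaging. A minimal base-R sketch of a generic Cauchy combination test with equal weights by default; cauchy_combine is a hypothetical name, and the sketch does not reproduce how STAAR chooses or weights its component tests:

```r
# Generic Cauchy combination of p-values (a sketch, not STAAR's source).
# Each p-value is mapped to a standard Cauchy variate, the variates are
# averaged with normalised weights, and the average is mapped back.
cauchy_combine <- function(pvals, weights = rep(1, length(pvals))) {
  stopifnot(length(pvals) == length(weights), all(pvals > 0 & pvals <= 1))
  w <- weights / sum(weights)                   # weights sum to 1
  tiny <- pvals < 1e-16                         # tan() loses precision here
  stat <- numeric(length(pvals))
  stat[!tiny] <- tan((0.5 - pvals[!tiny]) * pi)
  stat[tiny] <- 1 / (pvals[tiny] * pi)          # tan((0.5 - p) * pi) ~ 1 / (p * pi)
  t_stat <- sum(w * stat)
  if (t_stat > 1e15) {
    return(1 / (t_stat * pi))                   # upper-tail approximation
  }
  0.5 - atan(t_stat) / pi                       # P(Cauchy > t_stat)
}

# Example: combining two hypothetical Burden(1,25) and Burden(1,1) p-values
cauchy_combine(c(0.01, 0.20))
```

Because the combined statistic is Cauchy-distributed under the null even when the inputs are correlated (approximately, in the tail), this kind of combination lets many weighted versions of the same test be merged without modelling their correlation.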

Is toy data provided?

Hi Dr. Li,

Thank you for the tool; it looks handy for working through a GWAS analysis. I would like to know whether toy data are provided for newcomers like me.

Looking forward to your reply!

Null output in Dynamic window analysis

Hi,

I have recently been using your pipeline in Docker to run burden tests with the dynamic window analysis.

After generating the aGDS file, I ran Step 0, Step 1 and Step 5. I tested it on WES data from 6 individuals (2 trios) covering a region of chr1. Here are my original VCF file, the generated aGDS file and my commands:
dynamic_window_test.zip

As I only tested one chromosome, I changed some lines of code:

#### Number of jobs for each chromosome (only one chromosome is tested here)
jobs_num <- matrix(rep(0,3),nrow=1)
for(chr in 1:1)
{
	print(chr)
	gds.path <- agds_dir[1] # agds_dir[1]
	genofile <- seqOpen(gds.path)
	
	## keep only variants whose QC label is "PASS"
	filter <- seqGetData(genofile, QC_label)
	SNVlist <- filter == "PASS" 

	position <- as.numeric(seqGetData(genofile, "position"))
	position_SNV <- position[SNVlist]

	## record chromosome, first and last PASS position
	jobs_num[chr,1] <- chr
	jobs_num[chr,2] <- min(position[SNVlist])
	jobs_num[chr,3] <- max(position[SNVlist])

	seqClose(genofile)
}

For groupid and arrayid in Step 0, I directly used groupid = arrayid = scang_num.

No errors were reported, but the output was NULL. I am not sure whether these changes were correct, and I have no idea which step went wrong.
Could you give me some advice?

Thanks!

Provide test data

Hi, I'm interested in your software. I would like to use it on other species, but for those species no ready-made annotation files are available, only VCF files. Could you provide test data in the tutorial? A pipeline that starts from the raw VCF would be even better.
Thank you.

Gene_Centric_Coding Unable to Analyze Gene

Hello,

While running the Gene_Centric_Coding function over a list of genes on a specific chromosome, I noticed a strange issue.

On any given gene, the function seems to work properly until the internal coding function attempts to run the STAAR function:

try(pvalues <- STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, 
    rare_maf_cutoff = rare_maf_cutoff, rv_num_cutoff = rv_num_cutoff), 
    silent = silent)

I am receiving the following error, and thus no results for the current gene:

Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff,  :     Dimensions don't match for genotype and annotation!

This error occurs for virtually all genes. Looking into it, the issue appears to be how the annotation data are subset to the final list of variants that are LoF in pLoF:

Anno.Int.PHRED.sub.category <- Anno.Int.PHRED.sub[lof.in.plof, ]

When I run this, lof.in.plof is a vector of NAs, TRUEs and FALSEs, with the number of TRUEs equal to the final filtered number of variants to use (in my case, 5). When the annotation data in Anno.Int.PHRED.sub are subset with this vector, however, the resulting table still has as many rows as the previous number of variants (129 in my case).

The Geno matrix has dimensions [n samples x 5 variants], but the Anno.Int.PHRED.sub.category table passed to the STAAR function still has 129 rows (one per variant) instead of 5, causing the error.

If I wrap lof.in.plof in which(), the resulting table has 5 rows, and STAAR runs properly without error:

Anno.Int.PHRED.sub.category <- Anno.Int.PHRED.sub[which(lof.in.plof),]

I assume this fix makes sense, since there is no reason for Anno.Int.PHRED.sub.category to still contain rows of NA data? The number of rows in this annotation table should match the number of variants (columns) in the genotype matrix, shouldn't it?
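The behaviour described above follows from standard R indexing: when a logical row index contains NA, R returns an all-NA row for each NA position instead of dropping it, whereas which() returns only the positions that are TRUE. A small base-R demonstration with a toy annotation matrix (the values and column names are made up for illustration):

```r
# Toy annotation matrix: 4 variants (rows) x 2 annotation scores (columns).
anno <- matrix(c(10, 20, 30, 40, 1, 2, 3, 4), nrow = 4,
               dimnames = list(NULL, c("score_a", "score_b")))

# Logical filter containing an NA, e.g. from a missing annotation value
lof.in.plof <- c(TRUE, NA, FALSE, TRUE)

anno_logical <- anno[lof.in.plof, , drop = FALSE]        # NA index -> NA row
anno_which   <- anno[which(lof.in.plof), , drop = FALSE] # NA treated as FALSE

print(nrow(anno_logical))  # 3: rows 1 and 4 plus an all-NA row
print(nrow(anno_which))    # 2: rows 1 and 4 only
```

An equivalent NA-safe index is `lof.in.plof %in% TRUE`; either form keeps the annotation rows aligned with the genotype columns selected by the same filter.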
