yafeng / deqms

DEqMS is a tool for quantitative proteomic analysis


deqms's Introduction

DEqMS

DEqMS is a statistical tool for testing differential protein expression in quantitative proteomic analysis, developed by Yafeng Zhu @ Karolinska Institutet.

DEqMS is a published method. If you use it in your research, please cite: Zhu et al. DEqMS: A Method for Accurate Variance Estimation in Differential Protein Expression Analysis. Mol Cell Proteomics 2020 Mar 23. PMID: 32205417

Installation

To install the latest DEqMS package (v1.6.0), start R (version "4.0") and enter:


if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DEqMS")

Introduction

DEqMS is developed on top of Limma. However, Limma assumes the same prior variance for all genes. In proteomics, the accuracy of protein abundance estimates varies with the number of peptides/PSMs quantified, in both label-free and labelled data. Proteins quantified by multiple peptides or PSMs are measured more accurately. DEqMS estimates different prior variances for proteins quantified by different numbers of PSMs/peptides, thereby achieving better accuracy. The package can be applied to both label-free and labelled proteomics data.
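
The idea can be illustrated with a toy simulation in base R. This is only a sketch under stated assumptions, not the DEqMS implementation: the data are simulated so that protein variance scales as 1/count, and the names count, s2 and trend are hypothetical. A loess fit of log variance against log2 count then recovers a count-dependent trend, which is the kind of quantity a count-specific prior variance could be built on.

```r
set.seed(42)

n     <- 500
count <- sample(1:64, n, replace = TRUE)   # simulated PSM count per protein
d     <- 4                                 # residual degrees of freedom (assumed)

# Simulated sample variances: true variance is 1/count, with chi-square noise
s2 <- (1 / count) * rchisq(n, df = d) / d

# Regress log variance on log2 count with loess
x      <- log2(count)
logVAR <- log(s2)
model  <- loess(logVAR ~ x, span = 0.75)

# Fitted log variance at the smallest and largest observed counts
trend <- predict(model, newdata = data.frame(x = range(x)))
print(exp(trend))   # fitted variance is larger at low counts than at high counts
```

A single shared prior variance would ignore this trend, which is the limitation the paragraph above describes.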

How to use it

See the DEqMS online vignettes.


deqms's Issues

spectraCounteBayes Warning messages

Hello, I have tried to analyse my label-free data by referring to the tutorial "DEqMS analysis using MaxQuant outputs (label-free data)", but when I run the code fit4 = spectraCounteBayes(fit3), R pops up a warning message. How can I fix it?

Thanks,
Nai xiang Yu

NA values in model

Hello,

I am running limma on a proteomics dataset with 5 conditions, each with 3 samples. Limma returns "Partial NA coefficients for 209 probe(s)" when executing:

# Specify comparisons
comparisons <- c("B-A", "C-A", "D-A", "E-A", "C-B", "D-B", "E-B", "D-C",  "E-C",  "E-D")

# Create design matrix
groupsM <- as.factor(condition)
designM <- model.matrix(~0+groupsM)
colnames(designM) <- levels(groupsM)

# Fit lm
fit <- lmFit(data, designM)

# Create contrasts
contr <- makeContrasts(contrasts = comparisons, levels = colnames(coef(fit)))

# Contrast fit and ebayes
fit2 <- contrasts.fit(fit, contr)
ebfit <- eBayes(fit2, trend=TRUE)

Now I call the DEqMS function spectraCounteBayes(fit), but I get a warning (because of the NAs in the model ebfit):

prot <- rownames(fit$coefficients)
rowdata <- as.data.table(rowData(se))
PSMs <- data.frame("Razor + unique peptides" = rowdata$Razor...unique.peptides)
rownames(PSMs) <- rowdata$Protein.IDs
fit$count <- PSMs[prot, "Razor...unique.peptides"]
fit_DEqMS <- DEqMS::spectraCounteBayes(fit) # model variance

And the warning is:

Warning message:
In y.pred - digamma(df/2) :
longer object length is not a multiple of shorter object length

I looked into the spectraCounteBayes function and saw that the problem arises because y.pred (from the loess model) is shorter. My question is: would it be possible to call loess(logVAR ~ x, span = 0.75, na.action = stats::na.exclude) with the na.action parameter, so that y.pred has the same length as the coefficients, gamma, etc. from my limma model?

Since I am calling limma with all comparisons together, I don't know how I would ensure that each two-group comparison has more than 2 non-missing values. Therefore I was trying to find another solution, and the na.action parameter appears to be a good one.

What do you think?

Best,
Lis
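
The length behaviour behind the na.action suggestion can be checked in base R alone. The following is a minimal sketch on simulated data (x and logVAR here are hypothetical stand-ins for log2(count) and log(sigma^2)); it shows only how fitted() behaves for loess under na.omit versus na.exclude, not whether a patched spectraCounteBayes would give sensible downstream statistics.

```r
set.seed(1)

n      <- 40
x      <- runif(n, 0, 6)                       # stands in for log2(count)
logVAR <- -1 - 0.3 * x + rnorm(n, sd = 0.2)    # stands in for log(sigma^2)
logVAR[c(3, 17, 29)] <- NA                     # proteins without a variance estimate

# na.omit drops the incomplete rows, so fitted() is shorter than the input
fit_omit <- loess(logVAR ~ x, span = 0.75, na.action = stats::na.omit)

# na.exclude also drops them for fitting, but fitted() pads them back with NA
fit_excl <- loess(logVAR ~ x, span = 0.75, na.action = stats::na.exclude)

length(fitted(fit_omit))   # 37
length(fitted(fit_excl))   # 40
na_pos <- unname(which(is.na(fitted(fit_excl))))
na_pos                     # positions 3, 17 and 29
```

With na.exclude the fitted vector lines up element by element with the original proteins, so the recycling warning from mismatched lengths would not be triggered by this step.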

DEqMS on proteins quantified based on fragment ions

Hi, I have used the IQ package (https://cran.r-project.org/web/packages/iq/index.html) to run MaxLFQ in order to estimate protein-level quantities. The IQ package uses fragment ion quantities to infer protein quantities, which makes sense since I acquired my samples in DIA mode. Now that I have my quantitative protein table, I wonder if the DEqMS package could be a suitable option. I know that it's doing some variance estimation based on identified PSMs, which I guess makes sense if you have DDA data with MS1 quantification. For DIA data with quantified fragment ions, would it make sense to base the variance estimation on the number of identified fragment ions rather than PSMs?

Best,

Marc

DEqMS DIA data DIANN output

I posted a similar question on Bioconductor but it looks like this may be a better place to post.
I see in a previous question that precursors should be used in place of peptide counts for DIA data. If the DIANN protein intensities are determined using MaxLFQ, is it really necessary to use the number of precursors?

Error in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : NA/NaN/Inf in foreign function call (arg 2)

Dear Author:
When I use DEqMS to fit the model with
fit3$count = psm.count.table[rownames(fit3$coefficients),"count"]
fit4 = spectraCounteBayes(fit3)
I get this error: Error in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : NA/NaN/Inf in foreign function call (arg 2)

Here is my code:

cond = as.factor(c(rep("M",21),rep("F",22)))
design = model.matrix(~0+cond) # 0 means no intercept for the linear model
colnames(design) = gsub("cond","",colnames(design))
x <- c("F-M")
contrast = makeContrasts(contrasts=x,levels=design)
fit1 <- lmFit(pro_M_F, design)
fit2 <- contrasts.fit(fit1,contrasts = contrast)
fit3 <- eBayes(fit2)
library(matrixStats)
count_columns = seq(16,34,2)
psm.count.table = data.frame(count = rowMins(
as.matrix(df.prot[,count_columns])), row.names = df.prot$Protein.accession)
fit3$count = psm.count.table[rownames(fit3$coefficients),"count"]
fit4 = spectraCounteBayes(fit3)

After I run fit3$count = psm.count.table[rownames(fit3$coefficients),"count"], fit3$count contains NA values:

fit3$count
[1] NA 1 NA NA 4 5 0 11 61 NA 1 NA NA 6 28 6 NA
[18] 11 20 NA NA NA NA NA NA NA NA NA NA NA NA 23 NA NA
[35] 27 2 NA 3 NA NA 8 3 5 NA 146 NA 1 1 NA 8 NA
[52] 1 1 11 747 3 52 5 NA NA 1 27 5 NA NA NA NA NA
[69] 3 NA NA NA NA 144 NA 1 1025 6 NA NA NA NA 1 1 NA
[86] 18 NA NA NA NA 7 11 NA NA NA NA 6 0 0 355 NA NA
[103] NA 1 2 17 18 NA NA 3 NA 10 NA 1 NA NA

Can you help me? Thanks.
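
One possible cause, offered as an assumption rather than a diagnosis: looking up a data frame by row names returns NA for every ID that is absent from the count table, so a mismatch between the row names of the expression matrix and df.prot$Protein.accession would produce exactly this pattern. A minimal sketch with hypothetical IDs (fit_ids stands in for rownames(fit3$coefficients)):

```r
# Hypothetical count table keyed by protein accession
psm.count.table <- data.frame(count = c(4, 1, 11, 61),
                              row.names = c("P1", "P2", "P3", "P4"))

# Hypothetical row names of the fitted model; two of them are not in the table
fit_ids <- c("P1", "P2", "P3;P9", "P4", "P5")

# Row-name lookup yields NA wherever the ID has no match
count <- psm.count.table[fit_ids, "count"]
count       # 4 1 NA 61 NA

# List the IDs that failed to match
unmatched <- setdiff(fit_ids, rownames(psm.count.table))
unmatched   # "P3;P9" "P5"

# Proteins that have a count and could be kept for model fitting
keep <- !is.na(count)
```

If the unmatched set is non-empty, either the ID columns need to be harmonised or the expression matrix needs to be restricted to the proteins in keep before lmFit is run.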

Issue with fit4

Hi there,

I've encountered an issue after I applied fit4 = spectraCounteBayes(fit3)

It reports this error: Error in model.frame.default(formula = logVAR ~ x) :
variable lengths differ (found for 'x')

It just happened to me recently.

Thank you

Dependency matrixStats

Hi,

thanks for this nice package. I am currently trying to apply it to my data.

Just one small remark. The package matrixStats seems to be missing in the dependencies. Would be great to have it automatically installed.

Cheers,
Johannes

Error running DEqMS

Hello,
I am new to DEqMS. I am trying to analyze MaxQuant data. I was able to follow the vignette instructions to process the PXD000279 data. However, when I load my own proteinGroups.txt, I get an error when running this command:

pep.count.table = data.frame(count = rowMins(as.matrix(df.prot[,19:24])),
row.names = df.prot$Majority.protein.IDs)

Error in rowMins(as.matrix(df.prot[, 19:24])) :
Argument 'x' cannot be logical.

I wonder if you have any idea what caused this error.

Thank you for your time!

Hsin-Yao
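
A possible explanation, stated as an assumption: column positions differ between proteinGroups.txt files, so 19:24 may not point at the peptide count columns in this file, and a column containing only NA is read by R as logical. The sketch below uses a hypothetical data frame and hypothetical column names (Peptides.s1, Peptides.s2, Flag), and uses base apply() in place of rowMins so that it runs without matrixStats.

```r
# Hypothetical stand-in for a proteinGroups table
df.prot <- data.frame(
  Majority.protein.IDs = c("A", "B"),
  Peptides.s1 = c(3L, 5L),
  Peptides.s2 = c(2L, 7L),
  Flag = c(NA, NA)               # an all-NA column has type logical
)

# Selecting the wrong column gives a logical matrix
m_bad <- as.matrix(df.prot[, "Flag", drop = FALSE])
is.logical(m_bad)    # TRUE

# Selecting count columns by name instead of by position
count_cols <- grep("^Peptides\\.", colnames(df.prot))
m_ok <- as.matrix(df.prot[, count_cols])
is.numeric(m_ok)     # TRUE

# Minimum count per protein across samples
pep.count.table <- data.frame(count = apply(m_ok, 1, min),
                              row.names = df.prot$Majority.protein.IDs)
```

Inspecting sapply(df.prot[, 19:24], class) on the real table would show whether the selected columns are numeric.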

Support for complex mixed model

Hi, I'm curious if DEqMS allows for the possibility of fine-tuning the linear mixed model that it runs on each protein. Basically, my problem is that I am working with samples from a wild population of animals, which introduces non-random structure to the dataset. For example, if I were just running a linear mixed model of the data (currently MaxQuant output normalized with MSstats), the structure would be something like this:

Protein_Intensity ~ Age + (1|Social_Group/Individual) + (1|Sex) + (1|Season)

In this case, I am trying to see if the age of the individual predicts protein intensity, while accounting for the fact that there are 2 samples from each individual (one in each of 2 seasons), that each individual belongs to a social group, and that individuals can be male or female. However, I presume that just running lmer on each protein intensity value would cause problems given the large number of NAs in the dataset, which is why I am looking for a more appropriate program.

Is this something I can modify in the step design = model.matrix(~0+cond), or elsewhere? If not, do you happen to know of a different program that would be more suitable for this type of analysis?

Thanks!

Error in Bioconductor Developer Version 3.20

Dear DEqMS developer,

I am in the process of submitting my package "PRONE" (https://github.com/lisiarend/PRONE) to Bioconductor, which includes DEqMS as a dependency. However, your package currently fails to build in Bioconductor Developer Version 3.20 (https://bioconductor.org/checkResults/3.20/bioc-LATEST/DEqMS/).

To proceed with the review and acceptance process of my package in Bioconductor, I have temporarily removed the DEqMS function. Nevertheless, I am keen to reintegrate it in a future version of PRONE once the issue is resolved.

I would greatly appreciate it if you could address the build issue with DEqMS.

Best regards,
Lis

The data url does not exist

Hi Yafeng,
The following code does not work because of "Error in download.file(url, destfile = "./miR_Proteintable.txt", method = "auto"): cannot open URL 'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2016/06/PXD004163/Yan_miR_Protein_table.flatprottable.txt'"

url <- "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2016/06/PXD004163/Yan_miR_Protein_table.flatprottable.txt"
download.file(url, destfile = "./miR_Proteintable.txt", method = "auto")
df.prot = read.table("miR_Proteintable.txt", stringsAsFactors = FALSE, header = TRUE, quote = "", comment.char = "", sep = "\t")

TMT data analysis with DEqMS - with multiple sets of TMT data

Dear DEqMS developer team,

How can I run DEqMS with multiple sets of TMT data?

I have 55 samples, multiplexed with TMT 11-plex across 5 sets.

In this case, how should I run DEqMS?

Thank you
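
One common limma-style approach, given here as an assumption and not as the developers' recommendation, is to add the TMT set as a blocking factor in the design matrix. The sketch below builds such a design in base R for 5 sets of 11 channels. The condition labels and their layout are hypothetical, and a real experiment would typically also have to account for pooled reference channels.

```r
# 5 TMT sets with 11 channels each (55 samples)
tmt_set <- factor(rep(paste0("set", 1:5), each = 11))

# Hypothetical two-group layout, alternating so both groups occur in every set
cond <- factor(rep(c("ctrl", "treat"), length.out = 55))

# Condition means plus additive set effects (set1 is the reference level)
design <- model.matrix(~ 0 + cond + tmt_set)
colnames(design) <- gsub("cond", "", colnames(design))

dim(design)        # 55 rows, 6 columns
qr(design)$rank    # 6, so the set effects are estimable alongside condition
```

A design of this form would be passed to lmFit in place of model.matrix(~0+cond), with contrasts defined on the condition columns only.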

Peptide counts for multiple label-free sample comparisons

Hello, I want to use DEqMS to analyze my LFQ dataset, which consists of five samples (each in triplicate), and to find the DEPs among all ten possible pairwise comparisons. Should I use the minimum peptide count across the total dataset, or should I always use the minimum peptide count relevant only to the sample pair being compared?
Thanks
Martin Potocky
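
The two options in the question can be computed side by side. The sketch below does not say which one is appropriate; it only shows, on hypothetical counts (one column per sample group, named A to E), how a global minimum differs from per-comparison minima. The other reports on this page attach a single count vector through fit$count, so using the per-comparison version would presumably mean fitting each comparison separately.

```r
groups <- c("A", "B", "C", "D", "E")

# Hypothetical peptide counts: rows are proteins, columns are sample groups
pep <- matrix(c(5, 2, 9,
                4, 3, 9,
                6, 1, 8,
                5, 2, 7,
                3, 4, 9),
              nrow = 3,
              dimnames = list(c("P1", "P2", "P3"), groups))

# Option 1: one minimum per protein across the whole dataset
global_min <- apply(pep, 1, min)

# Option 2: one minimum per protein for each of the ten pairwise comparisons
pairs    <- combn(groups, 2)
pair_min <- apply(pairs, 2, function(p) pmin(pep[, p[1]], pep[, p[2]]))
colnames(pair_min) <- paste(pairs[2, ], pairs[1, ], sep = "-")

global_min            # P1 = 3, P2 = 1, P3 = 7
pair_min[, "B-A"]     # P1 = 4, P2 = 2, P3 = 9
```

The global minimum is never larger than any per-comparison minimum, so it is the more conservative of the two choices.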

Error in getEAWP(object)

Hello

I'm trying to analyze label-free data with DEqMS, but I get an error when I apply lmFit:
Error in getEAWP(object) : Data object doesn't contain numeric expression values

My input data is an already normalized and log2-transformed protein table. So in the step protein.matrix = log2(as.matrix(df.LFQ.filter)) I omit log2 and use protein.matrix = as.matrix(df.LFQ.filter) instead.

I don't know if this is related to the error
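
One common cause of this message, given as an assumption about this case, is a non-numeric column (for example protein IDs) left in the data frame: as.matrix() then returns a character matrix. log2() fails immediately on such a matrix, so dropping the log2() call can postpone the error until lmFit. A minimal sketch with a hypothetical table:

```r
# Hypothetical log2-transformed protein table with an ID column left in
df.LFQ.filter <- data.frame(Protein = c("A", "B"),
                            s1 = c(20.1, 18.3),
                            s2 = c(19.8, 18.9))

# One character column turns the whole matrix into character
m_chr <- as.matrix(df.LFQ.filter)
is.numeric(m_chr)            # FALSE

# Keep only the numeric columns and move the IDs to the row names
num_cols <- vapply(df.LFQ.filter, is.numeric, logical(1))
protein.matrix <- as.matrix(df.LFQ.filter[, num_cols])
rownames(protein.matrix) <- df.LFQ.filter$Protein
is.numeric(protein.matrix)   # TRUE
```

Checking is.numeric(protein.matrix) before calling lmFit shows whether this applies to the real table.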

Error in smooth.spline(x, logVAR, cv = FALSE) : missing or infinite values in inputs are not allowed

Hello,

I am testing the package on a time-course proteomics data set produced by MaxQuant. The output was processed using MSstats for protein summarization and normalization and then converted into wide format for use as input to limma + DEqMS. The spectral count data was produced from the msms.txt file. After following the steps in the vignette to modify the limma::lmFit output object for use as input to DEqMS::spectraCounteBayes(), I receive the error in the title. I am attaching the necessary files and the code to reproduce the error. Every row with missing values was filtered out of the expression data set, and the same set of proteins was used to create the count-data matrix.

I would really appreciate it if you could offer any support in understanding what is happening.

annotation_liver.txt
logexpr_msstats_norm_liver_wide.txt
count_liver_msms.txt

Code below

## Load packages ----

library(readr)
library(dplyr)
library(magrittr)
library(stringr)      # str_trim(), str_remove()
library(matrixStats)  # rowMins()
library(DEqMS)
library(limma)

## Annotation for experimental design ----

annot_liv <- read_delim("annotation_liver.txt",delim = " ") %>% 
          mutate(Raw.file = str_trim(Raw.file))

## Prep count data -----

wide_counts2 <- read_delim("count_liver_msms.txt", delim = " ")

wide_counts3 <- dplyr::select(wide_counts2, -Protein)

countsmat <- data.frame(count = rowMins(as.matrix(wide_counts3)),
                        row.names = wide_counts2$Protein)

## Prep expression data ----

liv_reshaped2woNA <- read_delim("logexpr_msstats_norm_liver_wide.txt", delim = " ")

## Formatting for limma ----

liv_reshaped3 <- dplyr::select(liv_reshaped2woNA, -Protein)

liv_resh2matrx <- as.matrix(liv_reshaped3)

row.names(liv_resh2matrx) <- liv_reshaped2woNA$Protein

#liv_resh2matrx[1:6, 1:5]

## Prep design matrix ----

levs <- c("6h", "12h","18h", 
          "24h", "48h", "72h", "96h")

factors <- factor(x = annot_liv$Condition, levels = levs)

## design matrix with time as continuous value ----

time <- str_remove(factors, "h") %>% as.numeric()

# Cubic spline fitting ----
library(splines)
cspl <- ns(time, df = 3)

design_cub <- model.matrix(~cspl)


## Fitting model with limma ----

fit_cub1 <- lmFit(liv_resh2matrx, design = design_cub)
fit_cub2 <- eBayes(fit_cub1, trend = TRUE, robust = TRUE)

## Adapting object for DEqMS ----

fit_cub2$count = countsmat[rownames(fit_cub2$coefficients),"count"]
fit_cub3 = spectraCounteBayes(fit_cub2, coef_col = 2:4, fit.method = "spline")

I also tested:

Modifying the fit.method argument, with similar results (errors making reference to missing values).
Using a simpler model.matrix (design <- model.matrix(~0+time)) and setting the coef_col = 1.
Setting the trend and robust arguments in limma::eBayes() to FALSE.

All of these with the same result.

Many thanks in advance!
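
One thing worth checking, offered as a possible cause and not a confirmed one: the error message shows that the spline is fitted on a variable x, and if x is a log2-transformed count, any protein whose minimum count is 0 becomes -Inf. The sketch below shows that effect on hypothetical counts, with two ways of handling it.

```r
# Hypothetical minimum spectral counts; two proteins have a minimum of 0
count <- c(12, 0, 3, 1, 0)

# log2(0) is -Inf, which smoothing functions reject
x  <- log2(count)
ok <- is.finite(x)
which(!ok)            # positions 2 and 5

# Option 1: drop proteins without a finite log2 count before fitting
count_kept <- count[ok]

# Option 2: add a pseudocount of 1 so that every log2 count is finite
x_pseudo <- log2(count + 1)
all(is.finite(x_pseudo))   # TRUE
```

Running summary(fit_cub2$count) and checking for zeros or NA values on the real object would show whether this explains the error.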

Combining t and P value adjustment with TREAT

Hello, I have a DIA dataset I am working with. I am interested in using the "treat" method (limma-voom and edgeR offer functions to do so) to test whether the logFC of a protein significantly exceeds a biologically meaningful threshold such as 0.5, instead of comparing it to 0. It is my understanding that DEqMS simply adjusts the t-statistics and p-values reported by eBayes() based on the number of precursors used to identify and quantify each protein. I was wondering if there would be a way to combine these two methodologies.

Thanks,

Nathaniel Deimler

DEqMS for analyzing DIA data

Do you have any suggestions or a tutorial for DIA data analysis?
Thank you for the wonderful package.