fgcz / rawdiag
Brings Orbitrap mass spectrometry data to life; multi-platform, fast and colorful R package
Home Page: https://bioconductor.org/packages/rawDiag
Our current readXICs() does not support any scan filters. The compiled C# code uses the hard-coded filter
Filter = "ms"
which returns all scans (see line 1046 of fgcz_raw.cs).
It would be useful to have an additional parameter for readXICs() that passes a filter to the C# function.
library(ggplot2)
library(rawDiag)

gp <- PlotMassHeatmap(PXD006932, bins=40)
# strip all axes, labels, legends and backgrounds to obtain a bare thumbnail
gp2 <- gp + theme(legend.position = 'none') +
  theme(axis.line=element_blank(),
        axis.text.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks=element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        panel.background=element_blank(),
        panel.border=element_blank(),
        panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),
        plot.background=element_blank()) +
  theme(plot.title = element_blank()) +
  theme(plot.subtitle = element_blank()) +
  theme(strip.background = element_blank()) +
  theme(strip.text = element_blank()) +
  theme(plot.background = element_rect(fill = "black")) +
  theme(panel.spacing = unit(-1, "lines"))
ggsave(filename = "graphics/Thumb.png", gp2,
       device = 'png',
       dpi = 300,
       height = 9, width = 16)
code for tweet https://twitter.com/hb9feb/status/1014602529034915840
#R
library(protViz)
library(rawDiag)

f <- function(rawfile, pepSeq, dt = 0.1){
  # m/z of the doubly protonated precursor ions
  mass2Hplus <- (parentIonMass(pepSeq) + 1.008) / 2

  X <- readXICs(rawfile = rawfile, masses = mass2Hplus)
  S <- read.raw(rawfile)

  # scans whose precursor mass lies within 0.1 of each target m/z
  idx <- lapply(mass2Hplus, function(m){
    which(abs(S$PrecursorMass - m) < 0.1)
  })
  scanNumbers <- lapply(idx, function(x){S$scanNumber[x]})

  # per peptide: the candidate scan with the highest summed intensity of matched fragment ions
  bestMatchingMS2Scan <- sapply(seq_along(pepSeq), function(i){
    peakList <- readScans(rawfile, scans = scanNumbers[[i]])
    peptideSpecMatch <- lapply(peakList, function(x){
      psm(pepSeq[i], x, FUN = function(b, y){cbind(b, y)}, plot = FALSE)
    })
    score <- sapply(seq_along(peptideSpecMatch), function(j){
      sum(peakList[[j]]$intensity[abs(peptideSpecMatch[[j]]$mZ.Da.error) < 0.1])
    })
    bestFirstMatch <- which(max(score, na.rm = TRUE) == score)[1]
    scanNumbers[[i]][bestFirstMatch]
  })

  peakList <- readScans(rawfile, scans = bestMatchingMS2Scan)

  # one JPEG per peptide: annotated MS2 spectrum on top, XIC with fitted peak below
  pp <- lapply(seq_along(pepSeq), function(j){
    jpeg(filename = paste("~/Desktop/rawDiag_", pepSeq[j], ".jpeg", sep = ''),
         quality = 100, height = 640)
    op <- par(mfrow = c(2, 1), mar = c(5, 4, 4, 3))
    peakplot(pepSeq[j], peakList[[j]], FUN = function(b, y){cbind(b, y)})

    t <- S$StartTime[bestMatchingMS2Scan[j]]
    peak.idx <- which((t - dt) < X[[j]]$times & X[[j]]$times < (t + dt))
    plot(X[[j]], xlim = c(t - 0.2, t + 0.2),
         main = paste("RT =", round(t * 60), 'seconds', "[m+2H]2+ =", mass2Hplus[j]),
         xlab = 'RT [min]', ylab = 'intensity')
    abline(v = t, col = rgb(0.8, 0.1, 0.1, alpha = 0.5), lwd = 3)

    # peak fitting: quadratic fit on the log intensities within t +/- dt
    xx <- X[[j]]$times[peak.idx]
    yy <- X[[j]]$intensities[peak.idx]
    points(xx, yy, pch = 16, col = rgb(0.0, 0.1, 0.8, alpha = 0.5))
    # text(xx, yy, peak.idx, pos = 1)
    peak <- data.frame(logy = log(yy), x = xx)
    x.mean <- mean(peak$x)
    peak$xc <- peak$x - x.mean
    fit <- lm(logy ~ xc + I(xc^2), data = peak)
    xx <- with(peak, seq(min(xc) - 0.2, max(xc) + 0.2, length = 100))
    lines(xx + x.mean, exp(predict(fit, data.frame(xc = xx))),
          col = rgb(0.25, 0.25, 0.25, alpha = 0.3), lwd = 5)
    dev.off()
  })
}
# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3918884/
f(rawfile = "/Users/cp/Downloads/20180220_14_autoQC01.raw",
c('GAGSSEPVTGLDAK', 'VEATFGVDESNAK',
'TPVISGGPYEYR', 'TPVITGAPYEYR', 'DGLDAASYYAPVR',
'ADVTPADFSEWSK', 'GTFIIDPGGVIR')
)
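The peak fitting step in f() regresses the log intensities on a centered quadratic in retention time, which amounts to fitting a Gaussian peak shape. Below is a minimal sketch on simulated data that also recovers the apex and the width from the coefficients. The helper name fit_gaussian_peak and the simulated peak are assumptions made for illustration; they are not part of rawDiag.

```r
# Hypothetical helper: fit a Gaussian to an XIC peak through a quadratic
# regression on the log intensities, log(y) = b1 + b2 * xc + b3 * xc^2.
fit_gaussian_peak <- function(times, intensities) {
  keep <- intensities > 0                 # log() needs strictly positive values
  x.mean <- mean(times[keep])
  peak <- data.frame(logy = log(intensities[keep]), xc = times[keep] - x.mean)
  fit <- lm(logy ~ xc + I(xc^2), data = peak)
  b <- unname(coef(fit))
  stopifnot(b[3] < 0)                     # a peak needs a downward-opening parabola
  list(apex = x.mean - b[2] / (2 * b[3]),           # vertex of the parabola
       sigma = sqrt(-1 / (2 * b[3])),               # Gaussian standard deviation
       height = exp(b[1] - b[2]^2 / (4 * b[3])))    # intensity at the apex
}

# Simulated, noise-free peak: apex at 20.5 min, sigma of 0.05 min
rt <- seq(20.3, 20.7, by = 0.02)
int <- 1e6 * exp(-(rt - 20.5)^2 / (2 * 0.05^2))
res <- fit_gaussian_peak(rt, int)
print(res)
```

On real XICs the points far from the apex are dominated by noise, so restricting the fit to a window around the MS2 scan time, as f() does with dt, is a reasonable choice.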
First of all, rawDiag is AWESOME!
I guess this is more a feature request, but I am not completely sure this is even possible.
It would be nice to be able to extract from a raw file the list of monoisotopic m/z the mass spectrometer has 'calculated' from the MS1 survey scan. I believe Thermo is using a proprietary (?) algorithm for the MIPS, but probably the output of that step (probably m/z, charge and intensity) is stored in the final raw file for each MS1 scan.
Knowing which precursors were actually fragmented and which were not because of the cycle time, one can decide whether it is worth adjusting LC and MS parameters to dig deeper into the sample.
I don't know whether it was ever intended to work on ITMS data, but trying to get the peak list of a Fusion Lumos ion trap scan always results in the following error:
Example scan:
Scan Mode: ITMS + c NSI r d Full ms2 [email protected] [100.00-825.00]
extract <- readScans(rawfile = rawfile, scans = c(10638))
# No centroid stream available
Cheers
Daniel
INFORMATICS: ALGORITHMS AND STATISTICAL ADVANCES II 374-392
ThP 375
Optimize your Method: rawDiagnostic An R Package to Support Method Development for Bottom-up Proteomics on Orbitrap Instruments
used in the application section of https://www.biorxiv.org/content/early/2018/04/24/304485
and add the ID to the manuscript.
This function has to be refactored, using mutate_at, to eliminate the R CMD check NOTE 'no visible binding for global variable'.
Before refactoring, we should have a unit test.
library(DBI)

read.tdf <- function(filename){
  con <- dbConnect(RSQLite::SQLite(), filename)
  # join precursor and frame information on the shared id
  rv <- dbGetQuery(con, "SELECT * FROM Precursors a INNER JOIN Frames b on a.id == b.id;")
  dbDisconnect(con)

  # keep the columns of interest and rename them to the rawDiag column names
  rv <- rv[, c('Id','Time','ScanNumber','Intensity','SummedIntensities',
               'MonoisotopicMz', 'Charge', 'MsMsType')]
  colnames(rv) <- c('scanNumber','StartTime','BasePeakMass','BasePeakIntensity',
                    'totIonCurrent', 'PrecursorMass','ChargeState','MSOrder')
  rv$filename <- basename(filename)

  # translate the numeric MsMsType codes into MS order labels
  rv$MSOrder[rv$MSOrder == 0] <- "Ms"
  rv$MSOrder[rv$MSOrder == 8] <- "Ms2"
  as.rawDiag(rv)
}
Line 469 in 6efdab0
possible solution
rv <- as.data.frame(lapply(rv, function(x)
if(any(grepl(',', x))) as.numeric(gsub(',', '.', x)) else x), stringsAsFactors=FALSE)
or change the language settings
thanks to Yann GUITTON
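To illustrate what the proposed conversion does, here is a small sketch on a made-up data frame whose numeric columns were written with decimal commas. The column values are invented for the example.

```r
# Toy data frame mimicking output written under a locale with decimal commas
rv <- data.frame(StartTime = c("0,15", "0,32"),
                 BasePeakMass = c("445,12", "371,10"),
                 ScanType = c("FTMS + c NSI Full ms", "FTMS + c NSI Full ms"),
                 stringsAsFactors = FALSE)

# Convert every column that contains a comma to numeric; leave the others as is
fixed <- as.data.frame(lapply(rv, function(x)
  if (any(grepl(',', x))) as.numeric(gsub(',', '.', x)) else x),
  stringsAsFactors = FALSE)

str(fixed)
```

One caveat of this approach: a genuine text column that happens to contain a comma would also be passed through as.numeric() and turn into NA, so the conversion is safest when restricted to columns known to be numeric.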
rawDiag/inst/docker/fgcz_raw.cs
Line 512 in cc8e18e
Console.WriteLine(" Study: " + rawFile.SampleInformation.UserText[0]);
Console.WriteLine(" Laboratory: " + rawFile.SampleInformation.UserText[2]);
MQ bfabric workunit with combined course data: 175310
Download the zip, then:
library(dplyr)
xx <- read.csv("paolo_20180716_o4526_MQ_txtFiles/evidence.txt",sep="\t")
xx %>% head()
# possibly useful columns
relevant <- xx %>% select(Raw.file,
  Sequence,Modified.sequence,Proteins, Charge,MS.MS.m.z, m.z, Mass,
  Retention.time, Retention.length,Score,Delta.score,MS.MS.scan.number,
  Intensity, Type ) %>% head()
It would be nice to define which scan type should be used as the marker for the start of an instrument cycle. Here is an example:
We execute cycles of
MS1 -> msxSIM -> MS2 -> ... -> MS1 -> ...
Selecting MS1 as the origin scan would define an instrument cycle. This would allow plotting cycle-specific statistics.
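A minimal sketch of how such a cycle definition could work: every occurrence of the origin scan type starts a new cycle, so a cumulative sum over the matches yields a cycle id per scan. The helper assign_cycles, the column names and the toy scan table are assumptions for illustration only, not existing rawDiag functionality.

```r
# Hypothetical helper: start a new instrument cycle at every origin scan
assign_cycles <- function(scanType, origin = "MS1") {
  cumsum(scanType == origin)
}

# Made-up scan table following the MS1 -> msxSIM -> MS2 -> ... pattern
scans <- data.frame(
  scanNumber = 1:8,
  scanType = c("MS1", "msxSIM", "MS2", "MS2", "MS1", "msxSIM", "MS2", "MS1"),
  StartTime = c(0.00, 0.01, 0.02, 0.03, 0.05, 0.06, 0.07, 0.09)
)
scans$cycle <- assign_cycles(scans$scanType)

# Cycle-specific statistics: scans per cycle and time between cycle starts
scansPerCycle <- table(scans$cycle)
cycleStart <- tapply(scans$StartTime, scans$cycle, min)
cycleTime <- diff(cycleStart)
print(scans)
```

Scans recorded before the first origin scan receive cycle id 0 with this definition, which makes incomplete leading cycles easy to filter out.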
gp <- ggplot(data = df, aes(x = log(abundance,10), y = log(intensity,10), fill=filename)) +
geom_point(stat='identity', size = 2, aes_string(group = "filename", colour = "filename")) +
geom_smooth(method = "lm", se = FALSE, aes_string(group = "filename", colour = "filename")) +
#geom_text(x = -2, y = 7, label = lm_eqn_promega(df), parse = TRUE, aes_string(group = "filename", colour = "filename")) +
facet_wrap(~ sequence * filename, scales = "free", nrow = 6)
and
gp <- ggplot(data = df, aes(x = rt, y = t, fill=filename)) +
xlab("iRT score") +
ylab("retention time [minutes]") +
geom_point(stat='identity', size = 2) +
geom_smooth(method = "lm", se = FALSE, aes_string(group = "filename", colour = "filename"))
if (input$plottype == "trellis") {
gp <- gp +
#geom_text(x = 0, y = median(df$t), label = lm_eqn(df), parse = TRUE) +
#geom_text(x = -2, y = 7, label = lm_eqn(df), parse = TRUE, aes_string(group = "filename", colour = "filename")) +
facet_wrap(~filename, ncol = 1, scales = "free")
}
gp <- gp + scale_fill_manual(values = cbbPalette)
Here is a link to the ASMS poster of Thermo's Tartare:
https://assets.thermofisher.com/TFS-Assets/CMD/posters/po-65226-xcalibur-tartare-rawmeat-asms2018-po65226-en.pdf
Error in unlist(str_split(x, "\n"), recursive = FALSE, use.names = FALSE): lazy-load database '/usr/local/lib/R/site-library/stringr/R/stringr.rdb' is corrupt
.read.raw.info <- function(file,
mono = if(Sys.info()['sysname'] %in% c("Darwin", "Linux")) TRUE else FALSE,
exe = file.path(path.package(package = "rawDiag"), "exec", "fgcz_raw.exe"),
mono_path = "",
argv = "info",
system2_call = TRUE,
method = "thermo"){
if(system2_call && method == 'thermo'){
tf <- tempfile(fileext = '.tsv')
tf.err <- tempfile(fileext = '.tsv')
message(paste("system2 is writting to tempfile ", tf, "..."))
if (mono){
rvs <- system2("mono", args = c(exe, shQuote(file), argv),
stdout = tf)
}else{
rvs <- system2(exe, args = c(shQuote(file), argv),
stderr = tf.err,
stdout = tf)
}
if (rvs == 0){
rv <- read.csv(tf, sep = ":", stringsAsFactors = TRUE, header = FALSE,
col.names = c('attribute', 'value'))
message(paste("unlinking", tf, "..."))
unlink(tf)
# unlink(tfstdout)
return(rv)
}
}
NULL
}
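The function above parses 'attribute: value' lines with read.csv(sep = ":") and two column names. A value that itself contains a colon, such as a Windows path or a time stamp, then yields more fields than column names. Whether this explains any particular reported error is not established here. The sketch below shows one possible alternative that splits each line at the first colon only. The helper name parse_info_lines and the example lines are made up; the real output of fgcz_raw.exe may look different.

```r
# Hypothetical helper: split each "attribute: value" line at the first colon only
parse_info_lines <- function(lines) {
  lines <- lines[grepl(":", lines, fixed = TRUE)]        # skip lines without a colon
  pos <- as.integer(regexpr(":", lines, fixed = TRUE))   # position of the first colon
  data.frame(attribute = trimws(substr(lines, 1, pos - 1)),
             value = trimws(substr(lines, pos + 1, nchar(lines))),
             stringsAsFactors = FALSE)
}

# Made-up example lines containing additional colons in the values
lines <- c("RAW file: C:\\Data\\sample.raw",
           "Creation date: 2018-07-16 10:42:13",
           "Number of scans: 574")
info <- parse_info_lines(lines)
print(info)
```

In practice the lines would come from readLines() on the temporary file written by system2().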
When extracting centroided ITMS scans, rawDiag prints "try" to the console for every single scan that it extracts. I suggest removing this output or at least making it optional via a flag.
...
Bigger bug, I guess you didn’t have a Fusion Lumos file to test with:
Extracting scans/XICs from the raw file works, however the read.raw() function throws an error. I guess the naming is different; perhaps one can build a fallback flag to ignore them if none are found. Otherwise this might have to be adapted to different machines. I could provide raw files for Orbitrap XL, Velos, Elite, QE Plus, QE HF, QE HFX, Fusion Lumos.
metadata <- read.raw(rawfile)
system2 is writting to tempfile C:\Users\danielz\AppData\Local\Temp\RtmpqM2GA5\file139c2bc35ba1tsv ...
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
If you want to reproduce that, I uploaded a BSA raw file from said Fusion Lumos mass spectrometer ...
I tried to follow
"Register the .Net assembly in your system similar to a Linux installation", but it is unclear to me what this implies. The following things are done:
What are the next steps? How do I install the NuGet package? Do I need Visual Studio? If not, what are the alternatives? We need to document this in a way that people without any prior experience in this area will be able to complete the installation.
to benefit from the fantastic PSI world.
Hi all,
the ScanTime plot (which so many people wanted to look at) does not work.
Hello! Thank you for developing this useful package. Is there any way to add the signal-to-noise information to the object returned by readScans? This is a value that Thermo stores in the .raw file for every peak, in addition to the m/z and intensity. I believe that it can be obtained from one of the ThermoFisher.CommonCore DLLs already utilized by rawDiag. Thanks!
I created some raw files that could be used for unit testing.
a) Calibration mix recording on a FUSION (profile & centroid mode) using direct infusion (no LC separation!)
Pierce LTQ Velos ESI Positive Calibration Solution, Product number: 88323
product homepage
Certificate of analysis
Could be used to test basic functions like:
MSScan_Orbi_centroid.raw contains 50 scans of type
FTMS + c ESI Full ms [150.0000-2000.2000]
scan #50 looks like this in FreeStyle 1.4 (uses RawFileReader)
The file header contains
FileHeader_MSScan_Orbi_centroid.txt
The profile mode file is structured accordingly and displays like this for scan 2
should pass a CRAN submission
Dear RawDiag Team,
I am trying to extract scans from a RAW file. MS2 scans work, MS1 scan extraction works in general, e.g. if I subselect the first 100 scans to extract. Whenever I submit a large amount of scans (like all MS1 scans of a file), readScans returns:
Error in source(tfo) : negative length vectors are not allowed
I suspect that one of the scans might be empty (have seen that before, but rarely). The behavior is file dependent, some run through, some don't. Are there verbose messages to find at which scan it goes wrong? If it is indeed an empty scan, can one try to catch this error?
This seems to be a memory issue; a search turns up quite a lot of hits for this error. When I chunk the scans (5 x 1000 scans), it runs fine. So I guess the function does not scale well to ~5k MS1 scans (profile) or >80k MS2 scans (these were the test files that failed).
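The chunking workaround could be wrapped as sketched below. The helper read_scans_chunked is hypothetical and not part of rawDiag. It assumes that the reader returns a list with one element per scan, and it takes the reader as an argument so that the sketch runs with a stand-in function instead of a real raw file.

```r
# Hypothetical helper: read scans in chunks and concatenate the results.
# `reader` is any function with the signature reader(rawfile, scans = ...),
# for example rawDiag::readScans.
read_scans_chunked <- function(rawfile, scans, reader, chunkSize = 1000) {
  chunks <- split(scans, ceiling(seq_along(scans) / chunkSize))
  do.call(c, lapply(unname(chunks), function(s) reader(rawfile, scans = s)))
}

# Stand-in reader so that the sketch runs without a raw file or rawDiag
fakeReader <- function(rawfile, scans) lapply(scans, function(s) list(scan = s))

idx <- seq(1, 9999, by = 2)   # 5000 made-up MS1 scan numbers
scanDat <- read_scans_chunked("02401_Ecoli_QC_R3.raw", idx, reader = fakeReader)
length(scanDat)
```

With the real package the call would pass reader = readScans. A smaller chunkSize trades more external calls for a lower peak memory footprint.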
RawDiag 0.0.29, R 3.5.2 under 64bit Windows:
file <- "02401_Ecoli_QC_R3.raw"
metaDat <- read.raw(file, rawDiag = FALSE)
idx <- metaDat[ which(metaDat$MSOrder == "Ms"),]$scanNumber
scanDat <- readScans(file, scans = idx)
File that I am using: https://drive.google.com/open?id=1VN4U21jtg5bY10Bb9bnFEZ-mTfRMFKEY
Thanks for the support.
#R
# Christian Panse <[email protected]>
# Functional Genomics Center Zurich 2018
# System Requirements
pkgs <- c( 'devtools',
'dplyr',
'ggplot2',
'hexbin',
'magrittr',
'parallel',
'protViz',
'rmarkdown',
'RSQLite',
'scales',
'shiny',
'tidyr',
'tidyverse',
'DT')
(pkgs <- pkgs[(!pkgs %in% unique(installed.packages()[,'Package']))])
if(length(pkgs) > 0){install.packages(pkgs)}
# Installation
install.packages('http://fgcz-ms.uzh.ch/~cpanse/rawDiag_0.0.28.tar.gz', repos=NULL)
# Testing
library(rawDiag)
(rawfile <- file.path(path.package(package = 'rawDiag'), 'extdata', 'sample.raw'))
system.time(RAW <- read.raw(file = rawfile))
dim(RAW)
summary.rawDiag(RAW)
PlotScanFrequency(RAW)
# read all dimensions
dim(RAW)
RAW <- read.raw(file = rawfile, rawDiag = FALSE)
dim(RAW)
R.version.string; Sys.info()[c('sysname', 'version')]
run the rawDiag shiny application
library(rawDiag)
# root defines where your raw files are
rawDiagShiny(root="D:/Data2San/")
run as a BAT script on the Windows box
"c:\Program Files\R\R-3.5.1\bin\R.exe" -e "library(rawDiag); rawDiagShiny(root='D:/Data2San', launch.browser=TRUE)"
or from the Linux/Apple command line
R -e "library(rawDiag); rawDiagShiny(root='$HOME/Downloads', launch.browser=TRUE)"
and you can add it to your $HOME/.bashrc
alias rawDiag="R -e \"library(rawDiag); rawDiagShiny(root='$HOME/Downloads', launch.browser=TRUE)\""
So far rawDiag supports:
Are there other peptide sets that could make sense?
PROCAL
Zolg, D. P., Wilhelm, M., Yu, P., Knaute, T., Zerweck, J., Wenschuh, H., et al. (2017). PROCAL: A Set of 40 Peptide Standards for Retention Time Indexing, Column Performance Monitoring, and Collision Energy Calibration. Proteomics, 17(21), 1700263. http://doi.org/10.1002/pmic.201700263
https://fgcz-bfabric.uzh.ch/bfabric/userlab/show-workunit.html?id=170064&tab=details
4 experiments, each of type Full MS -> ddMS2 (1 s), but each experiment has a different combination of
FTMS/ITMS and centroided/profile data.
Hi Christian,
I was thinking: it would be really cool if in the shiny version of rawDiag you added an option for a custom mass XIC. For example if I wanted to see where a particular trypsin peptide was eluting I could type in the mass range (e.g. 421.74-421.76) and it would display those XICs.
Thanks for all your help; I’m really loving rawDiag.
Cheers,
Richard Hagan | PhD Student
Max Planck Institute for the Science of Human History
Kahlaische Straße 10 07745 Jena, Germany
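A sketch of how a typed mass range such as 421.74-421.76 could be turned into a center mass and a tolerance before extracting the XIC. The helper parse_mass_range is an assumption for illustration and does not exist in rawDiag.

```r
# Hypothetical helper: turn a user-supplied range such as "421.74-421.76"
# into a center mass and a half-width tolerance.
parse_mass_range <- function(txt) {
  bounds <- suppressWarnings(
    as.numeric(trimws(strsplit(txt, "-", fixed = TRUE)[[1]])))
  if (length(bounds) != 2 || anyNA(bounds) || bounds[1] >= bounds[2]) {
    stop("expected a range of the form 'low-high', e.g. '421.74-421.76'")
  }
  list(mass = mean(bounds), tol = diff(bounds) / 2)
}

r <- parse_mass_range("421.74-421.76")
print(r)
# r$mass could then be passed on, e.g. readXICs(rawfile = rawfile, masses = r$mass)
```

Validating the input before calling the extraction keeps malformed text typed into a shiny input field from reaching the raw file reader.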
having implementations for three method options.
plot.xic <- function(x, method = 'trellis'){
#x$fmass <- as.factor(x$mass)
figure <- ggplot(x, aes_string(x = "time", y = "intensity")) +
#geom_segment() +
geom_line(stat='identity', size = 1, aes_string(group = "filename", colour = "filename")) +
#scale_x_continuous(breaks = scales::pretty_breaks(8)) +
#scale_y_continuous(breaks = scales::pretty_breaks(8)) +
labs(title = "XIC plot") +
labs(subtitle = "Plotting XIC intensity against retention time") +
labs(x = "Retention Time [min]", y = "Intensity Counts [arb. unit]") +
theme_light()
if(input$XICmainPeak){
figure <- figure + facet_wrap(~ x$mass , scales = "free", ncol = 1)
}else{
figure <- figure + facet_wrap(~ x$mass , ncol = 1)
}
return(figure)
}
Also consider loading the data with
data(rawDiag)
and providing a man page.
rawfile <- structure(list(path = "Downloads/Resource_642890/20180717_006_tSIM_demo.raw", header = ... ), class = "raw")
Accordingly, XIC() could be defined as
XIC(rawfile, mz, tol, ...)
PlotHeatmap <- function(x, deconvolute = TRUE, ...){
}
We should have a tab in the GUI that displays some important infos regarding the software:
brought up by Eduard