Comments (24)
You are trying to fetch this code here?
from rawrr.
Hi Henrik,
nice to hear that our packages are useful! Could you please upload a small code chunk that showcases how you currently use rawDiag
and how you envision to use rawrr
code instead? The bigger picture is: At some point we need to clean up/redo rawDiag. Meaning it would anyway use rawrr under the hood for data access. The first step is of course understanding what users would like to retrieve and design some sensible functions and managed code. The table returned by rawrr::readIndex
and the table returned by rawDiag are indeed similar, but we tried to keep the readIndex
-derived table very minimalistic to draw as little unneeded information as possible.
Best,
Tobi
from rawrr.
Hi Tobi,
I basically use it for a small PRM analyser tool, scanning for intensities of mz listed in a library. We found that rawDIAG doesn't work on our windows server (problem with temporary files, kind of unsolved) I now decided to switch to rawrr. I tried two strategies:
The first would be to use the readChromatogram function to read intensities from masses listed in a library. I found that this is too slow compared to the second approach:
Here I read chunks of ms2 spectra with readSpectrum and assemble the intensities for matching mz scans on my own. Now comes the tricky part: We need to prefilter out some scans to be more fast and specific. For instance, we filter out DDA component in mixed PRM acquisitions or preScans in SureQuant runs. Especially the latter doesn't directly work for now with rawrr, since I only get the ScanDescription from the readSpectrum function. I found for now a workaround, which is to filter after readSpectrum. I understand, that the index table should obtain only the most relevant information but the complete table as obtained from RawDiag was useful as well. If this is feasible at all, an option to get a comprehensive readOut might be useful (especially for us ;) ) for some users.
All the best
Henrik
from rawrr.
Hi Hendrik,
do I get you right? Your final goal is to extract fragment ion chromatograms?
Regarding "I found that this is too slow compared to the second approach" I have to say that something seems to be going very wrong here, since the power of readChromatogram
is that your are directly using the managed code (#C API) of the RawFileReader
library to drive the chromatogram extraction. There are NO intermediate steps done in R as retrieving scans and filtering peak lists (the API manages all of this!). But that is basically your claim: Doing that manually in R is faster than using the API. I am tempted to say: That is impossible. The RawFileReader API is very strong when it comes to scan filtering and chromatogram extraction.
Maybe it would make sense if you would share some raw files and code to better understand where the performance loss is coming from.
Best,
Tobi
from rawrr.
It is a bit unfortunate, because the html page is currently not rendered correctly, but still:
https://fgcz-ms.uzh.ch/~cpanse/rawR/test/functional_test.html
The R code that renders the html page is also executed on our server and does a lot of chromatogram extraction from raw files using rawrr
.
@cpanse somehow the html page is not rendered correctly and the link to the code is not working. Could you please fix this?
from rawrr.
Dear Tobi,
sry for not being precise enough. I agree that the readChromatogram is faster compared to readSpectrum but depended on the depth of the library (and probably also based on my implementation) I found that complex libaries benefit from simply reading the spectra and extract all information to all precursors in the library falling into the corresponding isolation and RT window. Anyhow our major problem at the moment is to get access to the ScanDescription. We need to analyse some SureQuant runs. These require preScans in the acquisition, which we like to exclude from the final analysis but the only way to find these scans is from the ScanDescription. With rawrr I found, that I can access them with the readSpectrum function. Any ideas? Meanwhile I'll upload a link to a rawfile.
Your help is very appreciated!
from rawrr.
Hi Tobi,
here's a link to a SureQuant Example
rawfile: https://mdcsft.mdc-berlin.de/download.php/1743126505/Elmo_20210911_MVB_HS_prelim_T2.raw.zip
from rawrr.
What do you mean by scanDescription
? Do you mean the scanType
attribute of the rawrrSpectrum
object? Or is that some variable/column name that is returned by rawDiag::read.raw(file = rawfile, rawDiag = FALSE)
from rawrr.
The column name is $Scan Description:
.
from rawrr.
It's returned by rawDiag::read.raw
and by rawrr::readSpectrum
.
from rawrr.
Sorry that column does not exist:
> colnames(read.raw(file = rawfile, rawDiag = TRUE))
executing mono /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rawDiag/exec/fgcz_raw.exe /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rawrr/extdata/sample.raw qc ...
missing column name(s): MasterScanNumber, LMCorrection, ElapsedScanTimesec, transient, AGCMode, PrescanMode
MasterScanNumber calculated
renamed LMmZCorrectionppm to LMCorrection
renamed AGCPSMode to PrescanMode
[1] "filename" "scanNumber" "StartTime" "BasePeakMass"
[5] "BasePeakIntensity" "TIC" "ScanType" "CycleNumber"
[9] "MSOrder" "MassAnalyzer" "PrecursorMass" "AGC"
[13] "ChargeState" "IonInjectionTimems" "FTResolution" "LMCorrection"
[17] "PrescanMode" "MasterScanNumber" "ElapsedScanTimesec" "AGCMode"
[21] "transient"
from rawrr.
I see. I think, it shows up, if you set rawDiag = F
from rawrr.
Not really...
> colnames(read.raw(file = rawfile, rawDiag = FALSE))
executing mono /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rawDiag/exec/fgcz_raw.exe /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rawrr/extdata/sample.raw qc ...
[1] "filename" "scanNumber" "ScanEventNumber"
[4] "StartTime" "BasePeakMass" "BasePeakIntensity"
[7] "TIC" "ScanType" "CycleNumber"
[10] "Frequency" "HighMass" "IonizationMode"
[13] "MSOrder" "MassAnalyzer" "Detector"
[16] "Lock" "PrecursorMass" "LastPrecursorMass"
[19] "CollisionEnergy" "IsolationWidth" "MultipleInjection"
[22] "MultiInjectInfo" "AGC" "MicroScanCount"
[25] "ScanSegment" "ScanEvent" "MasterIndex"
[28] "ChargeState" "MonoisotopicmZ" "IonInjectionTimems"
[31] "MaxIonTimems" "FTResolution" "MS2IsolationWidth"
[34] "MS2IsolationOffset" "AGCTarget" "HCDEnergy"
[37] "AnalyzerTemperature" "MassCalibration" "ConversionParameterB"
[40] "ConversionParameterC" "TemperatureCompppm" "RFCompppm"
[43] "SpaceChargeCompppm" "ResolutionCompppm" "NumberofLockMasses"
[46] "LockMass1mZ" "LockMass2mZ" "LockMass3mZ"
[49] "LMSearchWindowppm" "LMSearchWindowmmu" "NumberofLMFound"
[52] "LastLockingsec" "LMmZCorrectionppm" "IonOpticsSettings"
[55] "SLensRFLevel" "SLensVoltageV" "SkimmerVoltageV"
[58] "InjectFlatapoleOffsetV" "BentFlatapoleDCV" "MP2andMP3RFV"
[61] "GateLensVoltageV" "CTrapRFV" "DiagnosticData"
[64] "DynamicRTShiftmin" "IntensCompFactor" "ResDepIntens"
[67] "CTCDNumF" "CTCDComp" "CTCDScScr"
[70] "RawOvFtT" "LCFWHMparameter" "Rod"
[73] "PSInjTimems" "AGCPSMode" "AGCPSDiag"
[76] "HCDEnergyeV" "AGCFill" "Injectiont0"
[79] "t0FLP" "AccessId" "AnalogInput1V"
[82] "AnalogInput2V"
but these variables change with the instrument model. Could you please post a code chunk that shows this using your raw file?
from rawrr.
> colnames(rawDiag::read.raw(file = "/Users/tobiasko/Downloads/Elmo_20210911_MVB_HS_prelim_T2.raw", rawDiag = FALSE))
executing mono /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rawDiag/exec/fgcz_raw.exe /Users/tobiasko/Downloads/Elmo_20210911_MVB_HS_prelim_T2.raw qc ...
[1] "filename" "scanNumber" "ScanEventNumber"
[4] "StartTime" "BasePeakMass" "BasePeakIntensity"
[7] "TIC" "ScanType" "CycleNumber"
[10] "Frequency" "HighMass" "IonizationMode"
[13] "MSOrder" "MassAnalyzer" "Detector"
[16] "Lock" "PrecursorMass" "LastPrecursorMass"
[19] "CollisionEnergy" "IsolationWidth" "ScanDescription"
[22] "MultipleInjection" "MultiInjectInfo" "AGC"
[25] "MicroScanCount" "ScanSegment" "ScanEvent"
[28] "MasterIndex" "MasterScanNumber" "ChargeState"
[31] "MonoisotopicmZ" "Errorinisotopicenvelopefit" "IonInjectionTimems"
[34] "MaxIonTimems" "FTResolution" "MS2IsolationWidth"
[37] "MS2IsolationOffset" "AGCTarget" "HCDEnergy"
[40] "HCDEnergyV" "AnalyzerTemperature" "MassCalibration"
[43] "ConversionParameterB" "ConversionParameterC" "TemperatureCompppm"
[46] "RFCompppm" "SpaceChargeCompppm" "ResolutionCompppm"
[49] "NumberofLockMasses" "LockMass1mZ" "LockMass2mZ"
[52] "LockMass3mZ" "LMSearchWindowppm" "LMSearchWindowmmu"
[55] "NumberofLMFound" "LastLockingsec" "LMmZCorrectionppm"
[58] "IonOpticsSettings" "FunnelRFLevel" "DiagnosticData"
[61] "ApplicationMode" "MildTrappingMode" "APD"
[64] "OTIntensCompFactor" "ResDepIntens" "QTransComp"
[67] "PrOSANumF" "PrOSAComp" "PrOSAScScr"
[70] "RawOvFtT" "DynamicRTShiftmin" "LCFWHMparameter"
[73] "PSInjTimems" "AGCPSMode" "AGCPSDiag"
[76] "AGCTargetAdjust" "AGCDiag1" "AGCDiag2"
[79] "HCDabsOffset" "SourceCIDeV" "AGCFill"
[82] "Injectiont0" "t0FLP" "IsoParaR"
[85] "InjParaR" "AccessId" "AnalogInAV"
[88] "AnalogInBV" "FAIMSAttached" "FAIMSVoltageOn"
[91] "FAIMSCV" "PhiSDM" "PhiSDMWindowsmZ"
[94] "PhiSDMIterations" "PhiSDMFlagging" "PhiSDMProcessingTimems"
from rawrr.
Sure:
> colnames(RAW <- read.raw(file = r,rawDiag = F))
executing mono /Library/Frameworks/R.framework/Versions/4.0/Resources/library/rawDiag/exec/fgcz_raw.exe /Volumes/MAGIC_PAGIC/MDC/ValiTest/Elmo_20210911_MVB_HS_prelim_H2.raw qc ...
[1] "filename" "scanNumber" "ScanEventNumber" "StartTime" "BasePeakMass"
[6] "BasePeakIntensity" "TIC" "ScanType" "CycleNumber" "Frequency"
[11] "HighMass" "IonizationMode" "MSOrder" "MassAnalyzer" "Detector"
[16] "Lock" "PrecursorMass" "LastPrecursorMass" "CollisionEnergy" "IsolationWidth"
[21] "ScanDescription" "MultipleInjection" "MultiInjectInfo" "AGC" "MicroScanCount"
[26] "ScanSegment" "ScanEvent" "MasterIndex" "MasterScanNumber" "ChargeState"
[31] "MonoisotopicmZ" "Errorinisotopicenvelopefit" "IonInjectionTimems" "MaxIonTimems" "FTResolution"
[36] "MS2IsolationWidth" "MS2IsolationOffset" "AGCTarget" "HCDEnergy" "HCDEnergyV"
[41] "AnalyzerTemperature" "MassCalibration" "ConversionParameterB" "ConversionParameterC" "TemperatureCompppm"
[46] "RFCompppm" "SpaceChargeCompppm" "ResolutionCompppm" "NumberofLockMasses" "LockMass1mZ"
[51] "LockMass2mZ" "LockMass3mZ" "LMSearchWindowppm" "LMSearchWindowmmu" "NumberofLMFound"
[56] "LastLockingsec" "LMmZCorrectionppm" "IonOpticsSettings" "FunnelRFLevel" "DiagnosticData"
[61] "ApplicationMode" "MildTrappingMode" "APD" "OTIntensCompFactor" "ResDepIntens"
[66] "QTransComp" "PrOSANumF" "PrOSAComp" "PrOSAScScr" "RawOvFtT"
[71] "DynamicRTShiftmin" "LCFWHMparameter" "PSInjTimems" "AGCPSMode" "AGCPSDiag"
[76] "AGCTargetAdjust" "AGCDiag1" "AGCDiag2" "HCDabsOffset" "SourceCIDeV"
[81] "AGCFill" "Injectiont0" "t0FLP" "IsoParaR" "InjParaR"
[86] "AccessId" "AnalogInAV" "AnalogInBV" "FAIMSAttached" "FAIMSVoltageOn"
[91] "FAIMSCV" "PhiSDM" "PhiSDMWindowsmZ" "PhiSDMIterations" "PhiSDMFlagging"
[96] "PhiSDMProcessingTimems"
from rawrr.
btw in RawDiag it's named ScanDescription
.
from rawrr.
ok! Now it get it. This is a scan attribute that is not present in my data. Could it be that this is some type of free text description that you added to the LC-MS method? Or is this written by every SureQuant method template on the Exploris 480?
> Elmo <- rawDiag::read.raw(file = "/Users/tobiasko/Downloads/Elmo_20210911_MVB_HS_prelim_T2.raw", rawDiag = FALSE)
executing mono /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rawDiag/exec/fgcz_raw.exe /Users/tobiasko/Downloads/Elmo_20210911_MVB_HS_prelim_T2.raw qc ...
> levels(as.factor(Elmo$ScanDescription))
[1] "" "SQ_ENDO_K+8_1" "SQ_ENDO_K+8_2" "SQ_ENDO_K+8_3" "SQ_ENDO_R+10_1"
[6] "SQ_ENDO_R+10_2" "SQ_ENDO_R+10_3" "SQ_IS_K+8_1" "SQ_IS_K+8_2" "SQ_IS_K+8_3"
[11] "SQ_IS_R+10_1" "SQ_IS_R+10_2" "SQ_IS_R+10_3" "SQ_IS_R+10_4" "SQ_QUANT_K+8_1"
[16] "SQ_QUANT_K+8_2" "SQ_QUANT_K+8_3" "SQ_QUANT_R+10_1" "SQ_QUANT_R+10_2" "SQ_QUANT_R+10_3"
from rawrr.
hmmm... π€ ok! I think a have a workaround for you that could be used for extracting the information. It uses some functions the packages isn't exporting. π Some black magic! π€ͺ Two simple steps: 1) def. a custom accessor function (a function that returns a function 2) tabulate the return values
> S <- rawrr::readSpectrum(rawfile = "/Users/tobiasko/Downloads/Elmo_20210911_MVB_HS_prelim_T2.raw", scan = 1:1000)
> ScanDescription <- rawrr::makeAccessor(key = "Scan Description:", returnType = "character")
> df <- rawrr:::tabulateSpectrum(S[[1]], c("scanNumber", "ScanDescription"))
> l <- rawrr:::tabulateSpectrumSet(S, c("scanNumber", "ScanDescription"))
Have a look at the source code of tabulateSpectrumSet
. You could make it even more versatile by using map_df
instead of lapply
.
tabulateSpectrumSet <- function (x, accNames){
#if (!requireNamespace("purrr", quietly = TRUE)) {
# stop("Package \"pkg\" needed for this function to work. Please install it.",
# call. = FALSE)
#}
stopifnot(all(vapply(x, is.rawrrSpectrum, TRUE)))
lapply(x, tabulateSpectrum, accNames)
#purrr::map_df(x, tabulateSpectrum, accNames)
}
This will return a data.frame instead a list of data.frames
The map alternative would be something like:
mapSpectrumSet <- function (x, accNames){
if (!requireNamespace("purrr", quietly = TRUE)) {
stop("Package \"pkg\" needed for this function to work. Please install it.",
call. = FALSE)
}
stopifnot(all(vapply(x, is.rawrrSpectrum, TRUE)))
#lapply(x, tabulateSpectrum, accNames)
purrr::map_df(x, rawrr:::tabulateSpectrum, accNames)
}
and get a data.frame with the stuff you need:
> df <- mapSpectrumSet(S, c("scanNumber", "ScanDescription"))
> head(df)
scanNumber ScanDescription
1 1
2 2
3 3
4 4
5 5
6 6 SQ_IS_R+10_2
But it of course it still creates a lot of overhead, because you need to fetch the complete scan data first! But this is totally generic and can be done with any key:value pair found in a raw file!
from rawrr.
Hi, this looks very useful! I'll give it a try and tell you if it works! Black magic is definitely my thing π
Many thx (also for your patience, I should have explained my problem a bit better π).
from rawrr.
Welcome! Just let me know how it goes! But scan Description:
is def. something we might consider as return value for rawrr::readIndex
. We would need to check how expensive the access is or make it optional by having a parameter.
from rawrr.
Hi, just tested your code but realised, that I already have a similar approach with Spectra[[1]]$
Scan Description:`` π . Would be there an advantage when using your approach? Anyhow I'll cross fingers for making implementation of ScanDescription into rawrr::readIndex happeningπ₯³ .
Have nice evening and many thx!
from rawrr.
The answer is
> class(S[[1]])
[1] "rawrrSpectrum"
rawrr is implemented using S3 object-oriented programming (OOP). That implies you should use methods to set and get instance attributes:
> attributes(S[[1]])
$names
[1] "scan" "basePeak"
[3] "TIC" "massRange"
[5] "scanType" "rtinseconds"
[7] "pepmass" "centroidStream"
[9] "HasCentroidStream" "centroid.mZ"
[11] "centroid.intensity" "title"
[13] "charge" "monoisotopicMz"
[15] "mZ" "intensity"
[17] "Scan Description:" "Multiple Injection:"
[19] "Multi Inject Info:" "AGC:"
[21] "Micro Scan Count:" "Scan Segment:"
[23] "Scan Event:" "Master Index:"
[25] "Master Scan Number:" "Charge State:"
[27] "Monoisotopic M/Z:" "Error in isotopic envelope fit:"
[29] "Ion Injection Time (ms):" "Max. Ion Time (ms):"
[31] "FT Resolution:" "MS2 Isolation Width:"
[33] "MS2 Isolation Offset:" "AGC Target:"
[35] "HCD Energy:" "HCD Energy V:"
[37] "Analyzer Temperature:" "=== Mass Calibration: ===:"
[39] "Conversion Parameter B:" "Conversion Parameter C:"
[41] "Temperature Comp. (ppm):" "RF Comp. (ppm):"
[43] "Space Charge Comp. (ppm):" "Resolution Comp. (ppm):"
[45] "Number of Lock Masses:" "Lock Mass #1 (m/z):"
[47] "Lock Mass #2 (m/z):" "Lock Mass #3 (m/z):"
[49] "LM Search Window (ppm):" "LM Search Window (mmu):"
[51] "Number of LM Found:" "Last Locking (sec):"
[53] "LM m/z-Correction (ppm):" "=== Ion Optics Settings: ===:"
[55] "Funnel RF Level:" "==== Diagnostic Data: ====:"
[57] "Application Mode:" "Mild Trapping Mode:"
[59] "APD:" "OT Intens Comp Factor:"
[61] "Res. Dep. Intens:" "Q Trans Comp:"
[63] "PrOSA NumF:" "PrOSA Comp:"
[65] "PrOSA ScScr:" "RawOvFtT:"
[67] "Dynamic RT Shift (min):" "LC FWHM parameter:"
[69] "PS Inj. Time (ms):" "AGC PS Mode:"
[71] "AGC PS Diag:" "AGC Target Adjust:"
[73] "AGC Diag 1:" "AGC Diag 2:"
[75] "HCD abs. Offset:" "Source CID eV:"
[77] "AGC Fill:" "Injection t0:"
[79] "t0 FLP:" "Iso Para R:"
[81] "Inj Para R:" "Access Id:"
[83] "Analog In A (V):" "Analog In B (V):"
[85] "FAIMS Attached:" "FAIMS Voltage On:"
[87] "FAIMS CV:" "PhiSDM:"
[89] "PhiSDM Windows m/z:" "PhiSDM Iterations:"
[91] "PhiSDM Flagging:" "PhiSDM Processing Time ms:"
$class
[1] "rawrrSpectrum"
Using the $
operator is kind of a dirty hack! The clean way is defining a method. Why? "The user doesnβt need to worry about details of an object because they are encapsulated behind a standard interface." So stop thinking why Thermo puts white spaces and :
in keys, or refuses to use boolean values. Instead build a method a deals with this this properly. A good example is the accessor function faimsVoltageOn
(already part of the package):
> faimsVoltageOn
function (x)
{
stopifnot(is.rawrrSpectrum(x))
if ("FAIMS Voltage On:" %in% names(x)) {
y <- x$`FAIMS Voltage On:`
lookup <- c(No = "FALSE", Yes = "TRUE")
return(as.logical(lookup[y]))
}
else {
stop("FAIMS Voltage On: is not available!")
}
}
It offers proper error handling (since attributes are not always present, depends on instrument model) and maps no/yes (character) to TRUE/FALSE.
from rawrr.
Of course, OOP is a good reason to do so. I rather was asking about the readout, but I'm aware that it's a "dirty" hack π .
Many thx
from rawrr.
@cpanse we should discuss if it makes sense to include the Scan Description:
in the data frame returned by readIndex
. I think it is a rather generic description that might be import to guided targeted extraction of scans. I think it is also written by tribrid instrument models (Fusion, Lumos, ...).
from rawrr.
Related Issues (20)
- Enhancement - Complete readIndex() function HOT 9
- Peak charges for MS1 spectras HOT 4
- Spectrum scan centroid mZ, intensity and noises values do not match HOT 2
- Error in Example: Length of "x" and "y" are not matching HOT 3
- Read noise value for profile mode mass spectra HOT 4
- Read_Spectrum - Sum Spectra
- unit should be minute / auc computation in seconds HOT 24
- validate_rawrrSpectrum 'StartTime' HOT 4
- "Error: line 1 did not have 9 elements" for readIndex() and readChromatogram() + "Error : No scan vector is provided"for readSpectrum HOT 9
- Problem executing readChromatogram inside Singularity container HOT 5
- Add a check if `input` file exists and is not empty
- Error in if (rvs != "No RAW file specified!") { : the condition has length > 1 HOT 16
- Switch to RawFileReader 5.0.93 HOT 2
- different total number of Spectra in msconvert, compomics/ThermoRawFileParser and thermofisherlsms/RawFileReader HOT 2
- Request for auc.rawrrChromatogram HOT 3
- profile mode in readSpectrum HOT 11
- speed up readIndex/readSpectrum by base::textConnection HOT 14
- rawrr::buildRawrrExe() fails HOT 6
- auc.rawrrChromatogram question
- readChromatogram not working in xic mode (Windows 11)
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from rawrr.