fgcz / rawrr

Access Orbitrap data in R lang using C# mono assembly - bioconductor package

Home Page: https://bioconductor.org/packages/rawrr/

C# 31.93% R 57.10% TeX 10.97%

fast multiplatform rpackage mass-spectrometry orbitrap-ms

rawrr's Issues

Functional test using raw files (autoQC01) from different LC-MS systems @FGCZ

determine input

last two files of each available instrument

PFILE=/srv/www/htdocs/Data2San/sync_LOGS/pfiles.txt ; 
cat ${PFILE} \
  | cut -f4 -d";" \
  | cut -d"/" -f3 \
  | sort \
  | uniq \
  |while read i ; 
  do 
        grep -E "/${i}/.*autoQC01.*raw$" ${PFILE} \
          | tail -n 2; 
  done \
  | cut -d";" -f4 \
  | while read raw ; 
    do 
        [[ -f /srv/www/htdocs/${raw} ]] && echo ${raw} ; 
  done

How to access S/N values

Dear all,
is there a way of accessing the S/N values from a readSpectrum object? I can't seem to find the info in the list.
Thanks
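
A minimal sketch of how S/N could be derived, assuming the spectrum object exposes `centroid.intensity` and `noises` vectors of equal length (element names as they appear in other reports in this tracker; whether both are present may depend on the scan type and package version). The list below is mock data standing in for one element of the readSpectrum output.

```r
# Hypothetical stand-in for one scan returned by rawrr::readSpectrum();
# the element names are assumptions taken from other reports in this tracker.
scan <- list(
    centroid.mZ        = c(445.12, 519.14, 593.16),
    centroid.intensity = c(2.0e5, 8.0e4, 1.5e4),
    noises             = c(1.0e3, 1.0e3, 1.5e3)
)

# Signal-to-noise ratio per centroid: intensity divided by the local noise.
signalToNoise <- function(s) {
    stopifnot(length(s$centroid.intensity) == length(s$noises))
    s$centroid.intensity / s$noises
}

sn <- signalToNoise(scan)
```

The length check is deliberate: if the two vectors differ in length, the ratio would be silently recycled and misleading.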

consider building rawrr assemblies using msbuild

benefit

no binary code would be contained in the source package.

requires

all RawFileReader Assemblies need to be installed

    if (Sys.which("msbuild") == "" && Sys.which("xbuild") == "")
    {
        warning ("could not find msbuild or xbuild in path; will not be able to use rDotNet unless corrected and rebuilt")
        return()
    }

TEST CASE 1 - no mono runtime

docker run -a stdin -a stdout -i -t rocker/verse:4.0.5 R

install.packages('http://fgcz-ms.uzh.ch/~cpanse/rawrr_0.99.13_19.tar.gz', repo=NULL)


rawfile <- rawrr::sampleFilePath()

h <- rawrr::readFileHeader(rawfile)
i <- rawrr::readIndex(rawfile)
x <- rawrr::readChromatogram(rawfile=rawfile, type="tic")
s <- rawrr::readSpectrum(rawfile, 1:9)

TEST CASE 2 - runtime installed

docker run -a stdin -a stdout -i -t c95c10872a5d

install.packages('http://fgcz-ms.uzh.ch/~cpanse/rawrr_0.99.13_19.tar.gz', repo=NULL)

rawfile <- rawrr::sampleFilePath()

h <- rawrr::readFileHeader(rawfile)
i <- rawrr::readIndex(rawfile)
x <- rawrr::readChromatogram(rawfile=rawfile, type="tic")
s <- rawrr::readSpectrum(rawfile, 1:9)

Listing of the Dockerfile

FROM rocker/verse:4.0.5
 
RUN apt-get update \
&& sudo apt-get install mono-runtime -y

CMD ["R"]

TEST CASE 3 - msbuild is installed

docker run -a stdin -a stdout -i -t f53000645fca

install.packages('http://fgcz-ms.uzh.ch/~cpanse/rawrr_0.99.13_19.tar.gz', repo=NULL)

rawfile <- rawrr::sampleFilePath()


h <- rawrr::readFileHeader(rawfile)
i <- rawrr::readIndex(rawfile)
x <- rawrr::readChromatogram(rawfile=rawfile, type="tic")
s <- rawrr::readSpectrum(rawfile, 1:9)

Listing of the Dockerfile

FROM rocker/verse:4.0.5
 
RUN apt-get update \
&& sudo apt-get install mono-mcs mono-xbuild -y

CMD ["R"]

TEST CASE 4 - msbuild is installed and MONO_PATH set

docker run -a stdin -a stdout -i -t -v /usr/local/lib/RawFileReader/:/usr/local/lib/RawFileReader/ d6cec6026a70

docker run -i -v /usr/local/lib/RawFileReader/:/usr/local/lib/RawFileReader/ d6cec6026a70 R --no-save << EOF

install.packages('http://fgcz-ms.uzh.ch/~cpanse/rawrr_0.99.13_19.tar.gz', repo=NULL)
Sys.getenv("MONO_PATH")


rawfile <- rawrr::sampleFilePath()

h <- rawrr::readFileHeader(rawfile)
i <- rawrr::readIndex(rawfile)
x <- rawrr::readChromatogram(rawfile=rawfile, type="tic")
s <- rawrr::readSpectrum(rawfile, 1:9)

EOF

Listing of the Dockerfile

FROM rocker/verse:4.0.5
 
RUN apt-get update \
&& sudo apt-get install mono-mcs mono-xbuild -y

CMD ["R"]

Titles in .plotChromatogramAndFit?

Hi guys,

When plotting multiple raw files with iRT peptides, I'm using the function .plotChromatogramAndFit that you showed.
I want to add a title being the name of each of those raw files to the plot, though I'm not being successful. Any ideas?

plot(x, main=???); legend("topright", legend=i, title='Instrument Model', bty = "n", cex=0.75)

Thanks a lot for the great library :)
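
One possible approach, sketched with mock data: derive the title from the file name with basename() and pass it as `main`. Whether the plot method used inside .plotChromatogramAndFit forwards `main` is an assumption; if it does not, calling title(main = ...) right after the plot call is a base R alternative. The file paths are hypothetical.

```r
# Hypothetical file paths; replace with your own raw files.
rawfiles <- c("/data/run_A.raw", "/data/run_B.raw")

# Derive one plot title per file from the file name without its directory.
titles <- basename(rawfiles)

# Mock curve in place of the fitted iRT chromatograms.
op <- par(mfrow = c(1, length(rawfiles)))
for (k in seq_along(rawfiles)) {
    plot(x = 1:10, y = (1:10)^2, type = "l",
         xlab = "RT [min]", ylab = "intensity",
         main = titles[k])
}
par(op)
```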

citation()

We should update the package so that `citation("rawrr")` returns the desired information. The current state is:

> citation(package = "rawrr")

To cite package ‘rawrr’ in publications use:

  Christian Panse and Tobias Kockmann (NA). rawrr: Access to Thermo Fisher Scientific raw
  files from R. R package version 0.1.7. https://github.com/fgcz/rawR/

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {rawrr: Access to Thermo Fisher Scientific raw files from R},
    author = {Christian Panse and Tobias Kockmann},
    note = {R package version 0.1.7},
    url = {https://github.com/fgcz/rawR/},
  }

Warning messages:
1: In citation(package = "rawrr") :
  no date field in DESCRIPTION file of package ‘rawrr’
2: In citation(package = "rawrr") :
  could not determine year for ‘rawrr’ from package DESCRIPTION file

I suggest referencing our bioRxiv manuscript for now.
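
A sketch of what an inst/CITATION entry could look like, using utils::bibentry() and only the fields already shown in the output above. The year is a placeholder assumption (the warnings above are about exactly that missing field); the manuscript details would have to be filled in by the authors.

```r
# Sketch of an inst/CITATION entry; the year is a placeholder assumption.
rawrrCitation <- utils::bibentry(
    bibtype = "Manual",
    title   = "rawrr: Access to Thermo Fisher Scientific raw files from R",
    author  = c(utils::person("Christian", "Panse"),
                utils::person("Tobias", "Kockmann")),
    year    = format(Sys.Date(), "%Y"),
    note    = "R package version 0.1.7",
    url     = "https://github.com/fgcz/rawR/"
)

print(rawrrCitation, style = "text")
```

Alternatively, adding a Date field to DESCRIPTION should be enough to silence the two warnings without a CITATION file.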

ReadChromatogram intensity or area under the curve

Hi, I really appreciate your efforts in making this package.
I have a few questions about the readChromatogram function. I extract the XIC for an analyte, for example m/z 836.07492 with tol = 100.
The XIC contains equal numbers of retention times and intensities. When I look at individual points, for example at RT 33.03 min, the intensity returned by the function is 23020686, whereas the raw data shows an NL of 7.11E6. How is the value 23020686 calculated?

My other question: is there a way to get the area under the curve of an XIC? I want to do quantitative analysis.

Thank you.
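
For the area under the curve, a trapezoidal sum over the returned points is one simple option. This is a sketch, not a rawrr function: `xicArea` and `rtRange` are hypothetical names, and the retention times are assumed to be sorted in ascending order.

```r
# Trapezoidal area under an extracted ion chromatogram.
# `times` (minutes, ascending) and `intensities` mirror the XIC list elements.
xicArea <- function(times, intensities, rtRange = range(times)) {
    stopifnot(length(times) == length(intensities))
    keep <- times >= rtRange[1] & times <= rtRange[2]
    tt <- times[keep]
    y <- intensities[keep]
    if (length(tt) < 2) return(0)
    sum(diff(tt) * (head(y, -1) + tail(y, -1)) / 2)
}

# Mock triangular peak: base of 2 min, height 100, so the area is 100.
times <- c(32, 33, 34)
intensities <- c(0, 100, 0)
area <- xicArea(times, intensities)
```

Restricting `rtRange` to the peak boundaries keeps baseline signal outside the peak from inflating the area.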

merge scan index and file header into a single `rawRindex` object?

Hi @cpanse,

I had a look at the return values of readIndex() and readFileHeader() and I think it would make sense to combine them into a single object. The object would be structured into a data portion which is the data.frame returned by readIndex. All items in the list returned by readFileHeader would become attributes of the object. The object class could be something like rawRindex.
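
A rough sketch of the proposed structure, with mock inputs in place of the real readIndex() and readFileHeader() results. The constructor name `new_rawRindex` is hypothetical.

```r
# Constructor sketch: scan index as data, file header items as attributes.
new_rawRindex <- function(index, header) {
    stopifnot(is.data.frame(index), is.list(header))
    for (key in names(header)) {
        attr(index, key) <- header[[key]]
    }
    class(index) <- c("rawRindex", class(index))
    index
}

# Mock inputs standing in for readIndex() and readFileHeader() results.
idx <- data.frame(scan = 1:3, MSOrder = c("Ms", "Ms2", "Ms2"))
hdr <- list(`Instrument Model` = "Q Exactive HF Orbitrap",
            `Sample Name` = "autoQC01")

x <- new_rawRindex(idx, hdr)
attr(x, "Instrument Model")
```

One caveat of this design: header keys that collide with reserved attribute names such as "names", "class" or "row.names" would have to be renamed first, and some data.frame operations drop custom attributes.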

Loading .raw value separation

readChromatogram(...)

refactor rawDiag::readXICs(rawfile, masses=unique(RAW$PrecursorMass), tol=1000)

returns a nested S3 list

[[22]]
$mass
[1] 554.2606

$times
[1] 0.1216408 0.1516450 0.4810452 0.5409557 0.7801059

$intensities
[1] 3005.061 4328.104 3658.515 3862.011 4992.357

$filename
[1] "sample.raw"

attr(,"class")
[1] "list" "XIC" 

attr(,"class")
[1] "list" "XICs"
> X[[20]]
$mass
[1] 653.3617

$times
 [1] 0.001619751 0.031642766 0.061663615 0.091651065 0.121640750 0.151644970
 [7] 0.181667770 0.211526280 0.241284530 0.271307600 0.301222200 0.331145000
[13] 0.361147270 0.391168050 0.421057700 0.450970250 0.481045180 0.510989030
[19] 0.540955750 0.570893130 0.600724580 0.630620300 0.660428770 0.690318400
[25] 0.720320350 0.750197620 0.780105920

$intensities
 [1] 374171.6 405717.2 350914.7 373948.4 328768.2 425965.4 360327.9 453483.1
 [9] 445894.1 430538.9 422901.3 545305.5 433117.8 357588.1 435593.4 351018.2
[17] 407768.5 406468.8 446027.8 385148.5 579871.8 461409.0 390769.3 458988.6
[25] 378339.6 480078.4 467780.0

$filename
[1] "sample.raw"

attr(,"class")

with additional attributes:

input:

scan filter
type : XIC, BPC, TIC

type BPC, TIC need no additional parameters. XIC requires mz and tolerance in addition.

output:

type, e.g, XIC, TIC, BPC (base peak chromatogram)
tolerance (in ppm)
mz
scan filter
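
The input rule above (BPC and TIC need no additional parameters, XIC requires mz and tolerance) could be checked up front. A minimal sketch; the function name is hypothetical, and the argument names `mass` and `tol` follow the readChromatogram calls quoted elsewhere in this tracker.

```r
# Validate chromatogram arguments before calling into the reader.
validateChromatogramArgs <- function(type = c("tic", "bpc", "xic"),
                                     mass = NULL, tol = NULL) {
    type <- match.arg(type)
    if (type == "xic" && (is.null(mass) || is.null(tol))) {
        stop("type 'xic' requires 'mass' and 'tol' (in ppm)")
    }
    invisible(type)
}

validateChromatogramArgs("tic")
validateChromatogramArgs("xic", mass = 554.2606, tol = 10)
```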

How do we solve the license issue with vendor libraries?

see

Error in Example: Length of "x" and "y" are not matching

Hi everyone,
to test this package I wanted to load the .raw file and follow the provided example code.
Somehow R gives me an error message that the "x" and "y" coordinates do not match:

Now when I run I get:

> plot(S[[1]], centroid=TRUE)
Error in xy.coords(x, y, xlabel, ylabel, log) : 
  Length of 'x' and 'y' do not match

I have absolutely no idea what I'm doing wrong and am super lost.
I would appreciate some help here!

Also, I am new to R and to working with bioinformatics data, so if anyone could explain how to come up with the numbers in the scan vector (the paper just mentions a database search?), that would be awesome as well.
Thanks in advance!

readChromatogram does not return 0 values

Hi,

Thanks for making a great tool! I have found it quite useful so far 👍
I have an issue with XIC values when I plot certain peptides.

Firstly, I can successfully extract and plot the XICs using your inbuilt functions, but cannot figure out how to constrain the retention times plotted/extracted.

I did manage to access the S3 elements in the chromatogram object and plot them myself in ggplot, but then had an issue where rawR does not report the 0 values for M/Zs at certain times. This is useful to see the shape of the eluting peptide, though I acknowledge it will likely increase the object size...

Is it possible to clarify (1) how to constrain the XIC for a certain retention time range, and (2) how to access (or at least impute from RT of MS1 scans) the 0 values of XICs.

Thanks again for making this tool,
Tara
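
Both points can be approximated after the fact in plain R. The sketch below assumes the XIC retention times are an exact subset of the MS1 retention times of the same file (taken, for example, from the retention-time column of the scan index); `zeroFillXic` is a hypothetical helper, not part of the package.

```r
# Fill missing XIC points with 0 on the grid of MS1 retention times,
# optionally restricted to an RT window (minutes).
zeroFillXic <- function(xic, ms1Times, rtRange = range(ms1Times)) {
    grid <- ms1Times[ms1Times >= rtRange[1] & ms1Times <= rtRange[2]]
    filled <- numeric(length(grid))
    hit <- match(xic$times, grid)
    filled[hit[!is.na(hit)]] <- xic$intensities[!is.na(hit)]
    list(times = grid, intensities = filled)
}

# Mock data: five MS1 scans, signal reported for two of them.
ms1Times <- c(0.1, 0.2, 0.3, 0.4, 0.5)
xic <- list(times = c(0.2, 0.4), intensities = c(10, 20))

z <- zeroFillXic(xic, ms1Times, rtRange = c(0.15, 0.5))
```

Because match() compares values exactly, this only works when both vectors come from the same file without rounding in between.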

readFileHeader(...)

function reads file header information. In Freestyle key:value pairs

Sample Name	autoQC01	
Comment		
Seq Row	10	
Sample Type	Unknown	
Path	D:\Data2San\p2469\Proteomics\QEXACTIVEHF_2\bpfister_20200714	
Cal Level		
Cal File		
Inj Volume	2	
Sample Weight	0	
Sample Volume	0	
Sample Id	NA	
Istd Amount	0	
CD Factor	0	
Bar Code		
Bar Code Status	0	
Inst Method	C:\Xcalibur\methods\__autoQC\trap\autoQC01.meth	
Proc Method		
User Text1	2469	
User Text2		
User Text3	FGCZ	
User Text4		
User Text5		
Tray Index	80	
Tray Name	ANSI-48Vial2mLHolder/ANSI-48Vial2mLHolder	
Tray Shape	Rectangular	
Vial Index	48	
Vials Per Tray	48	
Vials Per TrayX	8	
Vials Per TrayY	6	
Instrument Name	Q Exactive HF Orbitrap	
Instrument Model	Q Exactive HF Orbitrap	
Instrument Number	Exactive Series slot #2496	
Instrument SoftWare	2.9-290204/2.9.3.2948	
Instrument Hardware	rev. 1	
Flags		
Mass Tolerance	0.5 amu	
Created by	Administrator

returns S3 object, (nested) list

include query functions for ProteomicsDB / Prosit

The goal would be that Spectra could not only be read from local raw files, but also public repositories like ProteomicsDB and prediction services like Prosit. A REST endpoint is already available and used by USE. This REST interface should also work for queries from R.

package naming

solve the naming conflict with

https://CRAN.R-project.org/package=rawr

R CMD check rawR_0.1.1.tar.gz

produces

* package encoding: UTF-8
* checking CRAN incoming feasibility ... ERROR
Maintainer: 'Christian Panse <[email protected]>'
New submission
Conflicting package names (submitted: rawR, existing: rawr [https://CRAN.R-project.org])
Conflicting package names (submitted: rawR, existing: rawr [CRAN archive])
The Title field should be in title case. Current version is:
'Access to Thermo Fisher Scientific raw files from R'
In title case that is:

R> BiocCheck("rawR_0.1.1.tar.gz")
This is BiocCheck version 1.26.0. BiocCheck is a work in
progress. Output and severity of issues may change. Installing
package...
* Checking Package Dependencies...
* Checking if other packages can import this one...
* Checking to see if we understand object initialization...
* Checking for deprecated package usage...
* Checking for remote package usage...
* Checking version number...
* Checking for version number mismatch...
* Checking version number validity...
    Package version 0.1.1; pre-release
* Checking R Version dependency...
* Checking package size...
* Checking individual file sizes...
    * WARNING: The following files are over 5MB in size:
      'rawRcolor.tif'
* Checking biocViews...
* Checking that biocViews are present...
    * ERROR: No biocViews terms found.
See http://bioconductor.org/developers/how-to/biocViews/
* Checking build system compatibility...
* Checking for blank lines in DESCRIPTION...
* Checking if DESCRIPTION is well formatted...
* Checking for proper Description: field...
* Checking for whitespace in DESCRIPTION field names...
* Checking that Package field matches directory/tarball
  name...
* Checking for Version field...
* Checking for valid maintainer...
* Checking DESCRIPTION/NAMESPACE consistency...
    * WARNING: Import grDevices, graphics, utils in
      DESCRIPTION as well as NAMESPACE.
* Checking vignette directory...
    This is an unknown type of package
    * ERROR: No 'vignettes' directory.
* Checking library calls...
* Checking for library/require of rawR...
* Checking coding practice...
    * NOTE: Avoid sapply(); use vapply()
      Found in files:
        rawR.R (line 1011, column 29)
    * NOTE: Avoid 1:...; use seq_len() or seq_along()
      Found in files:
        rawR.R (line 600, column 36)
        rawR.R (line 745, column 70)
Warning in readLines(infile) :
  incomplete final line found on '/tmp/RtmpLJg2l6/filedebd713f5408/rawR/tests/testthat/test-header.R'
    * WARNING: Avoid class() == or class() != ; use is() or
      !is()
      Found in files:
        R/rawR.R (line 68)
* Checking parsed R code in R directory, examples,
  vignettes...
* Checking function lengths..........
    * NOTE: Recommended function length <= 50 lines.
      There are 5 functions > 50 lines.
      The longest 5 functions are:
        plot.rawRspectrum() (R/rawR.R, line 711): 108 lines
        readChromatogram() (R/rawR.R, line 429): 105 lines
        print.rawRspectrum() (R/rawR.R, line 839): 84 lines
        readFileHeader() (R/rawR.R, line 106): 61 lines
        validate_rawRspectrum() (R/rawR.R, line 635): 52 lines
* Checking man page documentation...
    * WARNING: Add non-empty \value sections to the following
      man pages: man/plot.rawRchromatogram.Rd,
      man/plot.rawRchromatogramSet.Rd,
      man/plot.rawRspectrum.Rd, man/print.rawRspectrum.Rd,
      man/summary.rawRspectrum.Rd
    * ERROR: At least 80% of man pages documenting exported
      objects must have runnable examples. The following pages
      do not:
      new_rawRspectrum.Rd, plot.rawRchromatogramSet.Rd,
  validate_rawRspectrum.Rd
    * NOTE: Usage of dontrun{} / donttest{} found in man page
      examples.
      14% of man pages use one of these cases.
      Found in the following files:
        readChromatogram.Rd
        readSpectrum.Rd
    * NOTE: Use donttest{} instead of dontrun{}.
      Found in the following files:
        readChromatogram.Rd
        readSpectrum.Rd
* Checking package NEWS...
    * NOTE: Consider adding a NEWS file, so your package news
      will be included in Bioconductor release announcements.
* Checking unit tests...
* Checking skip_on_bioc() in tests...
* Checking formatting of DESCRIPTION, NAMESPACE, man pages, R
  source, and vignette source...
    * NOTE: Consider shorter lines; 32 lines (2%) are > 80
      characters long.
    First 6 lines:
      R/rawR.R:7 .writeRData <- function(rawfile, outputfile=paste0...
      R/rawR.R:14         list(scanType=rv$scanType, mZ=rv$mZ, inte...
      R/rawR.R:28                 warning("Can not find Mono JIT co...
      R/rawR.R:41             rvs <- system2(Sys.which('mono'), arg...
      R/rawR.R:64 #' pathToRawFile <- file.path(path.package(packag...
      R/rawR.R:154                 e$info$`Instrument method` <- ba...
    * NOTE: Consider 4 spaces instead of tabs; 5 lines (0%)
      contain tabs.
    First 5 lines:
      R/zzz.R:5 	if(interactive()){
      R/zzz.R:6 		version <- packageVersion('rawR')
      R/zzz.R:7 		packageStartupMessage("Package 'rawR' version ", ...
      R/zzz.R:8 	  invisible()
      R/zzz.R:9 	}
    * NOTE: Consider multiples of 4 spaces for line indents,
      233 lines(14%) are not.
    First 6 lines:
      R/rawR.R:107    mono = if(Sys.info()['sysname'] %in% c("Darwi...
      R/rawR.R:108    exe = system.file('exec/rawR.exe',package = '...
      R/rawR.R:109    mono_path = "",
      R/rawR.R:110    argv = "infoR",
      R/rawR.R:111    system2_call = TRUE,
      R/rawR.R:112                            method = "thermo"){
    See
      http://bioconductor.org/developers/how-to/coding-style/
    See styler package:
      https://cran.r-project.org/package=styler as described
      in the BiocCheck vignette.
* Checking if package already exists in CRAN...
    * ERROR: Package must be removed from CRAN.
* Checking for bioc-devel mailing list subscription...
    * NOTE: Cannot determine whether maintainer is subscribed
      to the bioc-devel mailing list (requires admin
      credentials). Subscribe here:
      https://stat.ethz.ch/mailman/listinfo/bioc-devel
* Checking for support site registration...
    Maintainer is registered at support site.
Summary:
ERROR count: 4
WARNING count: 4
NOTE count: 10
For detailed information about these checks, see the BiocCheck
vignette, available at
https://bioconductor.org/packages/3.12/bioc/vignettes/BiocCheck/inst/doc/BiocCheck.html#interpreting-bioccheck-output
BiocCheck FAILED.
$error
[1] "No biocViews terms found."                                                                                      
[2] "No 'vignettes' directory."                                                                                      
[3] "At least 80% of man pages documenting exported objects must have runnable examples. The following pages do not:"
[4] "Package must be removed from CRAN."                                                                             
$warning
[1] "The following files are over 5MB in size: 'rawRcolor.tif'"                                                                                                                                                 
[2] "Import grDevices, graphics, utils in DESCRIPTION as well as NAMESPACE."                                                                                                                                    
[3] " Avoid class() == or class() != ; use is() or !is()"                                                                                                                                                       
[4] "Add non-empty \\value sections to the following man pages: man/plot.rawRchromatogram.Rd, man/plot.rawRchromatogramSet.Rd, man/plot.rawRspectrum.Rd, man/print.rawRspectrum.Rd, man/summary.rawRspectrum.Rd"
$note
 [1] " Avoid sapply(); use vapply()"                                                                                                                                                     
 [2] " Avoid 1:...; use seq_len() or seq_along()"                                                                                                                                        
 [3] "Recommended function length <= 50 lines."                                                                                                                                          
 [4] "Usage of dontrun{} / donttest{} found in man page examples."                                                                                                                       
 [5] "Use donttest{} instead of dontrun{}."                                                                                                                                              
 [6] "Consider adding a NEWS file, so your package news will be included in Bioconductor release announcements."                                                                         
 [7] "Consider shorter lines; 32 lines (2%) are > 80 characters long."                                                                                                                   
 [8] "Consider 4 spaces instead of tabs; 5 lines (0%) contain tabs."                                                                                                                     
 [9] "Consider multiples of 4 spaces for line indents, 233 lines(14%) are not."                                                                                                          
R>

Error if too many scans are selected

Hello again :-)

Unfortunately I have a little problem, which I don't know how to solve...

readSpectrum() gives the following error message:
Error in source(tfo, local = TRUE) : negative length vectors are not allowed

A little example how my code looks:

library(rawrr)
library(tidyverse)

#reading Index and selecting scans with ms_order = "Ms"
ms_order <- "Ms"
IDX <- as_tibble(readIndex(path))
scans <- IDX %>% filter(MSOrder == ms_order) %>% pull(scan)

SPC <- readSpectrum(path, scan = scans)

I have about 12000 scans in total and about 2500 MS1 scans in my raw file. It works fine as long as I only read about 2000 scans. After that I receive the error message. I don't know if it due to memory limits on my machine.

Thanks for your help in advance!
kaempfro
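
A possible workaround is to read the scans in smaller batches and concatenate the results. Whether this avoids the error is an assumption; the sketch uses a mock reader, and `readInChunks` is a hypothetical helper.

```r
# Read scans in batches; `reader` is a placeholder for the real call,
# e.g. function(s) rawrr::readSpectrum(path, scan = s).
readInChunks <- function(scans, reader, chunkSize = 500L) {
    chunks <- split(scans, ceiling(seq_along(scans) / chunkSize))
    do.call(c, unname(lapply(chunks, reader)))
}

# Mock reader returning one list element per requested scan.
mockReader <- function(s) lapply(s, function(i) list(scan = i))

res <- readInChunks(1:1200, mockReader, chunkSize = 500L)
length(res)
```

Note that concatenating with c() returns a plain list, so any class attribute set on the individual batch results would need to be restored afterwards.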

readChromatogram

Hi,

I really like the package. Thank you for that! I was just going through the vignette, and when I call readChromatogram it gives me the following error. I guess I will not be the only one with that. How do you solve that?

plot(rawR::readChromatogram(rawfile = rawfile, type = "tic"))
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed

Also:

C <- rawR::readChromatogram(rawfile, mass = iRTmZ, tol = 10, type = "xic", filter = "ms")

plot(C, diagnostic = TRUE)
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x, na.rm = na.rm) :
no non-missing arguments to min; returning Inf
2: In max(x, na.rm = na.rm) :
no non-missing arguments to max; returning -Inf
3: In min(x, na.rm = na.rm) :
no non-missing arguments to min; returning Inf
4: In max(x, na.rm = na.rm) :
no non-missing arguments to max; returning -Inf

Cheers!

readSpectrum(...)

refactor rawDiag::readScans()

arguments

rawfile : path to raw file
scans : numeric vector for selection based on scan index
filter : scan filter for logical selection of scans (e.g. MS, MS2, +, HCD, ...)

returns

S3 object, nested list of type Spectrum (not peaklist)

for FTMS scans the list should contain vectors for: mz, intensity, resolution, noise, charge

In addition header information is needed:

raw file
scan#
RT (Scan Start Time)
NL (normalized level = base peak intensity)
TIC (Total Ion Current)
Base Peak Mass
Scan Mode (e.g. FTMS + c NSI Full ms [350.0000-1800.000])
Scan Low Mass
Scan High Mass

Issue with readChromatogram when type = "xic"

I have been using this package for ~1 year now, specifically the readChromatogram function for extracting "tic" and "base peak". It works nicely and has been very useful. However, I have just for the first time tried to extract an "xic" for some masses of interest, and here is what I get:

XICs <- lapply(Raws, function(raw) { readChromatogram(raw, masses, tols) })
Error in .rawrrSystem2Source(rawfile, input = mass, rawrrArgs = sprintf("xic %f %s",  : 
  Rcode file to parse does not exist. 'C:\Users\MyUserName\AppData\Local/R/cache/R/rawrr/rawrrassembly/rawrr.exe' failed for an unknown reason.
Please check the debug files:
	C:\Users\MyUserName\AppData\Local\Temp\2\RtmpeQSRv0\file62986016698d.stderr
	C:\Users\MyUserName\AppData\Local\Temp\2\RtmpeQSRv0\file629849cf6634.stdout
and the System Requirements
Called from: .rawrrSystem2Source(rawfile, input = mass, rawrrArgs = sprintf("xic %f %s", 
    tol, shQuote(filter)))

This is on a Windows 2019 Server machine, using R version 4.1.0 (2021-05-18) in RStudio 1.4.1717.
File "C:/Users/MyUserName/AppData/Local/R/cache/R/rawrr/rawrrassembly/rawrr.exe" does exist, but maybe this is an issue with slashes on Windows, since the error message mixes backward (Windows) and forward (Linux) slashes? In that case, including normalizePath(..., winslash = "/") would probably be enough to fix it?

readDetectorList(...) and details

Each raw file header contains a detector list. The C# methods are:

int GetInstrumentCountOfType (Device type)

Device GetInstrumentType(int index);

int InstrumentCount { get; }

see page 17 of UsingRawFileReader.

details are available through InstrumentData GetInstrumentData();

make generic plotting function for spectrum objects

Implementation of plot.rawRspectrum(...) assuming $class = "rawRspectrum"

check during package startup if mono is in path `zzz.R`
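
A sketch of such a check as it might appear in R/zzz.R. It assumes that mono is only required on Linux and macOS; the helper name `.monoAvailable` and the message wording are hypothetical.

```r
# TRUE if a 'mono' executable can be found on the PATH.
.monoAvailable <- function() {
    nzchar(Sys.which("mono"))[[1]]
}

# Startup hook: warn once when the runtime is missing on Linux/macOS.
.onAttach <- function(libname, pkgname) {
    needsMono <- Sys.info()[["sysname"]] %in% c("Darwin", "Linux")
    if (needsMono && !.monoAvailable()) {
        packageStartupMessage(
            "Could not find 'mono' in PATH; reading raw files may fail.")
    }
    invisible()
}
```

Using packageStartupMessage() rather than warning() lets users silence the note with suppressPackageStartupMessages().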

RAW file still being acquired

Hi developers,

Thanks for the really great package! Saves me so much file conversion time.

I have a question regarding viewing RAW files during acquisition. When I load one (also after making a copy of the file to 'fix' it), I get the error 'RAW file still being acquired', which is of course the case.

Is there any possibility to view it anyway, as you would in the vendor software for quick checks? Or is critical information only saved at the end of the run?

> RAW_chrom <- readChromatogram(rawfile = rawfile, tol = 3, mass = masses_of_intest)
RAW file still being acquired

Thanks!

noise values for a raw file collected in reduced profile mode

Is it possible to use rawrr to access the noise
values for a raw file collected in reduced profile mode?

From reading
the code in rawrr.cs, it seems that the noise is only read for
centroided data, but I wanted to be sure.

by @davidsbutcher

propose package install for Linux

e.g., sudo apt-get install mono ...

mono runtime
mono msbuild
mcs

make generic plot function for rawRchromatogram objects

should use base R

generate function that transforms centroided spectrum into sparse matrix

Basic idea

Spectra are 2D data items (x, y data):

x : position (m/z)
y : intensity

All other information can be assumed to be meta data for the moment. The most basic idea to represent this data in R is to use two numeric vectors and pair according to the vector indices, so (xi, yi) are corresponding values in the 2D space generated by the vectors.

Collections of scans that are connected by a further dimension, for instance RT, could be handled as lists of vector tuples.

L 
   | - (xi, yi)
   | - (xi, yi)
   | - (xi, yi)

But there are some problems with this: if we use the RT to generate an index for L (let's call it j), then we can only select scans by index position, and Lj may not equal the original scan#, nor does it allow RT-based access.

But we could add RT as data dimension directly and arrive at:

x : position (m/z)
y : intensity
z : RT

A 3D data type for numeric values is simply an array. Arrays have nice properties, since they can be sliced along all dimensions as needed. The only problem left to solve is that centroided data generates vectors of unequal length, but transforming these to sparse vectors/matrices would solve the problem.

Two steps:

Generate a function that takes the rawRspectrum object as input and returns a sparse matrix.
Concat the sparse matrices to a sparse array.

The first would also be handy if one would like to compute dot products or the like.
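
The first step could be sketched as follows, assuming the Matrix package and a fixed m/z grid. The function name, the grid limits and the bin width are illustrative assumptions, and the input vectors are mock centroids rather than a real rawRspectrum object.

```r
library(Matrix)

# Bin centroids onto a fixed m/z grid and store them as a 1-row sparse matrix.
# Intensities falling into the same bin are summed by sparseMatrix().
spectrumToSparse <- function(mZ, intensity, mzMin = 0, mzMax = 2000,
                             binWidth = 0.01) {
    stopifnot(length(mZ) == length(intensity))
    nBins <- ceiling((mzMax - mzMin) / binWidth)
    keep <- mZ >= mzMin & mZ < mzMax
    j <- floor((mZ[keep] - mzMin) / binWidth) + 1L
    sparseMatrix(i = rep(1L, length(j)), j = j, x = intensity[keep],
                 dims = c(1L, nBins))
}

a <- spectrumToSparse(c(100.004, 500.256), c(10, 20))
b <- spectrumToSparse(c(100.006, 900.100), c(3, 5))

# Stack scans row-wise (one row per scan) and compute a dot product.
m <- rbind(a, b)
dotAB <- sum(a * b)
```

Stacking rows gives a scans-by-bins sparse matrix, which covers the second step in practice: a separate vector of retention times, one per row, supplies the RT dimension.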

Low performance when reading a lot of spectra

rawrr::readSpectrum is very slow, making it unusable for reading files with 10,000s of spectra.

By slow I mean it takes ~1 second on my 1 year old Macbook Pro to read a spectrum.
(I do call the function once, with list of spectrum ids.)

It would take 3 hours just to read a single file. That renders the package unusable by some two orders of magnitude.

I will be investigating to figure out what is the culprit. It might be necessary to add switches that remove some "advanced" functionality from spectrum reads to get the performance back (?).

summary method for rawrrChromatogram

How nice would that be?

display sublicense at startup of interactive R session

The following should be displayed after loading the package:

RawFileReader reading tool. Copyright © 2016 by Thermo Fisher Scientific, Inc. All rights reserved

the function “sample()” should be renamed

Hi, thank you for your great work on this useful R package. However, sample() is an important function in base R, and many R scripts and packages rely on it. If you could rename the sample-data function to something like "example()", it would help researchers use the package more smoothly.

Thanks for your great work on the R package “rawR”.

Extracted Ion Chromatogram

Hi everyone,
I have a problem while completing XIC graphic, here is the code:

iRT.mZ <- c(487.2571, 547.2984, 622.8539, 636.8695, 644.8230, 669.8384,
683.8282, 683.8541, 699.3388, 726.8361, 776.9301)
c<- rawrr::readChromatogram(rawfile, mass = iRT.mZ, tol = 10, type = 'xic', filter = 'ms')
#Extracted Ion Chromatogram

plot(c, diagnostic = TRUE)

The problem is --> Error in xy.coords(x, y) : 'x' and 'y' lengths differ

Can anyone help me?
Thanks all!

Spectrum scan centroid mZ, intensity and noises values do not match

Hi:
I'm using "rawrr" to read and plot two spectrum scans. With the first scan, I had no problems using the plot function with "spectrum.scan$centroid.mZ" on the x axis and the ratio of "spectrum.scan$centroid.intensity" to "spectrum.scan$noises" on the y axis: the plotted spectrum matches the one I obtain with other software. With the second scan, however, the plot differs from the expected one. I think this is because the mZ, intensity and noises vectors differ in length for this second scan. How can I approach this situation? How is it possible that these vector lengths differ? This was not the case for the first scan.
Thank you for your time.

Enhancement - Complete readIndex() function

Hi,

First, thank you for developing such a nice package. I've been using it for a few days and have been really amazed by it so far! For some context, I'm a big fan of MSnbase, but it requires data conversion, which can be inconvenient. One handy function of MSnbase that is partly missing in rawrr is header, which gives access to a myriad of useful information in a data.frame as follows:

print(names(header(msfile)))

 [1] "seqNum"                     "acquisitionNum"            
 [3] "msLevel"                    "polarity"                  
 [5] "peaksCount"                 "totIonCurrent"             
 [7] "retentionTime"              "basePeakMZ"                
 [9] "basePeakIntensity"          "collisionEnergy"           
[11] "ionisationEnergy"           "lowMZ"                     
[13] "highMZ"                     "precursorScanNum"          
[15] "precursorMZ"                "precursorCharge"           
[17] "precursorIntensity"         "mergedScan"                
[19] "mergedResultScanNum"        "mergedResultStartScanNum"  
[21] "mergedResultEndScanNum"     "injectionTime"             
[23] "filterString"               "spectrumId"                
[25] "centroided"                 "ionMobilityDriftTime"      
[27] "isolationWindowTargetMZ"    "isolationWindowLowerOffset"
[29] "isolationWindowUpperOffset" "scanWindowLowerLimit"      
[31] "scanWindowUpperLimit"

I don't know if this information is available but readIndex() could ideally contain (some of) these data, which would allow broader application of the package (for example, "isolationWindowLowerOffset" and "isolationWindowUpperOffset" give critical information for DIA applications).

Thanks again,
Vivian

Usage of `@importFrom`

"If you are using just a few functions from another package, the recommended option is to note the package name in the Imports: field of the DESCRIPTION file and call the function(s) explicitly using ::, e.g., pkg::fun(). Alternatively, though no longer recommended due to its poorer readability, use @importFrom, e.g., @importFrom pkg fun, and call the function(s) without ::."

taken from https://roxygen2.r-lib.org/articles/namespace.html#imports

Example found in rawrr.R:

#' Plot \code{rawrrChromatogramSet} objects
#'
#' @param x A \code{rawrrChromatogramSet} object to be plotted.
#' @param ... Passes additional arguments.
#' @param diagnostic Show diagnostic legend?
#' @author Tobias Kockmann, 2020.
#' @export
#' @importFrom grDevices hcl.colors
#' @importFrom graphics lines text

and many many more!
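For comparison, a minimal sketch of the explicit `::` style recommended in the quote. plot_traces() is a hypothetical helper and not part of rawrr; in a package, grDevices and graphics would additionally be listed under Imports: in DESCRIPTION.

```r
# Draw several traces using explicit pkg::fun() calls instead of @importFrom.
# plot_traces() is a hypothetical helper for illustration only.
# `traces` is a list of lists, each with numeric vectors x and y.
plot_traces <- function(traces) {
  cols <- grDevices::hcl.colors(length(traces))
  xr <- range(unlist(lapply(traces, `[[`, "x")))
  yr <- range(unlist(lapply(traces, `[[`, "y")))

  # Set up an empty plotting region that fits all traces.
  graphics::plot.new()
  graphics::plot.window(xlim = xr, ylim = yr)
  graphics::axis(1)
  graphics::axis(2)
  graphics::box()

  # One coloured line per trace, labelled with its index at the first point.
  for (i in seq_along(traces)) {
    graphics::lines(traces[[i]]$x, traces[[i]]$y, col = cols[i])
    graphics::text(traces[[i]]$x[1], traces[[i]]$y[1], labels = i, col = cols[i])
  }
  invisible(cols)
}
```

Every external call names its package at the call site, so no @importFrom tags are needed in the roxygen block.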

Missing Data readSpectrum(...)

Hi cpanse and tobiakso

I've noticed that some parameters are missing in the rawRspectrum after importing a raw file.
Not all of those parameters may be set in our experiment, but I also get wrong readings for Base Peak Intensity and Base Peak Mass (both are reported as -1).

Thanks for helping!

RawRspectrum:

> Total Ion Current:	 4870947
> Scan Low Mass:	 50
> Scan High Mass:	 250
> Scan Start Time (Min):	 0
> Scan Number:	 1
> Base Peak Intensity:	 -1
> Base Peak Mass:	 -1
> Scan Mode:	 FTMS + p NSI Full ms [50.00-250.00]
> ======= Instrument data =====   : 	
> 
> Multiple Injection: 	
> 
> Multi Inject Info: 	
> 
> AGC:	On
> Micro Scan Count:	1
> Scan Segment:	0
> Scan Event:	0
> Master Index:	0
> Charge State:	1
> Monoisotopic M/Z:	78.0468
> Ion Injection Time (ms):	100.000
> Max. Ion Time (ms): 	
> 
> FT Resolution:	30000
> MS2 Isolation Width:	0.0
> MS2 Isolation Offset: 	
> 
> AGC Target: 	
> 
> HCD Energy: 	
> 
> Analyzer Temperature: 	
> 
> === Mass Calibration: 	
> 
> Conversion Parameter B:	47557789.235
> Conversion Parameter C:	-2547049.695
> Temperature Comp. (ppm): 	
> 
> RF Comp. (ppm): 	
> 
> Space Charge Comp. (ppm): 	
> 
> Resolution Comp. (ppm): 	
> 
> Number of Lock Masses: 	
> 
> Lock Mass #1 (m/z): 	
> 
> Lock Mass #2 (m/z): 	
> 
> Lock Mass #3 (m/z): 	
> 
> LM Search Window (ppm): 	
> 
> LM Search Window (mmu): 	
> 
> Number of LM Found: 	
> 
> Last Locking (sec): 	
> 
> LM m/z-Correction (ppm): 	
> 
> === Ion Optics Settings: 	
> 
> S-Lens RF Level: 	
> 
> S-Lens Voltage (V): 	
> 
> Skimmer Voltage (V): 	
> 
> Inject Flatapole Offset (V): 	
> 
> Bent Flatapole DC (V): 	
> 
> MP2 and MP3 RF (V): 	
> 
> Gate Lens Voltage (V): 	
> 
> C-Trap RF (V): 	
> 
> ====  Diagnostic Data: 	
> 
> Dynamic RT Shift (min): 	
> 
> Intens Comp Factor: 	
> 
> Res. Dep. Intens: 	
> 
> CTCD NumF: 	
> 
> CTCD Comp: 	
> 
> CTCD ScScr: 	
> 
> RawOvFtT: 	
> 
> LC FWHM parameter: 	
> 
> Rod: 	
> 
> PS Inj. Time (ms): 	
> 
> AGC PS Mode: 	
> 
> AGC PS Diag: 	
> 
> HCD Energy eV: 	
> 
> AGC Fill: 	
> 
> Injection t0: 	
> 
> t0 FLP: 	
> 
> Access Id: 	
> 
> Analog Input 1 (V): 	
> 
> Analog Input 2 (V):
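Independent of why the header reports -1, the base peak can be recomputed from the peak list itself. A minimal sketch, assuming the spectrum exposes numeric vectors named mZ and intensity; here a plain list with made-up values stands in for a real spectrum object.

```r
# Recompute the base peak (most intense peak) from a peak list.
# `spectrum` is a plain list standing in for a spectrum object; the element
# names mZ and intensity are assumptions, and the values are made up.
base_peak <- function(spectrum) {
  stopifnot(
    length(spectrum$mZ) == length(spectrum$intensity),
    length(spectrum$intensity) > 0
  )
  i <- which.max(spectrum$intensity)
  c(mZ = spectrum$mZ[i], intensity = spectrum$intensity[i])
}

spectrum <- list(
  mZ = c(59.0497, 78.0468, 121.0509, 195.0877),
  intensity = c(1.2e5, 3.4e6, 8.9e5, 4.5e5)
)
bp <- base_peak(spectrum)
```

Comparing such a recomputed value with the reported one would show whether the -1 is a placeholder for a missing header entry.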

parse system2 stdout for error message

rawrr/R/rawrr.R

Line 46 in eeceea8

tfstdout <- tempfile(tmpdir=tmpdir)
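One way this could look, as a sketch only: redirect stdout and stderr of the external call into temporary files, then scan the captured lines for error messages. run_and_check(), find_error_lines() and the "error|exception" pattern are illustrative assumptions, not the rawrr implementation.

```r
# Return the lines that look like error messages.
find_error_lines <- function(lines, pattern = "error|exception") {
  lines[grepl(pattern, lines, ignore.case = TRUE)]
}

# Read a file if it exists, otherwise return no lines.
read_if_exists <- function(path) {
  if (file.exists(path)) readLines(path, warn = FALSE) else character()
}

# Run an external command, capture its output, and stop on detected errors.
run_and_check <- function(command, args = character(), tmpdir = tempdir()) {
  tfstdout <- tempfile(tmpdir = tmpdir)
  tfstderr <- tempfile(tmpdir = tmpdir)
  on.exit(unlink(c(tfstdout, tfstderr)), add = TRUE)

  # system2() returns the exit status when output is redirected to files.
  status <- system2(command, args = args, stdout = tfstdout, stderr = tfstderr)
  out <- c(read_if_exists(tfstdout), read_if_exists(tfstderr))

  errors <- find_error_lines(out)
  if (status != 0 || length(errors) > 0) {
    stop("external command failed (exit status ", status, "):\n",
         paste(errors, collapse = "\n"), call. = FALSE)
  }
  out
}
```

A call such as `run_and_check(file.path(R.home("bin"), "Rscript"), c("-e", shQuote("cat('hello\n')")))` would return the captured lines, while output containing an error message would be raised as an R error instead of being silently ignored.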

package structure

Just came across this when running R CMD check:

checking for executable files ...
   Found the following executable files:
     exec/ThermoFisher.CommonCore.BackgroundSubtraction.dll
     exec/ThermoFisher.CommonCore.Data.dll
     exec/ThermoFisher.CommonCore.MassPrecisionEstimator.dll
     exec/ThermoFisher.CommonCore.RawFileReader.dll
     exec/rawR.exe
   Source packages should not contain undeclared executable files.
   See section ‘Package structure’ in the ‘Writing R Extensions’ manual.

Did that and found:

1.1.7 Non-R scripts in packages

Code which needs to be compiled (C, C++, Fortran …) is included in the src subdirectory and discussed elsewhere in this document.

Subdirectory exec could be used for scripts for interpreters such as the shell, BUGS, JavaScript, Matlab, Perl, php (amap), Python or Tcl (Simile), or even R. However, it seems more common to use the inst directory, for example WriteXLS/inst/Perl, NMF/inst/m-files, RnavGraph/inst/tcl, RProtoBuf/inst/python and emdbook/inst/BUGS and gridSVG/inst/js.

So shouldn't we put the rawR.exe and the dlls in src instead of exec @cpanse ?
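Whichever directory is chosen, the R code has to locate the bundled files in the installed package at runtime; files under inst/ are copied to the top level of the installed package and can be found with system.file(). A minimal sketch, demonstrated on the stats package so that it runs without rawrr; find_bundled_file() is a hypothetical helper.

```r
# Locate a file shipped with an installed package; fail clearly if missing.
# find_bundled_file() is a hypothetical helper for illustration.
find_bundled_file <- function(..., package) {
  # system.file() returns "" when the file does not exist.
  path <- system.file(..., package = package)
  if (!nzchar(path)) {
    stop("file not found in package '", package, "'", call. = FALSE)
  }
  path
}

# Demonstrated with a file that every R installation has.
find_bundled_file("DESCRIPTION", package = "stats")
```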

Get information on gradient

Hello,

Really great package! I was wondering whether it is possible to also get information on the chromatography via your package. For example, getting the LC gradient or the LC pressure curve would be great!

Thank you in advance!
Yasin

print the stickers

Noise value for individual mass peaks

Hello, and thank you for your package!

I was wondering if it is somehow possible to get the Noise value that is reported for every single mass peak in Thermo's .raw files.

Thanks!

Peak charges for MS1 spectra

Dear all,
thanks for this very useful package!
Is there a way to extract the peak "charges" of MS1 spectra with rawrr::readSpectrum? It seems to work only for MS2 spectra at the moment.
Thanks a lot.

CI / GitHub actions

Should we start using CI services like GitHub actions?

https://ropensci.org/technotes/2020/11/19/moving-away-travis/

I think it is a nice way of making sure rawR works on different OS platforms and different R versions.
