idslme / idsl.ipa

Intrinsic Peak Analysis (IPA) pipeline for peak-picking in large-scale untargeted small molecule analysis including metabolomics, lipidomics, exposomics, and environmental studies.

R 100.00%
metabolomics lipidomics exposome mass-spectrometry feature-detection peak-detection small-molecule cran r untargeted-metabolomics

IDSL.IPA

Intrinsic Peak Analysis (IPA) by the Integrated Data Science Laboratory for Metabolomics and Exposomics (IDSL.ME) is a lightweight R package that extracts peaks for organic small molecules from untargeted liquid chromatography high-resolution mass spectrometry (LC/HRMS) data in population-scale projects. IDSL.IPA is a suite of new algorithms covering extracted ion chromatogram (EIC) candidate generation, peak detection, peak property evaluation, recursive mass correction, retention time correction across multiple batches, and peak annotation. IDSL.IPA generates comprehensive and high-quality datasets from untargeted analysis of organic small molecules for population-size studies. We have shown in our publication [1] that IDSL.IPA can outperform similar peak-picking tools such as MZmine 2, xcms, and MS-DIAL in terms of sensitivity, specificity, and speed.

Features of IDSL.IPA

  1. Parameter selection through a user-friendly and well-described parameter spreadsheet
  2. Analyzing population size untargeted studies (n > 500)
  3. Calculating 19 chromatographic peak properties such as peak area, nIsoPair, RCS, cumulated intensity, R13C, peak width, RPW, number of separation trays, asymmetry factor, USP tailing factor, skewness using derivative method, symmetry using pseudo-moments, skewness using pseudo-moments, gaussianity, S/N using baseline, S/N using the xcms method, S/N using the RMS method, and sharpness.
  4. Mass spectra scan level Ion Pairing to only select relevant ions
  5. Flexibility to screen for any ion mass difference in addition to natural carbon signatures (12C/13C isotopologues) mass difference
  6. Retention time correction using endogenous reference markers for multi-batch large scale studies
  7. Generating batch untargeted extracted ion chromatograms (EICs)
  8. Generating pairwise correlations list for aligned peak height and its gap-filled tables to detect potential recurring adducts, in-source products and fragment peaks
  9. Aggregating untargeted EICs after (m/z-RT) annotation for each compound
  10. Parallel processing in Windows and Linux environments
  11. Integration with molecular formula annotation tools IDSL.UFA and IDSL.UFAx
  12. Integration with IDSL.CSA workflow to cluster recurring ions to generate composite spectra
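To illustrate the idea behind features 4 and 5, the following is a minimal sketch of pairing ions within a single scan by a fixed mass difference. It is not the IDSL.IPA implementation: the function name `find_ion_pairs`, the tolerance, and the example m/z values are hypothetical, and the default mass difference is the commonly cited 13C/12C spacing of about 1.003355 Da.

```r
# Hypothetical helper (not part of IDSL.IPA): find pairs of ions in one scan
# whose m/z values differ by `mass_diff` within an absolute tolerance `tol` (Da).
find_ion_pairs <- function(mz, mass_diff = 1.003355, tol = 0.005) {
  mz <- sort(mz)
  # Pairwise differences: d[i, j] = mz[j] - mz[i]
  d <- outer(mz, mz, function(a, b) b - a)
  hits <- which(abs(d - mass_diff) <= tol, arr.ind = TRUE)
  data.frame(
    mz_light = mz[hits[, "row"]],
    mz_heavy = mz[hits[, "col"]]
  )
}

# Made-up m/z values for one scan; only the first two are ~1.0034 Da apart.
scan_mz <- c(180.0634, 181.0668, 255.2330, 300.1000)
pairs <- find_ion_pairs(scan_mz)
print(pairs)
```

Passing a different `mass_diff` screens for another ion mass difference in the same way, which is the kind of flexibility feature 5 describes.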

Installation

install.packages("IDSL.IPA")

Note: If you want to process netCDF/CDF mass spectrometry data with IDSL.IPA, you should also install the RNetCDF package separately using the command below.

install.packages("RNetCDF")
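Before installing, it can be convenient to check which of the two packages are already present. The sketch below uses only base R; the helper name `missing_packages` is hypothetical, and the actual installation call is left commented out.

```r
# Hypothetical helper: return the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("IDSL.IPA", "RNetCDF"))
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```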

Workflow

To process your mass spectrometry data (mzXML, mzML, netCDF), download the IPA parameter spreadsheet, set the parameters for your study, and then pass the spreadsheet to the IPA_workflow function as shown below:

library(IDSL.IPA)
IPA_workflow("Address of the IPA parameter spreadsheet")
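A small wrapper can catch a mistyped spreadsheet path before the workflow starts. This is a sketch under stated assumptions: `run_ipa` and the example path are hypothetical, and the only IDSL.IPA call used is `IPA_workflow` with the spreadsheet address as its argument, as in the command above.

```r
# Hypothetical wrapper: validate inputs, then hand over to IPA_workflow.
run_ipa <- function(spreadsheet_path) {
  if (!file.exists(spreadsheet_path)) {
    stop("Parameter spreadsheet not found: ", spreadsheet_path)
  }
  if (!requireNamespace("IDSL.IPA", quietly = TRUE)) {
    stop("Package IDSL.IPA is not installed")
  }
  IDSL.IPA::IPA_workflow(spreadsheet_path)
}

# Hypothetical path; replace with the location of your own spreadsheet.
# run_ipa("path/to/IPA_parameters.xlsx")
```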

Quick Batch Example

Follow these steps for a quick case study (n = 33) using ST002263, which contains Thermo Q Exactive HF hybrid Orbitrap data collected in the HILIC-ESI-POS/NEG modes.

  1. Download the file "ST002263_Rawdata.zip (1.6G)"

  2. Separate the positive and negative mode .mzXML data into different folders. Positive and negative mode data must be processed separately.

  3. Open the IPA parameter spreadsheet and use default values for all parameters, except:

    3.1. Input data location: In PARAM0007, specify the location of the MS1 level HRMS data.

    3.2. Output location: In PARAM0010, specify the location where you want the processed data to be stored.

    3.3. Number of processing threads: In PARAM0006, increase the number of processing threads based on your computer's computational power.

  4. Open the R/RStudio console or terminal and run the following command:

library(IDSL.IPA)
IPA_workflow("Address of the IPA parameter spreadsheet")

  5. The results will be available in the output location specified in PARAM0010 and will include:

    5.1. Individual peaklists for each HRMS file in .Rdata and .csv formats in the "peaklists" directory.

    5.2. Peak alignment tables in the "peak_alignment" directory.

    5.3. (Optional) untargeted EICs in the "IPA_EIC" directory for each HRMS file, if selected YES in PARAM0009.
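
Step 2 above can be scripted. The sketch below moves .mzXML files into "positive" and "negative" subfolders based on their file names. It assumes the polarity is encoded in each file name as "pos" or "neg", which may not match the naming used in ST002263; the function name `split_by_polarity` and the demo file names are hypothetical. Files matching neither pattern are left in place.

```r
# Hypothetical helper: sort .mzXML files into per-polarity subfolders.
# Assumes file names contain "pos" or "neg" (case-insensitive).
split_by_polarity <- function(data_dir, pos_pattern = "pos", neg_pattern = "neg") {
  files <- list.files(data_dir, pattern = "\\.mzXML$", ignore.case = TRUE)
  mode <- ifelse(grepl(pos_pattern, files, ignore.case = TRUE), "positive",
          ifelse(grepl(neg_pattern, files, ignore.case = TRUE), "negative", NA))
  for (m in c("positive", "negative")) {
    dir.create(file.path(data_dir, m), showWarnings = FALSE)
    sel <- files[!is.na(mode) & mode == m]
    if (length(sel) > 0) {
      file.rename(file.path(data_dir, sel), file.path(data_dir, m, sel))
    }
  }
  invisible(table(mode, useNA = "ifany"))
}

# Demonstration on empty placeholder files in a temporary folder.
demo_dir <- file.path(tempdir(), "ipa_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("s01_POS.mzXML", "s01_NEG.mzXML", "blank.mzXML")))
split_by_polarity(demo_dir)
print(list.files(demo_dir, recursive = TRUE))
```

Each resulting subfolder can then be given as the input location in PARAM0007 of its own copy of the parameter spreadsheet.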

  1. Example of a population size study with 499 individual mass spectrometry files
  2. IPA_targeted function for a large number of peaks (m/z-RT pairs) with an example for targeted IDSL.IPA
  3. Ion Pairing
  4. Definition of Signal to Noise ratio (S/N)
  5. nIsoPair/RCS
  6. Ratio of peak width at half-height to peak width at the baseline (RPW)
  7. Chromatogram gap percentage
  8. Peak tailing fronting resolving method
  9. Peak smoothing
  10. Extra scans
  11. Retention time correction

News and Updates

We post major changes in the IDSL.IPA workflow here.

Citation

[1] Fakouri Baygi, S., Kumar, Y., Barupal, D.K. IDSL.IPA characterizes the organic chemical space in untargeted LC/HRMS datasets. Journal of Proteome Research, 2022, 21(6), 1485-1494.

Contributors

barupal, sajfb

idsl.ipa's Issues

IDSL.IPA computations successful but workflow stopped due to an error

Hello,

I was able to successfully run IDSL.IPA (I believe so anyhow) on my 12 mzML files; however, the command ended with an error. Here's the end of the log:

Completed gap-filling!
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
The required processing time was `1.21609768807888 hours`
2023-03-23 22:33:11 Etc/UTC


Completed IDSL.IPA computations successfully!
===================================================================================================
Error in base::try(ms1, silent = TRUE) : object 'ms1' not found

Stopped IDSL.IPA workflow!

Should the error be disregarded?

Also, which output in the peak_alignment folder should be used as the peak table? peak_R13C_gapfilled.csv?

Thank you!
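
For readers inspecting the same output folder, here is a minimal sketch that lists the CSV tables in the "peak_alignment" directory so they can be compared. It does not settle which table to use. The function name `list_alignment_tables` and the output path are hypothetical; the directory name and the file name `peak_R13C_gapfilled.csv` are taken from the text above.

```r
# Hypothetical helper: list CSV tables found under <output_dir>/peak_alignment.
list_alignment_tables <- function(output_dir) {
  align_dir <- file.path(output_dir, "peak_alignment")
  csv_files <- list.files(align_dir, pattern = "\\.csv$", full.names = TRUE)
  data.frame(
    file = basename(csv_files),
    size_kb = round(file.size(csv_files) / 1024, 1)
  )
}

# Hypothetical output location (the folder set in PARAM0010).
# tables <- list_alignment_tables("path/to/IPA_output")
# peak_table <- read.csv(file.path("path/to/IPA_output", "peak_alignment",
#                                  "peak_R13C_gapfilled.csv"))
```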
