idslme / idsl.csa

Composite Spectra Analysis

License: MIT License

R 100.00%
dda dia csa data-independent-acquisition mass-spectrometry spectral-entropy composite-spectra-analysis data-dependent-acquisition small-molecule r

idsl.csa's Introduction

IDSL.CSA

The Composite Spectra Analysis (IDSL.CSA) R package for the analysis of mass spectrometry data has been developed by the Integrated Data Science Laboratory for Metabolomics and Exposomics (IDSL.ME). This package can be used for the deconvolution of fragmentation spectra obtained through various analytical methods such as MS1-only Composite Spectra deconvolution Analysis (CSA), Data-Dependent Acquisition (DDA), and various Data-Independent Acquisition (DIA) methods including MSE, All-Ion Fragmentation (AIF), and SWATH-MS analyses. The aim of the IDSL.CSA package is to streamline the data analysis process and improve the overall chemical structure annotation in the fields of metabolomics and exposomics.

Features of IDSL.CSA

  1. Parameter selection through a user-friendly and well-described parameter spreadsheet
  2. Peak detection and chromatogram deconvolution for various fragmentation data analyses, including Composite Spectra Analysis (CSA), Data-Dependent Acquisition (DDA), and Data-Independent Acquisition (DIA)
  3. Analysis of population-scale untargeted studies (n > 500)
  4. Aggregation of annotated chemical structures on the aligned peak table using meta-variables such as InChIKey, SMILES, precursor type, and molecular formula, depending on the information in the reference library. This feature is unique to IDSL.CSA. To become familiar with this statistical mass spectrometry feature, try PARAM0006 in the Start tab of the IDSL.CSA parameter spreadsheet.
  5. Generation of batch untargeted aligned extracted ion chromatogram (EIC) figures for the DIA and CSA analyses, as well as batch DDA spectra figures
  6. Parallel processing in Windows and Linux environments
  7. Integration with the IDSL.FSA workflow to annotate various types of MSP files and generate fragmentation libraries

Installation

install.packages("IDSL.CSA")

Workflow

Before processing your mass spectrometry data (mzXML, mzML, netCDF) with the IDSL.CSA workflow, process the data with the IDSL.IPA workflow to acquire the chromatographic information of the peaks (m/z-RT). Once the individual peaklists and the aggregated aligned peak tables have been generated by the IDSL.IPA workflow, download the IDSL.CSA parameter spreadsheet, select the parameters accordingly, and then use this spreadsheet as the input for the IDSL.CSA workflow:

library(IDSL.CSA)
IDSL.CSA_workflow("Address of the CSA parameter spreadsheet")
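
Because the workflow takes a single spreadsheet path, it can help to validate that path before starting a long run. The following is a minimal sketch, not part of IDSL.CSA: `check_csa_spreadsheet()` and `run_csa()` are hypothetical helper names, and the only package call used is `IDSL.CSA_workflow()` as shown above. The `.xlsx` check mirrors the package's own message that the spreadsheet should be an Excel file with the .xlsx extension.

```r
# Hypothetical helper (not part of IDSL.CSA): check the spreadsheet path
# before handing it to the workflow.
check_csa_spreadsheet <- function(path) {
  is_xlsx <- grepl("\\.xlsx$", path)
  found <- file.exists(path)
  if (!is_xlsx) message("Expected an Excel file with the .xlsx extension: ", path)
  if (!found) message("File not found: ", path)
  is_xlsx && found
}

# Hypothetical wrapper: run the workflow only when the path looks usable.
run_csa <- function(path) {
  if (!check_csa_spreadsheet(path)) {
    return(invisible(FALSE))
  }
  if (!requireNamespace("IDSL.CSA", quietly = TRUE)) {
    stop("IDSL.CSA is not installed; run install.packages(\"IDSL.CSA\") first.")
  }
  IDSL.CSA::IDSL.CSA_workflow(path)
  invisible(TRUE)
}
```

Calling `run_csa("Address of the CSA parameter spreadsheet")` with a real path then behaves like the direct call above, but stops early with a readable message when the file is missing or has the wrong extension.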

Quick Batch Example

Follow these steps for a quick case study on ST002263 (n = 33), which consists of Thermo Q Exactive HF hybrid Orbitrap data collected in the HILIC-ESI-POS/NEG modes.

  1. Process raw mass spectrometry data and chromatographic information using the method described for IDSL.IPA

  2. The Composite Spectra Analysis requires 39 parameters distributed into 5 separate sections for a full-scale analysis. For this study, use the default parameter values presented in the IDSL.CSA parameter spreadsheet. Next, set the following parameters:

    2.1. Select YES for PARAM0001 in the Start tab to only process CSA workflow.

    2.2. CSA0005 for HRMS data location address (MS1 level HRMS data)

    2.3. CSA0008 for Address of the peaklists directory generated by the IDSL.IPA workflow

    2.4. CSA0009 for Address of the peak_alignment directory generated by the IDSL.IPA workflow

    2.5. CSA0011 for Output location (.msp files and EICs)

    2.6. You may also increase the number of processing threads using CSA0004 according to your computational power

  3. Run this command in the R/RStudio console or terminal:

library(IDSL.CSA)
IDSL.CSA_workflow("Address of the CSA parameter spreadsheet")

  4. You can find the results at the address you provided for CSA0011.

    4.1. CSA_MSP includes the .msp files

    4.2. CSA_adduct_annotation includes peaklists with potential adduct information

    4.3. peak_alignment_subset includes subsets of aligned peak tables for the major ions in each CSA cluster

    4.4. aligned_spectra_table includes information for the CSA aggregation on the aligned table
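
The directory parameters in step 2 and the result folders in step 4 can be checked from R before and after a run. This is a minimal sketch in base R only: the paths under `~/study` are hypothetical placeholders, the helper names are not part of IDSL.CSA, and it assumes that each name listed in step 4 appears as a subfolder of the CSA0011 output directory.

```r
# Hypothetical paths; replace them with the values entered in the spreadsheet.
csa_paths <- c(
  CSA0005 = "~/study/raw",                 # HRMS data (MS1 level)
  CSA0008 = "~/study/ipa/peaklists",       # IDSL.IPA peaklists directory
  CSA0009 = "~/study/ipa/peak_alignment",  # IDSL.IPA peak_alignment directory
  CSA0011 = "~/study/csa_results"          # output location
)

# Report which directories exist. This only reports; the output directory
# may legitimately be missing before the first run.
check_dirs <- function(paths) {
  ok <- dir.exists(path.expand(paths))
  names(ok) <- names(paths)
  ok
}

# A conservative thread count for CSA0004: leave one core free.
suggest_threads <- function() {
  n <- parallel::detectCores()
  if (is.na(n)) 1L else max(1L, n - 1L)
}

# After a run, report which of the result folders named in step 4 are present
# (assumption: each name is a subfolder of the output directory).
list_csa_outputs <- function(output_dir) {
  expected <- c("CSA_MSP", "CSA_adduct_annotation",
                "peak_alignment_subset", "aligned_spectra_table")
  present <- dir.exists(file.path(path.expand(output_dir), expected))
  names(present) <- expected
  present
}
```

For example, `check_dirs(csa_paths)` before the run and `list_csa_outputs(csa_paths[["CSA0011"]])` afterwards give a quick view of what is in place without opening a file browser.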

  1. CSA analysis by IDSL.CSA
  2. DDA analysis by IDSL.CSA
  3. DIA analysis by IDSL.CSA
  4. Unique spectra aggregation

Citation

[1] Fakouri Baygi, S., Kumar, Y., Barupal, D.K. IDSL.CSA: Composite Spectra Analysis for Chemical Annotation of Untargeted Metabolomics Datasets. Analytical Chemistry, 2023, 95(25), 9480–9487.

[2] Fakouri Baygi, S., Kumar, Y., Barupal, D.K. IDSL.IPA characterizes the organic chemical space in untargeted LC/HRMS datasets. Journal of Proteome Research, 2022, 21(6), 1485–1494.

idsl.csa's Issues

Problems running IDSL.CSA

Hi,

I've been able to get IDSL.IPA to run successfully and, as a next step, tried to run IDSL.CSA, both on its own and with the DIA option. However, both workflows were unsuccessful.

The error logs were vague in pointing out what exactly the errors were, apart from problems with the input mzMLs. I'll copy them below:

===================================================================================================
mzML/mzXML/netCDF: ~/bolus_i_raw
OUTPUT: ~/bolus_i_ipa/csa_results
---------------------------------------------------------------------------------------------------
Initiated CSA workflow!
2023-03-27 17:08:27 Etc/UTC


CSA0001	yes
CSA0002	no
CSA0003	yes
CSA0004	12
CSA0005	~/bolus_i_raw
CSA0006	NA
CSA0007	All
CSA0008	~/bolus_i_ipa/peaklists
CSA0009	~/bolus_i_ipa/peak_alignment
CSA0010	no
CSA0011	~/bolus_i_ipa/csa_results
CSA0012	alignedtable
CSA0013	0.05
CSA0014	10
CSA0015	12
CSA0016	0.01
CSA0017	100
CSA0018	100
CSA0019	90
CSA0020	0
CSA0021	2
CSA0022	0.9
CSA0023	Name
CSA0024	NO
CSA0025	0.01
CSA0026	0.05
CSA0027	10
CSA0028	TRUE
CSA0029	0.9
CSA0030	0.05
CSA0031	5
CSA0032	3
CSA0033	0.1
CSA0034	0.5
CSA0035	no
CSA0036	0.01
CSA0037	TRUE
CSA0038	0.6
CSA0039	NA
---------------------------------------------------------------------------------------------------
Initiated Composite Spectra Analysis (CSA) by grouping IDSL.IPA peaks on individual peaklists using aligned peak height table correlations!
Initiated subsetting the `alignedPeakHeightTableCorrelationList.Rdata` for each sample! Temporary subsetted data are stored in the `~/bolus_i_ipa/csa_results` folder!
Completed subsetting the `alignedPeakHeightTableCorrelationList.Rdata`!
Individual `.msp` files are stored in the `CSA_MSP` folder!
Individual adduct annotated IDSL.IPA peaklists are stored in `.Rdata` and `.csv` formats in the `CSA_adduct_annotation` folder!
Problem with `20221002_NUAHyy_20220405_SMic_B3_HILIC_18.mzML`!
Problem with `20221002_NUAHyy_20220405_SMic_B3_HILIC_08.mzML`!
Problem with `20221002_NUAHyy_20220405_SMic_B3_HILIC_10.mzML`!
Problem with `20221002_NUAHyy_20220405_SMic_B3_HILIC_04.mzML`!
Problem with `20221002_NUAHyy_20220405_SMic_B3_HILIC_16.mzML`!
Problem with `20221002_NUAHyy_20220405_SMic_B3_HILIC_02.mzML`!
Problem with `20221002_NUAHyy_20220405_SMic_B3_HILIC_12.mzML`!
Problem with `20221002_NUAHyy_20220405_SMic_B3_HILIC_06_r.mzML`!
Problem with `20221002_NUAHyy_20220405_SMic_B3_HILIC_14.mzML`!
Problem with `20221002_NUAHyy_20220405_SMic_B3_HILIC_20.mzML`!
No `.msp` file was detected!
---------------------------------------------------------------------------------------------------
The required processing time was `3.09229907194773 mins`
2023-03-27 17:11:33 Etc/UTC


Completed the CSA analysis successfully!
===================================================================================================
===================================================================================================
mzML/mzXML/netCDF:  ~/bolus_i_raw
OUTPUT:  ~/bolus_i_ipa/csa_results
---------------------------------------------------------------------------------------------------
Initiated DIA workflow!
2023-03-27 18:16:39 Etc/UTC


DIA0001	yes
DIA0002	yes
DIA0003	12
DIA0004	~/bolus_i_raw
DIA0005	NA
DIA0006	All
DIA0007	~/bolus_i_ipa/peaklists
DIA0008	~/bolus_i_ipa/peak_alignment
DIA0009	All
DIA0010	no
DIA0011	~/bolus_i_ipa/csa_results
DIA0012	samplemode
DIA0014	400
DIA0015	12
DIA0016	12
DIA0017	0.01
DIA0018	100
DIA0019	100
DIA0020	90
DIA0021	0.9
DIA0022	Name
DIA0023	no
DIA0024	0.01
DIA0025	0.05
DIA0026	10
DIA0027	TRUE
DIA0028	0.9
DIA0013	2
---------------------------------------------------------------------------------------------------
Initiated Data Independent Acquisition (DIA) analysis at ms level = `2` on individual IDSL.IPA peaklists using raw spectra!
Individual `.msp` files are stored in the `DIA_MSP` folder!
No peak was detected for `20221002_NUAHyy_20220405_SMic_B3_HILIC_14.mzML`!
No peak was detected for `20221002_NUAHyy_20220405_SMic_B3_HILIC_10.mzML`!
No peak was detected for `20221002_NUAHyy_20220405_SMic_B3_HILIC_08.mzML`!
No peak was detected for `20221002_NUAHyy_20220405_SMic_B3_HILIC_06_r.mzML`!
No peak was detected for `20221002_NUAHyy_20220405_SMic_B3_HILIC_18.mzML`!
No peak was detected for `20221002_NUAHyy_20220405_SMic_B3_HILIC_12.mzML`!
No peak was detected for `20221002_NUAHyy_20220405_SMic_B3_HILIC_04.mzML`!
No peak was detected for `20221002_NUAHyy_20220405_SMic_B3_HILIC_16.mzML`!
No peak was detected for `20221002_NUAHyy_20220405_SMic_B3_HILIC_02.mzML`!
Initiated detecting unique DIA variants!
Problem with loading .msp file --> `NA`!
Problem with loading .msp file --> `DIA_MSP`!
Error in NumPeaks_PrecursorMZ_SpectralEntropy[, 2] : 
  subscript out of bounds

Stopped IDSL.CSA workflow!

Could you please help to pinpoint what went wrong? Please let me know if you need the input Excel files.

Thank you!

Website not live yet

Website: https://csa.idsl.me is not active yet?
Also the info has some weird characters?

> library(IDSL.CSA)
> IDSL.CSA_workflow("Address of the CSA parameter spreadsheet")

�[0;92mInitiated testing the IDSL.CSA workflow spreadsheet consistency!�[0m
�[0;91mThe IDSL.CSA workflow spreadsheet not found! It should be an Excel file with .xlsx extention!�[0m
�[0;91mPlease visit    https://csa.idsl.me    for instructions!!!�[0m
>
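
The `[0;92m`, `[0;91m`, and `[0m` fragments in the output above have the shape of ANSI color escape sequences (bright green, bright red, and reset) whose escape character was not rendered by the console. As a minimal sketch, assuming the captured text contains the real escape character (`\033`), such sequences can be stripped from a character vector with base R. `strip_ansi()` is a hypothetical helper name, not an IDSL.CSA function.

```r
# Hypothetical helper: remove ANSI color sequences of the form ESC[<digits;...>m.
strip_ansi <- function(x) {
  gsub("\033\\[[0-9;]*m", "", x)
}

# Example input modeled on the first message shown above.
raw_line <- "\033[0;92mInitiated testing the IDSL.CSA workflow spreadsheet consistency!\033[0m"
clean_line <- strip_ansi(raw_line)
cat(clean_line, "\n")
```

Consoles that interpret ANSI sequences show these messages in color instead of printing the codes.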
