clavellab / maldipickr

Dereplicate And Cherry-pick Mass Spectrometry Spectra

Home Page: https://clavellab.github.io/maldipickr/

License: GNU General Public License v3.0

dereplication maldi-tof-ms cherry-pick r r-packages rstats

maldipickr's Introduction

maldipickr

You are using the MALDI-TOF¹ Biotyper to identify bacterial isolates
You want to select representative isolates for further experiments
You need fast and automated selection decisions that you can retrace

{maldipickr} provides documented and tested R functions that will help you dereplicate MALDI-TOF data and cherry-pick representative spectra of microbial isolates.

Graphical overview

Illustration of the data flow when using {maldipickr} to cherry-pick bacterial isolates with the MALDI Biotyper. It depicts the two possible approaches, using either taxonomic identification reports (left) or spectra data (right).

Quickstart

How to cherry-pick bacterial isolates with MALDI Biotyper:

using taxonomic identification report
using spectra data

Using taxonomic identification report

library(maldipickr)
# Import Biotyper CSV report
#  and glimpse at the table
report_tbl <- read_biotyper_report(
  system.file("biotyper_unknown.csv", package = "maldipickr")
)
report_tbl %>%
  dplyr::select(name, bruker_species, bruker_log)
#> # A tibble: 4 × 3
#>   name              bruker_species               bruker_log
#>   <chr>             <chr>                             <dbl>
#> 1 unknown_isolate_1 not reliable identification        1.33
#> 2 unknown_isolate_2 not reliable identification        1.4 
#> 3 unknown_isolate_3 Faecalibacterium prausnitzii       1.96
#> 4 unknown_isolate_4 Faecalibacterium prausnitzii       2.07


# Keep only the reliable identifications (log score >= 2),
#   delineate clusters from them
#   and cherry-pick one representative spectrum per cluster.
#   The chosen ones are indicated by the `to_pick` column
report_tbl <- report_tbl %>%
  dplyr::mutate(
      bruker_species = dplyr::if_else(bruker_log >= 2, bruker_species,
                                      "not reliable identification")
  )
report_tbl %>%
  delineate_with_identification() %>%
  pick_spectra(report_tbl, criteria_column = "bruker_log") %>%
  dplyr::relocate(name, to_pick, bruker_species)
#> Generating clusters from single report
#> # A tibble: 4 × 11
#>   name       to_pick bruker_species membership cluster_size sample_name hit_rank
#>   <chr>      <lgl>   <chr>               <int>        <int> <chr>          <int>
#> 1 unknown_i… TRUE    not reliable …          2            1 <NA>               1
#> 2 unknown_i… TRUE    not reliable …          3            1 <NA>               1
#> 3 unknown_i… TRUE    not reliable …          4            1 <NA>               1
#> 4 unknown_i… TRUE    Faecalibacter…          1            1 <NA>               1
#> # ℹ 4 more variables: bruker_quality <chr>, bruker_taxid <dbl>,
#> #   bruker_hash <chr>, bruker_log <dbl>

Using spectra data

library(maldipickr)
# Set up the directory location of your spectra data
spectra_dir <- system.file("toy-species-spectra", package = "maldipickr")

# Import and process the spectra
processed <- spectra_dir %>%
  import_biotyper_spectra() %>%
  process_spectra()

# Delineate spectra clusters using cosine similarity
#  and cherry-pick one representative spectrum per cluster.
#  The chosen ones are indicated by the `to_pick` column
processed %>%
  list() %>%
  merge_processed_spectra() %>%
  coop::tcosine() %>%
  delineate_with_similarity(threshold = 0.92) %>%
  set_reference_spectra(processed$metadata) %>%
  pick_spectra() %>%
  dplyr::relocate(name, to_pick)
#> # A tibble: 6 × 7
#>   name         to_pick membership cluster_size   SNR peaks is_reference
#>   <chr>        <lgl>        <int>        <int> <dbl> <int> <lgl>       
#> 1 species1_G2  FALSE            1            4  5.09    21 FALSE       
#> 2 species2_E11 FALSE            2            2  5.54    22 FALSE       
#> 3 species2_E12 TRUE             2            2  5.63    23 TRUE        
#> 4 species3_F7  FALSE            1            4  4.89    26 FALSE       
#> 5 species3_F8  TRUE             1            4  5.56    25 TRUE        
#> 6 species3_F9  FALSE            1            4  5.40    25 FALSE

Installation

{maldipickr} is available on CRAN and on GitHub.

To install the latest CRAN release, use the following command in R:

install.packages("maldipickr")

To install the development version, use the following command in R:

remotes::install_github("ClavelLab/maldipickr", build_vignettes = TRUE)

Usage

The comprehensive vignettes will walk you through the package functions and showcase how to use them.

Troubleshoot

If something unexpected happened when using this package, please first search the current open or closed issues to look for similar problems. If you are the first, you are more than welcome to open a new issue using the “Bug report” template with a minimal reprex.

Contribute

All contributions are welcome and the CONTRIBUTING.md documents how to participate.

Please note that the {maldipickr} package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Credits

Acknowledgements

This R package is developed for spectra data generated by the Bruker MALDI Biotyper device. The {maldipickr} package is built from a suite of Rmarkdown files using the {fusen} package by Rochette S (2023). It relies on:

the {MALDIquant} package from Gibb & Strimmer (2012) for spectra functions
the work of Strejcek et al. (2018) for the dereplication procedure.

Disclaimer

The developers of this package are part of the Clavel Lab and are not affiliated with the company Bruker. This package is therefore independent of the company and is distributed under the GPL-3.0 License.

The hexagonal logo was created by Charlie Pauvert and uses the Atkinson Hyperlegible font and a color palette generated at coolors.co.

References

Gibb S & Strimmer K (2012). “MALDIquant: a versatile R package for the analysis of mass spectrometry data”. Bioinformatics 28, 2270-2271. https://doi.org/10.1093/bioinformatics/bts447.
Rochette S (2023). “fusen: Build a Package from Rmarkdown Files”. https://thinkr-open.github.io/fusen/, https://github.com/Thinkr-open/fusen.
Strejcek M, Smrhova T, Junkova P & Uhlik O (2018). “Whole-Cell MALDI-TOF MS versus 16S rRNA Gene Analysis for Identification and Dereplication of Recurrent Bacterial Isolates.” Frontiers in Microbiology 9 https://doi.org/10.3389/fmicb.2018.01294.

¹ Matrix-Assisted Laser Desorption/Ionization-Time-Of-Flight (MALDI-TOF)

maldipickr's Issues

error non-unique values when setting 'row.names' in `process_spectra()`

This error can happen when multiple spectra have the same name, which should not happen with proper research data input but can arise during data analyses.

At the moment, the function stops. An improvement would be to exit gracefully at least, or provide a solution (would make.unique() be possible or too intrusive?)
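
As a rough illustration of the make.unique() idea, the base R sketch below detects duplicated names and makes them unique before they would be used as row names. The spectra names are made up for the example, and whether such renaming is too intrusive for process_spectra() remains an open question.

```r
# Hypothetical spectra names with one duplicate
spectra_names <- c("isolate_A1", "isolate_A2", "isolate_A1")

# anyDuplicated() returns the index of the first duplicate, or 0 if none
has_duplicates <- anyDuplicated(spectra_names) > 0

# make.unique() appends a numeric suffix to repeated values
unique_names <- make.unique(spectra_names)

if (has_duplicates) {
  # Tell the user which names were affected instead of stopping
  message(
    "Duplicated spectra names were renamed: ",
    toString(unique(spectra_names[duplicated(spectra_names)]))
  )
}
print(unique_names)
#> [1] "isolate_A1"   "isolate_A2"   "isolate_A1.1"
```

Reporting the renamed spectra keeps the decision traceable, which matters if the names are later matched against metadata.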

process_spectra.R

Prepare for a CRAN submission

Resources to read:

Issues to solve:

add Strejcek paper in DESCRIPTION

Add a CHANGELOG

https://common-changelog.org/

Add the picking function

read in mzML files instead of fid/acqu proprietary format

File format conversion can be done with the compass software, see rformassspectrometry/RforMassSpectrometry.org#18 (comment)

use a standardised format to store the raw spectra and the processed spectra

Currently, the spectra and peaks are stored as RDS files containing MassSpectrum or MassPeaks R objects. These objects are metadata-rich which is fundamental.

There are indeed a couple of standard file formats (e.g., mzML) to which these objects can be exported using MALDIquantForeign, but the metadata are lost in the process.

silent fail when metadata did not merge with cluster_df in the picking function

deprecate `rds_prefix` option in `process_spectra()` because of `{targets}`

does not necessarily make sense if the workflow is handled by {targets}
there could be better file format alternatives (e.g., qs)
Look at: https://lifecycle.r-lib.org/articles/communicate.html to do it properly, (EDIT: more specifically for arguments: https://lifecycle.r-lib.org/articles/communicate.html#deprecating-an-argument-providing-a-new-default)
will get rid of untested sections of the function (https://app.codecov.io/github/ClavelLab/maldipickr/blob/main/R%2Fprocess_spectra.R) as the CRAN is not happy when writing to disk during tests.
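
The linked lifecycle article covers the recommended way to deprecate an argument. As a dependency-free sketch of the same pattern, the hypothetical function below (a stand-in, not the actual process_spectra()) still accepts `rds_prefix` but warns when it is supplied and no longer writes to disk. The function name, the warning text and the placeholder processing step are all assumptions made for illustration.

```r
# Hypothetical stand-in for a function with a deprecated argument
process_demo <- function(x, rds_prefix = NULL) {
  if (!is.null(rds_prefix)) {
    # Signal the deprecation but keep working for existing callers
    warning(
      "The `rds_prefix` argument is deprecated and ignored; ",
      "save the returned object yourself if needed.",
      call. = FALSE
    )
  }
  # Placeholder for the real processing step
  rev(x)
}

# Callers that never used the argument are unaffected
result <- process_demo(1:3)
```

Keeping the argument in the signature for a while avoids breaking existing scripts, while the warning points users to the new behaviour.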

add function to gather spectra checks stats

Is your feature request related to a problem? Please describe.
I'm always frustrated when I need to compute how many raw spectra were analyzed, how many valid, and why they were rejected.

Describe the solution you'd like
This function below is an untested, undocumented attempt:

gather_spectra_stats <- function(check_vectors) {
  # check_vectors from maldipickr::check_spectra
  # Requires the pipe `%>%`, e.g. via library(magrittr) or library(dplyr)
  # A spectrum is flagged as soon as one of the checks is TRUE
  # src: https://stackoverflow.com/a/51140480/21085566
  aggregated_checks <- Reduce(`|`, check_vectors)
  # Number of flagged spectra for each check
  check_stats <- vapply(check_vectors, sum, FUN.VALUE = integer(1)) %>%
    tibble::as_tibble_row()
  # The function returns the value of this last expression
  tibble::tibble(
    n_spectra = length(aggregated_checks),
    n_valid_spectra = n_spectra - sum(aggregated_checks)
  ) %>%
    dplyr::bind_cols(check_stats)
}

Find connected components without `{igraph}` or `{tidygraph}`

The packages {igraph} and {tidygraph} are conveniently imported for the dereplication but are only used to detect connected components.

There should be an easy way to recode an equivalent of the connected components algorithm, seeing as we only use limited options there.

This would certainly drastically reduce the number of dependencies!
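
To give an idea of what such a recoding could look like, here is a minimal base R sketch of connected components found by breadth-first search on a thresholded similarity matrix. The function name, its arguments and the toy matrix are assumptions for illustration, not the package's code.

```r
# Hypothetical helper: label the connected components of the graph in which
# two spectra are linked when their similarity reaches the threshold
connected_components <- function(sim_matrix, threshold) {
  adjacency <- sim_matrix >= threshold
  n <- nrow(adjacency)
  membership <- rep(NA_integer_, n)
  current <- 0L
  for (start in seq_len(n)) {
    # Skip the nodes already assigned to a component
    if (!is.na(membership[start])) next
    current <- current + 1L
    membership[start] <- current
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      # Unvisited neighbours join the current component
      neighbours <- which(adjacency[node, ] & is.na(membership))
      membership[neighbours] <- current
      queue <- c(queue, neighbours)
    }
  }
  membership
}

# Toy similarity matrix: spectra 1-2 and 2-3 are similar, spectrum 4 is alone
sim <- matrix(
  c(1.00, 0.95, 0.50, 0.10,
    0.95, 1.00, 0.95, 0.10,
    0.50, 0.95, 1.00, 0.10,
    0.10, 0.10, 0.10, 1.00),
  nrow = 4, byrow = TRUE
)
membership <- connected_components(sim, threshold = 0.92)
print(membership)
#> [1] 1 1 1 2
```

For the small matrices typical of a single plate this quadratic approach should be sufficient, but it has not been benchmarked against {igraph}.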

Release maldipickr 1.2.0

Prepare for release:

Submit to CRAN:

usethis::use_version('minor')
devtools::submit_cran()
Approve email

Wait for CRAN...

Implicit and undocumented use of single-linkage clustering

The clustering approach introduced in v1.1.0 implicitly uses single-linkage clustering (i.e., "friend of a friend" grouping). This typically results in chained clusters, and the minimum similarity within a cluster is not controlled.
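
The toy example below illustrates the chaining effect. It is not the package's implementation: it uses stats::hclust() on made-up similarities to compare single linkage with complete linkage at the same 0.92 threshold.

```r
# Made-up similarities: A~B and B~C are high, but A~C is low
sim <- matrix(
  c(1.00, 0.95, 0.50,
    0.95, 1.00, 0.95,
    0.50, 0.95, 1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)
# Convert similarities to distances for hierarchical clustering
distances <- as.dist(1 - sim)
cut_height <- 1 - 0.92

# Single linkage: C joins A through B, despite the A~C similarity of 0.5
single_clusters <- cutree(hclust(distances, method = "single"), h = cut_height)

# Complete linkage: every pair in a cluster must reach the threshold
complete_clusters <- cutree(hclust(distances, method = "complete"), h = cut_height)

# Number of clusters obtained with each linkage
n_single <- length(unique(single_clusters))
n_complete <- length(unique(complete_clusters))
print(c(single = n_single, complete = n_complete))
#>   single complete 
#>        1        2
```

With single linkage the three spectra end up in one cluster whose lowest internal similarity (0.5) is well below the threshold, which is the behaviour that should at least be documented.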

set a minimum R version

decide on a strategy to set a minimum R version. Currently, it implicitly uses the R version used for developing.

https://blog.r-hub.io/2022/09/12/r-dependency/

Remove the CHANGELOG and stick to the R practice of NEWS

output the `read_biotyper_report()` in wide AND long format

pick_spectra.R

move the Quickstart to a specific vignette to have a "Get started" page generated for the website

I think this would improve readability. The README can become a simple .md file instead of Rmd

Source: https://pkgdown.r-lib.org/articles/customise.html?q=Get%20starte#navbar-heading

picking function using the biotyper report

This will require making every "not reliable identification" entry unique, though.
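
One possible way to do this, sketched in base R with made-up data, is to append a running index to the unreliable labels so that each of these isolates ends up in its own cluster. The suffix scheme is an assumption, not necessarily what the package does.

```r
# Made-up identifications, as they could appear in a Biotyper report
species <- c(
  "not reliable identification",
  "not reliable identification",
  "Faecalibacterium prausnitzii",
  "Faecalibacterium prausnitzii"
)

# Give each unreliable identification its own label
unreliable <- species == "not reliable identification"
species[unreliable] <- paste(species[unreliable], seq_len(sum(unreliable)), sep = "_")

# Identical labels share a cluster; the unreliable ones are now singletons
membership <- as.integer(factor(species, levels = unique(species)))
print(membership)
#> [1] 1 2 3 3
```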

codecov coverage not updated despite GitHub Actions CI/CD success

[2023-10-24T15:35:03.710Z] ['error'] There was an error running the uploader:
Error uploading to https://codecov.io
Error: There was an error fetching the storage URL during POST:
404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API.
Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}

https://github.com/ClavelLab/maldipickr/actions/runs/6629129466/job/18007735293#step:8:39

Add additional tests

Improve coverage of functions: https://app.codecov.io/github/ClavelLab/maldipickr/tree/main/R
especially key functions:

bump Node.js version to 20 for Github Actions

Describe the bug
The current GitHub Actions workflows use Node.js 16, which is deprecated in favor of Node.js 20.

To Reproduce
Look up the Actions tab for warnings

Expected behavior
No warnings.

Solutions

Update:

actions/checkout@v3
JamesIves/[email protected]
codecov/codecov-action@v3

Screenshots

add test to the symlink portion of `import_biotyper_spectra`

add a quickstart section in README with examples

implement `{maldipickr}` using the RforMassSpectrometry improved codebase

MALDIquant and readBrukerFlexData both require a lot of maintenance
the packages at RforMassSpectrometry have an improved backend
implementing maldipickr with it could be relevant for long-term
but still needs to import some of the MALDIquant routines rformassspectrometry/MsCoreUtils#119

`reframe()` instead of `summarise()` when dplyr > 1.1.0

In the example of pick_spectra(), reproduced below:

# 4.2 Pick the spectra from clusters without spectra
#   labelled as `picked_before` (hard masking).
pick_spectra(clusters, metadata, "OD600",
  hard_mask_column = "picked_before"
)

A warning advocates replacing summarise() with reframe():

#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the maldipickr package.
#>   Please report the issue at <https://github.com/ClavelLab/maldipickr/issues>.

However, the dplyr version currently used is 1.0.10.

Document the dataset used in the examples

Write a walkthrough in the vignette

missing filtering on bruker log score for identification-based delineation

in the README quickstart example

Use temporary dirs and files for examples
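
A minimal base R sketch of what this could look like in an example: everything is written below tempdir() and removed afterwards. The directory and file names are placeholders.

```r
# Create a dedicated temporary directory for the example
example_dir <- file.path(tempdir(), "maldipickr-example")
dir.create(example_dir, showWarnings = FALSE)

# Write and read back a small object inside it
example_file <- file.path(example_dir, "toy.rds")
saveRDS(
  data.frame(name = "species1_G2", stringsAsFactors = FALSE),
  example_file
)
toy <- readRDS(example_file)

# Clean up so that the example leaves no trace on disk
unlink(example_dir, recursive = TRUE)
```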

rename functions to `verb_concept()`: e.g. `similarity_to_clusters()`

two functions do not comply at the moment:

similarity_to_clusters()
identification_to_clusters()

could be unified to:

delineate_clusters()

with two internal functions that are not exported?

Set up the pkgdown website

improve examples on merging different runs `merge_processed_spectra`

Release maldipickr 1.3.0

Prepare for release:

Submit to CRAN:

usethis::use_version('minor')
devtools::submit_cran()
Approve email

Wait for CRAN...

merge_processed_spectra.R

R code too wide in the vignettes

Refactor vignettes using diataxis principles

A possible option to reduce the length of the vignettes is to split into 3 vignettes:

Import data
Process data
Cherry-pick

While examples are part of the vignettes, additional examples could be added that would not make it to the examples section of the function but "just" in the vignettes.

Make sure to distinguish tutorials from how-to guides using https://diataxis.fr/

Release maldipickr 1.1.1

First release:

usethis::use_cran_comments()
Update (aspirational) install instructions in README
Proofread Title: and Description:
Check that all exported functions have @return and @examples
Check that Authors@R: includes a copyright holder (role 'cph')
Check licensing of included files
Review https://github.com/DavisVaughan/extrachecks

Prepare for release:

Submit to CRAN:

usethis::use_version('major')
devtools::submit_cran()
Approve email

Wait for CRAN...

warning in `pick_spectra` example with summarise/reframe when using the dry-run

pick_spectra

pick_spectra(clusters, discard_regex = "E11", only_show_discarded = TRUE)

#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the maldipickr package.
#>   Please report the issue at <https://github.com/ClavelLab/maldipickr/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

example_df %>% dplyr::relocate(name, to_pick)

improve handling of empty spectra

Consider adding a remove_empty() function to clean up the spectra, the peaks and the metadata files from the object returned by process_spectra().

Could be coupled with check_spectra() to flag the elements of the list.
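
A rough sketch of what such a helper might look like is given below. It assumes that the object holds three components named spectra, peaks and metadata; plain vectors replace the MALDIquant objects, and the logical flag vector stands in for the output of check_spectra().

```r
# Toy stand-in for the processed object:
# plain vectors replace the MALDIquant objects for illustration
processed <- list(
  spectra = list(a = c(1, 5, 2), b = numeric(0), c = c(3, 4)),
  peaks = list(a = 5, b = numeric(0), c = 4),
  metadata = data.frame(
    name = c("a", "b", "c"),
    peaks = c(1L, 0L, 1L),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: drop the flagged elements from all three components
remove_empty <- function(processed, is_empty) {
  stopifnot(length(is_empty) == length(processed$spectra))
  list(
    spectra = processed$spectra[!is_empty],
    peaks = processed$peaks[!is_empty],
    metadata = processed$metadata[!is_empty, , drop = FALSE]
  )
}

# Flag the spectra without any intensity value, then clean up
is_empty <- lengths(processed$spectra) == 0
cleaned <- remove_empty(processed, is_empty)
print(names(cleaned$spectra))
#> [1] "a" "c"
```

Filtering the three components with the same flag vector keeps them aligned, which is the main risk when cleaning them separately.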
