maldipickr issues from ClavelLab

silent fail when metadata did not merge with cluster_df in the picking function

use a standardised format to store the raw spectra and the processed spectra

Currently, the spectra and peaks are stored as RDS files containing MassSpectrum or MassPeaks R objects. These objects are metadata-rich, which is essential.

There are indeed a couple of standard file formats (e.g., mzML) to which these objects can be exported using MALDIquantForeign, but the metadata are lost in the process.

missing filtering on bruker log score for identification-based delineation

in the README quickstart example

Release maldipickr 1.3.0

Prepare for release:

Submit to CRAN:

usethis::use_version('minor')
devtools::submit_cran()
Approve email

Wait for CRAN...

Add a function to decide whether a spectrum is a reference spectrum

needed before #4

codecov coverage not updated despite GitHub Actions CI/CD success

[2023-10-24T15:35:03.710Z] ['error'] There was an error running the uploader:
Error uploading to https://codecov.io
Error: There was an error fetching the storage URL during POST:
404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API.
Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}

https://github.com/ClavelLab/maldipickr/actions/runs/6629129466/job/18007735293#step:8:39

picking function using the biotyper report

will need to render all "no reliable identification" unique though.

error non-unique values when setting 'row.names' in `process_spectra()`

This error can happen when multiple spectra have the same name, which should not happen with proper research data input but can arise during data analyses.

At the moment, the function simply stops. An improvement would be to at least exit gracefully with an informative message, or to provide a solution (would make.unique() be possible or too intrusive?)
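
To weigh the make.unique() option, here is a minimal base R sketch of how duplicated spectra names could be reported and then disambiguated. The function name `deduplicate_names_sketch` and the `_dup` separator are illustrative assumptions, not part of maldipickr.

```r
# Hypothetical helper (not a maldipickr function): warn about duplicated
# spectra names, then make them unique with base R's make.unique().
deduplicate_names_sketch <- function(spectra_names, sep = "_dup") {
  duplicated_names <- unique(spectra_names[duplicated(spectra_names)])
  if (length(duplicated_names) > 0) {
    warning(
      "Duplicated spectra names were renamed: ",
      paste(duplicated_names, collapse = ", "),
      call. = FALSE
    )
  }
  make.unique(spectra_names, sep = sep)
}

spectra_names <- c("plate1_A1", "plate1_A2", "plate1_A1")
unique_names <- suppressWarnings(deduplicate_names_sketch(spectra_names))
unique_names
# "plate1_A1" "plate1_A2" "plate1_A1_dup1"
```

Renaming silently would hide a data problem, so the sketch warns and names the offending spectra. Whether renaming is too intrusive compared to stopping with a clearer error remains an open design choice.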

`to_pick` column is masked in the vignette

The example tables in the vignette are limited in width and some relevant columns are masked.

Could be solved by adding a dplyr::relocate() to some of the examples

example_df %>% dplyr::relocate(name, to_pick)

Refactor vignettes using diataxis principles

A possible option to reduce the length of the vignettes is to split into 3 vignettes:

Import data
Process data
Cherry-pick

While examples are part of the vignettes, additional examples could be added that would not make it to the examples section of the function but "just" in the vignettes.

Make sure to distinguish tutorials from how-to using https://diataxis.fr/

add logo in svg in man/ to be able to edit

add test to the symlink portion of `import_biotyper_spectra`

improve examples on merging different runs `merge_processed_spectra`

add Strejcek paper in DESCRIPTION

Add additional tests

Improve coverage of functions: https://app.codecov.io/github/ClavelLab/maldipickr/tree/main/R
especially key functions:

bump Node.js version to 20 for Github Actions

Describe the bug
The GitHub Actions workflows currently use Node.js 16, which is deprecated in favor of Node.js 20.

To Reproduce
Look up the Actions tab for warnings

Expected behavior
No warnings.

Solutions

Update:

actions/checkout@v3
JamesIves/[email protected]
codecov/codecov-action@v3

Screenshots

Use temporary dirs and files for examples

add a quickstart section in README with examples

read in mzML files instead of fid/acqu proprietary format

File format conversion can be made with the compass software, see rformassspectrometry/RforMassSpectrometry.org#18 (comment)

set a minimum R version

Decide on a strategy to set a minimum R version. Currently, the package implicitly depends on the R version used for development.

https://blog.r-hub.io/2022/09/12/r-dependency/

Remove the CHANGELOG and stick to the R practice of NEWS

improve handling of empty spectra

Consider adding a remove_empty() function to clean up the spectra, the peaks and the metadata files from the object returned by process_spectra().

Could be coupled with check_spectra() to flag the elements of the list.
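
A rough sketch of how such a clean-up could work, using plain base R stand-ins. It assumes the processed object is a list with `spectra`, `peaks` and `metadata` elements in the same order, and that the checks are a named list of logical vectors where TRUE flags a problematic spectrum. The name `remove_flagged_sketch` is hypothetical.

```r
# Hypothetical helper: drop every spectrum flagged by at least one check
# from the three elements of a processed object.
remove_flagged_sketch <- function(processed, checks) {
  flagged <- Reduce(`|`, checks)   # TRUE if any check flags the spectrum
  keep <- !flagged
  list(
    spectra = processed$spectra[keep],
    peaks = processed$peaks[keep],
    metadata = processed$metadata[keep, , drop = FALSE]
  )
}

# Toy stand-ins for the real objects (assumption: same order everywhere)
processed <- list(
  spectra = list("spectrum1", "spectrum2", "spectrum3"),
  peaks = list("peaks1", "peaks2", "peaks3"),
  metadata = data.frame(name = c("s1", "s2", "s3"), SNR = c(5, 0, 7))
)
checks <- list(
  is_empty = c(FALSE, TRUE, FALSE),
  is_outlier_length = c(FALSE, FALSE, FALSE)
)
cleaned <- remove_flagged_sketch(processed, checks)
```

Filtering all three elements with the same logical vector keeps them aligned, which is the main point of doing the clean-up in a single function.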

add function to gather spectra checks stats

Is your feature request related to a problem? Please describe.
I'm always frustrated when I need to compute how many raw spectra were analyzed, how many valid, and why they were rejected.

Describe the solution you'd like
This function below is an untested, undocumented attempt:

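gather_spectra_stats <- function(check_vectors) {
  # check_vectors: named list of logical vectors from maldipickr::check_spectra()
  # A spectrum is flagged as soon as one of the checks is TRUE
  # src: https://stackoverflow.com/a/51140480/21085566
  aggregated_checks <- Reduce(`|`, check_vectors)
  # Number of spectra flagged by each individual check, one column per check
  check_stats <- tibble::as_tibble_row(
    vapply(check_vectors, sum, FUN.VALUE = integer(1))
  )
  n_spectra <- length(aggregated_checks)
  # No pipe and no explicit return(): the last expression is the result
  dplyr::bind_cols(
    tibble::tibble(
      n_spectra = n_spectra,
      n_valid_spectra = n_spectra - sum(aggregated_checks)
    ),
    check_stats
  )
}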

output the `read_biotyper_report()` in wide AND long format

`reframe()` instead of `summarise()` when dplyr > 1.1.0

In the example of pick_spectra(), reproduced below:

# 4.2 Pick the spectra from clusters without spectra
#   labelled as `picked_before` (hard masking).
pick_spectra(clusters, metadata, "OD600",
  hard_mask_column = "picked_before"
)

A warning recommends replacing summarise() with reframe():

#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the maldipickr package.
#>   Please report the issue at <https://github.com/ClavelLab/maldipickr/issues>.

but current dplyr version used is 1.0.10
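
One possible way to stay compatible with both dplyr generations is to branch on the installed version. The sketch below uses a toy data frame and a hypothetical helper name (`score_ranges_sketch`); it is not the code used inside pick_spectra().

```r
# Toy data: range() returns two rows per group, which is the situation
# that triggers the deprecation warning with summarise() in dplyr >= 1.1.0.
toy <- data.frame(
  cluster = c("a", "a", "b"),
  score = c(1, 2, 3)
)

# Hypothetical helper: use reframe() when available, summarise() otherwise.
score_ranges_sketch <- function(df) {
  grouped <- dplyr::group_by(df, cluster)
  if (utils::packageVersion("dplyr") >= "1.1.0") {
    # reframe() always returns an ungrouped data frame
    dplyr::reframe(grouped, score_range = range(score))
  } else {
    # Older dplyr: summarise() accepts several rows per group
    dplyr::summarise(grouped, score_range = range(score), .groups = "drop")
  }
}

ranges <- score_ranges_sketch(toy)
```

Both branches return an ungrouped data frame, so downstream code does not need to know which one ran. A simpler alternative would be to require dplyr >= 1.1.0 in DESCRIPTION and use reframe() unconditionally.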

Document the dataset used in the examples

Release maldipickr 1.1.1

First release:

usethis::use_cran_comments()
Update (aspirational) install instructions in README
Proofread Title: and Description:
Check that all exported functions have @return and @examples
Check that Authors@R: includes a copyright holder (role 'cph')
Check licensing of included files
Review https://github.com/DavisVaughan/extrachecks

Prepare for release:

Submit to CRAN:

usethis::use_version('major')
devtools::submit_cran()
Approve email

Wait for CRAN...

implement `{maldipickr}` using the RforMassSpectrometry improved codebase

MALDIquant and readBrukerFlexData both require a lot of maintenance
the packages at RforMassSpectrometry have an improved backend
implementing maldipickr with it could be relevant for long-term
but it would still need to import some of the MALDIquant routines (see rformassspectrometry/MsCoreUtils#119)

move the Quickstart to a specific vignette to have a "Get started" page generated for the website

I think this would improve readability. The README can become a simple .md file instead of Rmd

Source: https://pkgdown.r-lib.org/articles/customise.html?q=Get%20starte#navbar-heading

Implicit and undocumented use of single-linkage clustering

The clustering approach introduced in v1.1.0 implicitly uses single-linkage clustering (also known as friends-of-friends). This typically results in chained clusters. The minimum similarity within a cluster is not controlled.
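
The chaining effect can be illustrated with base R's hclust() on a toy similarity matrix. This is only an illustration: the similarity values and the 0.92 threshold are made up, and maldipickr does not necessarily rely on hclust() internally.

```r
# Toy similarity matrix: A~B and B~C are very similar, A~C is not.
similarity <- matrix(
  c(1.00, 0.95, 0.50,
    0.95, 1.00, 0.95,
    0.50, 0.95, 1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)
threshold <- 0.92  # illustrative similarity threshold
distances <- stats::as.dist(1 - similarity)

# Single linkage: one shared neighbour above the threshold is enough
single <- stats::cutree(
  stats::hclust(distances, method = "single"),
  h = 1 - threshold
)
# Complete linkage: every pair in a cluster must be above the threshold
complete <- stats::cutree(
  stats::hclust(distances, method = "complete"),
  h = 1 - threshold
)

length(unique(single))    # 1 cluster: A and C are chained through B
length(unique(complete))  # 2 clusters: A and C are kept apart
```

With single linkage, A and C end up in the same cluster although their similarity (0.50) is far below the threshold, which is exactly the uncontrolled minimum similarity described above.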

deprecate `rds_prefix` option in `process_spectra()` because of `{targets}`

The option does not necessarily make sense if the workflow is handled by {targets}.
There could be better file format alternatives (e.g., qs).
Look at https://lifecycle.r-lib.org/articles/communicate.html to do it properly (EDIT: more specifically for arguments: https://lifecycle.r-lib.org/articles/communicate.html#deprecating-an-argument-providing-a-new-default).
This will also get rid of untested sections of the function (https://app.codecov.io/github/ClavelLab/maldipickr/blob/main/R%2Fprocess_spectra.R), as CRAN is not happy when tests write to disk.
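
Following the lifecycle article linked above, deprecating the argument could look roughly like the sketch below. The function name `process_spectra_sketch`, the version string "1.3.0" and the message wording are assumptions for illustration; the body is a placeholder and not the real processing code.

```r
# Hypothetical stand-in for process_spectra(): only the deprecation
# pattern matters here, the actual processing is left out.
process_spectra_sketch <- function(spectra_list,
                                   rds_prefix = lifecycle::deprecated()) {
  # is_present() is TRUE only when the caller supplied rds_prefix
  if (lifecycle::is_present(rds_prefix)) {
    lifecycle::deprecate_warn(
      when = "1.3.0",  # assumed version, adjust to the actual release
      what = "maldipickr::process_spectra(rds_prefix)",
      details = "Write the returned object to disk yourself, e.g. with {targets}."
    )
  }
  spectra_list  # placeholder for the processed result
}

# No warning: the deprecated argument is not used
without_prefix <- process_spectra_sketch(list("spectrum1"))
# A deprecation warning may be emitted here, the result is unchanged
with_prefix <- suppressWarnings(
  process_spectra_sketch(list("spectrum1"), rds_prefix = "my_run")
)
```

Using `lifecycle::deprecated()` as the default keeps the argument in the signature, so existing calls keep working while users are nudged towards handling the file writing outside of the function.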

Find connected components without `{igraph}` or `{tidygraph}`

The packages {igraph} and {tidygraph} are conveniently imported for the dereplication but are only used to detect connected components.

There should be an easy way to recode an equivalent of the connected components algorithm, seeing as we only use limited options there.

This would certainly drastically reduce the number of dependencies!
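
As a starting point, connected components can be found with a breadth-first search over a logical adjacency matrix in base R. The sketch below is a minimal version under the assumption that the adjacency matrix is square, symmetric and obtained by thresholding a similarity matrix. The function name and the toy values are illustrative.

```r
# Hypothetical helper: label connected components of an undirected graph
# given as a logical adjacency matrix, using breadth-first search.
connected_components_sketch <- function(adjacency) {
  n <- nrow(adjacency)
  membership <- rep(NA_integer_, n)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(membership[start])) next  # already assigned to a component
    current <- current + 1L
    membership[start] <- current
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      # Unvisited neighbours of the current node join the same component
      neighbours <- which(adjacency[node, ] & is.na(membership))
      membership[neighbours] <- current
      queue <- c(queue, neighbours)
    }
  }
  stats::setNames(membership, rownames(adjacency))
}

# Toy similarity matrix with two groups: {a, b, c} and {d, e}
ids <- c("a", "b", "c", "d", "e")
similarity <- matrix(0.1, nrow = 5, ncol = 5, dimnames = list(ids, ids))
diag(similarity) <- 1
similarity["a", "b"] <- similarity["b", "a"] <- 0.95
similarity["b", "c"] <- similarity["c", "b"] <- 0.93
similarity["d", "e"] <- similarity["e", "d"] <- 0.97

components <- connected_components_sketch(similarity >= 0.92)
components
# a b c d e
# 1 1 1 2 2
```

This runs in quadratic time with respect to the number of spectra, which should be acceptable for typical matrix sizes. Results would still need to be compared against the current {igraph}-based output before swapping the dependency out.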

Release maldipickr 1.2.0

Prepare for release:

Submit to CRAN:

usethis::use_version('minor')
devtools::submit_cran()
Approve email

Wait for CRAN...

rename functions to `verb_concept()`: e.g. `similarity_to_clusters()`

two functions do not comply at the moment:

similarity_to_clusters()
identification_to_clusters()

could be unified to:

delineate_clusters()

with two internal functions that are not exported?

pick_spectra(clusters, discard_regex = "E11", only_show_discarded = TRUE)

#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the maldipickr package.
#>   Please report the issue at <https://github.com/ClavelLab/maldipickr/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
