clavellab / maldipickr
Dereplicate And Cherry-pick Mass Spectrometry Spectra
Home Page: https://clavellab.github.io/maldipickr/
License: GNU General Public License v3.0
Currently, the spectra and peaks are stored as RDS files containing MassSpectrum or MassPeaks R objects. These objects are metadata-rich, which is fundamental.
There are indeed a couple of standard file formats (e.g., mzML) to which these objects can be exported using MALDIquantForeign, but the metadata are lost in the process.
in the README quickstart example
Prepare for release:
git pull
urlchecker::url_check()
devtools::build_readme()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
revdepcheck::revdep_check(num_workers = 4)
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version(push = TRUE)
needed before #4
[2023-10-24T15:35:03.710Z] ['error'] There was an error running the uploader:
Error uploading to https://codecov.io
Error: There was an error fetching the storage URL during POST:
404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API.
Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}
https://github.com/ClavelLab/maldipickr/actions/runs/6629129466/job/18007735293#step:8:39
will need to render all "no reliable identification" unique though.
This error can happen when multiple spectra have the same name, which should not happen with proper research data input but can arise during data analyses.
At the moment, the function stops. An improvement would be to at least exit gracefully, or to provide a solution (would make.unique() be possible or too intrusive?).
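As a rough illustration of the make.unique() idea, here is a minimal base R sketch. The spectra names are made up, and whether the package should rename silently or only warn is left open; this only shows what the renaming would look like.

```r
# Toy spectra names with duplicates (hypothetical values)
spectra_names <- c("A1", "B2", "A1", "A1")

# Warn instead of stopping when duplicated names are found
if (anyDuplicated(spectra_names) > 0) {
  warning("Duplicated spectra names found, making them unique.")
}

# make.unique() keeps the first occurrence and suffixes the later ones
unique_names <- make.unique(spectra_names, sep = "_")
unique_names
#> [1] "A1"   "B2"   "A1_1" "A1_2"
```

The downside is that the renamed spectra no longer match the names in the original input, which is probably what "too intrusive" refers to.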
The example tables in the vignette are limited in width and some relevant columns are masked.
This could be solved by adding a dplyr::relocate() to some of the examples:
example_df %>% dplyr::relocate(name, to_pick)
A possible option to reduce the length of the vignettes is to split into 3 vignettes:
While examples are part of the vignettes, additional examples could be added that would not make it to the examples section of the function but "just" in the vignettes.
Make sure to distinguish tutorials from how-to using https://diataxis.fr/
Improve coverage of functions: https://app.codecov.io/github/ClavelLab/maldipickr/tree/main/R
especially key functions:
Describe the bug
The current GitHub Actions workflow uses Node.js 16, which is deprecated in favor of Node.js 20.
To Reproduce
Look up the Actions tab for warnings
Expected behavior
No warnings.
Solutions
Update:
File format conversion can be made with the compass software, see rformassspectrometry/RforMassSpectrometry.org#18 (comment)
Decide on a strategy to set a minimum R version. Currently, the package implicitly relies on the R version used for development.
Resources to read:
Issues to solve:
codecov
inst (consider building the CITATION.cff from {cffr} https://docs.ropensci.org/cffr/ ?)
rhub https://r-hub.github.io/rhub
Consider adding a remove_empty() function to clean up the spectra, the peaks and the metadata files from the object returned by process_spectra().
Could be coupled with check_spectra() to flag the elements of the list.
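A minimal sketch of what such a remove_empty() could look like, in base R. Everything here is an assumption for illustration: the function does not exist in the package, the processed object is assumed to be a list with `spectra`, `peaks` and `metadata` elements of matching length, and plain vectors stand in for the real MassSpectrum and MassPeaks objects.

```r
# Hypothetical helper (not part of maldipickr): drop flagged elements
# from the spectra, the peaks and the metadata at the same time.
remove_empty <- function(processed, flagged) {
  stopifnot(
    is.logical(flagged),
    !anyNA(flagged),
    length(flagged) == length(processed$spectra),
    length(flagged) == length(processed$peaks),
    length(flagged) == nrow(processed$metadata)
  )
  keep <- !flagged
  list(
    spectra = processed$spectra[keep],
    peaks = processed$peaks[keep],
    metadata = processed$metadata[keep, , drop = FALSE]
  )
}

# Mock object: plain vectors stand in for the real spectra and peaks
processed <- list(
  spectra = list(s1 = 1:3, s2 = integer(0), s3 = 4:6),
  peaks = list(s1 = 1, s2 = numeric(0), s3 = 2),
  metadata = data.frame(name = c("s1", "s2", "s3"), peaks = c(1, 0, 1))
)

# Mock flag; in practice this could come from a check such as check_spectra()
is_empty <- lengths(processed$spectra) == 0
cleaned <- remove_empty(processed, is_empty)
```

Taking the flag as a separate logical vector keeps the cleaning step independent from how the spectra were flagged.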
Is your feature request related to a problem? Please describe.
I'm always frustrated when I need to compute how many raw spectra were analyzed, how many valid, and why they were rejected.
Describe the solution you'd like
This function below is an untested, undocumented attempt:
gather_spectra_stats <- function(check_vectors) {
  # check_vectors: list of logical vectors from maldipickr::check_spectra
  # The pipe is not available unless magrittr or dplyr is attached
  `%>%` <- magrittr::`%>%`
  # A spectrum is flagged when at least one check is TRUE
  # src: https://stackoverflow.com/a/51140480/21085566
  aggregated_checks <- Reduce(`|`, check_vectors)
  # Number of spectra flagged by each individual check
  check_stats <- vapply(check_vectors, sum, FUN.VALUE = integer(1)) %>%
    tibble::as_tibble_row()
  tibble::tibble(
    "n_spectra" = length(aggregated_checks),
    "n_valid_spectra" = n_spectra - sum(aggregated_checks)
  ) %>%
    dplyr::bind_cols(check_stats)
}
In the example of pick_spectra(), reproduced below:
# 4.2 Pick the spectra from clusters without spectra
# labelled as `picked_before` (hard masking).
pick_spectra(clusters, metadata, "OD600",
hard_mask_column = "picked_before"
)
A warning advocating for the replacement of summarise() by reframe() is raised:
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#> always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the maldipickr package.
#> Please report the issue at <https://github.com/ClavelLab/maldipickr/issues>.
but the dplyr version currently used is 1.0.10
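For context, the pattern that triggers this warning can be sketched on toy data. This is not the code from pick_spectra(): the `membership` column and the values are assumptions, and reframe() requires dplyr 1.1.0 or later, so a package supporting older dplyr versions would need to handle that separately.

```r
library(dplyr)

# Toy cluster table (made-up values)
clusters <- tibble(
  membership = c(1, 1, 1, 2, 2),
  OD600 = c(0.2, 0.8, 0.5, 0.4, 0.6)
)

# range() returns two values per group: summarise() warns about this
# since dplyr 1.1.0, while reframe() accepts any number of rows per
# group and always returns an ungrouped data frame.
od_range <- clusters %>%
  group_by(membership) %>%
  reframe(OD600_range = range(OD600))
```

Because reframe() drops the grouping, any downstream step that relied on the grouped output of summarise() would need an explicit group_by().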
First release:
usethis::use_cran_comments()
Title: and Description:
@return and @examples
Authors@R: includes a copyright holder (role 'cph')
Prepare for release:
git pull
urlchecker::url_check()
devtools::build_readme()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
git push
Submit to CRAN:
usethis::use_version('major')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version(push = TRUE)
MALDIquant and readBrukerFlexData both require a lot of maintenance
maldipickr with it could be relevant for long-term
MALDIquant routines rformassspectrometry/MsCoreUtils#119
I think this would improve readability. The README can become a simple .md file instead of Rmd
Source: https://pkgdown.r-lib.org/articles/customise.html?q=Get%20starte#navbar-heading
The clustering approach introduced in v1.1.0 implicitly uses single-linkage clustering (meaning friends-of-friends). This typically results in chained clusters. The minimum similarity within a cluster is not controlled.
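The chaining effect can be illustrated with a toy similarity matrix and base R hierarchical clustering. This is only a sketch of the general behaviour, not the code used in the package; the similarity values and the 0.92 threshold are made up.

```r
# Toy similarities: A~B and B~C are high, but A~C is low
sim <- matrix(
  c(1.00, 0.96, 0.50,
    0.96, 1.00, 0.94,
    0.50, 0.94, 1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)
distances <- stats::as.dist(1 - sim)
threshold <- 0.92

# Single linkage: C joins through B, even though A and C are dissimilar
single <- stats::cutree(
  stats::hclust(distances, method = "single"),
  h = 1 - threshold
)

# Complete linkage: every pair within a cluster must pass the threshold
complete <- stats::cutree(
  stats::hclust(distances, method = "complete"),
  h = 1 - threshold
)
```

With single linkage all three spectra end up in one cluster, whereas complete linkage keeps C apart because its similarity to A is below the threshold.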
{targets}
qs)
The packages {igraph} and {tidygraph} are conveniently imported for the dereplication but are only used to detect connected components.
There should be an easy way to recode an equivalent of the connected components algorithm, seeing as we only use limited options there.
This would certainly drastically reduce the number of dependencies!
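A dependency-free equivalent could look roughly like the sketch below, which labels connected components with a breadth-first search over a logical adjacency matrix. The function name, the input format and the 0.92 threshold are assumptions for illustration, not the package's current interface.

```r
# Hypothetical base R replacement: label the connected components of an
# undirected graph given as a symmetric logical adjacency matrix.
connected_components <- function(adjacency) {
  n <- nrow(adjacency)
  membership <- rep(NA_integer_, n)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(membership[start])) next
    # Unvisited node: open a new component and explore it breadth-first
    current <- current + 1L
    membership[start] <- current
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      neighbours <- which(adjacency[node, ] & is.na(membership))
      membership[neighbours] <- current
      queue <- c(queue, neighbours)
    }
  }
  stats::setNames(membership, rownames(adjacency))
}

# Toy similarity matrix: a~b and c~d are above the threshold
sim <- matrix(
  c(1.00, 0.95, 0.10, 0.05,
    0.95, 1.00, 0.20, 0.10,
    0.10, 0.20, 1.00, 0.93,
    0.05, 0.10, 0.93, 1.00),
  nrow = 4, byrow = TRUE,
  dimnames = list(letters[1:4], letters[1:4])
)
components <- connected_components(sim >= 0.92)
```

The search only relies on base R, so it would need no graph package, at the cost of maintaining and testing the traversal in the package itself.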
Prepare for release:
git pull
urlchecker::url_check()
devtools::build_readme()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
revdepcheck::revdep_check(num_workers = 4)
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version(push = TRUE)
two functions do not comply at the moment:
similarity_to_clusters()
identification_to_clusters()
could be unified to:
delineate_clusters()
with two internal functions that are not exported?
pick_spectra(clusters, discard_regex = "E11", only_show_discarded = TRUE)
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#> always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the maldipickr package.
#> Please report the issue at <https://github.com/ClavelLab/maldipickr/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.