computational-metabolomics / structToolbox

R/Bioconductor package - STRUCT (STatistics in R Using Class Templates) Toolbox

Home Page: https://computational-metabolomics.github.io/structToolbox/

License: GNU General Public License v3.0

Language: R (100.00%)
Topics: metabolomics, statistics, multivariate-analysis, univariate, lc-ms, dims, bioconductor-package, r-package, machine-learning

structtoolbox's Introduction

structToolbox

An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning.

This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate statistical methods (e.g. t-test, various forms of ANOVA, Kruskal–Wallis test and more), multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing) and machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated to provide standardised definitions for the different methods, inputs and outputs.

Installation

To install this package:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("structToolbox")

To install the development version:

if (!require("remotes", quietly = TRUE))
    install.packages("remotes")

remotes::install_github("computational-metabolomics/structToolbox")

structtoolbox's People

Contributors

grlloyd, jwokaty, nturaga, rjmw

structtoolbox's Issues

Error occurs when plotting PLS feature importance

First of all, thanks for your excellent work. I have learned a lot of skills here.

I can reproduce all results with the demo datasets. However, when I repeat the analysis with my own data, the following error occurs:

test.zip

library(structToolbox)
load("test.Rdata")

mD <- DatasetExperiment(
  name = "mD",
  data = mconc,
  sample_meta = mmeta,
  variable_meta = data.frame(variable = colnames(mconc), row.names = colnames(mconc)),
  description = "The test"
)
mD$sample_meta$FN <- as.factor(mD$sample_meta$FN)

P = PLSDA(number_components = 2, factor_name = "FN")

# apply the model
P = model_apply(P, mD)

C = pls_scores_plot(components = c(1, 2), factor_name = "FN")
chart_plot(C, P)

# prepare chart
C = pls_vip_plot(ycol = 'HE')
g1 = chart_plot(C, P)

The error message is:

Error in `$<-.data.frame`(`*tmp*`, "feature", value = c("pos.M68T423", :
  replacement has 1000 rows, data has 0
6. stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
   "replacement has %d rows, data has %d"), N, nrows), domain = NA)
5. `$<-.data.frame`(`*tmp*`, "feature", value = c("pos.M68T423",
   "pos.M69T424", "pos.M70T363", "pos.M70T442", "pos.M70T535", "pos.M70T258",
   "pos.M71T363", "pos.M72T349", "pos.M73T351", "pos.M74T449", "pos.M74T415",
   "pos.M74T57", "pos.M74T129", "pos.M74T403", "pos.M74T602", "pos.M75T415", ...
4. `$<-`(`*tmp*`, "feature", value = c("pos.M68T423", "pos.M69T424",
   "pos.M70T363", "pos.M70T442", "pos.M70T535", "pos.M70T258", "pos.M71T363",
   "pos.M72T349", "pos.M73T351", "pos.M74T449", "pos.M74T415", "pos.M74T57",
   "pos.M74T129", "pos.M74T403", "pos.M74T602", "pos.M75T415", "pos.M76T415", ...
3. .local(obj, dobj, ...)
2. chart_plot(C, P)
1. chart_plot(C, P)

My R environment:

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                  LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8        LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8           LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8      LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

time zone: Asia/Shanghai
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] remotes_2.4.2.1      xlsx_0.6.5           pmp_1.14.0           BiocFileCache_2.10.1 dbplyr_2.4.0         cowplot_1.1.1        structToolbox_1.14.0 struct_1.14.0        lubridate_1.9.3     
[10] forcats_1.0.0        stringr_1.5.1        dplyr_1.1.4          purrr_1.0.2          readr_2.1.4          tidyr_1.3.0          tibble_3.2.1         ggplot2_3.4.4        tidyverse_2.0.0     
[19] omu_1.1.1           

loaded via a namespace (and not attached):
  [1] rstudioapi_0.15.0           magrittr_2.0.3              farver_2.1.1                zlibbioc_1.48.0             vctrs_0.6.4                 memoise_2.0.1              
  [7] RCurl_1.98-1.13             rstatix_0.7.2               S4Arrays_1.2.0              itertools_0.1-3             missForest_1.5              curl_5.1.0                 
 [13] broom_1.0.5                 SparseArray_1.2.2           pROC_1.18.5                 caret_6.0-94                FSA_0.9.5                   parallelly_1.36.0          
 [19] desc_1.4.2                  plyr_1.8.9                  impute_1.76.0               cachem_1.0.8                lifecycle_1.0.4             iterators_1.0.14           
 [25] pkgconfig_2.0.3             Matrix_1.6-3                R6_2.5.1                    fastmap_1.1.1               GenomeInfoDbData_1.2.11     MatrixGenerics_1.14.0      
 [31] future_1.33.0               digest_0.6.33               pcaMethods_1.94.0           colorspace_2.1-0            ps_1.7.5                    S4Vectors_0.40.1           
 [37] rprojroot_2.0.4             GenomicRanges_1.54.1        RSQLite_2.3.3               filelock_1.0.2              labeling_0.4.3              randomForest_4.7-1.1       
 [43] fansi_1.0.5                 timechange_0.2.0            httr_1.4.7                  abind_1.4-5                 compiler_4.3.2              rngtools_1.5.2             
 [49] bit64_4.0.5                 withr_2.5.2                 backports_1.4.1             carData_3.0-5               DBI_1.1.3                   pkgbuild_1.4.2             
 [55] MASS_7.3-60                 lava_1.7.3                  DelayedArray_0.28.0         ModelMetrics_1.2.2.2        tools_4.3.2                 future.apply_1.11.0        
 [61] nnet_7.3-19                 glue_1.6.2                  callr_3.7.3                 nlme_3.1-163                grid_4.3.2                  reshape2_1.4.4             
 [67] generics_0.1.3              recipes_1.0.8               gtable_0.3.4                tzdb_0.4.0                  class_7.3-22                data.table_1.14.8          
 [73] hms_1.1.3                   sp_2.1-1                    car_3.1-2                   utf8_1.2.4                  XVector_0.42.0              BiocGenerics_0.48.1        
 [79] foreach_1.5.2               pillar_1.9.0                rJava_1.0-6                 splines_4.3.2               lattice_0.22-5              survival_3.5-7             
 [85] bit_4.0.5                   tidyselect_1.2.0            knitr_1.45                  gridExtra_2.3               IRanges_2.36.0              SummarizedExperiment_1.32.0
 [91] stats4_4.3.2                xfun_0.41                   pls_2.8-3                   Biobase_2.62.0              hardhat_1.3.0               timeDate_4022.108          
 [97] matrixStats_1.1.0           stringi_1.8.1               xlsxjars_0.6.1              codetools_0.2-19            BiocManager_1.30.22         cli_3.6.1                  
[103] ontologyIndex_2.11          rpart_4.1.21                processx_3.8.2              munsell_0.5.0               Rcpp_1.0.11                 GenomeInfoDb_1.38.1        
[109] globals_0.16.2              parallel_4.3.2              ggfortify_0.4.16            gower_1.0.1                 blob_1.2.4                  prettyunits_1.2.0          
[115] doRNG_1.8.6                 bitops_1.0-7                listenv_0.9.0               ggthemes_4.2.4              viridisLite_0.4.2           ipred_0.9-14               
[121] scales_1.2.1                prodlim_2023.08.28          crayon_1.5.2                rlang_1.1.2                

filter_na_count documentation is confusing

The filter_na_count documentation describes threshold as "the maximum number of NA allowed per level of factor_name". I find this phrasing confusing: I think the value you provide is actually the minimum number of samples by which each group should be represented, whereas the phrasing suggests it is the number of NA values allowed.
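The small base-R illustration below makes the two readings concrete. The helper names (count_present, keep_min_present, keep_max_na) are hypothetical and do not reflect how structToolbox implements filter_na_count. The example only shows that "at least N observed values per group" and "at most N missing values per group" can select different features for the same N.

```r
# Hypothetical illustration only (not the structToolbox implementation):
# two possible readings of a per-group missing-value "threshold".

# Count the non-missing values of each feature (column) within each group.
count_present <- function(X, groups) {
  groups <- factor(groups)
  counts <- vapply(seq_len(ncol(X)),
                   function(j) as.vector(tapply(!is.na(X[, j]), groups, sum)),
                   numeric(nlevels(groups)))
  matrix(counts, nrow = nlevels(groups),
         dimnames = list(levels(groups), colnames(X)))
}

# Reading 1: keep a feature if every group has at least `threshold` observed values.
keep_min_present <- function(X, groups, threshold) {
  counts <- count_present(X, groups)
  apply(counts, 2, function(n) all(n >= threshold))
}

# Reading 2: keep a feature if no group has more than `threshold` missing values.
keep_max_na <- function(X, groups, threshold) {
  counts <- count_present(X, groups)
  group_size <- as.vector(table(factor(groups)))
  n_na <- group_size - counts  # group_size is recycled down each column
  apply(n_na, 2, function(n) all(n <= threshold))
}

# Toy data: 6 samples in two groups, 2 features.
X <- cbind(f1 = c(1, NA, 3, 4, 5, 6),
           f2 = c(NA, NA, 3, 4, NA, 6))
g <- c("A", "A", "A", "B", "B", "B")

keep_min_present(X, g, threshold = 2)  # f1 TRUE, f2 FALSE
keep_max_na(X, g, threshold = 2)       # f1 TRUE, f2 TRUE
```

With the same threshold of 2, f2 is removed under the first reading but kept under the second. This is why the wording of the parameter matters.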

model_apply fails for a model sequence when seq_in is not equal to 'data'

For model sequences, seq_in must equal 'data', otherwise model_apply fails. Example error when seq_in = 'names' for filter_by_name as the first step in a sequence:

Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘model_apply’ for signature ‘"model_seq", "filter_by_name"’

Possibility to plot a 3D PCA?

Hi!
First of all, many thanks for this great piece of software!

I wondered how I could plot a 3D PCA using structToolbox.

I have a PCA model:

PCA(number_components = 10)

However, if I try to access more than 2 components, as here:

C = pca_scores_plot(components = c(1,2,3), factor_name = 'brainregion')

I get the following error:

Error in validObject(obj) : 
  invalid class “entity” object: Components to plot: number of values must be less than "max_length"

How could I plot such an object?

Many thanks
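The error indicates that three components are more than pca_scores_plot accepts. Within structToolbox, one option is to draw several 2D score plots instead, for example with components = c(1, 3) and c(2, 3). Outside structToolbox's chart objects, a base-R workaround is sketched below. It computes the first three principal components with stats::prcomp, using the iris data as a stand-in for your own data, and shows every pair of PC1 to PC3 with pairs(). A true interactive 3D view would need an additional plotting package, which is not shown here.

```r
# Base-R workaround sketch (does not use structToolbox chart objects).
data(iris)
X <- as.matrix(iris[, 1:4])   # stand-in for your own data matrix
groups <- iris$Species        # stand-in for e.g. 'brainregion'

# Mean-centre and scale each variable, then run PCA.
pca <- prcomp(X, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:3]

# Percentage of total variance explained by each of the first three components.
var_explained <- 100 * pca$sdev[1:3]^2 / sum(pca$sdev^2)

# Pairwise scatter plots of PC1 to PC3, coloured by group.
pairs(scores, col = as.integer(groups), pch = 19,
      labels = sprintf("PC%d (%.1f%%)", 1:3, var_explained))
```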

report number of features used to compute the median for PQN

PQN currently only includes features with no missing values when computing the median. The number of features this applies to should be reported. Sometimes only a very small number of features is used, which results in a poor estimate of the normalisation coefficient.
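Here is a minimal base-R sketch of probabilistic quotient normalisation that reports how many complete features it uses, as the issue requests. It is not the structToolbox (or pmp) code. The reference profile is assumed to be the feature-wise median of all samples, and the function name pqn_sketch is hypothetical.

```r
# Minimal PQN sketch (not the structToolbox or pmp implementation).
# Assumption: the reference profile is the feature-wise median over all
# samples; in practice QC samples are often used as the reference instead.
pqn_sketch <- function(X) {
  ref <- apply(X, 2, median, na.rm = TRUE)
  # Only features with no missing values enter the median quotient.
  complete <- colSums(is.na(X)) == 0
  n_used <- sum(complete)
  if (n_used == 0) stop("PQN: no features without missing values")
  message(sprintf("PQN: %d of %d features used to compute the median quotient",
                  n_used, ncol(X)))
  # Quotient of each sample to the reference, feature by feature.
  quotients <- sweep(X[, complete, drop = FALSE], 2, ref[complete], "/")
  coef <- apply(quotients, 1, median)   # one coefficient per sample
  list(normalised = X / coef,           # divides row i by coef[i]
       coef = coef,
       n_features_used = n_used)
}

X <- rbind(s1 = c(1, 2, 4, NA),
           s2 = c(2, 4, 8, 3),
           s3 = c(1, 2, 4, 5))
res <- pqn_sketch(X)
res$coef             # s1 = 1, s2 = 2, s3 = 1
res$n_features_used  # 3
```

Returning n_features_used alongside the message lets a workflow check the count programmatically, for example to warn when it falls below some minimum.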

fold change uses control group in the numerator

In fold_change objects, the control_group parameter can be provided. However, the calculation places this group in the numerator rather than the denominator (as shown by the column names), so the calculated fold changes are not 'relative to the control' group, as might be expected.
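For comparison, the sketch below shows what a fold change "relative to the control" computes. Each group's mean is divided by the control group's mean, so values above 1 mean higher than control. The function fold_change_vs_control is hypothetical. It uses plain means of untransformed data, which may differ from how structToolbox's fold_change summarises groups.

```r
# Sketch (not the structToolbox implementation): fold change of each group
# relative to a control group, i.e. with the control in the denominator.
# Assumption: group means are compared; structToolbox may summarise differently.
fold_change_vs_control <- function(X, groups, control) {
  groups <- factor(groups)
  stopifnot(control %in% levels(groups))
  ctrl_mean <- colMeans(X[groups == control, , drop = FALSE], na.rm = TRUE)
  others <- setdiff(levels(groups), control)
  fc <- vapply(others, function(g) {
    colMeans(X[groups == g, , drop = FALSE], na.rm = TRUE) / ctrl_mean
  }, numeric(ncol(X)))
  # Features in rows; columns named "group/control" to make the direction explicit.
  matrix(fc, nrow = ncol(X),
         dimnames = list(colnames(X), paste0(others, "/", control)))
}

X <- cbind(f1 = c(2, 2, 4, 4),
           f2 = c(5, 5, 10, 10))
grp <- c("ctrl", "ctrl", "treat", "treat")
fc <- fold_change_vs_control(X, grp, control = "ctrl")
fc  # both features: treat/ctrl = 2
```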

[BUG] default predicted slot for tSNE is set to a slot that doesn't exist

example:

M = tSNE()
predicted(M)

result:
"tnse"

This causes tSNE to fail when it is part of a model sequence, because:

predicted(M)
Error in slot(obj, name) : 
  no slot of name "tsne" for this object of class "tSNE"

To work around this the predicted slot can be set when creating the object:

M = tSNE(predicted = 'Y')

corr_coef fails when used with only one factor

minimal example:


# example data
D=iris_DatasetExperiment()

# add continuous factor for correlation
D$sample_meta$example1=rnorm(nrow(D))
D$sample_meta$example2=rnorm(nrow(D))

M = corr_coef(factor_names = 'example1')
M = model_apply(M,D)

This fails with:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  row names supplied are of the wrong length

However, it runs as expected with factor_names = c("example1", "example2")
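R errors of the form "row names supplied are of the wrong length" often arise when a one-column result is simplified to a plain vector. That would be consistent with a failure that only occurs for a single factor, but this is a guess, not a diagnosis of corr_coef. The hypothetical base-R helper below computes Spearman correlation coefficients (no p-values) between each feature and the selected sample_meta columns. It uses drop = FALSE so that one and two factors give a result of the same shape.

```r
# Hypothetical helper (not structToolbox's corr_coef): correlation of every
# feature with each selected sample_meta column. Coefficients only, no p-values.
corr_with_factors <- function(X, meta, factor_names, method = "spearman") {
  # drop = FALSE keeps a one-column data frame when only one factor is given
  Y <- as.matrix(meta[, factor_names, drop = FALSE])
  r <- cor(as.matrix(X), Y, method = method, use = "pairwise.complete.obs")
  # cor() returns a features x factors matrix, even for a single factor
  data.frame(r, row.names = colnames(X), check.names = FALSE)
}

set.seed(1)
X <- iris[, 1:4]
meta <- data.frame(example1 = rnorm(nrow(X)), example2 = rnorm(nrow(X)))

one <- corr_with_factors(X, meta, "example1")
two <- corr_with_factors(X, meta, c("example1", "example2"))
dim(one)  # 4 1
dim(two)  # 4 2
```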

HSD doesn't handle missing groups well

In a one-way ANOVA design, if a feature has missing values for an entire group, HSD does not detect this, and the p-values are then recycled over the groups.
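One way to avoid recycled p-values is to match them to comparisons by name rather than by position. The sketch below is hypothetical and is not structToolbox's implementation. It drops groups with no observed values for a feature and runs stats::aov and stats::TukeyHSD on the remaining groups. It then fills an NA-initialised vector of all pairwise comparisons, so any comparison involving an empty group stays NA.

```r
# Sketch (not the structToolbox implementation): Tukey HSD p-values for one
# feature, matched to comparisons by name, so that comparisons involving a
# group with no observed values are NA rather than receiving recycled p-values.
hsd_pvalues_aligned <- function(y, groups) {
  groups <- factor(groups)
  stopifnot(nlevels(groups) >= 2)
  lev <- levels(groups)
  # Every pairwise comparison, named the way TukeyHSD names them ("B-A").
  idx <- combn(length(lev), 2)
  all_names <- paste0(lev[idx[2, ]], "-", lev[idx[1, ]])
  out <- setNames(rep(NA_real_, length(all_names)), all_names)

  ok <- !is.na(y)
  present <- droplevels(groups[ok])      # drop groups with no observed values
  if (nlevels(present) < 2) return(out)  # nothing left to compare

  fit <- aov(y[ok] ~ present)
  tk <- TukeyHSD(fit)$present
  p <- setNames(tk[, "p adj"], rownames(tk))  # keep names even for one comparison
  out[names(p)] <- p
  out
}

y <- c(1.0, 1.2, 0.9,  NA, NA, NA,  3.1, 2.9, 3.3)
g <- rep(c("A", "B", "C"), each = 3)
hsd_pvalues_aligned(y, g)  # B-A and C-B are NA; C-A has a p-value
```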
