
sugiharalab / rEDM


Applications of Empirical Dynamic Modeling from Time Series

License: Other

R 3.12% C++ 19.59% C 0.01% Makefile 0.10% TeX 0.74% HTML 76.44%

redm's People

Contributors: adamtclark, caijun, esabers, ethanwhite, ha0ye, janecowles, jstagge, softwareliteracy, yairdaon


redm's Issues

Significance of convergence

Hello,

I checked the rEDM documentation and found the statement shown below.
[screenshot: Capture]
I think this statement is confusing: the fact that the cross-map skill at the largest library size is above the correlation coefficient is not, by itself, a sign of convergence.

Instead, only if there is a significant improvement in cross-map skill with increasing library size L can we say that there is convergence.

Also, to decide which direction is stronger, can we simply compare the cross-map skill at the largest L, or should we compare the rates of convergence? (Rate of convergence = delta(cross-map skill) / delta(library size).)

Any help would be appreciated!
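The rate-of-convergence comparison described above can be sketched in base R. This is a toy illustration only: the data frames mimic the shape of a CCM skill curve (library size vs. rho), and all numbers are made up.

```r
# Toy cross-map skill curves (made-up numbers, for illustration only):
# rho as a function of library size for the two mapping directions.
ccm_xy <- data.frame(LibSize = c(10, 20, 40, 80),
                     rho     = c(0.20, 0.45, 0.60, 0.65))  # X xmap Y
ccm_yx <- data.frame(LibSize = c(10, 20, 40, 80),
                     rho     = c(0.18, 0.22, 0.24, 0.25))  # Y xmap X

# Crude rate of convergence: delta(rho) / delta(library size)
# between the smallest and largest library sizes.
conv_rate <- function(d) {
  (d$rho[nrow(d)] - d$rho[1]) / (d$LibSize[nrow(d)] - d$LibSize[1])
}

rate_xy <- conv_rate(ccm_xy)
rate_yx <- conv_rate(ccm_yx)
```

A steeper rate (here rate_xy > rate_yx) indicates stronger convergence in that direction, independent of where the curve happens to end up at the largest L.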

include CCM lags example

Include Ye et al. 2015 CCM with time delay in description.
Add example for doing CCM with time delay to tutorial vignette.

feeding block_lnlp output into compute_stats

Currently, if you feed a .$model_output data.frame from block_lnlp into compute_stats, it does not know what to do. One issue is that the two functions mix nomenclature: block_lnlp returns columns "obs" and "pred", but compute_stats expects "observed" and "predicted".
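A minimal workaround sketch for the mismatch, assuming the column names reported above ("obs"/"pred" vs. "observed"/"predicted"); the data frame here is a made-up stand-in for a real .$model_output.

```r
# Hypothetical model_output frame with block_lnlp's reported column
# names (values are made up for illustration).
model_output <- data.frame(obs  = c(0.1, 0.5, 0.9),
                           pred = c(0.2, 0.4, 1.0))

# Rename to the nomenclature compute_stats() reportedly expects.
names(model_output)[names(model_output) == "obs"]  <- "observed"
names(model_output)[names(model_output) == "pred"] <- "predicted"
# model_output can now be passed on to compute_stats().
```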

version 1.15 dealing with NaN

Hello!

I have a few questions related to the missing data or NaN in the EDM functions like Simplex, CCM and S-map.
I understand that for Takens' theorem to apply, continuity of the data is important for reconstructing the shadow manifold.
Unfortunately, my data have gaps/missing points between trials, so it would be helpful to know how I can handle this problem in rEDM.

Questions:
(1)The note from rEDM version 1.15 mentioned that:
"SMap() ignoreNan parameter added. If ignoreNan is TRUE (default) the library is redefined to ignore embedding vectors with nan.
If ignoreNan is FALSE no change is made, the user can manually specify library segments in lib."

I also found a code note from rEDM version 1.2.3 mentioned:
"Missing data can be recorded using either of the standard NA or NaN values. The program will automatically ignore such missing values when appropriate. For instance, simplex projection will not select nearest neighbors if any of the state vector coordinates is missing or if the corresponding target value is missing."

I am wondering whether the SMap() ignoreNan option in version 1.15 works the same way version 1.2.3 did, i.e. not selecting nearest neighbors if any of the state-vector coordinates is missing or if the corresponding target value is missing?

(2) Does rEDM version 1.15 also ignore NaN (like the version 1.2.3) for Simplex, EmbedDimension and CCM?

Multiview Embedding Column Names

When you run a multiview embedding, column names are not attached to the output. This is troublesome if you hope to identify which lagged variables are contributing to the final output.

The Multiview function does not select the best views for the average prediction

The following issue is shown here: the Multiview function does not select the k best views for the average prediction.

The issue is shown in the example code below:

library(rEDM)
data(block_3sp)

L.3views = Multiview(dataFrame = block_3sp, lib = "1 99", pred = "105 190", E=3,
                D = 3, columns = "x_t y_t z_t", target = "x_t", multiview = 3)

L.10views = Multiview(dataFrame = block_3sp, lib = "1 99", pred = "105 190", E=3,
                             D = 3, columns = "x_t y_t z_t", target = "x_t", multiview = 10)

L.3views$View[,1:7]

  col_1 col_2 col_3   name_1   name_2   name_3    rho
1     1     2     7 x_t(t-0) x_t(t-1) z_t(t-0) 0.9208
2     1     2     6 x_t(t-0) x_t(t-1) y_t(t-2) 0.8677
3     1     2     3 x_t(t-0) x_t(t-1) x_t(t-2) 0.9319

L.10views$View[,1:7]

   col_1 col_2 col_3   name_1   name_2   name_3    rho
1      1     2     7 x_t(t-0) x_t(t-1) z_t(t-0) 0.9208
2      1     2     6 x_t(t-0) x_t(t-1) y_t(t-2) 0.8677
3      1     2     3 x_t(t-0) x_t(t-1) x_t(t-2) 0.9319
4      1     2     8 x_t(t-0) x_t(t-1) z_t(t-1) 0.9183
5      1     7     9 x_t(t-0) z_t(t-0) z_t(t-2) 0.8858
6      1     4     9 x_t(t-0) y_t(t-0) z_t(t-2) 0.7774
7      1     3     7 x_t(t-0) x_t(t-2) z_t(t-0) 0.8724
8      1     2     5 x_t(t-0) x_t(t-1) y_t(t-1) 0.9017
9      1     2     4 x_t(t-0) x_t(t-1) y_t(t-0) 0.8805
10     1     3     6 x_t(t-0) x_t(t-2) y_t(t-2) 0.8463

Comparing the two View data frames above, we see that the 3-view model does not select the best 3 views (ranked by rho, as described in https://science.sciencemag.org/content/353/6302/922.abstract): the 10-view run shows that better views were available.

For instance, row 4 should have been selected over row 2.

It follows that the Multiview function does not actually select the k best views to perform the average forecast, and it is unclear how it selects them.
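The intended ranking is easy to check in base R: selecting the k best views is just an order() on rho. The rho values below are copied from the 10-view output printed above.

```r
# rho values of the 10 views from the L.10views run above.
View10 <- data.frame(row = 1:10,
                     rho = c(0.9208, 0.8677, 0.9319, 0.9183, 0.8858,
                             0.7774, 0.8724, 0.9017, 0.8805, 0.8463))

# Expected top-3 views, ranked by rho.
top3 <- View10[order(View10$rho, decreasing = TRUE), ][1:3, ]
top3$row  # rows 3, 1, 4 -- not rows 1, 2, 3 as the 3-view run returned
```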

Thank you and best regards,
Uriah

sessionInfo(package = "rEDM")

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Switzerland.1252  LC_CTYPE=English_Switzerland.1252    LC_MONETARY=English_Switzerland.1252
[4] LC_NUMERIC=C                         LC_TIME=English_Switzerland.1252    

attached base packages:
character(0)

other attached packages:
[1] rEDM_1.7.3

loaded via a namespace (and not attached):
 [1] compiler_4.0.3  graphics_4.0.3  htmltools_0.5.0 tools_4.0.3     utils_4.0.3     yaml_2.2.1      grDevices_4.0.3
 [8] Rcpp_1.0.5      stats_4.0.3     datasets_4.0.3  rmarkdown_2.6   knitr_1.30      methods_4.0.3   xfun_0.20      
[15] digest_0.6.27   rlang_0.4.10    base_4.0.3      evaluate_0.14  

Tp < -1 gives error

Hi,

Thanks for fixing the issue with tau in #52.

In rEDM 1.9.2, though, I still get the issue with tp < -1.

Error in RtoCpp_CCM(pathIn, dataFile, dataFrame, pathOut, predictFile, :
DataFrame::DataFrameFromColumnIndex(): A column index (66) exceeds the data frame domain.

Reprex:

data(paramecium_didinium, package = "rEDM")

vars <- names(paramecium_didinium)[2:3]

firstvar <- "paramecium"
secondvar <- "didinium"
  
# Prepare objects to save data in
timepoints_out <- list()
timepoints_plots <- list()
timepoints_means <- list()

parameters_to_check_1 <- c(-10:10)

### START RUN
for (i in 1:length(parameters_to_check_1)){
  
timepoint_for_analysis <- parameters_to_check_1[i]
dimension_for_analysis <- 3
evaluate_by_increment <- 10

thrips_combined_complete_data <- paramecium_didinium 

libSize = paste(
  # Start
  NROW(thrips_combined_complete_data) - dimension_for_analysis, 
  # Stop
  NROW(thrips_combined_complete_data) - dimension_for_analysis, 
  # Increment by which to evaluate
  evaluate_by_increment, 
  collapse = " ")

timepoints_out[[i]] = rEDM::CCM(dataFrame = thrips_combined_complete_data,
                                columns = firstvar, 
                                target = secondvar, 
                                libSizes = libSize, 
                                # ERROR from anything < -1
                                Tp = timepoint_for_analysis, 
                                E = dimension_for_analysis, 
                                includeData = TRUE,
                                sample = 100)

}

Session info:

R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=Finnish_Finland.1252 LC_CTYPE=Finnish_Finland.1252
[3] LC_MONETARY=Finnish_Finland.1252 LC_NUMERIC=C
[5] LC_TIME=Finnish_Finland.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rEDM_1.9.2 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
[5] purrr_0.3.4 readr_2.0.1 tidyr_1.1.3 tibble_3.1.4
[9] ggplot2_3.3.5 tidyverse_1.3.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 lubridate_1.7.10 lattice_0.20-44 assertthat_0.2.1
[5] digest_0.6.27 foreach_1.5.1 utf8_1.2.2 R6_2.5.1
[9] cellranger_1.1.0 plyr_1.8.6 backports_1.2.1 reprex_2.0.1
[13] casnet_0.2.0 httr_1.4.2 pillar_1.6.2 rlang_0.4.11
[17] readxl_1.3.1 rstudioapi_0.13 Matrix_1.3-3 invctr_0.1.0
[21] splines_4.1.0 htmlwidgets_1.5.3 igraph_1.2.6 munsell_0.5.0
[25] broom_0.7.9 compiler_4.1.0 modelr_0.1.8 xfun_0.25
[29] pkgconfig_2.0.3 mgcv_1.8-35 htmltools_0.5.2 tidyselect_1.1.1
[33] codetools_0.2-18 fansi_0.5.0 withr_2.4.2 crayon_1.4.1
[37] tzdb_0.1.2 dbplyr_2.1.1 grid_4.1.0 nlme_3.1-152
[41] jsonlite_1.7.2 gtable_0.3.0 lifecycle_1.0.0 DBI_1.1.1
[45] magrittr_2.0.1 scales_1.1.1 cli_3.0.1 stringi_1.7.4
[49] fs_1.5.0 doParallel_1.0.16 xml2_1.3.2 ellipsis_0.3.2
[53] generics_0.1.0 vctrs_0.3.8 iterators_1.0.13 tools_4.1.0
[57] glue_1.4.2 hms_1.1.0 parallel_4.1.0 fastmap_1.1.0
[61] colorspace_2.0-2 rvest_1.0.1 knitr_1.33 haven_2.4.3

Prediction not NA with disjoint/segmented pred

As pointed out by @nonlinearnature, the current version (1.8) does not output NA in prediction rows in the presence of disjoint or segmented pred. The code makes predictions in these "gaps" using the available library vectors.

An example of the current behavior:
NOTE: This was run prior to rEDM 1.15. In 1.15 the legacy wrapper block_lnlp was deprecated.

> block <- data.frame( time=1:10, x=sin((1:10)/pi), y=cos((1:10)/pi) )
> out <- block_lnlp(block,lib=rbind(c(1,5),c(6,10)),tp=2,columns=c("x","y"),target_column = "x",stats_only = FALSE)
> out$model_output
   Index Observations Predictions Pred_Variance Const_Predictions
1      1      0.31296         NaN           NaN               NaN
2      2      0.59448         NaN           NaN               NaN
3      3      0.81627     0.95936      0.003832           0.31296
4      4      0.95606     0.89728      0.011778           0.59448
5      5      0.99978     0.88276      0.014781           0.81627
6      6      0.94307     0.89594      0.031104           0.95606
7      7      0.79160     0.59326      0.058161           0.99978
8      8      0.56060     0.26670      0.076164           0.94307
9      9      0.27328     0.28246      0.104256           0.79160
10    10     -0.04149     0.36578      0.024815           0.56060
11    11          NaN     0.09529      0.041300           0.27328
12    12          NaN     0.17593      0.055783          -0.04149

The behavior of rEDM 0.7.4 with disjoint pred:

> out74 <- block_lnlp( block, lib=rbind(c(1,5),c(6,10)), tp=2, columns=c("x","y"), target_column = "x", stats_only = FALSE )
> out74 $ model_output[[1]]
   time      obs   pred pred_var
1     3  0.81627 0.9594 0.003832
2     4  0.95606 0.8973 0.011778
3     5  0.99978 0.8828 0.014781
4     6  0.94307    NaN      NaN
5     7  0.79160    NaN      NaN
6     8  0.56060 0.2667 0.076164
7     9  0.27328 0.2825 0.104256
8    10 -0.04149 0.3658 0.024815
9   NaN      NaN    NaN      NaN
10  NaN      NaN    NaN      NaN

make_block producing zeros instead of NaN

When using the function make_block to build my own embeddings, if I use a tau different from the default of -1, I get a series of zeros that I think should be NaN.

Eg. if I do

head(make_block(Lorenz5D, "V1", tau = -2))

I get as a result

-- V1(t-0) V1(t-1) V1(t-2)
1 2.4873 NaN NaN
2 3.5108 0.0000 NaN
3 4.1666 2.4873 0.0000
4 4.4836 3.5108 0.0000
5 4.5246 4.1666 2.4873
6 4.3996 4.4836 3.5108

Note also that each column title shows only (t-n) and not (t-n*|tau|).

Is this also the behavior if I try to use different tau in the simplex function?

Thank you!,
Juan
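The expected behavior can be sketched with a hypothetical base-R helper (an illustration, not rEDM's make_block): lag i with tau = -2 should shift by i*|tau| positions, pad with NaN rather than zeros, and name the columns with the scaled lag.

```r
# Hypothetical lagged-embedding builder illustrating the expected
# behavior: NaN padding (not zeros) and names that include tau.
make_lag_block <- function(x, E = 3, tau = -2) {
  n <- length(x)
  out <- sapply(0:(E - 1), function(i) {
    shift <- i * abs(tau)
    c(rep(NaN, shift), x[seq_len(n - shift)])  # pad, then shift
  })
  colnames(out) <- sprintf("V1(t-%d)", (0:(E - 1)) * abs(tau))
  out
}

b <- make_lag_block(c(2.4873, 3.5108, 4.1666, 4.4836, 4.5246), tau = -2)
# b[2, 2] is NaN, not 0; columns are V1(t-0), V1(t-2), V1(t-4)
```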

Zenodo metadata

Add .zenodo.json file for accurate tracking of metadata for future releases.

simplex outputting null predictions in model_output

I am following the User Guide and set the stats_only parameter to FALSE to review the actual predictions, but they appear to be NaN. I am reproducing the same values for rho, etc., and the same plots as in the Guide, but the predictions are NaN.

head(simplex_output[[1]]$model_output)
time obs pred pred_var
1 2 -0.6012986 NaN NaN
2 3 0.7998003 NaN NaN
3 4 -0.7944096 NaN NaN
4 5 0.7979992 NaN NaN
5 6 -0.8195405 NaN NaN
6 7 0.8051964 NaN NaN

Thx,

Shane

Example of allowed libSizes for tau or negative tp

Hi,

Stupid question: I've been stuck for days trying to figure out how to use rEDM::CCM() with tau > 0 and tp < -1.

Would it be possible to give an example of how to specify libSizes here?

I keep getting an error such as:

Error in RtoCpp_CCM(pathIn, dataFile, dataFrame, pathOut, predictFile, :
DataFrame::DataFrameFromColumnIndex(): A column index (60) exceeds the data frame domain.

I cannot use the example at https://ha0ye.github.io/rEDM/articles/rEDM.html, because the same error occurs there (additionally, target_column and lib_column need to be wrapped in as.character() for the code to work):

output <- do.call(rbind, lapply(seq_len(NROW(params)), function(i) {
    ccm(paramecium_didinium, E = params$E[i], lib_sizes = NROW(paramecium_didinium), 
        random_libs = FALSE, lib_column = params$lib_column[i], target_column = params$target_column[i], 
        tp = params$tp[i], silent = TRUE)
}))

Prediction of single time points

Hi!

This might be related to issue #45, but it is just a small issue.

I'm trying to randomly select time points for which to predict. Using the code from issue #45, this does not work (i.e. predicting just one time point):

block <- data.frame( time=1:10, x=sin((1:10)/pi), y=cos((1:10)/pi) )
out2 <- block_lnlp(block,pred=c(4,4),tp=1,columns=c("x","y"),target_column = "x",stats_only = FALSE)

Error in RtoCpp_Simplex(pathIn, dataFile, dataFrame, pathOut, predictFile,  : 
  Parameters::Validate(): prediction start 4 exceeds end 4.

So I tried to use pred=c(4,5) instead of pred=c(4,4):

out3 <- block_lnlp(block,pred=c(4,5),tp=1,columns=c("x","y"),target_column = "x",stats_only = FALSE)
out3$model_output

  Index Observations Predictions Pred_Variance Const_Predictions
1     4    0.9560557         NaN           NaN               NaN
2     5    0.9997847   0.9244915    0.00336703         0.9560557
3     6    0.9430667   0.8738918    0.01783972         0.9997847

As can be seen, 2 time points are predicted instead of 1 (and neither of them is time index 4; the predicted indices are 5 and 6). This is probably the expected behavior for pred=c(4,5), but it means it is not possible to predict "isolated" time points.

This is not a big issue, as I can code a workaround, but I would appreciate it if it were possible to predict single time points. It might be just as simple as changing the implementation such that prediction start can be the same as prediction end (see error message above -- but this is just a guess).

And as a side note, this code gives the following warning:

out4 <- block_lnlp(block,pred=rbind(c(4,5),c(7,8)),tp=1,columns=c("x","y"),target_column = "x",stats_only = FALSE)
WARNING: Validate(): Disjoint prediction sets  are not fully supported. Use with caution.

So I'm wondering whether it is better not to use disjoint prediction sets?

Cheers,
Uriah

Multidimensional time series

Often you have multiple observable variables.

How do you construct an attractor from a delay embedding when you have multiple variables? It is not obvious how such an embedding should be built.

The naive approach with n variables would require an (x * n)-dimensional embedding, where x is the number of lags.
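The naive approach can be made concrete with a generic base-R sketch (not rEDM's internal representation): each of the n variables contributes x lagged copies, so the state vector has x * n coordinates.

```r
# Generic multivariate delay embedding: n variables, x lags each,
# giving an (x * n)-dimensional state vector per time point.
embed_multi <- function(df, lags = 2) {
  n <- nrow(df)
  cols <- lapply(names(df), function(v) {
    sapply(0:(lags - 1), function(k)
      c(rep(NA, k), df[[v]][seq_len(n - k)]))  # NA-pad each lag
  })
  out <- do.call(cbind, cols)
  colnames(out) <- as.vector(outer(0:(lags - 1), names(df),
                                   function(k, v) sprintf("%s(t-%d)", v, k)))
  out
}

d <- data.frame(x = 1:5, y = 6:10)
m <- embed_multi(d, lags = 2)
ncol(m)  # 4 = 2 variables * 2 lags
```

With many variables and many lags this explodes quickly, which is one motivation for multiview embedding and for selecting informative coordinates rather than using all of them.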

add user guide info on forecasting ahead of observed data

Some users want to know about forecasting into unknown futures.

For known input data, you can specify NA for the observed future state:

  • use loop with 1-step ahead forecasts to fill in predictions, one at a time
  • use larger tp steps to generate forecast functions for tp-step ahead
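The first bullet can be sketched as a generic iterate-ahead loop. Note that forecast_1step below is a dummy stand-in (an assumption, not rEDM's API) for any real 1-step forecaster, e.g. a simplex call; the dummy just repeats the last value so the loop is runnable.

```r
# Generic iterate-ahead loop: repeatedly make a 1-step forecast and
# append it to the series. forecast_1step is a stand-in for a real
# 1-step-ahead forecaster; here it is a dummy persistence forecast.
forecast_1step <- function(x) tail(x, 1)

iterate_forecast <- function(x, h = 3, f = forecast_1step) {
  for (i in seq_len(h)) x <- c(x, f(x))  # feed each forecast back in
  tail(x, h)                             # return the h new values
}

iterate_forecast(c(1, 2, 3), h = 3)  # dummy persistence: 3 3 3
```

Swapping f for a real EDM forecaster gives the one-at-a-time scheme in the first bullet; the second bullet (larger tp) avoids the feedback loop entirely at the cost of a separate forecast function per horizon.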

Inverted name for the output generated by block_lnlp function

Hi there,

I am using the block_lnlp function to estimate interaction-strength coefficients. The issue is that the output appears to label the coefficients with inverted nomenclature: if I put X as the target column and use X and Y to predict X, the interaction strength should be the Jacobian element ∂X/∂Y, but the output names it ∂Y/∂X. This is a problem only in the names, not in the estimates, right?

Can anyone confirm this? If not, do I have to invert the coefficients?

Thanks!

multiple simultaneous targets

Often we may want to compute predictions for multiple target variables at the same time, e.g. block_lnlp over a whole set of state variables, or CCM over a set of plausible causal variables.

On the algorithm side, this is straightforward, as both simplex and s-map just need some loops added in, as in the dev branch.

One issue is that some of the checking on valid vectors for indexing depends on the target variable. If the target variables have different gaps or missing values, this becomes problematic (and is why dev hasn't been merged back into master).

One solution is to add a prior check on valid vectors for all targets before running and stop if it doesn't pass.

R 4.2.0 Character Encoding on Windows: Can not load rEDM

Hello, I am an old time rEDM user. Thank you all for your hard work and for rEDM. I'd like to report an issue with rEDM that happened on 6/6/2022 as I installed fresh new R, RStudio and rEDM on my new laptop.

Indeed, with the newest version of R, 4.2.0 (2022-04-22, "Vigorous Calisthenics"), on Windows 11, with RStudio (RStudio Desktop 2022.02.3+492) and the latest rEDM version (1.12.2), the R session systematically crashes as soon as it tries to load the rEDM library in RStudio. I tried several ways of installing rEDM:
install.packages("rEDM")
and
devtools::install_github("SugiharaLab/rEDM")

Both lead to a complete crash. I also changed the directory of my R libraries from \AppData\Local\R\ to \Documents\R\win-library\, but that does not help either. Only the rEDM library causes this major malfunction (among all my time-series libraries).

I never had this issue before. So I installed an older version of R (4.1.2, 2021-11-01) and reinstalled rEDM (1.12.2); this seems to be stable, as it does not crash. It seems that rEDM is no longer compatible with the latest R.

Thank you so much

Sebastien

package failed to install on CentOS

R: 3.1.1
GCC: 5.3.0

> install_github("ha0ye/rEDM")
Downloading GitHub repo ha0ye/rEDM@master
from URL https://api.github.com/repos/ha0ye/rEDM/zipball/master
Installing rEDM
'/usr/lib64/R/bin/R' --no-site-file --no-environ --no-save --no-restore CMD  \
  INSTALL '/tmp/RtmpeO8gw3/devtoolsb91b2d4dd73d/ha0ye-rEDM-2416db6'  \
  --library='/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1' --install-tests 

* installing *source* package ‘rEDM’ ...
** libs
I/usr/include/R -DNDEBUG  -I/usr/local/include -I"/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1/Rcpp/include"  -I../inst/include    -c RcppExports.cpp -o RcppExports.o
/bin/sh: I/usr/include/R: No such file or directory
make: [RcppExports.o] Error 127 (ignored)
I/usr/include/R -DNDEBUG  -I/usr/local/include -I"/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1/Rcpp/include"  -I../inst/include    -c block_lnlp.cpp -o block_lnlp.o
/bin/sh: I/usr/include/R: No such file or directory
make: [block_lnlp.o] Error 127 (ignored)
I/usr/include/R -DNDEBUG  -I/usr/local/include -I"/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1/Rcpp/include"  -I../inst/include    -c forecast_machine.cpp -o forecast_machine.o
/bin/sh: I/usr/include/R: No such file or directory
make: [forecast_machine.o] Error 127 (ignored)
I/usr/include/R -DNDEBUG  -I/usr/local/include -I"/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1/Rcpp/include"  -I../inst/include    -c lnlp.cpp -o lnlp.o
/bin/sh: I/usr/include/R: No such file or directory
make: [lnlp.o] Error 127 (ignored)
I/usr/include/R -DNDEBUG  -I/usr/local/include -I"/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1/Rcpp/include"  -I../inst/include    -c xmap.cpp -o xmap.o
/bin/sh: I/usr/include/R: No such file or directory
make: [xmap.o] Error 127 (ignored)
-L/usr/local/lib64 -o rEDM.so RcppExports.o block_lnlp.o forecast_machine.o lnlp.o xmap.o -L/usr/lib64/R/lib -lR
/bin/sh: line 2: -L/usr/local/lib64: No such file or directory
make: *** [rEDM.so] Error 127
ERROR: compilation failed for package ‘rEDM’
* removing ‘/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1/rEDM’
Error: Command failed (1)

Installing from the source produces the same error message:

install.packages("~/rEDM/",repos=NULL,type="source")
Installing package into ‘/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1’
(as ‘lib’ is unspecified)
* installing *source* package ‘rEDM’ ...
** libs
I/usr/include/R -DNDEBUG  -I/usr/local/include -I"/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1/Rcpp/include"  -I../inst/include    -c RcppExports.cpp -o RcppExports.o
/bin/sh: I/usr/include/R: No such file or directory
make: [RcppExports.o] Error 127 (ignored)
I/usr/include/R -DNDEBUG  -I/usr/local/include -I"/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1/Rcpp/include"  -I../inst/include    -c block_lnlp.cpp -o block_lnlp.o
/bin/sh: I/usr/include/R: No such file or directory
make: [block_lnlp.o] Error 127 (ignored)
I/usr/include/R -DNDEBUG  -I/usr/local/include -I"/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1/Rcpp/include"  -I../inst/include    -c forecast_machine.cpp -o forecast_machine.o
/bin/sh: I/usr/include/R: No such file or directory
make: [forecast_machine.o] Error 127 (ignored)
I/usr/include/R -DNDEBUG  -I/usr/local/include -I"/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1/Rcpp/include"  -I../inst/include    -c lnlp.cpp -o lnlp.o
/bin/sh: I/usr/include/R: No such file or directory
make: [lnlp.o] Error 127 (ignored)
I/usr/include/R -DNDEBUG  -I/usr/local/include -I"/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1/Rcpp/include"  -I../inst/include    -c xmap.cpp -o xmap.o
/bin/sh: I/usr/include/R: No such file or directory
make: [xmap.o] Error 127 (ignored)
-L/usr/local/lib64 -o rEDM.so RcppExports.o block_lnlp.o forecast_machine.o lnlp.o xmap.o -L/usr/lib64/R/lib -lR
/bin/sh: line 2: -L/usr/local/lib64: No such file or directory
make: *** [rEDM.so] Error 127
ERROR: compilation failed for package ‘rEDM’
* removing ‘/d1/dwuab/R/x86_64-redhat-linux-gnu-library/3.1/rEDM’
Warning message:
In install.packages("~/rEDM/", repos = NULL, type = "source") :
  installation of package ‘/ghome/dwuab/rEDM/’ had non-zero exit status

CCM error "model$set_block(data.matrix(block))"

The same data set that causes the error with S-map causes the following error in the CCM method: "model$set_block(data.matrix(block))"

Again, I'm happy to share the dataset (96407 x 4 features + timestamps matrix)

Cannot load rEDM with library()

I installed rEDM on Windows. When I call library(rEDM), R hangs; after waiting for more than 2 hours, the load finishes and I can then use the package.
My R version is 4.1.2 and my rEDM version is 1.9.3.
Do you have any suggestions/comments about this?
I would be grateful if you could reply to my question.

Thanks!

When should different surrogate generation functions be used?

I have an ecological dataset with a high level of seasonality, and I was wondering whether it would be better to use the 'make_surrogate_twin' or the 'make_surrogate_seasonal' method within make_surrogate_data? In the rEDM apple thrips example (https://ha0ye.github.io/rEDM/articles/rEDM.html#edm-examples), the seasonal option is picked to generate surrogates, whereas in Ushio (2018), phase lock twin surrogate has been used.

Are there specific advantages and disadvantages to each of these algorithms, particularly in ecological food-web settings? For example, is one better for identifying inter-species interactions, and the other better for identifying causal interactions between species and environmental variables?

Many thanks

Vignette - Appropriate E for CCM in Big Biodiversity Example

Within the vignette, in the Big Biodiversity example, the CCM section uses the bestE object to choose the correct E.

I believe the indices for these calls are incorrect. For example, the vignette says the following:

no_xmap_inv <- ccm(composite_ts, lib = segments, pred = segments, E = bestE[4], ...
inv_xmap_no <- ccm(composite_ts, lib = composite_lib, pred = composite_pred, E = bestE[1] ...

but, bestE looks like this:

AbvBioAnnProd noh020tot invrichness precipmm
5 5 4 2

The indices used are neither the lib nor the target columns. I believe the indices should be bestE[3] and bestE[2], respectively, to line up with the putative causal process (the target column). I think this error carries through this section.

In the vignette at https://cran.r-project.org/web/packages/rEDM/vignettes/rEDM_tutorial.html, the indices for the first example are correct, but the second example is incorrect. I think you may have just copied the 4 and 1 indices from the first example down to the second.
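One way to avoid this kind of off-by-position mistake (a generic suggestion, not the vignette's code) is to index bestE by name rather than by position. The named vector below reuses the values quoted above.

```r
# bestE as printed in the vignette, written as a named vector.
bestE <- c(AbvBioAnnProd = 5, noh020tot = 5, invrichness = 4, precipmm = 2)

# Indexing by the putative causal (target) variable's name makes the
# intent explicit and survives reordering of the columns.
bestE["invrichness"]  # 4
bestE["noh020tot"]    # 5
```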

Thanks for this great package and documentation!

Columns input for block_lnlp

Output from block_lnlp reports the columns argument as a single character string, with commas separating the column indices. That format isn't currently accepted as input to block_lnlp, so the user has to do a bit of work to chain multiple runs, i.e. selecting sets of columns from one output and feeding them into the next run.
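A small base-R shim for the round-trip described above, assuming the output string looks like "1, 2, 3" (the exact format is inferred from the report, not verified against the package).

```r
# Hypothetical columns string as reported in block_lnlp output.
cols_out <- "1, 2, 3"

# Parse it back into a numeric vector for the next run's columns
# argument; as.numeric() strips the surrounding whitespace.
parse_columns <- function(s) as.numeric(strsplit(s, ",")[[1]])
parse_columns(cols_out)  # c(1, 2, 3)
```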

Understand why a few manually calculated predictions differ from those calculated by rEDM

I have written some R code to manually do simplex calculations in order to better understand the methods used in the rEDM package. I get results that exactly match those from rEDM for 92 out of the 99 predictions in my example data set. So this Issue is concerned with understanding why the remaining values differ (they differ for three different reasons), and confirming what rEDM is doing.

Maybe I am misunderstanding something or there are some assumptions in rEDM that I have not found documentation of. I have written up the details in the attached file rEDMissues.pdf and provided a minimal working example in mwe.r, which is copied below.

rEDMissues.pdf

Thanks for any insight - my manually calculated results for my simulated dataset actually suggest that the EDM method is doing better (higher correlation coefficient rho) than implied by rEDM.

File mwe.r is copied below (it seems that .r files cannot be attached to GitHub comments the way a .pdf can):

# mwe.r - minimal working example to demonstrate two of the discrepancies
#  between rEDM results and my R code. These are documented in the file
#  rEDMissues.pdf to be posted (with this file) as Issue #24 on the rEDM
#  GitHub site. Here I am including results from my R code to explain the
#  issues. This code is designed to be stepped through line-by-line in
#  conjunction with reading the comments and the .pdf. 
#  Andrew Edwards. 21st November 2017.

rm(list=ls())
require(rEDM)

# Xvec contains the values of X(t). The X(t) are first differences of output
#  from a Ricker-like model. X is not standardised (which is okay as univariate).
#  X(t) = N(t+1) - N(t) where N(t) are population numbers.
Xvec = c(-0.056531409251883,0.059223778257432,5.24124928046977,
    -4.85399581474521,-0.46134818068973,0.273317575696793,0.801806230470337,
    -0.888891901824982,-0.202777622745051,0.497565422757662,5.10219324323769,
    -5.36826459373178,-0.17467165498718,1.06545333399298,1.97419279178678,
    -2.91448405223082,-0.179969867858605,0.237962494942611,1.47828327622468,
    -1.54267507064286,-0.180342027136338,0.238919610831881,1.06140368490958,
    -1.06522901782019,-0.214923527940395,0.452847221462308,2.13053391555372,
    -2.55145224744286,-0.0307653352327702,1.1448014288826,-0.0675575239486375,
    -1.04711881585576,-0.00910890042051652,0.726257323277433,0.732271192186161,
    -1.35460378982395,-0.0322955446760023,0.507606440290776,3.73396587274012,
    -4.19686615950143,-0.0997201857962038,0.753392632401029,2.41347231553437,
    -3.03677401452137,-0.141112562089696,0.446002103079665,0.223768504955365,
    -0.615452831633047,-0.0216659723974975,0.292246351104258,0.20006105300258,
    -0.469596514211075,0.0422676544887819,0.474264989176278,-0.0416811459395667,
    -0.53555712696719,0.118860281628173,0.176335117268894,-0.10364820567334,
    -0.153572235117542,0.180339482186409,0.0566876206447625,-0.140537892644139,
    0.0252441742388871,0.340689505466622,0.852833653689839,-1.07051231019616,
    -0.0937704380137284,0.460677118593916,0.37444382348273,-0.83783628206217,
    -0.0154896108244113,1.34259279914848,-0.495978821807168,-0.472464634960208,
    -0.415481769949074,1.36767605087962,-0.891896943918948,-0.279228283931612,
    -0.148703043863421,2.04524590138255,-1.98431486665908,0.0602356391036573,
    -0.0902730939678147,0.243344379963862,-0.074421904114315,-0.309150440565139,
    0.43675531763949,0.178787692802827,0.0799271040758849,-0.657946157906476,
    1.14668210755046,-0.791665479471326,0.482533897248175,-0.798737571552661,
    0.439024256063545,0.177114631209318,2.19942374686687,-2.9488856529422)

n = length(Xvec)
n

simp.Efix = simplex(Xvec, E = 2, stats_only = FALSE)
rEDM.points = simp.Efix[[1]]$model_output       # time, obs, pred and pred_var
# Make each row correspond to t:
rEDM.points = rbind(c(1, Xvec[1], NA, NA), rEDM.points)
head(rEDM.points)

# Issue (i): rEDM seems to use x(t^*+2) as a nearest neighbour, using the
#  Deyle et al. (2013) notation that x is the vector of lagged values of X,
#  i.e. x = (X(t), X(t-1))

# For t^*=94, my R code calculates nearest neighbours with indices psivec
#  (I am using psi instead of the t with a line through in Deyle et al. 2013).
psivec94 = c(6, 57, 88)
# and corresponding weights,
weights94 = c(0.3678794, 0.3205861, 0.2895013)
# giving the estimate of X(95), from [S1] of Deyle et al. (2013) as
X95est = sum(weights94 * Xvec[psivec94+1]) / sum(weights94)
X95est

# However, rEDM gives
X95est.rEDM = rEDM.points[95, "pred"]
X95est.rEDM

# But I can reproduce the rEDM result by allowing x(96) to be a nearest neighbour
#  of x(94),  giving
psivec94.allow = c(96, 6, 57)
  # so 6 and 57 are now 2nd and 3rd nearest neighbours

weights94.allow = c(3.678794e-01, 1.405278e-04, 4.146457e-05)
# Note that the first weight is the same as above (by definition it's always
#  exp(-1)), but the second and third are very small because x[96] is
#  very close to x[94].

X95est.allow = sum(weights94.allow * Xvec[psivec94.allow+1]) /
    sum(weights94.allow)
X95est.allow

# However, surely we should not be allowed to use x(96) because, by definition,
#  it contains X(95), which is the value we are trying to estimate from x(94).
#  X(95) is presumably the 'one' that we are leaving out in 'leave-one-out'.

# I get the same issue with t^*=75.


# Issue (ii): for t^*=64, rEDM seems to only use the one nearest neighbour,
#  and estimates X(65) to be X(3)
X65est.rEDM = rEDM.points[65, "pred"]
X65est.rEDM
Xvec[3]
# So these are the same, and X(3) is the largest value of X in the time series
#  (I am not sure if that is important):
which.max(Xvec)

# Thus rEDM has essentially calculated weights of (w_1, w_2, w_3) = (1, 0, 0).
# However, I calculate weights of (0.3679, 0.1795, 0.1333):
psivec64 = c(2, 61, 60)
# With a vector of distances from x[64] of
dist64 = c(sqrt((Xvec[psivec64[1]] - Xvec[64])^2 + (Xvec[psivec64[1]-1] -
                                                             Xvec[64-1])^2),
           sqrt((Xvec[psivec64[2]] - Xvec[64])^2 + (Xvec[psivec64[2]-1] -
                                                             Xvec[64-1])^2),
           sqrt((Xvec[psivec64[3]] - Xvec[64])^2 + (Xvec[psivec64[3]-1] -
                                                             Xvec[64-1])^2))
# And corresponding weights
weights64 = exp(- dist64 / dist64[1])
# weights64 = c(0.3678794, 0.1795047, 0.1333414)  from my longer R knitr code
weights64

# My resulting estimate of X65 is:
X65est = sum(weights64 * Xvec[psivec64+1]) / sum(weights64)
X65est

# So I do not understand why the weights are different. The nearest neighbour
#  is not so close that the second and third are relatively far away, which
#  is issue (iii).

# I think issue (iii) may get resolved with further understanding of
#  issues (i) and (ii).

Add validation checking on column and length arguments

The core C++ code will crash when trying to access memory that it doesn't have, such as columns beyond the limits of a matrix or values past the end of a vector.

We can add some simple checks to the R wrappers to catch these cases and raise an R error instead of crashing RStudio.

One MRE:

dat <- rnorm(83)
pred <- c(43, 83)
lib <- c(1, 42)
crash_out <- simplex(dat, lib, pred, tau = 1:10)
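A sketch of the kind of wrapper-level check that could catch this before the C++ call; `check_range` is a hypothetical helper name, not part of the package:

```r
# Validate that index ranges fall inside the data before handing them to C++,
# so out-of-bounds arguments produce an R error rather than a crash (sketch).
check_range <- function(idx, n, what) {
  if (any(idx < 1) || any(idx > n))
    stop(sprintf("%s indices [%d, %d] fall outside the data (length %d)",
                 what, min(idx), max(idx), n), call. = FALSE)
  invisible(TRUE)
}

dat  <- rnorm(83)
lib  <- c(1, 42)
pred <- c(43, 83)

check_range(lib,  length(dat), "lib")   # OK
check_range(pred, length(dat), "pred")  # OK
# check_range(c(43, 90), length(dat), "pred")  # errors instead of crashing
```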

Causality detection in the short time series

Hello, thanks for the rEDM package, which provides a very convenient way to investigate cross-interactions between infectious diseases. However, when using the package I came across some problems and hope you can help me.

Assume there are three dynamic processes named A, B, and C (C depends on A), and I want to test whether there is a synergistic effect on C when A and B coexist.

There are several weaknesses in my dataset. Firstly, it is only 104 weeks long. Secondly, most of the B and C values are zero. The dataset is here

GPU acceleration? (feature)

I used CCM in a massive and resource-consuming research project, and some of my simulations took days to complete even with a small number of random samples. That made me think of possible acceleration methods we could implement for CCM, especially for iterative computations on the same data.

I think CCM is a great tool, and should not be hindered by the computational complexity that neural networks and other algorithms have been able to surpass with computational optimization.

Since some if not most of the work done by CCM is calculating distances, would this not be a local operation that can be distributed to many nodes? Perhaps after a "master" manifold has been built?

Any ideas? I would be happy to help with what I can.
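As a rough illustration of the local-distance idea above, row-wise distance computations can already be farmed out with base R's parallel package. This is a toy sketch of distributing distance rows across workers, not CCM's actual code path:

```r
library(parallel)

# Toy embedding: 100 points in E = 2
X <- matrix(rnorm(200), ncol = 2)

cl <- makeCluster(2)
clusterExport(cl, "X")

# Each worker computes one row of the distance matrix: distances from
# point i to every point in the manifold (a local, distributable operation).
D <- do.call(rbind, parLapply(cl, 1:nrow(X), function(i)
  sqrt(colSums((t(X) - X[i, ])^2))))

stopCluster(cl)
dim(D)  # 100 x 100
```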

Error in INTERNAL_EmbedDimension(pathIn, dataFile, dataFrame, pathOut, : FindNeighbors(): Library is too small to resolve 2 knn neighbors.

I am trying to use the package to analyze and predict the monthly sunspot data in rolling windows. Here is a fully reproducible example:

library(rEDM)

df <- data.frame(yr = as.numeric(time(sunspot.month)), 
                 sunspot_count = as.numeric(sunspot.month))

# make indices for 11 rolling splits

train_splits <- rep(NA, 11)
test_splits <- rep(NA, 11)

periods_train <- 12 * 50 # 50 yrs
periods_test  <- 12 * 10 # 10 yrs
skip_span     <- 12 * 20 # 20 yrs

for (k in 1:11) {
  train_start <- (k-1)*skip_span + 1
  train_stop <- train_start + periods_train -1
  test_start <- train_stop + 1
  test_stop <- test_start + periods_test -1
  
  train_splits[k] <- paste(as.character(train_start), as.character(train_stop))
  test_splits[k] <- paste(as.character(test_start), as.character(test_stop))
}

# END make indices

# Try embeddings & predictions

k = 1

E.opt = EmbedDimension( dataFrame = df,                # input data
                        lib     = train_splits[k],     # portion of data to train
                        pred    = test_splits[k],      # portion of data to predict
                        columns = "sunspot_count",
                        target  = "sunspot_count")

# works OK with k = 1-3
# for k > 3, fails with:

# Error in INTERNAL_EmbedDimension(pathIn, dataFile, dataFrame, pathOut,  : 
#   FindNeighbors(): Library is too small to resolve 2 knn neighbors.

It works OK with k = 1, 2, 3, but for larger values of k (it goes up to 11), it fails with the subject error message.

I wonder, since the size of the library is the same for every split (600 data points), and it works OK with the first 3 splits, why is this happening?

I have tried it with the just released v1.2.0, as well as with the previous version 1.1.0 - same behavior.
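In the meantime, one way to run all eleven splits without a single failing window aborting the run is to wrap each call in tryCatch. This is only a workaround for the loop, not a fix for the underlying error:

```r
library(rEDM)

df <- data.frame(yr = as.numeric(time(sunspot.month)),
                 sunspot_count = as.numeric(sunspot.month))

periods_train <- 12 * 50   # 50 yrs
periods_test  <- 12 * 10   # 10 yrs
skip_span     <- 12 * 20   # 20 yrs

E.opt <- vector("list", 11)
for (k in 1:11) {
  train_start <- (k - 1) * skip_span + 1
  train_stop  <- train_start + periods_train - 1
  E.opt[[k]]  <- tryCatch(
    EmbedDimension(dataFrame = df,
                   lib     = paste(train_start, train_stop),
                   pred    = paste(train_stop + 1, train_stop + periods_test),
                   columns = "sunspot_count",
                   target  = "sunspot_count",
                   showPlot = FALSE),
    error = function(e) conditionMessage(e))
}
# E.opt[[k]] is an E-vs-rho data.frame on success, or the error message on failure
```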

Session info:

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rEDM_1.2.0

loaded via a namespace (and not attached):
[1] compiler_3.6.1 tools_3.6.1    Rcpp_1.0.3   

Compilation failed for package ‘rEDM’

I am trying to install rEDM 1.2.2 because the later version does not have EmbedDimension. I have tried installing from the archive repository https://cran.r-project.org/src/contrib/Archive/rEDM/, with devtools::install_github("SugiharaLab/rEDM"), and with install.packages("rEDM"), but each gives the same error message, which follows below:

install.packages("~/GW_SW_Correlation_Analysis/rEDM_1.2.2.tar.gz", repos = NULL, type = "source")
Installing package into ‘/home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)

  • installing source package ‘rEDM’ ...
    ** package ‘rEDM’ successfully unpacked and MD5 sums checked
    ** using staged installation
    ** libs
    g++ -std=gnu++11 -I"/opt/R/3.6.3/lib/R/include" -DNDEBUG -I ./cppEDM/src/ -I"/opt/R/3.6.3/lib/R/library/Rcpp/include" -I"/home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include" -I/usr/local/include -fpic -g -O2 -c CCM.cpp -o CCM.o
    In file included from /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread.h:11:0,
    from RcppEDMCommon.h:8,
    from CCM.cpp:2:
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/Thread.hpp: In lambda function:
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/Thread.hpp:42:19: error: parameter packs not expanded with ‘...’:
    f(args...);
    ^
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/Thread.hpp:42:19: note: ‘args’
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/Thread.hpp:42:23: error: expansion pattern ‘args’ contains no argument packs
    f(args...);
    ^
    In file included from /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread.h:13:0,
    from RcppEDMCommon.h:8,
    from CCM.cpp:2:
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp: In member function ‘void RcppThread::ThreadPool::push(F&&, Args&& ...)’:
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp:127:31: error: expected ‘,’ before ‘...’ token
    jobs_.emplace([f, args...] { f(args...); });
    ^
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp:127:31: error: expected identifier before ‘...’ token
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp:127:34: error: parameter packs not expanded with ‘...’:
    jobs_.emplace([f, args...] { f(args...); });
    ^
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp:127:34: note: ‘args’
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp: In lambda function:
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp:127:44: error: expansion pattern ‘args’ contains no argument packs
    jobs_.emplace([f, args...] { f(args...); });
    ^
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp: In member function ‘std::future<decltype (f(args ...))> RcppThread::ThreadPool::pushReturn(F&&, Args&& ...)’:
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp:144:54: error: expected ‘,’ before ‘...’ token
    auto job = std::make_shared([&f, args...] {
    ^
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp:144:54: error: expected identifier before ‘...’ token
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp:144:57: error: parameter packs not expanded with ‘...’:
    auto job = std::make_shared([&f, args...] {
    ^
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp:144:57: note: ‘args’
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp: In lambda function:
    /home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/RcppThread/include/RcppThread/ThreadPool.hpp:145:22: error: expansion pattern ‘args’ contains no argument packs
    return f(args...);
    ^
    make: *** [CCM.o] Error 1
    ERROR: compilation failed for package ‘rEDM’
  • removing ‘/home/gbonotto/R/x86_64-pc-linux-gnu-library/3.6/rEDM’
    Warning in install.packages :
    installation of package ‘/home/gbonotto/GW_SW_Correlation_Analysis/rEDM_1.2.2.tar.gz’ had non-zero exit status

Multiview() chooses the maximum lag based on the number of predictor columns used, regardless of E.

On https://cran.r-project.org/web/packages/rEDM/vignettes/rEDM-tutorial.pdf in the description of function Multiview there is written: Multiview() operates by constructing all possible embeddings of dimension E with lag up to E-1.

I have noticed that the function behaves oddly in that regard, namely it chooses the maximum lag to be equal to the number of predictor columns used, regardless of other parameter values. I'm showing this with some examples:

example 1: 2 predictor columns and E=3

library(rEDM)
data(block_3sp)

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
               E = 3, columns = "x_t y_t", target = "x_t", multiview = 100)
head(L$View)
  col_1 col_2   name_1   name_2      rho    MAE   RMSE
1     1     2 x_t(t-0) x_t(t-1)  0.92240 0.2548 0.3259
2     1     3 x_t(t-0) y_t(t-0)  0.86910 0.3214 0.4187
3     1     4 x_t(t-0) y_t(t-1)  0.90120 0.2883 0.3663
4     2     3 x_t(t-1) y_t(t-0)  0.73710 0.4535 0.5764
5     2     4 x_t(t-1) y_t(t-1)  0.65050 0.5242 0.6892
6     3     4 y_t(t-0) y_t(t-1) -0.01939 0.8524 1.0530

As can be seen, the lag is indeed E-1=2, but the dimensions of the single views are 2 and not 3.

example 2: 3 predictor columns and E=3

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
               E = 3, columns = "x_t y_t z_t", target = "x_t", multiview = 100)
head(L$View)
  col_1 col_2 col_3   name_1   name_2   name_3    rho    MAE   RMSE
1     1     2     7 x_t(t-0) x_t(t-1) z_t(t-0) 0.9208 0.2485 0.3164
2     1     2     6 x_t(t-0) x_t(t-1) y_t(t-2) 0.8677 0.3294 0.4113
3     1     2     3 x_t(t-0) x_t(t-1) x_t(t-2) 0.9319 0.2277 0.2934
4     1     2     8 x_t(t-0) x_t(t-1) z_t(t-1) 0.9183 0.2476 0.3205
5     1     7     9 x_t(t-0) z_t(t-0) z_t(t-2) 0.8858 0.3031 0.3738
6     1     4     9 x_t(t-0) y_t(t-0) z_t(t-2) 0.7774 0.4191 0.5116

In this case the dimension of each view is indeed 3, but the max lag is not E-1=2; it is 3.

example 3: 4 predictor columns and E=3

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", 
               E = 3, columns = "x_t y_t z_t y_t-1", target = "x_t", multiview = 100)
head(L$View)
  col_1 col_2 col_3 col_4   name_1   name_2   name_3     name_4    rho    MAE   RMSE
1     1     2     7    14 x_t(t-0) x_t(t-1) y_t(t-2) y_t-1(t-1) 0.8602 0.3372 0.4208
2     1     2     4     9 x_t(t-0) x_t(t-1) x_t(t-3)   z_t(t-0) 0.8766 0.2989 0.3881
3     1     2     9    11 x_t(t-0) x_t(t-1) z_t(t-0)   z_t(t-2) 0.8781 0.2955 0.3858
4     1     2     8    15 x_t(t-0) x_t(t-1) y_t(t-3) y_t-1(t-2) 0.8546 0.3188 0.4201
5     1     2     9    16 x_t(t-0) x_t(t-1) z_t(t-0) y_t-1(t-3) 0.8849 0.2992 0.3751
6     1     2     3    16 x_t(t-0) x_t(t-1) x_t(t-2) y_t-1(t-3) 0.8639 0.3134 0.4062

In this case neither the embedding dimension nor the maximum lag makes sense: each view is of dimension 4 and the maximum lag is also 4.

The pattern is quite obvious: Embedding dimension = maximum lag = number of columns.

I don't think this is intended: following the description it should be maximum lag = embedding dimension - 1, with both the maximum lag and E independent of the number of predictor columns (to an extent).

I am also confused about the two arguments E (embedding dimension) and D (multivariate dimension). What exactly is the difference? D seems to override E, meaning that if D is set, the value of E has no influence and the maximum lag is still chosen based on the number of predictor columns. Example:

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", D=3,
               E = 3, columns = "x_t y_t", target = "x_t", multiview = 4)
head(L$View)
  col_1 col_2 col_3   name_1   name_2   name_3    rho    MAE   RMSE
1     1     2     4 x_t(t-0) x_t(t-1) y_t(t-1) 0.8977 0.2913 0.3673
2     1     2     3 x_t(t-0) x_t(t-1) y_t(t-0) 0.8738 0.3037 0.4016
3     1     3     4 x_t(t-0) y_t(t-0) y_t(t-1) 0.7524 0.4366 0.5402
4     2     3     4 x_t(t-1) y_t(t-0) y_t(t-1) 0.4988 0.6344 0.7492

Max lag is 2 (= number of predictors), and 3 variables are used for each view (= D)

L = Multiview( dataFrame = block_3sp, lib = "1 99", pred = "105 190", D=3,
               E = 4, columns = "x_t y_t", target = "x_t", multiview = 4)
head(L$View)
  col_1 col_2 col_3   name_1   name_2   name_3    rho    MAE   RMSE
1     1     2     4 x_t(t-0) x_t(t-1) y_t(t-1) 0.8977 0.2913 0.3673
2     1     2     3 x_t(t-0) x_t(t-1) y_t(t-0) 0.8738 0.3037 0.4016
3     1     3     4 x_t(t-0) y_t(t-0) y_t(t-1) 0.7524 0.4366 0.5402
4     2     3     4 x_t(t-1) y_t(t-0) y_t(t-1) 0.4988 0.6344 0.7492

Same as above, changing E did nothing.

Also, why is the maximum lag set to E-1? I would like to be able to construct a model in which each view is, for example, of dimension 2 while the maximum lag is 3. That is, I would like to choose the dimension of the views and the maximum lag separately. Is that possible?

Thank you and best regards,
Uriah

sessionInfo(package = "rEDM")

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Switzerland.1252  LC_CTYPE=English_Switzerland.1252    LC_MONETARY=English_Switzerland.1252
[4] LC_NUMERIC=C                         LC_TIME=English_Switzerland.1252    

attached base packages:
character(0)

other attached packages:
[1] rEDM_1.7.3

loaded via a namespace (and not attached):
 [1] compiler_4.0.3  graphics_4.0.3  htmltools_0.5.0 tools_4.0.3     utils_4.0.3     yaml_2.2.1      grDevices_4.0.3
 [8] Rcpp_1.0.5      stats_4.0.3     datasets_4.0.3  rmarkdown_2.6   knitr_1.30      methods_4.0.3   xfun_0.20      
[15] digest_0.6.27   rlang_0.4.10    base_4.0.3      evaluate_0.14  

typo in code for indicating separations between plots?

I think it includes one extra element in each individual time series, because it selects the largest year plus the first point of the next time series. In other words, I think this:

for(i in 2:nrow(composite_ts)) {
  if(composite_ts$year[i] < composite_ts$year[i-1]) {
    segments <- rbind(segments, c(startpos, i))
    startpos <- i+1
  }
}

should instead be:

for(i in 1:(nrow(composite_ts)-1)) {
  if(composite_ts$year[i] > composite_ts$year[i+1]) {
    segments <- rbind(segments, c(startpos, i))
    startpos <- i+1
  }
}
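A quick reproducible check of the proposed fix on a toy composite (not the tutorial's data), with a final row added to close the last segment:

```r
# Two time series concatenated back to back: years 2001-2004, then 1998-2000
composite_ts <- data.frame(year = c(2001:2004, 1998:2000))

segments <- matrix(nrow = 0, ncol = 2)
startpos <- 1
for (i in 1:(nrow(composite_ts) - 1)) {
  if (composite_ts$year[i] > composite_ts$year[i + 1]) {
    segments <- rbind(segments, c(startpos, i))
    startpos <- i + 1
  }
}
segments <- rbind(segments, c(startpos, nrow(composite_ts)))  # close last run
segments
#      [,1] [,2]
# [1,]    1    4
# [2,]    5    7
```

Each segment now ends exactly at the last point of its own series, with no extra element borrowed from the next one.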

setting knn < E+1

Hi George and all, we are running Simplex in rEDM and trying to set knn < E+1. We used to be able to do this, but with a newer version of rEDM (or maybe R) it looks like we no longer can. Now we get an error message:

"Error in RtoCpp_Simplex(pathIn, dataFile, dataFrame, pathOut, predictFile, : Parameters::Validate(): Simplex knn of 1 is less than E+1 = 2"

Is this intentional? Are you trying to prevent knn from ever being less than E+1? And if so, why? Thanks for any help you can give. Best, Jeff Houlahan

dgelss failed

Hi, I'm having a problem while executing PredictNonlinear (using the tutorial with my data). Maybe it is too obvious, but I'm very new to R.

My data looks something like this:

head(datos,2)
Year Month Area Date
1 2000 1 NA ene 2000
2 2000 2 24.012 feb 2000

Error section:

E = 3

rho_theta_e3 =PredictNonlinear(dataFrame = datos, columns = "Area",target = "Area", lib = "2 228", pred = "2 228", E = E)

Error in RtoCpp_PredictNonlinear(pathIn, dataFile, dataFrame, pathOut, :
Lapack_SVD(): dgelss failed.

time indices, concatenated blocks with model_output with stats_only=FALSE

If I remember correctly, there was briefly a "short_output" argument that truncated the model_output data.frame to just the pred set. Since that has been removed, it looks like the code is padding NAs to make NROW(model_output) = NROW(block), but it is not putting the NaNs in the right place and not giving them time indices.

block <- data.frame(time=1:10,x=sin((1:10)/pi),y=cos((1:10)/pi))
out <- block_lnlp(block,tp=2,columns=c("x","y"),target_column = "x",stats_only = FALSE)

out$model_output[[1]]

time obs pred pred_var
1 3 0.81627311 0.9655879 0.0003981692
2 4 0.95605566 0.9135897 0.0072468536
3 5 0.99978466 0.9284073 0.0024075621
4 6 0.94306673 0.8425109 0.0241078565
5 7 0.79160024 0.7335355 0.0534826032
6 8 0.56060280 0.5976110 0.0790172007
7 9 0.27328240 0.3439944 0.1140423509
8 10 -0.04149429 0.3950724 0.0318934212
9 NaN NaN NaN NaN
10 NaN NaN NaN NaN

Additionally, if you give a split library this does not appear to deter block_lnlp from making predictions across the gaps.

out <- block_lnlp(block,lib=rbind(c(1,5),c(6,10)),tp=2,columns=c("x","y"),target_column = "x",stats_only = FALSE)
out$model_output[[1]]
time obs pred pred_var
1 3 0.81627311 0.9593595 0.003831564
2 4 0.95605566 0.8972787 0.011777949
3 5 0.99978466 0.8827621 0.014781495
4 6 0.94306673 0.8959425 0.031103864
5 7 0.79160024 0.5932581 0.058161169
6 8 0.56060280 0.2666997 0.076164443
7 9 0.27328240 0.2824588 0.104255915
8 10 -0.04149429 0.3657842 0.024814822
9 NaN NaN NaN NaN
10 NaN NaN NaN NaN

Although if you do something similar with simplex() you get the correct breaks in the predictions corresponding to the breaks given in the library.

out_simplex <- simplex(block$x,lib=rbind(c(1,5),c(6,10)),E=1,tp=2,stats_only = FALSE)
out_simplex$model_output[[1]]
time obs pred pred_var
1 3 0.81627311 0.42321690 0.2476161338
2 4 0.95605566 -0.03897166 0.0007877017
3 5 0.99978466 0.27779014 0.0012748452
4 6 0.94306673 NaN NaN
5 7 0.79160024 NaN NaN
6 8 0.56060280 0.67176509 0.1307101216
7 9 0.27328240 0.99722448 0.0011178303
8 10 -0.04149429 0.95403246 0.0013772935
9 NaN NaN NaN NaN
10 NaN NaN NaN NaN

(Although the NaN padding at the end of the time series is still wrong.)

Finally, just as a double check: if you put a break in the time series, the NaNs for that break are correct.

block_broken <- data.frame(time=1:10,x=sin((1:10)/pi),y=cos((1:10)/pi))
block_broken[5,c('x','y')] <- NA
out_broken <- block_lnlp(block_broken,tp=2,columns=c("x","y"),target_column = "x",stats_only = FALSE)
out_broken$model_output[[1]]
time obs pred pred_var
1 3 0.81627311 0.9443627 0.003857110
2 4 0.95605566 0.8381385 0.006627352
3 5 NA 0.9284073 0.002407562
4 6 0.94306673 0.7074185 0.055627882
5 7 0.79160024 NaN NaN
6 8 0.56060280 0.3496210 0.111945403
7 9 0.27328240 0.3071343 0.114579057
8 10 -0.04149429 0.3781828 0.030487316
9 NaN NaN NaN NaN
10 NaN NaN NaN NaN
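As a stopgap until the padding is fixed, one can re-key the rows by their time column. `align_output` is a hypothetical helper, and it assumes the desired layout is one row per t = 1..n:

```r
# Re-align model_output so row position matches the time index t, dropping
# the unindexed NaN padding rows described above (workaround sketch).
align_output <- function(model_output, n) {
  good <- model_output[is.finite(model_output$time), ]
  out  <- data.frame(time = 1:n, obs = NA_real_, pred = NA_real_,
                     pred_var = NA_real_)
  out[good$time, c("obs", "pred", "pred_var")] <-
    good[, c("obs", "pred", "pred_var")]
  out
}

# Example with a fragment shaped like the output above:
mo <- data.frame(time     = c(3, 4, NaN),
                 obs      = c(0.816, 0.956, NaN),
                 pred     = c(0.966, 0.914, NaN),
                 pred_var = c(0.0004, 0.0072, NaN))
align_output(mo, 6)   # rows 1, 2, 5, 6 are NA; rows 3-4 carry the predictions
```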

Smap "Error in model$set_time_series(time_series)"

Smap calculations keep failing with the error "Error in model$set_time_series(time_series)" despite the simplex working fine for the same data.

I'd be happy to share the dataset for anyone interested in testing.

Different sampling between legacy and current version?

Maybe this was addressed in NEWS, but I came across different cross-mapping behavior between v0.7.5 (found in ha0ye's repository) and the latest version (v1.9.x). From what I can understand of the code, a different sampling algorithm seems to be used for random sampling without replacement. This causes high variability in the confidence interval of the cross-mapping skill.

I'm using paramecium_didinium as an example

library(rEDM)
library(dplyr)
library(ggplot2)

data <- paramecium_didinium
data <- data %>% 
  mutate(paramecium = (paramecium - mean(paramecium)) / sd(paramecium),
         didinium   = (didinium - mean(didinium)) / sd(didinium))

Behavior with 1.9.x

ccm_1.9 <- CCM(dataFrame = data, E = 5, 
               columns = "didinium", target = "paramecium", 
               libSizes = "12 67 5", sample = 100,
               seed = 123, includeData = TRUE)

# median and confidence intervals
ccm_1.9 <- ccm_1.9$CCM1_PredictStat %>% 
    group_by(LibSize) %>%
    summarise(rho_0.025 = quantile(rho, probs = 0.025),
              rho_0.500 = quantile(rho, probs = 0.500),
              rho_0.975 = quantile(rho, probs = 0.975))

# Plot
ggplot(ccm_1.9, aes(x = LibSize)) +
    geom_ribbon(aes(ymin = rho_0.025, ymax = rho_0.975), alpha = 0.5) + 
    geom_line(aes(y = rho_0.500)) +
    coord_cartesian(ylim = c(0.3, 0.9))

Behavior with v0.7.5

ccm_0.7 <- ccm(block = data, E = 5, 
             lib_column = "didinium", target_column = "paramecium", 
             lib = c(1, 67), pred = c(1, 67),
             lib_sizes = seq(12, 67, by = 5), num_samples = 100,
             RNGseed = 123, silent = TRUE)

# median and confidence intervals
ccm_0.7 <- ccm_0.7 %>%
  rename(LibSize = lib_size) %>%
  group_by(LibSize) %>%
    summarise(rho_0.025 = quantile(rho, probs = 0.025),
              rho_0.500 = quantile(rho, probs = 0.500),
              rho_0.975 = quantile(rho, probs = 0.975))

# plot
ggplot(ccm_0.7, aes(x = LibSize)) +
    geom_ribbon(aes(ymin = rho_0.025, ymax = rho_0.975), alpha = 0.5) + 
    geom_line(aes(y = rho_0.500)) +
    coord_cartesian(ylim = c(0.3, 0.9))

I'm trying to implement this on a delayed ccm, but there the differences between the two versions are more pronounced. Do you have any suggestions/comments about the sampling method?

Thanks!
Juan

CCM Parallel Processing

Regarding the question of how to multiprocess with CCM, a set of R functions that do this in various ways using foreach %dopar% and clusterApply are shown here.

Please note that since CCM is multithreaded, computing both forward and reverse mappings, the number of cores applied to these functions should be less than half the number of cores available.

Also note that mclapply should not be used with multithreaded functions/applications such as CCM, Multiview, EmbedDimension, PredictNonlinear, or PredictInterval.
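For reference, the foreach pattern mentioned above might look like the following sketch. This is an assumed usage, not the functions linked in the issue; it applies the half-core rule via detectCores() and distributes library sizes across workers:

```r
library(rEDM)
library(foreach)
library(doParallel)

# Register at most half the cores, since each CCM call runs two threads
cl <- makeCluster(max(1, parallel::detectCores() %/% 2))
registerDoParallel(cl)

# One CCM call per library size, combined row-wise into one data.frame
libSizes <- c("20 20 1", "40 40 1", "60 60 1")
res <- foreach(ls = libSizes, .packages = "rEDM", .combine = rbind) %dopar%
  CCM(dataFrame = paramecium_didinium, E = 3,
      columns = "didinium", target = "paramecium",
      libSizes = ls, sample = 50)

stopCluster(cl)
res
```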

error in installation

A recurring error appears when I try to install the package using
devtools::install_github("SugiharaLab/rEDM")

The last several lines of output are pasted below; any thoughts?

PS: installing from CRAN works, but library(rEDM) then fails.


make[1]: Leaving directory '/c/Users/ZHANG/AppData/Local/Temp/RtmpKc06eT/R.INSTALL2bc0341f380e/rEDM/src-x64/cppEDM/src'
C:/rtools40/mingw64/bin/g++ -shared -s -static-libgcc -o rEDM.dll tmp.def CCM.o ComputeError.o DataFrame.o Embed.o EmbedDim.o Multiview.o PredictInterval.o PredictNL.o RcppEDMCommon.o RcppExports.o SMap.o Simplex.o -L./cppEDM/src/ -lEDM -LD:/PROGRA1/R/R-4.0.2/bin/x64 -lRlapack -LD:/PROGRA1/R/R-4.0.2/bin/x64 -lR
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ./cppEDM/src//libEDM.a: error adding symbols: archive has no index; run ranlib to add one
collect2.exe: error: ld returned 1 exit status
no DLL was created
ERROR: compilation failed for package 'rEDM'
