lindeloev / mcp
Regression with Multiple Change Points
Home Page: http://lindeloev.github.io/mcp
Add the functionality to fix some parameter values rather than having them inferred. Since this is just a prior that puts 100% of its mass on a single value, it should be done in prior. Fixed values have to be numerical:
prior = list(
int_1 = "dnorm(0, 1)",
cp_1 = 10,
int_2 = 20,
cp_2 = "dnorm(40, 10) T(cp_1, )"
)
mcp(segments, data, prior)
I think fixed parameters should still be included in summaries, so that summaries always express the full model.
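A minimal sketch of how numeric prior entries could become fixed (deterministic) JAGS nodes while character entries stay stochastic. The helper prior_to_jags() is hypothetical and not part of mcp; it only illustrates the translation.

```r
# Hypothetical helper (not part of mcp): translate a prior list into JAGS lines.
# Numeric entries become fixed, deterministic nodes; strings stay stochastic.
prior_to_jags <- function(prior) {
  vapply(names(prior), function(name) {
    value <- prior[[name]]
    if (is.numeric(value)) {
      sprintf("%s <- %s", name, format(value))  # fixed: a "100% prior"
    } else {
      sprintf("%s ~ %s", name, value)          # inferred: sampled from its prior
    }
  }, character(1), USE.NAMES = FALSE)
}

prior <- list(
  int_1 = "dnorm(0, 1)",
  cp_1 = 10,
  int_2 = 20,
  cp_2 = "dnorm(40, 10) T(cp_1, )"
)
cat(prior_to_jags(prior), sep = "\n")
```

Because deterministic nodes can still be monitored in JAGS, fixed values could be carried into summaries alongside the inferred ones.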
This should be possible:
mcp(..., variance = ~ ma(1)). The parameter will be named ma1.
mcp(..., variance = ~ ar(3)). Parameters will be named ar1, ar2, ar3.
mcp(..., variance = ~ arma(2)); same as mcp(..., variance = ~ ma(2) + ar(2)). Parameters will be named ma1, ma2, ar1, ar2.
As with other effects in #56, it would apply to the whole model.
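A small sketch of the proposed naming scheme and of what an AR(p) residual process computes. The helpers arma_par_names() and ar_residuals() are hypothetical; stats::filter() with method = "recursive" is base R.

```r
# Hypothetical helper: parameter names implied by ar(p) and ma(q) terms,
# ordered as in the proposal (ma terms first, then ar terms).
arma_par_names <- function(ar = 0, ma = 0) {
  # sprintf() returns character(0) for zero-length input, unlike paste0()
  c(sprintf("ma%d", seq_len(ma)), sprintf("ar%d", seq_len(ar)))
}

# AR(p) residuals: e[t] = innovation[t] + ar1 * e[t-1] + ... + arp * e[t-p].
# stats::filter(method = "recursive") computes this recursion, starting from
# zeros. It is called with stats:: because dplyr masks filter().
ar_residuals <- function(innovations, ar) {
  as.numeric(stats::filter(innovations, filter = ar, method = "recursive"))
}

arma_par_names(ar = 2, ma = 2)      # "ma1" "ma2" "ar1" "ar2"
ar_residuals(c(1, 0, 0), ar = 0.5)  # 1.00 0.50 0.25
```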
mcp already supports loo. Maybe we should also support more classical hypothesis tests, like brms::hypothesis. This should be straightforward using tidybayes::tidy_draws() and eval(parse(text = hypothesis)).
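A minimal sketch of the eval(parse(text = hypothesis)) idea, assuming the draws arrive as a data frame with one column per parameter (the shape tidybayes::tidy_draws() produces). test_hypothesis() is a hypothetical name, and the returned odds are only similar in spirit to the evidence ratio brms reports for directional hypotheses.

```r
# Hypothetical sketch: evaluate a hypothesis string against posterior draws.
# `draws` is a data frame with one column per parameter.
test_hypothesis <- function(draws, hypothesis) {
  holds <- eval(parse(text = hypothesis), envir = draws)  # TRUE/FALSE per draw
  p <- mean(holds)
  # Posterior probability and posterior odds of a directional hypothesis
  c(p = p, odds = p / (1 - p))
}

draws <- data.frame(cp_1 = c(10, 20, 30, 40), int_1 = c(1, 2, 3, 4))
test_hypothesis(draws, "cp_1 > 15")  # p = 0.75, odds = 3
```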
_nolik, e.g., cp_1_nolik and cp_2_id_nolik[5]?
Test accuracy of fits:
rstanarm, and loo and waic too.
rstanarm or lme4 fit individual segments and compare fits. If reasonably vague priors and reasonably well-defined change points are used, they should be practically identical.
vdiffr package which ggplot2 uses.
I suspect this has to do with the tidybayes version.
Reprex below.
library(mcp)
my_data <- data.frame(
x = 1:50,
y = c(
rep(30, 25) * abs(rnorm(25)),
rep(30, 25) * -abs(rnorm(25))
)
)
# Define segments
segments <- list(
y ~ 1 + x, # Intercept
1 ~ 1 # Intercept
)
# Start sampling
fit <- mcp(segments, my_data, cores = 1)
#> Compiling data graph
#> Resolving undeclared variables
#> Allocating nodes
#> Initializing
#> Reading data back into data table
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 50
#> Unobserved stochastic nodes: 5
#> Total graph size: 631
#>
#> Initializing model
#>
#> user system elapsed
#> 2.2 0.5 2.7
summary(fit)
#> Warning: unnest() has a new interface. See ?unnest for details.
#> Try `df %>% unnest(c(.lower, .upper))`, with `mutate()` if needed
#> # A tibble: 5 x 4
#> name mean .lower .upper
#> <chr> <dbl> <dbl> <dbl>
#> 1 cp_1 25.5 24.0 27.0
#> 2 int_1 21.4 7.54 35.1
#> 3 int_2 -24.4 -31.2 -17.7
#> 4 sigma 16.9 13.3 20.5
#> 5 x_1 -0.354 -1.29 0.652
# Plot fit
plot(fit)
#> Error in spread_draws_long_(tidy_draws(model), variable_names, dimension_names, : No variables found matching spec: draws
plot(fit, "combo")
plot(fit, "overlay")
#> Error in spread_draws_long_(tidy_draws(model), variable_names, dimension_names, : No variables found matching spec: draws
packageVersion("tidybayes")
#> [1] '1.1.0'
Allow parameters to be specified as relative or absolute. Relative parameters represent changes from the former segment, while absolute parameters correspond to "classical" piecewise linear regression. Relative parameters will be useful in cases where the changes are more meaningful than the absolute values. I think that absolute parameters should be the default. Changing to relative parameterization could be done like this:
segments = list(
y ~ 1 + x,
rel(1) ~ rel(1) + rel(x))
All parameters in the first segment must be absolute, so fail if rel is used there. But a change in relative intercept following a slope-only segment is meaningful enough and should work.
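A minimal sketch of the additive relative-to-absolute conversion, including the rule that the first segment must be absolute. rel_to_abs() is hypothetical, and it ignores how a relative intercept after a slope-only segment would be resolved.

```r
# Hypothetical sketch: convert per-segment values, some flagged as relative,
# into absolute values. A relative value is added to the previous segment's
# absolute value, so the first segment must be absolute.
rel_to_abs <- function(values, is_rel) {
  if (is_rel[1]) stop("All parameters in the first segment must be absolute.")
  out <- values
  for (i in seq_along(values)[-1]) {
    if (is_rel[i]) out[i] <- out[i - 1] + values[i]
  }
  out
}

# Intercepts: absolute 10, then +5 relative, then -2 relative
rel_to_abs(c(10, 5, -2), is_rel = c(FALSE, TRUE, TRUE))  # 10 15 13
```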
rel on RHS
rel on LHS
rel in segment 1
Do this by adding print.mcplist and print.mcpstr methods and corresponding classes to the following:
Lists:
fit$pars
fit$prior
fit$model
attr(fit$data$y, "simulated")
Code:
fit$jags_code. Users should not need to do cat(fit$jags_code).
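A minimal S3 sketch of the print.mcpstr idea: a character string tagged with a class whose print method writes it out line by line. The constructor as_mcpstr() is a hypothetical name.

```r
# Hypothetical sketch: tag a character string with class "mcpstr" and give it
# a print method that writes it out, so typing fit$jags_code shows readable code.
as_mcpstr <- function(x) structure(x, class = c("mcpstr", "character"))

print.mcpstr <- function(x, ...) {
  writeLines(unclass(x))  # newlines in the string become real line breaks
  invisible(x)
}

jags_code <- as_mcpstr("model {\n  y ~ dnorm(0, 1)\n}")
jags_code  # prints three lines instead of one escaped string
```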
The datasets ex_demo, ex_ar, etc. clutter up the namespace and the documentation. Move them to a function, just like https://github.com/stan-dev/posterior/blob/master/R/example_draws.R.
So mcp_example("demo") returns a list with the fields:
$model: the model used to generate the data.
$data: the data.
$fit: an mcpfit. Defaults to NULL.
A typical example in the README would then look like this:
model = list(
....,
....
)
fit = mcp(model, mcp_example("demo")$data)
or
ex = mcp_example("demo")
ex$model # Show it
fit = mcp(ex$model, ex$data)
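A minimal sketch of what mcp_example() could look like. The two-segment "demo" model and its true values are made up for illustration; the true values are stored via attr(..., "simulated"), which is one of the undecided options discussed below.

```r
# Hypothetical sketch of mcp_example(): one function returning $model, $data
# and $fit instead of one exported dataset per example.
mcp_example <- function(name) {
  examples <- list(
    demo = function() {
      model <- list(
        response ~ 1,  # plateau
        ~ 0 + time     # joined slope
      )
      # Made-up true values used to simulate the data
      simulated <- list(int_1 = 10, cp_1 = 30, time_2 = 0.5, sigma_1 = 4)
      time <- 1:100
      mu <- simulated$int_1 + simulated$time_2 * pmax(0, time - simulated$cp_1)
      data <- data.frame(time = time, response = rnorm(100, mu, simulated$sigma_1))
      attr(data$response, "simulated") <- simulated  # keep true values with data
      list(model = model, data = data, fit = NULL)
    }
  )
  if (!name %in% names(examples)) stop("Unknown example: ", name)
  examples[[name]]()
}

ex <- mcp_example("demo")
ex$model  # Show it
```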
Undecided:
$simulated) or just have the user extract them via attr($data$response, "simulated").
ex("demo")) but less semantically clear.
TO DO:
mcp_example() doc
Hey! Great package! Initially I had trouble installing because some of my packages were outdated, but after updating everything it ran pretty smoothly.
It took me a bit to understand the logic of the list. I think a simple comment in the quick-start would fix this (e.g. "between each entry (a "segment") of the list, a changepoint is modelled"). After understanding this, the toolbox was intuitive to use.
readme: sampling the prior:
empty = mcp(segments, sample=FALSE)
Here it is implicitly assumed that segments defines "x" somehow.
I'm a bit confused about what the underlying model is. I get discrete changepoints, but at points where there are no samples. This is still confusing me tbh.
The rel(1) command lets you parameterize a parameter relative to the previous segment. Is it also possible to parameterize it relative to any other segment? I am thinking of a situation where two changepoints define a plateau that is different, after which the series goes back to the initial value, i.e.:
In this example, I might want to assume that the first and last segment / plateau have identical parameters (or at least put a prior saying that the difference is quite small).
Pretty good job! It worked fine for me so far. I only ran it on simulated data; I still have to check it on real data :-) My problem with real data is that I usually have strong autocorrelation, i.e. changes are not really discrete hinges but smoothed over time. I guess one could fit plateaus & slopes, but there would still be no smoothness in the fit.
There should be an mcp(..., sigma = ~ [formula]) argument to set it up for the whole model. The default should be mcp(..., sigma = ~ 1). The estimated sigma should just be called sigma instead of sigma_1. If any sigma() is specified in the segments, the mcp argument would apply up to that point, but the parameter would be called sigma_1, etc.
The downside is that it is redundant with specifying + sigma(1) in segment 1 (which is already the default if not done explicitly). Having two ways of doing the same thing is confusing.
I judge that the advantages outweigh this downside, though:
brms and others.
mcp(..., autocor).
One thing to decide: sigma = ~ 1 or sigma = ~ sigma(1)? The latter is more verbose, but more consistent with usage in the formulas and the upcoming autocor.
Would be great to:
dens_overlay on the x-axis
cp_1 is red, cp_2 is blue, etc.)
dens_overlay to inspect convergence.
This would bring the non-normal nature of the change point posteriors to the fore. It would also constitute an in-your-face inspection of convergence issues.
I would like it to be the default, but you should be able to turn it off, e.g., plot(fit, show_dens = FALSE). I need to think of a good argument name here.
binomial() and bernoulli()
gaussian()
poisson()?
The hard part is coming up with priors where these don't fail too often.
Implementing #90, I now see that rel() becomes cumbersome to support henceforth (it already was) and that it becomes somewhat ambiguous for categorical predictors. Users seem to misunderstand it anyway.
prior = list(groupX_2 = "groupX_1 + DEFAULT", catY_3 = "catY_1 + dnorm(0, 1) T(0, )"). The returned parameter should be the distribution part, so the JAGS code needs to be like catY_1_return ~ dnorm(0, 1) T(0, ); catY_1 = catY_1 + catY_3
rel().
An advantage of this approach is that it expands functionality: it allows relativity across segments (not just the former), it allows percentage-wise relativity (divide), etc.
mcp 2.0 will support stan in addition to JAGS. It is far in the future, but this issue collects working points.
stan model, pass data, and sample it.
bridgesampling-based Bayes Factors
mcp(model, data, backend = "stan")). Otherwise, the dependencies would be quite heavy for non-JAGS and non-stan users.
stan samples more effectively using a continuous step function, e.g., as in this post.
I tried basic rjags examples to ensure that JAGS is running properly, and they work fine (the model compiled in seconds).
Then I experimented with mcp, but running mcp() doesn't seem to kick off JAGS at all: it keeps loading without any message. This was tried on both R 3.5.1 and 3.6.0.
I killed the run and the following message showed:
> library(mcp)
>
> my_data <- data.frame(x = 1:50,
+ y = c(rep(30, 25) * abs(rnorm(25)),
+ rep(30, 25) * -abs(rnorm(25))))
>
> # Define segments
> segments = list(
+ y ~ 1 + x, # Intercept
+ 1 ~ 1 # Intercept
+ )
> # Start sampling
> fit = mcp(segments, my_data)
Warning message:
In system(cmd, wait = FALSE, input = "") :
'CreateProcess' failed to run 'C:\Users\Public\Data\R\R-35~1.1\bin\x64\Rscript.exe --default-packages=datasets,utils,grDevices,graphics,stats,methods -e "parallel:::.slaveRSOCK()" MASTER=localhost PORT=11743 OUT="/dev/null" SETUPTIMEOUT=120 TIMEOUT=2592000 XDR=TRUE'
>
Session info
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mcp_0.1 rjags_4-9 coda_0.19-2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2.2 rstudioapi_0.10.0-9002 magrittr_1.5 tidyselect_0.2.5
[5] munsell_0.5.0 colorspace_1.4-1 lattice_0.20-38 R6_2.4.0
[9] rlang_0.4.0 stringr_1.4.0 dplyr_0.8.3 tools_3.5.1
[13] parallel_3.5.1 grid_3.5.1 packrat_0.5.0 gtable_0.3.0
[17] loo_2.1.0 ellipsis_0.3.0 matrixStats_0.54.0 lazyeval_0.2.2
[21] assertthat_0.2.1 lifecycle_0.1.0 tibble_2.1.3 crayon_1.3.4
[25] tidyr_1.0.0 purrr_0.3.2 ggplot2_3.2.1 vctrs_0.2.0.9001
[29] zeallot_0.1.0 glue_1.3.1 stringi_1.4.3 compiler_3.5.1
[33] pillar_1.4.2 backports_1.1.5 scales_1.0.0 pkgconfig_2.0.3
This should work:
segments = list(
y ~ 1 + x + I(x^2.4),
1 ~ 0 + log(x)
)
The best solution would be to allow anything that can run in JAGS. See user manual page 42.
The README is growing quite big. Write more focused vignettes and link to them:
fit$func_y and fit$jags_code.
JAGS uses precision (1/sigma^2) instead of SD as the spread parameter in dnorm, dcauchy, and dt. This is confusing. Allow users to specify SD instead:
prior = list(
cp_1 = "dnorm(10, 5)",
int_2 = "dcauchy(1, 0.2)"
)
should generate JAGS code
cp_1 ~ dnorm(10, 1/5^2)
int_2 ~ dcauchy(1, 1/0.2^2)
If users want to specify precision, they can do this:
prior = list(
cp_1 = "dnorm(10, sqrt(1/5))",
int_2 = "dcauchy(1, sqrt(1/0.2))"
)
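A minimal regex-based sketch of the SD-to-precision rewrite. sd_to_precision() is hypothetical; it only rewrites the second argument of dnorm(), dt() and dcauchy(), and tolerates one level of parentheses in that argument (e.g. sqrt(1/5)). It writes 1/(5)^2 rather than 1/5^2 so compound arguments such as a + b stay intact; the value is the same.

```r
# Hypothetical sketch: rewrite SD-parameterized priors into JAGS precision.
# Only the second argument of dnorm(), dt() and dcauchy() is changed.
sd_to_precision <- function(prior) {
  pattern <- "\\b(dnorm|dt|dcauchy)\\(([^,]+),\\s*([^,()]+(?:\\([^()]*\\))?)"
  gsub(pattern, "\\1(\\2, 1/(\\3)^2", prior, perl = TRUE)
}

sd_to_precision("dnorm(10, 5)")          # "dnorm(10, 1/(5)^2)"
sd_to_precision("dt(10, 30, 1) T(0, )")  # "dt(10, 1/(30)^2, 1) T(0, )"
sd_to_precision("dnorm(10, sqrt(1/5))")  # "dnorm(10, 1/(sqrt(1/5))^2)"
```

The last case shows the precision escape hatch described above: an SD of sqrt(1/5) becomes a precision of 5.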
Look to brms for inspiration:
summary(fit)
summary(fit) with info about model, convergence, etc.
fitted(fit). This is already done in plot.mcpfit. Could also be predict(fit).
ranef(fit) would also be nice.
This should be fairly simple to do. Just run the JAGS model without data and the predictive formula, and collect samples. Two ideas:
mcp(segments, data = "prior")
Although philosophically sound (the prior acts in the same way as data), it does look kind of weird. Another option is to take a dedicated argument like brms::brm:
mcp(segments, sample_prior = TRUE)
Perhaps these samples should be stored in fit$prior_samples. Down the line, they could be useful for computing Bayes Factors of point-null models.
I discarded a third option immediately, because users who forget to provide data would get very confused that the model "ignores" it. The idea is to take the lack of a data argument to mean that only priors should be sampled (because sample == TRUE by default):
mcp(segments)
Family-specific functions are currently scattered over several locations in the code and dealt with in an if-else like fashion. This hard-coding of response families makes it harder to implement new ones and maintain existing ones.
I propose storing all of this in an mcpfamily() object which extends gaussian(), etc.
mcpfamily()
mcpfamily()
mcpfamily() (used in fit$simulate()).
dpar) for each family, e.g., c("mu", "sigma") for gaussian(), and build regression models for each of these. Use intercept-only models for all that are not explicitly included in the formulas.
ar(), weight(), etc.
This is relevant for #89. Once implemented, support for stan will also be easier (#100).
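A minimal sketch of an mcpfamily() that extends a stats family object with the distributional parameters (dpars) mcp would model. The function, the dpars field and the supported list are hypothetical; bernoulli() is not in stats and is left out.

```r
# Hypothetical sketch: extend a stats family object with the distributional
# parameters (dpars) that mcp would build regression models for.
mcpfamily <- function(family) {
  dpars <- switch(family$family,
    gaussian = c("mu", "sigma"),
    binomial = "mu",
    poisson = "mu",
    stop("Family not supported: ", family$family)
  )
  family$dpars <- dpars
  class(family) <- c("mcpfamily", class(family))
  family
}

fam <- mcpfamily(gaussian())
fam$dpars                # "mu" "sigma"
fam$linkinv(0.3)         # identity link, so 0.3
inherits(fam, "family")  # TRUE: still usable where a family is expected
```

Keeping the original family class means link functions like linkinv() remain available in one place instead of being hard-coded per family.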
Hi there,
Can I use glm.nb for the regression?
Thank you!
Hello Jonas Kristoffer Lindeløv,
Great work on a package which serves a very good purpose and which I am sure will see plenty of uptake. This is a question: I would like to extract the data behind the posterior draws/lines in the plot.mcpfit function. Is there a user-exposed function which can create these data by passing the mcpfit object and a few arguments? Many thanks,
Stuart.
Reprex below.
The tweak for the plot() method suggested in #30 is not quite perfect, as it generates the following error:
library(mcp)
# Define the segments that are separated by change points
segments = list(
score ~ 1 + year, # intercept + slope
1 ~ 0 + year, # joined slope
rel(1) ~ 0, # joined plateau starting at relative change point
1 ~ rel(1) # disjoined plateau with relative intercept parameterization
)
# Get an mcpfit object without samples
empty = mcp(segments, sample=FALSE)
# Now use empty$func_y() to generate data from this model.
# Set some parameter values to your liking:
data = data.frame(
year = 1:100, # Evaluate func_y for each of these
score = empty$func_y(
year = 1:100, # x
sigma = 12, # standard deviation
cp_1 = 20, cp_2 = 35, cp_3 = 80, # change points
int_1 = 20, int_4 = 20, # intercepts
year_1 = 3, year_2 = -2 # slopes
)
)
prior = list(
year_1 = "dnorm(0, 5)", # Slope of segment 1
int_1 = "dt(10, 30, 1) T(0, )", # t-distributed prior. Truncated to be positive.
cp_2 = "dunif(0, 60)", # Change point is after the first, but within 60 years.
year_2 = -2 # Fixed slope of segment 2.
)
fit = mcp(segments, data, prior)
# adapted from plot.mcpfit
my_custom_plot(fit, "overlay")
Error: (list) object cannot be coerced to type 'double'
One more tweak to make it work; please refer to the code marked with # <<< below:
[...]
# First, let's get all the predictors in shape for func_y
Q = Q %>%
tidyr::expand_grid(!!x$pars$x := eval_at) %>% # correct name of x-var
# Add fitted draws (vectorized)
dplyr::mutate(!!x$pars$y := purrr::invoke(func_y, ., type = "fitted")) %>%
# Add line ID to separate lines. Mark a new line when "eval_at" repeats.
dplyr::mutate(
line = !!sym(x$pars$x) == min(eval_at), # <<<
line = cumsum(line)
)
[...]
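A small standalone check of the line-ID trick in the tweak above: the comparison is TRUE at the start of each draw's sequence of x values, and cumsum() turns those TRUEs into increasing line IDs. This assumes, as the tweak does, that every draw is evaluated over eval_at starting at min(eval_at).

```r
# Two draws evaluated at the same x values, stacked as in the tweak above
eval_at <- c(1, 2, 3)
x <- rep(eval_at, times = 2)

# A new line starts every time x returns to min(eval_at)
line <- cumsum(x == min(eval_at))
line  # 1 1 1 2 2 2
```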
fit$prior and in jags_code.
This should work:
segments = list(
y | trials(N) ~ 1 + x,
1 ~ 0)
mcp(segments, data, prior, family = binomial())
This means that all parameters are on rates/probabilities rather than observed values. family = binomial() will need to take an additional column to specify the number of trials. I have a hard-coded model working with binomial, so it is very doable.
Reprex below.
Should the user initialize where the change point may be in the data?
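A minimal sketch of the expected number of successes for the segments above, assuming for illustration that the segments act on the logit scale (the default link of binomial()) and that N holds the number of trials per row. binomial_mean() and the parameter values are hypothetical.

```r
# Hypothetical sketch for y | trials(N) ~ 1 + x, followed by 1 ~ 0
# (a joined plateau). Assumes the logit link of binomial().
binomial_mean <- function(x, N, int_1, x_1, cp_1) {
  eta <- int_1 + x_1 * pmin(x, cp_1)  # slope until cp_1, then flat (joined)
  N * plogis(eta)                     # expected number of successes out of N
}

x <- c(0, 5, 10, 20)
binomial_mean(x, N = 10, int_1 = 0, x_1 = 0.2, cp_1 = 10)
```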
library(mcp)
my_data <- data.frame(x = 1:50,
y = c(1:25 * 3 + abs(rnorm(25)),
1:25 * -3 + abs(rnorm(25) + 75)))
plot(x = my_data$x, y = my_data$y)
segments = list(
y ~ 1 + x, # intercept + slope
1 ~ 0 + x # joined slope
)
fit = mcp(segments, my_data, cores = 1)
#> Compiling data graph
#> Resolving undeclared variables
#> Allocating nodes
#> Initializing
#> Reading data back into data table
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Deleting model
#> Error in rjags::jags.model(model, data, n.chains = n.chains, n.adapt = n.adapt, : RUNTIME ERROR:
#> Compilation error on line 39.
#> Unknown variable cp_2
#> Either supply values for this variable with the data
#> or define it on the left hand side of a relation.
Created on 2019-10-27 by the reprex package (v0.2.1)
I have a question regarding how to configure relative slope in a formula form.
I read the documentation and it seems we can use "rel" to fit the relative change, so I did the same for my time series use case, as follows.
But it gives me an upward slope, whereas based on our domain knowledge it should show a downward trend. So I inspected the "jags_code". It seems the relative part is modeled as "(x_i-1 + x_i)". What I would like is to model the relative change of the previous slope as a multiplier.
Thanks to the "custom_jags" argument, I tweaked the "jags_code" a little by specifying a multiplier parameter "f_2" that follows a uniform distribution ([0.1, 1]) and constructing the last slope as "x_2 = f_2*x_1" to replace the previous "(x_2+x_1)" component. That gives me a much better fit, in the sense of being consistent with our prior knowledge.
I'm wondering whether this is a bug in fitting with a relative slope, as I'm not sure how the additive formulation (x_i-1 + x_i) can help extrapolate the relative change. Or do you have any other suggestions for specifying my goal in a formula form (so I don't need to tweak the JAGS code)?
Thanks very much!
As discussed in the last part of this article, mcp can be "hacked" to model change points in regions with no data. This is useful when forecasting. A more user-friendly API should be included.
This is a spin-off of #78 where some details are already discussed.
Include optional random effects for all parameters. Here is a great guide for JAGS and stan
Or add a varying slope too:
list(y~ 1 + x,
(1|id) ~ 0 + (1|x))
Keep it intercept-only at first with uncorrelated random effects. Think about whether to allow for fixed and random of the same parameter (mean-centering random):
list(y~ 1 + x,
1 + (1|id) ~ 0 + x+ (1|x))
Hi!
I was wondering if you could help me with this. When I run this code:
model = list(
acc | trials(N) ~ 1,
1 + (1|id) ~ 0 + numerosity)
fit<- mcp(model, data = df, family = binomial())
summary(fit)
ranef(fit)
I get the following error:
Error in UseMethod("ranef") :
no applicable method for 'ranef' applied to an object of class "mcpfit"
Before that I tried using the following code with a slightly different dataset so I could use the bernoulli family:
model = list(
acc ~ 1,
1 + (1|id) ~ 0 + numerosity
)
fit<- mcp(model, data = df, family = bernoulli())
summary(fit)
ranef(fit)
I got the same error.
Thanks in advance!!
Saw the announcement of the new release on Twitter, looks awesome!
I had two questions about tidybayes integration:
I see you're using tidy_samples instead of implementing a generic for tidybayes::tidy_draws... Are there limitations of the tidy_draws interface that motivate this choice, or is there a difference in semantics? If there are changes in tidybayes that would make it possible to make the interfaces consistent, I'd be open to it.
Any interest in implementing add_fitted_draws/add_predicted_draws/etc.? I haven't come up with a good way to make that easy on package developers yet, but if there was interest on your part, this might be the opportunity to think through how to make that happen. There is also a feature discussion for {posterior} related to generics for these kinds of functions (stan-dev/posterior#39) that I will probably be basing future implementations of add_fitted_draws/add_predicted_draws/etc. on; having your input there might be helpful too.
It currently draws using the fitted values y_[i]. This bloats fit$samples and is very slow in every way.
Instead, we need to make a function which respects fit$segments, like
fit$func = function(x, cp_1, int_1, x_1, x_2) {
...
}
... and use that for plotting and possibly prediction by running it for some/all MCMC draws. fit$func may be useful for other purposes as well.
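A minimal sketch of such a function for the hypothetical segments list(y ~ 1 + x, ~ 0 + x), i.e. an intercept and slope until cp_1 followed by a new slope joined at cp_1, with x measured from 0 for simplicity. It is then evaluated per MCMC draw instead of storing y_[i] in the samples; the draws are made up.

```r
# Hypothetical fit$func for list(y ~ 1 + x, ~ 0 + x)
func <- function(x, cp_1, int_1, x_1, x_2) {
  int_1 + x_1 * pmin(x, cp_1) + x_2 * pmax(0, x - cp_1)
}

# Made-up MCMC draws, one row per draw
draws <- data.frame(cp_1 = c(10, 12), int_1 = c(1, 0), x_1 = c(2, 2), x_2 = c(-1, -1))
x <- c(0, 10, 20)

# One fitted line per draw: rows are draws, columns are x values
fitted_lines <- t(sapply(seq_len(nrow(draws)), function(i) {
  with(draws[i, ], func(x, cp_1, int_1, x_1, x_2))
}))
fitted_lines
```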
The log-likelihood is currently computed during sampling. This is >90% of the size of the fit object and may slow sampling too. It's only used for loo and waic, so WWBD (What Would brms Do)?:
fit = mcp::add_loglik(fit)
which fills fit$loglik with a (Nchains*Nsample) x Ndata matrix.
add_loglik in loo and waic if is.null(fit$loglik).
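A minimal sketch of the post-hoc computation for a gaussian model, producing the draws-by-data-points layout that the matrix methods of loo::loo() and loo::waic() accept. loglik_matrix() is a hypothetical name, and it assumes the fitted values are already available as a (draws x data) matrix with one sigma per draw.

```r
# Hypothetical sketch of the work add_loglik() would do for a gaussian model:
# one row per MCMC draw, one column per data point, computed after sampling.
loglik_matrix <- function(y, mu_draws, sigma_draws) {
  stopifnot(ncol(mu_draws) == length(y), nrow(mu_draws) == length(sigma_draws))
  n_draws <- nrow(mu_draws)
  y_mat <- matrix(y, nrow = n_draws, ncol = length(y), byrow = TRUE)
  # sigma_draws is recycled down each column, i.e. one sigma per draw (row)
  matrix(dnorm(y_mat, mean = mu_draws, sd = sigma_draws, log = TRUE), nrow = n_draws)
}

ll <- loglik_matrix(y = c(0, 1), mu_draws = matrix(0, nrow = 3, ncol = 2), sigma_draws = c(1, 2, 3))
dim(ll)  # 3 draws x 2 data points
```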
As described in #5
Here are some ideas which could use some discussion and careful consideration. It extends the current model specification: https://lindeloev.github.io/mcp/articles/formulas.html
In the order from "soonish" (top) to "in your dreams":
Survival models are relatively simple and we should support them, including censoring. The API for the model itself would be something like in brms:
model = list(
eventtime | cens(status) ~ 0,
~ 0 + x
)
It should support both exponential decay and Cox proportional hazards. This would probably be specified via the mcp(..., family = ) argument, but I'm unsure what would be best.
If there are multiple (piecewise) lines over a single change point, and each line is associated with a different parameter x, we can use that to predict the change point. For example, we could assess subitizing in participants of varying age, and it would be reasonable to expect the subitizing range (location of first change point) to increase with age in childhood and decrease with age in adulthood.
How to implement this in a formula, I'm unsure. Maybe it has to be in the random effect, (1 + x | age), since that specifies the grouping of multiple lines. This would also ensure that the parameter names stay intact: cp_i, cp_i_age, and then probably cp_i_x.
Multiple response variables predicted from single change points. Something like
model = list(
c(y1, y2) ~ 1 + x, # segment 1
~ 0 # seg2: joined intercept
)
This could be merged with "Variance change (Specify y)" to specify a change in one response but not the other:
model = list(
c(y1, y2) ~ 1 + x, # segment 1
~ 0, # seg2: joined intercept on y1 and y2, segment 2
y1 ~ 1 ~ x # seg3: slope change in y1 but not y2
)
mcp currently supports AR(N) models. It should go more general. Take a look at:
I have a list of observations along a spatial axis that I would like to fit a simple intercept model to (as described here).
However, I am not equally confident in all of my observations, so I would like to add weights to account for uncertainty. I could do this, for instance, with ecp::e.divisive.
Is this something you are considering for mcp as well?
Otherwise, there is of course the possibility to explicitly model the generative process in JAGS, but that would remove the convenience of mcp.
The overview should include the yet-to-be-reviewed/tested packages mentioned at the top, and perhaps an additional example: https://lindeloev.github.io/mcp/articles/packages.html.
Also update:
strucchange::breakpoints does take time series and returns confidence intervals on the break points.
changepoint.geo. It reduces multivariate problems to angle+length in vector space and calls cpt.mean and cpt.var on this using PELT. So it's changepoint, but with a tweak to allow for high-dimensional data.
changedetection.
Update fit$func_y to support simulating and predicting autocorrelations. Update plot.mcpfit() appropriately too, including intervals.
Turning off would just be changing to 0th order:
segments = list(
y ~ 1 + x,
~ 0 + ar(1), # start AR(1)
~ 0 + ar(0) # Stop all AR
)
Implementation-wise, the "0" would just set zeros for the involved parameters, e.g., ar_[i_] = 0, i.e., leaving it purely to sigma to model the residuals.
First, fantastic package! Thank you.
When plotting the results of the model fit it can often be challenging to determine which posterior cp estimates (blue lines at bottom) correspond to the visual change points shown in the upper model curves. This is particularly difficult when one or more of the posterior estimates are bimodal, and worse still if overlapping :-(
If the individual CP posterior estimates were color-coded, things would be easier. You will note in my graph an attempt to identify the CP range using shaded regions and lines for the mean value.
model = list(
y ~ 1, # plateau (int_1)
~ 0 + x, # joined slope (time_2) at cp_1
~ 1 + x, # disjoined slope (int_3, time_3) at cp_2
~ 1 + x # disjoined slope (int_4, time_4) at cp_3
)
model.string <- paste(sapply(model, function(x) Reduce(paste, deparse(x))), collapse = ", ")
prior = list(
int_1 = 0, # Constant, not estimated
cp_1 = "dunif( 0, 200)", # has to occur in this interval
cp_3 = "dunif(300, 400)" # has to occur in this interval
)
I was happy to see that EnvCpt is included in your comparison documentation, thanks for including it.
In a couple of places you write "I suspect EnvCpt uses cpt.np() in the background" for the change in mean parts. I wanted to clarify that it is using cpt.meanvar() from the changepoint package. You will see that the changepoint.np package is not a dependency and therefore it could not use this in the background.
This explains the differences you see between EnvCpt and the cpt.mean() function.
Each segment should take an arbitrary number of linear predictors. As with the segmented package, the only requirement is that one continuous predictor (say, x) is the dimension of the change point. The change point is simply the value on x where the predictions of y change to a different regression model (parameter structure and/or values).
So this API should work. It has the following features:
model = list(
y ~ 1 + x*group + z + sigma(1 + group), # interactions and main effects and a covariate.
~ 0 + x + ar(2, z), # only one slope
~ 1 + group # a range of x where group is the only predictor
)
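The per-segment design matrices for a model like this could be built R-side with base R's model.matrix(), and each segment's linear predictor is then a matrix product, which is what inprod() computes row by row in JAGS. The data and coefficients below are made up; the sigma() and ar() terms are omitted, and column names such as groupmale and x:groupmale would still need translating into mcp's parameter naming.

```r
# Made-up data with one continuous x, a factor, and a covariate
data <- data.frame(
  x = c(1, 2, 3, 4),
  group = factor(c("female", "male", "female", "male")),
  z = c(0.5, 1.5, 2.5, 3.5)
)

X1 <- model.matrix(~ 1 + x * group + z, data)  # segment 1 design matrix
X2 <- model.matrix(~ 0 + x, data)              # segment 2: slope only
colnames(X1)

# Segment 1 linear predictor: one made-up coefficient per column of X1
beta_1 <- c(1, 0.5, -1, 0.2, 0.3)
mu_1 <- drop(X1 %*% beta_1)
```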
JAGS-wise, the indicator functions would be the same, but now we additionally pass design matrices (X1_, X2_, etc.) and use inprod() per segment. The model above would be something like:
# Priors for individual parameters
cp_1 ~ ... T(MINX, MAXX)
cp_2 ~ ... T(cp_1, MAXX)
int_1 ~ dnorm(0, 1^-2)
int_3 ~ ...
xGroupFemale_1 ~ dnorm(0, 1) # check R naming convention
z_1 ~ dunif(0, 100)
x_2 ~ dnorm(4, 3^-2)
xGroupFemale_3 ~ ...
# Model and likelihood
for(i_ in 1:length(x_)) {
y_[i_] = (x_[i_] > cp_0) * (int_1 + inprod(c(xGroupFemale_1, z_1), x1_[i_, ])) +
(x_[i_] > cp_1) * (0 + inprod(c(x_2), x2_[i_, ])) +
(x_[i_] > cp_2) * (int_3 + inprod(c(xGroupFemale_3), x3_[i_, ]))
response[i_] ~ dnorm(y_[i_], 1 / sigma_[i_]^2) # JAGS dnorm() takes a precision
}
where xi_ is a model matrix that is built R-side and x_ is par_x, along which change points are defined. Implementing this adds the following work points:
Intercept_i instead of int_i.
par_x if there is not exactly one continuous predictor.
lm and brms, but add _segmentnumber.
base::qr().
get_formula() to match the new segment table.
get_jagscode()
run_jags() and get_jags_data() to work with design matrices. One per (dpar-segment) combo.
sigma(1 + x * group) etc. (I think it will work out of the box).
plot_pars(), hypothesis(), summary(), fixef(), etc.
get_summary(): Translate between code parameter name and user-facing parameter name.
fit$simulate(fit, data, par1, par2, ...), i.e., add fit and replace par_x with data.frame/tibble.
data have the correct format. Set factor levels to match the original data.
fit$simulate() a wrapper around a lower-level fast function to use internally. Call it fit$.internal$simulate_vec(par_x, cp_1, ..., rhs_par1, rhs_par2, ...). Only the former should do asserts, call add_simulated(), etc.
ar()
simulate_vectorized() from all internal functions instead of fit$simulate().
fitted(), predict(), pp_check(), etc.
mcp_examples
mcp_examples?
plot(fit, facet_by = c("my_rhs", "my_varying_cp")). Still default to no facets.
color_by = c("my_categorical1", "my_categorical2"). It defaults to color_by = "all_categorical", i.e. all unique combinations of categorical levels on RHS. This will also set the grouping for spaghettis. I think that color_by should pertain solely to the RHS which share change points. Varying change points will not be accepted.
plot(fit, effects = "my_categorical1"). It's like brms::marginal_effects(). This should probably be implemented in tidy_samples().
plot(fit, filter = data.frame(my_categorical1 = c("levelA", "levelB"), my_categorical2 = "level1")). This is like brms::marginal_effects(), only filter using a data.frame replaces int_conditions, which is a named list. For variables in effects that are not in filter, all levels will be included. This should probably be implemented in tidy_samples().
plot_pars()
~exp(1 + x).
tidy_draws()
changepoint, ecp, bcp, segmented, strucchange, tsmcp, robts:changerob.
Reprex below.
Using ex_demo, there are match and sim columns in the mcpfit object, but not when using my_demo.
Is this driven by the Rhat or n.eff values?
library(mcp)
#> Warning: package 'mcp' was built under R version 3.5.3
# Define the model
model <- list(
response ~ 1, # plateau (int_1)
~ 0 + time, # joined slope (time_2) at cp_1
~ 1 + time # disjoined slope (int_3, time_3) at cp_2
)
# Fit it. The `ex_demo` dataset is included in mcp
fit <- mcp(model, data = ex_demo)
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 100
#> Unobserved stochastic nodes: 7
#> Total graph size: 1731
#>
#> Initializing model
#> Finished sampling in 7.5 seconds
fit
#> Family: gaussian(link = 'identity')
#> Iterations: 9000 from 3 chains.
#> Segments:
#> 1: response ~ 1
#> 2: response ~ 1 ~ 0 + time
#> 3: response ~ 1 ~ 1 + time
#>
#> Population-level parameters:
#> name match sim mean lower upper Rhat n.eff
#> cp_1 OK 30.0 30.57 22.57 37.96 1 355
#> cp_2 OK 70.0 69.78 69.28 70.24 1 5621
#> int_1 OK 10.0 10.27 8.77 11.59 1 1152
#> int_3 OK 0.0 0.46 -2.51 3.41 1 776
#> sigma_1 OK 4.0 4.01 3.46 4.62 1 4109
#> time_2 OK 0.5 0.54 0.41 0.67 1 379
#> time_3 OK -0.2 -0.22 -0.38 -0.04 1 741
my_demo <- structure(list(
response = c(
138.989, 97.232, 45.717, 25.919,
12.67, 39.103, 57.598, 39.518, 43.226, 2.374, 7.972, 6.779
),
time = 1:12
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(
NA,
-12L
))
fit <- mcp(model, data = my_demo)
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 12
#> Unobserved stochastic nodes: 7
#> Total graph size: 235
#>
#> Initializing model
#> Finished sampling in 0.9 seconds
fit
#> Family: gaussian(link = 'identity')
#> Iterations: 9000 from 3 chains.
#> Segments:
#> 1: response ~ 1
#> 2: response ~ 1 ~ 0 + time
#> 3: response ~ 1 ~ 1 + time
#>
#> Population-level parameters:
#> name mean lower upper Rhat n.eff
#> cp_1 2.2 1.0 5.1 1.3 110
#> cp_2 3.3 1.2 9.1 1.3 67
#> int_1 108.9 46.2 151.7 1.1 212
#> int_3 36.4 -1.4 69.0 1.1 306
#> sigma_1 24.7 12.7 41.8 1.1 236
#> time_2 -1.1 -12.9 11.3 1.0 1191
#> time_3 -2.0 -6.8 3.4 1.0 818
Give users something to toy with to get acquainted with mcp.
Should it be loaded with mcp, or should one run data("mcp_sim1")?
Once done, update the README and mcp examples to use these datasets.
Test that it does not break, and that the data structures are intact:
It seems there isn't a "predict" function. Please correct me if I'm wrong.
I think the predicted values would be generated from the model parameters in the last segment, correct?
#42 shows how the current resolution of default plots can be confusing. Simply make eval_at finer around change points, or fall back on adding a resolution argument to plot.mcpfit.
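A minimal sketch of a finer eval_at: a regular grid over the x range plus a denser grid around each change point (e.g. posterior means). make_eval_at() and its defaults are hypothetical, not an existing plot.mcpfit argument.

```r
# Hypothetical sketch: regular grid plus denser points around each change point
make_eval_at <- function(xmin, xmax, cps, n = 100, n_cp = 50, width = NULL) {
  if (is.null(width)) width <- (xmax - xmin) / 20  # half-width around each cp
  around_cp <- unlist(lapply(cps, function(cp) {
    seq(max(xmin, cp - width), min(xmax, cp + width), length.out = n_cp)
  }))
  sort(unique(c(seq(xmin, xmax, length.out = n), around_cp)))
}

eval_at <- make_eval_at(0, 100, cps = c(30, 70))
length(eval_at)
```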