helske / kfas

KFAS: R Package for Exponential Family State Space Models

R 61.44% C 1.95% Fortran 34.05% TeX 2.56%

r fortran time-series state-space exponential-family gaussian-models dynamic-linear-model

kfas's Introduction

Package KFAS provides tools for modelling exponential family state space models such as structural time series, ARIMA models, generalized linear models and generalized linear mixed models.

Paper at JSS

If you use KFAS in your paper, please cite it properly; see citation("KFAS") in R or the link to the paper above.

See also the bssm package if you are more interested in Bayesian inference on state space models.

Main features

Kalman filtering
Fixed interval smoothing (Kalman smoothing)
Simulation smoothing of Gaussian models
Importance sampling based inference of non-Gaussian models
Exact diffuse initialization
Sequential processing (univariate treatment of multivariate models)
Multivariate models with mixed distributions

Most of the algorithms are based on the book "Time Series Analysis by State Space Methods" and related articles by J. Durbin and S.J. Koopman.

KFAS is available on CRAN. You can install the latest development version from GitHub using the devtools package:

install.packages("devtools")
library(devtools)
install_github("helske/KFAS")

See

help(KFAS) in R for examples, and many more examples under different functions, as well as the Paper at JSS.
ChangeLog for changes.

kfas's Issues

SSMSeasonal harmonics wrong indices

I believe the harmonics indexing logic is bugged:


library(KFAS)
seas.full <- SSMseasonal(12, Q=.1, sea.type='trigonometric', P1=1e6)
seas.full$state_names
# [1] "sea_trig1"  "sea_trig*1" "sea_trig2"  "sea_trig*2" "sea_trig3"  "sea_trig*3" "sea_trig4"  "sea_trig*4"
# [9] "sea_trig5"  "sea_trig*5" "sea_trig6" 
# Suppose I want the 1st, 2nd, and 4th pairs of functions to get the slowest, next slowest, and 4th slowest frequencies.
harmonics = c(1, 2, 4)
# Wanted indices == 
names.wanted = c("sea_trig1",  "sea_trig*1", "sea_trig2",  "sea_trig*2", "sea_trig4",  "sea_trig*4")
ind.wanted = which(seas.full$state_names %in% names.wanted)
# [1] 1 2 3 4 7 8

seas.actual <- SSMseasonal(12, harmonics=harmonics, Q=.1, sea.type='trigonometric', P1=1e6)
# Note this warning:
# Warning message:
  # In rep(harmonics, each = 2) * rep(1:2, each = 2) :
  # longer object length is not a multiple of shorter object length
ind.actual <- which(seas.actual$state_names %in% names.wanted)

assertthat::are_equal(ind.actual, ind.wanted)
# [1] FALSE

# Double check:
seas.actual$state_names
# [1] "sea_trig1"  "sea_trig*1" "sea_trig*2" "sea_trig3"  "sea_trig*2" "sea_trig3" 


# Looking at the source code
# in SSMseasonal.R line 142
# is
ind.actual.142 <- rep(harmonics, each = 2) * rep(1:2, each = 2) + 0:1
# Note warning message here

# Should be
ind.shouldbe.142 <- rep(harmonics, each = 2) * 2 - 1:0

assertthat::are_equal(ind.actual.142, ind.wanted)
# [1] FALSE
assertthat::are_equal(ind.shouldbe.142, ind.wanted)
# [1] TRUE

This relates to #21. I am OK with making a pull request but didn't see contributing instructions so I am putting this in as an issue.
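
The proposed correction can be checked in base R without loading KFAS. The sketch below assumes, as the `state_names` output above suggests, that harmonic j occupies states 2j - 1 and 2j of the full trigonometric component. `harmonic_state_index` is a hypothetical helper name used only for illustration; it is not part of the package.

```r
# Hypothetical helper: map harmonic numbers to state positions, assuming
# harmonic j occupies positions 2j - 1 and 2j in the full component.
harmonic_state_index <- function(harmonics) {
  rep(harmonics, each = 2) * 2 - 1:0
}

harmonics <- c(1, 2, 4)
harmonic_state_index(harmonics)
# [1] 1 2 3 4 7 8
```

For an even period the last harmonic has only one state (there is no "sea_trig*6" among the 11 names above), so a complete fix would also have to drop that index; the sketch does not handle this case.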

Problem with one-step-ahead prediction residuals (standardised) when using an intervention variable

I am trying to run the model with an intervention variable. It seems that rstandard() does not correctly provide the standardised one-step-ahead prediction residuals: many of them are not available (NA). The script and data file are attached. I am using KFAS ver 1.3.7.

UKdriversKSI.zip

LDL decomposition of H failed.

Dear,

I get an error when I run my code: In logLik.SSModel(object = model, check.model = FALSE, ... :
LDL decomposition of H failed.
How can I fix this? I think my update function is correct and uses the right parameters, but I really don't see the fault in my code.
Can you please help me?
I also have a question on another subject, the Hessian matrix. Its eigenvalues should be positive, otherwise there is no optimum. Is there a function within the package that enforces positive eigenvalues?
I hope you can help me with both questions. Thanks a lot!

model <- SSModel(covid_new[, 1:4] ~ SSMtrend(2, Q = list(matrix(NA, 4, 4), matrix(NA, 4, 4))) +
SSMregression(list(~index$Germany, ~index$Netherlands, ~index$Spain, ~index$Sweden)), H = matrix(NA, 4, 4))

updatefn <- function(pars, model, ...) {
  Q <- diag(exp(pars[1:4]))
  Q[upper.tri(Q)] <- pars[5:10]
  model["Q", etas = "level"] <- crossprod(Q)
  Q <- diag(exp(pars[11:14]))
  Q[upper.tri(Q)] <- pars[15:20]
  model["Q", etas = "slope"] <- crossprod(Q)
  H <- diag(exp(pars[21:24]))
  H[upper.tri(H)] <- pars[25:30]
  model["H"] <- crossprod(H)
  model
}

init <- chol(cov(covid_new[, 1:4]))
fitinit <- fitSSM(model, updatefn = updatefn,
inits = rep(c(diag(init), init[upper.tri(init)]), 3),
method = "BFGS", hessian = TRUE)
-fitinit$optim.out$val
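
On the Hessian question: at a minimum of the negative log-likelihood the Hessian should be positive definite, and this can be checked directly from the `optim` output. The following is a minimal base R sketch of that check. It uses a toy objective in place of the KFAS likelihood (the function `negloglik` is made up for illustration). With `fitSSM(..., hessian = TRUE)` the matrix would presumably be available as `fitinit$optim.out$hessian`, assuming `optim.out` holds the unmodified `optim` result.

```r
# Toy objective standing in for a negative log-likelihood (illustrative only).
negloglik <- function(p) (p[1] - 1)^2 + 2 * (p[2] + 0.5)^2 + p[1] * p[2]

opt <- optim(c(0, 0), negloglik, method = "BFGS", hessian = TRUE)

# Eigenvalues of the numerically differentiated Hessian at the optimum.
ev <- eigen(opt$hessian, symmetric = TRUE, only.values = TRUE)$values
all(ev > 0)  # TRUE suggests the optimizer stopped at a local minimum
```

A non-positive eigenvalue usually points to a flat or saddle-shaped likelihood surface rather than something a package function could enforce.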

Corrections to diffuse likelihood?

After reading the paper "Likelihood functions for state space models with diffuse initial conditions" (http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9892.2010.00673.x/abstract) I think careful consideration of the likelihood computations in KFAS is in order. Adding the proposed correction terms should be pretty easy according to the paper.

For some reason I could not reproduce the last illustration in the article; I need to study the whole paper in more detail.

fitSSM does not pass nsim to optim when method = 'SANN'

Want to Omit Observations at Beginning of Series when Calculating Log-Likelihood

When estimating MA models, it is often necessary to effectively throw away observations at the beginning of the time series. (There is an extensive literature about this.) I've been estimating IMA(1,1) models as the sum of 2 unobserved AR(1) processes with diffuse initialization. The results from using all of the data to maximize the likelihood are ludicrous. The first few estimates are way off and the MA coefficient is nonsensical. Since those first few estimates depend on the estimated parameters, I can't simply throw them away. I'd like to keep those estimates, but not use them in the likelihood.

I tried to tinker with the source code, but it's a little beyond me. It looks like the ymiss used for the likelihood calculation has to have the desired number of values flagged at the beginning. I've attached a modification of gloglik.f95 to show what I mean.

gloglik.zip

Can't calculate multivariate time-series

Hello. I've been working on estimating coordinates with a Kalman filter.

I can't compute the estimated values for one of the series in a multivariate time series. The series have no missing values and are the same length.

Code:
library(KFAS)

cam1 <- read.table("cam1.dat", header = T)
cam2 <- read.table("cam2.dat", header = T)

x1 <- ts(cam1$x)
x2 <- ts(cam2$x)

modSUTSE <- SSModel(cbind(x1,x2) ~
SSMtrend(1, Q = matrix(NA,2,2), type = "distinct"), H = matrix(NA,2,2))
fitSUTSE <- fitSSM(modSUTSE, numeric(6), method="BFGS")
kfsSUTSE <- KFS(fitSUTSE$model)

afilt <- kfsSUTSE$alphahat[1:554,]

plot(x1,lty=3,xlab="Frame",ylab="Distance")
lines(afilt[,1],lwd=2)

plot(x2,lty=3,xlab="Frame",ylab="Distance")
lines(afilt[,2],lwd=2)

Result:

I also can't compute them if I swap x1 and x2.

How should I do this?

Calculating AIC, BIC and AICc

The latest version of the R package "KFAS" provides a function for calculating the log-likelihood of an SSModel ("logLik.SSModel"); however, the package does not provide a straightforward way to calculate information criteria, namely AIC, BIC and AICc. This would be a useful update for the package.

Furthermore, the output from "logLik.SSModel" does not carry the attributes "df" (degrees of freedom), giving the number of (estimated) parameters in the model, or "nobs", the number of observations used in estimation, as returned by the function "logLik" in the R package "stats". Such attributes would also be useful for calculating the information criteria.

In the meantime, I am calculating the AIC, BIC and AICc criteria as in the following example:

  nobs <- length(ts)

  model_arima <- SSModel(ts ~ SSMarima(ar=ar.coef,ma=ma.coef,d=d,Q=NA,stationary=FALSE),H=NA)
  model_arima <- fitSSM(model_arima, inits=c(ar.coef,ma.coef,d,rep(0, 2)))$model

  p <- length(ar.coef)
  q <- length(ma.coef)
  npar <- (p+q+1)

  ll <- logLik(model_arima)
  AIC <- -2*ll+2*npar
  BIC <- -2*ll+log(nobs)*npar
  AICc <- AIC + 2*npar*(npar+1)/(nobs-npar-1)

Is this approach for calculating Akaike's Information Criteria for an SSModel adequate?
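
The three formulas used above can be collected in a small base R helper, which makes it easier to see what goes in. This is only a sketch: `info_criteria` is a hypothetical name, and the input values in the example call are made up. Whether `npar` should also count the variance parameters and any diffuse initial states is a modelling decision that the snippet does not settle.

```r
# Hypothetical helper: information criteria from a log-likelihood value,
# the number of estimated parameters and the number of observations.
info_criteria <- function(loglik, npar, nobs) {
  aic <- -2 * loglik + 2 * npar
  bic <- -2 * loglik + log(nobs) * npar
  aicc <- aic + 2 * npar * (npar + 1) / (nobs - npar - 1)
  c(AIC = aic, BIC = bic, AICc = aicc)
}

# Made-up inputs, for illustration only.
info_criteria(loglik = -100, npar = 3, nobs = 50)
```

With a fitted model, `loglik` would be `as.numeric(logLik(model_arima))` as in the code above.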

Function rstandard(kfs, "state") not working properly

Most of the time, when I call the following functions to obtain the auxiliary residuals

kfs1 <- KFS(fit1$model, smoothing = c("signal", "disturbance", "state"))
res_eps <- rstandard(kfs1, "pearson")
res_eta <- rstandard(kfs1, "state")

I get the error

Error in dimnames(data) <- list(NULL, names) :
length of 'dimnames' [2] not equal to array extent

associated to the call rstandard(kfs1, "state") .

My workaround is the following, but I think this bug should be fixed.

res_eta1 <- kfs1$etahat[, 1] / sqrt(fit1$model$Q[1, 1, ] - kfs1$V_eta[1, 1, ])
res_eta2 <- kfs1$etahat[, 2] / sqrt(fit1$model$Q[2, 2, ] - kfs1$V_eta[2, 2, ])

Negative K

Hello, I am applying KFAS to remote sensing time series (with lots of missing values). I am trying to do one-step-ahead prediction for a short-wave band using the following code:
`TwoHarmonicsCycle <- SSModel(swir ~ SSMtrend(2, Q = list(matrix(5),matrix(NA)),
P1 = matrix(0, 2, 2), a1 = matrix(c(162, 0))) +
SSMcycle(period=365.25,Q = matrix(NA)) +
SSMcycle(period=365.25/2,Q=matrix(NA)),
H = matrix(NA)) # here I intend to get intercept fixed (162 is the results from an initial model)

TwoHarmonicsCycle.fit <- fitSSM(TwoHarmonicsCycle,
inits=c(0,0,0,0,0,0),method="BFGS")
TwoHarmonicsCycle.out <- KFS(TwoHarmonicsCycle.fit$model,nsim=10000, simplify = FALSE)
TwoHarmonicsCycle.predict <- predict(TwoHarmonicsCycle.fit$model,
interval="prediction",
nsim=10000,level=.9, filtered = TRUE)
`
I got the following results:

(The black line is the one-step prediction; the red line is the real observation)

The one-step prediction (black line) is very erratic at the beginning of the series (it has extreme values) and then converges to a reasonable prediction after some point in 2000. I also noticed that some elements of K are negative at the beginning of the series and all become positive once the one-step prediction has converged. I suspect that the negative K is the cause of the erratic curve here.
My questions are: 1) is there any treatment for such an erratic curve at the beginning of Kalman filtering; 2) if there is no treatment (Kalman filtering may need an initialization phase), can I use 'all K are positive' as a signal for the start of regular prediction?

Problems with standardized residuals

Dear Jouni,

The output of the function rstandard() contains NA values for the first 19 standardized residuals. My script is attached.
Is there a solution to this? Thanks a lot!

Standardized residuals .R.zip

Problem to set a state space model

Hi. I'm having some trouble setting up my SSMcustom component.

Here is what I'm doing:

y=base1$close
Z=array(rbind(t(base2$close),t(rep(1,length(y)))),dim=c(1,2,length(y)))
matT=diag(1,2,2)
matR=matT
Q=matT*0.01
modeloCustom=SSMcustom(Z=Z,T=matT,R=matR,Q=Q,P1inf=diag(1,2,2),n=length(y))

However, if I check

is.SSModel(modeloCustom)

It returns false.

Also, if I try:

modelo=SSModel(y~-1+modeloCustom)

It shows:

> modelo=SSModel(y~-1+modeloCustom)
Error in model.frame.default(formula = y ~ -1 + modeloCustom, na.action = na.pass) : 
  invalid type (list) for variable 'modeloCustom'

What am I doing wrong? Could someone help me?

Thank you.

Overdispersed distributions

I'm interested in running an overdispersed binomial model. I was looking at the example in the vignette, and it looks like the overdispersed Poisson was handled by adding an additional random term to the state. I'm not sure that this is really an overdispersed Poisson, as my understanding is that the overdispersion should be at the observation level, not in the state. When I tried a formulation similar to the Poisson model in the vignette for my binomial model, I also got similar results for the standard (not overdispersed) and "overdispersed" formulations. If anything, the model with the additional state-level error term was actually more "wiggly" than the overdispersed model, which is the opposite of what should be happening with an overdispersion parameter.

For non-Gaussian models, would it be possible to add a random coefficient at the observation level so that it does not affect the state? I think this would be equivalent to adding an H matrix to the non-Gaussian models.

P[t+1] = TPtt[t]T’ + RQR' not true in KFAS

I encountered a problem while using the "alcohol" example to
verify (4.24) on page 85 of the book (Koopman and Durbin, 2012).

To be specific, I wanted to get P[t+1] = T*Ptt[t]*T’ + RQR'. But it failed.

My R code is attached.

`
library("KFAS")
options(digits=5)

# KFAS, P-5 ----

data("alcohol")
deaths <- window(alcohol[, 2], end = 2007)
population <- window(alcohol[, 6], end = 2007)
Zt <- matrix(c(1, 0), 1, 2)
Ht <- matrix(NA)
Tt <- matrix(c(1, 0, 1, 1), 2, 2) # a = a + slope
Rt <- matrix(c(1, 0), 2, 1) # selection
Qt <- matrix(NA)
a1 <- matrix(c(0, 0), 2, 1)
P1 <- matrix(0, 2, 2)
P1inf <- diag(2)

model_gaussian <- SSModel(deaths / population ~ -1 +SSMcustom(Z = Zt, T = Tt, R = Rt, Q = Qt, a1 = a1, P1 = P1, P1inf = P1inf), H = Ht)
fit_g <- fitSSM(model_gaussian, inits = c(0, 0), method = "BFGS") # for Ht, Qt
logLik(fit_g$model) #[1] -108.9734

out_g <- KFS(fit_g$model)

attach(out_g) # database is attached to the R search path
str(a) # Time-Series [1:40, 1:2] from 1969 to 2008: 0 23.7 20.6 27 26.3 ...
str(att) # Time-Series [1:39, 1:2] from 1969 to 2007: 23.7 22.2 25.6 25.5 20.2 ...
str(P) # num [1:2, 1:2, 1:40] 0 0 0 0 13.7 ...
str(Ptt) # num [1:2, 1:2, 1:39] 9.49 0 0 0 9.49 ...

n=dim(att)[1]

H= c(model$H); H # [1] 9.4884
Q= c(model$Q); Q #[1] 4.257
Q=diag(c(Q,0)) # already RQR'

t=n-2;

# check Koopman P-85, (4.24)

#Ft = Z Pt * Z' + H
F = Zt %*% P[,,t] %*% t(Zt) + H; F # 18.877
#att = at + PZ' F-¹ * vt,
att[t,] - a[t,] # -0.919809 -0.026336
da = P[,,t] %*% t(Zt)/c(F) * v[t]; c(da) # -0.919809 -0.026336

#P[t+1] = TP[tt]T’ + Q
P[,,t+1] - Tt %*% Ptt[,,t] %*% t(Tt) - Q # already RQR'

#     [,1]      [,2]
#[1,] -0.133697 -0.066848 -- ?
#[2,] -0.066848 0.000000

detach(out_g)`
kp_alco_Ptt.r.txt
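
The identity in (4.24) can be checked independently of KFAS with one hand-coded filter step in base R. In the sketch below the system matrices follow the issue's model, H and Q are the fitted values quoted above, and the prediction covariance `P` is made up. It only confirms that the two textbook forms of the covariance update agree for a regular (non-diffuse) step; it does not by itself explain the discrepancy reported in the issue.

```r
# System matrices as in the issue; P is a made-up prediction covariance P_t.
Zt <- matrix(c(1, 0), 1, 2)
Tt <- matrix(c(1, 0, 1, 1), 2, 2)
Rt <- matrix(c(1, 0), 2, 1)
Qt <- matrix(4.257)
Ht <- matrix(9.4884)
P  <- diag(c(10, 1))

Fv  <- Zt %*% P %*% t(Zt) + Ht                     # F_t
Ptt <- P - P %*% t(Zt) %*% solve(Fv) %*% Zt %*% P  # P_{t|t}
K   <- Tt %*% P %*% t(Zt) %*% solve(Fv)            # Kalman gain K_t

# Two equivalent expressions for P_{t+1}.
P_next_a <- Tt %*% Ptt %*% t(Tt) + Rt %*% Qt %*% t(Rt)
P_next_b <- Tt %*% P %*% t(Tt - K %*% Zt) + Rt %*% Qt %*% t(Rt)
all.equal(P_next_a, P_next_b)
```

If the same comparison fails on KFS output, the indexing of `P` and `Ptt` or the handling of the diffuse phase would be the first things to inspect.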

Order of columns of bivariate data for Poisson models

Why do these two chunks give different results? Have I misspecified something?

model <- SSModel(cbind(DriversKilled, VanKilled) ~ SSMtrend(1, Q = matrix(NA,2,2)),
data = Seatbelts, distribution = rep("poisson",2))
fit <- fitSSM(model, diag(c(-4,-4)), method = "BFGS")
predict(fit$model)

model <- SSModel(cbind(VanKilled, DriversKilled) ~ SSMtrend(1, Q = matrix(NA,2,2)),
data = Seatbelts, distribution = rep("poisson",2))
fit <- fitSSM(model, diag(c(-4,-4)), method = "BFGS")
predict(fit$model)

SSMcustom() does not return valid SSModel

SSMcustom() returns an object that does not have class "SSModel". Therefore is.SSModel() returns FALSE and KFS aborts with error.

It looks like SSMarima(), SSMcycle(), SSMregression() have the same problem.

Results for slope and trend using function signal()

Dear Mr Helske,
In my stochastic level and slope model (see code below), function coef(outKFS, states = "slope") provides correct values for the slope, whereas function signal(outKFS, states = "slope")$signal gives just 0s. As a consequence, signal(outKFS, states = "trend")$signal does not seem to provide the correct sum of level and slope. The code is below.

#Stochastic level and slope####
#Removing all objects except functions
rm(list = setdiff(ls(), lsf.str()))

#Loading data
dataUKdriversKSI <- read.table(text =
"
1687
1508
1507
1385
1632
1511
1559
1630
1579
1653
2152
2148
1752
1765
1717
1558
1575
1520
1805
1800
1719
2008
2242
2478
2030
1655
1693
1623
1805
1746
1795
1926
1619
1992
2233
2192
2080
1768
1835
1569
1976
1853
1965
1689
1778
1976
2397
2654
2097
1963
1677
1941
2003
1813
2012
1912
2084
2080
2118
2150
1608
1503
1548
1382
1731
1798
1779
1887
2004
2077
2092
2051
1577
1356
1652
1382
1519
1421
1442
1543
1656
1561
1905
2199
1473
1655
1407
1395
1530
1309
1526
1327
1627
1748
1958
2274
1648
1401
1411
1403
1394
1520
1528
1643
1515
1685
2000
2215
1956
1462
1563
1459
1446
1622
1657
1638
1643
1683
2050
2262
1813
1445
1762
1461
1556
1431
1427
1554
1645
1653
2016
2207
1665
1361
1506
1360
1453
1522
1460
1552
1548
1827
1737
1941
1474
1458
1542
1404
1522
1385
1641
1510
1681
1938
1868
1726
1456
1445
1456
1365
1487
1558
1488
1684
1594
1850
1998
2079
1494
1057
1218
1168
1236
1076
1174
1139
1427
1487
1483
1513
1357
1165
1282
1110
1297
1185
1222
1284
1444
1575
1737
1763
")[, 1] %>%
log() %>%
ts(start = 1969, frequency=12)
#Defining model
model <- SSModel(dataUKdriversKSI ~ SSMtrend(degree = 2, Q = list(matrix(NA), matrix(NA))), H = matrix(NA))
ownupdatefn <- function(pars, model){
model$H[,,1] <- exp(pars[1])
diag(model$Q[,,1]) <- exp(pars[2:3])
model
}

#Fitting model
fit <- fitSSM(model, inits = log(c(0.001, 0001, 0001)) , method = "BFGS")
outKFS <- KFS(fit$model, smoothing = c("state", "mean", "disturbance"))

#Slope estimates

(coef(outKFS, states = "slope"))
(signal(outKFS, states = "slope")$signal)

Observation updates

Suppose I fit a model to a univariate time series, maybe like this:

y = Seatbelts[,"drivers"]
model = SSModel(y ~ SSMtrend(degree=2, Q=list(matrix(NA), matrix(NA))), H=matrix(NA))
fit = fitSSM(model, inits=c(0,0,0), method="BFGS")
KFS(fit$model)

and then one new consecutive observation becomes available. What code can I use to update my model and get a new prediction?

Filtered estimator VS. one-step ahead predictor

Hello, I have two questions about KFAS. The first one is about filtered estimator(a_t|t) versus one-step ahead predictor(a_t).

In Time Series Analysis by State Space Methods by Durbin and Koopman, a_t is the one-step-ahead predictor of alpha_t and a_t|t is the filtered estimator of alpha_t. In other words, a_t is E(alpha_t | y_t-1, ..., y_1) and a_t|t is E(alpha_t | y_t, ..., y_1). For the local level model, a_t+1 = a_t|t, but for a general linear Gaussian model, a_t|t != a_t+1.

My question is: which one is a_t|t in the KFS results? I can only find two state estimates, a and alphahat, where a is a_t and alphahat is the smoothed estimate. It seems you also treat a as the filtered estimate, but I think they are different.

Secondly, I want to confirm the definition of the univariate approach. I am using the Kalman filter to perform Fama/French factor analysis on stock returns, so the model is y_t = alpha + beta1_t * factor1_t + beta2_t * factor2_t + beta3_t * factor3_t + error_t (the betas are state variables, and y_t is a 1 x 1 vector at time t). This is a univariate model, right?

Thank you very much for your time! Look forward to your reply.
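
The distinction between a_t and a_t|t can be made concrete with a hand-written filter for the local level model. The base R sketch below is illustrative only: `local_level_filter` is a hypothetical function, the data are simulated, and a large `P1` stands in for exact diffuse initialization. The KFS output shown in the P[t+1] issue above includes `att` and `Ptt` alongside `a` and `P`, which appear to be the corresponding filtered quantities in KFAS.

```r
# Minimal Kalman filter for a local level model:
#   y_t = alpha_t + eps_t,  alpha_{t+1} = alpha_t + eta_t
local_level_filter <- function(y, sigma2_eps, sigma2_eta, a1 = 0, P1 = 1e7) {
  n <- length(y)
  a <- numeric(n + 1); P <- numeric(n + 1)  # one-step-ahead predictions a_t
  att <- numeric(n); Ptt <- numeric(n)      # filtered estimates a_{t|t}
  a[1] <- a1; P[1] <- P1
  for (i in seq_len(n)) {
    Fi <- P[i] + sigma2_eps
    att[i] <- a[i] + P[i] / Fi * (y[i] - a[i])
    Ptt[i] <- P[i] - P[i]^2 / Fi
    a[i + 1] <- att[i]              # T = 1, so a_{t+1} = a_{t|t}
    P[i + 1] <- Ptt[i] + sigma2_eta
  }
  list(a = a, att = att, P = P, Ptt = Ptt)
}

set.seed(1)
y <- cumsum(rnorm(20)) + rnorm(20)
out <- local_level_filter(y, sigma2_eps = 1, sigma2_eta = 1)
all.equal(out$a[-1], out$att)  # the shifted predictions equal the filtered values
```

For a model with a general transition matrix T the prediction step becomes a_{t+1} = T a_{t|t}, which is why the two series differ outside the local level case.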

Univariate approach for maximum loglikelihood in multivariate state space models

Is it possible to calculate maximum loglikelihood for each time series in multivariate state space models in KFAS? If so, how could it be done in KFAS?
Koopman’s STAMP software provides “univariate” AIC for each time series in multivariate state space models so “univariate” maximum loglikelihood seems to be calculated there. I would like to replicate this in KFAS.

How to increase default number of iterations in fitSSM

The fitSSM function looks like this

fitSSM(model, inits, updatefn, checkfn, update_args = NULL, ...)

Writing the fitSSM in this way

fitSSM(inits = inits, model = modelH, method = "BFGS")

helps control the method of optim. What further argument should be added to change the number of iterations as well, if I want more iterations than the default 100 that BFGS provides?
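
In `optim` itself the iteration limit is set through `control = list(maxit = ...)`. Since the `fitSSM` signature quoted above ends in `...`, the same argument can presumably be passed through, for example `fitSSM(inits = inits, model = modelH, method = "BFGS", control = list(maxit = 1000))`; this forwarding is an assumption worth checking against `?fitSSM`. The base R sketch below shows the effect of `maxit` on a standard test function, without KFAS.

```r
# Rosenbrock function, the standard test problem from the optim() examples.
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

short <- optim(c(-1.2, 1), fr, method = "BFGS", control = list(maxit = 2))
long  <- optim(c(-1.2, 1), fr, method = "BFGS", control = list(maxit = 1000))

short$convergence  # code 1 means the iteration limit was reached
long$convergence   # code 0 means successful completion
```

Checking the `convergence` code of `optim.out` after fitting is a quick way to see whether a larger `maxit` is needed at all.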

Limiting the number of trig functions in SSMseasonal using multiple SSMcycle

Thank you again for all your work on this! I haven't seen this mentioned in any of the documentation, but am I correct that I can limit the number of trigonometric functions / harmonics in SSMseasonal by using multiple SSMcycle components? For example, SSMseasonal(period=12, sea.type="trigonometric") produces 11 functions (6 harmonics with sea_trig*6 = 0). If I only want to use the first 4 functions (2 harmonics), I could do something like SSMcycle(period=12) + SSMcycle(period=6). Is this correct, or am I missing something? Thanks again!

Include multiple trends in SSMseasonal() function

I was trying to include multiple seasonalities in the SSModel. For example, I would like to include both monthly and weekly seasonality in my daily time series data. Is this possible with a single SSMseasonal component, or is it necessary to include two of them in order to capture each seasonality type separately?

KFS smoothed variance bug (?)

The following code suggests that the V_mu diagonal term returned by KFS is off by the variance of the observation equation. (The example is univariate, but the same holds in the multivariate case: the returned diagonal entries of the smoothed variance of the observations are off by the diagonal terms of H.)

## Test KFAS vs DLM

library(KFAS)
library(dlm)

## Generate fake data

set.seed(1)
theta <- array(0, 101)
y <- array(0, 101)
for (j in 1:100) {
  theta[j + 1] = 0.9 * theta[j] + rnorm(1)
  y[j + 1] <- theta[j + 1] + rnorm(1)
}
Y <- y[-1]

Z <- matrix(1)
T <- matrix(0.9)
R <- matrix(1)
Q <- matrix(1)
H <- matrix(1)
a1 <- matrix(0)
P1 <- matrix(1)

Y <- matrix(y[-1])

model_kfas = SSModel(Y ~ -1 + SSMcustom(Z  = Z,
                                    T  = T,
                                    R  = R,
                                    Q  = Q,
                                    a1 = a1,
                                    P1 = P1), 
                H  = H)


m <- attr(model_kfas, "m")
p <- attr(model_kfas, "p")
k <- attr(model_kfas, "k")

n.ahead <- 12
states <- as.integer(1:attr(model_kfas, "m"))
timespan <- 1:attr(model_kfas, "n")
endtime <- end(model_kfas$y)

timespan <- attr(model_kfas, "n") + 1:n.ahead
n <- attr(model_kfas, "n") <- attr(model_kfas, "n") + as.integer(n.ahead)
endtime <- end(model_kfas$y) + c(0, n.ahead)
model_kfas$y <- window(model_kfas$y, end = endtime, extend = TRUE)

PRED_KFAS <- KFS(model = model_kfas, smoothing = c("mean", "state"))

model_dlm <- dlm(m0 = a1, C0 = P1, FF = Z, V = H, GG = T, W = Q)
PRED_DLM <- dlmForecast(dlmFilter(Y, model_dlm), nAhead = 12)


all.equal(c(PRED_DLM$a), PRED_KFAS$muhat[101:(100+n.ahead)])

## This fails
all.equal(unlist(PRED_DLM$Q), PRED_KFAS$V_mu[,,101:(100+n.ahead)])

## This is correct
all.equal(unlist(PRED_DLM$Q), PRED_KFAS$V_mu[,,101:(100+n.ahead)] + H)

Number state names in case of multiple instances of same component

One can build a model which contains, for example, two cycle components with different periods. States from these components should be identifiable somehow. Currently they are not:

> print(SSModel(rnorm(1)~SSMcycle(period=10)+SSMcycle(period=15)))
Call:
SSModel(formula = rnorm(1) ~ SSMcycle(period = 10) + SSMcycle(period = 15))

State space model object of class SSModel

Dimensions:
[1] Number of time points: 1
[1] Number of time series: 1
[1] Number of disturbances: 1
[1] Number of states: 5
Names of the states:
[1]  (Intercept)  cycle        cycle*       cycle        cycle*     
Distributions of the time series:
[1]  gaussian

Object is a valid object of class SSModel.

Fitting a Multivariate Local Linear Trend model

Dear Helske,

I'm trying to fit a Local Linear Trend model but unfortunately I do not understand the update function and the initial values in the fitSSM line.
Can you please help me?

covid_new <- ts(data = covid_new, frequency = 1, start = c(1))
ts.plot(window(covid_new[, 1:4]), col = 1:4,
ylab = "Log first differences confirmed cases",
xlab = "Time")
legend("topright",col = 1:4, lty = 1, legend = colnames(covid_new)[1:4])

covid_new <- window(covid_new, start = 1, end = 145)
model <- SSModel(covid_new[, 1:4] ~ SSMtrend(2, Q = list(matrix(NA, 4, 4), matrix(NA, 4, 4))), H = matrix(NA, 4, 4))

updatefn <- function(pars, model, ...) {
Q <- diag(exp(pars[1:4]))
Q[upper.tri(Q)] <- pars[5:8]
model["Q", etas = "level"] <- crossprod(Q)
Q <- diag(exp(pars[9:12]))
Q[upper.tri(Q)] <- pars[13:16]
model["Q", etas = "slope"] <- crossprod(Q)
model
}

init <- chol(cov(covid_new[, 1:4]) / 10)
fitinit <- fitSSM(model, updatefn = updatefn,
inits = rep(c(log(diag(init)), init[upper.tri(init)], init[upper.tri(init)]), 3),
method = "BFGS")
-fitinit$optim.out$val

fit <- fitSSM(model, updatefn = updatefn, inits = fitinit$optim.out$par, method = "BFGS", nsim = 250)
-fit$optim.out$val

varcor <- fit$model["Q", etas = "level"]
varcor[upper.tri(varcor)] <- cov2cor(varcor)[upper.tri(varcor)]
print(varcor, digits = 2)

varcor <- fit$model["Q", etas = "slope"]
varcor[upper.tri(varcor)] <- cov2cor(varcor)[upper.tri(varcor)]
print(varcor, digits = 2)

(out <- KFS(fit$model, nsim = 1000))

Kind regards,

Vera Vial
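
A parameter count may help with the update function and the initial values. A full 4 x 4 covariance matrix built from a Cholesky-type factor needs 4 diagonal and 6 off-diagonal values, i.e. 10 parameters, whereas `pars[5:8]` above supplies only 4 off-diagonal values. The model also leaves H as NA while the update function does not assign it. The base R sketch below illustrates the parameterization; `chol_to_cov` is a hypothetical helper and the parameter values are arbitrary.

```r
# Hypothetical helper: build a p x p covariance matrix from p log-diagonal
# values followed by p * (p - 1) / 2 off-diagonal values.
chol_to_cov <- function(pars, p) {
  stopifnot(length(pars) == p * (p + 1) / 2)
  L <- diag(exp(pars[1:p]), p)
  L[upper.tri(L)] <- pars[-(1:p)]
  crossprod(L)  # t(L) %*% L, symmetric and positive semidefinite
}

p <- 4
n_per_matrix <- p * (p + 1) / 2   # 10 parameters per 4 x 4 covariance
S <- chol_to_cov(rep(0.1, n_per_matrix), p)
isSymmetric(S) && all(eigen(S, symmetric = TRUE, only.values = TRUE)$values > 0)
```

Under this scheme a model with three unknown 4 x 4 matrices (level Q, slope Q and H) would need 30 initial values, 10 per matrix, with the diagonal entries given on the log scale.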

Tests

The amount and complexity of code in KFAS has slowly increased so much that it is nearly impossible to keep track of everything without proper testing (and version control). Many unit tests need to be added to find and prevent current and future bugs.

model components in the KFAS package

Hi! My research is primarily on state space models using KFAS, but I have always wondered whether the model components in KFAS are stochastic. Is there a specific way to distinguish between stochastic and non-stochastic model coefficients when building a model? I know there is a specific way to do that using DLM, but I am not sure about KFAS.

Extrapolating covariate for hierarchical model

Hello again! I am trying to model a variable (y) as a function of a trend and a covariate (x). However, I do not have values of the covariate for all future predictions of y. I've come up with a brute-force way of predicting future Xs, and then passing those simulations to a multivariate model to get future predictions of Y. However, this is very slow and not very elegant. I'm thinking there must be a more straightforward way to do this. (Note, I'm doing it all in log-space in order to capture geometric growth.) Do you have any suggestions? I've attached sample code below. Thanks! Cheers, Andy

library(reshape2)
library(KFAS)

MyData <- data.frame(Year = 2011:2020,
MyY = c(3437,4422,4665,4983,5459,5925,NA,NA,NA,NA),
MyX = c(2242,3033,3158,3348,3602,4102,4135,NA,NA,NA))

Build the model for X

MyModelX <- SSModel((log(MyData$MyX)) ~
SSMtrend(degree=2,Q = list(matrix(NA),matrix(NA))), H=NA)

Fit the model for X

MyFitX <- fitSSM(MyModelX, inits = c(rep(0, 3)),method="L-BFGS-B")

Simulate the signal for X

MySimsSignalX <- simulateSSM(MyFitX$model, type = "signals", nsim = 100,
antithetics = FALSE)

Simulate the epsilons for X

MySimsEpsilonX <- simulateSSM(MyFitX$model, type = "epsilon", nsim = 100,
antithetics = FALSE, conditional = FALSE)

Calculate the future values for X

MySimsX <- data.frame((MySimsSignalX[ , 1, ] + MySimsEpsilonX[ , 1, ]))

Replace simulated past values with actual observed values of X

MySimsX[1:7,] <- log(MyData$MyX[1:7])

Reformat data for multivariate KFAS

datay <- matrix(log(MyData$MyY),nrow=length(MyData$MyY),ncol=100)
meltedX <- melt(MySimsX)
datax <- split(meltedX,meltedX$variable)

Build model for trend in Y along with regression on log(X)

MyModelY <- SSModel(datay ~-1+SSMregression(rep(list(~value),100),Q=matrix(NA,1,1),type="common",data=datax) +
SSMtrend(degree=2,type="common",Q=c(NA,NA)),
H=diag(NA,100))

Fit the model

MyFitY <- fitSSM(MyModelY, inits = c(rep(0, 105)),method="L-BFGS-B")

Simulate the signal for Y

MySimsSignalY <- simulateSSM(MyFitY$model, type = "signals", nsim = 50,
antithetics = FALSE)

Simulate the epsilons for Y

MySimsEpsilonY <- simulateSSM(MyFitY$model, type = "epsilon", nsim = 50,
antithetics = FALSE, conditional = FALSE)

Calculate the forecasted Ys

MySimsY <- data.frame(exp(MySimsSignalY + MySimsEpsilonY))

Regression in multivariate LLT model

Dear,

My dataset consists of 4 time series. I have already fitted a local linear trend model to this data, but now I want to add a regression to this model. The only problem is that I want to regress index_new[, 1] on covid_new[, 1], index_new[, 2] on covid_new[, 2] and so on. Is that possible using SSMregression?

Weighted regression

I'm wondering if it's possible in SSMregression() to do weighted regression so that the observations are weighted via a vector or matrix of weights. My model is a binomial model, if that makes a difference.

Also, thanks for the package! This has helped me tackle a very messy state-space model.

Change "data" in an already defined SSModel

Is there a way where we can change underlying data specified in SSModel function or copy the entire contents of the model into a new model and specify new data then?

Major bug in SSMcycle

The testing schemes are starting to give results: I just found and corrected a major bug in SSMcycle which caused erroneous models in all cases due to an improper definition of the system matrix T. The minus sign should be below the diagonal, not above it.

Why should standard auxiliary residuals ever be NA?

Hi, I am replicating with KFAS a little program of mine that was built with the dlm package (KFAS, in my experience, is much faster, congratulations).

In this program I use auxiliary residuals to detect interventions. In the dlm package there is no function (as far as I know) to calculate auxiliary residuals so I had to create my own, but with KFAS I would rather use the package's rstandard function.

Comparing the two functions I get, as expected, very similar results, but in some cases rstandard returns just a bunch of NA values while my function returns actual numbers. This mostly happens when I am calculating "state" residuals and the variance is very close to 0.

Why should this happen? Why should a standard auxiliary residual ever be NA?

Inspecting your code, it is easy to see what is happening. In the state_standardized function we have:

D <- sqrt((object$model$Q[, , i * tvq + 1] - object$V_eta[, , i])[1 + 0:(k - 1) * (k + 1)])
pos <- D > sqrt(.Machine$double.eps) * max(D, 0)
eta[i, pos] <- eta[i, pos] / D[pos]
eta[i, !pos] <- NA

eta[i, ] is NA because D is less than sqrt(.Machine$double.eps) * max(D, 0). I just fail to see the meaning of this, what are we testing? Is this to avoid some bug?

In my specific case: etahat[1,3] = 1.44e-14 and D[3] = 1.16e-13, I was expecting to get standard eta as 1.44e-14/1.16e-13 = 0.12, but I get NA.
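
The quoted test is relative: an element of D is treated as numerically zero when it is tiny compared with the largest element, not when it is tiny in absolute terms. The base R sketch below replays that logic on made-up numbers; only `D[3]` and `eta[3]` come from the report above, and the other values are assumptions chosen so that one element dominates.

```r
# Illustrative values; only D[3] and eta[3] are taken from the report above.
D <- c(0.5, 0.2, 1.16e-13)
eta <- c(0.1, -0.3, 1.44e-14)

tol <- sqrt(.Machine$double.eps) * max(D, 0)  # about 7.5e-9 for these values
pos <- D > tol
res <- ifelse(pos, eta / D, NA)
res  # the third element is NA because D[3] is below the relative tolerance
```

A plausible reading is that the check guards against dividing by a variance difference that is only rounding noise, in which case a ratio such as 0.12 would not be meaningful; whether that was the intent is for the package author to confirm.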

state first moment condition on y(1:t) output missing from KFAS

I am using KFAS instead of MARSSkfss because my data is non-Gaussian. In MARSSkfss, part of the output is the first moment of the state conditional on y(1:t), E[x(t):y(1:t)] (there it is called xtt), whereas KFAS gives E[x(t):y(1:t-1)]. Is it possible to extract this expectation from the KFAS package? It also seems to be missing from the KFS output.

daily time series data runs very slow on SSM

Hello,
I have tried analyzing data of all frequencies in KFAS. Daily data runs extremely slowly and usually takes about 3 hours to finish. I have used frequencies of 365 and 250 (to exclude all holidays) when analyzing stock price data, and the result is the same. Is there a way to get results faster?

Documentation about Poisson

In the documentation of KFAS (https://www.rdocumentation.org/packages/KFAS/versions/1.3.7/topics/KFAS), the Poisson model is described as follows.

y_t ~ Poisson(u_t \lambda_t)
\theta_t = \log(u_t \lambda_t)

Yet in the KFAS paper (https://cran.r-project.org/web/packages/KFAS/vignettes/KFAS.pdf), it is described as

\theta_t = \log(\lambda_t).

I suspect that the latter is more appropriate than the former since u_t should be the exposure.
Is my guess correct?

Predicting with missing values in non-Gaussian case

I have noticed that in the case of predicting new data with missing values when the model is non-Gaussian your predict.SSModel throws out an error of NA's in a foreign function call. This is from a call to .Fortran() (in my case line #329 of predict.SSModel.R). It is simply fixed by changing the default argument of NAOK = FALSE to NAOK = TRUE.

It has occurred to me that this could be deliberate due to the issues surrounding prediction with missing values, but in my case I had no choice. I thought it would be wise to let you decide how to proceed with this issue - perhaps allow NA values and give a warning?

Many thanks.
P.S. I'd suggest that line #329 is not the only case of this problem.

No effect of missing observations on seasonal estimation error variance?

Dear Mr Helske,
I have run with KFAS a stochastic level and deterministic seasonal model with missing observations as presented in the book "An Introduction to State Space Time Series Analysis" (pages 103 - 106) of J.F. Commandeur and S.J Koopman (see my code with data below). In my results, it can be seen that the level estimation error variance increases at time points with the missing observations as presented in the book. It seems, however, that in my results, the seasonal estimation error variance does not increase for those time points as it is shown in the book. Should we also expect to get increases for the seasonal estimation error variance at time points with missing observations?

#Libraries
library(KFAS)
if(!(require(rstudioapi))){install.packages('rstudioapi')}
if(!(require(dplyr))){install.packages('dplyr')}
library(dplyr)

#Loading data and treating some observations as missing
dataUKdriversKSI <- read.table(text =
"
1687
1508
1507
1385
1632
1511
1559
1630
1579
1653
2152
2148
1752
1765
1717
1558
1575
1520
1805
1800
1719
2008
2242
2478
2030
1655
1693
1623
1805
1746
1795
1926
1619
1992
2233
2192
2080
1768
1835
1569
1976
1853
1965
1689
1778
1976
2397
2654
2097
1963
1677
1941
2003
1813
2012
1912
2084
2080
2118
2150
1608
1503
1548
1382
1731
1798
1779
1887
2004
2077
2092
2051
1577
1356
1652
1382
1519
1421
1442
1543
1656
1561
1905
2199
1473
1655
1407
1395
1530
1309
1526
1327
1627
1748
1958
2274
1648
1401
1411
1403
1394
1520
1528
1643
1515
1685
2000
2215
1956
1462
1563
1459
1446
1622
1657
1638
1643
1683
2050
2262
1813
1445
1762
1461
1556
1431
1427
1554
1645
1653
2016
2207
1665
1361
1506
1360
1453
1522
1460
1552
1548
1827
1737
1941
1474
1458
1542
1404
1522
1385
1641
1510
1681
1938
1868
1726
1456
1445
1456
1365
1487
1558
1488
1684
1594
1850
1998
2079
1494
1057
1218
1168
1236
1076
1174
1139
1427
1487
1483
1513
1357
1165
1282
1110
1297
1185
1222
1284
1444
1575
1737
1763
")[, 1] %>%
log() %>%
replace(c(48:62, 120:140), NA) %>%
ts(start = 1969, frequency=12)

#Defining model
model <- SSModel(dataUKdriversKSI ~ SSMtrend(degree=1, Q=list(matrix(NA))) + SSMseasonal(12, sea.type='dummy', Q=matrix(0)), H=matrix(NA))

ownupdatefn <- function(pars,model,...){
model$H[,,1] <- exp(pars[1])
diag(model$Q[,,1]) <- c(exp(pars[2]), 0)
model
}

#Fitting model
w <- 2 #Number of estimated hyperparameters (i.e. disturbance variances)
fit <- fitSSM(inits = log(rep(0.005, w)), model = model, updatefn = ownupdatefn, method = "Nelder-Mead")
outKFS <- KFS(fit$model, smoothing = c("state", "mean", "disturbance"))

#Level and seasonal estimation error variances
levEstErVar <- outKFS$V[1, 1, ] %>% ts(start = 1969, frequency=12)
seasEstErVar <- outKFS$V[2, 2, ] %>% ts(start = 1969, frequency=12)

#Figure 8.17. Stochastic level estimation error variance for log drivers KSI
#with observations at t=48,...,62 and t=120,...,140 treated as missing
plot(levEstErVar, xlab = "", ylab = "", lty = 1)
title(main = "Figure 8.17. Stochastic level estimation error variance for log drivers KSI with observations at t=48,...,62 \n and t=120,...,140 treated as missing",
cex.main = 0.8)
legend("topleft",leg = "level estimation error variance",
cex = 0.6, lty = 1, horiz = T)

#Figure 8.19. Seasonal estimation error variance for log drivers KSI
#with observations at t=48,...,62 and t=120,...,140 treated as missing
plot(seasEstErVar, xlab = "", ylab = "", lty = 1)
title(main = "Figure 8.19. Seasonal estimation error variance for log drivers KSI with observations at t=48,...,62 \n and t=120,...,140 treated as missing",
cex.main = 0.8)
legend("topleft",leg = "seasonal estimation error variance",
cex = 0.6, lty = 1, horiz = T)

Modify the distribution of observation errors

Hi, I would like to know if there is a way to implement a SSModel in which the observation errors follow a heavy-tailed distribution like Student's t, instead of the Gaussian distribution.

Restrict coefficients in gaussian state space models

Can we specify or impose restrictions on regression coefficients in state space models? Perhaps by putting additional constraints on the optimization algorithm in fitSSM?

Request, not an issue ...

It would be great if, in addition to the Poisson, Negative Binomial, Gamma, and Gaussian distributions, KFAS could support the t-distribution.

I have the repository cloned, using Atlassian BitBucket and look forward to any progress you make in this regard.

Count me in for testing any beta code.

Thank you, and Happy New Year.

Estimating the dispersion parameter for the negative binomial

Hello! I've been exploring KFAS for a few days now, and it's simply amazing! Thank you for creating it! One thing I have yet to figure out how to do is estimate the dispersion parameter for the negative binomial. u(t) appears to be set to 1 when I fit the model with fitSSM. For now, I have:

model2 <- SSModel(TERMS_ts ~ SSMseasonal(period=12,Q=NA,sea.type = c("dummy")) + SSMtrend(degree=2,Q = list(matrix(NA), matrix(NA))),distribution="negative binomial")

fit_silly2 <- fitSSM(model2,inits=c(rep(0,13)))

When I extract fit_silly2$model$u, it's just a time-series of 1s.

Is there a way to have the model estimate the dispersion parameter as well? My understanding is that I need this in order to estimate a function (e.g., sum of the observations from the most recent 4 time points) of the prediction values (e.g., apply the negative binomial observation model to the signals produced from the importanceSSM())

Thank you for your help!
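One possible approach is a custom update function that maps an extra parameter to the dispersion. The sketch below assumes that, for the negative binomial distribution, the `u` component of the model holds the dispersion parameter, and that the model has three disturbance variances on the diagonal of Q. The name `update_nb` and the mock object are hypothetical; the mock list only mimics the `Q` and `u` components so the function can be tried without KFAS.

```r
# Mock object standing in for an SSModel: three disturbance variances and
# n = 5 time points. A real model would come from SSModel().
mock_model <- list(
  Q = array(0, dim = c(3, 3, 1)),
  u = matrix(1, nrow = 5, ncol = 1)
)

# pars[1:3]: log disturbance variances; pars[4]: log dispersion.
update_nb <- function(pars, model) {
  diag(model$Q[, , 1]) <- exp(pars[1:3])
  model$u[] <- exp(pars[4])   # same dispersion at every time point
  model
}

updated <- update_nb(c(-2, -3, -4, log(10)), mock_model)
print(diag(updated$Q[, , 1]))
print(updated$u[, 1])
```

With a real model, a function like this would presumably be passed as fitSSM(model2, inits = rep(0, 4), updatefn = update_nb), so that the optimizer has one parameter per unknown quantity.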

T_t unknown

Hello helske,

Is it possible to include unknown parameters in system matrix T_t? If so, how to specify T_t matrix?
I tried to use NA as unknown parameters:

Tt <- matrix(c(NA,0,0,1), nrow = 2, ncol = 2)

But I get an error when running fitSSM:

is.SSModel(do.call(updatefn, args = c(list(inits, model), update_args)), : System matrices (excluding Z) contain NA or infinite values, covariance matrices contain values larger than 10000000

What could I do to solve this? Thank you!

PS: The model is
y_t = gamma + Z_t * beta_t + eps_t
beta_{t+1} = T_t * beta_t + R_t * eta_t
where gamma and beta_t are states, and y_t, Z_t and R_t are known.
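A common pattern for unknown elements outside the covariance matrices is a custom update function that writes the parameter into T. The sketch below assumes the default update function of fitSSM only fills NA values in H and Q, which would explain the error above. The name `update_T` and the mock object are hypothetical; the mock list mimics only the `T` array, with the unknown element in position [1, 1] as in the matrix from the question.

```r
# Mock object standing in for an SSModel: a time-invariant 2 x 2 transition
# matrix with a numeric placeholder (0.5) where the unknown coefficient goes.
mock_model <- list(T = array(c(0.5, 0, 0, 1), dim = c(2, 2, 1)))

# pars[1]: the unknown transition coefficient.
update_T <- function(pars, model) {
  model$T[1, 1, 1] <- pars[1]
  model
}

updated <- update_T(0.8, mock_model)
print(updated$T[, , 1])
```

With a real model, the function would presumably be supplied through the updatefn argument of fitSSM. Since a custom function replaces the default one, it would also need to fill in any unknown variances in H or Q.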

ymiss and yt arrays in kfilter2

Hello again. If it is not clear, I am enjoying working with your package! Unfortunately I do not have a working example of this suggestion.

This problem regards the reading of variables in the subroutine kfilter2 (and old kfilter). During the routines, ymiss and yt are declared to be dimension(n,p) and in calls to dfilter1step and filter1step, the following line is used:
dfilter1step(ymiss(d,:),yt(d,:),...).
Sometimes I believe there are issues in the way Fortran reads variables (perhaps related to an incorrect storage mode), and it is not always the case that rows will be taken using ymiss(d,:). Problems can occur when, instead of behaving as expected, Fortran stacks the columns of ymiss and yt on top of each other and takes, in this case, the next p values to feed dfilter1step (which do not match the values in the dth row).

I wonder if you have come across this behaviour before, due to the different declarations in the subroutine gloglik. ymiss and yt are declared to be dimension(p,n) here and later columns are used instead:
dfilter1step(ymiss(:,d),yt(:,d),...).
This will not have the same problem as default behaviour will result in correct values taken at each step.

My question is this: Is there a reason for one routine to use dimension(n,p) and the other to use dimension(p,n)? I believe it is only with incorrectly declared variables that this issue will appear, but it may be worth fixing nevertheless.

Many thanks!
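The layout concern can be illustrated in base R, which stores matrices in column-major order just as Fortran does. This is only a sketch of the memory layout, with made-up dimensions and values; it does not reproduce the KFAS subroutines.

```r
n <- 4; p <- 2
# An n x p matrix stored column-major: column 1 is 1..4, column 2 is 5..8.
yt <- matrix(1:(n * p), nrow = n, ncol = p)
flat <- as.vector(yt)          # the underlying memory order

d <- 2
row_d <- yt[d, ]               # the intended dth row (strided access)

# What a routine would get if it read p contiguous values for step d.
contiguous_np <- flat[((d - 1) * p + 1):(d * p)]

# With a p x n layout, the dth column is contiguous in memory.
flat_t <- as.vector(t(yt))
contiguous_pn <- flat_t[((d - 1) * p + 1):(d * p)]

print(row_d)
print(contiguous_np)
print(contiguous_pn)
```

In the n x p layout a contiguous read does not return the dth row, while in the p x n layout the contiguous block is exactly the dth time point, which is consistent with the observation that the gloglik declaration avoids the problem.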

Converting Fortran codes to C++ with help of Rcpp

Big issue which may never happen due to time limitations, but I would like to drop Fortran in favor of C++, although my reasons are not completely clear beyond personal growth from learning C++ better. I am not sure about performance benefits. Using the sparse matrix representation of the Matrix package together with C++ code would probably be useful, and S4 could be added to the mix for a total overhaul.

Unable to correctly estimate ARMA parameters

Dear Dr. Helske,
I am having trouble using the KFAS package to estimate ARMA coefficients. I did a test with the Nile river data, and the estimated coefficients differ a lot from those estimated by the standard arima function in R. The code is reproduced below:

data(Nile)
Nile = diff(Nile)
model = SSModel( Nile ~ SSMarima(ar=c(0.1,0.2),ma=c(0.1)), H=1e-7 )
fn <- function(pars,model,...){
# update T (which contains AR coefficients) and R (which contains MA coefficients)
model$T[2,2,] <- pars[1]
model$T[3,2,] <- pars[2]
model$R[3,,1] <- pars[3]
# update the ARIMA Q
model$Q[1,1,1] <- exp( pars[4] )
model
}
inits = c(0.1,0.5,0.5,100)
modelFit<-fitSSM( model=model, inits = inits,
updatefn = fn, method='BFGS', control=list(REPORT=1,trace=1) )
# the above does not coincide with the estimation result of
f = arima(x =Nile, order = c(2,0,1))

Could you take a look at this and see where I went wrong? Your help will be greatly appreciated,
Yaoping Wang
PhD student in the Ohio State University

Inverse of artransform

I suggest implementing a function to compute the inverse of artransform.
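A rough base R sketch of what such an inverse could look like follows. It assumes the forward transform maps real values to partial autocorrelations with tanh and then to AR coefficients with the Durbin-Levinson recursion, as stats::arima does internally; whether artransform matches this exactly should be checked against the KFAS source. All function names here are hypothetical.

```r
# Partial autocorrelations -> AR coefficients (Durbin-Levinson recursion).
pacf_to_ar <- function(pac) {
  p <- length(pac)
  phi <- pac
  if (p > 1) for (j in 2:p) {
    a <- phi[j]
    prev <- phi[1:(j - 1)]
    phi[1:(j - 1)] <- prev - a * rev(prev)
  }
  phi
}

# AR coefficients -> partial autocorrelations (recursion run backwards).
ar_to_pacf <- function(phi) {
  p <- length(phi)
  pac <- phi
  if (p > 1) for (j in p:2) {
    a <- pac[j]
    cur <- pac[1:(j - 1)]
    pac[1:(j - 1)] <- (cur + a * rev(cur)) / (1 - a^2)
  }
  pac
}

# Assumed forward transform and its inverse.
ar_transform_sketch     <- function(x)   pacf_to_ar(tanh(x))
ar_transform_inverse    <- function(phi) atanh(ar_to_pacf(phi))

x <- c(0.3, -0.5, 1.2)
phi <- ar_transform_sketch(x)
x_back <- ar_transform_inverse(phi)
print(phi)
print(x_back)
```

The inverse is only defined for coefficients in the stationary region, where every partial autocorrelation lies strictly between -1 and 1.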

Requests - Options for no intercepts and not storing data

I am working on some large, complex models, and to do so I am using SSMcustom(). But when I do so, intercepts are automatically added, even though I don't want them, and there doesn't appear to be an option to suppress them. It would be nice if a custom SSM were just that, where I can define all the elements without having anything else added, or at least to have the option to do so.

Also by default, KFS() stores the data in the return. For a very large problem, I would like to have the option to not have the data stored, but read from the data array. Without this option, I have two copies of the data in memory, and while the data array might fit into memory, 2 copies of it may well push the limits.

Thanks,

-Roy

Use hasName

From R-devel: "Convenience function ‘hasName()’ has been added; it is intended to replace the common idiom ‘!is.null(x$name)’ without the usually unintended partial name matching. "

I have used the latter method several times in KFAS and the debugging has been a pain... And due to partial name matching I have been resorting to this kind of ugliness:
(!is.null(x[["a", exact = TRUE]]) || !is.null(x$alphahat))
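The difference between the idioms can be seen in a short base R example. The list `x` is made up for illustration; hasName() is provided by the utils package in R 3.4.0 and later.

```r
# A list with a single component whose name starts with "a".
x <- list(alphahat = 1:3)

# `$` does partial matching, so this finds alphahat by accident.
partial <- !is.null(x$a)

# Exact matching with [[ ]] and with hasName() both reject the partial name.
exact_bracket <- !is.null(x[["a", exact = TRUE]])
exact_hasname <- utils::hasName(x, "a")

# The full name is found as expected.
full_hasname <- utils::hasName(x, "alphahat")

print(c(partial, exact_bracket, exact_hasname, full_hasname))
```

Here only the `$` idiom reports that a component named "a" exists, which is the unintended behaviour the quoted R-devel note refers to.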
