ellisp / forecastxgb-r-package
An R package for time series models and forecasts with xgboost compatible with {forecast} S3 classes
License: GNU General Public License v3.0
I tried running xgbar on a small time series:
y <- structure(c(11.3709584471467, 9.43530182860391, 10.3631284113373,
10.632862604961, 10.404268323141, 9.89387548390852, 11.5115219974389,
9.9053409615869, 12.018423713877, 9.93728590094758, 11.3048696542235,
12.2866453927011, 8.61113929888766), .Tsp = c(1, 2, 12), class = "ts")
xgbar(y)
This executes with the following error:
Error in `[.default`(origy, -(1:(maxlag))) :
  only 0's may be mixed with negative subscripts
In addition: Warning message:
In xgbar(y) :
y is too short for 24 to be the value of maxlag. Reducing maxlags to -2 instead.
I think this error can be fixed with a simple check for negative maxlag.
forecastxgb-r-package/pkg/R/xgbar.R
Line 166 in fe2894a
if (maxlag < 0 ) {
stop('Try a longer time-series as maxlag is negative')
}
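Beyond stopping, the check could shrink maxlag to fit the series, which is what the warning message implies was intended. A sketch of such a guard (a hypothetical helper, not the package's actual code; `min_obs` is an assumed tuning choice):

```r
# Hypothetical guard (not in the package): clamp maxlag to the series length
# instead of letting it go negative.
safe_maxlag <- function(n, maxlag, min_obs = 4) {
  adjusted <- min(maxlag, n - min_obs)  # keep at least min_obs training rows
  if (adjusted < 1) stop("Try a longer time series: maxlag would be non-positive")
  if (adjusted < maxlag) warning("y is short; reducing maxlag to ", adjusted)
  adjusted
}
```

With the 13-point series above, `safe_maxlag(13, 24)` would warn and return 9 rather than producing a negative lag count.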
from the CRAN checks:
1. Error: Modulus transform works when lambda = 1 or 0 (@test-modulus-transform.R#21)
object 'y' not found
1: expect_equal(y, JDMod(y, lambda = 1)) at testthat/test-modulus-transform.R:21
2: compare(object, expected, ...)
testthat results ================================================================
OK: 25 SKIPPED: 0 FAILED: 1
1. Error: Modulus transform works when lambda = 1 or 0 (@test-modulus-transform.R#21)
not yet supported
Building on the problems in #20, which have been fixed but probably not very well: for series of length < (f * 3 + 1), or perhaps even some higher threshold, we should probably not introduce seasonal dummy variables.
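That length check could be sketched as a small predicate (the f * 3 + 1 threshold is the one suggested here, not a tested default):

```r
# Only add seasonal dummies when the series has room for them:
# a frequency above 1 and at least three full cycles plus one point.
use_seasonal_dummies <- function(n, f) {
  f > 1 && n >= f * 3 + 1
}
```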
It looks like you've already set up the covariates to feed into xgboost using lagged values of the time series--a sensible approach. It could also make sense to include fixed effects for day of week/month/etc.
I started building a package, designmatrix, a few years back to generate xreg values to feed into forecasting models, and anticipated eventually using it with forecastHybrid. It is barely off the ground, but the basic idea is to make it easy to generate covariates for day of week, weekend, month, quarter, etc. Eventually interactions and holidays for these would be nice as well. If you want to import it, it could serve as a good excuse for me to restart and finish development. Take a look here: https://github.com/dashaub/designmatrix
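The basic idea can be sketched in base R with model.matrix, generating day-of-week and month dummies suitable for an xreg argument (designmatrix's actual API may differ):

```r
# Calendar fixed effects as a dummy matrix (base-R sketch of the idea)
dates <- seq(as.Date("2016-01-01"), by = "day", length.out = 90)
cal <- data.frame(
  dow   = factor(format(dates, "%u")),  # day of week 1-7, locale-independent
  month = factor(format(dates, "%m"))   # month as "01", "02", ...
)
xreg <- model.matrix(~ dow + month, data = cal)[, -1]  # drop the intercept
# 90 rows; one column per non-reference factor level
```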
For example:
> thedata <- subset(tourism, "quarterly")[[36]]
> mod1 <- xgbar(thedata$x, trend_method = "differencing", seas_method = "decompose")
Error in xgb.DMatrix(data, label = label) :
There are NAN in the matrix, however, you did not set missing=NAN
need to bypass all the seasonal dummy variables
Hi,
thanks for this great package and the new approach option for forecasting time series.
But I've run into two problems with two different time series, while others work without any problems.
1:
Error in x[, maxlag + 1] <- time(y2) :
number of items to replace is not a multiple of replacement length
2:
Error in x[, maxlag + 2:f] <- seasons :
number of items to replace is not a multiple of replacement length
In addition: Warning message:
In xgbts(y = ...) :
y is too short for cross-validation. Will validate on the most recent 20 per cent instead.
Using the stlf function from the forecast package works without any errors.
Can you explain what causes the errors and how to avoid them, so that xgb forecasting works?
Thanks in advance! 👍
I have installed forecastxgb from GitHub, but forecast.xgbar() is unavailable even when the package is loaded into the workspace. The version of R being used is 3.5.0.
Reproducible example:
install_github("ellisp/forecastxgb-r-package/pkg")
library(forecastxgb)
sample_ts <- ts(sample(8:30, replace = TRUE, size = 25))
xgbar_season <- xgbar(sample_ts)
fcast <- forecast.xgbar(xgbar_season)
This returns the error:
Error in forecast.xgbar(xgbar_season) :
could not find function "forecast.xgbar"
xgbar() works fine. Additionally, the help files for the forecastxgb functions are unavailable and return an error saying that the forecastxgb.rdb file is corrupt.
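The likely cause is that forecast.xgbar is an S3 method registered for the forecast generic but not exported, so calling it by name fails; calling the generic, forecast(xgbar_season), should dispatch to it. A toy base-R illustration of that dispatch mechanism (toy objects, not the real package):

```r
# S3 dispatch: the generic finds the method even when the method's name
# is never called directly.
forecast <- function(object, ...) UseMethod("forecast")   # the generic
forecast.xgbar <- function(object, ...) object$mean       # the method
fit <- structure(list(mean = c(10, 11, 12)), class = "xgbar")
forecast(fit)  # dispatches to forecast.xgbar
```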
When I type the code model <- xgbar(gas), the following errors and warnings come out:
"Error in begin_iteration:end_iteration :
result would be too long a vector
In addition: Warning messages:
1: 'early.stop.round' is deprecated.
Use 'early_stopping_rounds' instead.
See help("Deprecated") and help("xgboost-deprecated").
2: In min(cv$test.rmse.mean) :
no non-missing arguments to min; returning Inf
3: In min(which(cv$test.rmse.mean == min(cv$test.rmse.mean))) :
no non-missing arguments to min; returning Inf"
This is my first attempt with R and forecastxgb, so I have no idea how to handle it.
Could you please help me? Thank you.
Not sure how to do this, because I don't know of a large-scale testing suite of data equivalent to Mcomp with explanatory variables.
e.g.
gold_model <- xgbar(gold, maxlag = 100)
Error in xgb.DMatrix(data, label = label) :
There are NAN in the matrix, however, you did not set missing=NAN
this would make it consistent with nnetar.
I have solved this problem~ thanks
forecast.xgbts() throws a warning if no h is provided and defaults to 24. You might want to save the frequency of the input time series in the xgbts object and default to 2 * frequency(inputSeries), as used in the forecast package.
> a <- xgbts(AirPassengers)
Stopping. Best iteration: 43
> forecast(a)
No h provided so forecasting forward 24 periods.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct
1961 454.0111 446.6804 444.8749 503.9522 535.9165 621.6365 621.3412 603.3748 556.0723 474.5930
1962 494.8933 477.6807 470.5114 553.3421 621.3992 621.3412 621.3412 621.3412 602.2322 522.1175
Nov Dec
1961 419.3743 450.0060
1962 427.1246 468.4285
> b <- auto.arima(AirPassengers)
> forecast(b)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 1961 446.7582 431.6858 461.8306 423.7070 469.8094
Feb 1961 420.7582 402.5180 438.9984 392.8622 448.6542
Mar 1961 448.7582 427.8241 469.6923 416.7423 480.7741
Apr 1961 490.7582 467.4394 514.0770 455.0952 526.4212
May 1961 501.7582 476.2770 527.2395 462.7880 540.7284
Jun 1961 564.7582 537.2842 592.2323 522.7403 606.7761
Jul 1961 651.7582 622.4264 681.0900 606.8991 696.6173
Aug 1961 635.7582 604.6796 666.8368 588.2275 683.2889
Sep 1961 537.7582 505.0258 570.4906 487.6983 587.8181
Oct 1961 490.7582 456.4516 525.0648 438.2908 543.2256
Nov 1961 419.7582 383.9466 455.5698 364.9891 474.5273
Dec 1961 461.7582 424.5023 499.0141 404.7803 518.7361
Jan 1962 476.5164 431.4567 521.5761 407.6036 545.4292
Feb 1962 450.5164 400.9938 500.0390 374.7781 526.2547
Mar 1962 478.5164 424.9010 532.1318 396.5188 560.5141
Apr 1962 520.5164 463.0993 577.9335 432.7045 608.3283
May 1962 531.5164 470.5341 592.4987 438.2520 624.7808
Jun 1962 594.5164 530.1661 658.8667 496.1011 692.9317
Jul 1962 681.5164 613.9659 749.0670 578.2068 784.8261
Aug 1962 665.5164 594.9105 736.1223 557.5340 773.4988
Sep 1962 567.5164 493.9820 641.0508 455.0552 679.9776
Oct 1962 520.5164 444.1657 596.8671 403.7481 637.2847
Nov 1962 449.5164 370.4497 528.5831 328.5943 570.4385
Dec 1962 491.5164 409.8239 573.2089 366.5785 616.4543
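The suggested default could be sketched like this (the `y` field name is an assumption about what the fitted xgbts object stores):

```r
# Default forecast horizon, following the forecast package's convention:
# two full seasonal cycles for seasonal data, 10 periods otherwise.
default_h <- function(object) {
  f <- stats::frequency(object$y)
  if (f > 1) 2 * f else 10
}
```

For AirPassengers (monthly, frequency 12) this would give h = 24, matching the 24 periods shown above.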
I am very interested in using this package to do some forecasting. Is this package still actively maintained?
for example, here are the four different seasonal adjustment methods with differencing on:
model5 <- xgbar(AirPassengers, maxlag = 24, trend_method = "differencing", seas_method = "dummies")
model6 <- xgbar(AirPassengers, maxlag = 24, trend_method = "differencing", seas_method = "decompose")
model7 <- xgbar(AirPassengers, maxlag = 24, trend_method = "differencing", seas_method = "fourier")
model8 <- xgbar(AirPassengers, maxlag = 24, trend_method = "differencing", seas_method = "none")
fc5 <- forecast(model5, h = 24)
fc6 <- forecast(model6, h = 24)
fc7 <- forecast(model7, h = 24)
fc8 <- forecast(model8, h = 24)
par(mfrow = c(2, 2), bty = "l")
plot(fc5, main = "dummies"); grid()
plot(fc6, main = "decompose"); grid()
plot(fc7, main = "fourier"); grid()
plot(fc8, main = "none"); grid()
library(forecastxgb)
model <- xgbar(gas)
model$y
model$y2
model$x
model$model
model$fitted
model$maxlag
model$seas_method
model$diffs
model$lambda
model$method
library(fpp)
consumption <- usconsumption[ ,1]
income <- matrix(usconsumption[ ,2], dimnames = list(NULL, "Income"))
consumption_model <- xgbar(y = consumption, xreg = income)
consumption_model$origxreg
consumption_model$ncolxreg
Can you explain model$y, model$y2, model$x, model$model, model$fitted, model$maxlag, model$seas_method, model$diffs, model$lambda, model$method, consumption_model$origxreg, and consumption_model$ncolxreg? Thank you.
The choice of maxlag is the most obvious way to improve overall performance. I see two ways ahead:
a. do some comprehensive testing of different values and work out a better default formula
b. let the user do it by brute force - some kind of cross-validation to choose the best value.
We will probably want to do both - i.e. have good-performing defaults, but also the option to determine the optimal value of the hyperparameter. Doing b first will help with a anyway.
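Option b can be sketched as a simple hold-out search. Here `fit_fn` stands in for a call such as `function(tr, m, h) forecast(xgbar(tr, maxlag = m), h = h)$mean`, so the search itself is shown package-free:

```r
# Brute-force maxlag selection: hold out the last h points and score
# each candidate by RMSE on that hold-out tail.
best_maxlag <- function(y, candidates, h, fit_fn) {
  n <- length(y)
  train <- window(y, end = time(y)[n - h])
  test  <- as.numeric(window(y, start = time(y)[n - h + 1]))
  rmse <- sapply(candidates, function(m)
    sqrt(mean((as.numeric(fit_fn(train, m, h)) - test)^2)))
  candidates[which.min(rmse)]
}
```

A rolling-origin evaluation would be more robust than a single hold-out, at the cost of many more model fits.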
We're reluctant to add this xgboost functionality to forecastHybrid until
We might be able to mimic the approach used by forecast::nnetar.
We need a way to deal with issues like this, raised in #20:
bla_2 <- ts(runif(1076, min = 5000, max = 10000), start = c(2013, yday("2013-12-03")),
frequency = 365.25)
bla_2_XGB_model <- xgbar(y = bla_2)
It's superficially the non-integer frequency, but more broadly we need a way of handling daily data that takes into account leap years, and has a more sophisticated way than 365 or 366 dummy variables. Could draw on http://robjhyndman.com/hyndsight/dailydata/.
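Following the linked post, a handful of Fourier terms gives a far more compact seasonal representation than 365 or 366 dummies; forecast::fourier() provides this for ts objects, and a hand-rolled base-R version looks like:

```r
# Fourier seasonal terms: 2*K smooth sin/cos columns for a given period
# (handles non-integer periods such as 365.25 naturally)
fourier_terms <- function(tt, period, K) {
  out <- sapply(seq_len(K), function(k)
    cbind(sin(2 * pi * k * tt / period), cos(2 * pi * k * tt / period)))
  matrix(out, nrow = length(tt),
         dimnames = list(NULL, paste0(rep(c("S", "C"), K),
                                      rep(seq_len(K), each = 2))))
}
Z <- fourier_terms(1:1076, 365.25, K = 5)  # 10 columns instead of 366 dummies
```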
added as a (failing) test
test_that("works with series of 35 with frequency 12", {
bla_1 <- ts(runif(35, min = 5000, max = 10000), start = c(2013,12), frequency = 12)
expect_error(bla_1_XGB_model <- xgbts(y = bla_1), NA)
})
One of the two problems brought up in #20.
woolmod <- xgbar(woolyrnq, xreg = data.matrix(rep(1, length(woolyrnq))))
forecast(woolmod, xreg = data.matrix(rep(1, 2))) #works fine
forecast(woolmod, xreg = data.matrix(rep(1, 1))) #gives error
It appears that passing param arguments to xgboost() and xgb.train() has no impact. For example,
> library(forecastxgb)
> set.seed(3)
> a <- xgbts(AirPassengers, params = list(eta = .0001))
Stopping. Best iteration: 64
>
> set.seed(3)
> a <- xgbts(AirPassengers)
Stopping. Best iteration: 64
Any ideas what's going on here?
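That symptom is consistent with the user-supplied params never reaching the underlying xgb.train call, so the package defaults always win. A fix would merge user params over the defaults before the call; a sketch of that merge (hypothetical plumbing, not the package's actual code):

```r
# User-supplied params should override package defaults, entry by entry
merge_params <- function(user, defaults) {
  utils::modifyList(defaults, user)
}
merged <- merge_params(list(eta = 1e-4),
                       list(eta = 0.3, max_depth = 6))
# merged keeps max_depth = 6 but takes the user's eta
```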
Hi,
is there a way to set different maxlags for xregs and for Y?
For instance, I want xregs to have a maxlag of 3 and Y to have a maxlag of 12.
Thanks!,
Nahuel
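Nothing in this thread suggests the package exposes separate lag depths, but you can pre-lag the regressors yourself with base R's embed() and pass the result as xreg, leaving maxlag = 12 for y (`lag_matrix` is a hypothetical helper):

```r
# Build a matrix of lags 1..maxlag of a single series
lag_matrix <- function(x, maxlag) {
  # embed(x, d): row i is x[i + d - 1], ..., x[i]; drop column 1 (= lag 0)
  m <- embed(as.numeric(x), maxlag + 1)[, -1, drop = FALSE]
  colnames(m) <- paste0("lag", seq_len(maxlag))
  m
}
lagged_x <- lag_matrix(1:10, 3)  # 7 rows; row 1 is c(3, 2, 1)
```

Remember to drop the first maxlag observations of y (and of any unlagged regressors) so the rows line up.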
test case:
library(Mcomp)
thedata <- M1[[1]]
mod <- xgbts(thedata$x, maxlag = 1, nrounds_method = "cv")
fc <- forecast(mod, h = thedata$h)
error is in forecast.xgbts:
Error in `colnames<-`(`*tmp*`, value = c("lag1", "time")) :
length of 'dimnames' [2] not equal to array extent
or even sign(x) * boxcox(abs(x)). At least try it and see whether it helps as an option or not.
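The suggested signed transform can be sketched directly. Note it is undefined at x = 0 when lambda = 0, since log(0) is -Inf; the modulus transform (JDMod in the test failure above) uses |x| + 1 inside, which avoids that:

```r
# Signed Box-Cox: apply the transform to |x|, then restore the sign
signed_boxcox <- function(x, lambda) {
  bc <- function(z) if (lambda == 0) log(z) else (z^lambda - 1) / lambda
  sign(x) * bc(abs(x))
}
```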
at a minimum, the default v. 1.
Hi, I wonder how I could feed new data y to predict. It seems that the forecast function only uses the xgb model to predict the next h periods?
Can we also pass the params list to the xgbar function? I think this is good functionality to include. I would also like to see custom objective functions included in the call to the xgbar function. I think it is fairly easy to do this.
The following line of code:
x[ , maxlag + 1] <- time(y2)
returns this error message:
Error in x[, maxlag + 1] <- time(y2) : number of items to replace is not a multiple of replacement length
It appears that the error is caused by x and y2 having inconsistent lengths, arising from R's handling of indexing with decimals whenever (as seems usually to be the case) maxlag is a floating-point number.
Consider the outcome if maxlag = 54.75 and orign = 120:
n <- orign - maxlag
y2 <- ts(origy[-(1:(maxlag))], start = time(origy)[maxlag + 1], frequency = f)
n will be 120 - 54.75 = 65.25. In determining the length of y2 with the decimal indexing of maxlag, R rounds the index of 54.75 down to 54, which causes y2 to be of length 120 - 54 = 66.
However, when the matrix x is created, n = 65.25 is used for the number of rows. R rounds this number down to the nearest integer less than this value, 65, which creates a matrix with 65 rows:
ncolx <- ifelse(seas_method == "dummies", maxlag + f, maxlag + 1)
x <- matrix(0, nrow = n, ncol = ncolx)
Thus, y2 is of length 66, and x has 65 rows, which causes a "number of items to replace is not a multiple of replacement length" error when this line is run:
x[ , maxlag + 1] <- time(y2)
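Given that analysis, a minimal fix is to force maxlag to a whole number before it is ever used as an index (a sketch, with an assumed warning message):

```r
# Coerce maxlag to a whole number so decimal indexing can never occur
coerce_maxlag <- function(maxlag) {
  if (maxlag %% 1 != 0) {
    warning("maxlag must be a whole number; rounding down to ", floor(maxlag))
    maxlag <- floor(maxlag)
  }
  as.integer(maxlag)
}
```

With maxlag = 54.75 this yields 54, so y2 and x both end up with 66 rows and the replacement error disappears.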
Have a careful look at making it consistent with arima, nnetar, etc. if possible.
at a minimum, "none" and "cyclic" (in which case, question will be how to decide shape of the cycle). Should help #22.
model2 <- xgbar(co2, seas_method = "decompose")
plot(forecast(model2), main = "Decomposition seasonal adjustment for seasonality")
the in-sample accuracy is astonishingly and suspiciously good. Needs thorough checking. It might be though that proper investigation of #6 reveals the strengths and weaknesses.
Hi.
I'd like to know whether it is possible to change the training period in the xgbar function.