
ellisp / forecastxgb-r-package

An R package for time series models and forecasts with xgboost compatible with {forecast} S3 classes

License: GNU General Public License v3.0

R 26.77% HTML 73.23%

forecastxgb-r-package's People

Contributors

ellisp


forecastxgb-r-package's Issues

Maxlag is negative for short time series

I tried running xgbar on a small time series -

y <- structure(c(11.3709584471467, 9.43530182860391, 10.3631284113373, 
10.632862604961, 10.404268323141, 9.89387548390852, 11.5115219974389, 
9.9053409615869, 12.018423713877, 9.93728590094758, 11.3048696542235, 
12.2866453927011, 8.61113929888766), .Tsp = c(1, 2, 12), class = "ts")

xgbar(y)

Execution fails with the following error:

Error in [.default(origy, -(1:(maxlag))) :
only 0's may be mixed with negative subscripts
In addition: Warning message:
In xgbar(y) :
y is too short for 24 to be the value of maxlag. Reducing maxlags to -2 instead.

I think this error can be fixed with a simple check for negative maxlag.

maxlag <- orign - f - round(f / 4)

if (maxlag < 0) {
    stop("Try a longer time series as maxlag is negative")
}

lambda problem

from the CRAN checks:

 1. Error: Modulus transform works when lambda = 1 or 0 (@test-modulus-transform.R#21) 
  object 'y' not found
  1: expect_equal(y, JDMod(y, lambda = 1)) at testthat/test-modulus-transform.R:21
  2: compare(object, expected, ...)
  
  testthat results ================================================================
  OK: 25 SKIPPED: 0 FAILED: 1
  1. Error: Modulus transform works when lambda = 1 or 0 (@test-modulus-transform.R#21) 

better treatment of seasons for short series

Building on the problems in #20, which have been fixed but probably not very well: for series of length less than (f * 3 + 1), or perhaps even some higher threshold, we should probably not introduce seasonal dummy variables.
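The threshold suggested above could be expressed as a small guard. This is a hypothetical sketch, not xgbar's actual internals; the function name `use_seasonal_dummies` is made up for illustration.

```r
# Sketch of the proposed guard (hypothetical helper, not xgbar's real code):
# only add seasonal dummies when there are at least three full cycles plus one point.
use_seasonal_dummies <- function(y, min_cycles = 3) {
  f <- stats::frequency(y)
  f > 1 && length(y) >= f * min_cycles + 1
}

use_seasonal_dummies(ts(rnorm(25), frequency = 12))  # FALSE: barely two cycles
use_seasonal_dummies(AirPassengers)                  # TRUE: 144 obs at frequency 12
```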

possibly use designmatrix package

It looks like you've already set up the covariates to feed into xgboost using lagged values of the time series, which is a sensible approach. It could also make sense to include fixed effects for day of week, month, etc.

I started building a package designmatrix a few years back to generate xreg values to feed into forecasting models and anticipated using it with forecastHybrid eventually. It is barely off the ground, but the basic idea is to make it easy to generate covariates for day of week, weekend, month, quarter, etc. Eventually interactions and holidays for these would be nice as well. If you want to import it, it could serve as a good excuse for me to restart and to finish development. Take a look here: https://github.com/dashaub/designmatrix
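The kind of calendar covariates designmatrix aims at can be sketched with base R alone. This is not designmatrix's actual API, just an illustration of the idea:

```r
# Minimal sketch of calendar covariates in base R (not designmatrix's API):
dates   <- seq(as.Date("2018-01-01"), by = "day", length.out = 10)
dow     <- factor(format(dates, "%u"))        # ISO day of week, 1 = Monday
weekend <- as.integer(dow %in% c("6", "7"))   # Saturday/Sunday flag

# Dummy-code day of week (drop the intercept column) and add the weekend flag,
# giving an xreg matrix ready to pass to a forecasting model.
xreg <- cbind(model.matrix(~ dow)[, -1, drop = FALSE], weekend)
dim(xreg)  # 10 rows, 7 columns
```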

decompose with non-complete cycles creates NA problems

for example

> thedata <- subset(tourism, "quarterly")[[36]]
> mod1 <- xgbar(thedata$x, trend_method = "differencing", seas_method = "decompose")
 Error in xgb.DMatrix(data, label = label) : 
  There are NAN in the matrix, however, you did not set missing=NAN 

Error in x[, maxlag + 1] <- time(y2) & Error in x[, maxlag + 2:f] <- seasons

Hi,
thanks for this great package and the new approach option for forecasting time series.
But I've run into two problems with two different time series, while others work without any problems.
1:

Error in x[, maxlag + 1] <- time(y2) : 
  number of items to replace is not a multiple of replacement length

2:

Error in x[, maxlag + 2:f] <- seasons : 
  number of items to replace is not a multiple of replacement length
In addition: Warning message:
In xgbts(y = ...) :
  y is too short for cross-validation.  Will validate on the most recent 20 per cent instead.

Using the stlf function from the forecast package works without any errors.
Can you explain what causes these errors and how to avoid them, so that xgb forecasting works?

Thanks in advance! 👍

forecast.xgbar() is inaccessible with R 3.5.0

I have installed forecastxgb from GitHub, but forecast.xgbar() is unavailable even when the package is loaded into the workspace. The version of R being used is 3.5.0.

Reproducible example:

install_github("ellisp/forecastxgb-r-package/pkg")
library(forecastxgb)

sample_ts <- ts(sample(8:30, replace = TRUE, size = 25))

xgbar_season <- xgbar(sample_ts)

fcast <- forecast.xgbar(xgbar_season)

This returns the error:

Error in forecast.xgbar(xgbar_season) :
could not find function "forecast.xgbar"

xgbar() works fine. Additionally, the help files for the forecastxgb functions are unavailable and return an error saying that the forecastxgb.rdb file is corrupt.
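The "could not find function" error is the usual symptom of an S3 method that is registered but not exported: it is reachable through the generic, not by its full name. A self-contained illustration with a toy class (the class and method names here are made up):

```r
# Toy illustration of S3 dispatch: a method is reachable through the generic
# even when calling it by its full name directly would fail.
summary.mymodel <- function(object, ...) "dispatched via generic"
obj <- structure(list(), class = "mymodel")
summary(obj)  # dispatches to summary.mymodel

# By analogy, forecast(xgbar_season) may work even when
# forecast.xgbar(xgbar_season) cannot be called directly.
```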

When I run the demo code, an error occurred - "result would be too long a vector"

When I type the code - "model <- xgbar(gas)",
some information about errors and warnings came out:
"Error in begin_iteration:end_iteration :
result would be too long a vector
In addition: Warning messages:
1: 'early.stop.round' is deprecated.
Use 'early_stopping_rounds' instead.
See help("Deprecated") and help("xgboost-deprecated").
2: In min(cv$test.rmse.mean) :
no non-missing arguments to min; returning Inf
3: In min(which(cv$test.rmse.mean == min(cv$test.rmse.mean))) :
no non-missing arguments to min; returning Inf"
This is my first attempt in R and forecastxgb, so I have no idea how to handle it.
Could you help me? Thank you.

Handle NAs

eg

gold_model <- xgbar(gold, maxlag = 100)
 Error in xgb.DMatrix(data, label = label) : 
  There are NAN in the matrix, however, you did not set missing=NAN

default value for h in forecast.xgbts()

forecast.xgbts() throws a warning if no h is provided and defaults to 24. You might want to save the frequency of the input time series in the xgbts object and default to 2 * frequency(inputSeries) as used in the "forecast" package.

> a <- xgbts(AirPassengers)
Stopping. Best iteration: 43
> forecast(a)
No h provided so forecasting forward 24 periods.
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct
1961 454.0111 446.6804 444.8749 503.9522 535.9165 621.6365 621.3412 603.3748 556.0723 474.5930
1962 494.8933 477.6807 470.5114 553.3421 621.3992 621.3412 621.3412 621.3412 602.2322 522.1175
          Nov      Dec
1961 419.3743 450.0060
1962 427.1246 468.4285
> b <- auto.arima(AirPassengers)
> forecast(b)
         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Jan 1961       446.7582 431.6858 461.8306 423.7070 469.8094
Feb 1961       420.7582 402.5180 438.9984 392.8622 448.6542
Mar 1961       448.7582 427.8241 469.6923 416.7423 480.7741
Apr 1961       490.7582 467.4394 514.0770 455.0952 526.4212
May 1961       501.7582 476.2770 527.2395 462.7880 540.7284
Jun 1961       564.7582 537.2842 592.2323 522.7403 606.7761
Jul 1961       651.7582 622.4264 681.0900 606.8991 696.6173
Aug 1961       635.7582 604.6796 666.8368 588.2275 683.2889
Sep 1961       537.7582 505.0258 570.4906 487.6983 587.8181
Oct 1961       490.7582 456.4516 525.0648 438.2908 543.2256
Nov 1961       419.7582 383.9466 455.5698 364.9891 474.5273
Dec 1961       461.7582 424.5023 499.0141 404.7803 518.7361
Jan 1962       476.5164 431.4567 521.5761 407.6036 545.4292
Feb 1962       450.5164 400.9938 500.0390 374.7781 526.2547
Mar 1962       478.5164 424.9010 532.1318 396.5188 560.5141
Apr 1962       520.5164 463.0993 577.9335 432.7045 608.3283
May 1962       531.5164 470.5341 592.4987 438.2520 624.7808
Jun 1962       594.5164 530.1661 658.8667 496.1011 692.9317
Jul 1962       681.5164 613.9659 749.0670 578.2068 784.8261
Aug 1962       665.5164 594.9105 736.1223 557.5340 773.4988
Sep 1962       567.5164 493.9820 641.0508 455.0552 679.9776
Oct 1962       520.5164 444.1657 596.8671 403.7481 637.2847
Nov 1962       449.5164 370.4497 528.5831 328.5943 570.4385
Dec 1962       491.5164 409.8239 573.2089 366.5785 616.4543
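The suggested default could look something like the following. This is a hypothetical sketch (the helper name `default_h` is made up), mirroring the convention used by the forecast package of 2 * frequency for seasonal series and 10 otherwise:

```r
# Sketch of the suggested default horizon (hypothetical helper):
default_h <- function(y) {
  f <- stats::frequency(y)
  if (f > 1) 2 * f else 10  # forecast package convention
}

default_h(AirPassengers)  # 24 for monthly data
```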

"decompose" method doesn't work well in combination with differencing

for example, here are the four different seasonal adjustment methods with differencing on:

model5 <- xgbar(AirPassengers, maxlag = 24, trend_method = "differencing", seas_method = "dummies")
model6 <- xgbar(AirPassengers, maxlag = 24, trend_method = "differencing", seas_method = "decompose")
model7 <- xgbar(AirPassengers, maxlag = 24, trend_method = "differencing", seas_method = "fourier")
model8 <- xgbar(AirPassengers, maxlag = 24, trend_method = "differencing", seas_method = "none")

fc5 <- forecast(model5, h = 24)
fc6 <- forecast(model6, h = 24)
fc7 <- forecast(model7, h = 24)
fc8 <- forecast(model8, h = 24)

par(mfrow = c(2, 2), bty = "l")
plot(fc5, main = "dummies"); grid()
plot(fc6, main = "decompose"); grid()
plot(fc7, main = "fourier"); grid()
plot(fc8, main = "none"); grid()

[image: forecast plots for the four seasonal adjustment methods]

the meaning of the model results

library(forecastxgb)
model <- xgbar(gas)
model$y
model$y2
model$x
model$model
model$fitted
model$maxlag
model$seas_method
model$diffs
model$lambda
model$method
library(fpp)
consumption <- usconsumption[ ,1]
income <- matrix(usconsumption[ ,2], dimnames = list(NULL, "Income"))
consumption_model <- xgbar(y = consumption, xreg = income)
consumption_model$origxreg
consumption_model$ncolxreg
Can you explain model$y, model$y2, model$x, model$model, model$fitted, model$maxlag, model$seas_method, model$diffs, model$lambda, model$method, consumption_model$origxreg and consumption_model$ncolxreg? Thank you

Better way to choose maxlag

The choice of maxlag is the most obvious way to improve overall performance. I see two ways ahead:

a. do some comprehensive testing of different values and work out a better default formula
b. let the user do it by brute force - some kind of cross-validation to choose the best value.

We'll probably want to do both of these, i.e. have well-performing defaults but also the option to determine the optimal value of the hyperparameter. Doing b. first will help with a. anyway.
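Option b. could be sketched as a holdout search over candidate maxlag values. Everything here is hypothetical: a plain lm() autoregression stands in for xgbar() so the example is self-contained, and the real implementation would refit xgbar at each candidate lag.

```r
# Sketch of option b: choose maxlag by out-of-sample RMSE on a holdout.
# lm.fit() on lagged values stands in for xgbar() here.
choose_maxlag <- function(y, candidates = 1:12, holdout = 0.2) {
  n_test <- max(1, round(length(y) * holdout))
  errs <- sapply(candidates, function(L) {
    d <- embed(as.numeric(y), L + 1)                # col 1 = y_t, cols 2.. = lags
    train <- d[1:(nrow(d) - n_test), , drop = FALSE]
    test  <- d[(nrow(d) - n_test + 1):nrow(d), , drop = FALSE]
    fit  <- lm.fit(cbind(1, train[, -1, drop = FALSE]), train[, 1])
    pred <- cbind(1, test[, -1, drop = FALSE]) %*% fit$coefficients
    sqrt(mean((test[, 1] - pred)^2))                # RMSE on the holdout
  })
  candidates[which.min(errs)]
}

choose_maxlag(AirPassengers)
```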

prediction intervals

We're reluctant to add this xgboost functionality to forecastHybrid until

  1. this time series implementation has been proven to work in a wide variety of situations (eg against Mcomp and Tcomp data at a minimum); and
  2. we have prediction intervals of some sort.

We might be able to mimic the approach used by forecast::nnetar.
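forecast::nnetar's intervals come from simulating many future sample paths with bootstrapped residuals and taking quantiles across them. A self-contained sketch of that idea, where `one_step` is a stand-in for any model's one-step-ahead prediction:

```r
# Sketch of nnetar-style intervals: iterate one-step forecasts forward,
# adding a bootstrapped residual at each step, then take quantiles of the
# simulated paths at each horizon. one_step() is a placeholder function.
sim_intervals <- function(y, one_step, residuals, h = 12, nsim = 200, level = 0.8) {
  paths <- replicate(nsim, {
    path <- as.numeric(y)
    for (i in seq_len(h)) {
      path <- c(path, one_step(path) + sample(residuals, 1))
    }
    tail(path, h)
  })
  # 2 x h matrix: lower and upper bounds at each horizon
  apply(paths, 1, quantile, probs = c((1 - level) / 2, 1 - (1 - level) / 2))
}

# Toy use: a mean forecaster with simulated residuals.
set.seed(42)
y <- rnorm(50, mean = 10)
band <- sim_intervals(y, one_step = mean, residuals = rnorm(50), h = 4)
dim(band)  # 2 x 4
```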

daily data - better treatment of series with high value of frequency

need a way to deal with issues like this, raised in #20 :

bla_2 <- ts(runif(1076, min = 5000, max = 10000), start = c(2013, yday("2013-12-03")), 
            frequency = 365.25)

bla_2_XGB_model <- xgbar(y = bla_2)

It's superficially the non-integer frequency, but more broadly we need a way of handling daily data that takes into account leap years, and has a more sophisticated way than 365 or 366 dummy variables. Could draw on http://robjhyndman.com/hyndsight/dailydata/.
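One option from that Hyndman post is a handful of Fourier terms in place of hundreds of daily dummies. A sketch, computing the terms by hand (forecast::fourier produces the same kind of matrix); `fourier_terms` is a made-up helper name:

```r
# Sketch: a few Fourier terms instead of 365/366 seasonal dummy variables.
# Works with non-integer frequencies such as 365.25.
fourier_terms <- function(y, K) {
  t <- seq_along(y)
  f <- stats::frequency(y)
  do.call(cbind, lapply(1:K, function(k) {
    cbind(sin(2 * pi * k * t / f), cos(2 * pi * k * t / f))
  }))
}

bla_2 <- ts(runif(1076, min = 5000, max = 10000), frequency = 365.25)
xreg <- fourier_terms(bla_2, K = 5)  # 10 smooth columns instead of 366 dummies
dim(xreg)
```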

Better handling of trends

Not currently satisfactory, as shown by

library(forecastxgb)
model <- xgbar(AirPassengers)
plot(forecast(model, h = 48))

[image: plot of forecast(model, h = 48) for AirPassengers]

short monthly series fail

added as a (failing) test

test_that("works with series of 35 with frequency 12", {
  bla_1 <- ts(runif(35, min = 5000, max = 10000), start = c(2013,12), frequency = 12)
  expect_error(bla_1_XGB_model <- xgbts(y = bla_1), NA)
})

One of the two problems brought up in #20.

Passing params to xgboost

It appears that passing params arguments to xgboost() and xgb.train() doesn't have any impact. For example,

> library(forecastxgb)
> set.seed(3)
> a <- xgbts(AirPassengers, params = list(eta = .0001))
Stopping. Best iteration: 64
> 
> set.seed(3)
> a <- xgbts(AirPassengers)
Stopping. Best iteration: 64

Any ideas what's going on here?

MAXLAG XREGS

Hi,
is there a way to set different maxlags for xregs and for Y?
For instance, I want xregs to have a maxlag of 3 and Y to have a maxlag of 12.
Thanks!,
Nahuel

fails when maxlag = 1

test case:

library(Mcomp)
thedata <- M1[[1]]
mod <- xgbts(thedata$x, maxlag = 1, nrounds_method = "cv")
fc <- forecast(mod, h = thedata$h)

error is in forecast.xgbts:

Error in `colnames<-`(`*tmp*`, value = c("lag1", "time")) : 
  length of 'dimnames' [2] not equal to array extent

Add Box Cox option

or even sign(x) * BoxCox(abs(x)). At least implement it and see whether it helps as an option or not.
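The sign(x) * BoxCox(abs(x)) idea is essentially John and Draper's (1980) modulus transform, which extends the Box-Cox idea to negative values. A sketch (the function name `jd_mod` is made up; the package's own JDMod may differ in details):

```r
# Modulus transform in the spirit of John & Draper (1980): symmetric about
# zero, so it handles negative values; lambda = 1 leaves the data unchanged.
jd_mod <- function(y, lambda) {
  if (lambda == 0) {
    sign(y) * log(abs(y) + 1)
  } else {
    sign(y) * ((abs(y) + 1)^lambda - 1) / lambda
  }
}

jd_mod(c(-3, 0, 3), lambda = 1)  # -3 0 3 (identity at lambda = 1)
```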

How to predict step by step

Hi, I'd like to know how I could feed in new data y to predict from. It seems that the forecast function only uses the xgb model to predict the next h periods?

Hyperparameter tuning for xgboost?

Can we also pass the params list to the xgbar function? I think this would be good functionality to include. I would also like to see custom objective functions supported in the xgbar function call. I think it is fairly easy to do this.

Bug - cannot handle non-integer frequency

xgbar.R appears to assign an incorrect number of rows to the matrix x for some values of maxlag

The following line of code:
x[ , maxlag + 1] <- time(y2)
returns this error message:
Error in x[, maxlag + 1] <- time(y2) : number of items to replace is not a multiple of replacement length

It appears that the error is caused by x and y2 having inconsistent lengths from R's handling of indexing with decimals, in the event that (as seems to usually be the case) maxlag is a floating point number.

Consider the outcome if maxlag = 54.75 and orign = 120:

n <- orign - maxlag
y2 <- ts(origy[-(1:(maxlag))], start = time(origy)[maxlag + 1], frequency = f)

n will be 120 - 54.75 = 65.25. In determining the length of y2 with the decimal indexing of maxlag, R rounds the index of 54.75 down to 54, which causes y2 to be of length 120 - 54 = 66.

However, when the matrix x is created, n = 65.25 is used for the number of rows. R rounds this number down to the nearest integer less than this value, 65, which creates a matrix with 65 rows:

ncolx <- ifelse(seas_method == "dummies", maxlag + f, maxlag + 1)
x <- matrix(0, nrow = n, ncol = ncolx)

Thus, y2 is of length 66, and x has 65 rows, which causes a "number of items to replace is not a multiple of replacement length" error when this line is run:

x[ , maxlag + 1] <- time(y2)
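The truncation behaviour described above can be reproduced directly with the example values (maxlag = 54.75, orign = 120):

```r
# Reproducing the length mismatch: R truncates non-integer indices and
# non-integer matrix dimensions, but to different effective lengths here.
origy  <- 1:120
maxlag <- 54.75

y2 <- origy[-(1:maxlag)]             # 1:54.75 truncates to 1:54
length(y2)                           # 66

x <- matrix(0, nrow = 120 - maxlag)  # nrow = 65.25 truncates to 65
nrow(x)                              # 65
```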

overfitting

The in-sample accuracy is astonishingly, suspiciously good and needs thorough checking. It may be, though, that proper investigation of #6 will reveal the strengths and weaknesses.

Training period

Hi.
I'd like to know if it is possible to change the training period in the xgbar function.
