mpiktas / midasr

R package for mixed frequency time series data analysis.

Home Page: http://mpiktas.github.io/midasr/

License: Other

R 100.00%

midasr's Introduction

The midasr R package provides econometric methods for working with mixed frequency data. The package provides tools for estimating time series MIDAS regression, where response and explanatory variables are of different frequency, e.g. quarterly vs monthly. The fitted regression model can be tested for adequacy and then used for forecasting. More specifically, the following main functions are available:

midas_r -- MIDAS regression estimation using NLS.
midas_nlpr -- Non-linear parametric MIDAS regression estimation.
midas_sp -- Semi-parametric and partially linear MIDAS regression.
midas_qr -- Quantile MIDAS regression.
mls -- time series embedding to lower frequency, flexible function for specifying MIDAS models.
mlsd -- time series embedding to lower frequency using available date information.
hAh.test and hAhr.test -- adequacy testing of MIDAS regression.
forecast -- forecasting MIDAS regression.
midasr_ic_table -- lag selection using information criteria.
average_forecast -- calculate weighted forecast combination.
select_and_forecast -- perform model selection and then use the selected model for forecasting.

The package provides the usual methods for generic functions which can be used on fitted MIDAS regression object: summary, coef, residuals, deviance, fitted, predict, logLik. It also has additional methods for estimating robust standard errors: estfun and bread.

The package also provides all the popular MIDAS regression restrictions, such as normalized Almon exponential, normalized beta, etc.

The package development was influenced by features of the MIDAS Matlab toolbox created by Eric Ghysels.

The package has a project webpage, and you can follow its development on GitHub.

The detailed description of the package features can be found in the JSS article.

Development

The stable versions of the package have version numbers x.y. All the stable versions are submitted to CRAN. The development versions have version numbers x.y.z.

To install the development version of midasr, it's easiest to use the devtools package:

# install.packages("devtools")
library(devtools)
install_github("mpiktas/midasr")

midasr's Issues

Add data parameter to shorten formulas

In lm you can enter y ~ x + b with all variables listed in a data.frame, which you pass to lm via the data parameter.
I would like to prepare a list of "mls" matrices and do the same as in lm.

E.g.:
eq <- midas_r(
midasdata$rGDPg2 ~ midasdata$MonMulti,
start=list(MonMulti=c(1,-0.1))
)

where midasdata contains mls matrices produced with:
midasdata[[i]] <- fmls(MonMulti,ratio-1, ratio, nealmon)

eq <- midas_r(
rGDPg2 ~ MonMulti,
start=list(MonMulti=c(1,-0.1)),
data=midasdata
)

Understanding the forecasts calculated by `select_and_forecast`

On pages 23-24 of the user's guide there is a demonstration of the function select_and_forecast.
This function calculates, among other things, forecasts according to supplied specifications. The given example calculates one-step-, two-step- and three-step-ahead out-of-sample forecasts 50 times.

In order to check my understanding, I tried to calculate the forecasts "manually" using the suggested models (every time the first of the suggested models for each horizon).

I manage to get the first values of the forecasts for each horizon:

Preparation of the data for the first forecasts:

yy<-y[1:200]
ttrend<- trend[1:200]
xx<-x[1:800]
zz<-z[1:2400]

Calculate the one-step-ahead forecast by hand with cbfc$bestlist[[1]][[1]]:

m<-midas_r(yy ~ ttrend + mls(xx, 4:18, 4, nealmon) + mls(zz, 12:25, 12, nealmon),start=list(xx=rep(1,3),zz=rep(1,3)))
round(forecast(m, newdata = list(xx = rep(NA, 4), zz = rep(NA, 12),ttrend = 201)),8)==round(cbfc$forecasts[[1]]$forecast[1,1],8)
TRUE

Calculate the two-step-ahead forecast by hand with cbfc$bestlist[[2]][[1]]:

mm<-midas_r(yy ~ ttrend + mls(xx, 8:21, 4, nealmon) + mls(zz, 24:38, 12, nealmon),start=list(xx=rep(1,3),zz=rep(1,3)))
round(forecast(mm, newdata = list(xx = rep(NA, 4), zz = rep(NA, 12),ttrend = 201)),8)==round(cbfc$forecasts[[2]]$forecast[1,1],8)
TRUE

Calculate the three-step-ahead forecast by hand with cbfc$bestlist[[3]][[1]]:

mmm<-midas_r(yy ~ ttrend + mls(xx, 12:25, 4, nealmon) + mls(zz, 36:46, 12, nealmon),start=list(xx=rep(1,3),zz=rep(1,3)))
round(forecast(mmm, newdata = list(xx = rep(NA, 4), zz = rep(NA, 12),ttrend = 201)),8)==round(cbfc$forecasts[[3]]$forecast[1,1],8)
TRUE

Expand the data by one low-frequency period in order to calculate the next forecasts:

yye<-y[1:201]
ttrende<- trend[1:201]
xxe<-x[1:804]
zze<-z[1:2412]

Let's try to calculate the second three-step-ahead forecast using the expanded data:

mmme<-midas_r(yye ~ ttrende + mls(xxe, 12:25, 4, nealmon) + mls(zze, 36:46, 12, nealmon),start=list(xxe=rep(1,3),zze=rep(1,3)))
forecast(mmme, newdata = list(xxe = rep(NA, 4), zze = rep(NA, 12),ttrende = 202))
21.94055

As can be seen, the result of the last command is 21.94055, but the function select_and_forecast gives cbfc$forecasts[[3]]$forecast[2,1]: 21.96188

What am I doing wrong here?

One more question:

In the case of three-step-ahead forecasts, does the function select_and_forecast calculate forecasts for periods 201-250 or for periods 203-252? The first forecast is calculated using the first 200 observations, so the first three-step-ahead forecast would be for period 203. If the former is true, the forecasting procedure must start at 198, right?

Thank you in advance!

Do forecast combinations with package ForecastCombinations

Base functions icwtab and iclagtab on function icwlagtab.

Pick the number of lags or weights from the formula and then pass it to the icwlagtab function.
Remove unnecessary printing functions.

AR* not working after latest updates

Here is the example:

 data("USrealgdp")
 y <- diff(log(USrealgdp))
 x <- window(diff(USunempr),start=1949)
 trend <- 1:length(y)
 midas_r(y~trend+fmls(x,11,12,nealmon)+mls(y,1,1,"*"),start=list(x=rep(0,3)))
 Error in z[1] : object of type 'symbol' is not subsettable

Give examples for predict and forecast methods

One dynamic, one static model. Dynamic and static forecasts.

Add support for both restricted and unrestricted model fitting

At the moment, if embedlf is in the formula, it must have a restriction function. Is there a need for fitting a model where we fit both restricted and unrestricted lags together?

Calculation of R squared

When I run a summary of the regression, I get the residual standard errors back, for example, 0.2704. Does this mean the R squared is (1-0.2704) = 0.7296?

Thank you
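
The residual standard error and R squared are different quantities, so one cannot be obtained by subtracting the other from 1. A minimal sketch of the distinction in base R follows. It uses simulated data and lm as a stand-in for a fitted regression; the variable names are illustrative only, and applying the same formula to a midas_r fit assumes that residuals() returns residuals aligned with the response values used in estimation.

```r
# Minimal sketch: R squared is computed from sums of squares,
# not as 1 minus the residual standard error.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

fit <- lm(y ~ x)                  # stand-in for a fitted regression
res <- residuals(fit)

rss <- sum(res^2)                 # residual sum of squares
tss <- sum((y - mean(y))^2)       # total sum of squares
r_squared <- 1 - rss / tss

# Residual standard error: sqrt(RSS / residual degrees of freedom)
rse <- sqrt(rss / df.residual(fit))

# The "naive" value 1 - rse is shown only to contrast it with r_squared
print(c(r_squared = r_squared, rse = rse, naive = 1 - rse))
```

For a MIDAS fit, the response used in tss would have to be restricted to the observations that survive the lag construction, since the first few low-frequency periods are typically dropped.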

Forecast dates with no new data incorrect?

Hello,

The following is a MIDAS regression that has no new data, as per p. 27 of the midasr user guide. This example code is largely taken from the example in the forecast function.

To the best of my understanding, the forecasted time references are incorrect.

I've set the lags back by three horizons' worth of 12 higher-frequency lags (3*12), with the understanding that, together with the forecast function, I am doing one-step-ahead forecasts three times, using data up to Yt-3 to forecast up to Yt. Essentially, I am forecasting Yt-2, Yt-1, and Yt. Note that the midasr user guide only does 1 horizon with no new data.

In the terms of this code, Yt is 2011.

Because of my forecast setup, I expected the dates of the forecasted values to be 2010, 2011, and 2012. They're not. They're 2012, 2013, 2014. From checking the algebraic equation on p. 27 in the Midas User Guide and their subsequent example I can't find any reason why the dates would be 2012, 2013, 2014.

By setting my high-frequency lags back by the suggested (frq*h) amount and receiving 2012-2014 forecasts, wouldn't this imply I'm using Yt-3 to forecast Yt+1, Yt+2, and Yt+3 rather than Yt-2, Yt-1, and Yt?

It's very possible I'm misunderstanding something. Ultimately I need to know how to forecast a Q4 and a Q1 variable with data up to Q3 and an AR element (dynamic).

    set.seed(1234)
    data("USrealgdp")
    data("USunempr")

    y <- diff(log(USrealgdp))
    x <- window(diff(USunempr), start = 1949)
    trend <- 1:length(y)

    #High Frequency Variable's Frequency
    frq<-12

    ##Forecast horizon
    h <- 3

    ##Declining unemployment
    xn <- rep(-0.1, frq*h)

    ##New trend values
    trendn <- length(y) + 1:h

    ##Dynamic AR* model
    mr.dyn <- midas_r(y ~ trend + mls(y, h+1:2, 1, "*")
                      + mls(x, (h*frq)+0:11, frq, nealmon),
                      start = list(x = rep(0, 3)))
    summary(mr.dyn)

    forecast(mr.dyn, list(trend = trendn, x = rep(NA,frq*h)), method = "dynamic")

Gives the forecast output of

    #Point Forecast
    2012     0.03654885
    2013     0.01834644
    2014     0.01803171

Expand Stochastic Optimization Methods

Simulated annealing or Metropolis-derived methods shouldn't be too hard to implement. I'm aware you can use Ofunction = "optim", method = "SANN", but in my testing that yields a radically different result every time I run it, due to a lack of customization options. Would you consider expanding SO in the future?

Weekly Vars to forecast Monthly var

Hi, let's say I have a monthly variable to nowcast/forecast using 4 weekly variables.
I want to estimate the monthly value for Aug 2017. I have the weekly values of all 4 weekly variables up to Sep 1.

Could you tell me how to set up the MIDAS equation?
Let's say the monthly variable depends on its own first lag and on the past 6 weeks. Assuming I have the values of the weekly variables until the end of August, does the following equation make sense?
beta0 <- midas_r(y ~ mls(y, 1, 1) + mls(x, 0:6, 4, nbeta),
                 start = list(x = c(1.7, 1, 5)))
I have a few other questions. Could you tell me how to get the forecasted values of y for the 4 weeks of August? Also, some months contain 5 weekly values. How should this be handled?
Finally, if I want to automate this, how do I pass the starting lag (0 here) as a variable?

Sorry for the long post.
Thanks

Do not normalize index in nealmon function

Now nealmon has normalized index. Revert back to non-normalized to conform to literature. Fix demos accordingly, or simply define the old function in the demos.

Add confidence interval for prediction and forecasts

For midas_u it is possible to utilise predict.lm interface. For midas_r another way must be found. In the end there should be no distinction between midas_u and midas_r, i.e. the result of midas_u should be a midas_r object.

Rename ghysels lag to multiplicative MIDAS

Allow make_ic_table to accept the output of itself as an input

Fix nbeta function. Add normalisation at the last step.

User supplied gradient

Add the feature of user supplied gradient. For each restriction gradients must be supplied separately.

Make function for choosing starting values

Follow the example of MIDAS toolbox for Matlab.

Add support for negative lags in mls function

Would be helpful to form forecasting models of the type

mls(y,-1,1)~y+fmls(x,11,12,nealmon)

Write forecasting function for the MIDAS regression

There is already a function predict. But it does not take into account the lags. For forecasting only new data must be passed. This means that we need to keep the actual data to form necessary lag structures.

Also for AR models we need to iterate.

Make expand_weights_lags work with weights "*" and ""

Only need to adjust starting values

Make intelligent selection of default kmin

Now kmin is taken to be zero, or the minimum lag supplied by mls. Increase that minimum to the number of parameters of weight function plus one.

How to deal with Monthly data with Daily regressors?

Hi, There.

Unlike weekly or monthly data, which usually have a fixed 'm' such as 7 or 12, daily data are affected by workdays and holidays. I am now facing a problem where the dependent variable is monthly (such as PMIs) and the independent variables are daily (such as trading prices). How do I resample the daily variables in order to use the MIDAS regression function?
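
One possible way to obtain a fixed frequency ratio is to keep the same number of daily observations for every month. The sketch below does this in base R with simulated weekday data. The value k = 20, the simulated series, and all variable names are assumptions made for illustration, and this is not presented as the package's recommended approach; the README also lists mlsd, which embeds a series using available date information.

```r
# Simulated daily series on weekdays only (holidays ignored for brevity)
dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
wd <- as.POSIXlt(dates)$wday
dates <- dates[wd != 0 & wd != 6]   # wday: 0 is Sunday, 6 is Saturday
set.seed(42)
price <- cumsum(rnorm(length(dates)))

k <- 20  # hypothetical fixed number of daily observations kept per month

month_id <- format(dates, "%Y-%m")
by_month <- split(price, month_id)

# Keep the last k observations of each month, left-padding with NA if short
fixed <- lapply(by_month, function(v) {
  v <- tail(v, k)
  c(rep(NA_real_, k - length(v)), v)
})

# One long vector with exactly k daily values per month
x_daily <- unlist(fixed, use.names = FALSE)
print(length(x_daily) / k)   # number of months covered
```

The resulting vector has a constant ratio of k daily values per monthly observation, so it could then be passed to a lag-embedding call with frequency ratio k. Dropping the earliest days of longer months discards some information, which is the price of the fixed ratio.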

Make table in make_ic_table to have additional element

Now that the functions expand_lag_weights and expand_ghysels make tables for make_ic_table, it makes much more sense to generate starting values as well.

Implement support for factors in formula and a*b type formulas

Need a way to guess the final formula from terms, to know exactly how many variables appear after expansion. Currently I know only of an ugly hack of matching text, which is neither elegant nor robust.

Model selection and forecasting

Hi,

I am new to MIDAS and R. I am trying to understand how select_and_forecast works. From my understanding the command selects the best lag length for the high frequency variable by IC minimization within a class of restrictions and then goes on to test pseudo out of sample for the best selected restrictions. Am I right?

In my case I would like to estimate the following model:

Y (low frequency) would be a monthly variable (with 63 total observations available) and X (high frequency) would be a weekly variable that I am assuming is observed 4 times each time Y is observed.

I read the manual, but I couldn't figure out how to incorporate the dependent variable in the model selection process without including the contemporaneous term, and how to take into account the fact that I would need to include the dynamic pseudo out-of-sample forecasts when computing the accuracy measures.

On a side note, is it possible to calculate truly out-of-sample forecast averages like in EViews (see here)? In my particular case, I need to calculate one-step and two-step forecasts (observations 64 and 65). Is that possible with average_forecast?

Trying some other polynomial

Hello,
First, thank you so much for your great efforts and for making this package available.

I have a little problem here. I wrote the code for the two-parameter beta function used by Eric in several publications;
see p. 11 of https://www.federalreserve.gov/pubs/feds/2006/200610/200610pap.pdf

But I get this message each time I try to run my midas_r command:
Error in base::chol2inv(x, ...) :
element (4, 4) is zero, so the inverse cannot be computed

What does this mean, and how should I handle it? Thank you.
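
For reference, a minimal sketch of a two-parameter normalized beta weight function in base R is given below. The name nbeta2_sketch is hypothetical and not part of midasr. The argument order (p, d, m), the leading multiplier in p, and the small shift of the grid endpoints are assumptions made for illustration. The message itself says that a matrix with a zero diagonal element could not be inverted; whether degenerate weights are the cause in this particular case cannot be told from the post.

```r
# Hypothetical helper, not part of midasr: two-parameter normalized beta weights.
# p = c(multiplier, theta1, theta2); d = number of lags (d >= 2);
# m (frequency ratio) is accepted but unused here.
nbeta2_sketch <- function(p, d, m) {
  eps <- .Machine$double.eps
  # Grid strictly inside (0, 1) so that no weight is exactly 0 or infinite
  xi <- (seq_len(d) - 1) / (d - 1)
  xi[1] <- xi[1] + eps
  xi[d] <- xi[d] - eps
  w <- xi^(p[2] - 1) * (1 - xi)^(p[3] - 1)
  # Normalize to sum to one, then scale by the multiplier
  p[1] * w / sum(w)
}

w <- nbeta2_sketch(c(1, 1, 5), d = 12, m = 3)
print(round(w, 4))
```

With theta1 = 1 the weights decline monotonically, which is the shape most often used in applications. Checking that the function returns finite, non-constant weights at the chosen starting values is a cheap first diagnostic before calling the optimizer.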

Passing starting values to AR* term

At the moment it is not clear how to pass the starting values to an AR* term. It is possible to pass them to AR. The general rule is that starting values should have the same names as the term names, with the exclusion of MIDAS terms with restrictions.

Consider passing functions, instead of the names in make_ic_table

Might be easier than it sounds. Leave old behaviour, by making a named list of the evaluated functions.

The problem is that it will be necessary somehow to pass these functions into environment of prepmidas_r. Maybe assign them into environment of the function body, then in theory they should exist in search path.

The restriction function stuff

Require that the first argument of the restriction function is the number of lags.
The second argument should be the parameters, maybe they should be named a la nls?
The rest are additional parameters needed for function.
The number of lags is supplied by embedlf. Pass this to restriction function.

Do gradients with madness

https://mran.revolutionanalytics.com/web/packages/madness/vignettes/introducing_madness.pdf

Built in function to plot the almon lag polynomial?

I've estimated 2 hyper parameters for an exponential almon lag polynomial, is there any easy way to plot the shape of the function? Thank you.
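
One way to see the shape is to evaluate the weights on the lag grid and plot them with base graphics. The sketch below writes the normalized exponential Almon weights by hand, assuming the form multiplier * exp(theta_1 i + theta_2 i^2) / sum over i = 1..d with a non-normalized index. The function name almon_weights and the parameter values are hypothetical; estimated values would be substituted for theta.

```r
# Hand-written normalized exponential Almon weights (assumed form, see above)
almon_weights <- function(p, d) {
  i <- seq_len(d)
  # Polynomial in the lag index with coefficients p[-1]
  z <- as.vector(outer(i, seq_along(p[-1]), `^`) %*% p[-1])
  w <- exp(z - max(z))            # subtract the max for numerical stability
  p[1] * w / sum(w)
}

theta <- c(1, 0.3, -0.05)  # hypothetical: multiplier plus 2 hyperparameters
d <- 24                    # hypothetical number of high-frequency lags
w <- almon_weights(theta, d)

plot(seq_len(d) - 1, w, type = "h", lwd = 2,
     xlab = "High-frequency lag", ylab = "Weight",
     main = "Normalized exponential Almon lag weights")
```

If the package's own nealmon function is available, its output for the same parameters and number of lags could be plotted in the same way.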

Investigate issues with saving

Saving lots of midasr objects sometimes can be troublesome, since they all contain environments. This can make the .RData file too large.
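
The mechanism can be illustrated in base R without the package. In the sketch below, a formula created inside a function carries that function's environment, and serializing the formula drags along a large unrelated object. The function and variable names are illustrative, and this is only a generic demonstration of the effect, not a description of how midasr stores its objects.

```r
# A formula created inside a function keeps that function's environment,
# including any large objects that happen to live there.
make_formula <- function() {
  big <- rnorm(1e6)   # about 8 MB of doubles that the formula never uses
  y ~ x
}

f <- make_formula()
size_with_env <- length(serialize(f, NULL))

# Re-pointing the environment to the global environment drops the payload
g <- f
environment(g) <- globalenv()
size_without_env <- length(serialize(g, NULL))

print(c(with_env = size_with_env, without_env = size_without_env))
```

Replacing an environment is only safe when nothing evaluated later depends on the objects it holds, which is why tests for the environment handling matter before any such trimming is applied.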

Write tests for making sure that correct environments are always used

Write up the exact troublesome use cases and write tests for them

Thanks for your reply!

Thanks for your reply!
As you say, e.g. eq.r <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon), start = list(x = c(1, -0.5)))
In this midas_r() call, is the nealmon of x equal to nealmon(q=c(1,-0.5),2)?
Also, you say "The summary method for midas_r returns the coefficients of MIDAS restriction. The starting values for x and for z tells midas_r to use exponential polynomials of order 1 and 2 respectively, hence you get the corresponding coefficients (the first coefficient is the multiplier)".
I know the first coefficient is the multiplier of the "eq.r$midas_coefficients", but what do the other coefficients mean?
Also, in the result of coef(eq.r), the first coefficient is the multiplier of the "eq.r$midas_coefficients", but I don't understand what the other coefficients represent.

Which are the parameters of nealmon in midas_r?

Hello, I'm sorry that I'm not good at English. Thank you for your work on the midasr package; it's very useful to me. But there are some things I can't understand, and I need your help.
First, I don't know which parameters nealmon uses inside the function midas_r(), as in this code from the user guide:
eq.r <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z,0:16, 12, nealmon), start = list(x = c(1, -0.5), z = c(2,0.5, -0.1)))
In this code, which parameters does the function nealmon use?
Second, also in the previous code, the lag K is 7 and 16, but I don't understand why the summary(eq.r) results show only 3 variables, as follows:

Formula y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:16, 12, nealmon)

Parameters:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.988196   0.115299   17.24  < 2e-16 ***
trend        0.099883   0.000777  128.57  < 2e-16 ***
x1           1.353343   0.151220    8.95  < 2e-16 ***
x2          -0.507566   0.096670   -5.25  3.3e-07 ***
z1           2.263473   0.172815   13.10  < 2e-16 ***
z2           0.409653   0.155685    2.63  0.00905 **
z3          -0.072979   0.020392   -3.58  0.00042 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.932 on 242 degrees of freedom
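
The relation between the few estimated parameters and the many lag coefficients can be sketched in base R. The function below is a hand-written version of the normalized exponential Almon weights, assuming the first parameter is the multiplier and the remaining ones are polynomial coefficients in the exponent, with lag index i = 1..d. The name nealmon_sketch is hypothetical and the form is an assumption; the parameter values are the estimates from the summary output quoted in this post.

```r
# Hand-written sketch of normalized exponential Almon weights (assumed form)
nealmon_sketch <- function(p, d) {
  i <- seq_len(d)
  # Exponent: p[2] * i + p[3] * i^2 + ... for each lag index i
  z <- as.vector(outer(i, seq_along(p[-1]), `^`) %*% p[-1])
  p[1] * exp(z) / sum(exp(z))
}

# Estimates copied from the summary output (x1, x2 and z1, z2, z3)
wx <- nealmon_sketch(c(1.353343, -0.507566), d = 8)             # lags 0:7 of x
wz <- nealmon_sketch(c(2.263473, 0.409653, -0.072979), d = 17)  # lags 0:16 of z

print(round(wx, 4))
print(round(wz, 4))
print(c(sum_x = sum(wx), sum_z = sum(wz)))
```

Under this reading, x1 and x2 generate all 8 coefficients on the lags of x, and z1, z2, z3 generate all 17 coefficients on the lags of z. Each set of weights sums to its multiplier, which is why the summary lists only the restriction parameters rather than one coefficient per lag.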

Make model selection work with AR* terms

Since AR* changes other terms, it would be necessary to reevaluate the model frame for each lag and weight combination.

QAICc

Look at the package AICcmodavg and implement QAIC for midas_r objects.

Make data argument a list

Instead of having ldata and hdata arguments, let there be one data argument, which is a named list. The list elements should have names; if an element is a data.frame, its name is ignored. In the end, all the required named variables will be put in the environment.

Make lag selection work with AR* model.

Add more lags for response variable, again model_frame will be reevaluated multiple times.

Fix a typo in nealmon man page

theta_h vs. \theta_h

Gradient parameter

The example of hAhr.test should be updated in some way. Seems that it should be user.gradient = FALSE instead of gradient = "default" in line 686 according to the comment in line 688. However, hAhr.test(mr) in line 684 does not use gradient function either.

Pass m to weight function

It seems that having m in the function is a bonus. You can always ignore it if you do not need it.

Normalized beta function

Maybe I have misunderstood the normalized beta function that is modeled in the package, but I cannot seem to understand why it uses three parameters. The beta weighting function used by Ghysels, Santa-Clara and Valkanov (2004, 2005 and 2006) uses only two parameters. This is the ordinary normalized beta function built from the standard gamma function. Is this specification also included in the package, and am I perhaps missing something?

Thanks for a great package!

Cannot run midas_r_simple without additional low frequency variables

The documentation suggests this is possible (z=NULL), but I cannot get it to run. I always get the following error:
Error in matrix(z, ncol = 1) :
'data' must be of a vector type, was 'NULL'
Or am I not understanding the documentation correctly?

Write model selection function for Ghysels schema

Ghysels schema changes weight function for each lag. Create dummy functions for each lag, assign them to the environment and call make_ic_table.

Can target variable be high-frequency?

Hi, I was wondering if it was possible (maybe via some workaround) to use a target variable that would be of higher frequency than one of the explanatory variables. An example:

y.monthly ~ x1.quarterly + x2.monthly

Maybe I'm missing something and this can already be done -- please let me know if so!

Thanks.
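
One generic workaround, independent of the package, is to bring the lower-frequency regressor up to the target's frequency by repeating each value. The sketch below shows this step interpolation in base R. The data and variable names are hypothetical, and whether midasr supports this case directly is not confirmed here.

```r
# Hypothetical quarterly regressor (8 quarters)
x1_quarterly <- c(1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4)

# Step interpolation: each quarterly value is repeated for its three months
x1_monthly <- rep(x1_quarterly, each = 3)

print(length(x1_monthly))   # 3 monthly values per quarter
print(x1_monthly[1:6])
```

The monthly series could then enter an ordinary regression alongside the other monthly regressors. In a real-time setting the current quarter's value is usually not known during the quarter, so lagging the repeated series by one quarter may be needed to avoid look-ahead.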