signaflo / java-timeseries
Time series analysis in Java
License: MIT License
Resolved.
Hi, thanks for your implementation it has been very useful for me.
I just wanna know if there is a way to save an ARIMA model on disk for future predictions.
It would be nice if there were documentation somewhere with a "quick start", etc.
I'm not a Java programmer, and a guide like that would let other non-Java programmers quickly see how to use the library.
What I mean by documentation is not formal api documentation but rather tutorials and examples.
I would like to use the library for implementing ARIMA models for some data.
Currently, one needs to provide the complete time series in order to build an ARIMA model and subsequently obtain forecasts. In a streaming application, where the time series may be of infinite length, this approach becomes infeasible.
Let me shortly introduce my use case:
I'm trying to use this framework in an application, where I consume a time series in a streaming fashion. Using window based aggregation, I downsample the series to exactly one value / 15 minutes. Given already fitted coefficients, I need to forecast one step ahead.
When evaluating the forecast method, I have access to the last p time series values and the last q errors¹ (but not the whole time series).
What I propose is a method that allows forecasting through a "partial" ARIMA model given by
model(TimeSeries lastP_Observations, TimeSeries lastQ_Errors, ArimaCoefficients coeffs)
¹ The errors are basically obtained by joining the time series stream with the time shifted forecast stream and calculating the differences.
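The one-step forecast described in this request can be sketched without the library. This is a minimal illustration, assuming a stationary ARMA(p, q) model (d = 0) written as y_t = mean + sum of phi_i (y_{t-i} - mean) + sum of theta_j e_{t-j} + e_t. The class `OneStepArmaForecast` and its method are hypothetical names and are not part of java-timeseries.

```java
/** Hypothetical helper, not part of java-timeseries. */
final class OneStepArmaForecast {

    /**
     * One-step-ahead forecast for a stationary ARMA(p, q) model.
     *
     * @param lastObservations at least p most recent observations, oldest first
     * @param lastErrors       at least q most recent one-step forecast errors, oldest first
     * @param ar               AR coefficients phi_1..phi_p
     * @param ma               MA coefficients theta_1..theta_q
     * @param mean             process mean (use 0.0 if the model has none)
     */
    static double forecast(double[] lastObservations, double[] lastErrors,
                           double[] ar, double[] ma, double mean) {
        if (lastObservations.length < ar.length || lastErrors.length < ma.length) {
            throw new IllegalArgumentException("need at least p observations and q errors");
        }
        double forecast = mean;
        for (int i = 0; i < ar.length; i++) {
            // phi_{i+1} multiplies the observation (i + 1) steps back.
            double y = lastObservations[lastObservations.length - 1 - i];
            forecast += ar[i] * (y - mean);
        }
        for (int j = 0; j < ma.length; j++) {
            // theta_{j+1} multiplies the forecast error (j + 1) steps back.
            double e = lastErrors[lastErrors.length - 1 - j];
            forecast += ma[j] * e;
        }
        return forecast;
    }
}
```

For a model with differencing (d > 0), the same calculation would apply to the differenced values, and the forecast would then have to be added back to the last observed levels.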
It's very common for Scala developers to use the scala.math package without the scala prefix, as it's imported by default. Naming your package math conflicts with this, so simply adding java-timeseries as a dependency will break the build. Maybe you should have prefixed it with your own vendor prefix to avoid these conflicts?
Hello,
When I try to predict with ARIMA in some timeseries I get this WARN:
40 [main] WARN com.github.signaflo.math.optim.BFGS - Maximum step reductions, 25
and the prediction is not calculated and appears as NaN. For example:
I have a TS with 20 items. When I try to predict one value with ARIMA(1,0,0), the warning appears. However, if I change to 21 items, it doesn't.
My code:
timeSeries = Ts.newAnnualSeries(1975, DoubleFunctions.arrayFrom(data));
modelOrder = ArimaOrder.order(p, d, q);
model = Arima.model(timeSeries, modelOrder);
Forecast forecast = model.forecast(1);
TimeSeries prediccion = forecast.pointEstimates();
System.out.println(prediccion.asList().get(0));
My TS
1.339910000000000082e+03
1.569730000000000018e+03
1.751859999999999900e+03
1.965369999999999891e+03
2.246050000000000182e+03
2.495900000000000091e+03
2.724000000000000000e+03
2.650000000000000000e+03
2.679699999999999818e+03
2.794699999999999818e+03
2.828500000000000000e+03
3.521099999999999909e+03
3.931500000000000000e+03
4.941399999999999636e+03
5.388399999999999636e+03
5.766199999999999818e+03
5.937899999999999636e+03
5.627399999999999636e+03
5.982699999999999818e+03
5.931600000000000364e+03
Any ideas?
Thanks.
Is there a way to use this model if my time series has some values missing?
Clients should be able to see whether the model they built is causal, stationary, and invertible.
Hi
Is there any way to restrict predictions to positive values? The time series I'm working with can only take positive values, but the ARIMA forecast sometimes gives me negative values.
I am making ARIMA predictions on data with yearly seasonality. With 24 months of history, the prediction is as good as can be expected with so little data, but if I use 25 months of history, the prediction is very unstable (slightly modifying one measurement can result in a massive trend change).
Example input : 10 | 11 | 12 | 8 | 6 | 4 | 5 | 6 | 6 | 5 | 4 | 3 | 2 | 2.5 | 3 | 2 | 1 | 0.5 | 0.1 | 0 | 1 | 0.2 | 1 | 0 | 0.3 | 0.4 | 0.2
with the following ARIMA parameters :
p = 0
d = 1
q = 1
aP = 0
aD = 1
aQ = 1,
the forecasts are increasing (although the history is clearly decreasing). Changing the second value (the 11) into a 13 or a 9 makes the predictions seem correct.
When I perform the same prediction with only the first 24 values, the prediction is not as sensitive, and changing one input value only marginally affects the prediction.
I suspect this is due to this line in ArimaModel.java:
this.differencedSeries = observations.difference(1, order.d()).difference(seasonalFrequency, order.D());
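To see how few values survive this step, here is a minimal stand-alone sketch of lag differencing. It assumes that `difference(lag, times)` means applying x[t] - x[t - lag] repeatedly, `times` times; `DifferencingSketch` is a hypothetical class and does not reproduce the library's implementation.

```java
/** Hypothetical helper, not part of java-timeseries. */
final class DifferencingSketch {

    /** Applies the lag-difference x[t] - x[t - lag] to the series `times` times. */
    static double[] difference(double[] series, int lag, int times) {
        if (lag < 1 || times < 0) {
            throw new IllegalArgumentException("lag must be >= 1 and times must be >= 0");
        }
        if ((long) lag * times > series.length) {
            throw new IllegalArgumentException(
                    "lag * times = " + ((long) lag * times)
                    + " exceeds the series length " + series.length);
        }
        double[] current = series.clone();
        for (int pass = 0; pass < times; pass++) {
            // Each pass shortens the series by `lag` values.
            double[] next = new double[current.length - lag];
            for (int i = 0; i < next.length; i++) {
                next[i] = current[i + lag] - current[i];
            }
            current = next;
        }
        return current;
    }
}
```

Under that assumption, 25 observations differenced once at lag 1 and once at lag 12 leave only 25 - 1 - 12 = 12 values for estimation, which may be part of why a single changed input has such a large effect.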
Hi, first of all I would like to thank you for the library you developed for Java, because it was a great help to me in understanding how ARIMA and regression models work.
I would like to know how to implement a regression model, because I couldn't find it on the project wiki.
Best regards.
Hello, after reading the API documentation (https://javadoc.io/doc/com.github.signaflo/timeseries/0.4), I found that TimeSeries only seems to support weekly, monthly, or quarterly prediction, but I want to model daily data. My data changes over the weekend (it is low on weekends and high on workdays). Is there any other way to predict my data using ARIMA? Would TimeSeries.from(double[] d) give the best prediction? Waiting for your reply, @signaflo.
Hi, @signaflo !
Sorry to bother you. I have to say you've done wonderful work, and it has helped a lot with ARIMA. As you know, there is very little material about ARIMA in Java, let alone implementations of the model. Your work has benefited me a lot.
But I still have some problems, and I sincerely ask for your help. My first question: I could not find the main function that uses the ARIMA model to predict (I mean the specific process), so can you tell me how to use your API to run an example?
My second question: is there a function for checking the stationarity of the time series, which is the foundation for using ARIMA?
I would appreciate your reply if you can spare some time to help me.
Good Day!
The ArimaCoefficients is a simple structure containing arrays of doubles representing the AR and MA parameters in addition to the mean/drift terms.
There are a few issues. One is that we need to differentiate between a coefficient that is fixed and known, and therefore has no standard error, and a coefficient that either has been or is in the process of being estimated, and therefore should have an associated uncertainty measure.
Another issue is that the only way to obtain information about standard errors of the coefficients in the Arima class is by getting a flat array. Knowing which coefficient corresponds to which standard error in the array is then a matter of guesswork.
Finally, the coefficients in the Arima class are not public, which has been a problem for many users.
To fix these issues, the following tasks should be completed:
It's possible that changes should be made to the Arima class itself. For example, an ARIMA model whose coefficients are fixed is different from an ARIMA model whose coefficients are estimated through an optimization routine, but it is unclear whether that difference needs to be explicitly represented in the code structure.
Hi there. I have time series data that I want to resample at a given time interval using two methods: by close (last value) and by average.
Is there any way to achieve this, like the functionality in Python pandas?
dataframe.resample('3T').mean()
If it is possible, how? If not, are there any suggestions?
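Independently of the library, both aggregations can be sketched in plain Java. This is a minimal illustration that assumes regularly spaced observations, so that a time interval corresponds to a fixed number of consecutive values (`window`); `ResampleSketch` is a hypothetical class and not a java-timeseries API.

```java
/** Hypothetical helper, not part of java-timeseries. */
final class ResampleSketch {

    /** Mean of each consecutive block of `window` values (the last block may be shorter). */
    static double[] byMean(double[] values, int window) {
        double[] out = new double[blockCount(values.length, window)];
        for (int b = 0; b < out.length; b++) {
            int start = b * window;
            int end = Math.min(start + window, values.length);
            double sum = 0.0;
            for (int i = start; i < end; i++) {
                sum += values[i];
            }
            out[b] = sum / (end - start);
        }
        return out;
    }

    /** Last ("close") value of each consecutive block of `window` values. */
    static double[] byClose(double[] values, int window) {
        double[] out = new double[blockCount(values.length, window)];
        for (int b = 0; b < out.length; b++) {
            int end = Math.min((b + 1) * window, values.length);
            out[b] = values[end - 1];
        }
        return out;
    }

    private static int blockCount(int length, int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be >= 1");
        }
        return (length + window - 1) / window; // ceiling division
    }
}
```

For one-minute data, `window = 3` would play the role of the '3T' rule in the pandas call. Irregularly spaced timestamps would need grouping by time bucket instead of by count.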
We are getting an ArrayIndexOutOfBoundsException when using an ARIMA model with yearly seasonality (TimePeriod.oneYear()). Is there a minimum number of points for training?
(We tried many different data sets and always end up with this error.) Here is the stack trace:
java.lang.ArrayIndexOutOfBoundsException: -1809947671
at com.github.signaflo.timeseries.model.arima.ArimaKalmanFilter.inclu2(ArimaKalmanFilter.java:331)
at com.github.signaflo.timeseries.model.arima.ArimaKalmanFilter.getInitialStateCovariance(ArimaKalmanFilter.java:273)
at com.github.signaflo.timeseries.model.arima.ArimaKalmanFilter.initializePredictedCovariance(ArimaKalmanFilter.java:167)
at com.github.signaflo.timeseries.model.arima.ArimaKalmanFilter.<init>(ArimaKalmanFilter.java:68)
at com.github.signaflo.timeseries.model.arima.ArimaModel.kalmanFit(ArimaModel.java:288)
at com.github.signaflo.timeseries.model.arima.ArimaModel.access$500(ArimaModel.java:61)
at com.github.signaflo.timeseries.model.arima.ArimaModel$OptimFunction.at(ArimaModel.java:685)
at com.github.signaflo.math.optim.BFGS.<init>(BFGS.java:85)
at com.github.signaflo.timeseries.model.arima.ArimaModel.<init>(ArimaModel.java:126)
at com.github.signaflo.timeseries.model.arima.ArimaModel.<init>(ArimaModel.java:80)
at com.github.signaflo.timeseries.model.arima.Arima.model(Arima.java:64)
at com.yahoo.digits.druid.forecastquery.model.ArimaModel.train(ArimaModel.java:102)
at com.yahoo.digits.druid.forecastquery.model.ArimaModelTest.testArimaModel(ArimaModelTest.java:43)
OffsetDateTime is a critical but annoying piece of the library. I don't want external users of the library to have to mess with it, though there may be good reasons for keeping it around.
I would prefer to wrap it in a simple Time class that delegates the small chunk of behavior we need from it. If anything, we could get a much prettier toString representation and make it much easier to create an instance of the class.
Hi, thanks for your work on java-timeseries! I've been reading this open source project recently, and I found what might be a mistake in the code. Please see the class ArimaCoefficients.
There may be errors in the calculations in the methods expandArCoefficients(...) and expandMaCoefficients(...).
For example,
double[] arCoeffs = new double[] {1, 3, 5};
double[] sarCoeffs = new double[] {2};
int seasonalFrequency = 2;
double[] expandArcoeffs = new double[arCoeffs.length + sarCoeffs.length * seasonalFrequency];
expandArcoeffs = expandArCoefficients(arCoeffs, sarCoeffs, seasonalFrequency);
With the method expandArCoefficients(...), the array expandArcoeffs is equal to
{1.0, 2.0, -2.0, -6.0, -10.0}.
In fact, by simple polynomial multiplication, the array expandArcoeffs should be equal to
{1.0, 5.0, 3.0, -6.0, -10.0}.
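The expected values can be checked by hand. Assuming the usual sign convention, in which the AR polynomial is written as 1 - phi_1 B - phi_2 B^2 - ... with B the backshift operator, the product of the non-seasonal and seasonal polynomials for this example is:

```latex
\begin{aligned}
(1 - B - 3B^2 - 5B^3)(1 - 2B^2)
  &= 1 - B - 3B^2 - 5B^3 - 2B^2 + 2B^3 + 6B^4 + 10B^5 \\
  &= 1 - B - 5B^2 - 3B^3 + 6B^4 + 10B^5
\end{aligned}
```

Reading off the negated coefficients of B through B^5 gives 1, 5, 3, -6, -10.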
Furthermore, if arCoeffs.length >= seasonalFrequency, then the method expandArCoefficients(...) may produce wrong results.
And if maCoeffs.length >= seasonalFrequency, then the method expandMaCoefficients(...) may produce wrong results.
In my opinion, the correct code may be as follows.
static double[] expandArCoefficients(final double[] arCoeffs, final double[] sarCoeffs,
                                     final int seasonalFrequency) {
    double[] arC = new double[arCoeffs.length + 1];
    double[] sarC = new double[sarCoeffs.length + 1];
    double[] arSarCoeffs = new double[arCoeffs.length + sarCoeffs.length * seasonalFrequency];
    double[] arSarC = new double[arSarCoeffs.length + 1];
    arC[0] = -1.0;
    sarC[0] = -1.0;
    System.arraycopy(arCoeffs, 0, arC, 1, arCoeffs.length);
    System.arraycopy(sarCoeffs, 0, sarC, 1, sarCoeffs.length);
    // Note that we take into account the interaction between the seasonal and non-seasonal coefficients,
    // which arises because the model's ar and seasonal ar polynomials are multiplied together.
    for (int i = 0; i < arC.length; i++) {
        for (int j = 0; j < sarC.length; j++) {
            arSarC[i + j * seasonalFrequency] += -arC[i] * sarC[j];
        }
    }
    System.arraycopy(arSarC, 1, arSarCoeffs, 0, arSarCoeffs.length);
    return arSarCoeffs;
}
// Expand the moving average coefficients by combining the non-seasonal and seasonal coefficients into a single
// array, which takes advantage of the fact that a seasonal MA model is a special case of a non-seasonal
// MA model with zero coefficients at the non-seasonal indices.
static double[] expandMaCoefficients(final double[] maCoeffs, final double[] smaCoeffs,
                                     final int seasonalFrequency) {
    double[] maC = new double[maCoeffs.length + 1];
    double[] smaC = new double[smaCoeffs.length + 1];
    double[] maSmaCoeffs = new double[maCoeffs.length + smaCoeffs.length * seasonalFrequency];
    double[] maSmaC = new double[maSmaCoeffs.length + 1];
    maC[0] = 1.0;
    smaC[0] = 1.0;
    System.arraycopy(maCoeffs, 0, maC, 1, maCoeffs.length);
    System.arraycopy(smaCoeffs, 0, smaC, 1, smaCoeffs.length);
    // Note that we take into account the interaction between the seasonal and non-seasonal coefficients,
    // which arises because the model's ma and seasonal ma polynomials are multiplied together.
    for (int i = 0; i < maC.length; i++) {
        for (int j = 0; j < smaC.length; j++) {
            maSmaC[i + j * seasonalFrequency] += maC[i] * smaC[j];
        }
    }
    System.arraycopy(maSmaC, 1, maSmaCoeffs, 0, maSmaCoeffs.length);
    return maSmaCoeffs;
}
I hope I misunderstood your code.
Please forgive my poor English and code.
Thank you again for your open source. It's excellent.
Best wishes!
I think there should be an ADF (augmented Dickey-Fuller) check on the time series before starting the ARIMA process.
Does this lib support this function?
Hi, @signaflo !
Sorry to bother you.
When I try to use your code, I run into some problems, and I sincerely ask for your help. In fact, I don't know how you calculate the AIC. Are there any docs that would help me better understand your code?
I will appreciate your reply if you can spare some time to help me.
Good Day!
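For reference, the textbook definition is AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The sketch below illustrates that definition only. Whether the library counts the variance as a parameter, or uses an exact or conditional likelihood, is not stated in this thread, so the Gaussian likelihood with variance SSE / n used here is an assumption, and `AicSketch` is a hypothetical class.

```java
/** Hypothetical helper illustrating the textbook AIC, not the library's implementation. */
final class AicSketch {

    /** AIC = 2k - 2 ln(L). */
    static double aic(double logLikelihood, int numParameters) {
        return 2.0 * numParameters - 2.0 * logLikelihood;
    }

    /**
     * Gaussian log-likelihood of the residuals, with the variance set to its
     * maximum-likelihood estimate SSE / n (an assumption of this sketch).
     */
    static double gaussianLogLikelihood(double[] residuals) {
        int n = residuals.length;
        if (n == 0) {
            throw new IllegalArgumentException("residuals must not be empty");
        }
        double sse = 0.0;
        for (double r : residuals) {
            sse += r * r;
        }
        double sigma2 = sse / n;
        // -n/2 * ln(2*pi*sigma2) - SSE / (2*sigma2), where SSE / sigma2 = n
        return -0.5 * n * (Math.log(2.0 * Math.PI * sigma2) + 1.0);
    }
}
```

Lower AIC values indicate a better trade-off between fit and model size, and values are only comparable between models fitted to the same data.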
See discussion in PR #3
Code
while (!(Double.isFinite(functionValue) &&
functionValue < priorFunctionValue + C1 * stepSize * slopeAt0) && !stop) {
can fall into an infinite loop.
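One common way to guarantee termination is to cap the number of step reductions. The sketch below is a generic backtracking line search with the same sufficient-decrease test, not the library's BFGS code. `BoundedLineSearch`, the halving factor, the value of `C1`, and the cap of 25 are all assumptions made for illustration.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical backtracking line search with a bounded number of step reductions. */
final class BoundedLineSearch {

    static final double C1 = 1e-4;        // sufficient-decrease constant (assumed value)
    static final int MAX_REDUCTIONS = 25; // assumed cap on step halvings

    /**
     * @param phi         phi(step) = f(x + step * direction)
     * @param slopeAt0    directional derivative at step 0 (negative for a descent direction)
     * @param initialStep first step size to try
     * @return an accepted step size, or NaN if none was found within the cap
     */
    static double search(DoubleUnaryOperator phi, double slopeAt0, double initialStep) {
        double valueAt0 = phi.applyAsDouble(0.0);
        double step = initialStep;
        for (int k = 0; k <= MAX_REDUCTIONS; k++) {
            double value = phi.applyAsDouble(step);
            if (Double.isFinite(value) && value < valueAt0 + C1 * step * slopeAt0) {
                return step; // sufficient decrease reached
            }
            step *= 0.5; // shrink and retry; the loop bound guarantees termination
        }
        return Double.NaN; // the caller decides how to handle a failed search
    }
}
```

Returning NaN makes the failure explicit, so the caller can stop the optimization or fall back to another direction instead of looping forever when the function value never becomes finite.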
Any non-invertible MA model can be converted to an invertible one, except when the roots lie exactly on the unit circle. The algorithm for doing so first needs to be discovered and explained, and then implemented.
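For the MA(1) case the conversion is a standard result: y_t = e_t + theta e_{t-1} with |theta| > 1 has the same autocovariances as an MA(1) with coefficient 1/theta and innovation variance theta^2 sigma^2. The sketch below covers only that case; `Ma1Invertibility` is a hypothetical class, and higher-order models would need the roots of the MA polynomial.

```java
/** Hypothetical helper for the MA(1) case only. */
final class Ma1Invertibility {

    /**
     * Returns {theta', sigma2'} for an invertible MA(1) with the same
     * autocovariances as the MA(1) given by (theta, sigma2).
     */
    static double[] toInvertible(double theta, double sigma2) {
        double magnitude = Math.abs(theta);
        if (magnitude == 1.0) {
            throw new IllegalArgumentException("root lies on the unit circle");
        }
        if (magnitude < 1.0) {
            return new double[] {theta, sigma2}; // already invertible
        }
        // gamma_0 = sigma2 * (1 + theta^2) and gamma_1 = sigma2 * theta are unchanged.
        return new double[] {1.0 / theta, theta * theta * sigma2};
    }
}
```

For theta = 2 and sigma2 = 1 this gives theta' = 0.5 and sigma2' = 4, and both parameterizations have gamma_0 = 5 and gamma_1 = 2.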
No issue
In the Streaming branch, we're using a StreamingSeries that implements the Flow.Processor interface. The idea is to be able to have parallel observation of this series by multiple different time series models that can each be updated upon the receipt of a new series observation.
I feel that the easiest way down this path is to not focus on model estimation yet, but to start with an ARIMA process with known coefficients to see in what ways this process changes as new data comes in.
@signaflo Hi
Sorry to bother you.
I ran a Main.java based on your wiki, and the program runs. However, after checking the results, I found something strange: the predictions after 2013-04-01 are all far too small compared to the earlier ones, so I want to check whether there is a problem in my Main.java. I have pasted my Main.java and the results below so that you can check them.
Here is the Main.java:
public class Main2 {
    public static void main(String[] args) {
        TimeSeries timeSeries = TestData.debitcards;
        ArimaOrder modelOrder = ArimaOrder.order(2, 4, 6, 0, 1, 1);
        Arima model = Arima.model(timeSeries, modelOrder);
        System.out.println(model.aic()); // Get and display the model AIC
        Forecast forecast = model.forecast(12);
        System.out.println(forecast);
    }
}
| 2013-01-01T00:00 | 144328.7145 | 144222.6794 | 144434.7496 |
| 2013-02-01T00:00 | 890130.1238 | 889636.4510 | 890623.7965 |
| 2013-03-01T00:00 | 3562769.397 | 3561322.414 | 3564216.379 |
| 2013-04-01T00:00 | 1.095551038 | 1.095211425 | 1.095890651 |
| 2013-05-01T00:00 | 2.833212465 | 2.832518799 | 2.833906131 |
| 2013-06-01T00:00 | 6.473400261 | 6.472112782 | 6.474687739 |
| 2013-07-01T00:00 | 1.347106513 | 1.346883862 | 1.347329164 |
| 2013-08-01T00:00 | 2.604723814 | 2.604359301 | 2.605088327 |
| 2013-09-01T00:00 | 4.745488526 | 4.744917359 | 4.746059693 |
| 2013-10-01T00:00 | 8.230283109 | 8.229419808 | 8.231146410 |
| 2013-11-01T00:00 | 1.369331700 | 1.369205099 | 1.369458301 |
| 2013-12-01T00:00 | 2.198720933 | 2.198540000 | 2.198901866 |
Hey Jacob,
When I'm forecasting a weekly time series, it takes a long time to compute with the ML strategy using the BFGS optimizer. I read that L-BFGS can reduce the memory use and computation time. Is it possible to use it? If so, will the output results be the same or not?
If I want all the predicted values to be greater than 0, what should I do?
double[] series = new double[]{1, 46, 8, 3, 4, 6, 9, 2, 16, 3};
TimeSeries timeSeries = TimeSeries.from(TimePeriod.oneDay(), series);
The product of lag and times must be less than or equal to the length of the series, but 1 * 365 = 365 is greater than 9
Should the StreamingSeries emit a Double, an Observation, or an entire static TimeSeries?
Hi,
I use your library in a streaming environment to generate an ARIMA based time series. The simulated time series is supposed to be very long (possibly infinite).
Unfortunately, the whole series is simulated in advance, which requires an array of size n. It would be nice to have an "iterator"-based time series that retrieves only the latest Y_t, calculated on request. Space complexity should then reduce to max(p,q), right?
Is that something you consider as useful and would possibly implement?
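The idea can be sketched without the library. The class below generates a zero-mean ARMA(p, q) series one value at a time and keeps only the last p values and the last q shocks in ring buffers. `StreamingArmaSimulator` is a hypothetical name, pre-sample values are assumed to be zero, and the noise source is passed in so that the caller controls the distribution and the seed.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical on-demand ARMA(p, q) simulator that stores only p + q past values. */
final class StreamingArmaSimulator implements DoubleSupplier {

    private final double[] ar;
    private final double[] ma;
    private final double[] pastValues; // ring buffer of the last p simulated values
    private final double[] pastShocks; // ring buffer of the last q noise terms
    private final DoubleSupplier noise;
    private long t = 0; // index of the next value to generate

    StreamingArmaSimulator(double[] ar, double[] ma, DoubleSupplier noise) {
        this.ar = ar.clone();
        this.ma = ma.clone();
        this.pastValues = new double[ar.length];
        this.pastShocks = new double[ma.length];
        this.noise = noise;
    }

    /** Returns the next value y_t = e_t + sum(ar_i * y_{t-i}) + sum(ma_j * e_{t-j}). */
    @Override
    public double getAsDouble() {
        double shock = noise.getAsDouble();
        double value = shock;
        for (int i = 0; i < ar.length; i++) {
            value += ar[i] * lagged(pastValues, i + 1);
        }
        for (int j = 0; j < ma.length; j++) {
            value += ma[j] * lagged(pastShocks, j + 1);
        }
        store(pastValues, value);
        store(pastShocks, shock);
        t++;
        return value;
    }

    // The entry for time s lives in slot s % length; times before 0 count as zero.
    private double lagged(double[] buffer, int lag) {
        long s = t - lag;
        return s < 0 ? 0.0 : buffer[(int) (s % buffer.length)];
    }

    private void store(double[] buffer, double entry) {
        if (buffer.length > 0) {
            buffer[(int) (t % buffer.length)] = entry;
        }
    }
}
```

A caller could pass `new java.util.Random(42)::nextGaussian` as the noise source to get reproducible Gaussian shocks. Memory use is p + q doubles regardless of how many values are drawn.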
The ARIMA models should be able to automatically apply a supplied one parameter Box-Cox transformation when fitting the model.
When the model is forecast, the Box-Cox transformation should be reversed in order to get forecasts on the scale of the original data.
For basic use cases and explanation, see here: https://www.otexts.org/fpp/2/4
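The transformation and its inverse are short enough to sketch directly. This is a stand-alone illustration of the one-parameter Box-Cox transform, y = (x^lambda - 1) / lambda for lambda != 0 and y = ln(x) for lambda = 0, and not the library's implementation; `BoxCox` is a hypothetical class here.

```java
/** Hypothetical stand-alone Box-Cox helper, not part of java-timeseries. */
final class BoxCox {

    /** Forward transform; defined for strictly positive x. */
    static double transform(double x, double lambda) {
        if (x <= 0.0) {
            throw new IllegalArgumentException("Box-Cox requires x > 0");
        }
        return lambda == 0.0 ? Math.log(x) : (Math.pow(x, lambda) - 1.0) / lambda;
    }

    /** Inverse transform, used to map forecasts back to the original scale. */
    static double inverse(double y, double lambda) {
        // For lambda != 0 the result is NaN when lambda * y + 1 is negative
        // and 1 / lambda is not an integer.
        return lambda == 0.0 ? Math.exp(y) : Math.pow(lambda * y + 1.0, 1.0 / lambda);
    }
}
```

The model would be fitted to the transformed observations, and each point forecast and interval bound would be passed through `inverse`. With lambda = 0 the back-transformed forecasts are always positive, which also bears on the requests above for strictly positive predictions.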
Hi, @signaflo !
Sorry to bother you.
I want to ask if there is any way to get the best p, d, q, or a list of ACF and PACF values.
I will appreciate your reply if you can spare some time to help me.
Good Day!
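Whatever the library offers, the sample autocorrelation function is simple to compute directly. The sketch below uses the common estimator that divides the lag-k sum of cross-products by the lag-0 sum of squares; `AcfSketch` is a hypothetical class, and the PACF (for example via the Durbin-Levinson recursion) is left out.

```java
/** Hypothetical helper computing the sample autocorrelation function. */
final class AcfSketch {

    /** Returns the sample autocorrelations at lags 0..maxLag. */
    static double[] acf(double[] x, int maxLag) {
        int n = x.length;
        if (maxLag < 0 || maxLag >= n) {
            throw new IllegalArgumentException("maxLag must be between 0 and n - 1");
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;
        double denominator = 0.0;
        for (double v : x) {
            denominator += (v - mean) * (v - mean);
        }
        if (denominator == 0.0) {
            throw new IllegalArgumentException("series is constant");
        }
        double[] result = new double[maxLag + 1];
        for (int k = 0; k <= maxLag; k++) {
            double numerator = 0.0;
            for (int t = k; t < n; t++) {
                numerator += (x[t] - mean) * (x[t - k] - mean);
            }
            result[k] = numerator / denominator;
        }
        return result;
    }
}
```

A slowly decaying ACF suggests that differencing is needed, while a sharp cut-off after lag q is the classic signature of an MA(q) process.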
xt1.txt
Hello Signaflo,
I have the attached 1 year data. After prediction, when I calculate RMSE, I get values above 10. Can you please guide me.
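For reference, RMSE is expressed in the units of the data, so a value above 10 is large or small only relative to the scale of the series. A minimal sketch of the calculation follows; `RmseSketch` is a hypothetical helper.

```java
/** Hypothetical helper computing the root mean squared error. */
final class RmseSketch {

    static double rmse(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumSquared = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double error = actual[i] - predicted[i];
            sumSquared += error * error;
        }
        return Math.sqrt(sumSquared / actual.length);
    }
}
```

Comparing the RMSE with the mean or standard deviation of the held-out observations gives a better sense of whether the forecasts are actually poor.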
If I want all the predicted values to be greater than 0, how should I process the data?
It would be nice to be able to inspect the ARIMA coefficients after fitting. Unfortunately, the getter methods in the above mentioned class are package-private. Is there a reason for that?
Got NPE on simple code:
public static void main(String[] args) {
    ArimaOrder arimaOrder = ArimaOrder.order(0, 1, 1, Arima.Constant.INCLUDE);
    Arima model = Arima.model(TimeSeries.from(new double[]{
            100.1,
            200.4,
            300.2,
            400.5,
            500.1,
            600.7,
            700.7,
            800.7
    }), arimaOrder);
    Forecast forecast = model.forecast(5);
    System.out.println(forecast);
}
You don't check the return boolean value, that can be false...
The current way of using coefficients is ugly.
A coefficient is either a known or estimated property of a process or model. It should have an uncertainty score associated with it. If the coefficient is pre-determined, the uncertainty score should be zero. Otherwise, the uncertainty score should be greater than zero.
A parameter is a numeric property of a process or model that is unknown and may vary. Once it becomes known or no longer varies, it becomes a coefficient.
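One way to express this distinction in code is a small immutable value type. The sketch below is only an illustration of the idea described here. The class name `Coefficient`, its factory methods, and the use of a standard error as the uncertainty score are assumptions, not the library's design.

```java
/** Hypothetical value type pairing a coefficient with its uncertainty. */
final class Coefficient {

    private final double value;
    private final double standardError;

    private Coefficient(double value, double standardError) {
        this.value = value;
        this.standardError = standardError;
    }

    /** A pre-determined coefficient: known exactly, so its uncertainty is zero. */
    static Coefficient fixed(double value) {
        return new Coefficient(value, 0.0);
    }

    /** An estimated coefficient: must carry a strictly positive uncertainty. */
    static Coefficient estimated(double value, double standardError) {
        if (!(standardError > 0.0)) {
            throw new IllegalArgumentException("an estimated coefficient needs a positive standard error");
        }
        return new Coefficient(value, standardError);
    }

    double value() {
        return value;
    }

    double standardError() {
        return standardError;
    }

    boolean isFixed() {
        return standardError == 0.0;
    }
}
```

Keeping the value and its standard error in one object would also remove the need to match entries of a flat standard-error array to coefficients by position.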
This repo: https://github.com/Workday/timeseries-forecast says it is an implementation of the Hannan-Rissanen algorithm for additive ARIMA models.
I was wondering what algorithm this repo uses?
I know my (p,d,q)(P,D,Q)_M (seven) coefficients, where should I put them? This library has only 6 coefficients as far as I could see.
This library has a nice sample code in README: https://github.com/Workday/timeseries-forecast
The TimeUnit enum is basically a wrapper around Java's native ChronoUnit enum, but it adds the concept of a Quarter and has two methods "frequencyPer(other time unit)" and "totalDuration()".
I'm beginning to think all of this should be moved to the TimePeriod class. The TimePeriod class should hold a reference to a TemporalUnit instead of a TimeUnit.
The TimePeriod class also has frequencyPer(other time period) and totalSeconds() methods. (Note that the totalDuration method in TimeUnit returns the duration in seconds, so it's the same thing.) So at this point it appears the TimeUnit enum is just adding an unnecessary layer of indirection.
I'm going to remove it unless I get a compelling argument to keep it.
@signaflo I am trying to build Arima models separating the training and test phase. I cannot access the fields from the ArimaCoefficients class outside of the java-timeseries package in order to store them and use them later. Is it possible for you to change that ?
Thanks in advance
I used the latest release in my work to produce results that I plan to publish. Would it be possible to assign a DOI to this repository?
Thanks
package testProj;

import com.github.signaflo.timeseries.TestData;
import com.github.signaflo.timeseries.TimeSeries;
import com.github.signaflo.timeseries.forecast.Forecast;
import com.github.signaflo.timeseries.model.Model;
import com.github.signaflo.timeseries.model.arima.Arima;
import com.github.signaflo.timeseries.model.arima.ArimaOrder;

public class MainFile {
    public static void main(String[] args) {
        TimeSeries timeSeries = TestData.debitcards;
        ArimaOrder modelOrder = ArimaOrder.order(0, 1, 1, 0, 1, 1); // Note that intercept fitting will automatically be turned off
        Arima model = Arima.model(timeSeries, modelOrder);
        Forecast forecast = model.forecast(1); // To specify the alpha significance level, add it as a second argument.
        System.out.println(forecast);
        System.out.println(forecast.pointEstimates().mean());
    }
}
When I run java-timeseries with the following data, something is wrong.
double[] sales = new double[] {3.0, 3.0, 7.0, 2.0, 2.0, 1.0, 0.0, 3.0, 4.0, 3.0, 2.0, 3.0, 6.0, 1.0,
        0.0, 3.0, 4.0, 2.0, 2.0, 0.0, 1.0};
long season = 8L;
TimePeriod day = TimePeriod.oneDay();
TimeSeries series = TimeSeries.from(day, sales);
TimePeriod timePeriod = new TimePeriod(TimeUnit.DAY, season);
ArimaOrder order = ArimaOrder.order(3, 1, 2, 1, 1, 2);
Arima model = Arima.model(series, order, timePeriod);
Forecast forecast = model.forecast(7);
TimeSeries forecastValue = forecast.pointEstimates();
double[] forecastValuesArray = forecastValue.asArray();
// Loop variable renamed so that it no longer clashes with the TimeSeries above.
for (double value : forecastValuesArray) {
    System.out.println(value);
}
Then the printed results are all NaN.
Thanks!