
Comments (8)

pennfranc commented on May 20, 2024

Hi woj-i, thanks for letting us know about the issue!

To answer your questions about the freq parameter:

  1. Any pandas offset alias can be used; a list can be found here: https://stackoverflow.com/a/35339226
  2. This parameter is needed for very short TimeSeries instances (n < 3), where the frequency cannot be inferred.

This will raise a ValueError:

import pandas as pd
from darts import TimeSeries

times = pd.date_range('20130101', '20130102')
series = TimeSeries.from_times_and_values(times, range(2))

This will work:

times = pd.date_range('20130101', '20130102')
series = TimeSeries.from_times_and_values(times, range(2), freq='D')

  3. As explained above, the frequency can sometimes be detected, in which case missing entries are filled with NaNs. However, as you can see, this functionality is not bulletproof.
  4. TimeSeries instances always require a pandas.DatetimeIndex; there is no way around that at this time.
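The n < 3 limit matches how pandas itself behaves: `pd.infer_freq` needs at least three timestamps before it will guess a frequency. Whether Darts calls `pd.infer_freq` internally is an assumption here; the sketch below only demonstrates the pandas behaviour.

```python
import pandas as pd

two_days = pd.date_range("20130101", "20130102")
three_days = pd.date_range("20130101", "20130103")

# pandas refuses to infer a frequency from fewer than three timestamps.
try:
    pd.infer_freq(two_days)
except ValueError as err:
    print("two timestamps:", err)

# Three evenly spaced timestamps are enough to infer the daily alias "D".
print("three timestamps:", pd.infer_freq(three_days))
```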

That being said, as far as I can tell, the frequency parameter is not really the problem here. Instead, the problem most likely lies in your input data: your time index probably does not have a consistent frequency, meaning that the time difference between two subsequent indices is not constant.
If this is the case, the TimeSeries constructor tries to detect the frequency from subsequences of the time series. If only one such frequency is detected, it fills the missing dates with NaN values so that the series has a consistent frequency. However, if more than one frequency is detected, as in your case (both a calendar-day and a business-day frequency were detected), an error is raised.
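The detection logic described above can be approximated with plain pandas. This is a hypothetical reconstruction for illustration only, not Darts' actual code: the function name `detect_and_fill` and the use of three-timestamp windows with `pd.infer_freq` are assumptions.

```python
import pandas as pd

def detect_and_fill(series: pd.Series) -> pd.Series:
    """Hypothetical sketch of the frequency detection described above (not Darts' code).

    Raises ValueError if the series has fewer than 3 points (pd.infer_freq
    needs at least 3) or if more than one frequency is detected.
    """
    idx = series.index
    # Regular index: asfreq only attaches the inferred frequency.
    freq = pd.infer_freq(idx)
    if freq is not None:
        return series.asfreq(freq)

    # Otherwise, infer a frequency from every window of 3 consecutive timestamps.
    candidates = set()
    for i in range(len(idx) - 2):
        window_freq = pd.infer_freq(idx[i:i + 3])
        if window_freq is not None:
            candidates.add(window_freq)

    if len(candidates) != 1:
        raise ValueError(f"expected exactly one frequency, found {sorted(candidates)}")

    # Exactly one frequency: insert the missing timestamps with NaN values.
    return series.asfreq(candidates.pop())
```

With weekday-only stock data that also skips a holiday, some windows look daily ("D", e.g. Monday to Wednesday) and others look like business days ("B", e.g. Thursday, Friday, Monday), so a sketch like this raises an error, matching the situation described above.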
Unfortunately, it seems that Darts in its current form can't work with your (unmodified) data set. The only option I can see right now is to manually make sure that your index has a constant frequency. This is definitely something we want to improve, and we have added it to our backlog. Thanks for your feedback! Also, feel free to give this a shot yourself if you think you have a good solution!
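One way to do that manual step with pandas, sketched under the assumption that NaN values for the missing days are acceptable for your use case, is to reindex the data onto every calendar day so the index has a single frequency. The helper name `to_calendar_days` is hypothetical.

```python
import pandas as pd

def to_calendar_days(series: pd.Series) -> pd.Series:
    """Hypothetical helper: reindex onto every calendar day between first and last timestamp.

    Dates absent from the input (e.g. weekends) are inserted with NaN values,
    so the resulting index has the single constant frequency "D".
    """
    full_range = pd.date_range(series.index.min(), series.index.max(), freq="D")
    return series.reindex(full_range)

# Usage: two trading weeks (Mon 6 to Fri 17 January 2020), weekends absent.
weekdays = pd.bdate_range("2020-01-06", "2020-01-17")
prices = pd.Series(range(len(weekdays)), index=weekdays, dtype=float)
daily = to_calendar_days(prices)
```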

I hope this helps. If not, please don't hesitate to reach out again!

from darts.

woj-i commented on May 20, 2024

Thank you for explaining that!
My problem was the values in the time index: I have data for weekdays but none for weekends.

I would suggest stating in the documentation that missing dates must be filled with NaN in the input. From the docs I understood that they may be filled, but not that this is required.

Moreover, I noticed that the freq parameter is ignored if the size of the frame is greater than 2. The docs say that it must be passed when len(df) < 3, but they do not say that it is ignored in all other cases.

You could also add this reference to the list of possible values for the freq parameter: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
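For reference, here is what a few common offset aliases generate over one month. This is plain pandas; the helper `count_timestamps` is hypothetical and exists only for this illustration.

```python
import pandas as pd

def count_timestamps(alias: str, start: str = "2013-01-01", end: str = "2013-01-31") -> int:
    """Number of timestamps pandas generates between start and end for an offset alias."""
    return len(pd.date_range(start, end, freq=alias))

# "D" = calendar day, "B" = business day, "W" = weekly (Sundays), "MS" = month start.
for alias in ["D", "B", "W", "MS"]:
    print(alias, count_timestamps(alias))
```

For January 2013 this prints 31, 23, 4 and 1 respectively; "B" skips the eight weekend days.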

pennfranc commented on May 20, 2024

I think you're right about the documentation; it could definitely be made clearer. I also believe that the inability to process a time series with a business-day frequency should be addressed. Thanks for all of your input!

eduardoansi commented on May 20, 2024

I am facing the same issue with weekends: I only have data for business days, so the frequency is not consistent. I also can't fill the weekends with null or 0 values because that would affect the model.

In this particular case, seasonality is not very important to me, so my workaround is to take all the values of my original data without their dates and put them into a DataFrame with a regular interval. I lose the precise information about when each value occurred, but I can at least see how past values affect future ones.
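This workaround can be sketched with pandas as follows. The helper `to_regular_index`, the arbitrary start date 2000-01-01 and the daily spacing are assumptions for illustration; only the order of the observations is preserved, and the original timestamps are lost, as noted above.

```python
import pandas as pd

def to_regular_index(values: pd.Series, start: str = "2000-01-01", freq: str = "D") -> pd.Series:
    """Hypothetical helper: drop the original timestamps and put the values on a regular grid."""
    # The new index only encodes the order of the observations, not when they happened.
    new_index = pd.date_range(start=start, periods=len(values), freq=freq)
    return pd.Series(values.to_numpy(), index=new_index, name=values.name)

# Usage: weekday-only observations become consecutive daily points.
weekdays = pd.bdate_range("2020-01-06", "2020-01-17")
observed = pd.Series(range(len(weekdays)), index=weekdays, dtype=float, name="value")
regular = to_regular_index(observed)
```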

jerryan999 commented on May 20, 2024

I ran into the same problem when processing stock data.

It's such a common problem.

pennfranc commented on May 20, 2024

Update: time series data with a business-day index should now be supported, even when incomplete. In this PR we added the option to override the automatic frequency detection in the case of an inconsistent frequency, by setting the freq argument of the TimeSeries constructor. More specifically, whereas this code snippet results in the same error as before:

import pandas as pd
from darts import TimeSeries

df = pd.read_csv('AAPL.csv', delimiter=",")
series = TimeSeries.from_dataframe(df, 'Date', ['Close'])
series.plot()

passing freq='B' to the constructor will solve this problem. This code should execute correctly:

df = pd.read_csv('AAPL.csv', delimiter=",")
series = TimeSeries.from_dataframe(df, 'Date', ['Close'], freq='B')
series.plot()
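To see what a business-day frequency means for data like this, the pandas-only sketch below (not Darts code; the prices are invented and the dropped date stands in for a hypothetical exchange holiday) shows that automatic inference fails on a business-day index with a gap, while an explicit "B" grid restores a constant frequency. That Darts fills such gaps the same way when given freq='B' is an assumption.

```python
import pandas as pd

# Invented closing prices for two trading weeks (6 to 17 January 2020),
# with Friday 10 January dropped to mimic a holiday.
dates = pd.bdate_range("2020-01-06", "2020-01-17").drop(pd.Timestamp("2020-01-10"))
close = pd.Series(range(len(dates)), index=dates, dtype=float)

# The gap breaks automatic detection: pandas cannot infer a single frequency.
print(pd.infer_freq(close.index))

# An explicit business-day grid gives a constant frequency: the missing
# trading day becomes NaN, and weekends are not added at all.
on_bdays = close.asfreq("B")
print(on_bdays)
```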

(source of data set used for test: https://www.kaggle.com/jacksoncrow/stock-market-dataset?)

This patch has already been published to PyPI, so you can get the updated version of Darts like this:

pip install u8darts

Please let us know if this solved your issue!

eduardoansi commented on May 20, 2024

Thanks for being quick to solve it!

I tried to update the package but couldn't. When I try to install it again, with or without --upgrade, pip says that the requirement is already satisfied. Does it take some time for the new version to become available?

TheMP commented on May 20, 2024

Hi, it might take some time for PyPI to pick up the changes, but in the meantime you should be able to install the new version of Darts by specifying the version explicitly:

pip install u8darts==0.2.1
