servicenow / n-beats

485 stars · 20 watchers · 113 forks · 146 KB

N-BEATS is a neural-network-based model for univariate time series forecasting. N-BEATS is a ServiceNow Research project that was started at Element AI.

License: Other

Dockerfile 0.53% Python 67.91% Makefile 0.68% Jupyter Notebook 30.88%

n-beats's People

Contributors

dmitri-carpov, espenha, servicenowresearch


n-beats's Issues

Increase N-BEATS reproducibility by providing pre-computed forecasts on the datasets evaluated in the original paper

Hi,

I propose that the point forecasts of N-BEATS be included in this repo for all the datasets evaluated in the original paper, as has been done in the M4 competition GitHub repository. This would ease comparison of the N-BEATS model with others and increase the visibility of the N-BEATS paper.

Besides reproducing the experimental results presented in the paper, it is often more convenient to rely on precomputed forecasts when comparing different models on the same dataset. For instance, the M4 competition's GitHub repository provides the point forecasts of all submitted models in compressed CSV files, which permits comparing the models on individual series. Thanks to that, we can evaluate the performance of ES-RNN and FFORMA under different loss functions, which is fortunate given the execution time needed to reproduce their forecasts, but we cannot do the same for the N-BEATS model. It would be great if N-BEATS did not fall under this reproducibility exception; a sketch of the workflow this would enable follows below. The same argument holds for the other datasets evaluated.
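To make the proposal concrete, here is a minimal sketch of the workflow that published point forecasts would enable; the file names and column layout below are hypothetical, not something this repo currently provides:

```python
import pandas as pd

# Hypothetical files: one row per series, one column per forecast step.
forecasts = pd.read_csv("nbeats_m4_point_forecasts.csv.gz", index_col="series_id")
actuals = pd.read_csv("m4_test_set.csv.gz", index_col="series_id")

# Score each individual series, e.g. with sMAPE, without re-running training.
smape_per_series = 200 * (
    (forecasts - actuals).abs() / (forecasts.abs() + actuals.abs())
).mean(axis=1)
print(smape_per_series.describe())
```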

Great repo btw!

The only variable as input should be the target which is part of time_varying_unknown_reals

My data set has: time_idx, Date, Ticker, Open, High, Low, Close, Stock Splits, and Dividends.
My TimeSeriesDataSet is:

```python
training = TimeSeriesDataSet(
    combined_data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="Close",
    group_ids=["Ticker"],
    max_encoder_length=60,
    max_prediction_length=7,
    static_categoricals=["Ticker"],
    time_varying_known_reals=["time_idx", "Open", "High", "Low", "Volume", "Stock Splits", "Dividends"],
    time_varying_unknown_reals=["Close"],
    allow_missing_timesteps=True,
    target_normalizer=GroupNormalizer(
        groups=["Ticker"], transformation="softplus"
    ),
    add_relative_time_idx=False,
    add_target_scales=True,
    add_encoder_length=True,
)
```

I get this error when I run the following:

```python
# Define the model with the suggested hyperparameters
model = NBeats.from_dataset(
    training,
    learning_rate=learning_rate,
    hidden_size=hidden_size,
    widths=widths,
    backcast_loss_ratio=backcast_loss_ratio,
    dropout=dropout,
)
```

I see that the dataset has reals like encoder_length, Close_min, close_scaled, and all the time-varying known reals I specified. I tried removing all of those reals, but I still get the error, because the remaining reals are the encoder length and the Close-related variables, and I cannot drop the encoder length. The issue arises from this piece of code:

```python
assert (
    len(dataset.flat_categoricals) == 0
    and len(dataset.reals) == 1
    and len(dataset.time_varying_unknown_reals) == 1
    and dataset.time_varying_unknown_reals[0] == dataset.target
), "The only variable as input should be the target which is part of time_varying_unknown_reals"
```

history_size/window_sampling_limit is shorter than input size

For certain datasets, e.g. the Yearly/Quarterly/Monthly M4 datasets, the quantity history_size is set to 1.5, which makes window_sampling_limit 1.5 times the horizon length. Yet the input size can be up to 7 times the horizon length, meaning that during the training phase the model mostly observes padding. Is this an issue, possibly leading to degraded performance on these datasets?
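A back-of-the-envelope illustration of the concern (assuming window_sampling_limit = history_size * horizon and input_size = lookback * horizon, which is my reading of the code):

```python
horizon = 6          # e.g. M4 Yearly
history_size = 1.5   # from experiments/m4/*.gin
lookback = 7         # largest lookback multiplier considered in the paper

window_sampling_limit = int(history_size * horizon)  # 9 sampled time points
input_size = lookback * horizon                      # 42-point input window

padding_fraction = 1 - window_sampling_limit / input_size
print(f"~{padding_fraction:.0%} of the input window is padding")  # ~79%
```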

A question about generic.gin

In experiments/m4/generic.gin:

```
instance.history_size = {
    'Yearly': 1.5,
    'Quarterly': 1.5,
    'Monthly': 1.5,
    'Weekly': 10,
    'Daily': 10,
    'Hourly': 10
}

instance.iterations = {
    'Yearly': 15000,
    'Quarterly': 15000,
    'Monthly': 15000,
    'Weekly': 5000,
    'Daily': 5000,
    'Hourly': 5000
}
```
How are the above parameters determined?

Reproducing M4 Results

I am having trouble reproducing the results on the M4 dataset. I get the following error when running the M4 notebook:


```
ModuleNotFoundError                       Traceback (most recent call last)
in
      1 import pandas as pd
      2
----> 3 from summary.m4 import M4Summary
      4 from summary.utils import median_ensemble
      5

ModuleNotFoundError: No module named 'summary'
```

When I follow the reproduction steps, I run into this error. Here is my build script:

```bash
#!/bin/bash
make init
make dataset
make build config=experiments/m4/interpretable.gin
make run command=storage/experiments/m4_interpretable/repeat=0,lookback=2,loss=MAPE/command
```
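One likely cause (an assumption on my part, not confirmed): the summary package lives at the repository root, so the notebook can only import it if that directory is on the Python path. A hypothetical workaround:

```python
# Make the repo root importable before the notebook's imports run;
# "/path/to/n-beats" is a placeholder for your checkout location.
import sys
sys.path.insert(0, "/path/to/n-beats")

from summary.m4 import M4Summary
from summary.utils import median_ensemble
```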

Why does forward() of the GenericBasis class in models/nbeats.py return theta instead of an FC projection of theta?

```python
class GenericBasis(t.nn.Module):
    """
    Generic basis function.
    """
    def __init__(self, backcast_size: int, forecast_size: int):
        super().__init__()
        self.backcast_size = backcast_size
        self.forecast_size = forecast_size

    def forward(self, theta: t.Tensor):
        return theta[:, :self.backcast_size], theta[:, -self.forecast_size:]
```

Is it more reasonable to return a function of theta, just like TrendBasis and SeasonalityBasis do?
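For context, my reading of the surrounding code (paraphrased from models/nbeats.py, so treat it as a sketch rather than an authoritative answer): the FC projection happens inside NBeatsBlock, whose final linear layer basis_parameters produces theta, and the basis function only maps theta to (backcast, forecast). For the generic model, theta_size == backcast_size + forecast_size, so slicing theta consumes exactly that learned projection:

```python
import torch as t

# Paraphrased from models/nbeats.py: the FC stack plus the basis_parameters
# linear layer produce theta; the basis function maps theta to outputs.
class NBeatsBlock(t.nn.Module):
    def __init__(self, input_size: int, theta_size: int,
                 basis_function: t.nn.Module, layers: int, layer_size: int):
        super().__init__()
        self.layers = t.nn.ModuleList(
            [t.nn.Linear(input_size, layer_size)]
            + [t.nn.Linear(layer_size, layer_size) for _ in range(layers - 1)]
        )
        self.basis_parameters = t.nn.Linear(layer_size, theta_size)  # FC projection
        self.basis_function = basis_function

    def forward(self, x: t.Tensor):
        for layer in self.layers:
            x = t.relu(layer(x))
        return self.basis_function(self.basis_parameters(x))
```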

NBeats.forward()

Hi,
In nbeats.py, the class NBeats:

```python
class NBeats(t.nn.Module):

    def __init__(self, blocks: t.nn.ModuleList):
        super().__init__()
        self.blocks = blocks

    def forward(self, x: t.Tensor, input_mask: t.Tensor) -> t.Tensor:
        residuals = x.flip(dims=(1,))
        input_mask = input_mask.flip(dims=(1,))

        forecast = x[:, -1:]
        for i, block in enumerate(self.blocks):
            backcast, block_forecast = block(residuals)
            residuals = (residuals - backcast) * input_mask
            forecast = forecast + block_forecast
        return forecast
```

In the forward function, why perform this operation: residuals = x.flip(dims=(1,))?
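For what it's worth, the operation itself just reverses the time axis, so the most recent observation moves to index 0. A toy check:

```python
import torch as t

x = t.tensor([[1.0, 2.0, 3.0, 4.0]])  # oldest ... newest
print(x.flip(dims=(1,)))               # tensor([[4., 3., 2., 1.]])
print(x[:, -1:])                       # tensor([[4.]]): the last observed
                                       # value, used to seed the forecast
```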

Project dependencies may have API risk issues

Hi, in N-BEATS, inappropriate dependency versioning constraints can cause risks.

Below are the dependencies and version constraints that the project is using:

```
gin-config
fire
matplotlib
numpy
pandas
patool
torch
tqdm
xlrd
```

The version constraint == introduces a risk of dependency conflicts because the dependency scope is too strict.
The version constraints "no upper bound" and * introduce a risk of missing-API errors, because the latest version of a dependency may remove some APIs.

After further analysis, in this project:
The version constraint of the dependency pandas can be changed to >=0.13.0,<=0.23.4.
The version constraint of the dependency tqdm can be changed to >=4.42.0,<=4.64.0.

The above modifications reduce the chance of dependency conflicts as much as possible
while allowing the latest versions that do not trigger call errors in the project.
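Applied to a requirements file, the suggestion would look roughly like this; only the pandas and tqdm bounds come from the analysis above, and the other dependencies are left as they are:

```
pandas>=0.13.0,<=0.23.4
tqdm>=4.42.0,<=4.64.0
```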

The invocations in the current project include all of the following methods.

The calling methods from pandas:

```
pandas.DataFrame.to_csv
pandas.read_csv
pandas.read_excel
pandas.concat
```

The calling methods from tqdm:

```
itertools.product
tqdm.tqdm
```

The calling methods from all methods:

```
dates.s.datetime.strptime.strftime.map.list.np.unique.dump
layer
collections.OrderedDict
numpy.cos
pandas.concat
numpy.array
datasets.tourism.TourismDataset.download
i.permutations.np.where.raw_data.rstrip
os.stat
tqdm.tqdm
training_values.extend
group_by.forecast_file.summary_filter.experiment_path.os.path.join.glob.tqdm.file.file.pd.read_csv.pd.concat.set_index.groupby
self.build
join
x_mask.x.model.cpu.detach
cmd.write
experiments.model.interpretable
x_mask.x.model.cpu
torch.abs
self.snapshot
values.extend
models.nbeats.GenericBasis
min
os.fsync
numpy.sin
optimizer.state_dict
numpy.where
iter
summary.utils.group_values
enumerate
torch.no_grad
__loss_fn
URL_TEMPLATE.format
test.reset_index.reset_index
i.i.data.sum
model.to.parameters
numpy.mean
permutations.rstrip.split.np.array.astype.rstrip
os.chmod
patoolib.extract_archive
torch.device
value.str.replace
left_indices.append
torch.load
weighted_score.values
torch.save
super.__init__
datasets.tourism.TourismDataset.load
pandas.DataFrame.to_csv
experiments.trainer.trainer
snapshot_manager.register
float
common.sampler.TimeseriesSampler
dir_path.Path.mkdir
pandas.DataFrame
str
os.getenv
groups.extend
torch.mean
row_vector.split.np.array.astype
i.permutations.np.where.raw_data.rstrip.split
x.flip
models.nbeats.TrendBasis
numpy.random.randint
group.lower
datasets.traffic.TrafficDataset.download
common.metrics.mase
shutil.copy
torch.cuda.is_available
training_loss_fn.backward
metric
d.items
model.load_state_dict
datasets.m4.NAIVE2_FORECAST_FILE_PATH.pd.read_csv.values.astype
range
common.torch.losses.smape_2_loss
parsed_values.np.array.astype
numpy.array.dump
datetime.timedelta
i.timedelta.current_date.strftime
itertools.product
os.path.dirname
os.walk
time.time
torch.nn.ModuleList
optimizer.load_state_dict
row_vector.split
os.rename
common.http_utils.download
dataset.dump
urllib.request.urlretrieve
numpy.isnan
numpy.load
snapshot_manager.restore
Exception
self.basis_parameters
training_loss_fn
super
url.split
numpy.abs
snapshot_manager.enable_time_tracking
int
numpy.power
forecasts.extend
test_values.extend
datasets.m4.M4Dataset.download
pandas.read_csv.iterrows
common.sampler.TimeseriesSampler.last_insample_window
list
common.metrics.mape
success_flag.Path.touch
dict.items
pandas.read_csv.set_index
cfg.write
ids.extend
TourismDataset
model.to.to
datasets.traffic.TrafficDataset.load.split_by_date
file_path.os.path.dirname.pathlib.Path.mkdir
torch.nn.Linear
zip
numpy.concatenate
models.nbeats.NBeats
right_indices.append
fire.Fire
torch.optim.Adam
M3Dataset
logging.root.setLevel
raw_line.replace.strip.split
timeseries_dict.values.list.np.array.dump
collections.OrderedDict.values
gin.configurable
models.nbeats.NBeatsBlock
max
s.datetime.strptime.strftime
permutations.rstrip.split.np.array.astype
numpy.append
datasets.m3.M3Dataset.load
pandas.read_csv
len
pandas.read_excel
default_device
TrafficDataset
numpy.prod
test.iloc.astype
dict
datasets.electricity.ElectricityDataset.load
common.torch.ops.default_device
common.torch.ops.divide_no_nan
models.nbeats.SeasonalityBasis
numpy.zeros
datasets.electricity.ElectricityDataset.load.split_by_date
torch.load.items
torch.nn.Parameter
isinstance
torch.nn.utils.clip_grad_norm_
numpy.transpose
common.torch.snapshots.SnapshotManager
sys.stdout.flush
torch.load.keys
numpy.max
os.path.isdir
numpy.sum
input_mask.flip.flip
tempfile.NamedTemporaryFile
M4Dataset
torch.optim.Adam.zero_grad
datasets.m3.M3Dataset.download
numpy.round
train_meta.iloc.astype
numpy.unique
torch.tensor
dataclasses.dataclass
dates.np.array.dump
tempfile.NamedTemporaryFile.flush
self.instance
f.readlines
os.path.basename
open
permutations.rstrip.split
common.torch.losses.mape_loss
common.metrics.smape_1
forecast_file.summary_filter.experiment_path.os.path.join.glob.tqdm.file.file.pd.read_csv.pd.concat.set_index
raw_line.replace.strip
urllib.request.install_opener
sys.stdout.write
horizons.extend
group_by.group_by.forecast_file.summary_filter.experiment_path.os.path.join.glob.tqdm.file.file.pd.read_csv.pd.concat.set_index.groupby.median
round
group_count
logging.info
gin.parse_config_file
sorted
common.torch.losses.mase_loss
urllib.request.build_opener
torch.relu
common.http_utils.url_file_name
glob.glob
x_mask.x.model.cpu.detach.numpy
model.state_dict
round_all
torch.float32.array.t.tensor.to
format
datetime.datetime.strptime
os.path.join
numpy.save
datasets.m4.M4Dataset.load
self.summarize_groups.keys
dates.extend
torch.optim.Adam.step
self.summarize_groups
pathlib.Path
torch.einsum
instance_path.Path.mkdir
self.basis_function
numpy.ceil
map
datasets.traffic.TrafficDataset.load
os.path.isfile
model.to.train
block
experiments.model.generic
datasets.electricity.ElectricityDataset.download
experiments.trainer.trainer.eval
model
raw_line.replace
build_cache
splits.items
tempfile.NamedTemporaryFile.fileno
train.reset_index.reset_index
numpy.array.append
ElectricityDataset
common.metrics.smape_2
numpy.arange
next
numpy.sqrt
```

@developer
Could you please help me check this issue? May I open a pull request to fix it?
Thank you very much.

Question about hyperparameters

Are there any hyperparameters that are more important than others and do you have recommended ranges? I'm thinking of something similar to this, but for N-BEATS:

[image: table of recommended hyperparameter ranges from another project]

Cannot run on gpu

I can successfully make run the experiment on tourism dataset on cpu. However, when I use
make run command=storage/experiments/tourism_interpretable/repeat=0,lookback=2,loss=MAPE/command gpu=0

I have the following feedback:
the input device is not a TTY
Makefile:30: recipe for target 'run' failed
make: *** [run] Error 1
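The message "the input device is not a TTY" usually means docker run was given the -t flag while stdin has no terminal attached. Assuming this repo's Makefile passes -it to docker run (an assumption, I have not checked Makefile line 30), a sketch of a workaround:

```bash
# Hypothetical: select the TTY flag based on whether stdin is a terminal;
# <image> and <command> are placeholders for whatever the Makefile uses.
if [ -t 0 ]; then DOCKER_FLAGS="-it"; else DOCKER_FLAGS="-i"; fi
docker run $DOCKER_FLAGS --rm <image> <command>
```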

Questions about the MASE.

In the MASE, I find that the shapes of in_sample, out_sample, and forecast should be (time_o,) or (time_i,), instead of (batch, time_i/o).
Moreover, does the MASE follow the formulation in Section 2 of the paper? The current implementation seems to utilize all the historical data (instead of only the data from 1..T), and the denominator in the current implementation does not include future data, i.e. the data from T..T+H.
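For reference, the textbook MASE definition (Hyndman & Koehler) on a single series, which matches the shapes I would expect; this is my own sketch, not the repo's implementation:

```python
import numpy as np

def mase(insample: np.ndarray, outsample: np.ndarray,
         forecast: np.ndarray, m: int) -> float:
    """Single-series MASE; m is the seasonal period (1 if non-seasonal)."""
    # Denominator: mean absolute error of the seasonal-naive forecast,
    # computed over the in-sample history 1..T only.
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(outsample - forecast)) / scale
```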
