time2graphplus's Introduction

Time2GraphPlus (Time2Graph+)

This project implements the Time2Graph+ model[1], an extension of Time2Graph[2] which focuses on time series modeling with dynamic shapelets via graph attentions. See project homepage for model details.

Building and Testing

This project is implemented primarily in Python 3.7, with the dependencies listed below. We have tested the framework on Ubuntu 16.04.5 LTS with kernel 4.4.0, and it is expected to build and run on any regular Unix-like system.

Dependencies

  • Python 3.7. Version 3.7.0 has been tested. Higher versions are expected to be compatible with the current implementation, while there may be syntax errors or conflicts under Python 2.x.

  • PyTorch.

    Version 1.7.0 has been tested. You can find installation instructions here. Note that GPU support is encouraged, as it greatly boosts training efficiency.

  • XGBoost

    Version 1.3.3 has been tested. You can find installation instructions here.

  • Other Python modules. Some other Python module dependencies are listed in requirements.txt, which can be easily installed with pip:

    pip install -r requirements.txt

    Although not all dependencies are mentioned in the installation instruction links above, you can find most of the libraries in the package repository of a regular Linux distribution.

Building the Project

Before building the project, we recommend switching the working directory to the project root directory. Assuming the project root is at <time2graphplus_root>, run the command

cd <time2graphplus_root>

Note that we assume <time2graphplus_root> is your working directory in all the commands presented in the rest of this documentation. Then make sure that the environment variable PYTHONPATH is properly set by running the following command (on a Linux distribution):

export PYTHONPATH=`readlink -f ./`
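
On systems where `readlink -f` is unavailable (for example macOS without GNU coreutils, or Windows), a similar result can be obtained from Python. The following is a minimal sketch and not part of the project: the helper name `build_env` is hypothetical, and it assumes only that the scripts need the project root on PYTHONPATH, as described above. Unlike the `export` command, it prepends to any existing PYTHONPATH instead of overwriting it.

```python
import os


def build_env(project_root=".", base_env=None):
    """Return a copy of the environment with PYTHONPATH pointing at the project root.

    Hypothetical helper (not part of Time2Graph+): os.path.realpath resolves
    symlinks and relative segments, much like `readlink -f` on Linux.
    """
    env = dict(os.environ if base_env is None else base_env)
    root = os.path.realpath(project_root)
    existing = env.get("PYTHONPATH", "")
    # Prepend the project root and keep any previously configured entries.
    env["PYTHONPATH"] = root if not existing else root + os.pathsep + existing
    return env


if __name__ == "__main__":
    print(build_env()["PYTHONPATH"])
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching the scripts from another Python process.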

Testing the Project (Reproducibility)

A test script scripts/benchmark_test.py is available for reproducibility on the benchmark datasets:

python scripts/benchmark_test.py -h

usage: benchmark_test.py [-h] [--dataset] [--n_splits] [--model_cache] [--shapelet_cache] [--gpu_enable]

optional arguments:
  -h, --help        show this help message and exit
  --dataset         str, one of `ucr-Earthquakes`, `ucr-WormsTwoClass` and `ucr-Strawberry`,
                    for which we have set the optimal parameters after fine-tuning.
                    (default: `ucr-Earthquakes`)
  --n_splits        int, number of splits in cross-validation. (default: 5)
  --model_cache     bool, whether to use a pretrained model. (default: False)
  --shapelet_cache  bool, whether to use a pretrained set of shapelets. (default: False)
  --gpu_enable      bool, whether to enable GPU usage. (default: False)

Usage

Given a set of time series data and the corresponding labels, the Time2Graph+ framework aims to learn representations of the original time series and to perform time series classification in a supervised learning setting.

Input Format

The input time series data and labels are expected to be numpy.ndarray:

Time_Series X: 
    numpy.ndarray with shape (N x L x data_size),
    where N is the number of time series, L is the time series length, 
    and data_size is the data dimension.
Labels Y:
    numpy.ndarray with shape (N x 1), with 0 as negative, and 1 as positive samples.

We organize the preprocessing code that loads the UCR datasets in archive/. If you want to use the framework on other datasets, just preprocess the original data into the format described above.
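
As an illustration of the format above, the following minimal sketch reshapes a univariate data set stored as a 2-D array into the expected (N x L x data_size) layout and validates the labels. The helper name `to_time2graph_format` is hypothetical and not part of the project; only NumPy is assumed.

```python
import numpy as np


def to_time2graph_format(series, labels):
    """Reshape raw arrays into the (N, L, data_size) / (N, 1) layout described above.

    Hypothetical helper for illustration; it is not part of Time2Graph+.
    """
    x = np.asarray(series, dtype=np.float32)
    if x.ndim == 2:
        # Univariate data given as (N, L): add a trailing axis so data_size == 1.
        x = x[:, :, np.newaxis]
    if x.ndim != 3:
        raise ValueError("expected (N, L) or (N, L, data_size), got %s" % (x.shape,))
    y = np.asarray(labels).reshape(-1, 1)
    if y.shape[0] != x.shape[0]:
        raise ValueError("X and Y must contain the same number of samples")
    # Validate before casting so that values such as 0.5 are rejected, not truncated.
    if not np.isin(y, (0, 1)).all():
        raise ValueError("labels must be 0 (negative) or 1 (positive)")
    return x, y.astype(np.int64)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = to_time2graph_format(rng.normal(size=(8, 504)), rng.integers(0, 2, size=8))
    print(x.shape, y.shape)  # (8, 504, 1) (8, 1)
```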

Main Script

Now that the input data is ready, the main script scripts/run.py serves as a pipeline example to train and test the whole framework. First, modify the code in the following block (lines 46-51) to load your datasets by reassigning x_train, y_train, x_test and y_test.

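# Dataset names are expected to look like `ucr-<name>`, e.g. `ucr-Earthquakes`.
if args.dataset.startswith('ucr'):
    dataset = args.dataset.rstrip('\n\r').split('-')[-1]
    x_train, y_train, x_test, y_test = load_usr_dataset_by_name(
        fname=dataset, length=args.seg_length * args.num_segment)
else:
    # Load your own dataset here by assigning x_train, y_train, x_test, y_test.
    raise NotImplementedError()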

The help information of the main script scripts/run.py is listed as follows:

python scripts/run.py -h

usage: run.py [-h] [--dataset] [--K] [--C] [--num_segment] [--seg_length] [--data_size]
              [--n_splits] [--njobs] [--optimizer] [--alpha] [--beta] [--init]
              [--gpu_enable] [--opt_metric] [--cache] [--embed] [--embed_size] [--warp]
              [--cmethod] [--kernel] [--percentile] [--measurement] [--batch_size]
              [--tflag] [--scaled] [--norm] [--no_global]

optional arguments:
  -h, --help        show this help message and exit
  --dataset         str, which dataset to load;
                    requires modifying the code in lines 46-51.
  --K               int, number of shapelets to learn.
  --C               int, number of shapelet candidates used for learning shapelets.
  --num_segment     int, number of segments in a time series.
  --seg_length      int, the segment length,
                    so the length of a time series is num_segment * seg_length.
  --data_size       int, the dimension of the time series data.
  --n_splits        int, number of splits in cross-validation, default 5.
  --njobs           int, number of threads if using multiprocessing.
  --optimizer       str, optimizer used for learning shapelets, default `Adam`.
  --alpha           float, penalty for local timing factor, default 0.1.
  --beta            float, penalty for global timing factor, default 0.05.
  --init            int, init offset for time series, default 0.
  --gpu_enable      bool, whether to use GPU, default False.
  --opt_metric      str, metric for optimizing the outer classifier, default `accuracy`.
  --cache           bool, whether to save model cache, default False.
  --warp            int, warp size in greedy-dtw, default 2.
  --cmethod         str, candidate generation method, one of `cluster` and `greedy`.
  --kernel          str, choice of outer classifier, default `xgb`.
  --percentile      int, distance threshold (percentile) in graph construction, default 10.
  --measurement     str, distance measurement, default `gdtw`.
  --batch_size      int, batch size, default 50.
  --tflag           bool, whether to use timing factors, default True.

Some of the arguments may require further explanation:

  • --K/--C: the number of shapelets should be selected carefully, as it is closely related to the intrinsic properties of the dataset. In our extensive experiments, C is often set to 10 or 20 times K to ensure that we can learn from a large pool of candidates.
  • --percentile, --alpha and --beta: we have conducted fine-tuning on several datasets, and in most cases we recommend the default settings, although modifying them may improve performance as well as degrade it.
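
The relationships between these arguments can be checked before launching a run. The following minimal sketch is not part of the project; the helper name `check_shapelet_args` is hypothetical, and the 10x to 20x range simply encodes the rule of thumb stated above.

```python
def check_shapelet_args(K, C, num_segment, seg_length, series_length):
    """Return a list of warnings for argument combinations that look inconsistent.

    Hypothetical helper: it encodes two statements from this document, namely
    that a time series has length num_segment * seg_length and that C is
    usually 10 to 20 times K.
    """
    warnings = []
    expected_length = num_segment * seg_length
    if series_length < expected_length:
        warnings.append(
            "series length %d is shorter than num_segment * seg_length = %d"
            % (series_length, expected_length)
        )
    if not 10 * K <= C <= 20 * K:
        warnings.append(
            "C = %d is outside the usual range [%d, %d]" % (C, 10 * K, 20 * K)
        )
    return warnings


if __name__ == "__main__":
    # 21 segments of length 24 give 504 points; K=50 and C=500 satisfy C = 10 * K.
    print(check_shapelet_args(K=50, C=500, num_segment=21, seg_length=24, series_length=504))
```

An empty list means that neither check was triggered; the warnings are advisory only, since the best K and C depend on the dataset.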

Demo

We include all three benchmark UCR datasets in the dataset directory; they are a subset of the UCR-Archive time series datasets. See Data Sets for more details. A demo can then be run by calling scripts/run.py as follows:

python scripts/run.py --dataset ucr-Earthquakes --K 50 --C 500 \
    --num_segment 21 --seg_length 24 --data_size 1 --gpu_enable

Evaluation

Data Sets

The three benchmark datasets reported in [1] were made public by UCR, and detailed descriptions can be found in the Time2Graph project. Furthermore, we apply the proposed Time2Graph model to three real-world scenarios: Electricity Consumption Records (ECR) and Elderly Electricity Records (EER) provided by State Grid of China, and Network Traffic Flow (NTF) from China Telecom. Detailed dataset descriptions can be found in our paper. The performance improvement over existing models demonstrates the effectiveness of the framework, and below we list the final results along with several popular baselines.

Performance

Accuracy on UCR(%)      Earthquakes   WormsTwoClass   Strawberry
NN-DTW                  70.31         68.16           95.53
TSF                     74.67         68.51           96.27
FS                      74.66         70.58           91.66
Time2Graph              79.14         72.73           96.76
Time2Graph+             77.70         71.43           96.49

Performance on ECR(%)   Precision     Recall          F1
NN-DTW                  15.52         18.15           16.73
TSF                     26.32         2.02            3.75
FS                      10.45         79.84*          18.48
Time2Graph              30.10         40.26           34.44
Time2Graph+             35.94         44.81           39.88

Performance on NTF(%)   Precision     Recall          F1
NN-DTW                  33.20         43.75           37.75
TSF                     57.52         33.85           42.62
FS                      63.55         35.42           45.49
Time2Graph              71.52         56.25           62.97
Time2Graph+             97.62         48.81           65.08

Performance on NTF(%)   Precision     Recall          F1
NN-DTW                  33.20         43.75           37.75
TSF                     57.52         33.85           42.62
FS                      63.55         35.42           45.49
Time2Graph              71.52         56.25           62.97
Time2Graph+             32.80         66.19           43.87

Please refer to our paper [1] for detailed information about the experimental settings, the description of the unpublished data sets, and the full results of our experiments, along with ablation and observational studies. Last but not least, we have deployed the Time2Graph+ model in a real-world application, elderly recognition, in cooperation with State Grid of China, Jinhua, Zhejiang. See the project homepage and our paper for details.

Reference

[1] Cheng, Z.; Yang, Y.; Jiang, S.; Hu, W.; Ying, Z. and Chai, Z. 2021. Time2Graph+: Bridging Time Series and Graph Representation Learning via Multiple Attentions. Under review.

[2] Cheng, Z.; Yang, Y.; Wang, W.; Hu, W.; Zhuang, Y. and Song, G. 2020. Time2Graph: Revisiting Time Series Modeling with Dynamic Shapelets. In AAAI, 2020.

@inproceedings{cheng2020time2graph,
  title = "{Time2Graph: Revisiting Time Series Modeling with Dynamic Shapelets}", 
  author = {{Cheng}, Z. and {Yang}, Y. and {Wang}, W. and {Hu}, W. and {Zhuang}, Y. and {Song}, G.}, 
  booktitle={Proceedings of Association for the Advancement of Artificial Intelligence (AAAI)},
  year = 2020, 
} 

time2graphplus's People

Contributors

petecheng

time2graphplus's Issues

License Issue

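Hello, do the Time2Graph and Time2GraphPlus code bases have a license? Is it permitted to modify and use the code? Thank you.

Problem running the code

After downloading, I don't know where to add my own dataset, and there is no .py file to run. I'm not sure whether this is because I'm on Windows, but the downloaded code seems incomplete.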

Replicate accuracy results from benchmark datasets

Hi,

I'm wondering how I can replicate these results (the ones in bold for the Earthquakes, WormsTwoClass and Strawberry datasets):

[image]

I have tried running benchmark_test.py like this, for example: python3 benchmark_test.py --dataset ucr-WormsTwoClass (and likewise for the other datasets), but the results differ from the ones in the figure.

I'm aware there are several more parameters that I could pass in the command above, but I would like to know which of the parameters found in benchmark_test.py I should use:
[image]

Thanks!

Running on my own dataset raises an error

File "/home/featurize/Time2GraphPlus-main/time2graph/core/model_utils.py", line 34, in __get_weight
cnt[lb] += 1
TypeError: only integer scalar arrays can be converted to a scalar index
