This is a collection of 22 time series features contained in the hctsa toolbox coded in C. Features were selected by their classification performance across a collection of 93 real-world time-series classification problems. The included features only evaluate dynamical properties of time series and do not respond to differences in mean or variance. We suggest to add these features of the raw value distribution if they might be useful for your data.
For information on how this feature set was constructed see our preprint:
- C.H. Lubba, S.S. Sethi, P. Knaute, S.R. Schultz, B.D. Fulcher, N.S. Jones. catch22: CAnonical Time-series CHaracteristics. Data Mining and Knowledge Discovery (2019).
For information on the full set of over 7000 features, see the following (open) publications:
- B.D. Fulcher and N.S. Jones. hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems 5, 527 (2017).
- B.D. Fulcher, M.A. Little, N.S. Jones Highly comparative time-series analysis: the empirical structure of time series and their methods. J. Roy. Soc. Interface 10, 83 (2013).
The fast C-coded functions in this repository can be used in Python, Matlab, and R following the instructions below. Time series are z-scored internally which means e.g., constant time series will lead to NaN outputs. The wrappers are only tested on OS X so far and require Clang.
Installation of the Python wrapper differs slightly between Python 2 and 3.
Go to the directory wrap_Python
and run the following
python setup.py build
python setup.py install
or alternatively, using pip, go to main directory and run
pip install -e wrap_Python
Manual installation through distutils
python3 setup_P3.py build
python3 setup_P3.py install
Or using pip
pip install catch22
To test that the catch22 wrapper was installed successfully and works run (NB: replace python
with python3
for Python 3):
$ python testing.py
The module is now available under the name catch22
. Each feature function can be accessed individually and takes arrays as tuple or lists (not Numpy
-arrays). E.g., for loaded data, tsData
in Python:
import catch22
catch22.CO_f1ecac(tsData)
All features are bundeled in the method catch22_all
which also accepts numpy arrays and gives back a dictionary containing the entries catch22_all['names']
for feature names and catch22_all['values']
for feature outputs.
from catch22 import catch22_all
catch22_all(tsData)
This assumes your have R
installed and the package Rcpp
is available. Clang is required.
Copy all .c
- and .h
-files from ./C
to ./wrap_R/catch22/src
. Then go to the directory ./wrap_R
and run the following two lines while replacing x.y
by the current version number
R CMD build catch22
R CMD INSTALL catch22_x.y.tar.gz
To test if the installation was successful, navigate to ./wrap_R
in the console and run:
$ Rscript testing.R
The module is now available in R
as catch22
. Single functions can be accessed by their name, all functions are bundeled as catch22_all
which can be called with a data vector tsData
as an argument and gives back a data frame with the variables name
for feature names and values
for feature outputs:
library(catch22)
catch22_out = catch22_all(tsData);
print(catch22_out)
Go to the wrap_Matlab
directory and call mexAll
from within Matlab. Include the folder in your Matlab path to use the package.
To test, navigate to the wrap_Matlab
directory from within Matlab and run:
testing
All feature can be called individually, e.g. catch22_CO_f1ecac
. Alternatively, all features are bundeled in a function catch22_all
which returns an array of feature outputs and, as a second output, a cell array of feature names. With loaded data tsData
:
[vals, names] = catch22_all(data);
gcc -o run_features main.c CO_AutoCorr.c DN_HistogramMode_10.c DN_HistogramMode_5.c DN_OutlierInclude.c FC_LocalSimple.c IN_AutoMutualInfoStats.c MD_hrv.c PD_PeriodicityWang.c SB_BinaryStats.c SB_CoarseGrain.c SB_MotifThree.c SB_TransitionMatrix.c SC_FluctAnal.c SP_Summaries.c butterworth.c fft.c helper_functions.c histcounts.c splinefit.c stats.c
As for OS X but with -lm
switch in from of every source-file name.
The compiled run_features
program only takes one time series at a time. Usage is ./run_features <infile> <outfile>
in the terminal, where specifying <outfile>
is optional, it prints to stdout
by default.
For multiple time series, put them โ one file for each โ into a folder timeSeries
and call ./runAllTS.sh
. The output will be written into a folder featureOutput
. Do change the permissions of runAllTS.sh
to executable by calling chmod 755 runAllTS.sh
.
Each line of the output correponds to one feature; the three comma-separated entries per line correspond to feature value, feature name and feature execution time in milliseconds. E.g.
0.29910714285714, CO_Embed2_Basic_tau.incircle_1, 0.341000
0.57589285714286, CO_Embed2_Basic_tau.incircle_2, 0.296000
...
Sample outputs for the time series test.txt
and test2.txt
are provided as test_output.txt
and test2_output.txt
. The first two entries per line should always be the same. The third one (execution time) will be different.