
patrickzib / sfa


Scalable Time Series Data Analytics

License: GNU General Public License v3.0

Java 100.00%
classification indexing similarity-measures time-series

sfa's People

Contributors

assaad, christiansch, mohataher, patrickzib, watanabe8760

sfa's Issues

README Inconsistent

Hi,

The code in the README seems to differ from the usage in the test cases.
For example:

Score score = weasel.fit(trainSamples);
Score should be Classifier.Score

Is this correct? If so, should the README be updated?

Adding load and save methods to WEASELClassifier

After training, wouldn't it be a good idea to save a model so we could load it later?

Ideally, we should add static load() and save() methods to Classifier so that they work with all classifiers implemented in the project.

I will be happy to help.
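One possible shape for such helpers, sketched with standard Java serialization. The names `ModelIO` and `DemoModel` are hypothetical and not part of the project; a real classifier would need to implement `Serializable`, as would every object it references.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for a trained classifier; not a class from the project. */
class DemoModel implements Serializable {
    private static final long serialVersionUID = 1L;
    final double[] weights;

    DemoModel(double[] weights) {
        this.weights = weights;
    }
}

/** Generic save/load helpers that work for any Serializable model (assumed design). */
class ModelIO {
    /** Writes the model to the given file using Java serialization. */
    static void save(Serializable model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(model);
        }
    }

    /** Reads a model back and checks that it has the expected type. */
    static <T> T load(File file, Class<T> type) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return type.cast(in.readObject());
        }
    }
}
```

Usage would look like `ModelIO.save(model, new File("model.bin"))` followed later by `ModelIO.load(new File("model.bin"), DemoModel.class)`.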

SFATrie search by radius

Hello patrick, thanks for your effort,
I am adding this issue here so it is not forgotten: would it be possible to re-enable search by radius on SFATrie, instead of only searching for the k nearest neighbors?
Cheers!

TEASER algorithm - Python version

Dear authors,

First of all, congratulations on your work!

Second, is there a version of the TEASER algorithm that can be used in Python?

Thank you and best regards,

Andrea

Use New Dataset

I have my data in csv format.
How do I get the code to read in the csv files?

The only way I have been able to get it to work so far is by doing the following:

Open CBF in Notepad, paste in my data, then save.
I did this for both the train and test files.

Memory related errors when running UCR tests

I am getting errors on about 15 of the 85 data sets in the UCR archive.
I run in IntelliJ IDEA with -Xmx2048m.
For example:

Done reading from /home/p/augment/UCR_TS_Archive_2015/Phoneme/Phoneme_TEST samples 1896 queryLength 1024
Done reading from /home/p/augment/UCR_TS_Archive_2015/Phoneme/Phoneme_TRAIN samples 214 queryLength 1024
Exception in thread "main" com.carrotsearch.hppc.BufferAllocationException: Not enough memory to allocate buffers for rehashing: 65,536 -> 131,072
at com.carrotsearch.hppc.IntIntHashMap.allocateBuffers(IntIntHashMap.java:1144)
at com.carrotsearch.hppc.IntIntHashMap.allocateThenInsertThenRehash(IntIntHashMap.java:1169)
at com.carrotsearch.hppc.IntIntHashMap.put(IntIntHashMap.java:177)
at com.carrotsearch.hppc.IntIntHashMap.putOrAdd(IntIntHashMap.java:256)
at sfa.transformation.WEASEL.createBagOfPatterns(WEASEL.java:164)
at sfa.classification.WEASELClassifier.predict(WEASELClassifier.java:130)
at sfa.classification.WEASELClassifier.score(WEASELClassifier.java:124)
at sfa.classification.WEASELClassifier.eval(WEASELClassifier.java:95)
at sfa.MyUCRmain.main(MyUCRmain.java:61)
Caused by: java.lang.OutOfMemoryError: Java heap space
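This is not a fix for the error above, only a small sketch for checking which heap limit the running JVM actually received, since the value seen by the program can differ from the one intended in the IDE settings. The class name `HeapInfo` is hypothetical.

```java
/** Hypothetical helper that reports the JVM's maximum heap size. */
class HeapInfo {
    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // With -Xmx2048m this should report a value close to 2048.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```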

Instructions on how to build the project?

I learned Java many years ago, without using the tools that this project uses (Gradle, Maven). It would be great to have instructions on how to build the project, including the preferred IDE and the required software versions (JDK 10? Gradle version >= 4.8?). Some things that come to mind include:

  • Installing Gradle (there are multiple ways)
  • How to import the project
  • Gradle configuration

Non-transformed data in MFT

In the MFT class, the fft object is initialised with windowSize when the MFT instance is created. However, the array size is set later as:

int arraySize = Math.max(l+this.startOffset, this.windowSize);

and is then used at line 107:

mftData = Arrays.copyOf(timeSeries.getData(), arraySize);
this.fft.realForward(mftData);
mftData[1] = 0;

If arraySize is larger than windowSize, part of mftData is left untransformed, and that untransformed data is then used in the rest of the MFT and in the classification.

I saw this happen with windowSize = 5 and l = 4, which gave arraySize = 6.

This affects the creation of words, especially at small window sizes. However, I am not sure how often this occurs or how much it affects the final results.
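The reported effect can be illustrated with a small sketch. The in-place transform below is a placeholder that only touches the first n entries, as the issue describes for a transform sized to windowSize; it is not a real FFT and not the project's code. The value startOffset = 2 is an assumption chosen to be consistent with the reported numbers (windowSize = 5, l = 4, arraySize = 6).

```java
import java.util.Arrays;

/** Hypothetical demo of a fixed-size transform applied to a longer array. */
class PartialTransformDemo {
    /** Placeholder for a fixed-size in-place transform: only visits the first n entries. */
    static void transformFirstN(double[] data, int n) {
        for (int i = 0; i < n; i++) {
            data[i] = -data[i]; // stand-in operation, not a real FFT
        }
    }

    /** Number of trailing entries that a transform of size windowSize never visits. */
    static int untouchedEntries(int l, int startOffset, int windowSize) {
        int arraySize = Math.max(l + startOffset, windowSize);
        return arraySize - windowSize;
    }

    public static void main(String[] args) {
        int windowSize = 5;
        int l = 4;
        int startOffset = 2; // assumed value, consistent with arraySize = 6
        double[] series = {1, 2, 3, 4, 5, 6, 7, 8};

        int arraySize = Math.max(l + startOffset, windowSize);
        double[] mftData = Arrays.copyOf(series, arraySize);
        transformFirstN(mftData, windowSize);

        System.out.println("arraySize = " + arraySize);
        System.out.println("untouched entries = " + untouchedEntries(l, startOffset, windowSize));
        System.out.println(Arrays.toString(mftData));
    }
}
```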

WEASEL+MUSE: Datasets Format

Hello,

Thank you for your fascinating work!
What structure do the datasets in the datasets folder have? They don't look like the original UCI datasets...
What are the columns?

For example, DigitShapeRandom:
1 1 1 0.3421972205305417 -1.594004942648406
1 2 1 0.3490627644881473 -1.4250704156116172
1 3 1 0.3353316765729366 -1.2647991976536381
1 4 1 0.3559283084457524 -1.0915330160774444
1 5 1 0.3559283084457524 -0.9225984890406556
1 6 1 0.35249553646694987 -0.7709905801614865
1 7 1 0.3490627644881473 -0.5847294349670782
1 8 1 0.37309216833976566 -0.4634431078637429
1 9 1 0.37309216833976566 -0.35515174437862146

First column seems to be the class. And the rest?

Thank you very much for your time.

Regards
Ilja

getEuclideanDistance should depend on the state of the time series

getEuclideanDistance gives incorrect results.
The result should depend on whether the time series is normalized and whether the query is normalized.
If they are normalized, both should be de-normalized (value * std + avg).
The current implementation normalizes again!

Also, for the epsilon radius search, std needs to be taken into account.
Cheers!
Assaad
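A minimal sketch of the behaviour proposed above, assuming each series carries its own mean and standard deviation: undo the normalization with value * std + avg, then compute the plain Euclidean distance. The class and method names are hypothetical, and this is not the library's implementation.

```java
/** Hypothetical helper illustrating distance on de-normalized values. */
class DenormalizedDistance {
    /** Undoes z-normalization for every value: value * std + avg. */
    static double[] denormalize(double[] normalized, double avg, double std) {
        double[] raw = new double[normalized.length];
        for (int i = 0; i < normalized.length; i++) {
            raw[i] = normalized[i] * std + avg;
        }
        return raw;
    }

    /** Plain Euclidean distance between two equal-length arrays. */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Series must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Two normalized series with different original means and deviations.
        double[] series = denormalize(new double[] {-1.0, 1.0}, 3.0, 2.0);
        double[] query = denormalize(new double[] {-1.0, 1.0}, 0.0, 1.0);
        System.out.println(euclidean(series, query));
    }
}
```

Note that the two normalized inputs in `main` are identical, so a distance computed on normalized values would be zero even though the original series differ.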

MTSClassificationTest fails

MTSClassificationTest fails at line 443 because offset is out of bounds.
This is caused by changes in commit 97aee8e.

Reverting just this loop to the previous code works.

Using asserts in test cases

The test cases use no assertions. Printing alone is not enough to prove that a test case works properly. A good example of using assertions is here. Have a look at assertEquals and assertTrue.

Planning a release?

Hi Patrick:

Great effort has been put into this library lately. Such great work. Do you have any plans for a v0.1 release?

We should also aim to upload that release's artifact to the Maven Central repository. This will make it easier for newcomers to use it in their projects and give the library industrial legitimacy.

Running WeaselMUSE on UCR dataset

Hi Patrick,

I recently read your paper on WEASEL+MUSE; it's very good work.

I'm now trying to run WEASEL+MUSE on the UCR multivariate datasets (30 datasets). I think it is not possible to run it on datasets in ts format. Is there a way to run the code from txt files? Am I missing something here?

Generated models for WEASEL are large in size

When working with WEASEL, I noticed that the save() method generates a very large file to represent the classifier model: approximately 300 MB for a moderately small dataset. I'm not sure exactly why, but my first guess is that it saves the training data, because that data is referenced somewhere in the object graph.

Do you have any idea?
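If the guess above is right, one common remedy in Java serialization is to mark the field that references the training data as `transient`, so it is skipped when the object is written. The sketch below uses hypothetical classes (`BulkyModel`, `LeanModel`, `SerializedSize`) to compare the serialized sizes; it makes no claim about which fields the actual classifier holds.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical model that keeps a serializable reference to its training data. */
class BulkyModel implements Serializable {
    private static final long serialVersionUID = 1L;
    double[] weights = new double[10];
    double[][] trainingData = new double[1000][100]; // written out with the model
}

/** Same model, but the training data is excluded from serialization. */
class LeanModel implements Serializable {
    private static final long serialVersionUID = 1L;
    double[] weights = new double[10];
    transient double[][] trainingData = new double[1000][100]; // skipped on write
}

class SerializedSize {
    /** Serializes the object in memory and returns the number of bytes produced. */
    static int of(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("with training data:    " + of(new BulkyModel()) + " bytes");
        System.out.println("without training data: " + of(new LeanModel()) + " bytes");
    }
}
```

A transient field is null after loading, so this only works if prediction does not need the training data.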

Moving to maven

Nice work. To standardise this code, why not use Maven? It is the most widely accepted packaging, build and automation tool for Java. It would also help with writing unit test cases for the project.

I'll be happy to help ;)

TimeSeriesLoader.java exception

I am not able to load any dataset (of either type, arff or csv). For example, with the TwoLeadECG dataset I got this exception:

Exception in thread "main" java.lang.NumberFormatException: For input string: "-0.04372,0.003764,0.039377,0.1106,0.18183,0.30054,0.24118,0.20557,0.16996,0.074989,0.063118,0.015635,-0.031849,-0.067461,-0.079332,-0.091203,-0.15056,-0.22178,-0.24552,-0.2574,-0.2574,-0.24552,-0.23365,-0.24552,-0.28114,-0.2574,-0.2574,0.039377,0.098731,-0.70849,-1.7769,-2.3585,-2.5366,-2.5841,-2.4298,-2.0974,-1.9905,-1.9193,-1.8125,-1.765,-1.5869,-1.2545,-0.87468,-0.57791,-0.3761,-0.15056,0.18183,0.46673,0.57357,0.66853,0.69227,0.69227,0.73976,0.7635,0.81098,0.8466,0.88221,0.95343,1.0009,1.0484,1.1078,1.1671,1.2383,1.2739,1.2858,1.3333,1.357,1.3333,1.2739,1.1909,1.0603,0.95343,0.81098,0.69227,0.57357,0.44299,0.33615,0.21744,0.14621,0.08686,0.027506,-0.008107,1"
	at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)
	at sun.misc.FloatingDecimal.parseDouble(Unknown Source)
	at java.lang.Double.parseDouble(Unknown Source)
	at java.lang.Double.valueOf(Unknown Source)
	at sfa.timeseries.TimeSeriesLoader.loadDataset(TimeSeriesLoader.java:53)

Am I missing something, or is there a bug in the loader class?
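The exception message shows an entire comma-separated line being passed to Double.parseDouble as a single token. A minimal sketch of a parser that accepts commas as well as whitespace is shown below; the class name `DelimitedLineParser` is hypothetical, and this does not describe how TimeSeriesLoader actually splits its input. Which column holds the class label is left to the caller.

```java
import java.util.Arrays;

/** Hypothetical helper that parses one delimited line of numbers. */
class DelimitedLineParser {
    /** Splits a line on commas and/or whitespace and parses every token as a double. */
    static double[] parse(String line) {
        String[] tokens = line.trim().split("[,\\s]+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "-0.04372,0.003764,0.039377,1";
        try {
            Double.parseDouble(line); // the whole line is not a single number
        } catch (NumberFormatException e) {
            System.out.println("Unsplit line rejected: " + e.getMessage());
        }
        System.out.println(Arrays.toString(parse(line)));
    }
}
```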
Thanks

ArrayIndexOutOfBoundsException when running WEASEL

When running WEASEL on time series data whose lengths vary, the following error occurs (not always, but sometimes).

java.lang.ArrayIndexOutOfBoundsException: 47
	at sfa.timeseries.TimeSeries.calcIncrementalMeanStddev(TimeSeries.java:237)
	at sfa.transformation.MFT.transformWindowing(MFT.java:115)
	at sfa.transformation.SFA.transformWindowing(SFA.java:292)
	at sfa.transformation.SFA.transformWindowingInt(SFA.java:327)
	at sfa.transformation.WEASEL.createWords(WEASEL.java:122)
	at sfa.transformation.WEASEL.access$000(WEASEL.java:21)
	at sfa.transformation.WEASEL$1.run(WEASEL.java:96)
	at sfa.classification.ParallelFor$1.run(ParallelFor.java:31)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
java.lang.ArrayIndexOutOfBoundsException: 47
	at sfa.timeseries.TimeSeries.calcIncrementalMeanStddev(TimeSeries.java:237)
	at sfa.transformation.MFT.transformWindowing(MFT.java:115)
	at sfa.transformation.SFA.transformWindowing(SFA.java:292)
	at sfa.transformation.SFA.transformWindowingInt(SFA.java:327)
	at sfa.transformation.WEASEL.createWords(WEASEL.java:122)
	at sfa.transformation.WEASEL.access$000(WEASEL.java:21)
	at sfa.transformation.WEASEL$1.run(WEASEL.java:96)
	at sfa.classification.ParallelFor$1.run(ParallelFor.java:31)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Exception in thread "main" java.lang.NullPointerException
	at sfa.transformation.WEASEL.createBagOfPatterns(WEASEL.java:151)
	at sfa.classification.WEASELClassifier.predict(WEASELClassifier.java:125)
	at xxxx.execution.WEASELClassifierCrossValidator.main(WEASELClassifierCrossValidator.java:45)

The cause seems to be windowSize in the MFT class, which is set from the length of the first sample of the TimeSeries[] array at line 354 of the SFA class.

      if (this.transformation == null) {
        this.transformation = new MFT(samples[0].getLength(), normMean, this.lowerBounding, this.mftUseMaxOrMin);
      }

It should not assume that the length of the first sample is representative, should it?
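Until this is resolved, a caller could guard against the problem by inspecting the sample lengths before fitting. The sketch below uses plain double[][] in place of TimeSeries[] and hypothetical names; whether the library should use the minimum length, or handle varying lengths some other way, is an open design question.

```java
/** Hypothetical pre-check for collections of series with varying lengths. */
class SampleLengthCheck {
    /** Returns the length of the shortest sample. */
    static int minLength(double[][] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("No samples given");
        }
        int min = samples[0].length;
        for (double[] sample : samples) {
            min = Math.min(min, sample.length);
        }
        return min;
    }

    /** Returns true if every sample has the same length as the first one. */
    static boolean allSameLength(double[][] samples) {
        for (double[] sample : samples) {
            if (sample.length != samples[0].length) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] samples = {new double[60], new double[47], new double[52]};
        System.out.println("all same length: " + allSameLength(samples));
        System.out.println("shortest sample: " + minLength(samples));
    }
}
```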

NullPointerException in WEASEL.createBagOfPatterns

It feels like this is related to #17. I encountered the following two errors (on different data sets).
There must be an inconsistency between the number of words expected and the number actually created.

java.lang.NullPointerException
	at sfa.transformation.WEASEL.createBagOfPatterns(WEASEL.java:147)
	at sfa.classification.WEASELClassifier.fitWeasel(WEASELClassifier.java:162)
	at sfa.classification.WEASELClassifier.fit(WEASELClassifier.java:110)
	at xxxxx.execution.WEASELClassifierCrossValidator.main(WEASELClassifierCrossValidator.java:40)
Exception in thread "main" java.lang.NullPointerException
	at sfa.classification.WEASELClassifier.fit(WEASELClassifier.java:113)
	at xxxxx.execution.WEASELClassifierCrossValidator.main(WEASELClassifierCrossValidator.java:40)
java.lang.NullPointerException
	at sfa.transformation.WEASEL.createBagOfPatterns(WEASEL.java:151)
	at sfa.classification.WEASELClassifier.fitWeasel(WEASELClassifier.java:162)
	at sfa.classification.WEASELClassifier.fit(WEASELClassifier.java:110)
	at xxxxx.execution.WEASELClassifierCrossValidator.main(WEASELClassifierCrossValidator.java:40)
Exception in thread "main" java.lang.NullPointerException
	at sfa.classification.WEASELClassifier.fit(WEASELClassifier.java:113)
	at xxxxx.execution.WEASELClassifierCrossValidator.main(WEASELClassifierCrossValidator.java:40)

Lack of documentation

For example, it is not clear what fitTransform expects as input parameters.
(Sorry, there is no time to read the entire paper before using the code.)

Size of training data

How many examples should we use for training if using our own data?

Do you have any suggestions? 1000 examples per class?

What would be small? 50 examples per class? Would the algorithms perform worse in this case?

Meaning of output

What is the meaning of the output:

CBF;WEASEL;0.967;0.998

Specifically the last two numbers. Are they the train and then test accuracy?

Data for histograms

Is there a way to get the data for the histograms?

  • For example WEASEL histogram has weights and words.

    • How do we get this data to produce the histograms?
