
mrsqm's People

Contributors

heerme, lnthach, mbalatsko


mrsqm's Issues

Best Parameters on UCR-Datasets

Hi, great work! :)

I am having trouble finding the best hyperparameters for the UCR datasets.

Currently, the default parameters seem to be optimized for speed, but the accuracy is rather low (measured on my hardware):

Total mean-accuracy: 0.814
Total fit_time: 3257.89
Total pred_time: 1965.34

What would be the best set of parameters to reproduce the results from the paper? In the paper this configuration is referred to as MrSQM_SFA_k5.

package should have `__version__` attribute

It is difficult to determine the version of the package robustly because it has no __version__ attribute. One should be added.

Related to #1: I wanted to add a bound >=0.0.2 as a requirement, since that is where the persistence bug is fixed.

Also related: sktime/sktime#5172
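Until a __version__ attribute exists, the installed version can usually be read from the distribution metadata instead. A minimal sketch, assuming the package is installed under the distribution name "mrsqm" (the helper name installed_version is hypothetical):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution is published as "mrsqm".
print(installed_version("mrsqm"))
```

This reads what pip recorded at install time, so it works even when the module itself exposes no version attribute.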

`MrSQMClassifier` non-compliance with `sktime` serialization interface

@lnthach, @heerme, I've tried integrating MrSQM to distribute it with sktime (by interface to this package), also as a proof-of-concept of how we could back-integrate MrSEQL.
Issue for MrSQM: sktime/sktime#4338
Issue for MrSEQL: sktime/sktime#4296

The PR sktime/sktime#4337 works in that it proves we can integrate Cython-based estimators with the sktime tests and the up-to-date framework, provided those estimators live in an external package such as mrsqm (this one).

Now, with the tests working, it would appear that MrSQM is failing two contract tests:

  • _get_fitted_params, fitted parameter inspection
  • serialization, i.e., pickling does not work

I assume the same tests would fail if you run check_estimator, i.e.,

from mrsqm import MrSQMClassifier
from sktime.utils.estimator_checks import check_estimator

# raise_exceptions=True stops at the first failing contract test
check_estimator(MrSQMClassifier, raise_exceptions=True)

How do we proceed from here?

Problem 1: serialization

I'm not too familiar with Cython. The exception raised when serializing (saving to binary) is

TypeError: no default __reduce__ due to non-trivial __cinit__

To make it work, I suspect one would have to implement serialization either via the higher-level interface (save and load) or via the lower-level interface, __reduce__. @achieveordie has done that previously for classes of estimators, so he may have more insight.

To provoke the error without the test suite, call pickle.dumps(my_estimator) on a fitted instance of MrSQMClassifier. For an easy solution via the lower-level interface, that call should succeed without complaining.
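The lower-level route can be sketched in plain Python. This is only an illustration of the __reduce__ protocol, not the MrSQM code: NativeModel is a hypothetical stand-in for an extension type whose constructor requires arguments, and its attributes are made up for the example.

```python
import pickle


class NativeModel:
    """Hypothetical stand-in for an extension type with a non-trivial constructor."""

    def __init__(self, n_features, weights=None):
        self.n_features = n_features
        self.weights = list(weights) if weights is not None else [0.0] * n_features

    def __reduce__(self):
        # Tell pickle how to rebuild the object: a callable plus the
        # arguments to call it with.
        return (NativeModel, (self.n_features, self.weights))


original = NativeModel(3, [0.5, -1.0, 2.0])
restored = pickle.loads(pickle.dumps(original))
print(restored.n_features, restored.weights)
```

In a Cython cdef class the same method would have to pass along whatever state is needed to reconstruct the C-level members, which is the part that depends on the actual implementation.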

Problem 2: fitted parameters

The other issue is get_fitted_params: this should be implemented either directly or via inheritance from BaseClassifier. That can be solved on the sktime side, though.
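A direct implementation could look roughly like the following. It is a sketch under the assumption that fitted state is stored in attributes with a trailing underscore (the scikit-learn naming convention); TinyEstimator and its attributes are hypothetical, and the exact return format sktime expects is not shown here.

```python
class TinyEstimator:
    """Hypothetical estimator used only to illustrate fitted-parameter inspection."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # hyperparameter, not a fitted parameter

    def fit(self, values):
        # Fitted state is written to attributes ending in "_".
        self.mean_ = sum(values) / len(values)
        self.n_samples_ = len(values)
        return self

    def get_fitted_params(self):
        # Collect public attributes that follow the trailing-underscore convention.
        return {
            name: value
            for name, value in vars(self).items()
            if name.endswith("_") and not name.startswith("_")
        }


fitted = TinyEstimator().fit([1.0, 2.0, 3.0]).get_fitted_params()
print(fitted)
```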

should we add the `firstdiff` parameter in `sktime` too?

@lnthach, quick question: should we add the firstdiff parameter in sktime too?

You could make a PR to change the interface class, since changes in parameters do not propagate through the interface.

There's also the issue that the firstdiff parameter is only available from 0.0.4 on, so we need to take care that users with lower versions don't suddenly error out. (This is only relevant if we change the interface in sktime.)
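One way to guard against older installs is to pass firstdiff only when the installed version supports it. A minimal sketch: the helper names parse_release, supports_firstdiff and build_kwargs are hypothetical, and the version string is assumed to be obtained elsewhere.

```python
import re


def parse_release(version_string):
    """Turn '0.0.4' (or '0.0.4rc1') into (0, 0, 4); returns () if unparseable."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def supports_firstdiff(version_string):
    # firstdiff is stated above to be available from 0.0.4 on.
    return parse_release(version_string) >= (0, 0, 4)


def build_kwargs(version_string, firstdiff=True, **other):
    """Forward firstdiff only to versions that accept it."""
    kwargs = dict(other)
    if supports_firstdiff(version_string):
        kwargs["firstdiff"] = firstdiff
    return kwargs


print(build_kwargs("0.0.3", nsfa=5))
print(build_kwargs("0.0.4", nsfa=5))
```

Note that this simple parser treats a pre-release such as 0.0.4rc1 like 0.0.4; a stricter comparison would need a full version parser.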

Also, quick question: I'm not sure whether there is a need to add this in mrsqm, since you can easily get the same logic from a pipeline:

from sktime.classification.shapelet_based import MrSQM
from sktime.transformations.series.difference import Differencer

sqm = MrSQM(some_params)
first_diff = Differencer()

sqm_with_first_diff = first_diff * sqm

:-)

Of course you decide which parameterizations you want to ship with your estimator, e.g., for convenience, or as a scientific implementation reference to a paper. Just pointing out that users of pipelines can get this (and similar) functionality already.

Error when using sklearn crossvalidation

Code:

import mrsqm
from sklearn.model_selection import cross_validate

clf = mrsqm.MrSQMClassifier(nsax=0, nsfa=5, random_state=0)
scores = cross_validate(clf, X_train, y_train, scoring=['accuracy'], cv=3)

Error:

TypeError: Cannot clone object '<mrsqm.mrsqm_wrapper.MrSQMClassifier object at 0x7fc806dc8ac0>' (type <class 'mrsqm.mrsqm_wrapper.MrSQMClassifier'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' method.
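The error says that cloning needs a get_params method. The contract can be sketched without scikit-learn as follows; ParamsMixin, WrappedClassifier and clone_like are hypothetical names, and in practice inheriting from sklearn.base.BaseEstimator provides get_params and set_params for estimators whose __init__ only stores its arguments.

```python
import inspect


class ParamsMixin:
    """Hypothetical mixin providing the get_params/set_params pair used for cloning."""

    def get_params(self, deep=True):
        # Read constructor argument names and return their current values.
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self


class WrappedClassifier(ParamsMixin):
    def __init__(self, nsax=1, nsfa=0, random_state=None):
        # Constructor arguments are stored unchanged under the same names.
        self.nsax = nsax
        self.nsfa = nsfa
        self.random_state = random_state


def clone_like(estimator):
    """Build a fresh, unfitted copy from the constructor parameters."""
    return type(estimator)(**estimator.get_params())


clf = WrappedClassifier(nsax=0, nsfa=5, random_state=0)
copy = clone_like(clf)
print(copy.get_params())
```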

pip install fails

The build fails because gcc is passed an argument it does not recognize:

Processing /Users/vipulp/Downloads/mrsqm-main/mrsqm
Building wheels for collected packages: mrsqm
Building wheel for mrsqm (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /Users/vipulp/opt/anaconda3/envs/timeseries/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/1p/jk19lf1541j7wft_wfmc69wc0000gr/T/pip-req-build-dq2fhl0_/setup.py'"'"'; file='"'"'/private/var/folders/1p/jk19lf1541j7wft_wfmc69wc0000gr/T/pip-req-build-dq2fhl0_/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/1p/jk19lf1541j7wft_wfmc69wc0000gr/T/pip-wheel-7sxl9is0
cwd: /private/var/folders/1p/jk19lf1541j7wft_wfmc69wc0000gr/T/pip-req-build-dq2fhl0_/
Complete output (10 lines):
running bdist_wheel
running build
running build_ext
building 'mrsqm' extension
creating build
creating build/temp.macosx-10.9-x86_64-3.7
creating build/temp.macosx-10.9-x86_64-3.7/sfa
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/vipulp/opt/anaconda3/envs/timeseries/include -arch x86_64 -I/Users/vipulp/opt/anaconda3/envs/timeseries/include -arch x86_64 -I/Users/vipulp/opt/anaconda3/envs/timeseries/include/python3.7m -c mrsqm_wrapper.cpp -o build/temp.macosx-10.9-x86_64-3.7/mrsqm_wrapper.o -Wall -Ofast -g -std=c++11 -mfpmath=both -ffast-math
error: unknown FP unit 'both'
error: command 'gcc' failed with exit status 1
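The failing flag is -mfpmath=both. On macOS the gcc command is typically Apple clang, which probably explains why the flag is rejected there. A possible workaround, sketched under that assumption, is to choose the flags per platform in setup.py; the function name extra_compile_args is hypothetical and the flag list is copied from the log above.

```python
import sys


def extra_compile_args(platform=sys.platform):
    """Return compiler flags, leaving out -mfpmath=both on macOS."""
    args = ["-Wall", "-Ofast", "-g", "-std=c++11", "-ffast-math"]
    if platform != "darwin":
        # Assumption: this flag is only safe where the compiler is real GCC on x86.
        args.append("-mfpmath=both")
    return args


print(extra_compile_args())
```

The returned list would be passed as extra_compile_args to the setuptools Extension that builds the wrapper.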

`MrSQMClassifier`'s `random_state` parameter seems ineffective for `predict_proba` if `nsfa>0`

It seems like the pickling bug is indeed fixed!

This now allows the save/load tests to run. They indicate that, at least for predict_proba, setting random_state does not make the output deterministic as it should in the nsfa>0 case (in the nsfa=0 case the same tests pass).

Theoretically, it could also be an issue with the pickling, not random_state - then I would guess the most likely reason to be a loss of numerical precision occurring when you serialize or deserialize.

Error message below, full error log is in https://github.com/sktime/sktime/actions/runs/6000661723/job/16273328621?pr=5171

=========================== short test summary info ============================
FAILED sktime/tests/test_all_estimators.py::TestAllEstimators::test_save_estimators_to_file[MrSQM-1-ClassifierFitPredict-predict_proba] - AssertionError: 
Arrays are not almost equal to 6 decimals
Results of predict_proba differ between saved and loaded estimator MrSQM
Mismatched elements: 10 / 10 (100%)
Max absolute difference: 0.00205801
Max relative difference: 0.13931118
 x: array([[0.992248, 0.007752],
       [0.017839, 0.982161],
       [0.023511, 0.976489],...
 y: array([[0.992762, 0.007238],
       [0.018737, 0.981263],
       [0.024453, 0.975547],...
FAILED sktime/tests/test_all_estimators.py::TestAllEstimators::test_fit_idempotent[MrSQM-1-ClassifierFitPredict-predict_proba] - AssertionError: 
Arrays are not almost equal to 6 decimals

Mismatched elements: 10 / 10 (100%)
Max absolute difference: 0.00414175
Max relative difference: 0.19419357
 x: array([[0.992142, 0.007858],
       [0.017063, 0.982937],
       [0.02547 , 0.97453 ],...
 y: array([[0.993322, 0.006678],
       [0.01615 , 0.98385 ],
       [0.021328, 0.978672],...
FAILED sktime/tests/test_all_estimators.py::TestAllEstimators::test_persistence_via_pickle[MrSQM-1-ClassifierFitPredict-predict_proba] - AssertionError: 
Arrays are not almost equal to 6 decimals
Results of predict_proba differ between when pickling and not pickling, estimator MrSQM
Mismatched elements: 6 / 10 (60%)
Max absolute difference: 0.00174577
Max relative difference: 0.11265353
 x: array([[0.992649, 0.007351],
       [0.013751, 0.986249],
       [0.019901, 0.980099],...
 y: array([[0.992649, 0.007351],
       [0.015497, 0.984503],
       [0.019901, 0.980099],...
====== 3 failed, 69 passed, 157 skipped, 4 warnings in 102.22s (0:01:42) =======
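The idempotence failure can be checked outside the test suite by fitting twice with the same random_state and comparing probabilities. A minimal sketch with a toy stand-in: ToyClassifier, max_abs_difference and is_fit_idempotent are hypothetical, and the tolerance mirrors the 6-decimal comparison in the log.

```python
import math
import random


class ToyClassifier:
    """Hypothetical classifier whose fit depends only on random_state."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        rng = random.Random(self.random_state)  # all randomness flows from the seed
        self.weights_ = [rng.random() for _ in X[0]]
        return self

    def predict_proba(self, X):
        rows = []
        for row in X:
            score = sum(w * v for w, v in zip(self.weights_, row))
            p = 1.0 / (1.0 + math.exp(-score))
            rows.append([1.0 - p, p])
        return rows


def max_abs_difference(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def is_fit_idempotent(make_estimator, X, y, tol=1e-6):
    """Fit two fresh estimators and compare their predicted probabilities."""
    first = make_estimator().fit(X, y).predict_proba(X)
    second = make_estimator().fit(X, y).predict_proba(X)
    return max_abs_difference(first, second) < tol


X = [[0.1, 0.2], [0.9, 0.4], [0.3, 0.8]]
y = [0, 1, 1]
print(is_fit_idempotent(lambda: ToyClassifier(random_state=0), X, y))
```

Passing a factory for MrSQMClassifier with nsfa>0 instead of the toy class would show whether two fits with the same seed already disagree, which separates a seeding problem from a pickling problem.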
