mlgig / mrsqm

License: GNU General Public License v3.0

Python 4.25% C++ 44.34% Cython 17.22% Jupyter Notebook 34.19%

mrsqm's Issues

Best Parameters on UCR-Datasets

Hi, great work! :)

I am having trouble finding the best hyperparameters for the UCR datasets.

Currently, the default parameters seem to be optimized for speed, but the accuracy is rather low (measured on my hardware):

Total mean-accuracy: 0.814
Total fit_time: 3257.89
Total pred_time: 1965.34

What would be the best set of parameters to reproduce the results from the paper? In the paper this configuration is referred to as MrSQM_SFA_k5.

package should have `__version__` attribute

It is difficult to determine the version of the package robustly, since it does not have a __version__ attribute. One should be added.

Related to #1 as I wanted to add a bound >=0.0.2 as a requirement where the persistence bug is fixed.

Also related: sktime/sktime#5172
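
Until such an attribute exists, the installed version can usually be read from the distribution metadata instead. A minimal sketch, assuming the package is installed under the distribution name `mrsqm`; the helper name `installed_version` is made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution is published under the name "mrsqm".
print(installed_version("mrsqm"))
```

This reads what the installer recorded, so it works without importing the package itself.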

`MrSQMClassifier` non-compliance with `sktime` serialization interface

@lnthach, @heerme, I've tried integrating MrSQM to distribute it with sktime (by interface to this package), also as a proof-of-concept of how we could back-integrate MrSEQL.
Issue for MrSQM: sktime/sktime#4338
Issue for MrSEQL: sktime/sktime#4296

The PR sktime/sktime#4337 works in that it proves that we can integrate cython-based estimators with sktime tests and the up-to-date framework, if those estimators are in an external package such as mrsqm (this one).

Now, with the tests working, it would appear that MrSQM is failing two contract tests:

_get_fitted_params, fitted parameter inspection
serialization, i.e., pickling does not work

I assume the same tests would fail if you run check_estimator, i.e.,

from mrsqm import MrSQMClassifier
from sktime.utils.estimator_checks import check_estimator

check_estimator(MrSQMClassifier, raise_exceptions=True)

How do we proceed from here?

Problem 1: serialization

I'm not too familiar with cython. The exception raised when serializing (saving to binary) is

TypeError: no default __reduce__ due to non-trivial __cinit__

To make it work, I suspect one would have to implement serialization via the higher-level interface (save and load) or via the lower-level interface, __reduce__. @achieveordie has done that previously for classes of estimators, maybe he has more insight.

To provoke the error without the test suite, call pickle.dumps(my_estimator) on a fitted instance of MrSQMClassifier. With an easy solution via the lower-level interface in place, that call should complete without complaining.
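
For reference, the lower-level interface amounts to returning a callable plus its arguments from `__reduce__`. The sketch below uses a pure-Python stand-in, `NativeModel`, which is hypothetical and not part of mrsqm. A plain Python class would pickle without it; the point is only to show the shape a cython class with a non-trivial `__cinit__` would need to provide:

```python
import pickle


class NativeModel:
    """Pure-Python stand-in for an extension type with constructor arguments."""

    def __init__(self, n_features, weights=None):
        self.n_features = n_features
        self.weights = list(weights) if weights is not None else [0.0] * n_features

    def __reduce__(self):
        # Tell pickle how to rebuild the object: a callable plus its arguments.
        return (NativeModel, (self.n_features, self.weights))


model = NativeModel(3, [0.1, 0.2, 0.3])
restored = pickle.loads(pickle.dumps(model))
print(restored.weights)
```

Whatever state the fitted estimator holds on the native side would have to be exposed in a picklable form (lists, bytes, arrays) for this approach to carry over.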

Problem 2: fitted parameters

The other issue is get_fitted_params. This should be implemented either directly or via inheritance from BaseClassifier, but that can be solved on the sktime side.

should we add the `firstdiff` parameter in `sktime` too?

@lnthach, quick question: should we add the firstdiff parameter in sktime too?

You could make a PR to change the interface class, since changes in parameters do not propagate through the interface.

There's also the issue that the firstdiff parameter is only available from 0.0.4 on, so we need to take care that users with lower versions don't suddenly error out. (This is only relevant if we change the interface in sktime.)
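
One way to guard against that is to forward the parameter only when the installed version is new enough. A rough sketch, assuming plain numeric version strings; the helpers `parse_version` and `build_kwargs` are hypothetical and not part of sktime or mrsqm:

```python
def parse_version(text):
    """Turn a simple dotted version such as '0.0.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def build_kwargs(installed, firstdiff=True):
    """Only pass firstdiff when the installed mrsqm is new enough (>= 0.0.4)."""
    kwargs = {}
    if parse_version(installed) >= (0, 0, 4):
        kwargs["firstdiff"] = firstdiff
    return kwargs


print(build_kwargs("0.0.3"))  # older version: firstdiff is left out
print(build_kwargs("0.0.4"))  # newer version: firstdiff is forwarded
```

The tuple comparison does not handle pre-release or local version suffixes; `packaging.version.Version` would be the more robust choice if that dependency is available.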

Also, quick question: not sure whether there is a need to add this in mrsqm - you can easily get the same logic from the pipeline

from sktime.classification.shapelet_based import MrSQM
from sktime.transformations.series.difference import Differencer

sqm = MrSQM(some_params)
first_diff = Differencer()

sqm_with_first_diff = first_diff * sqm

:-)

Of course you decide which parameterizations you want to ship with your estimator, e.g., for convenience, or as a scientific implementation reference to a paper. Just pointing out that users of pipelines can get this (and similar) functionality already.

Error when using sklearn crossvalidation

Code:

import mrsqm
from sklearn.model_selection import cross_validate

clf = mrsqm.MrSQMClassifier(nsax=0, nsfa=5, random_state=0)
scores = cross_validate(clf, X_train, y_train, scoring=['accuracy'], cv=3)

Error:

TypeError: Cannot clone object '<mrsqm.mrsqm_wrapper.MrSQMClassifier object at 0x7fc806dc8ac0>' (type <class 'mrsqm.mrsqm_wrapper.MrSQMClassifier'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' method.
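
For context, scikit-learn's `clone` rebuilds an unfitted copy of an estimator from the dictionary returned by `get_params`, which is why cross-validation fails without that method. In practice, inheriting from `sklearn.base.BaseEstimator` provides both `get_params` and `set_params`. The dependency-free sketch below only illustrates the contract; `ParamsMixin` and `ToyClassifier` are hypothetical, and the default values are arbitrary:

```python
import inspect


class ParamsMixin:
    """Derive get_params/set_params from the __init__ signature."""

    def get_params(self, deep=True):
        names = [
            name
            for name in inspect.signature(type(self).__init__).parameters
            if name != "self"
        ]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class ToyClassifier(ParamsMixin):
    # Hypothetical estimator; the parameter names mirror the call in the report.
    def __init__(self, nsax=1, nsfa=0, random_state=None):
        self.nsax = nsax
        self.nsfa = nsfa
        self.random_state = random_state


clf = ToyClassifier(nsax=0, nsfa=5, random_state=0)
# Cloning boils down to rebuilding an unfitted copy from the parameters.
copy = type(clf)(**clf.get_params())
print(copy.get_params())
```

The contract only works if `__init__` stores every argument unchanged under an attribute of the same name.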

pip install fails

Build fails because gcc has an unknown argument??

Processing /Users/vipulp/Downloads/mrsqm-main/mrsqm
Building wheels for collected packages: mrsqm
Building wheel for mrsqm (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /Users/vipulp/opt/anaconda3/envs/timeseries/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/1p/jk19lf1541j7wft_wfmc69wc0000gr/T/pip-req-build-dq2fhl0_/setup.py'"'"'; __file__='"'"'/private/var/folders/1p/jk19lf1541j7wft_wfmc69wc0000gr/T/pip-req-build-dq2fhl0_/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/1p/jk19lf1541j7wft_wfmc69wc0000gr/T/pip-wheel-7sxl9is0
cwd: /private/var/folders/1p/jk19lf1541j7wft_wfmc69wc0000gr/T/pip-req-build-dq2fhl0_/
Complete output (10 lines):
running bdist_wheel
running build
running build_ext
building 'mrsqm' extension
creating build
creating build/temp.macosx-10.9-x86_64-3.7
creating build/temp.macosx-10.9-x86_64-3.7/sfa
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/vipulp/opt/anaconda3/envs/timeseries/include -arch x86_64 -I/Users/vipulp/opt/anaconda3/envs/timeseries/include -arch x86_64 -I/Users/vipulp/opt/anaconda3/envs/timeseries/include/python3.7m -c mrsqm_wrapper.cpp -o build/temp.macosx-10.9-x86_64-3.7/mrsqm_wrapper.o -Wall -Ofast -g -std=c++11 -mfpmath=both -ffast-math
error: unknown FP unit 'both'
error: command 'gcc' failed with exit status 1
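
The message `unknown FP unit 'both'` appears to come from clang rejecting `-mfpmath=both`, which is a GCC option for x86; on macOS the `gcc` command normally resolves to Apple clang. One possible workaround is to add that flag only where GCC is the likely compiler. A sketch, assuming the flags are assembled in `setup.py`; the function name `extra_compile_args` is hypothetical:

```python
import sys


def extra_compile_args(platform=sys.platform):
    """Pick compiler flags, skipping GCC-only options where clang is the default."""
    args = ["-Wall", "-Ofast", "-g", "-std=c++11", "-ffast-math"]
    if platform != "darwin":
        # -mfpmath=both is accepted by GCC on x86 but rejected by clang.
        args.append("-mfpmath=both")
    return args


print(extra_compile_args("darwin"))
print(extra_compile_args("linux"))
```

Checking the platform is a coarse proxy for the compiler; a build that uses clang on Linux would still need the flag removed.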

module 'mrsqm' has no attribute 'MrSQMClassifier'

I get the error "module 'mrsqm' has no attribute 'MrSQMClassifier'" when running: clf = mrsqm.MrSQMClassifier().fit(X_train,y_train)

`MrSQMClassifier`'s `random_state` parameter seems ineffective for `predict_proba` if `nsfa>0`

It seems like the pickling bug is indeed fixed!

This now makes it possible to run the save/load tests, which indicate that at least predict_proba is not made deterministic by setting random_state, as it should be, in the nsfa>0 case (in the nsfa=0 case the same tests pass).

Theoretically, it could also be an issue with the pickling, not random_state - then I would guess the most likely reason to be a loss of numerical precision occurring when you serialize or deserialize.

Error message below, full error log is in https://github.com/sktime/sktime/actions/runs/6000661723/job/16273328621?pr=5171

=========================== short test summary info ============================
FAILED sktime/tests/test_all_estimators.py::TestAllEstimators::test_save_estimators_to_file[MrSQM-1-ClassifierFitPredict-predict_proba] - AssertionError: 
Arrays are not almost equal to 6 decimals
Results of predict_proba differ between saved and loaded estimator MrSQM
Mismatched elements: 10 / 10 (100%)
Max absolute difference: 0.00205801
Max relative difference: 0.13931118
 x: array([[0.992248, 0.007752],
       [0.017839, 0.982161],
       [0.023511, 0.976489],...
 y: array([[0.992762, 0.007238],
       [0.018737, 0.981263],
       [0.024453, 0.975547],...
FAILED sktime/tests/test_all_estimators.py::TestAllEstimators::test_fit_idempotent[MrSQM-1-ClassifierFitPredict-predict_proba] - AssertionError: 
Arrays are not almost equal to 6 decimals

Mismatched elements: 10 / 10 (100%)
Max absolute difference: 0.00414175
Max relative difference: 0.19419357
 x: array([[0.992142, 0.007858],
       [0.017063, 0.982937],
       [0.02547 , 0.97453 ],...
 y: array([[0.993322, 0.006678],
       [0.01615 , 0.98385 ],
       [0.021328, 0.978672],...
FAILED sktime/tests/test_all_estimators.py::TestAllEstimators::test_persistence_via_pickle[MrSQM-1-ClassifierFitPredict-predict_proba] - AssertionError: 
Arrays are not almost equal to 6 decimals
Results of predict_proba differ between when pickling and not pickling, estimator MrSQM
Mismatched elements: 6 / 10 (60%)
Max absolute difference: 0.00174577
Max relative difference: 0.11265353
 x: array([[0.992649, 0.007351],
       [0.013751, 0.986249],
       [0.019901, 0.980099],...
 y: array([[0.992649, 0.007351],
       [0.015497, 0.984503],
       [0.019901, 0.980099],...
====== 3 failed, 69 passed, 157 skipped, 4 warnings in 102.22s (0:01:42) =======
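
To separate the two hypotheses outside the test suite, one could compare predictions from the same object before and after a pickle round trip. The sketch below uses a hypothetical stand-in, `ToyModel`, rather than MrSQM itself; it shows the outcome a correctly seeded estimator should produce:

```python
import pickle
import random


class ToyModel:
    """Stand-in estimator whose predictions depend on a seeded generator."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def predict_proba(self, X):
        # Re-seed on every call so repeated calls give identical results.
        rng = random.Random(self.random_state)
        out = []
        for _ in X:
            p = rng.random()
            out.append([p, 1.0 - p])
        return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


model = ToyModel(random_state=0)
X = [[0.0], [1.0], [2.0]]
before = model.predict_proba(X)
after = pickle.loads(pickle.dumps(model)).predict_proba(X)
print(max_abs_diff(before, after))
```

If two calls on the same fitted object already differ, the cause is more likely the seeding than the serialization.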
