mlgig / mrsqm
License: GNU General Public License v3.0
Hi, great work! :)
I am having trouble finding the best hyper-parameters for the UCR datasets.
Currently, the default parameters seem to be optimized for speed, but the accuracy is rather low (measured on my hardware):
Total mean-accuracy: 0.814
Total fit_time: 3257.89
Total pred_time: 1965.34
What would be the best set of parameters to reproduce the results from the paper? In the paper this configuration is referred to as MrSQM_SFA_k5.
It is difficult to determine the version of the package robustly, since it does not have a __version__ attribute. That should be added.
Related to #1, as I wanted to add a bound >=0.0.2 as a requirement, where the persistence bug is fixed.
Also related: sktime/sktime#5172
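Until a __version__ attribute exists, the installed version can usually be read from the package metadata instead. A minimal sketch, assuming Python 3.8+ and that the distribution is named mrsqm on the index; the helper names are made up for illustration, and the tuple comparison only handles plain numeric versions such as 0.0.4:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_tuple(version_string):
    """Turn a plain release string such as '0.0.4' into (0, 0, 4).

    Only purely numeric dotted versions are handled; anything else raises ValueError.
    """
    return tuple(int(part) for part in version_string.split("."))


def meets_minimum(dist_name, minimum):
    """True if dist_name is installed and its version is >= minimum."""
    found = installed_version(dist_name)
    return found is not None and version_tuple(found) >= version_tuple(minimum)


# Prints the version string, or None when the package is not installed.
print(installed_version("mrsqm"))
```

A check such as meets_minimum("mrsqm", "0.0.2") then expresses the bound mentioned above without touching any attribute of the module itself.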
@lnthach, @heerme, I've tried integrating MrSQM to distribute it with sktime (by interface to this package), also as a proof-of-concept of how we could back-integrate MrSEQL.
Issue for MrSQM: sktime/sktime#4338
Issue for MrSEQL: sktime/sktime#4296
The PR sktime/sktime#4337 works in that it proves that we can integrate Cython-based estimators with sktime tests and the up-to-date framework, if those estimators live in an external package such as mrsqm (this one).
Now, with the tests working, it appears that MrSQM is failing two contract tests: serialization (saving to binary) and _get_fitted_params, fitted parameter inspection.
I assume the same tests would fail if you run check_estimator, i.e.,
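from mrsqm import MrSQMClassifier
from sktime.utils.estimator_checks import check_estimator
# raise_exceptions=True raises on the first failing check instead of only reporting it
check_estimator(MrSQMClassifier, raise_exceptions=True)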
How do we proceed from here?
I'm not too familiar with Cython. The exception when serializing (saving to binary) is
TypeError: no default __reduce__ due to non-trivial __cinit__
To make it work, I suspect one would have to implement serialization via the higher-level interface (save and load) or via the lower-level interface, __reduce__. @achieveordie has done that previously for classes of estimators, maybe he has more insight.
To provoke the error without the test suite, call pickle.dumps(my_estimator) on a fitted instance of MrSQMClassifier (with an easy solution via the lower-level interface, that call should work without complaining).
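For reference, this is roughly what the lower-level route looks like. A minimal sketch using a pure-Python stand-in, since the real class is a compiled extension type; CountTable and its fields are made up for illustration and are not part of mrsqm:

```python
import pickle


class CountTable:
    """Pure-Python stand-in for an extension type whose constructor needs arguments."""

    def __init__(self, n_symbols):
        self.n_symbols = n_symbols
        self.counts = [0] * n_symbols  # state that is built up after construction

    def add(self, symbol):
        self.counts[symbol] += 1

    def __reduce__(self):
        # (callable, constructor args, state): on load, pickle calls
        # CountTable(n_symbols) and passes the third item to __setstate__.
        return (CountTable, (self.n_symbols,), {"counts": list(self.counts)})

    def __setstate__(self, state):
        self.counts = state["counts"]


table = CountTable(4)
table.add(2)
table.add(2)

# Round trip through bytes, which is what the persistence tests exercise.
restored = pickle.loads(pickle.dumps(table))
assert restored.n_symbols == 4
assert restored.counts == [0, 0, 2, 0]
```

In a Cython class the same two methods would have to return everything needed to rebuild the C-level state, which is the part only the package authors can judge.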
The other issue is get_fitted_params: this should be implemented either directly or via inheritance from BaseClassifier. That can be solved on the sktime side, though.
@lnthach, quick question: should we add the firstdiff parameter in sktime too?
You could make a PR to change the interface class, since changes in parameters do not propagate through the interface.
There's also the issue that the firstdiff parameter is only available from 0.0.4 on, so we need to take care that users with lower versions don't suddenly error out (only relevant if we change the interface in sktime).
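One way to keep older installs working is to pass only the arguments the installed class accepts. A minimal sketch with two hypothetical stand-in classes (their defaults are arbitrary, not the real ones); whether inspect.signature can read the constructor of the compiled MrSQMClassifier is an assumption that would need checking:

```python
import inspect


def supported_kwargs(estimator_class, **candidate_kwargs):
    """Keep only the keyword arguments that estimator_class.__init__ accepts."""
    accepted = inspect.signature(estimator_class.__init__).parameters
    return {k: v for k, v in candidate_kwargs.items() if k in accepted}


# Hypothetical stand-ins for an older and a newer release of the classifier.
class OldClassifier:
    def __init__(self, nsax=0, nsfa=5):
        self.nsax, self.nsfa = nsax, nsfa


class NewClassifier:
    def __init__(self, nsax=0, nsfa=5, firstdiff=False):
        self.nsax, self.nsfa, self.firstdiff = nsax, nsfa, firstdiff


# firstdiff is dropped for the old class and forwarded to the new one.
old = OldClassifier(**supported_kwargs(OldClassifier, nsfa=5, firstdiff=True))
new = NewClassifier(**supported_kwargs(NewClassifier, nsfa=5, firstdiff=True))
```

Silently dropping an argument can surprise users, so an interface class would probably want to warn or raise when firstdiff is requested but unsupported.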
Also, quick question: I'm not sure there is a need to add this in mrsqm, since you can easily get the same logic from a pipeline:
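from sktime.classification.shapelet_based import MrSQM
from sktime.transformations.series.difference import Differencer
sqm = MrSQM(some_params)  # some_params: placeholder for your MrSQM arguments
first_diff = Differencer()  # first differences with the default settings
# "*" chains the transformer and the classifier into one pipeline
sqm_with_first_diff = first_diff * sqm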
:-)
Of course you decide which parameterizations you want to ship with your estimator, e.g., for convenience, or as a scientific implementation reference to a paper. Just pointing out that users of pipelines can get this (and similar) functionality already.
Code:
import mrsqm
from sklearn.model_selection import cross_validate
clf = mrsqm.MrSQMClassifier(nsax=0, nsfa=5, random_state=0)
scores = cross_validate(clf, X_train, y_train, scoring=['accuracy'], cv=3)
Error:
TypeError: Cannot clone object '<mrsqm.mrsqm_wrapper.MrSQMClassifier object at 0x7fc806dc8ac0>' (type <class 'mrsqm.mrsqm_wrapper.MrSQMClassifier'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' method.
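For context, cross_validate clones the estimator for each fold, and cloning relies on get_params to rebuild an unfitted copy. A minimal sketch of that contract on a hypothetical toy class (the argument names are borrowed from the call above; this is not the mrsqm implementation):

```python
class ToyClassifier:
    """Hypothetical estimator that stores constructor arguments unchanged."""

    def __init__(self, nsax=0, nsfa=5, random_state=None):
        self.nsax = nsax
        self.nsfa = nsfa
        self.random_state = random_state

    def get_params(self, deep=True):
        # Report every constructor argument under its own name.
        return {
            "nsax": self.nsax,
            "nsfa": self.nsfa,
            "random_state": self.random_state,
        }

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


def clone_like(estimator):
    """Rebuild an unfitted copy from get_params, roughly what cloning relies on."""
    return type(estimator)(**estimator.get_params())


clf = ToyClassifier(nsax=0, nsfa=5, random_state=0)
copy = clone_like(clf)
```

Inheriting from sklearn.base.BaseEstimator provides both methods automatically, as long as __init__ only stores its arguments under the same names.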
The build fails because gcc is given an argument it does not recognize:
Processing /Users/vipulp/Downloads/mrsqm-main/mrsqm
Building wheels for collected packages: mrsqm
Building wheel for mrsqm (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /Users/vipulp/opt/anaconda3/envs/timeseries/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/1p/jk19lf1541j7wft_wfmc69wc0000gr/T/pip-req-build-dq2fhl0_/setup.py'"'"'; file='"'"'/private/var/folders/1p/jk19lf1541j7wft_wfmc69wc0000gr/T/pip-req-build-dq2fhl0_/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/1p/jk19lf1541j7wft_wfmc69wc0000gr/T/pip-wheel-7sxl9is0
cwd: /private/var/folders/1p/jk19lf1541j7wft_wfmc69wc0000gr/T/pip-req-build-dq2fhl0_/
Complete output (10 lines):
running bdist_wheel
running build
running build_ext
building 'mrsqm' extension
creating build
creating build/temp.macosx-10.9-x86_64-3.7
creating build/temp.macosx-10.9-x86_64-3.7/sfa
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/vipulp/opt/anaconda3/envs/timeseries/include -arch x86_64 -I/Users/vipulp/opt/anaconda3/envs/timeseries/include -arch x86_64 -I/Users/vipulp/opt/anaconda3/envs/timeseries/include/python3.7m -c mrsqm_wrapper.cpp -o build/temp.macosx-10.9-x86_64-3.7/mrsqm_wrapper.o -Wall -Ofast -g -std=c++11 -mfpmath=both -ffast-math
error: unknown FP unit 'both'
error: command 'gcc' failed with exit status 1
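The flag the compiler rejects is -mfpmath=both. On macOS the gcc command is usually Apple clang, which does not know that option. A possible workaround, sketched under the assumption that the flags are set in setup.py and that dropping the flag on macOS is acceptable for the results (untested); the helper name is made up:

```python
import sys

# Flags copied from the end of the gcc command in the log above.
BASE_FLAGS = ["-Wall", "-Ofast", "-g", "-std=c++11", "-ffast-math"]
FPMATH_FLAG = "-mfpmath=both"


def extra_compile_args(platform=sys.platform):
    """Return the compiler flags, leaving out -mfpmath=both on macOS."""
    flags = list(BASE_FLAGS)
    if platform != "darwin":
        flags.append(FPMATH_FLAG)
    return flags


# Flags that would be used on the machine running this script.
print(extra_compile_args())
```

The returned list would then be passed as extra_compile_args to the Extension object in setup.py.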
module 'mrsqm' has no attribute 'MrSQMClassifier' when running: clf = mrsqm.MrSQMClassifier().fit(X_train, y_train)
It seems like the pickling bug is indeed fixed!
This now allows running the save/load tests, which indicate that at least predict_proba is not frozen by setting random_state, as it should be, in the nsfa>0 case (not in the nsfa=0 case, where the same tests pass).
Theoretically, it could also be an issue with the pickling rather than random_state. In that case I would guess the most likely reason to be a loss of numerical precision during serialization or deserialization.
Error message below; the full error log is at https://github.com/sktime/sktime/actions/runs/6000661723/job/16273328621?pr=5171
=========================== short test summary info ============================
FAILED sktime/tests/test_all_estimators.py::TestAllEstimators::test_save_estimators_to_file[MrSQM-1-ClassifierFitPredict-predict_proba] - AssertionError:
Arrays are not almost equal to 6 decimals
Results of predict_proba differ between saved and loaded estimator MrSQM
Mismatched elements: 10 / 10 (100%)
Max absolute difference: 0.00205801
Max relative difference: 0.13931118
x: array([[0.992248, 0.007752],
[0.017839, 0.982161],
[0.023511, 0.976489],...
y: array([[0.992762, 0.007238],
[0.018737, 0.981263],
[0.024453, 0.975547],...
FAILED sktime/tests/test_all_estimators.py::TestAllEstimators::test_fit_idempotent[MrSQM-1-ClassifierFitPredict-predict_proba] - AssertionError:
Arrays are not almost equal to 6 decimals
Mismatched elements: 10 / 10 (100%)
Max absolute difference: 0.00414175
Max relative difference: 0.19419357
x: array([[0.992142, 0.007858],
[0.017063, 0.982937],
[0.02547 , 0.97453 ],...
y: array([[0.993322, 0.006678],
[0.01615 , 0.98385 ],
[0.021328, 0.978672],...
FAILED sktime/tests/test_all_estimators.py::TestAllEstimators::test_persistence_via_pickle[MrSQM-1-ClassifierFitPredict-predict_proba] - AssertionError:
Arrays are not almost equal to 6 decimals
Results of predict_proba differ between when pickling and not pickling, estimator MrSQM
Mismatched elements: 6 / 10 (60%)
Max absolute difference: 0.00174577
Max relative difference: 0.11265353
x: array([[0.992649, 0.007351],
[0.013751, 0.986249],
[0.019901, 0.980099],...
y: array([[0.992649, 0.007351],
[0.015497, 0.984503],
[0.019901, 0.980099],...
====== 3 failed, 69 passed, 157 skipped, 4 warnings in 102.22s (0:01:42) =======
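The three failures above test two properties: fitting twice with the same random_state must give the same predict_proba, and a pickled and reloaded estimator must predict the same as the original. A minimal sketch of both checks on a hypothetical toy model (not MrSQM), with a tolerance in the spirit of the 6-decimal comparison in the log:

```python
import math
import pickle
import random


class SeededToyModel:
    """Hypothetical model whose fitted state depends only on random_state."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        rng = random.Random(self.random_state)  # fresh generator on every fit
        self.weights_ = [rng.random() for _ in X[0]]
        return self

    def predict_proba(self, X):
        rows = []
        for x in X:
            score = sum(w * v for w, v in zip(self.weights_, x))
            p = 1.0 / (1.0 + math.exp(-score))
            rows.append([1.0 - p, p])
        return rows


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))


X = [[0.1, 0.4], [0.9, 0.2], [0.5, 0.5]]
y = [0, 1, 0]

# Idempotence: two independent fits with the same seed agree.
first = SeededToyModel(random_state=0).fit(X, y).predict_proba(X)
second = SeededToyModel(random_state=0).fit(X, y).predict_proba(X)
assert max_abs_diff(first, second) < 1e-6

# Persistence: a pickle round trip does not change the predictions.
model = SeededToyModel(random_state=0).fit(X, y)
reloaded = pickle.loads(pickle.dumps(model))
assert max_abs_diff(model.predict_proba(X), reloaded.predict_proba(X)) < 1e-6
```

Running the same two checks against a fitted MrSQMClassifier with nsfa>0 would help separate a seeding problem from a serialization problem: if the first check already fails, pickling is not the cause.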