vanderschaarlab / temporai
TemporAI: ML-centric Toolkit for Medical Time Series
Home Page: https://www.temporai.vanderschaar-lab.com/
License: Apache License 2.0
A plugin with a name like plugin_temporal_minmax_scaler.py / temporal_minmax_scaler will break the PluginLoader.
One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.
To that end, metrics are needed for every supported problem type.
One of them is evaluating survival analysis tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.
The metrics should be reported for each evaluation time horizon, and aggregated (mean, std).
Important metrics to cover here:
[X] c_index: The concordance index or c-index is a metric to evaluate the predictions made by a survival algorithm. It is defined as the proportion of concordant pairs divided by the total number of possible evaluation pairs.
[X] brier_score: The Brier Score is a strictly proper score function or strictly proper scoring rule that measures the accuracy of probabilistic predictions.
aucroc: The Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
sensitivity: Sensitivity (true positive rate) is the probability of a positive test result, conditioned on the individual truly being positive.
specificity: Specificity (true negative rate) is the probability of a negative test result, conditioned on the individual truly being negative.
PPV: The positive predictive value (PPV) is the probability that following a positive test result, that individual will truly have that specific disease.
NPV: The negative predictive value (NPV) is the probability that following a negative test result, that individual will truly not have that specific disease.
AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
All the classes should have test coverage.
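To make the c_index definition and the per-horizon aggregation concrete, here is a minimal pure-Python sketch. It is not the TemporAI or AutoPrognosis implementation: the function names `c_index` and `aggregate_by_horizon` are hypothetical, pairs with tied observed times are simply skipped, and the standard deviation is the population one.

```python
from itertools import combinations
from statistics import mean, pstdev


def c_index(times, events, risk_scores):
    """Harrell-style concordance index for right-censored data (sketch).

    times: observed time (event or censoring) per subject.
    events: 1 if the event was observed, 0 if censored.
    risk_scores: higher score means higher predicted risk (earlier event).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: tied times are skipped; a pair is comparable only
        # if the earlier time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def aggregate_by_horizon(scores_per_fold):
    """Map {horizon: [score per fold]} to {horizon: (mean, std)}."""
    return {h: (mean(v), pstdev(v)) for h, v in scores_per_fold.items()}
```

A call such as `aggregate_by_horizon({12: [0.7, 0.9]})` would report one (mean, std) pair for the horizon 12.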
One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.
To that end, metrics are needed for every supported problem type.
One of them is evaluating classification tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.
Important metrics to cover here:
aucroc: The Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
aucprc: The average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.
accuracy: Accuracy classification score.
f1_score (micro, macro, weighted): The F1 score is the harmonic mean of precision and recall. The "micro" version calculates metrics globally by counting the total true positives, false negatives and false positives.
kappa: Computes Cohen's kappa, a score that expresses the level of agreement between two annotators on a classification problem.
precision (micro, macro, weighted): Precision is defined as the number of true positives over the number of true positives plus the number of false positives. The "micro" version calculates metrics globally by counting the total true positives and false positives.
recall (micro, macro, weighted): Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. The "micro" version calculates metrics globally by counting the total true positives and false negatives.
mcc: The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
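The difference between the micro, macro and weighted averages can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the helper names (`per_class_counts`, `f1_score_sketch`) are hypothetical, and a real implementation would more likely delegate to an existing metrics library.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in y_true or y_pred."""
    counts = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        counts[label] = (tp, fp, fn)
    return counts


def _f1(tp, fp, fn):
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_score_sketch(y_true, y_pred, average="micro"):
    counts = per_class_counts(y_true, y_pred)
    if average == "micro":
        # Pool the counts over all classes, then compute a single F1.
        tp, fp, fn = (sum(c[k] for c in counts.values()) for k in range(3))
        return _f1(tp, fp, fn)
    scores = {label: _f1(*c) for label, c in counts.items()}
    if average == "macro":
        # Unweighted mean of the per-class F1 scores.
        return sum(scores.values()) / len(scores)
    if average == "weighted":
        # Per-class F1 weighted by the number of true samples of each class.
        support = Counter(y_true)
        return sum(scores[l] * support[l] for l in scores) / len(y_true)
    raise ValueError(f"unknown average: {average}")
```

For single-label multiclass data, the micro-averaged F1 coincides with accuracy, which is why the macro and weighted variants are usually reported alongside it when classes are imbalanced.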
It would seem that this package is "cloning" a number of core aspects of sktime, including data format, base class design, etc.
It does add some novel aspects, but there aren't too many differences at the moment.
So, why develop this in complete detachment from the pydata ecosystem?
Long-term, it will be much harder to maintain if you insist on trying to build a parallel ecosystem targeted at medical doctors.
I understand the academic sensitivities of wanting to "own", but that's not really how open source works - the more you give and let go of, the more you get back, and the more successful you will be.
For instance, why not contribute this to sktime?
Create issue templates for the repository as described here
After serializing and deserializing some models that include RNNs, the following warning is received:
"UserWarning: RNN module weights are not part of single contiguous chunk of memory"
The serialization mechanism needs to be improved to fix this problem.
Preprocessing plugins, for scaling or dimensionality reduction.
Epics require a lot of work and often require a change in the scope of development. Justify your epic - why can't it just be a simple issue?
Provide a bulleted or numbered list of how you might break this epic down into smaller issues.
[AutoML] Add AutoML objective evaluation for classification tasks
For a sampled pipeline, the AutoML search process needs to evaluate an objective, and suggest future points in a direction.
For classification tasks, the evaluation metrics are documented here #15
The evaluation is done using the cross-validation tester documented in #20
Given a metric, the optimization process might seek to maximize or minimize the objective.
AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/risk_estimation.py
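One possible way to handle the maximize/minimize distinction is to attach a direction to each metric and let the search pick the best trial accordingly. The sketch below is an assumption-laden illustration: `METRIC_DIRECTION`, `Trial` and `best_trial` are hypothetical names, and only the metric names are taken from the metric issues.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

# Hypothetical mapping from metric name to optimization direction.
METRIC_DIRECTION: Dict[str, str] = {
    "aucroc": "maximize",
    "accuracy": "maximize",
    "mse": "minimize",
    "mae": "minimize",
}


@dataclass
class Trial:
    params: dict   # sampled pipeline hyperparameters
    score: float   # cross-validated value of the chosen metric


def best_trial(trials: Sequence[Trial], metric: str) -> Trial:
    """Return the trial with the best score for the metric's direction."""
    pick = max if METRIC_DIRECTION[metric] == "maximize" else min
    return pick(trials, key=lambda t: t.score)
```

Keeping the direction next to the metric name means the search loop itself does not need to know which metrics are losses and which are scores.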
Some useful datasets for running unit tests: https://github.com/vanderschaarlab/synthcity/tree/main/src/synthcity/utils/datasets/time_series
An improvement on top of pydantic would be to integrate jaxtyping, which allows for validating tensor shapes as well.
jaxtyping supports PyTorch tensors and numpy arrays.
Example
from jaxtyping import Array, Float, PyTree

# Accepts floating-point 2D arrays with matching dimensions
def matrix_multiply(x: Float[Array, "dim1 dim2"],
                    y: Float[Array, "dim2 dim3"]
                  ) -> Float[Array, "dim1 dim3"]:
    ...

def accepts_pytree_of_ints(x: PyTree[int]):
    ...

def accepts_pytree_of_arrays(x: PyTree[Float[Array, "batch c1 c2"]]):
    ...
Introduce plugin types which can be listed separately, so that we can have clear separation between models/metrics/data sources etc. plugins.
The plugins and pipelines should be easy to serialize/deserialize.
cloudpickle is a good starting point.
Currently we are using the sphinx-material docs theme. However, the sphinx-immaterial theme is similar but better supported, so we should migrate.
See if this also resolves the problem of the nav bar logo not working with the actual temporai logo.
Add imputation plugins.
Provide a bulleted or numbered list of how you might break this epic down into smaller issues.
Describe the bug
Building the temporai package from pip or from github fails as catboost is required.
This is probably linked to #72, where the catboost dependency should have been removed, and to catboost/catboost#2371 (comment)
Platform:
To Reproduce
pip install temporai
or
pip install "temporai @ git+https://github.com/vanderschaarlab/temporai@daa4af2e3943e5639098a4459464012c007245a3"
Expected behavior
Build should not fail, and catboost should probably not be required.
Results
temporai build fails.
Collecting catboost>=1.0.5 (from hyperimpute>=0.1.17->temporai@ git+https://github.com/vanderschaarlab/temporai@daa4af2e3943e5639098a4459464012c007245a3)
Using cached catboost-1.2.2.tar.gz (60.1 MB)
× Building wheel for catboost (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [218 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-cpython-311
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/monoforest.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/plot_helpers.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/metrics.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/version.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/text_processing.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/datasets.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/core.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/dev_utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/metrics_plotter.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/ipythonwidget.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/callbacks.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/catboost_evaluation.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_model.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_readers.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/log_config.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_splitter.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/execution_case.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_storage.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/factor_utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/evaluation_result.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_models_handler.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
running build_ext
Buildling _catboost with cmake and ninja
target_platform=darwin-x86_64. Building targets _catboost with PIC
Running "cmake /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src -B /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311 -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/toolchains/clang.toolchain --log-level=VERBOSE -DCMAKE_POSITION_INDEPENDENT_CODE=On -DCATBOOST_COMPONENTS=python-package -DCMAKE_OSX_DEPLOYMENT_TARGET=11.0 -DHAVE_CUDA=no -DPython3_ROOT_DIR=/Users/jk1/opt/anaconda3/envs/treatment_effects"
-- The C compiler identification is AppleClang 14.0.3.14030022
-- The CXX compiler identification is AppleClang 14.0.3.14030022
-- The ASM compiler identification is Clang
-- Found assembler: /Library/Developer/CommandLineTools/usr/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python3: /Users/jk1/opt/anaconda3/envs/treatment_effects/bin/python3.1 (found version "3.11.5") found components: Interpreter
-- CMAKE_C_FLAGS = " -fexceptions -fno-common -fcolor-diagnostics -faligned-allocation -fdebug-default-version=4 -ffunction-sections -fdata-sections -Wall -Wextra -Wno-parentheses -Wno-implicit-const-int-float-conversion -Wno-unknown-warning-option -pipe -D_THREAD_SAFE -D_PTHREADS -D_REENTRANT -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__LONG_LONG_SUPPORTED -DLIBCXX_BUILDING_LIBCXXRT -D_FILE_OFFSET_BITS=64 -m64 -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpopcnt -mcx16 -DSSE_ENABLED=1 -DSSE3_ENABLED=1 -DSSSE3_ENABLED=1 -DSSE41_ENABLED=1 -DSSE42_ENABLED=1 -DPOPCNT_ENABLED=1 -DCX16_ENABLED=1"
-- CMAKE_CXX_FLAGS = " -fexceptions -fno-common -fcolor-diagnostics -faligned-allocation -fdebug-default-version=4 -ffunction-sections -fdata-sections -Wall -Wextra -Wno-parentheses -Wno-implicit-const-int-float-conversion -Wno-unknown-warning-option -pipe -D_THREAD_SAFE -D_PTHREADS -D_REENTRANT -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__LONG_LONG_SUPPORTED -DLIBCXX_BUILDING_LIBCXXRT -D_FILE_OFFSET_BITS=64 -m64 -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpopcnt -mcx16 -Woverloaded-virtual -Wimport-preprocessor-directive-pedantic -Wno-undefined-var-template -Wno-return-std-move -Wno-defaulted-function-deleted -Wno-pessimizing-move -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-enum-enum-conversion -Wno-deprecated-enum-float-conversion -Wno-ambiguous-reversed-operator -Wno-deprecated-volatile -DSSE_ENABLED=1 -DSSE3_ENABLED=1 -DSSSE3_ENABLED=1 -DSSE41_ENABLED=1 -DSSE42_ENABLED=1 -DPOPCNT_ENABLED=1 -DCX16_ENABLED=1"
-- Conan: checking conan executable
-- Conan: Found program /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/bin/conan
-- Conan: Version found Conan version 1.59.0
-- Conan executing: /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/bin/conan install /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src --remote conancenter --install-folder /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311 --build missing --env CONAN_CMAKE_GENERATOR=Ninja --settings build_type=Release --settings compiler=apple-clang --settings compiler.version=14.0 --settings compiler.libcxx=libc++ --settings compiler.cppstd=20 --conf tools.cmake.cmaketoolchain:generator=Ninja
Configuration:
[settings]
arch=x86_64
arch_build=x86_64
build_type=Release
compiler=apple-clang
compiler.cppstd=20
compiler.libcxx=libc++
compiler.version=14.0
os=Macos
os_build=Macos
[options]
[build_requires]
[env]
CONAN_CMAKE_GENERATOR=Ninja
[conf]
tools.cmake.cmaketoolchain:generator=Ninja
Version ranges solved
Version range '>=1.2.11 <2' required by 'pcre/8.45' resolved to 'zlib/1.3' in local cache
conanfile.txt: Installing package
Requirements
libiconv/1.15 from 'conancenter' - Cache
openssl/1.1.1t from 'conancenter' - Cache
Packages
libiconv/1.15:e1ef30a7ac2ff8c218173fdf49ec961a5c046a36 - Cache
openssl/1.1.1t:a319f556f93546f2dff1b70922784b70e7cba919 - Cache
Build requirements
bzip2/1.0.8 from 'conancenter' - Cache
pcre/8.45 from 'conancenter' - Cache
ragel/6.10 from 'conancenter' - Cache
swig/4.0.2 from 'conancenter' - Cache
yasm/1.3.0 from 'conancenter' - Cache
zlib/1.3 from 'conancenter' - Cache
Build requirements packages
bzip2/1.0.8:b9b85a7c8f543b96385e1da9e174853f1fb08e0c - Cache
pcre/8.45:842afe377248eac66b64b538531df2b005d57959 - Cache
ragel/6.10:801752c0480319b8e090188c566245a78e9abcf4 - Cache
swig/4.0.2:099d7b9cd06e9bd11e92b9a2ddf3b29cd986fdcb - Cache
yasm/1.3.0:801752c0480319b8e090188c566245a78e9abcf4 - Cache
zlib/1.3:a319f556f93546f2dff1b70922784b70e7cba919 - Cache
Installing (downloading, building) binaries...
bzip2/1.0.8: Already installed!
libiconv/1.15: Already installed!
openssl/1.1.1t: Already installed!
ragel/6.10: Already installed!
ragel/6.10: Appending PATH environment variable: /Users/jk1/.conan/data/ragel/6.10/_/_/package/801752c0480319b8e090188c566245a78e9abcf4/bin
yasm/1.3.0: Already installed!
yasm/1.3.0: Appending PATH environment variable: /Users/jk1/.conan/data/yasm/1.3.0/_/_/package/801752c0480319b8e090188c566245a78e9abcf4/bin
zlib/1.3: Already installed!
pcre/8.45: Already installed!
swig/4.0.2: Already installed!
swig/4.0.2: Appending PATH environment variable: /Users/jk1/.conan/data/swig/4.0.2/_/_/package/099d7b9cd06e9bd11e92b9a2ddf3b29cd986fdcb/bin
conanfile.txt: Applying build-requirement: ragel/6.10
conanfile.txt: Applying build-requirement: swig/4.0.2
conanfile.txt: Applying build-requirement: yasm/1.3.0
conanfile.txt: Applying build-requirement: pcre/8.45
conanfile.txt: Applying build-requirement: bzip2/1.0.8
conanfile.txt: Applying build-requirement: zlib/1.3
conanfile.txt: Generator cmake_find_package created Findragel.cmake
conanfile.txt: Generator cmake_find_package created FindSWIG.cmake
conanfile.txt: Generator cmake_find_package created Findyasm.cmake
conanfile.txt: Generator cmake_find_package created FindIconv.cmake
conanfile.txt: Generator cmake_find_package created FindOpenSSL.cmake
conanfile.txt: Generator cmake_find_package created FindPCRE.cmake
conanfile.txt: Generator cmake_find_package created FindBZip2.cmake
conanfile.txt: Generator cmake_find_package created FindZLIB.cmake
conanfile.txt: Generator cmake_paths created conan_paths.cmake
conanfile.txt: Generator txt created conanbuildinfo.txt
conanfile.txt: Aggregating env generators
conanfile.txt: Generated conaninfo.txt
conanfile.txt: Generated graphinfo
conanfile.txt imports(): Copied 434 '.i' files
conanfile.txt imports(): Copied 273 '.swg' files
conanfile.txt imports(): Copied 1 '.swig' file: Makefile.swig
conanfile.txt imports(): Copied 2 '.ml' files: swig.ml, swigp4.ml
conanfile.txt imports(): Copied 1 '.pl' file: Makefile.pl
conanfile.txt imports(): Copied 6 files
conanfile.txt imports(): Copied 1 '.rb' file: extconf.rb
conanfile.txt imports(): Copied 1 '.h' file: noembed.h
conanfile.txt imports(): Copied 1 '.scm' file: common.scm
conanfile.txt imports(): Copied 1 '.mli' file: swig.mli
conanfile.txt imports(): Copied 1 '.hpp' file: octheaders.hpp
CMake Error at /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find Python3 (missing: Python3_INCLUDE_DIRS Python3_LIBRARIES
Development Development.Module Development.Embed)
Call Stack (most recent call first):
/usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
/usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPython/Support.cmake:3166 (find_package_handle_standard_args)
/usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPython3.cmake:485 (include)
catboost/python-package/catboost/CMakeLists.darwin-x86_64.txt:9 (find_package)
catboost/python-package/catboost/CMakeLists.txt:20 (include)
-- Configuring incomplete, errors occurred!
See also "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311/CMakeFiles/CMakeOutput.log".
See also "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311/CMakeFiles/CMakeError.log".
Traceback (most recent call last):
File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 434, in build_wheel
return self._build_with_temp_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 419, in _build_with_temp_dir
self.run_setup()
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 507, in run_setup
super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 341, in run_setup
exec(code, locals())
File "<string>", line 731, in <module>
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/__init__.py", line 103, in setup
return distutils.core.setup(**attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
^^^^^^^^^^^^^^^^^^
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
super().run_command(command)
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "<string>", line 397, in run
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 364, in run
self.run_command("build")
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
super().run_command(command)
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "<string>", line 332, in run
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 131, in run
self.run_command(cmd_name)
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
super().run_command(command)
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "<string>", line 444, in run
File "<string>", line 462, in build_with_cmake_and_ninja
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/build_native.py", line 517, in build
cmd_runner.run(cmake_cmd, env=build_environ)
File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/build_native.py", line 164, in run
subprocess.run(cmd, check=True, **subprocess_run_kwargs)
File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['cmake', '/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src', '-B', '/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_TOOLCHAIN_FILE=/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/toolchains/clang.toolchain', '--log-level=VERBOSE', '-DCMAKE_POSITION_INDEPENDENT_CODE=On', '-DCATBOOST_COMPONENTS=python-package', '-DCMAKE_OSX_DEPLOYMENT_TARGET=11.0', '-DHAVE_CUDA=no', '-DPython3_ROOT_DIR=/Users/jk1/opt/anaconda3/envs/treatment_effects']' returned non-zero exit status 1.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for catboost
Successfully built temporai
Failed to build catboost
ERROR: Could not build wheels for catboost, which is required to install pyproject.toml-based projects
</details>
There are still a few places where docstrings are needed. This enhancement would get docstring coverage to 100%.
For a sampled pipeline, the AutoML search process needs to evaluate an objective, and suggest future points in a direction.
For classification tasks, the evaluation metrics are documented here #11
The benchmark is done using the cross-validation tester documented in #20.
Given a metric, the optimization process might seek to maximize or minimize the objective.
AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/classifiers.py
Pydantic 2.0 is now the current version, so changes need to be made to use it rather than 1.0.
Given a set of K optimal pipelines selected by the AutoML logic given an objective, the next step is to evaluate ensembles on top of the candidate pipelines.
For the weighted ensemble, a separate AutoML search can be executed, to evaluate various weights.
The process benchmarks all the supported ensemble setups (weighted, stacked, voting, etc.), and returns the optimal solution.
depends on #8, #7, #6, #5, #13
AP references:
Before releasing, the library should be tested on the matrix {MacOS, Windows, Linux} x {Python 3.7, 3.8, 3.9, 3.10} for compatibility.
On each test scenario, all the unit tests should pass.
Reference workflow: https://github.com/vanderschaarlab/autoprognosis/blob/main/.github/workflows/test.yml
Additional notes:
requirements-dev.txt
and put that as testing mode in setup.cfg
For a sampled pipeline, the AutoML search process needs to evaluate an objective, and suggest future points in a direction.
For classification tasks, the evaluation metrics are documented here #10
The benchmark is done using the cross-validation tester documented in #20
Given a metric, the optimization process might seek to maximize or minimize the objective.
AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/regression.py
Hi, I found that the time_to_event models and scaling methods do not work with categorical data.
To reproduce it, one can go to tutorials/data/tutorial1_data_format.ipynb, use the data example (comment out one line of event data) and run the time-to-event models.
I have only tried time-to-event models and scaling methods; other methods might not work on categorical data either. According to the tutorials for data format, pandas.Categorical is supported as column values?
Thanks for looking into it!
Wenjuan
Given a set of pipelines - or just estimators, users should be able to create ensembles.
Popular ensemble techniques
Reference code in AutoPrognosis: https://github.com/vanderschaarlab/autoprognosis/tree/main/src/autoprognosis/plugins/ensemble
More about this here: https://github.com/yzhao062/combo
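As a rough illustration of what such ensembles compute, the sketch below shows two common techniques, weighted averaging of predicted probabilities and majority voting over predicted labels. The function names are hypothetical and the inputs are plain lists rather than fitted pipelines or estimators; this is not the AutoPrognosis code.

```python
from collections import Counter


def weighted_average(predictions, weights):
    """Combine per-model probability lists using normalized weights.

    predictions: one list of probabilities per model, all the same length.
    weights: one non-negative weight per model.
    """
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


def majority_vote(predictions):
    """Pick, for each sample, the label predicted by most models.

    On a tie, the label that appears first among the votes wins.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

A weighted ensemble search would then only need to propose the `weights` vector and score the combined predictions.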
The library should offer the possibility to execute multiple plugins in sequence, and sample hyperparameters for all of them.
Sampling hyperparameters requires a class, so that you don't instantiate a useless object.
To that end, you can create metaclasses in Python by inheriting from the type class directly.
The pipeline wrapper should offer the following interface:
fit - train the pipeline
predict - transform (for preprocessing plugins in the pipeline) and predict
hyperparameters_space / sample_hyperparameters, which should call the sampling logic from each plugin.
Reference implementation: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/plugins/pipeline/__init__.py
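The interface above could look roughly like the following sketch. It makes several assumptions: the two toy plugins are invented for illustration, the pipeline stages are fixed on the class, and class methods are used instead of a metaclass so that hyperparameters can be sampled without instantiating anything.

```python
import random


class MinMaxScaler:
    """Toy preprocessing plugin (hypothetical, for illustration only)."""

    @classmethod
    def hyperparameters_space(cls):
        return {"clip": [True, False]}

    def __init__(self, clip=False):
        self.clip = clip

    def fit(self, xs):
        self.lo, self.hi = min(xs), max(xs)
        return self

    def transform(self, xs):
        span = (self.hi - self.lo) or 1.0
        out = [(x - self.lo) / span for x in xs]
        return [min(1.0, max(0.0, v)) for v in out] if self.clip else out


class MeanPredictor:
    """Toy predictor plugin: predicts a (shrunken) mean of the targets."""

    @classmethod
    def hyperparameters_space(cls):
        return {"shrink": [0.0, 0.5]}

    def __init__(self, shrink=0.0):
        self.shrink = shrink

    def fit(self, xs, ys):
        self.value = (1.0 - self.shrink) * sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.value for _ in xs]


class Pipeline:
    stages = (MinMaxScaler, MeanPredictor)  # classes, not instances

    @classmethod
    def hyperparameters_space(cls):
        # Collect the search space of every stage without instantiating it.
        return {s.__name__: s.hyperparameters_space() for s in cls.stages}

    @classmethod
    def sample_hyperparameters(cls, rng=random):
        space = cls.hyperparameters_space()
        return {name: {k: rng.choice(v) for k, v in params.items()}
                for name, params in space.items()}

    def __init__(self, params=None):
        params = params or {}
        *pre, last = self.stages
        self.preprocessors = [s(**params.get(s.__name__, {})) for s in pre]
        self.predictor = last(**params.get(last.__name__, {}))

    def fit(self, xs, ys):
        for p in self.preprocessors:
            xs = p.fit(xs).transform(xs)
        self.predictor.fit(xs, ys)
        return self

    def predict(self, xs):
        for p in self.preprocessors:
            xs = p.transform(xs)
        return self.predictor.predict(xs)
```

A typical use would be `Pipeline(Pipeline.sample_hyperparameters()).fit(xs, ys).predict(xs)`; a metaclass-based design, as suggested above, could generate the `Pipeline` class for an arbitrary list of stages.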
The feature should be covered by tests, covering:
depends on https://github.com/vanderschaarlab/temporai-priv/issues/5
Add prediction models
Epics require a lot of work and often require a change in the scope of development. Justify your epic - why can't it just be a simple issue?
Provide a bulleted or numbered list of how you might break this epic down into smaller issues.
Add basic bfill/ffill imputation methods.
Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ffill.html
The model should be covered by tests.
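For reference, the semantics of the two methods on a single series can be sketched in pure Python as below, with None standing in for a missing value. This is only an illustration of the behavior; an actual plugin would presumably rely on the pandas methods linked above, and the function names here are hypothetical.

```python
def ffill(values):
    """Forward fill: replace each None with the last observed value.

    Leading None values stay None because nothing precedes them.
    """
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def bfill(values):
    """Backward fill: replace each None with the next observed value.

    Implemented as a forward fill over the reversed sequence.
    """
    return ffill(values[::-1])[::-1]
```

Since each method leaves one end of the series unfilled, chaining them (for example `bfill(ffill(values))`) is a simple way to cover both ends.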
Every plugin/pipeline or API call should support fixing the random seed.
There are methods for setting a global random seed. For example
# stdlib
import random

# third party
import numpy as np
import torch

def enable_reproducible_results(random_state: int = 0) -> None:
    np.random.seed(random_state)
    torch.manual_seed(random_state)
    random.seed(random_state)
Example from catenets, adapted here:
https://github.com/AliciaCurth/CATENets/blob/main/catenets/experiment_utils/torch_metrics.py
Pydantic imposes a limit on the number of temporai objects that can be instantiated.
Example: In the test_nn_regressor.py, the following snippet will crash
def test_hyperparam_sample():
    for repeat in range(10000):  # pylint: disable=unused-variable
        args = plugin._cls.sample_hyperparameters()  # pylint: disable=no-member, protected-access
        plugin(**args)
with the error
> ???
E pydantic.error_wrappers.ValidationError: 1 validation error for _InitArgsValidator
E __root__
E Model parameters could not be validated as defined by `EmptyParamsDefinition`, cause:
E ---------------
E RecursionError:
E maximum recursion depth exceeded
E ---------------
E (type=value_error)
pydantic/main.py:342: ValidationError
Pydantic should not limit the functionality of the library.
The nbsphinx extension is designed for integrating notebooks into documentation. Hence we should use that instead of the custom code in docs/pre_build.py. Ideally, we should also find a way of having the link to colab at the top of each tutorial in the docs.
Dynamic DeepHit prediction model.
Reference implementation in PyTorch: https://github.com/vanderschaarlab/synthcity/blob/main/src/synthcity/plugins/core/models/time_series_survival/ts_surv_dynamic_deephit.py
The library should be automatically uploaded to PyPI on release.
Reference release script https://github.com/vanderschaarlab/autoprognosis/blob/main/.github/workflows/release.yml
sphinx-material
) or autodoc.
/mnt/data-fourtb/Dropbox/Programming/wsl_repos/_vds/temporai/docs/../src/tempor/data/pandera_utils.py:docstring of tempor.data.pandera_utils:1: WARNING: Inline interpreted text or phrase reference start-string without end-string.
...
Investigate and fix these.
For AutoML search, it is important to be able to sample hyperparameters and to recreate the pipeline from those hyperparameters.
AutoPrognosis implements the following strategy:
[imputation plugin 1, 2, 3, ...] + [preprocessing plugin 1, 2, 3, ...] + hyperparam_space.
This simplifies the sampling process.
AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/core/selector.py
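One possible reading of this strategy, sketched with the standard library only: each pipeline stage is a categorical choice, and the model hyperparameters are appended to the same flat configuration. All plugin names, hyperparameter ranges, and function names below are hypothetical and are not taken from AutoPrognosis or this library.

```python
import random
from typing import Any, Dict, List, Tuple

# Hypothetical candidates for each pipeline stage (illustration only).
IMPUTERS: List[str] = ["ffill", "bfill"]
SCALERS: List[str] = ["minmax_scaler", "standard_scaler"]
# Hypothetical hyperparameter space of the final model.
HYPERPARAM_SPACE: Dict[str, List[Any]] = {
    "n_iter": [100, 200, 500],
    "lr": [1e-2, 1e-3, 1e-4],
}


def sample_pipeline(rng: random.Random) -> Dict[str, Any]:
    """Draw one flat configuration: stage choices plus model hyperparameters."""
    config: Dict[str, Any] = {
        "imputer": rng.choice(IMPUTERS),
        "scaler": rng.choice(SCALERS),
    }
    for name, choices in HYPERPARAM_SPACE.items():
        config[name] = rng.choice(choices)
    return config


def build_pipeline(config: Dict[str, Any]) -> Tuple[List[str], Dict[str, Any]]:
    """Recreate the ordered stages and model arguments from a configuration."""
    stages = [config["imputer"], config["scaler"]]
    params = {name: config[name] for name in HYPERPARAM_SPACE}
    return stages, params


config = sample_pipeline(random.Random(0))
stages, params = build_pipeline(config)
```

Because the configuration is a flat dictionary, it can be logged by a tuner and later passed back to `build_pipeline` to recreate the same pipeline.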
Blocked by: #27
The library should offer methods for evaluating predictive models/pipelines.
For each problem type, there can be different relevant metrics, as described in the linked tasks.
The evaluation should be done using KFold (regression) or StratifiedKFold (classification, survival analysis), with a predefined random seed.
AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
Example implementation for time series survival
https://github.com/vanderschaarlab/synthcity/blob/main/src/synthcity/plugins/core/models/time_series_survival/benchmarks.py#L142
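The evaluation loop described above could look roughly like the following sketch. It uses scikit-learn estimators on synthetic static data as stand-ins for the library's own models; the helper name `cross_validate` and the toy data are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import KFold, StratifiedKFold


def cross_validate(model, X, y, splitter, metric):
    """Fit on each training fold and score on the held-out fold."""
    scores = []
    for train_idx, test_idx in splitter.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        scores.append(metric(y[test_idx], model.predict(X[test_idx])))
    # Report the aggregate over folds.
    return {"mean": float(np.mean(scores)), "std": float(np.std(scores))}


# Synthetic data (hypothetical), generated with a fixed seed.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y_reg = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)
y_clf = (X[:, 0] > 0).astype(int)

# Regression: plain KFold with a fixed seed.
reg_result = cross_validate(
    LinearRegression(), X, y_reg,
    KFold(n_splits=3, shuffle=True, random_state=0), mean_squared_error,
)
# Classification: StratifiedKFold keeps the class ratio in every fold.
clf_result = cross_validate(
    LogisticRegression(), X, y_clf,
    StratifiedKFold(n_splits=3, shuffle=True, random_state=0), accuracy_score,
)
```

Fixing `random_state` on the splitter makes the fold assignment, and therefore the reported mean and standard deviation, repeatable between runs.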
One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.
To that end, metrics are needed for every supported problem type.
One of them is evaluating regression tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.
Important metrics to cover here:
r2: R^2 (coefficient of determination) regression score function.
mse: Mean squared error regression loss.
mae: Mean absolute error regression loss.
AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
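For reference, the three metrics can be computed directly with NumPy. This is a sketch of the standard definitions, not the library's API; the function name `regression_metrics` is hypothetical, and the code assumes the ground truth is not constant (otherwise R^2 is undefined).

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute r2, mse and mae from ground truth and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = float(np.sum(residuals ** 2))  # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,  # assumes ss_tot > 0
        "mse": float(np.mean(residuals ** 2)),
        "mae": float(np.mean(np.abs(residuals))),
    }


scores = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
# scores["mse"] is 0.375, scores["mae"] is 0.5, scores["r2"] is about 0.9486
```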
Hi,
I have errors when trying to run dynamic deephit with MethodSeeker with PBC dataset:
To reproduce the errors, please do the following:
from tempor.utils.dataloaders import PBCDataLoader
dataset = PBCDataLoader(random_state=42).load()
# Provide a custom hyperparameter space to search for each type of model.
hp_space = {
    "dynamic_deephit": [
        IntegerParams(name="n_iter", low=200, high=200),
        IntegerParams(name="batch_size", low=30, high=100),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "ts_xgb": [
        IntegerParams(name="n_iter", low=200, high=200),
        IntegerParams(name="batch_size", low=100, high=100),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
}
# Initialize a `MethodSeeker` and provide `override_hp_space`.
seeker = MethodSeeker(
    study_name="my_automl_study",
    task_type="time_to_event",
    estimator_names=[
        "dynamic_deephit",
        "ts_xgb",
    ],
    metric="c_index",
    dataset=dataset,
    horizon=[1, 5, 9],
    return_top_k=2,
    num_iter=3,  # For the sake of speed of this example, only 3 AutoML iterations.
    tuner_type="bayesian",
    # Override hyperparameter space:
    override_hp_space=hp_space,
)
best_methods, best_scores = seeker.search()
Thanks for the help!
Best wishes,
Wenjuan
Hi,
I created a simple dataset following the example in the data format tutorial.
time_series_df = pd.DataFrame(
    {
        "sample_idx": ["sample_0", "sample_0", "sample_0", "sample_1", "sample_1", "sample_1", "sample_2", "sample_2", "sample_2"],
        "time_idx": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "t_feat_0": [11, 12, 13, 14, 21, 22, 31, 28, 26],
        "t_feat_1": [1.1, 1.2, 1.3, 1, 2.1, 2.2, 3.1, 2.3, 2.0],
    }
)
# Set the 2-level index:
time_series_df.set_index(keys=["sample_idx", "time_idx"], drop=True, inplace=True)
# Create a static data dataframe.
static_df = pd.DataFrame(
    {
        "s_feat_0": [100, 200, 300],
        "s_feat_1": [-1.1, -1, -1.3],
    },
    index=["sample_0", "sample_1", "sample_2"],
)
# Create an event dataframe.
event_df = pd.DataFrame(
    {
        "e_feat_0": [(10, True), (12, False), (13, True)],
    },
    index=["sample_0", "sample_1", "sample_2"],
)
# Create a dataset of time-to-event analysis task:
data = TimeToEventAnalysisDataset(
    time_series=time_series_df,
    static=static_df,
    targets=event_df,
)
But the dynamic_deephit plugin does not work on the above dataset with the following errors.
It seems that the dataset was not fit-ready. I checked that I have all the components for time to event dataset. Could you help with this please?
Thanks very much,
Wenjuan
Add the landmarking prediction model
Reference