vanderschaarlab / temporai


89.0 7.0 19.0 4.64 MB

TemporAI: ML-centric Toolkit for Medical Time Series

Home Page: https://www.temporai.vanderschaar-lab.com/

License: Apache License 2.0

Python 68.46% Jupyter Notebook 31.54%

machine-learning medicine time-series automl

temporai's Issues

Plugins cannot contain the `tempor` substring in their name

Description

A plugin whose name contains `tempor`, such as plugin_temporal_minmax_scaler.py / temporal_minmax_scaler, breaks the PluginLoader.
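The issue does not show the PluginLoader internals, so the following is only a hedged illustration of this class of bug, assuming plugin names are derived from file names. Removing a package substring wherever it occurs mangles `temporal_minmax_scaler`, whereas stripping an exact leading prefix does not. Both function names are hypothetical.

```python
from pathlib import Path


def fragile_name(path: str) -> str:
    # Anti-pattern (hypothetical): removes the substrings wherever they appear,
    # so "temporal_..." loses its leading "tempor".
    return Path(path).stem.replace("plugin_", "").replace("tempor", "")


def robust_name(path: str, prefix: str = "plugin_") -> str:
    # Strip the prefix only when the file name actually starts with it.
    stem = Path(path).stem
    return stem[len(prefix):] if stem.startswith(prefix) else stem
```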

[Enhancement] Evaluation: Add more metrics for evaluating survival analysis tasks

Feature Description

One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.

To that end, metrics are needed for every supported problem type.

One of them is evaluating survival analysis tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.

The metrics should be reported for each evaluation time horizon and aggregated (mean, std).

Important metrics to cover here:
[X] c_index: The concordance index (c-index) evaluates the risk ranking produced by a survival model. It is defined as the number of concordant pairs divided by the total number of comparable (evaluable) pairs.
[X] brier_score: The Brier score is a strictly proper scoring rule that measures the accuracy of probabilistic predictions.

aucroc: the Area Under the Receiver Operating Characteristic Curve (ROC AUC), computed from prediction scores.
sensitivity: Sensitivity (true positive rate) is the probability of a positive test result, conditioned on the individual truly being positive.
specificity: Specificity (true negative rate) is the probability of a negative test result, conditioned on the individual truly being negative.
PPV: The positive predictive value (PPV) is the probability that an individual with a positive test result truly has the disease.
NPV: The negative predictive value (NPV) is the probability that an individual with a negative test result truly does not have the disease.
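The issue does not prescribe an implementation. As a minimal NumPy sketch (all function names hypothetical, not TemporAI's API), this shows Harrell's concordance index, a naive Brier score at a single horizon, and a per-horizon mean/std summary across folds. The Brier version simply drops subjects censored before the horizon; a production implementation would normally apply inverse-probability-of-censoring weights instead.

```python
from typing import Dict, List, Tuple

import numpy as np


def harrell_c_index(times, events, risk) -> float:
    """Harrell's C: concordant / comparable pairs (higher risk should fail earlier)."""
    t = np.asarray(times, float)
    e = np.asarray(events, bool)
    r = np.asarray(risk, float)
    concordant, comparable = 0.0, 0
    for i in np.flatnonzero(e):  # only observed events can anchor a pair
        later = t > t[i]  # subjects still event-free after t_i
        comparable += int(later.sum())
        concordant += float((r[i] > r[later]).sum())
        concordant += 0.5 * float((r[i] == r[later]).sum())  # risk ties count half
    return concordant / comparable if comparable else float("nan")


def brier_at_horizon(times, events, event_prob, horizon: float) -> float:
    """Naive Brier score at `horizon`; subjects censored before it are dropped (no IPCW)."""
    t = np.asarray(times, float)
    e = np.asarray(events, bool)
    p = np.asarray(event_prob, float)  # predicted P(event by horizon)
    had_event = (t <= horizon) & e
    known = had_event | (t > horizon)  # status at the horizon is observed
    return float(np.mean((p[known] - had_event[known].astype(float)) ** 2))


def summarize(per_fold: List[Dict[float, float]]) -> Dict[float, Tuple[float, float]]:
    """Mean and (population) std of one metric per horizon across CV folds."""
    return {
        h: (float(np.mean([f[h] for f in per_fold])), float(np.std([f[h] for f in per_fold])))
        for h in per_fold[0]
    }
```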

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py

Enhance `DataSet` with methods for data splitting
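For time series, a split usually has to keep all time steps of one sample (e.g. one patient) on the same side to avoid leakage. A minimal sketch of that idea, operating on a plain array of sample IDs rather than on `DataSet` itself; the function name and signature are hypothetical.

```python
from typing import Sequence, Tuple

import numpy as np


def split_sample_ids(
    sample_ids: Sequence, test_size: float = 0.2, seed: int = 0
) -> Tuple[np.ndarray, np.ndarray]:
    """Split by *sample*, never by individual time step; returns (train_ids, test_ids)."""
    unique = np.unique(np.asarray(sample_ids))
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)
    n_test = max(1, int(round(test_size * len(unique))))
    return np.sort(unique[n_test:]), np.sort(unique[:n_test])
```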

Fix the problem of accessing static/class methods on plugins

[Epic] Testing

Description

All the classes should have test coverage.

Type of Test

Unit test (e.g. checking a loop, method, or function is working as intended)
Integration test (e.g. checking if a certain group or set of functionality is working as intended)
Regression test (e.g. checking that adding or removing a module of code still allows other systems to function as intended)
Stress test (e.g. checking how well a system performs under various conditions, including heavy usage)
Performance test (e.g. checking how efficiently a system performs the intended task)
Other...

[Evaluation] Add metrics for evaluating classification tasks

Feature Description

One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.

To that end, metrics are needed for every supported problem type.

One of them is evaluating classification tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.

Important metrics to cover here:

aucroc : the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
aucprc : The average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.
accuracy : Accuracy classification score.
f1_score(micro, macro, weighted): The F1 score is the harmonic mean of precision and recall. The "micro" average computes it globally from the total true positives, false negatives, and false positives; "macro" takes the unweighted mean of the per-class scores; "weighted" weights each per-class score by its support.
kappa: computes Cohen’s kappa, a score that expresses the level of agreement between two annotators on a classification problem.
precision(micro, macro, weighted): Precision is the number of true positives divided by the number of true positives plus false positives. The "micro" version computes it globally from the total true positives and false positives.
recall(micro, macro, weighted): Recall is the number of true positives divided by the number of true positives plus false negatives. The "micro" version computes it globally from the total true positives and false negatives.
mcc: The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
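As a hedged sketch of such an API for the binary case, assuming scikit-learn as the backend: the helper name, its output keys, and the fixed 0.5 threshold are hypothetical choices, not TemporAI's interface.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    cohen_kappa_score,
    f1_score,
    matthews_corrcoef,
    precision_score,
    recall_score,
    roc_auc_score,
)


def classification_metrics(y_true, y_score, threshold: float = 0.5) -> dict:
    """Binary metrics from ground truth and predicted positive-class scores."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, float)
    y_pred = (y_score >= threshold).astype(int)  # hard labels for threshold metrics
    out = {
        "aucroc": roc_auc_score(y_true, y_score),
        "aucprc": average_precision_score(y_true, y_score),
        "accuracy": accuracy_score(y_true, y_pred),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "mcc": matthews_corrcoef(y_true, y_pred),
    }
    for avg in ("micro", "macro", "weighted"):
        out[f"f1_score_{avg}"] = f1_score(y_true, y_pred, average=avg)
        out[f"precision_{avg}"] = precision_score(y_true, y_pred, average=avg)
        out[f"recall_{avg}"] = recall_score(y_true, y_pred, average=avg)
    return {k: float(v) for k, v in out.items()}
```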

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py

why not contribute this to sktime?

It would seem that this package is "cloning" a number of core aspects of sktime, including data format, base class design, etc.

It does add some novel aspects, but there aren't too many differences at the moment.
So, why develop this in complete detachment from the pydata ecosystem?

Long-term, it will be much harder to maintain if you insist on trying to build a parallel ecosystem targeted at medical doctors.

I understand the academic sensitivities of wanting to "own", but that's not really how open source works - the more you give and let go of, the more you get back, and the more successful you will be.

For instance, why not contribute this to sktime?

Add tutorials (notebooks + Colab links)

Description

The library should have a tutorial for each major feature.
The tutorials should be notebooks and should also be deployed on Colab for easier use.


Create issue templates

Create issue templates for the repository as described here

[Epic] Add more datasets

[Enhancement] Fix RNN contiguous memory warning (serialization)

After serializing and deserializing some models that include RNNs, the following warning is received:

UserWarning: RNN module weights are not part of single contiguous chunk of memory

The serialization mechanism needs to be improved to fix this problem.
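PyTorch's `nn.RNNBase.flatten_parameters()` re-compacts RNN weights into one contiguous buffer, and the full text of this warning suggests calling it. A minimal sketch of a post-deserialization hook, assuming it is acceptable to walk all submodules after loading; the helper name is hypothetical. The warning comes from the cuDNN (GPU) path, so on CPU the call should effectively do nothing.

```python
import pickle

import torch.nn as nn


def flatten_rnn_weights(model: nn.Module) -> nn.Module:
    """Re-compact the weights of every RNN/LSTM/GRU submodule after deserialization."""
    for module in model.modules():
        if isinstance(module, nn.RNNBase):
            module.flatten_parameters()
    return model


# Round-trip a small model through pickle, then restore contiguous RNN weights.
model = nn.ModuleDict({"encoder": nn.LSTM(input_size=4, hidden_size=8, batch_first=True)})
restored = flatten_rnn_weights(pickle.loads(pickle.dumps(model)))
```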

Embrace mypy strict typing

[Epic] Preprocessing plugins

Description

Preprocessing plugins, for scaling or dimensionality reduction.

Why?

Epics require a lot of work and often require a change in the scope of development. Justify your epic - why can't it just be a simple issue?

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

drop constant features - TODO
handle multicollinearity - TODO
drop low variance features - TODO
encode data
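Two of the TODO items can be sketched in NumPy, assuming the temporal data has been flattened to a 2D (rows = samples x time steps, columns = features) array; the function names and the default thresholds are hypothetical. `drop_correlated` is one simple greedy way to handle multicollinearity, and it should run after constant features are removed, since `np.corrcoef` returns NaN for zero-variance columns.

```python
from typing import List, Tuple

import numpy as np


def drop_low_variance(X: np.ndarray, threshold: float = 0.0) -> Tuple[np.ndarray, np.ndarray]:
    """Keep features whose variance (NaNs ignored) exceeds `threshold`; 0.0 drops constants."""
    keep = np.nanvar(X, axis=0) > threshold
    return X[:, keep], keep


def drop_correlated(X: np.ndarray, threshold: float = 0.95) -> Tuple[np.ndarray, List[int]]:
    """Greedily keep a feature only if |corr| with every kept feature is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept: List[int] = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], kept
```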

[AutoML] Add AutoML objective evaluation for survival analysis tasks

Feature Description

[AutoML] Add AutoML objective evaluation for classification tasks

Feature Description

For a sampled pipeline, the AutoML search process needs to evaluate an objective and use it to suggest the next points to explore.

For classification tasks, the evaluation metrics are documented in #15
The evaluation is done using the cross-validation tester documented in #20
Given a metric, the optimization process might seek to maximize or minimize the objective.

AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/risk_estimation.py
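A hedged, library-agnostic sketch of the evaluation step: score one sampled pipeline by k-fold cross-validation, then pick the best candidate while honoring whether the metric should be maximized or minimized. The `fit_predict` callable, both function names, and the plain shuffled folds are assumptions, not the tester referenced in #20.

```python
from typing import Any, Callable, Dict, Optional, Tuple

import numpy as np


def cv_objective(
    fit_predict: Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray],
    X: np.ndarray,
    y: np.ndarray,
    metric: Callable[[np.ndarray, np.ndarray], float],
    n_folds: int = 3,
    seed: int = 0,
) -> float:
    """Mean metric over k shuffled folds for one sampled pipeline."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        preds = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(metric(y[test_idx], preds))
    return float(np.mean(scores))


def select_best(
    candidates: Dict[str, Any],
    evaluate: Callable[[Any], float],
    direction: str = "maximize",
) -> Tuple[str, float]:
    """Return (name, score) of the best candidate; `direction` flips the comparison."""
    if direction not in ("maximize", "minimize"):
        raise ValueError(f"unknown direction: {direction}")
    sign = 1.0 if direction == "maximize" else -1.0
    best: Optional[Tuple[str, float]] = None
    for name, candidate in candidates.items():
        score = evaluate(candidate)
        if best is None or sign * score > sign * best[1]:
            best = (name, score)
    if best is None:
        raise ValueError("no candidates to evaluate")
    return best
```

Negating the score for minimization keeps a single comparison path, which is the same trick most optimizers use internally.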

[Testing] Add testing datasets

Some useful datasets for running unit tests: https://github.com/vanderschaarlab/synthcity/tree/main/src/synthcity/utils/datasets/time_series

[Enhancement] Integrate jaxtyping for advanced parameter validation #120

Description

An improvement on top of pydantic would be to integrate jaxtyping, which also allows validating tensor shapes.
jaxtyping supports PyTorch tensors and NumPy arrays.

Example

from jaxtyping import Array, Float, PyTree

# Accepts floating-point 2D arrays with matching dimensions
def matrix_multiply(x: Float[Array, "dim1 dim2"],
                    y: Float[Array, "dim2 dim3"]
                  ) -> Float[Array, "dim1 dim3"]:
    ...

def accepts_pytree_of_ints(x: PyTree[int]):
    ...

def accepts_pytree_of_arrays(x: PyTree[Float[Array, "batch c1 c2"]]):
    ...

https://github.com/google/jaxtyping

[Enhancement] Introduce plugin types

Introduce plugin types that can be listed separately, so that there is a clear separation between model, metric, data source, and other plugins.
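One possible shape for this, sketched under assumptions: a registry keyed by an enum of plugin types, filled by a class decorator. `PluginType`, `register`, `list_plugins`, and the example plugin names are all hypothetical, not TemporAI's existing API.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class PluginType(str, Enum):
    """Hypothetical plugin categories."""

    MODEL = "model"
    METRIC = "metric"
    DATA_SOURCE = "data_source"
    PREPROCESSING = "preprocessing"


_REGISTRY: Dict[PluginType, Dict[str, type]] = {t: {} for t in PluginType}


def register(plugin_type: PluginType, name: str) -> Callable[[type], type]:
    """Class decorator that files a plugin under its type."""

    def decorator(cls: type) -> type:
        if name in _REGISTRY[plugin_type]:
            raise ValueError(f"{plugin_type.value} plugin {name!r} already registered")
        _REGISTRY[plugin_type][name] = cls
        return cls

    return decorator


def list_plugins(plugin_type: Optional[PluginType] = None) -> Dict[str, List[str]]:
    """List plugin names, either for all types or for a single one."""
    types = [plugin_type] if plugin_type is not None else list(PluginType)
    return {t.value: sorted(_REGISTRY[t]) for t in types}


@register(PluginType.MODEL, "rnn_classifier")
class RNNClassifier: ...


@register(PluginType.METRIC, "c_index")
class CIndex: ...
```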

Serialization support for plugins and pipelines

Description

The plugins and pipelines should be easy to serialize/deserialize.

cloudpickle is a good starting point.
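A minimal sketch of save/load helpers built on cloudpickle (`cloudpickle.dumps` and `cloudpickle.loads`). Unlike the standard `pickle` module, cloudpickle can serialize lambdas and closures by value, which matters for pipelines holding custom callables. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Any, Union

import cloudpickle


def save(obj: Any) -> bytes:
    """Serialize a plugin, pipeline, or callable to bytes."""
    return cloudpickle.dumps(obj)


def load(buff: bytes) -> Any:
    """Inverse of `save`."""
    return cloudpickle.loads(buff)


def save_to_file(path: Union[str, Path], obj: Any) -> None:
    Path(path).write_bytes(save(obj))


def load_from_file(path: Union[str, Path]) -> Any:
    return load(Path(path).read_bytes())
```

Note that pickled objects should only ever be loaded from trusted sources, since unpickling can execute arbitrary code.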

Use sphinx-immaterial theme

We currently use the sphinx-material docs theme. The sphinx-immaterial theme is similar but better supported, so we should migrate.

Check whether this also resolves the problem of the nav bar logo not working with the actual TemporAI logo.

[Epic] Imputation models

Description

Add imputation plugins.

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

bfill/ffill
...
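A hedged sketch of the bfill/ffill item, assuming the data sits in a pandas DataFrame with a (`sample_idx`, `time_idx`) MultiIndex; the level name and function name are assumptions. Grouping by sample ensures values never leak from one sample into another, and a sample that is entirely missing stays NaN, so a second imputer would still be needed.

```python
import numpy as np
import pandas as pd


def ffill_bfill_impute(df: pd.DataFrame, sample_level: str = "sample_idx") -> pd.DataFrame:
    """Forward-fill, then back-fill, within each sample only."""
    filled = df.groupby(level=sample_level).ffill()
    return filled.groupby(level=sample_level).bfill()


index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1)],
    names=["sample_idx", "time_idx"],
)
df = pd.DataFrame({"hr": [np.nan, 70.0, np.nan, 80.0, np.nan]}, index=index)
imputed = ffill_bfill_impute(df)
```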

[Bug] Catboost is required, but fails to build (macOS 13.5)

Describe the bug
Installing the temporai package from pip or from GitHub fails because catboost, a required dependency, fails to build.
This is probably linked to #72, where the catboost dependency should have been removed, and to catboost/catboost#2371 (comment)

Platform:

macOS 13.5
python 3.11.5

To Reproduce
pip install temporai
or
pip install "temporai @ git+https://github.com/vanderschaarlab/temporai@daa4af2e3943e5639098a4459464012c007245a3"

Expected behavior
Build should not fail, and catboost should probably not be required.

Results
temporai build fails.

```
Collecting catboost>=1.0.5 (from hyperimpute>=0.1.17->temporai@ git+https://github.com/vanderschaarlab/temporai@daa4af2e3943e5639098a4459464012c007245a3)
  Using cached catboost-1.2.2.tar.gz (60.1 MB)

Building wheel for catboost (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for catboost (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [218 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-cpython-311
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/monoforest.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/plot_helpers.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/metrics.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/version.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/text_processing.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/datasets.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/core.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/dev_utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/metrics_plotter.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/ipythonwidget.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/callbacks.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/catboost_evaluation.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_model.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_readers.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/log_config.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_splitter.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/execution_case.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_storage.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/factor_utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/evaluation_result.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_models_handler.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
running build_ext
Buildling _catboost with cmake and ninja
target_platform=darwin-x86_64. Building targets _catboost with PIC
Running "cmake /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src -B /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311 -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/toolchains/clang.toolchain --log-level=VERBOSE -DCMAKE_POSITION_INDEPENDENT_CODE=On -DCATBOOST_COMPONENTS=python-package -DCMAKE_OSX_DEPLOYMENT_TARGET=11.0 -DHAVE_CUDA=no -DPython3_ROOT_DIR=/Users/jk1/opt/anaconda3/envs/treatment_effects"
-- The C compiler identification is AppleClang 14.0.3.14030022
-- The CXX compiler identification is AppleClang 14.0.3.14030022
-- The ASM compiler identification is Clang
-- Found assembler: /Library/Developer/CommandLineTools/usr/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python3: /Users/jk1/opt/anaconda3/envs/treatment_effects/bin/python3.1 (found version "3.11.5") found components: Interpreter
-- CMAKE_C_FLAGS = " -fexceptions -fno-common -fcolor-diagnostics -faligned-allocation -fdebug-default-version=4 -ffunction-sections -fdata-sections -Wall -Wextra -Wno-parentheses -Wno-implicit-const-int-float-conversion -Wno-unknown-warning-option -pipe -D_THREAD_SAFE -D_PTHREADS -D_REENTRANT -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__LONG_LONG_SUPPORTED -DLIBCXX_BUILDING_LIBCXXRT -D_FILE_OFFSET_BITS=64 -m64 -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpopcnt -mcx16 -DSSE_ENABLED=1 -DSSE3_ENABLED=1 -DSSSE3_ENABLED=1 -DSSE41_ENABLED=1 -DSSE42_ENABLED=1 -DPOPCNT_ENABLED=1 -DCX16_ENABLED=1"
-- CMAKE_CXX_FLAGS = " -fexceptions -fno-common -fcolor-diagnostics -faligned-allocation -fdebug-default-version=4 -ffunction-sections -fdata-sections -Wall -Wextra -Wno-parentheses -Wno-implicit-const-int-float-conversion -Wno-unknown-warning-option -pipe -D_THREAD_SAFE -D_PTHREADS -D_REENTRANT -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__LONG_LONG_SUPPORTED -DLIBCXX_BUILDING_LIBCXXRT -D_FILE_OFFSET_BITS=64 -m64 -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpopcnt -mcx16 -Woverloaded-virtual -Wimport-preprocessor-directive-pedantic -Wno-undefined-var-template -Wno-return-std-move -Wno-defaulted-function-deleted -Wno-pessimizing-move -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-enum-enum-conversion -Wno-deprecated-enum-float-conversion -Wno-ambiguous-reversed-operator -Wno-deprecated-volatile -DSSE_ENABLED=1 -DSSE3_ENABLED=1 -DSSSE3_ENABLED=1 -DSSE41_ENABLED=1 -DSSE42_ENABLED=1 -DPOPCNT_ENABLED=1 -DCX16_ENABLED=1"
-- Conan: checking conan executable
-- Conan: Found program /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/bin/conan
-- Conan: Version found Conan version 1.59.0
-- Conan executing: /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/bin/conan install /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src --remote conancenter --install-folder /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311 --build missing --env CONAN_CMAKE_GENERATOR=Ninja --settings build_type=Release --settings compiler=apple-clang --settings compiler.version=14.0 --settings compiler.libcxx=libc++ --settings compiler.cppstd=20 --conf tools.cmake.cmaketoolchain:generator=Ninja
Configuration:
[settings]
arch=x86_64
arch_build=x86_64
build_type=Release
compiler=apple-clang
compiler.cppstd=20
compiler.libcxx=libc++
compiler.version=14.0
os=Macos
os_build=Macos
[options]
[build_requires]
[env]
CONAN_CMAKE_GENERATOR=Ninja
[conf]
tools.cmake.cmaketoolchain:generator=Ninja

  Version ranges solved
      Version range '>=1.2.11 <2' required by 'pcre/8.45' resolved to 'zlib/1.3' in local cache
  
  conanfile.txt: Installing package
  Requirements
      libiconv/1.15 from 'conancenter' - Cache
      openssl/1.1.1t from 'conancenter' - Cache
  Packages
      libiconv/1.15:e1ef30a7ac2ff8c218173fdf49ec961a5c046a36 - Cache
      openssl/1.1.1t:a319f556f93546f2dff1b70922784b70e7cba919 - Cache
  Build requirements
      bzip2/1.0.8 from 'conancenter' - Cache
      pcre/8.45 from 'conancenter' - Cache
      ragel/6.10 from 'conancenter' - Cache
      swig/4.0.2 from 'conancenter' - Cache
      yasm/1.3.0 from 'conancenter' - Cache
      zlib/1.3 from 'conancenter' - Cache
  Build requirements packages
      bzip2/1.0.8:b9b85a7c8f543b96385e1da9e174853f1fb08e0c - Cache
      pcre/8.45:842afe377248eac66b64b538531df2b005d57959 - Cache
      ragel/6.10:801752c0480319b8e090188c566245a78e9abcf4 - Cache
      swig/4.0.2:099d7b9cd06e9bd11e92b9a2ddf3b29cd986fdcb - Cache
      yasm/1.3.0:801752c0480319b8e090188c566245a78e9abcf4 - Cache
      zlib/1.3:a319f556f93546f2dff1b70922784b70e7cba919 - Cache
  
  Installing (downloading, building) binaries...
  bzip2/1.0.8: Already installed!
  libiconv/1.15: Already installed!
  openssl/1.1.1t: Already installed!
  ragel/6.10: Already installed!
  ragel/6.10: Appending PATH environment variable: /Users/jk1/.conan/data/ragel/6.10/_/_/package/801752c0480319b8e090188c566245a78e9abcf4/bin
  yasm/1.3.0: Already installed!
  yasm/1.3.0: Appending PATH environment variable: /Users/jk1/.conan/data/yasm/1.3.0/_/_/package/801752c0480319b8e090188c566245a78e9abcf4/bin
  zlib/1.3: Already installed!
  pcre/8.45: Already installed!
  swig/4.0.2: Already installed!
  swig/4.0.2: Appending PATH environment variable: /Users/jk1/.conan/data/swig/4.0.2/_/_/package/099d7b9cd06e9bd11e92b9a2ddf3b29cd986fdcb/bin
  conanfile.txt: Applying build-requirement: ragel/6.10
  conanfile.txt: Applying build-requirement: swig/4.0.2
  conanfile.txt: Applying build-requirement: yasm/1.3.0
  conanfile.txt: Applying build-requirement: pcre/8.45
  conanfile.txt: Applying build-requirement: bzip2/1.0.8
  conanfile.txt: Applying build-requirement: zlib/1.3
  conanfile.txt: Generator cmake_find_package created Findragel.cmake
  conanfile.txt: Generator cmake_find_package created FindSWIG.cmake
  conanfile.txt: Generator cmake_find_package created Findyasm.cmake
  conanfile.txt: Generator cmake_find_package created FindIconv.cmake
  conanfile.txt: Generator cmake_find_package created FindOpenSSL.cmake
  conanfile.txt: Generator cmake_find_package created FindPCRE.cmake
  conanfile.txt: Generator cmake_find_package created FindBZip2.cmake
  conanfile.txt: Generator cmake_find_package created FindZLIB.cmake
  conanfile.txt: Generator cmake_paths created conan_paths.cmake
  conanfile.txt: Generator txt created conanbuildinfo.txt
  conanfile.txt: Aggregating env generators
  conanfile.txt: Generated conaninfo.txt
  conanfile.txt: Generated graphinfo
  conanfile.txt imports(): Copied 434 '.i' files
  conanfile.txt imports(): Copied 273 '.swg' files
  conanfile.txt imports(): Copied 1 '.swig' file: Makefile.swig
  conanfile.txt imports(): Copied 2 '.ml' files: swig.ml, swigp4.ml
  conanfile.txt imports(): Copied 1 '.pl' file: Makefile.pl
  conanfile.txt imports(): Copied 6 files
  conanfile.txt imports(): Copied 1 '.rb' file: extconf.rb
  conanfile.txt imports(): Copied 1 '.h' file: noembed.h
  conanfile.txt imports(): Copied 1 '.scm' file: common.scm
  conanfile.txt imports(): Copied 1 '.mli' file: swig.mli
  conanfile.txt imports(): Copied 1 '.hpp' file: octheaders.hpp
  CMake Error at /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
    Could NOT find Python3 (missing: Python3_INCLUDE_DIRS Python3_LIBRARIES
    Development Development.Module Development.Embed)
  Call Stack (most recent call first):
    /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
    /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPython/Support.cmake:3166 (find_package_handle_standard_args)
    /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPython3.cmake:485 (include)
    catboost/python-package/catboost/CMakeLists.darwin-x86_64.txt:9 (find_package)
    catboost/python-package/catboost/CMakeLists.txt:20 (include)
  
  
  -- Configuring incomplete, errors occurred!
  See also "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311/CMakeFiles/CMakeOutput.log".
  See also "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311/CMakeFiles/CMakeError.log".
  Traceback (most recent call last):
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
      return _build_backend().build_wheel(wheel_directory, config_settings,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 434, in build_wheel
      return self._build_with_temp_dir(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 419, in _build_with_temp_dir
      self.run_setup()
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 507, in run_setup
      super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 341, in run_setup
      exec(code, locals())
    File "<string>", line 731, in <module>
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
             ^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 397, in run
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 364, in run
      self.run_command("build")
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 332, in run
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 131, in run
      self.run_command(cmd_name)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 444, in run
    File "<string>", line 462, in build_with_cmake_and_ninja
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/build_native.py", line 517, in build
      cmd_runner.run(cmake_cmd, env=build_environ)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/build_native.py", line 164, in run
      subprocess.run(cmd, check=True, **subprocess_run_kwargs)
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/subprocess.py", line 571, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['cmake', '/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src', '-B', '/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_TOOLCHAIN_FILE=/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/toolchains/clang.toolchain', '--log-level=VERBOSE', '-DCMAKE_POSITION_INDEPENDENT_CODE=On', '-DCATBOOST_COMPONENTS=python-package', '-DCMAKE_OSX_DEPLOYMENT_TARGET=11.0', '-DHAVE_CUDA=no', '-DPython3_ROOT_DIR=/Users/jk1/opt/anaconda3/envs/treatment_effects']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for catboost
Successfully built temporai
Failed to build catboost
ERROR: Could not build wheels for catboost, which is required to install pyproject.toml-based projects

```

[Enhancement] Achieve full docstring coverage

There are still a few places where docstrings are needed. This enhancement would get docstring coverage to 100%.

[Docs] Add docstrings for all the important classes

[AutoML] Add AutoML objective evaluation for classification tasks

Feature Description

For a sampled pipeline, the AutoML search process needs to evaluate an objective and use it to suggest the next points to explore.

For classification tasks, the evaluation metrics are documented in #11
The benchmark is done using the cross-validation tester documented in #20.
Given a metric, the optimization process might seek to maximize or minimize the objective.

AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/classifiers.py

Upgrade to pydantic 2

Pydantic 2.0 is now the current version, so the code needs to be updated to use it rather than Pydantic 1.
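For orientation, a small sketch of the most common v1 to v2 renames on a hypothetical parameter model: `@validator` becomes `@field_validator` (stacked with `@classmethod`), `.dict()` becomes `.model_dump()`, and `parse_obj` becomes `model_validate`. Other renames not shown include `.json()` to `.model_dump_json()` and `class Config` to `model_config = ConfigDict(...)`.

```python
from pydantic import BaseModel, field_validator


class TrainingParams(BaseModel):
    """Hypothetical parameter model, used only to show the v1 -> v2 renames."""

    n_iter: int = 100
    lr: float = 1e-3

    # v1: @validator("lr")
    @field_validator("lr")
    @classmethod
    def lr_must_be_positive(cls, v: float) -> float:
        if v <= 0:
            raise ValueError("lr must be positive")
        return v


params = TrainingParams(lr=0.01)
as_dict = params.model_dump()  # v1: params.dict()
parsed = TrainingParams.model_validate({"n_iter": 5})  # v1: TrainingParams.parse_obj(...)
```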

Handle SyncTwin predict_counterfactuals

[Enhancement] Add AutoML objective evaluation for ensembles

Feature Description

Given a set of K optimal pipelines selected by the AutoML logic for an objective, the next step is to evaluate ensembles built from the top candidate pipelines.

For the weighted ensemble, a separate AutoML search can be run to evaluate various weights.
The process benchmarks all the supported ensemble setups (weighted, stacked, voting, etc.) and returns the optimal solution.
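The weight search can be sketched as plain random search over the probability simplex, sampling weights from a Dirichlet distribution and keeping the best validation score. This is a hedged stand-in for a proper AutoML search; the function name, the (models, samples, classes) input layout, and the maximize-only convention are assumptions.

```python
from typing import Callable, Tuple

import numpy as np


def search_ensemble_weights(
    val_probas,
    y_val,
    score_fn: Callable[[np.ndarray, np.ndarray], float],
    n_trials: int = 200,
    seed: int = 0,
) -> Tuple[np.ndarray, float]:
    """Random search for ensemble weights that maximize `score_fn` on validation data."""
    rng = np.random.default_rng(seed)
    probas = np.asarray(val_probas, float)  # shape (n_models, n_samples, n_classes)
    n_models = probas.shape[0]
    best_w = np.full(n_models, 1.0 / n_models)  # start from the uniform average
    best = score_fn(y_val, np.tensordot(best_w, probas, axes=1))
    for _ in range(n_trials):
        w = rng.dirichlet(np.ones(n_models))  # non-negative weights summing to 1
        s = score_fn(y_val, np.tensordot(w, probas, axes=1))
        if s > best:
            best, best_w = s, w
    return best_w, float(best)


def accuracy(y, p):
    return float(np.mean(np.argmax(p, axis=1) == np.asarray(y)))
```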

depends on #8, #7, #6, #5, #13

AP references:

[CI] Github workflows

Description

Before releasing, the library should be tested on the matrix {macOS, Windows, Linux} x Python {3.7, 3.8, 3.9, 3.10} for compatibility.

On each test scenario, all the unit tests should pass.

Reference workflow: https://github.com/vanderschaarlab/autoprognosis/blob/main/.github/workflows/test.yml

Additional notes:

Add a notebook testing workflow a la https://github.com/vanderschaarlab/synthcity/blob/main/tests/nb_eval.py
Include doctests in workflow
As part of this, get rid of requirements-dev.txt and move those dependencies into a testing extra in setup.cfg

[AutoML] Add AutoML objective evaluation for regression tasks

Feature Description

For a sampled pipeline, the AutoML search process needs to evaluate an objective and use it to suggest the next points to explore.

For regression tasks, the evaluation metrics are documented in #10
The benchmark is done using the cross-validation tester documented in #20
Given a metric, the optimization process might seek to maximize or minimize the objective.

AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/regression.py

time_to_event models and scaling methods do not work with categorical data

Hi, I found that the time_to_event models and scaling methods do not work with categorical data.

To reproduce it, go to tutorials/data/tutorial1_data_format.ipynb, use the data example (comment out one line of the event data), and run the time-to-event models.

I have only tried the time-to-event models and scaling methods; other methods might not work on categorical data either. According to the data format tutorial, pandas.Categorical is supported for column values, is that right?

Thanks for looking into it!
Wenjuan
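Until categorical columns are handled natively, one possible workaround (an assumption, not a confirmed fix) is to one-hot encode them into numeric columns before passing the data to the scalers or time-to-event models. `one_hot_encode_categoricals` is a hypothetical helper built on `pandas.get_dummies`.

```python
import pandas as pd


def one_hot_encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Replace every categorical/object column with numeric 0/1 indicator columns."""
    cat_cols = df.select_dtypes(include=["category", "object"]).columns.tolist()
    return pd.get_dummies(df, columns=cat_cols, dtype=float)


static_df = pd.DataFrame(
    {"age": [61.0, 45.0, 70.0], "sex": pd.Categorical(["F", "M", "F"])},
    index=["sample_0", "sample_1", "sample_2"],
)
encoded = one_hot_encode_categoricals(static_df)  # columns: age, sex_F, sex_M
```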

[Enhancement] Ensemble support

Feature Description

Given a set of pipelines (or just estimators), users should be able to create ensembles.

Popular ensemble techniques

WeightedEnsemble: average across all scores/prediction results, optionally with weights
Stacking (meta-ensembling): use a meta-learner trained on the base classifiers' outputs
Majority Vote Ensemble
DCS: Dynamic Classifier Selection: Combination of multiple classifiers using local accuracy estimates
DES: Dynamic Ensemble Selection: From dynamic classifier selection to dynamic ensemble selection
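A minimal NumPy sketch of the two simplest techniques in the list above, weighted averaging of class probabilities and hard majority voting. The function names are hypothetical; stacking, DCS and DES need a meta-learner or local competence estimates and are not shown.

```python
import numpy as np


def weighted_average_ensemble(probas, weights=None) -> np.ndarray:
    """Average class probabilities across models.

    probas: shape (n_models, n_samples, n_classes); weights: shape (n_models,).
    """
    probas = np.asarray(probas, dtype=float)
    w = np.ones(len(probas)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so each row is still a probability distribution
    return np.tensordot(w, probas, axes=1)  # -> (n_samples, n_classes)


def majority_vote_ensemble(labels) -> np.ndarray:
    """Hard voting over integer labels of shape (n_models, n_samples).

    Ties are broken in favor of the smallest class label.
    """
    labels = np.asarray(labels)
    n_classes = int(labels.max()) + 1
    # counts[c, j] = number of models predicting class c for sample j
    counts = np.apply_along_axis(np.bincount, 0, labels, minlength=n_classes)
    return counts.argmax(axis=0)


probas = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.6, 0.4]]]
avg = weighted_average_ensemble(probas, weights=[3, 1])  # [[0.8, 0.2], [0.3, 0.7]]
votes = majority_vote_ensemble([[0, 1, 2], [0, 2, 2], [1, 2, 0]])  # [0, 2, 2]
```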

Reference code in AutoPrognosis: https://github.com/vanderschaarlab/autoprognosis/tree/main/src/autoprognosis/plugins/ensemble

More on this here: https://github.com/yzhao062/combo

[Feat] Add pipeline logic

Feature Description

The library should offer the possibility to execute multiple plugins in sequence and to sample hyperparameters for all of them.

Sampling hyperparameters requires only the class, so that no unnecessary object has to be instantiated.
To that end, metaclasses can be created in Python by inheriting from the `type` class directly.

The pipeline wrapper should offer the following interface:

fit - train the pipeline
predict - transform (for the preprocessing plugins in the pipeline) and predict
hyperparameter_space/sample_hyperparameters - should call the sampling logic of each plugin.

Reference implementation https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/plugins/pipeline/__init__.py
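A minimal sketch of the metaclass approach, assuming every plugin is a class exposing a static `hyperparameter_space()` that returns `{param: [choices]}`, plus `fit`/`transform` (preprocessing) or `fit`/`predict` (last step). `PipelineMeta`, `PipelineBase`, `make_pipeline` and the two toy plugins are hypothetical, not temporai's or AutoPrognosis's actual API. Because `hyperparameter_space`/`sample_hyperparameters` are defined on the metaclass, they are callable on the pipeline class without creating an instance.

```python
import random
from typing import Any, Dict, List, Optional

import numpy as np


class PipelineMeta(type):
    """Methods defined here are callable on the pipeline *class*."""

    def hyperparameter_space(cls) -> Dict[str, List[Any]]:
        # Namespace each plugin's parameters as "<PluginName>.<param>".
        return {
            f"{p.__name__}.{name}": choices
            for p in cls.plugins
            for name, choices in p.hyperparameter_space().items()
        }

    def sample_hyperparameters(cls, seed: Optional[int] = None) -> Dict[str, Any]:
        rng = random.Random(seed)
        return {k: rng.choice(v) for k, v in cls.hyperparameter_space().items()}


class PipelineBase:
    plugins: List[type] = []

    def __init__(self, **params: Any) -> None:
        # Route each "<PluginName>.<param>" entry to that plugin's constructor.
        self.steps = []
        for p in self.plugins:
            prefix = f"{p.__name__}."
            kwargs = {k[len(prefix):]: v for k, v in params.items() if k.startswith(prefix)}
            self.steps.append(p(**kwargs))

    def fit(self, X: np.ndarray, y: np.ndarray) -> "PipelineBase":
        for step in self.steps[:-1]:  # preprocessing steps: fit, then transform
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)  # the last plugin is the predictor
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


def make_pipeline(plugins: List[type]) -> type:
    """Create a new pipeline class (not an instance) for the given plugin classes."""
    name = "Pipeline_" + "_".join(p.__name__ for p in plugins)
    return PipelineMeta(name, (PipelineBase,), {"plugins": list(plugins)})


class Standardize:  # hypothetical preprocessing plugin, nothing to tune
    @staticmethod
    def hyperparameter_space() -> Dict[str, List[Any]]:
        return {}

    def fit(self, X, y):
        self.mean_, self.std_ = X.mean(axis=0), X.std(axis=0) + 1e-12
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_


class Ridge:  # hypothetical prediction plugin: closed-form ridge regression
    @staticmethod
    def hyperparameter_space() -> Dict[str, List[Any]]:
        return {"alpha": [0.01, 0.1, 1.0]}

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, X, y):
        Xb = np.c_[np.ones(len(X)), X]
        penalty = self.alpha * np.eye(Xb.shape[1])
        penalty[0, 0] = 0.0  # do not penalize the intercept
        self.coef_ = np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)
        return self

    def predict(self, X):
        return np.c_[np.ones(len(X)), X] @ self.coef_


Pipe = make_pipeline([Standardize, Ridge])
params = Pipe.sample_hyperparameters(seed=0)  # {"Ridge.alpha": <one of the choices>}
X = np.random.default_rng(0).normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0
preds = Pipe(**params).fit(X, y).predict(X)
```

Serialization (the third test item) is not covered by this sketch; since the pipeline class is created dynamically, pickling it would need extra handling (e.g. storing the plugin names and re-creating the class on load).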

The feature should be covered by tests, covering:

a standard pipeline train and predict.
hyperparameter sampling and pipeline instantiation.
serialization

depends on https://github.com/vanderschaarlab/temporai-priv/issues/5

[Epic] Prediction models

Description

Add prediction models

Why?

Epics require a lot of work and often require a change in the scope of development. Justify your epic - why can't it just be a simple issue?

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

[Feat] bfill/ffill imputation methods

Feature Description

Add basic bfill/ffill imputation methods.

Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ffill.html

The methods should be covered by tests.
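A minimal pandas sketch, assuming time series are stored with the `(sample_idx, time_idx)` MultiIndex used in the data format tutorial and that filling should never cross from one sample into the next. `ffill_bfill_impute` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def ffill_bfill_impute(df: pd.DataFrame, sample_level: str = "sample_idx") -> pd.DataFrame:
    """Forward-fill, then backward-fill missing values within each sample.

    Grouping by the sample index level stops values leaking from one sample's
    time series into the next. A column that is entirely missing for a sample
    stays NaN.
    """
    filled = df.groupby(level=sample_level).ffill()
    return filled.groupby(level=sample_level).bfill()


index = pd.MultiIndex.from_tuples(
    [("sample_0", 1), ("sample_0", 2), ("sample_0", 3),
     ("sample_1", 1), ("sample_1", 2), ("sample_1", 3)],
    names=["sample_idx", "time_idx"],
)
df = pd.DataFrame({"t_feat_0": [1.0, np.nan, np.nan, np.nan, 5.0, np.nan]}, index=index)
imputed = ffill_bfill_impute(df)  # t_feat_0 -> [1, 1, 1, 5, 5, 5]
```

Without the groupby, a plain `df.ffill()` would copy `sample_0`'s last value into the first time step of `sample_1`.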

[Feat] Reproducibility

Feature Description

Every plugin/pipeline or API call should support fixing the random seed.

Global random seeds can be set with a helper such as:

# stdlib
import random

# third party
import numpy as np
import torch


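def enable_reproducible_results(random_state: int = 0) -> None:
    """Seed the global NumPy, PyTorch and Python random number generators."""
    np.random.seed(random_state)
    torch.manual_seed(random_state)  # seeds the RNGs of all devices (CPU and CUDA)
    random.seed(random_state)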
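Setting global seeds also affects unrelated code that runs afterwards. A scoped variant is sketched below as a context manager that seeds the Python and NumPy global RNGs inside a block and restores their previous states on exit. `fixed_seed` is a hypothetical name, and torch is left out only to keep the sketch dependency-free.

```python
import contextlib
import random
from typing import Iterator

import numpy as np


@contextlib.contextmanager
def fixed_seed(random_state: int = 0) -> Iterator[None]:
    """Seed Python's and NumPy's global RNGs inside the block, then restore
    the previous RNG states so surrounding code is unaffected."""
    py_state = random.getstate()
    np_state = np.random.get_state()
    random.seed(random_state)
    np.random.seed(random_state)
    try:
        yield
    finally:
        random.setstate(py_state)
        np.random.set_state(np_state)


with fixed_seed(42):
    a, ra = np.random.rand(3), random.random()
with fixed_seed(42):
    b, rb = np.random.rand(3), random.random()  # identical to a, ra

np.random.seed(7)
before = np.random.rand()
np.random.seed(7)
with fixed_seed(0):
    np.random.rand()  # draws inside the block do not disturb the outer stream
after = np.random.rand()  # equals `before`
```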

[Enhancement] Evaluation: Add more metrics for benchmarking - treatment effects

Example metrics from CATENets, to be adapted here:

https://github.com/AliciaCurth/CATENets/blob/main/catenets/experiment_utils/torch_metrics.py
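For reference, two standard treatment-effect metrics sketched in NumPy, assuming the true individual treatment effects are known (as in semi-synthetic benchmarks). The function names are hypothetical, and this is not the CATENets code itself.

```python
import numpy as np


def sqrt_pehe(tau_true, tau_pred) -> float:
    """Root Precision in Estimation of Heterogeneous Effects:
    sqrt(mean((tau_true - tau_pred) ** 2)) over individual treatment effects."""
    tau_true = np.asarray(tau_true, dtype=float)
    tau_pred = np.asarray(tau_pred, dtype=float)
    return float(np.sqrt(np.mean((tau_true - tau_pred) ** 2)))


def abs_error_ate(tau_true, tau_pred) -> float:
    """Absolute error of the estimated average treatment effect (ATE)."""
    return float(abs(np.mean(tau_true) - np.mean(tau_pred)))


pehe = sqrt_pehe([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # sqrt(4/3) ≈ 1.1547
ate_err = abs_error_ate([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # |2 - 8/3| ≈ 0.6667
```

PEHE penalizes errors on each individual's effect, while the ATE error only checks the population average, so a model can have a small ATE error and a large PEHE.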

nn_regressor: Pydantic crash

Description

Pydantic validation appears to impose a limit on the number of temporai objects that can be instantiated: after many instantiations, validation fails with a RecursionError.

Example: In the test_nn_regressor.py, the following snippet will crash

def test_hyperparam_sample():
    for repeat in range(10000):  # pylint: disable=unused-variable
        args = plugin._cls.sample_hyperparameters()  # pylint: disable=no-member, protected-access
        plugin(**args)

with the error

>   ???
E   pydantic.error_wrappers.ValidationError: 1 validation error for _InitArgsValidator
E   __root__
E     Model parameters could not be validated as defined by `EmptyParamsDefinition`, cause: 
E   ---------------
E   RecursionError:
E   maximum recursion depth exceeded
E   ---------------
E    (type=value_error)

pydantic/main.py:342: ValidationError

Expected behaviour

Pydantic should not limit the functionality of the library.

Add clairvoyance2 dataset conversions

Fix test failure due to `cmaes` import failure with optuna 3.4+

Use nbsphinx for tutorials / user guide in docs

The nbsphinx extension is designed for integrating Jupyter notebooks into Sphinx documentation, so we should use it instead of the custom code in docs/pre_build.py. Ideally, we should also find a way to put a Colab link at the top of each tutorial in the docs.
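A minimal docs/conf.py sketch, assuming the notebooks sit in the docs source tree under the same relative paths as in the repository's main branch (e.g. tutorials/data/tutorial1_data_format.ipynb), so that `env.docname` maps directly onto the GitHub path Colab opens. `nbsphinx_execute` and `nbsphinx_prolog` are nbsphinx configuration options; the path mapping is an assumption and would need adjusting if notebooks are copied elsewhere.

```python
# docs/conf.py (sketch)
extensions = [
    "nbsphinx",  # render .ipynb files as documentation pages
]

# Use the outputs stored in the notebooks instead of re-executing them at build time.
nbsphinx_execute = "never"

# reST (rendered via Jinja) prepended to every notebook page. `env.docname` is the
# page path relative to the docs source directory, without the file extension.
# ASSUMPTION: docs paths mirror the repository layout on the main branch.
nbsphinx_prolog = r"""
.. raw:: html

    <a href="https://colab.research.google.com/github/vanderschaarlab/temporai/blob/main/{{ env.docname }}.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
"""
```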

[Feat] Prediction models: Add Dynamic DeepHit

Feature Description

Dynamic DeepHit prediction model.

Reference implementation in PyTorch: https://github.com/vanderschaarlab/synthcity/blob/main/src/synthcity/plugins/core/models/time_series_survival/ts_surv_dynamic_deephit.py

Add PyPI release workflows

Feature Description

The library should be automatically uploaded to PyPI on release.

Reference release script https://github.com/vanderschaarlab/autoprognosis/blob/main/.github/workflows/release.yml

Investigate docs API generation

The module reference (API) is not rendered well in the docs; this needs investigating.

Currently it looks like this: https://temporai.readthedocs.io/en/latest/api/modules.html
It should look more like: https://www.statsmodels.org/stable/api.html#
This could be related to the theme (sphinx-material) or to autodoc.

Warnings are raised when building the documentation, for example:

/mnt/data-fourtb/Dropbox/Programming/wsl_repos/_vds/temporai/docs/../src/tempor/data/pandera_utils.py:docstring of tempor.data.pandera_utils:1: WARNING: Inline interpreted text or phrase reference start-string without end-string.
...

Investigate and fix these.

[AutoML] Create pipeline from hyperparameters

Feature Description

For AutoML search, it is important to be able to sample hyperparameters and to recreate the pipeline from those hyperparameters.

AutoPrognosis implements the following strategy:

For each search task, the user can select imputation, preprocessing and prediction plugins.
For each pipeline, the prediction plugin "drives" the whole pipeline selection.
To that end, we artificially extend the predictor hyperparameters to include the imputation and preprocessing plugins. In other words, the user samples from predictor's [imputation plugin 1, 2, 3, ...] + [preprocessing plugin 1, 2, 3, ...] + hyperparam_space. This simplifies the sampling process.
Given a sampled preprocessing/imputation plugin and a set of hyperparameters, the user must be able to create the complete pipeline.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/core/selector.py
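A minimal sketch of this strategy, assuming hyperparameter spaces are plain `{name: [choices]}` dicts. The plugin names and spaces below are placeholders chosen for illustration, and the functions are hypothetical rather than AutoPrognosis's actual selector API. The imputer and preprocessor choices become two extra categorical hyperparameters of the predictor, and a flat sample is turned back into an ordered pipeline spec.

```python
import random
from typing import Any, Dict, List, Tuple

# Hypothetical plugin names and spaces, for illustration only.
IMPUTERS = ["ffill", "bfill"]
PREPROCESSORS = ["ts_minmax_scaler", "ts_standard_scaler"]
PREDICTOR = "nn_regressor"
PREDICTOR_SPACE: Dict[str, List[Any]] = {"lr": [1e-2, 1e-3, 1e-4], "n_iter": [100, 200]}


def extended_space(predictor_space, imputers, preprocessors) -> Dict[str, List[Any]]:
    """Prepend the imputer/preprocessor choices as categorical hyperparameters."""
    space = {"imputer": list(imputers), "preprocessor": list(preprocessors)}
    space.update(predictor_space)
    return space


def sample(space: Dict[str, List[Any]], seed: int) -> Dict[str, Any]:
    rng = random.Random(seed)
    return {name: rng.choice(choices) for name, choices in space.items()}


def pipeline_from_hyperparameters(hps: Dict[str, Any], predictor: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Turn one flat sample into an ordered pipeline spec: [(plugin, params), ...]."""
    hps = dict(hps)  # do not mutate the caller's dict
    imputer = hps.pop("imputer")
    preprocessor = hps.pop("preprocessor")
    return [(imputer, {}), (preprocessor, {}), (predictor, hps)]


space = extended_space(PREDICTOR_SPACE, IMPUTERS, PREPROCESSORS)
hps = sample(space, seed=0)
spec = pipeline_from_hyperparameters(hps, PREDICTOR)
```

Keeping the plugin choices inside the predictor's space means a single sampler covers the whole pipeline, which is the simplification the strategy above aims for.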

Blocked by: #27

[Feat] Benchmarking tools

Feature Description

The library should offer methods for evaluating predictive models/pipelines.

For each problem type, there can be different relevant metrics, as described in the linked tasks.

The evaluation should be done using KFold (regression) or StratifiedKFold (classification, survival analysis) with a predefined random seed.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
Example implementation for time series survival analysis:
https://github.com/vanderschaarlab/synthcity/blob/main/src/synthcity/plugins/core/models/time_series_survival/benchmarks.py#L142
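A minimal sketch of such a tester using scikit-learn's `KFold`/`StratifiedKFold`, assuming models follow the scikit-learn `fit`/`predict_proba` convention. `evaluate_with_kfold` and `auc_score` are hypothetical names; for survival analysis, the split would presumably be stratified on the event indicator, which is an assumption here.

```python
from typing import Any, Callable, Dict, List

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold, StratifiedKFold


def evaluate_with_kfold(
    make_model: Callable[[], Any],
    X: np.ndarray,
    y: np.ndarray,
    score_fn: Callable[[Any, np.ndarray, np.ndarray], float],
    n_folds: int = 3,
    stratify: bool = True,
    random_state: int = 0,
) -> Dict[str, Any]:
    """Score a model with (Stratified)KFold and a fixed seed.

    Returns the per-fold scores plus their mean and std.
    """
    splitter_cls = StratifiedKFold if stratify else KFold
    splitter = splitter_cls(n_splits=n_folds, shuffle=True, random_state=random_state)
    scores: List[float] = []
    for train_idx, test_idx in splitter.split(X, y):  # KFold ignores y
        model = make_model()  # fresh, unfitted model for every fold
        model.fit(X[train_idx], y[train_idx])
        scores.append(score_fn(model, X[test_idx], y[test_idx]))
    return {"mean": float(np.mean(scores)), "std": float(np.std(scores)), "folds": scores}


def auc_score(model: Any, X_test: np.ndarray, y_test: np.ndarray) -> float:
    return float(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
result = evaluate_with_kfold(LogisticRegression, X, y, auc_score)
```

Passing a factory (`make_model`) rather than a fitted model guarantees that no fold sees a model trained on its own test data.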

blocked by #9
blocked by #10
blocked by #11

[Evaluation] Add metrics for evaluating regression tasks

Feature Description

One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.

To that end, metrics are needed for every supported problem type.

One of them is evaluating regression tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.

Important metrics to cover here:

r2: R^2 (coefficient of determination) regression score function.
mse: Mean squared error regression loss.
mae: Mean absolute error regression loss.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
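A minimal NumPy sketch of the three metrics behind a single lookup API. `REGRESSION_METRICS` and `evaluate_regression` are hypothetical names; `sklearn.metrics.r2_score`, `mean_squared_error` and `mean_absolute_error` compute the same quantities.

```python
from typing import Dict

import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred) -> float:
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


REGRESSION_METRICS = {"r2": r2, "mse": mse, "mae": mae}


def evaluate_regression(y_true, y_pred) -> Dict[str, float]:
    """Compute every registered regression metric against the ground truth."""
    return {name: fn(y_true, y_pred) for name, fn in REGRESSION_METRICS.items()}


scores = evaluate_regression([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
# mse = 0.375, mae = 0.5, r2 ≈ 0.9486
```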

Issues with MethodSeeker and Dynamic DeepHit when running with PBC dataset

Hi,
I get errors when trying to run Dynamic DeepHit with MethodSeeker on the PBC dataset.
To reproduce the errors, please do the following:

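from tempor.automl.params import CategoricalParams, IntegerParams
from tempor.automl.seeker import MethodSeeker
from tempor.utils.dataloaders import PBCDataLoader

dataset = PBCDataLoader(random_state=42).load()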

# Provide a custom hyperparameter space to search for each type of model.

hp_space = {
    "dynamic_deephit": [
        IntegerParams(name="n_iter", low=200, high=200),
        IntegerParams(name="batch_size", low=30, high=100),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "ts_xgb": [
        IntegerParams(name="n_iter", low=200, high=200),
        IntegerParams(name="batch_size", low=100, high=100),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
}

# Initialize a `MethodSeeker` and provide `override_hp_space`.
seeker = MethodSeeker(
    study_name="my_automl_study",
    task_type="time_to_event",
    estimator_names=[
        "dynamic_deephit",
        "ts_xgb",
    ],
    metric="c_index",
    dataset=dataset,
    horizon=[1, 5, 9],
    return_top_k=2,
    num_iter=3,  # For the sake of speed of this example, only 3 AutoML iterations.
    tuner_type="bayesian",
    # Override hyperparameter space:
    override_hp_space=hp_space,
)

best_methods, best_scores = seeker.search()

The error is as follows:

Thanks for the help!

Best wishes,
Wenjuan

dynamic deephit plugin does not work on a simple dataset

Hi,

I created a simple dataset based on the example in the data format tutorial.

import pandas as pd
from tempor.data.dataset import TimeToEventAnalysisDataset

time_series_df = pd.DataFrame(
    {
        "sample_idx": ["sample_0", "sample_0", "sample_0", "sample_1", "sample_1", "sample_1", "sample_2", "sample_2", "sample_2"],
        "time_idx": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "t_feat_0": [11, 12, 13, 14, 21, 22, 31, 28, 26],
        "t_feat_1": [1.1, 1.2, 1.3, 1, 2.1, 2.2, 3.1, 2.3, 2.0],
    }
)

# Set the 2-level index:
time_series_df.set_index(keys=["sample_idx", "time_idx"], drop=True, inplace=True)

# Create a static data dataframe.
static_df = pd.DataFrame(
    {
        "s_feat_0": [100, 200, 300],
        "s_feat_1": [-1.1, -1, -1.3],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Create an event dataframe.

event_df = pd.DataFrame(
    {
        "e_feat_0": [(10, True), (12, False), (13, True)],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Create a dataset of time-to-event analysis task:
data = TimeToEventAnalysisDataset(
    time_series=time_series_df,
    static=static_df,
    targets=event_df,
)

However, the dynamic_deephit plugin fails on the above dataset with the following errors.

It seems that the dataset was not fit-ready. I checked that I have all the components required for a time-to-event dataset. Could you help with this, please?

Thanks very much,
Wenjuan

[Enhancement] Prediction models: Landmarking

Feature Description

Add the landmarking prediction model.

Reference
