
vanderschaarlab / temporai


TemporAI: ML-centric Toolkit for Medical Time Series

Home Page: https://www.temporai.vanderschaar-lab.com/

License: Apache License 2.0

Python 68.46% Jupyter Notebook 31.54%
machine-learning medicine time-series automl

temporai's People

Contributors

bcebere, drshushen, julianklug


temporai's Issues

Investigate docs API generation

  1. The way the module reference (API) is rendered in the docs is poor and needs investigation.
  2. Warnings are raised when building the documentation, for example:
/mnt/data-fourtb/Dropbox/Programming/wsl_repos/_vds/temporai/docs/../src/tempor/data/pandera_utils.py:docstring of tempor.data.pandera_utils:1: WARNING: Inline interpreted text or phrase reference start-string without end-string.
...

Investigate and fix these.

[AutoML] Add AutoML objective evaluation for regression tasks

Feature Description

For a sampled pipeline, the AutoML search process needs to evaluate an objective, and suggest future points in a direction.

For regression tasks, the evaluation metrics are documented in #10.
The benchmark is done using the cross-validation tester documented in #20.
Given a metric, the optimization process might seek to maximize or minimize the objective.
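
As a rough illustration of the maximize/minimize point, the sketch below picks the best candidate according to a metric's direction. The `METRIC_DIRECTION` mapping and the `best_candidate` helper are hypothetical names used only for this example; they are not part of TemporAI or AutoPrognosis.

```python
from typing import Dict, List, Tuple

# Hypothetical registry: metric name -> optimization direction.
# The metric names come from the regression metrics issue; the mapping is an assumption.
METRIC_DIRECTION: Dict[str, str] = {
    "r2": "maximize",
    "mse": "minimize",
    "mae": "minimize",
}


def best_candidate(scores: List[Tuple[str, float]], metric: str) -> Tuple[str, float]:
    """Return the (pipeline_name, score) pair that is best for the given metric."""
    direction = METRIC_DIRECTION[metric]
    chooser = max if direction == "maximize" else min
    return chooser(scores, key=lambda item: item[1])


candidates = [("pipeline_a", 0.42), ("pipeline_b", 0.17), ("pipeline_c", 0.30)]
print(best_candidate(candidates, "mse"))  # the lowest error wins
print(best_candidate(candidates, "r2"))   # the highest score wins
```

An optimizer that only supports one direction could instead negate the score for "minimize" metrics; keeping the direction explicit just makes the intent easier to read.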

AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/regression.py

[Evaluation] Add metrics for evaluating classification tasks

Feature Description

One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.

To that end, metrics are needed for every supported problem type.

One of them is evaluating classification tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.

Important metrics to cover here:

  • aucroc : the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
  • aucprc : The average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.
  • accuracy : Accuracy classification score.
  • f1_score(micro, macro, weighted): F1 score is a harmonic mean of the precision and recall. This version uses the "micro" average: calculate metrics globally by counting the total true positives, false negatives and false positives.
  • kappa: computes Cohen’s kappa, a score that expresses the level of agreement between two annotators on a classification problem.
  • precision(micro, macro, weighted): Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version(micro) calculates metrics globally by counting the total true positives.
  • recall(micro, macro, weighted): Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version(micro) calculates metrics globally by counting the total true positives.
  • mcc: The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
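
To make a few of the definitions above concrete, here is a minimal, dependency-free sketch of accuracy, micro/macro precision and Cohen's kappa. The function names are illustrative only and are not the library's API; in practice an implementation would more likely wrap scikit-learn (for example `roc_auc_score`, `f1_score`, `cohen_kappa_score`, `matthews_corrcoef`).

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int], average: str = "micro") -> float:
    """Precision = TP / (TP + FP), aggregated globally (micro) or per class (macro)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp: Counter = Counter()
    fp: Counter = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1  # predicted class p, but the truth was something else
    if average == "micro":
        total_tp = sum(tp.values())
        return total_tp / (total_tp + sum(fp.values()))
    if average == "macro":
        per_class = [tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0 for c in labels]
        return sum(per_class) / len(per_class)
    raise ValueError(f"Unsupported average: {average}")


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).

    Not handled in this sketch: the degenerate case p_e == 1.
    """
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1.0 - expected)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred))
print(precision(y_true, y_pred, average="micro"))
print(precision(y_true, y_pred, average="macro"))
print(cohen_kappa(y_true, y_pred))
```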

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py

[Epic] Preprocessing plugins

Description

Preprocessing plugins, for scaling or dimensionality reduction.

Why?

Epics require a lot of work and often require a change in the scope of development. Justify your epic - why can't it just be a simple issue?

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

  • drop constant features - TODO
  • handle multicollinearity - TODO
  • drop low variance features - TODO
  • encode data

Add tutorials (notebooks + Colab links)

Description

  • The library should have a tutorial for each major feature.
  • The tutorials should be notebooks, and should also be deployed on Colab for easier use.

Issues with MethodSeeker and Dynamic DeepHit when running with PBC dataset

Hi,
I get errors when trying to run Dynamic DeepHit with MethodSeeker on the PBC dataset.
To reproduce the errors, please run the following:

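# Note: the report does not show the imports for IntegerParams, CategoricalParams and MethodSeeker.
from tempor.utils.dataloaders import PBCDataLoader

dataset = PBCDataLoader(random_state=42).load()

# Provide a custom hyperparameter space to search for each type of model.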

hp_space = {
    "dynamic_deephit": [
        IntegerParams(name="n_iter", low=200, high=200),
        IntegerParams(name="batch_size", low=30, high=100),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "ts_xgb": [
        IntegerParams(name="n_iter", low=200, high=200),
        IntegerParams(name="batch_size", low=100, high=100),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
}

# Initialize a `MethodSeeker` and provide `override_hp_space`.
seeker = MethodSeeker(
    study_name="my_automl_study",
    task_type="time_to_event",
    estimator_names=[
        "dynamic_deephit",
        "ts_xgb",
    ],
    metric="c_index",
    dataset=dataset,
    horizon=[1, 5, 9],
    return_top_k=2,
    num_iter=3,  # For the sake of speed of this example, only 3 AutoML iterations.
    tuner_type="bayesian",
    # Override hyperparameter space:
    override_hp_space=hp_space,
)

best_methods, best_scores = seeker.search()

The error is as follows:
[Screenshot 2023-08-15 at 15 55 43]

Thanks for the help!

Best wishes,
Wenjuan

[Feat] Reproducibility

Feature Description

Every plugin/pipeline or API call should support fixing the random seed.

There are methods for setting a global random seed. For example

# stdlib
import random

# third party
import numpy as np
import torch


def enable_reproducible_results(random_state: int = 0) -> None:
    np.random.seed(random_state)
    torch.manual_seed(random_state)
    random.seed(random_state)
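
A complementary option to a global seed is to give each plugin its own generator, so that a `random_state` argument fully determines its behaviour. The sketch below uses only the standard library; `NoisyPlugin` is a hypothetical class for illustration, not an existing TemporAI plugin.

```python
import random
from typing import List


class NoisyPlugin:
    """Hypothetical plugin that owns its random generator instead of using global state."""

    def __init__(self, random_state: int = 0) -> None:
        self.random_state = random_state
        # A private generator: other code calling random.seed() cannot disturb it.
        self._rng = random.Random(random_state)

    def sample(self, n: int) -> List[float]:
        """Draw n pseudo-random numbers from the plugin's own generator."""
        return [self._rng.random() for _ in range(n)]


a = NoisyPlugin(random_state=42).sample(3)
b = NoisyPlugin(random_state=42).sample(3)
c = NoisyPlugin(random_state=7).sample(3)
print(a == b)  # same seed, same draws
print(a == c)  # different seed, different draws
```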

[Epic] Imputation models

Description

Add imputation plugins.

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

[Evaluation] Add metrics for evaluating regression tasks

Feature Description

One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.

To that end, metrics are needed for every supported problem type.

One of them is evaluating regression tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.

Important metrics to cover here:

  • r2: R^2 (coefficient of determination) regression score function.
  • mse: Mean squared error regression loss.
  • mae: Mean absolute error regression loss.
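
For reference, the three metrics can be written in a few lines of plain Python. This is only a sketch of the definitions, with illustrative function names; a real implementation would more likely delegate to scikit-learn (`r2_score`, `mean_squared_error`, `mean_absolute_error`).

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Not handled in this sketch: a constant y_true (SS_tot == 0).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mse(y_true, y_pred))  # 0.375
print(mae(y_true, y_pred))  # 0.5
print(r2(y_true, y_pred))
```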

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py

[Epic] Prediction models

Description

Add prediction models

Why?

Epics require a lot of work and often require a change in the scope of development. Justify your epic - why can't it just be a simple issue?

Breakdown

Provide a bulleted or numbered list of how you might break this epic down into smaller issues.

[CI] Github workflows

Description

Before releasing, the library should be tested on the matrix {MacOS, Windows, Linux} x Python {3.7, 3.8, 3.9, 3.10} for compatibility.

On each test scenario, all the unit tests should pass.

Reference workflow: https://github.com/vanderschaarlab/autoprognosis/blob/main/.github/workflows/test.yml


Additional notes:

[Enhancement] Add AutoML objective evaluation for ensembles

Feature Description

Given a set of K optimal pipelines selected by the AutoML logic for a given objective, the next step is to evaluate ensembles built on top of the candidate pipelines.

For the weighted ensemble, a separate AutoML search can be executed, to evaluate various weights.
The process benchmarks all the supported ensemble setups (weighted, stacked, voting, etc.) and returns the optimal solution.
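
The weight search for a weighted ensemble could look roughly like the sketch below. It uses plain random search as a stand-in for the AutoML tuner, and MSE as a stand-in objective; `combine`, `search_weights` and the toy data are all assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple


def combine(predictions: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-model predictions (weights are expected to sum to 1)."""
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions))
        for i in range(len(predictions[0]))
    ]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def search_weights(
    predictions: Sequence[Sequence[float]],
    y_true: Sequence[float],
    n_trials: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Random search over normalized weights, minimizing MSE."""
    rng = random.Random(seed)
    n_models = len(predictions)
    # Start from uniform weights so the result is never worse than plain averaging.
    best_weights = [1.0 / n_models] * n_models
    best_score = mse(y_true, combine(predictions, best_weights))
    for _ in range(n_trials):
        raw = [rng.random() for _ in range(n_models)]
        total = sum(raw)
        if total == 0.0:
            continue
        weights = [r / total for r in raw]
        score = mse(y_true, combine(predictions, weights))
        if score < best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


y_true = [1.0, 2.0, 3.0, 4.0]
predictions = [
    [1.1, 2.1, 2.9, 4.2],  # model 1
    [0.5, 2.5, 3.5, 3.0],  # model 2
    [2.0, 1.0, 4.0, 5.0],  # model 3
]
weights, score = search_weights(predictions, y_true)
print(weights, score)
```

In a real setup the weights should be scored on held-out folds rather than on the data the base pipelines were fitted on; otherwise the search overfits.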

depends on #8, #7, #6, #5, #13

AP references:

Use nbsphinx for tutorials / user guide in docs

The nbsphinx extension is designed for integrating notebooks into documentation, so we should use it instead of the custom code in docs/pre_build.py. Ideally, we should also find a way to put a Colab link at the top of each tutorial in the docs.
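
A possible starting point for the Sphinx configuration is sketched below. `nbsphinx_execute` and `nbsphinx_prolog` are nbsphinx settings; the branch name (`main`), the `docs/` path prefix in the Colab URL and the choice not to execute notebooks are assumptions that would need to match the actual repository layout.

```python
# docs/conf.py (fragment)

extensions = [
    "sphinx.ext.autodoc",
    "nbsphinx",  # renders .ipynb files as documentation pages
]

# Assumption: notebook outputs are committed, so they are not re-executed at build time.
nbsphinx_execute = "never"

# Jinja template prepended to every notebook page; env.docname is the page name
# without its file extension. The URL layout below is an assumption.
nbsphinx_prolog = r"""
.. note::

   Open this tutorial in Colab:
   https://colab.research.google.com/github/vanderschaarlab/temporai/blob/main/docs/{{ env.docname }}.ipynb
"""
```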

[Epic] Testing

Description

All the classes should have test coverage.

Type of Test

  • Unit test (e.g. checking a loop, method, or function is working as intended)
  • Integration test (e.g. checking if a certain group or set of functionality is working as intended)
  • Regression test (e.g. checking if by adding or removing a module of code allows other systems to continue to function as intended)
  • Stress test (e.g. checking to see how well a system performs under various situations, including heavy usage)
  • Performance test (e.g. checking to see how efficient a system is at performing the intended task)
  • Other...

nn_regressor: Pydantic crash

Description

Pydantic imposes a limit on the number of temporai objects that can be instantiated.

Example: In the test_nn_regressor.py, the following snippet will crash

 def test_hyperparam_sample():
     for repeat in range(10000):  # pylint: disable=unused-variable
         args = plugin._cls.sample_hyperparameters()  # pylint: disable=no-member, protected-access
         plugin(**args)

with the error

>   ???
E   pydantic.error_wrappers.ValidationError: 1 validation error for _InitArgsValidator
E   __root__
E     Model parameters could not be validated as defined by `EmptyParamsDefinition`, cause: 
E   ---------------
E   RecursionError:
E   maximum recursion depth exceeded
E   ---------------
E    (type=value_error)

pydantic/main.py:342: ValidationError

Expected behaviour

Pydantic should not limit the functionality of the library.

[AutoML] Add AutoML objective evaluation for classification tasks

Feature Description

For a sampled pipeline, the AutoML search process needs to evaluate an objective, and suggest future points in a direction.

For classification tasks, the evaluation metrics are documented in #11.
The benchmark is done using the cross-validation tester documented in #20.
Given a metric, the optimization process might seek to maximize or minimize the objective.

AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/classifiers.py

why not contribute this to sktime?

It would seem that this package is "cloning" a number of core aspects of sktime, including data format, base class design, etc.

It does add some novel aspects, but there aren't too many differences at the moment.
So, why develop this in complete detachment from the pydata ecosystem?

Long-term, it will be much harder to maintain if you insist on trying to build a parallel ecosystem targeted at medical doctors.

I understand the academic sensitivities of wanting to "own", but that's not really how open source works - the more you give and let go of, the more you get back, and the more successful you will be.

For instance, why not contribute this to sktime?

[Feat] Benchmarking tools

Feature Description

The library should offer methods for evaluating predictive models/pipelines.

For each problem type, there can be different relevant metrics, as described in the linked tasks.

The evaluation should be done using KFold (regression) or StratifiedKFold (classification, survival analysis), with a predefined random seed.
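
To show what stratified splitting with a fixed seed means, here is a small standard-library sketch. The `stratified_kfold` function is illustrative only; the real tester would presumably rely on scikit-learn's `KFold` / `StratifiedKFold` with `shuffle=True` and a `random_state`.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_kfold(
    labels: Sequence[int], n_splits: int = 3, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) with class proportions kept per fold."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, test_idx


labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
splits = list(stratified_kfold(labels, n_splits=3, seed=0))
for train_idx, test_idx in splits:
    print(test_idx, [labels[i] for i in test_idx])
```

Each test fold here contains two samples of class 0 and one of class 1, and rerunning with the same seed gives identical folds.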

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
Example implementation for time series survival
https://github.com/vanderschaarlab/synthcity/blob/main/src/synthcity/plugins/core/models/time_series_survival/benchmarks.py#L142

blocked by #9
blocked by #10
blocked by #11

[Feat] Add pipeline logic

Feature Description

The library should offer the possibility to execute multiple plugins in sequence, and sample hyperparameters for all of them.

Sampling hyperparameters should work on the class itself, so that you don't have to instantiate a useless object.
To that end, you can create metaclasses in Python by inheriting from the type class directly.

The pipeline wrapper should offer the following interface:

  • fit - train the pipeline
  • predict - transform(for preprocessing plugins in the pipeline) and predict
  • hyperparameters_space/sample_hyperparameters, which should call the sampling logic from each plugin.
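
The interface above, including class-level sampling through a metaclass, could be sketched as follows. Every name here (`Scaler`, `OffsetPredictor`, `PipelineMeta`, `PipelineBase`, `build_pipeline`) is hypothetical and only meant to illustrate the idea; it is not the AutoPrognosis reference implementation.

```python
import random
from typing import Any, Dict, List, Optional, Sequence


class Scaler:
    """Hypothetical preprocessing plugin: multiplies inputs by a factor."""

    def __init__(self, factor: float = 1.0) -> None:
        self.factor = factor

    @staticmethod
    def sample_hyperparameters(rng: random.Random) -> Dict[str, Any]:
        return {"factor": rng.choice([0.5, 1.0, 2.0])}

    def fit(self, X: Sequence[float], y: Sequence[float]) -> "Scaler":
        return self

    def transform(self, X: Sequence[float]) -> List[float]:
        return [x * self.factor for x in X]


class OffsetPredictor:
    """Hypothetical predictor: learns a (shrunken) constant offset between X and y."""

    def __init__(self, shrink: float = 1.0) -> None:
        self.shrink = shrink
        self.offset = 0.0

    @staticmethod
    def sample_hyperparameters(rng: random.Random) -> Dict[str, Any]:
        return {"shrink": rng.uniform(0.5, 1.0)}

    def fit(self, X: Sequence[float], y: Sequence[float]) -> "OffsetPredictor":
        self.offset = self.shrink * sum(t - x for x, t in zip(X, y)) / len(X)
        return self

    def predict(self, X: Sequence[float]) -> List[float]:
        return [x + self.offset for x in X]


class PipelineMeta(type):
    """Metaclass: methods defined here are callable on the pipeline class itself."""

    def sample_hyperparameters(cls, rng: random.Random) -> Dict[str, Dict[str, Any]]:
        # Delegate to each stage class; no plugin or pipeline is instantiated.
        return {stage.__name__: stage.sample_hyperparameters(rng) for stage in cls.stages}


class PipelineBase:
    stages: List[type] = []

    def __init__(self, params: Optional[Dict[str, Dict[str, Any]]] = None) -> None:
        params = params or {}
        self.plugins = [stage(**params.get(stage.__name__, {})) for stage in self.stages]

    def fit(self, X: Sequence[float], y: Sequence[float]) -> "PipelineBase":
        for plugin in self.plugins[:-1]:
            X = plugin.fit(X, y).transform(X)
        self.plugins[-1].fit(X, y)
        return self

    def predict(self, X: Sequence[float]) -> List[float]:
        for plugin in self.plugins[:-1]:
            X = plugin.transform(X)  # preprocessing stages transform
        return self.plugins[-1].predict(X)  # the last stage predicts


def build_pipeline(stages: Sequence[type]) -> type:
    """Create a new pipeline class for the given stage classes."""
    return PipelineMeta("Pipeline", (PipelineBase,), {"stages": list(stages)})


Pipeline = build_pipeline([Scaler, OffsetPredictor])
params = Pipeline.sample_hyperparameters(random.Random(0))  # class-level call
model = Pipeline(params).fit([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(params)
print(model.predict([4.0]))
```

A plain `classmethod` on the pipeline would work as well; the metaclass variant is shown only because the issue suggests it.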

Reference implementation https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/plugins/pipeline/__init__.py

The feature should be covered by tests, covering:

  • a standard pipeline train and predict.
  • hyperparameter sampling and pipeline instantiation.
  • serialization

depends on https://github.com/vanderschaarlab/temporai-priv/issues/5

[Enhancement] Ensemble support

Feature Description

Given a set of pipelines - or just estimators, users should be able to create ensembles.

Popular ensemble techniques

  • WeightedEnsemble: average across all scores/prediction results, maybe with weights
  • Stacking (meta ensembling): use a meta learner to learn the base classifier results
  • Majority Vote Ensemble
  • DCS: Dynamic Classifier Selection: Combination of multiple classifiers using local accuracy estimates
  • DES: Dynamic Ensemble Selection: From dynamic classifier selection to dynamic ensemble selection
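
The two simplest techniques in the list can be sketched in a few lines of plain Python. The function names and the tie-breaking rule (smallest label wins) are assumptions for this example, not the behaviour of the AutoPrognosis ensembles.

```python
from collections import Counter
from typing import List, Sequence


def weighted_ensemble(probas: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model scores; weights are normalized by their sum."""
    total = sum(weights)
    n_samples = len(probas[0])
    return [sum(w * p[i] for w, p in zip(weights, probas)) / total for i in range(n_samples)]


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Most frequent predicted label per sample."""
    voted = []
    for i in range(len(predictions[0])):
        counts = Counter(model_preds[i] for model_preds in predictions)
        top = max(counts.values())
        # Ties resolve to the smallest label so that the result is deterministic.
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


probas = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.3], [0.8, 0.1, 0.5]]
averaged = weighted_ensemble(probas, weights=[0.5, 0.25, 0.25])
votes = majority_vote([[1, 0, 1], [1, 0, 0], [0, 0, 1]])
print(averaged)
print(votes)
```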

Reference code in AutoPrognosis: https://github.com/vanderschaarlab/autoprognosis/tree/main/src/autoprognosis/plugins/ensemble

More about ensembles here: https://github.com/yzhao062/combo

[Bug] Catboost is required, but fails to build (macOS 13.5)

Describe the bug
Building the temporai package from pip or from GitHub fails because catboost is required.
This is probably linked to #72, where the catboost dependency should have been removed, and to catboost/catboost#2371 (comment).

Platform:

  • macOS 13.5
  • python 3.11.5

To Reproduce
pip install temporai
or
pip install "temporai @ git+https://github.com/vanderschaarlab/temporai@daa4af2e3943e5639098a4459464012c007245a3"

Expected behavior
Build should not fail, and catboost should probably not be required.

Results
temporai build fails.

Collecting catboost>=1.0.5 (from hyperimpute>=0.1.17->temporai@ git+https://github.com/vanderschaarlab/temporai@daa4af2e3943e5639098a4459464012c007245a3)
  Using cached catboost-1.2.2.tar.gz (60.1 MB)
  Building wheel for catboost (pyproject.toml) ... error
  error: subprocess-exited-with-error

× Building wheel for catboost (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [218 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-cpython-311
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/monoforest.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/plot_helpers.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/metrics.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/version.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/text_processing.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/datasets.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/core.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
copying catboost/dev_utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/metrics_plotter.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/ipythonwidget.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
copying catboost/widget/callbacks.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/widget
creating build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/catboost_evaluation.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_model.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_readers.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/log_config.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_splitter.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/execution_case.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_storage.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/factor_utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/utils.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/evaluation_result.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
copying catboost/eval/_fold_models_handler.py -> build/lib.macosx-10.9-x86_64-cpython-311/catboost/eval
running build_ext
Buildling _catboost with cmake and ninja
target_platform=darwin-x86_64. Building targets _catboost with PIC
Running "cmake /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src -B /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311 -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/toolchains/clang.toolchain --log-level=VERBOSE -DCMAKE_POSITION_INDEPENDENT_CODE=On -DCATBOOST_COMPONENTS=python-package -DCMAKE_OSX_DEPLOYMENT_TARGET=11.0 -DHAVE_CUDA=no -DPython3_ROOT_DIR=/Users/jk1/opt/anaconda3/envs/treatment_effects"
-- The C compiler identification is AppleClang 14.0.3.14030022
-- The CXX compiler identification is AppleClang 14.0.3.14030022
-- The ASM compiler identification is Clang
-- Found assembler: /Library/Developer/CommandLineTools/usr/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python3: /Users/jk1/opt/anaconda3/envs/treatment_effects/bin/python3.1 (found version "3.11.5") found components: Interpreter
-- CMAKE_C_FLAGS = " -fexceptions -fno-common -fcolor-diagnostics -faligned-allocation -fdebug-default-version=4 -ffunction-sections -fdata-sections -Wall -Wextra -Wno-parentheses -Wno-implicit-const-int-float-conversion -Wno-unknown-warning-option -pipe -D_THREAD_SAFE -D_PTHREADS -D_REENTRANT -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__LONG_LONG_SUPPORTED -DLIBCXX_BUILDING_LIBCXXRT -D_FILE_OFFSET_BITS=64 -m64 -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpopcnt -mcx16 -DSSE_ENABLED=1 -DSSE3_ENABLED=1 -DSSSE3_ENABLED=1 -DSSE41_ENABLED=1 -DSSE42_ENABLED=1 -DPOPCNT_ENABLED=1 -DCX16_ENABLED=1"
-- CMAKE_CXX_FLAGS = " -fexceptions -fno-common -fcolor-diagnostics -faligned-allocation -fdebug-default-version=4 -ffunction-sections -fdata-sections -Wall -Wextra -Wno-parentheses -Wno-implicit-const-int-float-conversion -Wno-unknown-warning-option -pipe -D_THREAD_SAFE -D_PTHREADS -D_REENTRANT -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__LONG_LONG_SUPPORTED -DLIBCXX_BUILDING_LIBCXXRT -D_FILE_OFFSET_BITS=64 -m64 -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpopcnt -mcx16 -Woverloaded-virtual -Wimport-preprocessor-directive-pedantic -Wno-undefined-var-template -Wno-return-std-move -Wno-defaulted-function-deleted -Wno-pessimizing-move -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-enum-enum-conversion -Wno-deprecated-enum-float-conversion -Wno-ambiguous-reversed-operator -Wno-deprecated-volatile -DSSE_ENABLED=1 -DSSE3_ENABLED=1 -DSSSE3_ENABLED=1 -DSSE41_ENABLED=1 -DSSE42_ENABLED=1 -DPOPCNT_ENABLED=1 -DCX16_ENABLED=1"
-- Conan: checking conan executable
-- Conan: Found program /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/bin/conan
-- Conan: Version found Conan version 1.59.0
-- Conan executing: /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/bin/conan install /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src --remote conancenter --install-folder /private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311 --build missing --env CONAN_CMAKE_GENERATOR=Ninja --settings build_type=Release --settings compiler=apple-clang --settings compiler.version=14.0 --settings compiler.libcxx=libc++ --settings compiler.cppstd=20 --conf tools.cmake.cmaketoolchain:generator=Ninja
Configuration:
[settings]
arch=x86_64
arch_build=x86_64
build_type=Release
compiler=apple-clang
compiler.cppstd=20
compiler.libcxx=libc++
compiler.version=14.0
os=Macos
os_build=Macos
[options]
[build_requires]
[env]
CONAN_CMAKE_GENERATOR=Ninja
[conf]
tools.cmake.cmaketoolchain:generator=Ninja

  Version ranges solved
      Version range '>=1.2.11 <2' required by 'pcre/8.45' resolved to 'zlib/1.3' in local cache
  
  conanfile.txt: Installing package
  Requirements
      libiconv/1.15 from 'conancenter' - Cache
      openssl/1.1.1t from 'conancenter' - Cache
  Packages
      libiconv/1.15:e1ef30a7ac2ff8c218173fdf49ec961a5c046a36 - Cache
      openssl/1.1.1t:a319f556f93546f2dff1b70922784b70e7cba919 - Cache
  Build requirements
      bzip2/1.0.8 from 'conancenter' - Cache
      pcre/8.45 from 'conancenter' - Cache
      ragel/6.10 from 'conancenter' - Cache
      swig/4.0.2 from 'conancenter' - Cache
      yasm/1.3.0 from 'conancenter' - Cache
      zlib/1.3 from 'conancenter' - Cache
  Build requirements packages
      bzip2/1.0.8:b9b85a7c8f543b96385e1da9e174853f1fb08e0c - Cache
      pcre/8.45:842afe377248eac66b64b538531df2b005d57959 - Cache
      ragel/6.10:801752c0480319b8e090188c566245a78e9abcf4 - Cache
      swig/4.0.2:099d7b9cd06e9bd11e92b9a2ddf3b29cd986fdcb - Cache
      yasm/1.3.0:801752c0480319b8e090188c566245a78e9abcf4 - Cache
      zlib/1.3:a319f556f93546f2dff1b70922784b70e7cba919 - Cache
  
  Installing (downloading, building) binaries...
  bzip2/1.0.8: Already installed!
  libiconv/1.15: Already installed!
  openssl/1.1.1t: Already installed!
  ragel/6.10: Already installed!
  ragel/6.10: Appending PATH environment variable: /Users/jk1/.conan/data/ragel/6.10/_/_/package/801752c0480319b8e090188c566245a78e9abcf4/bin
  yasm/1.3.0: Already installed!
  yasm/1.3.0: Appending PATH environment variable: /Users/jk1/.conan/data/yasm/1.3.0/_/_/package/801752c0480319b8e090188c566245a78e9abcf4/bin
  zlib/1.3: Already installed!
  pcre/8.45: Already installed!
  swig/4.0.2: Already installed!
  swig/4.0.2: Appending PATH environment variable: /Users/jk1/.conan/data/swig/4.0.2/_/_/package/099d7b9cd06e9bd11e92b9a2ddf3b29cd986fdcb/bin
  conanfile.txt: Applying build-requirement: ragel/6.10
  conanfile.txt: Applying build-requirement: swig/4.0.2
  conanfile.txt: Applying build-requirement: yasm/1.3.0
  conanfile.txt: Applying build-requirement: pcre/8.45
  conanfile.txt: Applying build-requirement: bzip2/1.0.8
  conanfile.txt: Applying build-requirement: zlib/1.3
  conanfile.txt: Generator cmake_find_package created Findragel.cmake
  conanfile.txt: Generator cmake_find_package created FindSWIG.cmake
  conanfile.txt: Generator cmake_find_package created Findyasm.cmake
  conanfile.txt: Generator cmake_find_package created FindIconv.cmake
  conanfile.txt: Generator cmake_find_package created FindOpenSSL.cmake
  conanfile.txt: Generator cmake_find_package created FindPCRE.cmake
  conanfile.txt: Generator cmake_find_package created FindBZip2.cmake
  conanfile.txt: Generator cmake_find_package created FindZLIB.cmake
  conanfile.txt: Generator cmake_paths created conan_paths.cmake
  conanfile.txt: Generator txt created conanbuildinfo.txt
  conanfile.txt: Aggregating env generators
  conanfile.txt: Generated conaninfo.txt
  conanfile.txt: Generated graphinfo
  conanfile.txt imports(): Copied 434 '.i' files
  conanfile.txt imports(): Copied 273 '.swg' files
  conanfile.txt imports(): Copied 1 '.swig' file: Makefile.swig
  conanfile.txt imports(): Copied 2 '.ml' files: swig.ml, swigp4.ml
  conanfile.txt imports(): Copied 1 '.pl' file: Makefile.pl
  conanfile.txt imports(): Copied 6 files
  conanfile.txt imports(): Copied 1 '.rb' file: extconf.rb
  conanfile.txt imports(): Copied 1 '.h' file: noembed.h
  conanfile.txt imports(): Copied 1 '.scm' file: common.scm
  conanfile.txt imports(): Copied 1 '.mli' file: swig.mli
  conanfile.txt imports(): Copied 1 '.hpp' file: octheaders.hpp
  CMake Error at /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
    Could NOT find Python3 (missing: Python3_INCLUDE_DIRS Python3_LIBRARIES
    Development Development.Module Development.Embed)
  Call Stack (most recent call first):
    /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
    /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPython/Support.cmake:3166 (find_package_handle_standard_args)
    /usr/local/Cellar/cmake/3.21.3/share/cmake/Modules/FindPython3.cmake:485 (include)
    catboost/python-package/catboost/CMakeLists.darwin-x86_64.txt:9 (find_package)
    catboost/python-package/catboost/CMakeLists.txt:20 (include)
  
  
  -- Configuring incomplete, errors occurred!
  See also "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311/CMakeFiles/CMakeOutput.log".
  See also "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311/CMakeFiles/CMakeError.log".
  Traceback (most recent call last):
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
      return _build_backend().build_wheel(wheel_directory, config_settings,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 434, in build_wheel
      return self._build_with_temp_dir(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 419, in _build_with_temp_dir
      self.run_setup()
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 507, in run_setup
      super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 341, in run_setup
      exec(code, locals())
    File "<string>", line 731, in <module>
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
             ^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 397, in run
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 364, in run
      self.run_command("build")
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 332, in run
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 131, in run
      self.run_command(cmd_name)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-build-env-itfn3d2q/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 444, in run
    File "<string>", line 462, in build_with_cmake_and_ninja
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/build_native.py", line 517, in build
      cmd_runner.run(cmake_cmd, env=build_environ)
    File "/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/build_native.py", line 164, in run
      subprocess.run(cmd, check=True, **subprocess_run_kwargs)
    File "/Users/jk1/opt/anaconda3/envs/treatment_effects/lib/python3.11/subprocess.py", line 571, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['cmake', '/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src', '-B', '/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/build/temp.macosx-10.9-x86_64-cpython-311', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_TOOLCHAIN_FILE=/private/var/folders/9v/1s329cwj32jc2kkx24p8jp980000gp/T/pip-install-pvf83pvv/catboost_978ae68fc7434fc3857f456b847dec21/catboost_all_src/build/toolchains/clang.toolchain', '--log-level=VERBOSE', '-DCMAKE_POSITION_INDEPENDENT_CODE=On', '-DCATBOOST_COMPONENTS=python-package', '-DCMAKE_OSX_DEPLOYMENT_TARGET=11.0', '-DHAVE_CUDA=no', '-DPython3_ROOT_DIR=/Users/jk1/opt/anaconda3/envs/treatment_effects']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for catboost
Successfully built temporai
Failed to build catboost
ERROR: Could not build wheels for catboost, which is required to install pyproject.toml-based projects

</details>

dynamic deephit plugin does not work on a simple dataset

Hi,

I created a simple dataset following the example in the data format tutorial.

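import pandas as pd

# Import path assumed from the TemporAI package layout (tempor.data.dataset).
from tempor.data.dataset import TimeToEventAnalysisDataset

# Create a time series dataframe.
time_series_df = pd.DataFrame(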
    {
        "sample_idx": ["sample_0", "sample_0", "sample_0", "sample_1", "sample_1", "sample_1", "sample_2", "sample_2", "sample_2"],
        "time_idx": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "t_feat_0": [11, 12, 13, 14, 21, 22, 31, 28, 26],
        "t_feat_1": [1.1, 1.2, 1.3, 1, 2.1, 2.2, 3.1, 2.3, 2.0],
    }
)

# Set the 2-level index:
time_series_df.set_index(keys=["sample_idx", "time_idx"], drop=True, inplace=True)

# Create a static data dataframe.
static_df = pd.DataFrame(
    {
        "s_feat_0": [100, 200, 300],
        "s_feat_1": [-1.1, -1, -1.3],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Create an event dataframe.

event_df = pd.DataFrame(
    {
        "e_feat_0": [(10, True), (12, False), (13, True)],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Create a dataset of time-to-event analysis task:
data = TimeToEventAnalysisDataset(
    time_series=time_series_df,
    static=static_df,
    targets=event_df,
)

However, the dynamic_deephit plugin does not work on the above dataset and fails with the following errors.

[Screenshot 2023-07-26 at 15 00 55: error output, image not preserved]

It seems that the dataset was not fit-ready. I checked that I have all the components for time to event dataset. Could you help with this please?

Thanks very much,
Wenjuan

[Enhancement] Integrate jaxtyping for advanced parameter validation #120

Description

An improvement on top of pydantic would be to integrate jaxtyping, which also allows validating tensor shapes.
jaxtyping supports PyTorch tensors and NumPy arrays.

Example

from jaxtyping import Array, Float, PyTree

# Accepts floating-point 2D arrays with matching dimensions
def matrix_multiply(x: Float[Array, "dim1 dim2"],
                    y: Float[Array, "dim2 dim3"]
                  ) -> Float[Array, "dim1 dim3"]:
    ...

def accepts_pytree_of_ints(x: PyTree[int]):
    ...

def accepts_pytree_of_arrays(x: PyTree[Float[Array, "batch c1 c2"]]):
    ...

https://github.com/google/jaxtyping

[Enhancement] Evaluation: Add more metrics for evaluating survival analysis tasks

Feature Description

One of the major tasks of the library is evaluating the quality of the models and evaluating the AutoML objectives.

To that end, metrics are needed for every supported problem type.

One of them is evaluating survival analysis tasks. The library should offer an API for using any of these metrics, testing the predicted values against the ground truth.

The metrics should be reported for each evaluation time horizon, and aggregated (mean, std).

Important metrics to cover here:
[X] c_index: The concordance index or c-index is a metric to evaluate the predictions made by a survival algorithm. It is defined as the proportion of concordant pairs divided by the total number of possible evaluation pairs.
[X] brier_score: The Brier Score is a strictly proper score function or strictly proper scoring rule that measures the accuracy of probabilistic predictions.

  • aucroc: The Area Under the Receiver Operating Characteristic Curve (ROC AUC), computed from prediction scores.
  • sensitivity: Sensitivity (true positive rate) is the probability of a positive test result, conditioned on the individual truly being positive.
  • specificity: Specificity (true negative rate) is the probability of a negative test result, conditioned on the individual truly being negative.
  • PPV: The positive predictive value (PPV) is the probability that, following a positive test result, the individual truly has the specific disease.
  • NPV: The negative predictive value (NPV) is the probability that, following a negative test result, the individual truly does not have the specific disease.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/utils/tester.py
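
As a rough illustration of the definitions above, here is a minimal pure-Python sketch of the c-index and of the confusion-matrix metrics. It is not the TemporAI or AutoPrognosis implementation: the function names `c_index` and `confusion_metrics` are hypothetical, pairs with tied event times are simply skipped, and the binary metrics assume predictions have already been thresholded at a single time horizon.

```python
from itertools import combinations


def c_index(times, events, risks):
    """Share of comparable pairs where the earlier-event sample has the higher risk."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        early, late = (i, j) if times[i] < times[j] else (j, i)
        if not events[early]:
            continue  # the earlier sample is censored, so the pair is not comparable
        comparable += 1
        if risks[early] > risks[late]:
            concordant += 1.0
        elif risks[early] == risks[late]:
            concordant += 0.5  # tied risk scores count as half
    return concordant / comparable if comparable else float("nan")


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data, made up for illustration only.
times = [5, 8, 10, 12]
events = [True, False, True, True]  # False means censored
risks = [0.9, 0.4, 0.2, 0.6]
ci = c_index(times, events, risks)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
binary = confusion_metrics(y_true, y_pred)
```

In a per-horizon report, the same functions would be called once per evaluation time horizon and the results then aggregated.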

[Enhancement] Fix RNN contiguous memory warning (serialization)

After serializing and deserializing some models that include RNNs, the following warning is received:

UserWarning: RNN module weights are not part of single contiguous chunk of memory

Serialization mechanism needs to be improved to fix this problem.
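
One possible direction, sketched below under assumptions, is to re-compact RNN weights right after a model is restored. PyTorch exposes `flatten_parameters()` on its recurrent modules for this purpose. The helper name `flatten_rnn_parameters` and the `TinyModel` class are hypothetical, and whether this fits TemporAI's serialization mechanism has not been verified.

```python
import io

import torch
from torch import nn


def flatten_rnn_parameters(model: nn.Module) -> int:
    """Call flatten_parameters() on every RNN/LSTM/GRU submodule; return how many were found."""
    count = 0
    for module in model.modules():
        if isinstance(module, nn.RNNBase):
            module.flatten_parameters()
            count += 1
    return count


class TinyModel(nn.Module):
    """Hypothetical model containing one recurrent layer."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
        self.head = nn.Linear(8, 1)

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.head(out[:, -1])


# Round-trip the weights through an in-memory buffer, then re-compact them.
model = TinyModel()
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

restored = TinyModel()
restored.load_state_dict(torch.load(buffer))
n_flattened = flatten_rnn_parameters(restored)
```

On CPU the call is effectively a no-op, so the helper is safe to run unconditionally after loading.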

[AutoML] Add AutoML objective evaluation for survival analysis tasks

Feature Description

[AutoML] Add AutoML objective evaluation for classification tasks

Feature Description

For a sampled pipeline, the AutoML search process needs to evaluate an objective, and suggest future points in a direction.

For classification tasks, the evaluation metrics are documented here #15
The evaluation is done using the cross-validation tester documented in #20
Given a metric, the optimization process might seek to maximize or minimize the objective.

AP Reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/risk_estimation.py
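
The idea can be sketched roughly as follows: score a sampled pipeline with k-fold cross-validation, then compare scores according to the direction of the metric. Everything here is an assumption for illustration, including the `METRIC_DIRECTION` table, the function names, and the toy threshold classifier; it is not the tester from #20 or the AutoPrognosis code.

```python
from statistics import mean

# Hypothetical table: whether a higher or lower value of each metric is better.
METRIC_DIRECTION = {"accuracy": "maximize", "aucroc": "maximize", "brier_score": "minimize"}


def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for contiguous, unshuffled folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0) for i in range(n_folds)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def evaluate_objective(fit_predict, X, y, metric_fn, n_folds=3):
    """Mean cross-validated score of one candidate pipeline."""
    scores = []
    for train, test in k_fold_indices(len(X), n_folds):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        scores.append(metric_fn([y[i] for i in test], preds))
    return mean(scores)


def is_improvement(metric, new_score, best_score):
    """Compare two scores according to the metric's optimization direction."""
    if METRIC_DIRECTION[metric] == "maximize":
        return new_score > best_score
    return new_score < best_score


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def threshold_classifier(X_train, y_train, X_test):
    """Toy stand-in for a pipeline: predict 1 when x exceeds the training mean."""
    cutoff = mean(X_train)
    return [1 if x > cutoff else 0 for x in X_test]


X = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
score = evaluate_objective(threshold_classifier, X, y, accuracy, n_folds=3)
```

Keeping the direction in a lookup table means the search loop itself does not need to know which metric it is optimizing.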

[Enhancement] Introduce plugin types

Introduce plugin types which can be listed separately, so that we can have clear separation between models/metrics/data sources etc. plugins.
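
A minimal sketch of what such a separation might look like is a registry keyed by plugin type. The names `register_plugin` and `list_plugins`, the type labels, and the example classes are all hypothetical and do not reflect the existing TemporAI plugin loader.

```python
from collections import defaultdict

# Maps plugin type -> {plugin name -> plugin class}.
_REGISTRY = defaultdict(dict)


def register_plugin(name, plugin_type):
    """Class decorator that records a plugin under the given type."""

    def decorator(cls):
        if name in _REGISTRY[plugin_type]:
            raise ValueError(f"Plugin {name!r} already registered for type {plugin_type!r}")
        _REGISTRY[plugin_type][name] = cls
        return cls

    return decorator


def list_plugins(plugin_type=None):
    """List plugin names for one type, or a dict of names per type when no type is given."""
    if plugin_type is not None:
        return sorted(_REGISTRY.get(plugin_type, {}))
    return {ptype: sorted(names) for ptype, names in _REGISTRY.items()}


@register_plugin("example_predictor", plugin_type="model")
class ExamplePredictor:
    pass


@register_plugin("c_index", plugin_type="metric")
class CIndexMetric:
    pass


@register_plugin("csv_source", plugin_type="datasource")
class CsvSource:
    pass


models = list_plugins("model")
everything = list_plugins()
```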

[AutoML] Create pipeline from hyperparameters

Feature Description

For AutoML search, it is important to be able to sample hyperparameters, and to recreate the pipeline from those hyperparameters.

AutoPrognosis implements the following strategy:

  • For each search task, the user can select imputation, preprocessing and prediction plugins.
  • For each pipeline, the prediction plugin "drives" the whole pipeline selection.
  • To that end, we artificially extend the predictor hyperparameters to include the imputation and preprocessing plugins. In other words, the user samples from predictor's [imputation plugin 1, 2, 3, ...] + [preprocessing plugin 1, 2, 3, ...] + hyperparam_space. This simplifies the sampling process.
  • Given a sampled preprocessing/imputation plugin and a set of hyperparameters, the user must be able to create the complete pipeline.

AP reference: https://github.com/vanderschaarlab/autoprognosis/blob/main/src/autoprognosis/explorers/core/selector.py
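
The strategy above could be sketched as follows. The plugin names, the search space, and the functions `extended_space`, `sample`, and `build_pipeline` are hypothetical placeholders; the real selector in AutoPrognosis is more involved.

```python
import random

# Hypothetical plugin names and predictor hyperparameter space.
IMPUTERS = ["mean_imputer", "ffill_imputer"]
PREPROCESSORS = ["minmax_scaler", "standard_scaler"]
PREDICTOR_SPACE = {"n_layers": [1, 2, 3], "lr": [1e-3, 1e-2]}


def extended_space(imputers, preprocessors, predictor_space):
    """Extend the predictor's space with the imputer and preprocessor choices."""
    space = {"imputer": list(imputers), "preprocessor": list(preprocessors)}
    space.update(predictor_space)
    return space


def sample(space, rng):
    """Draw one value per dimension of the extended space."""
    return {name: rng.choice(choices) for name, choices in space.items()}


def build_pipeline(predictor_name, sampled):
    """Recreate the ordered pipeline steps from one sampled configuration."""
    params = dict(sampled)
    imputer = params.pop("imputer")
    preprocessor = params.pop("preprocessor")
    # Each step is (plugin name, plugin hyperparameters).
    return [(imputer, {}), (preprocessor, {}), (predictor_name, params)]


space = extended_space(IMPUTERS, PREPROCESSORS, PREDICTOR_SPACE)
sampled = sample(space, random.Random(0))
pipeline = build_pipeline("example_predictor", sampled)

fixed = build_pipeline(
    "example_predictor",
    {"imputer": "ffill_imputer", "preprocessor": "minmax_scaler", "n_layers": 2, "lr": 1e-3},
)
```

Because the pipeline is rebuilt purely from the sampled dictionary, the same dictionary can be stored and replayed later to recreate the best configuration.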

Blocked by: #27

time_to_event models and scaling methods do not work with categorical data

Hi, I found that the time_to_event models and scaling methods do not work with categorical data.

To reproduce it, go to tutorials/data/tutorial1_data_format.ipynb, use the data example (comment out one line of the event data) and run the time-to-event models.

I have only tried time-to-event models and scaling methods; other methods might not work on categorical data either. According to the data format tutorial, isn't pandas.Categorical supported as column values?

Thanks for looking into it!
Wenjuan

Upgrade to pydantic 2

Pydantic 2.0 is now the current version, so changes need to be made to use it rather than 1.0.
