carefree0910 / carefree-learn

Deep Learning ❤️ PyTorch

Home Page: https://carefree0910.me/carefree-learn-doc/

License: MIT License

Python 99.98% Dockerfile 0.02%

algorithm automl computer-vision data-science deep-learning ensemble machine-learning numpy python pytorch tabular-data tabular-datasets

carefree-learn's Issues

Add documentation for `Protocol`s

Support packing multiple models in `Pack`

save & load might be broken when compress=False

Add `metric_targets`

Once every metric reaches its corresponding target, we can safely stop the training process early.
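The stopping rule described here can be sketched as follows. This is only an illustration, not carefree-learn's implementation: the function name `all_targets_reached`, the metric names, and the assumption that higher values are better are all hypothetical.

```python
from typing import Dict


def all_targets_reached(metrics: Dict[str, float], targets: Dict[str, float]) -> bool:
    """Return True once every targeted metric meets or exceeds its target.

    Assumptions (not from the library): higher is better, and a metric
    that is missing from `metrics` counts as "not reached".
    """
    # With no targets configured, never trigger an early stop.
    if not targets:
        return False
    return all(
        name in metrics and metrics[name] >= target
        for name, target in targets.items()
    )


# Hypothetical per-epoch validation metrics.
targets = {"acc": 0.95, "auc": 0.98}
history = [
    {"acc": 0.90, "auc": 0.97},
    {"acc": 0.96, "auc": 0.97},
    {"acc": 0.96, "auc": 0.99},
    {"acc": 0.97, "auc": 0.99},
]

stop_epoch = None
for epoch, metrics in enumerate(history):
    if all_targets_reached(metrics, targets):
        stop_epoch = epoch  # first epoch where every target is met
        break

print("stop at epoch:", stop_epoch)
```

Metrics where lower is better (e.g. a loss) would need the comparison flipped, which is one reason a real implementation would probably carry a direction per metric.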

[CRITICAL] Fix bugs when we apply customizations

This bug breaks customization code in v0.1.5, which makes that version a broken release.

Add unittests for customizations

Support customizing the `reduce` step

Keep a copy of the original user-defined configs

Otherwise the make_from API will always be buggy.

Support `increment_config` in `make_from` & `finetune`

Should not always use cuda:0 when `PrefetchLoader` is applied

Automatically infer `should_bypass`

Finished at 5c6b08f

Document the production parts (`Pack`)

Implement `PipelineProtocal`

So we can utilize carefree-learn's APIs on other models (e.g. sklearn models).

The scores in your Titanic demo are not as good with the new AutoML system as they were before. I just tried it using https://github.com/carefree0910/carefree-learn/blob/dev/examples/titanic/test_titanic.py, submitted the results to Kaggle, and got:
Optuna: 0.77751
HPO: 0.75598
AdaBoost: 0.67703

Clean up benchmark code

Maybe it should be moved to a new repo.

Losses should take a dictionary as input

TypeError: estimate() got an unexpected keyword argument 'pipelines'

When running the tutorial code:

#%%
import cflearn
from cfdata.tabular import *

# prepare iris dataset
iris = TabularDataset.iris()
iris = TabularData.from_dataset(iris)
# split 10% of the data as validation data
split = iris.split(0.1)
train, valid = split.remained, split.split
x_tr, y_tr = train.processed.xy
x_cv, y_cv = valid.processed.xy
data = x_tr, y_tr, x_cv, y_cv

m = cflearn.make().fit(*data)
# Make label predictions
m.predict(x_cv)
# Make probability predictions
m.predict_prob(x_cv)
# Estimate performance
cflearn.estimate(x_cv, y_cv, pipelines=m)

We get:

Traceback (most recent call last):
  File "C:\ProgramData\miniconda\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-78d46f42bbd0>", line 24, in <module>
    cflearn.estimate(x_cv, y_cv, pipelines=m)
TypeError: estimate() got an unexpected keyword argument 'pipelines'
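A `TypeError` like this usually means the installed version's signature differs from the one the tutorial was written for. A quick way to check which keyword arguments a callable accepts is `inspect.signature` from the standard library. In the sketch below, `accepts_keyword` is a hypothetical helper and `estimate_stub` is a stand-in whose parameters are made up; it is not the real `cflearn.estimate` signature.

```python
import inspect
from typing import Callable


def accepts_keyword(func: Callable, name: str) -> bool:
    """Return True if `func` can be called with `name=...`."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    param = params.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


# Hypothetical stand-in, NOT the real cflearn.estimate signature.
def estimate_stub(x, y, *, models=None):
    return None


has_pipelines = accepts_keyword(estimate_stub, "pipelines")
has_models = accepts_keyword(estimate_stub, "models")
print(has_pipelines, has_models)
```

With the library installed, printing `inspect.signature(cflearn.estimate)` shows the parameters that the installed version actually accepts.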

Write documents on other High Level APIs

ValueError: too many values to unpack (expected 2)

Hi,

FYI, I got this error running the example:

import cflearn
from cfdata.tabular import TabularDataset

x, y = TabularDataset.iris().xy
m = cflearn.make().fit(x, y)

Traceback (most recent call last):
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\IPython\core\interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-7dea84d121c1>", line 5, in <module>
    m = cflearn.make().fit(x, y)
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\cflearn\bases.py", line 860, in fit
    self._before_loop(x, y, x_cv, y_cv)
  File "D:\Anaconda3\envs\tipjar\lib\site-packages\cflearn\bases.py", line 829, in _before_loop
    self.cv_data, self.tr_data = self.tr_data.split(self._cv_split)
ValueError: too many values to unpack (expected 2)

I was able to fix it by installing the latest version:

pip install -U git+https://github.com/carefree0910/carefree-learn.git

Split?

Hi, I am new to carefree and enjoying it so far. I am using cv_split=.2. My data is temporal and not IID, so I want to make sure the split doesn't shuffle or stratify. It appears from your code that it does not shuffle:

split = self.tr_data.split(self._cv_split)

Is this correct?
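Whether the built-in split shuffles is not confirmed in this thread. One way to sidestep the question for temporal data is to split chronologically yourself and pass the validation part explicitly, as the tutorial code in this listing does with `fit(x_tr, y_tr, x_cv, y_cv)`. A minimal sketch, assuming the rows are already sorted by time; `chronological_split` is a hypothetical helper, not part of carefree-learn:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(rows: Sequence[T], cv_ratio: float) -> Tuple[List[T], List[T]]:
    """Hold out the LAST `cv_ratio` fraction of `rows` as validation data.

    No shuffling and no stratification: order is preserved, so every
    validation row comes after every training row.
    """
    if not 0.0 < cv_ratio < 1.0:
        raise ValueError("cv_ratio must be strictly between 0 and 1")
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")
    n_cv = max(1, int(round(len(rows) * cv_ratio)))
    cut = len(rows) - n_cv
    return list(rows[:cut]), list(rows[cut:])


# Ten time-ordered samples, 20% held out.
rows = list(range(10))
train, valid = chronological_split(rows, 0.2)
print(train, valid)
```

The same index cut would be applied to the features and the labels so that they stay aligned.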

AttributeError: module 'cflearn' has no attribute 'make'

On my Ubuntu 18.04.4 server, when I run your quick start code, I get this error:
root@server1:~/newautoml# python3 cflearn.py
Traceback (most recent call last):
  File "cflearn.py", line 1, in <module>
    import cflearn
  File "/root/newautoml/cflearn.py", line 5, in <module>
    m = cflearn.make().fit(x, y)
AttributeError: module 'cflearn' has no attribute 'make'
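The traceback shows that the script itself is named cflearn.py, so `import cflearn` resolves to the script rather than to the installed package; renaming the script avoids the collision. A small sketch of a check for this situation follows. `find_shadowing_file` is a hypothetical helper written for illustration, and the temporary directory only simulates the layout from the report.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_shadowing_file(package: str, directory: str = ".") -> Optional[Path]:
    """Return `<directory>/<package>.py` if it exists, else None.

    A script with the same name as a package, sitting in the directory the
    program is run from, is imported instead of the installed package.
    """
    candidate = Path(directory) / f"{package}.py"
    return candidate if candidate.is_file() else None


# Simulate the reported layout: a script called cflearn.py that imports cflearn.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "cflearn.py").write_text("import cflearn\n")
    shadow = find_shadowing_file("cflearn", tmp)
    shadow_name = shadow.name if shadow else None
    clean = find_shadowing_file("some_other_package", tmp)

print(shadow_name, clean)
```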

AttributeError: module 'cflearn' has no attribute 'Auto'

When running the tutorial code:

import cflearn

from cfdata.tabular import *

# prepare iris dataset
iris = TabularDataset.iris()
iris = TabularData.from_dataset(iris)
# split 10% of the data as validation data
split = iris.split(0.1)
train, valid = split.remained, split.split
x_tr, y_tr = train.processed.xy
x_cv, y_cv = valid.processed.xy
data = x_tr, y_tr, x_cv, y_cv

#%%
fcnn = cflearn.make().fit(*data)

# 'overfit' validation set
auto = cflearn.Auto(TaskTypes.CLASSIFICATION).fit(*data, num_jobs=2)

# estimate manually
predictions = auto.predict(x_cv)
print("accuracy:", (y_cv == predictions).mean())

# estimate with `cflearn`
cflearn.estimate(
    x_cv,
    y_cv,
    pipelines=fcnn,
    other_patterns={"auto": auto.pattern},
)

I get this error:

  File "C:\ProgramData\miniconda\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 4, in
    auto = cflearn.Auto(TaskTypes.CLASSIFICATION).fit(*data, num_jobs=2)
AttributeError: module 'cflearn' has no attribute 'Auto'

Cannot specify metrics

Provide more examples on runnables

Write documentation on examples, optimizations and customizations

Improve documentation with docusaurus

Here's the ongoing repo.

Improve `TrainMonitor`

Try to escape local minima by dynamically adjusting batch size

e.g. lower the batch size when the std is too small
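A rough sketch of this idea is given below. The issue does not say which standard deviation is meant, so this assumes the standard deviation of recent training losses; the function name, threshold, shrink factor, and minimum batch size are all hypothetical choices.

```python
from statistics import pstdev
from typing import Sequence


def adjust_batch_size(
    recent_losses: Sequence[float],
    batch_size: int,
    *,
    std_threshold: float = 1e-3,  # assumed plateau threshold
    min_batch_size: int = 16,     # assumed lower bound
    factor: int = 2,              # assumed shrink factor
) -> int:
    """Shrink the batch size when recent losses barely vary.

    A very small standard deviation is treated as a sign that training is
    stuck, and a smaller batch adds gradient noise that may help escape.
    """
    if len(recent_losses) < 2:
        return batch_size  # not enough history to judge
    if pstdev(recent_losses) < std_threshold:
        return max(min_batch_size, batch_size // factor)
    return batch_size


plateau = adjust_batch_size([0.5012, 0.5011, 0.5013, 0.5012], 128)
moving = adjust_batch_size([0.9, 0.7, 0.5, 0.4], 128)
floored = adjust_batch_size([0.5, 0.5, 0.5], 16)
print(plateau, moving, floored)
```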

Implement `DataProtocal`, `DataLoaderProtocal`, etc.

So we can use other forms of data in carefree-learn.

Document the `meta_config` technique

AutoML Mode Question

Maybe I am misunderstanding how it works, but when I run your latest version in AutoML mode (using https://github.com/carefree0910/carefree-learn/blob/dev/examples/titanic/automl.py), it runs the same methods it always did (fcnn_optuna, tree_dnn_optuna, etc.) and gets the same Kaggle score as before (0.78947). But in your latest version you wrote that you "Implemented more models (Linear, TreeLinear, Wide and Deep, RNN, Transformer, etc.)." Shouldn't those be in the AutoML part?

Integrate DeepSpeed

This is mainly for downstream use, because neural networks that target tabular datasets rarely need distributed training.

Clean up code & APIs for cflearn.dist

Cross validate on multiple splits (kfold/time series split)?

Hi, is it possible, or are there plans, to provide k-fold or time series split cross-validation? Thank you.
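For reference, a time series split can be sketched in a few lines without relying on any built-in support. The generator below yields expanding-window folds in which each validation block comes strictly after its training indices; `expanding_window_splits` is a hypothetical helper, not a carefree-learn API.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_samples: int, n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, valid_indices) pairs for time-ordered data.

    The training window grows with each fold, and the validation block
    always follows it, so no future sample leaks into training.
    """
    if n_splits < 1 or n_samples < n_splits + 1:
        raise ValueError("need at least n_splits + 1 samples")
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - k + 1) * fold
        yield list(range(train_end)), list(range(train_end, train_end + fold))


splits = list(expanding_window_splits(n_samples=10, n_splits=3))
for train_idx, valid_idx in splits:
    print(train_idx, valid_idx)
```

Each pair of index lists can then be used to slice the data and pass the validation part explicitly to `fit`, one model per fold.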

[CRITICAL] Regression is now broken due to legacy bugs

This bug was caused by the introduction of get_outputs. Previously, when binary_threshold_outputs was not None, we knew the task was definitely binary classification. Now it is never None, even on regression tasks.
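The failure mode can be illustrated in isolation: inferring the task type from whether a value is None breaks as soon as that value is always populated. The sketch below contrasts that inference with an explicit flag. Both functions, the `"clf"`/`"reg"` labels, and the sample values are hypothetical and do not reflect carefree-learn's internals.

```python
from typing import Optional, Sequence


def is_binary_task_legacy(binary_threshold_outputs: Optional[Sequence[float]]) -> bool:
    # Legacy-style inference: "not None" is taken to mean binary classification.
    return binary_threshold_outputs is not None


def is_binary_task_explicit(task_type: str, num_classes: Optional[int]) -> bool:
    # Explicit check: rely on declared task information, not on a sentinel.
    return task_type == "clf" and num_classes == 2


# Hypothetical regression run where the outputs are now always populated.
regression_outputs = [0.3, 1.7]

legacy = is_binary_task_legacy(regression_outputs)  # wrongly reports binary
explicit = is_binary_task_explicit("reg", None)     # correctly reports not binary
print(legacy, explicit)
```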

Make the patience of `TrainMonitor` depend on dataset size

Fix the design of `Environment`

Currently we cannot change the default behaviour of Environment. Fix it by refactoring the _preset_config and _init_config logic.

Integrate MLflow

Add documentation for `Aggregator`

Artifacts should be logged when using `mlflow run` calls

Clean up APIs

Provide a better user-side experience.
Make development on carefree-learn carefree as well.

Keyring is skipped due to an exception: "WindowsPath" object has no attribute 'read_text'

On a Windows 10 Pro machine, installing via pip install carefree-learn breaks on some Anaconda environments (but not on others, as I've confirmed).
I'm using the most up-to-date version of pip, 20.2.2.

Keyring is skipped due to an exception: "WindowsPath" object has no attribute 'read_text'

I figured this was due to pathlib2 interfering with pathlib so I uninstalled pathlib2, but still no joy.

I performed a conda clean --all -y to remove any lingering tarballs, but this too did not help.

I then had the idea to install the dependencies manually, so I started with pip install carefree-ml, which installed carefree-ml successfully.
I was then able to run pip install carefree-learn successfully.

Strange error. I wonder if it may be related to the new pip 20.2 dependency resolver.
Regardless, it's installed now. Hope this helps anyone else who encounters a similar install issue.

AttributeError: module 'cflearn' has no attribute 'Ensemble'

When I git clone your repo, pip3 install it (Ubuntu 18.04.5), and run test_titanic.py without any changes, I get this error:
root@ns544446:~/carefree-learn/examples/titanic# sudo python3 test_titanic.py
Traceback (most recent call last):
  File "test_titanic.py", line 64, in <module>
    test_adaboost()
  File "test_titanic.py", line 60, in test_adaboost
    _test("adaboost", _adaboost_core)
  File "test_titanic.py", line 44, in _test
    data, pattern = _core(train_file)
  File "test_titanic.py", line 36, in _adaboost_core
    ensemble = cflearn.Ensemble(TaskTypes.CLASSIFICATION, config)
AttributeError: module 'cflearn' has no attribute 'Ensemble'

Visualize pipe structures in ModelBase

Implement Factory class

Record best epoch & step
