datacanvasio / deeptables

DeepTables: Deep-learning Toolkit for Tabular data

Home Page: https://deeptables.readthedocs.io

License: Apache License 2.0

Python 99.63% Dockerfile 0.37%

tabular-data deep-learning deepfm fm afm dcn-model ctr-prediction pnn xdeepfm wide-and-deep autoint fgcnn fibinet factorization-machines structured-data

deeptables's Introduction

DeepTables

We Are Hiring!

Dear folks, we are opening several positions based in Beijing for both professionals and interns who are keen on AutoML/NAS. Please send your resume/CV to [email protected]. (Application deadline: TBD.)

DeepTables: Deep-learning Toolkit for Tabular data

DeepTables (DT) is an easy-to-use toolkit that enables deep learning to unleash great power on tabular data.

Overview

MLPs (also known as fully-connected neural networks) have been shown to be inefficient at learning distribution representations. The "add" operations of the perceptron layer have been shown to perform poorly at exploring multiplicative feature interactions. In most cases, manual feature engineering is necessary, and this work requires extensive domain knowledge and is very cumbersome. How to learn feature interactions efficiently in neural networks has become the most important problem.

Various models have been proposed for CTR prediction and have continued to outperform existing state-of-the-art approaches in recent years. Well-known examples include FM, DeepFM, Wide&Deep, DCN, PNN, etc. These models can also perform well on tabular data when used appropriately.

DT aims to utilize the latest research findings to provide users with an end-to-end toolkit on tabular data.

DT has been designed with these key goals in mind:

Easy to use, even for non-experts.
Good performance out of the box.
Flexible architecture that is easy for users to extend.

Tutorials

Please refer to the official docs at https://deeptables.readthedocs.io/en/latest/.

Installation

Installing DeepTables with pip is recommended:

pip install tensorflow deeptables

Note:

TensorFlow is required by DeepTables; install it before running DeepTables.

GPU Setup (Optional)

To use DeepTables with GPU devices, install tensorflow-gpu instead of tensorflow.

pip install tensorflow-gpu deeptables

Verify the installation:

python -c "from deeptables.utils.quicktest import test; test()"

Optional dependencies

The following libraries are not hard dependencies and are not installed automatically when you install DeepTables. To use all DT functionality, install these optional dependencies:

pip install shap

Example:

A simple binary classification example

import numpy as np
from deeptables.models import deeptable, deepnets
from deeptables.datasets import dsutils
from sklearn.model_selection import train_test_split

# Load the data
df = dsutils.load_bank()
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)

# Separate the target column 'y' from the features
y = df_train.pop('y')
y_test = df_test.pop('y')

# Train a DeepFM model
config = deeptable.ModelConfig(nets=deepnets.DeepFM)
dt = deeptable.DeepTable(config=config)
model, history = dt.fit(df_train, y, epochs=10)

# Evaluate on the held-out set
result = dt.evaluate(df_test, y_test, batch_size=512, verbose=0)
print(result)

# Score new data
preds = dt.predict(df_test)

A solution that used DeepTables to win 1st place in the Kaggle Categorical Feature Encoding Challenge II

Click here

Citation

If you use DeepTables in your research, please cite us as follows:

Jian Yang, Xuefeng Li, Haifeng Wu. DeepTables: A Deep Learning Python Package for Tabular Data. https://github.com/DataCanvasIO/DeepTables, 2022. Version 0.2.x.

BibTeX:

@misc{deeptables,
  author={Jian Yang and Xuefeng Li and Haifeng Wu},
  title={{DeepTables}: {A Deep Learning Python Package for Tabular Data}},
  howpublished={https://github.com/DataCanvasIO/DeepTables},
  note={Version 0.2.x},
  year={2022}
}

DataCanvas

DeepTables is an open source project created by DataCanvas.


deeptables's Issues

Some examples don't work

I'm new to DeepTables. How can I fix them?
automl:

      9 best_trial = hdt.get_best_trial()
---> 10 print(f'best reward:{best_trial.reward}')

AttributeError: 'NoneType' object has no attribute 'reward'

batch_trainer and batch_trainer-cv:

----> 2 from deeptables.utils import consts, dt_logging, batch_trainer

/usr/local/lib/python3.7/dist-packages/deeptables/utils/dart_early_stopping.py
----> 5 from lightgbm.compat import range_
ImportError: cannot import name 'range_' from 'lightgbm.compat' (/usr/local/lib/python3.7/dist-packages/lightgbm/compat.py)

dt_cross_validation:

roc_auc_score(y_train, oof_proba)
ValueError: bad input shape (627, 2)
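
The `bad input shape (627, 2)` error above is consistent with passing a two-column probability array to a binary metric that expects one score per sample. The following is a minimal sketch of that idea, assuming column 1 holds the positive-class probability (the issue does not state the column order). The helpers `positive_scores` and `binary_auc` and the toy data are hypothetical and written in plain Python; with scikit-learn and a NumPy array, the equivalent call would be `roc_auc_score(y_train, oof_proba[:, 1])`.

```python
def positive_scores(proba_rows, positive_index=1):
    """Keep one score per sample from an (n_samples, 2) probability table."""
    return [row[positive_index] for row in proba_rows]


def binary_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of samples,
    so this is for illustration only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical out-of-fold probabilities: one [P(class 0), P(class 1)] row per sample.
oof_proba = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
y_train = [0, 1, 0, 1]

auc = binary_auc(y_train, positive_scores(oof_proba))
print(auc)
```

Whether column 1 really is the positive class depends on how the labels were encoded, so it is worth checking before relying on the score.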

Numpy and Tensorflow: version conflict

During the installation of DeepTables on Python 3.7 using pip install deeptables, I get

ERROR: tensorflow 2.3.1 has requirement numpy<1.19.0,>=1.16.0, but you'll have numpy 1.19.4 which is incompatible.

The installation completes, and the package is working, but I wonder if there are some corner cases where Tensorflow may "explode".

My workaround is to install DeepTables in two steps:

pip install tensorflow
pip install deeptables

I wonder if there is a better way to get rid of the error permanently.
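
To see which versions pip actually left installed after a warning like the one above, a small standard-library check can help. This is only a sketch: the helper name `installed_versions` is hypothetical, and it reports versions without resolving the conflict itself.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the pip message quoted in this issue.
report = installed_versions(["numpy", "tensorflow", "deeptables"])
for name, version in report.items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Comparing the reported numpy version with the range in the pip message (`numpy<1.19.0,>=1.16.0`) shows whether the environment still violates the stated requirement.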

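Error 'list' object is not callable when doing binary classification

Hello, I am using the binary classification code from the examples on a dataset with 142 features, and I get the error 'list' object is not callable. The error points to lines 83 and 196 of deepmodel.py, and to line 358 of deepmodel.py. Could you help me fix it?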

How to use both AUCPR and AUC as metric for 'deeptable.fit'

Hi there,
I'm trying to use AUCPR and AUC as metrics, so I set the metrics param of ModelConfig to metrics=[keras.metrics.AUC(name="AUCPR", curve='PR', num_thresholds=1000), keras.metrics.AUC(name="AUC", curve='ROC', num_thresholds=1000)]. However, it did not work and raised the error "AttributeError: 'AUC' object has no attribute 'name'".
Besides, I would like to know how to pass pre-defined validation data to DeepTable.fit. I read its source code and found no related description. Is it correct to set validation_data = (val_x, val_y)?

I would be grateful for any suggestions!

Best,
Wenyi Jin

Which models support the "multiclass" task?

System information

DeepTables version: 0.2.3.1
Are you willing to contribute it: No

Describe the feature and the current behavior/state.
When I test various models in ModelConfig by changing nets=['fibi_nets'], I see that some models only support binary classification. When I try them on multiclass classification, I always get this error: ValueError: Unexpected logit output.{}

So how can I know which models support multiclass classification and which are binary only?
Please help.

Imputation error when working with categorical variables

Some variables I have are categorical but have integers as entries. I get the following error when the imputer tries to process them:

'fill_value'=nan-fillvalue is invalid. Expected a numerical value when imputing numerical data
For the moment I took fill_value='nan-fillvalue' out of the models/preprocessor.py on line 237. Now the default as given by sklearn.SimpleImputer() is used instead and it works.

Integrated Gradients and Embedding layer

I would like to calculate feature importances using integrated gradients, as I am hoping this calculation will be faster than the SHAP KernelExplainer. Unfortunately, the embedding layer is non-differentiable, since it's a simple matrix multiplication. This causes a failure in the gradient calculation.

I also tried embedding the categorical variables separately and then feeding the full data into the model. Theoretically, this procedure avoids calculating the gradients for the embedding layer because there is none present. However, when every variable is numeric, the preprocessor discards all variables which are 0 from top to bottom. This is not what I want, as I want to train in batches.

I have two questions:

Is there a way to calculate integrated gradients with embedding layers?
How can I stop the deeptables preprocessor from discarding variables which have non-unique (all zeros) values?

Thanks!

varlen features implementations

Do you have a plan to implement Varlen sparse features and different pooling layers?

Error when passing validation_data to dt.fit()

I encountered an error while passing my own validation data to the fit() function. The error says that a tuple isn't callable, and I found this on line 336 of deeptable.py:

X_val, y_val = validation_data(0), validation_data(1)

I'm not sure what it was supposed to be, but I changed it to square brackets:

X_val, y_val = validation_data[0], validation_data[1]

while passing validation_data=(X_val, y_val), and now it works.
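
The difference between the two lines can be reproduced without DeepTables. The sketch below uses a hypothetical helper, `split_validation_data`, and toy data; it is not the library's code, only an illustration of why calling a tuple fails while indexing it works.

```python
def split_validation_data(validation_data):
    """Unpack a (X_val, y_val) pair by indexing rather than calling it."""
    if validation_data is None:
        return None, None
    if len(validation_data) != 2:
        raise ValueError("validation_data must be a (X_val, y_val) pair")
    return validation_data[0], validation_data[1]


# Hypothetical toy validation set: two rows of features and two labels.
validation_data = ([[1.0, 2.0], [3.0, 4.0]], [0, 1])

# Calling the tuple, as in the reported line, raises TypeError.
try:
    validation_data(0)
    call_error = None
except TypeError as exc:
    call_error = str(exc)

X_val, y_val = split_validation_data(validation_data)
print(call_error)
print(X_val, y_val)
```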

How do I pass task='regression' to DeepTable?

I haven't figured out how to pass the task I want done to DeepTable. I saw that I can pass it to DeepModel(task='regression'). However, I want to pass it to DeepTable.

Edit: The reason I want to do this is that the auto-task detection categorizes the task as multiclass classification if I pass ints as y and the number of classes is < 1000 (which it is in my case). I could, of course, convert the ints to float32, but I want control over the model directly.

parameter setting

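Hello, I would like to ask about tuning DeepTables parameters.
1. For a binary classification dataset with imbalanced positive and negative samples (10 : 1), XGBoost lets you set sample weights. Does DeepTables have a similar parameter?
2. For datasets with imbalanced positive and negative samples, which parameter settings help the model reach better evaluation scores?
3. The nets parameter accepts multiple networks. During training, are the networks combined, or are one or a few of the best networks selected?
4. The examples show that LightGBM parameters can be set. Is there a switch that must be turned on to use LightGBM?
5. Is there any good tuning experience for DeepTables that I can refer to?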

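How to install and use the GPU version of deeptables

Installing the GPU version with pip install deeptables[gpu] also installs the CPU version by default, so both versions are present. How can I install only the GPU version?
Having both versions installed causes an error at run time; after removing the CPU version, the GPU version runs normally.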

"packaging" not found

Python package packaging is missing from the list of required packages in v.0.1.13. Here is the repro code:

>>> from deeptables.models.deepnets import WideDeep
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "./venv/lib/python3.7/site-packages/deeptables/models/__init__.py", line 6, in <module>
    from deeptables.models.deeptable import DeepTable
  File ".venv/lib/python3.7/site-packages/deeptables/models/deeptable.py", line 25, in <module>
    from ..utils.tf_version import tf_less_than
  File "./venv/lib/python3.7/site-packages/deeptables/utils/tf_version.py", line 7, in <module>
    from packaging.version import parse
ModuleNotFoundError: No module named 'packaging'

Running pip install packaging fixes this.

Bug: Can't pickle local object during save

Can't save model

Somehow, I remember successfully saving it once before, but it no longer works.
I've tried to play with the home_dir parameter in the config, cleaning it, and using a tempfile.TemporaryDirectory but can no longer save my models.

Version: 0.1.14

Tried from pip and from master

System: OSX

Reproduce:

from deeptables.models import deeptable
from sklearn.datasets import load_iris
from tensorflow import keras  # needed for keras.optimizers below

data = load_iris()
X = data['data']
y = data['target']
conf = deeptable.ModelConfig(
        nets=['dnn_nets'],
        metrics=['AUC'],
        optimizer=keras.optimizers.RMSprop(),
    )
dt = deeptable.DeepTable(config=conf)
dt.fit(X, y)
dt.save('model')

Error:

Load model from disk:models/dt/dnn_nets.h5.
Traceback (most recent call last):
  File ".../python3.7/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-19-5a45263f9c71>", line 1, in <module>
    dt.save('model')
  File ".../python3.7/site-packages/deeptables/models/deeptable.py", line 723, in save
    pickle.dump(self, output, protocol=2)
AttributeError: Can't pickle local object 'make_gradient_clipnorm_fn.<locals>.<lambda>'
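
The name in the traceback, `make_gradient_clipnorm_fn.<locals>.<lambda>`, indicates that a lambda created inside a function is somewhere in the object graph being pickled. The sketch below reproduces that failure mode in isolation with the standard library only. The function names are hypothetical and this is not a fix for DeepTables itself; it only shows why such an object cannot be pickled and what a picklable equivalent can look like.

```python
import functools
import pickle


def make_cap_fn_local(limit):
    """Return a lambda created inside a function body.

    pickle stores functions by qualified name, and a name containing
    '<locals>' cannot be looked up again, so dumping this fails.
    """
    return lambda value: min(limit, value)


def make_cap_fn_picklable(limit):
    """Return the same behaviour as a functools.partial over the builtin min."""
    return functools.partial(min, limit)


try:
    pickle.dumps(make_cap_fn_local(1.0), protocol=2)
    local_pickled = True
except (AttributeError, pickle.PicklingError):
    local_pickled = False

restored = pickle.loads(pickle.dumps(make_cap_fn_picklable(1.0), protocol=2))
print(local_pickled)
print(restored(3.5))
```

A module-level function or a small callable class would work just as well as `functools.partial`; the point is that the callable must be reachable by an importable name.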

Deep session interest network for CTR

Do you have a plan to implement Deep session interest network for CTR paper (https://arxiv.org/abs/1905.06482)?

Question: For some models, only categorical inputs are passed on. Why?

I was wondering: when I set up a model, for example a Wide and Deep model with an AutoInt net and a CIN net, only the Wide and Deep model gets fed the numerical variables, whereas the others only get the embedded categorical variables. See the attached figure. Why is that? Wouldn't that mean that interactions between numerical variables get ignored? The other problem is that some numerical variables I feed into the model were originally categorical variables of differing lengths that have been embedded through a different method. These must also exhibit interaction effects in a non-linear fashion. I thus find it a pity not to be able to take advantage of this extra information. Or perhaps I have a completely wrong understanding of the underlying structure.

I certainly would appreciate an explanation. Thank you!

No module named 'woodwork.serialize'

I cannot run the example. When I run 'from deeptables.models.deeptable import DeepTable, ModelConfig', I will get this error.

No module named 'woodwork.serialize'

What should I do?

Error on deeptables import on colab

Hi! I'm trying to run deeptables on google colab and I'm getting an error related to featuretools.primitives. Here's the full error traceback:

/usr/local/lib/python3.8/dist-packages/deeptables/models/__init__.py in
4 from deeptables.models.config import ModelConfig
5 from deeptables.models.modelset import ModelInfo, ModelSet
----> 6 from deeptables.models.deeptable import DeepTable
7 from deeptables.models.deepmodel import DeepModel
8 from deeptables.models.metainfo import ContinuousColumn, CategoricalColumn

/usr/local/lib/python3.8/dist-packages/deeptables/models/deeptable.py in
15 from tensorflow.keras.layers import Concatenate, BatchNormalization
16
---> 17 from hypernets.tabular import get_tool_box, is_dask_installed
18 from . import modelset, deepnets
19 from .config import ModelConfig

/usr/local/lib/python3.8/dist-packages/hypernets/tabular/__init__.py in
4 """
5 from ._base import get_tool_box, register_toolbox, register_transformer, tb_transformer
----> 6 from .toolbox import ToolBox
7
8 register_toolbox(ToolBox)

/usr/local/lib/python3.8/dist-packages/hypernets/tabular/toolbox.py in
24 from . import ensemble as ensemble_
25 from . import estimator_detector as estimator_detector_
---> 26 from . import feature_generators as feature_generators_
27 from . import metrics as metrics_
28 from . import pseudo_labeling as pseudo_labeling_

/usr/local/lib/python3.8/dist-packages/hypernets/tabular/feature_generators/__init__.py in
3
4 """
----> 5 from ._primitives import CrossCategorical, GeoHashPrimitive, DaskCompatibleHaversine, TfidfPrimitive
6 from ._transformers import FeatureGenerationTransformer

/usr/local/lib/python3.8/dist-packages/hypernets/tabular/feature_generators/_primitives.py in
15
16
---> 17 class DaskCompatibleTransformPrimitive(primitives.TransformPrimitive):
18 compatibility = [primitives.Library.PANDAS, primitives.Library.DASK]
19 return_dtype = 'object'

/usr/local/lib/python3.8/dist-packages/hypernets/tabular/feature_generators/_primitives.py in DaskCompatibleTransformPrimitive()
16
17 class DaskCompatibleTransformPrimitive(primitives.TransformPrimitive):
---> 18 compatibility = [primitives.Library.PANDAS, primitives.Library.DASK]
19 return_dtype = 'object'
20 commutative = True

AttributeError: module 'featuretools.primitives' has no attribute 'Library'

I'd be grateful for your help with this.

TypeError: import_optional_dependency() got an unexpected keyword argument 'errors'

Hello,

I'm trying to import DeepTables into my Google Colab notebook like so:

import deeptables
from deeptables.models import make_experiment
from deeptables.models.hyper_dt import DTModuleSpace, DnnModule, DTFit

However, I am met with the following errors:

TypeError                                 Traceback (most recent call last)
<ipython-input-20-5ce1cb115d93> in <module>()
     50 
     51 import deeptables
---> 52 from deeptables.models import make_experiment
     53 from deeptables.models.hyper_dt import DTModuleSpace, DnnModule, DTFit

11 frames
/usr/local/lib/python3.7/dist-packages/deeptables/models/__init__.py in <module>()
      2 __author__ = 'yangjian'
      3 
----> 4 from deeptables.models.config import ModelConfig
      5 from deeptables.models.modelset import ModelInfo, ModelSet
      6 from deeptables.models.deeptable import DeepTable

/usr/local/lib/python3.7/dist-packages/deeptables/models/config.py in <module>()
      3 import collections
      4 import os
----> 5 from ..utils import consts
      6 from . import deepnets as deepnets
      7 

/usr/local/lib/python3.7/dist-packages/deeptables/utils/__init__.py in <module>()
      4 
      5 """
----> 6 from hypernets.utils import fs, hash_data, infer_task_type, load_data, isnotebook
      7 from hypernets.tabular.metrics import calc_score

/usr/local/lib/python3.7/dist-packages/hypernets/utils/__init__.py in <module>()
      8 from ._fsutils import filesystem as fs
      9 from ._tic_tok import tic_toc, report as tic_toc_report, report_as_dataframe as tic_toc_report_as_dataframe
---> 10 from .common import generate_id, combinations, isnotebook, Counter, to_repr, get_params
     11 from .common import infer_task_type, hash_data, hash_dataframe, load_data, load_module
     12 from ._estimators import load_estimator, save_estimator

/usr/local/lib/python3.7/dist-packages/hypernets/utils/common.py in <module>()
     15 
     16 import dask.array as da
---> 17 import dask.dataframe as dd
     18 import numpy as np
     19 import pandas as pd

/usr/local/lib/python3.7/dist-packages/dask/dataframe/__init__.py in <module>()
      1 try:
----> 2     from .core import (
      3         DataFrame,
      4         Series,
      5         Index,

/usr/local/lib/python3.7/dist-packages/dask/dataframe/core.py in <module>()
     75 no_default = "__no_default__"
     76 
---> 77 pd.set_option("compute.use_numexpr", False)
     78 
     79 

/usr/local/lib/python3.7/dist-packages/pandas/_config/config.py in __call__(self, *args, **kwds)
    231 # class below which wraps functions inside a callable, and converts
    232 # __doc__ into a property function. The doctsrings below are templates
--> 233 # using the py2.6+ advanced formatting syntax to plug in a concise list
    234 # of options, and option descriptions.
    235 

/usr/local/lib/python3.7/dist-packages/pandas/_config/config.py in _set_option(*args, **kwargs)
    139         if o and o.validator:
    140             o.validator(v)
--> 141 
    142         # walk the nested dict
    143         root, k = _get_root(key)

/usr/local/lib/python3.7/dist-packages/pandas/core/config_init.py in use_numexpr_cb(key)
     48 
     49 
---> 50 def use_numexpr_cb(key):
     51     from pandas.core.computation import expressions
     52 

/usr/local/lib/python3.7/dist-packages/pandas/core/computation/expressions.py in <module>()
     17 from pandas._typing import FuncType
     18 
---> 19 from pandas.core.computation.check import NUMEXPR_INSTALLED
     20 from pandas.core.ops import roperator
     21 

/usr/local/lib/python3.7/dist-packages/pandas/core/computation/check.py in <module>()
      1 from pandas.compat._optional import import_optional_dependency
      2 
----> 3 ne = import_optional_dependency("numexpr", errors="warn")
      4 NUMEXPR_INSTALLED = ne is not None
      5 if NUMEXPR_INSTALLED:

TypeError: import_optional_dependency() got an unexpected keyword argument 'errors'

The strange thing is that everything imports normally without errors in a local Jupyter Notebook, but it throws this error on Colab. My pandas version is the latest build and I'm unable to find the cause. Has anyone had a similar issue on Colab? Thanks

Problem with the model save path

dt.save('/Users/wjq/Downloads/dtxx')
I saved the model to the path '/Users/wjq/Downloads/dtxx', but that directory is empty.
w=dt.load('/Users/wjq/Downloads/dtxx'): when I load the model from '/Users/wjq/Downloads/dtxx', I find that deeptable does not actually load it from '/Users/wjq/Downloads/dtxx' but from '/var/folders/g9/nwf7rmdd3nlfszr22lfg4lhc0000gn/T/workdir/Users/wjq/Downloads/dtxx/dt.pkl'. Please tell me how to make the model save to the path I specify.

w=dt.load('/Users/wjq/Downloads/dtxx')
Traceback (most recent call last):
File "/Applications/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3457, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
w=dt.load('/Users/wjq/Downloads/dtxx')
File "/Users/wjq/wjq_home/hypergbm_venv/lib/python3.7/site-packages/deeptables/models/deeptable.py", line 801, in load
with fs.open(f'{filepath}dt.pkl', 'rb') as input:
File "/Users/wjq/wjq_home/hypergbm_venv/lib/python3.7/site-packages/hypernets/utils/_fsutils.py", line 131, in execute
result = fn(self.to_rpath(rpath), *args, **kwargs)
File "/Applications/anaconda3/lib/python3.7/site-packages/fsspec/spec.py", line 1043, in open
**kwargs,
File "/Applications/anaconda3/lib/python3.7/site-packages/fsspec/implementations/local.py", line 159, in _open
return LocalFileOpener(path, mode, fs=self, **kwargs)
File "/Applications/anaconda3/lib/python3.7/site-packages/fsspec/implementations/local.py", line 254, in __init__
self._open()
File "/Applications/anaconda3/lib/python3.7/site-packages/fsspec/implementations/local.py", line 259, in _open
self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/g9/nwf7rmdd3nlfszr22lfg4lhc0000gn/T/workdir/Users/wjq/Downloads/dtxx/dt.pkl'

some bugs in class DefaultPreprocessor

In the fit_transform method of class DefaultPreprocessor:

if copy:  # TODO bug here
    X = copy.deepcopy(X)
    y = copy.deepcopy(y)

The copy here may need to be copy_data.
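
One way the reported snippet can go wrong is name shadowing: if the method has a parameter called `copy`, that name no longer refers to the `copy` module inside the method. The sketch below illustrates this under that assumption. The function names and signatures are hypothetical stand-ins, not the actual `DefaultPreprocessor` code.

```python
import copy


def fit_transform_buggy(X, y, copy=True):
    # The parameter `copy` shadows the imported module inside this function,
    # so `copy.deepcopy` is looked up on a bool and raises AttributeError.
    if copy:
        X = copy.deepcopy(X)
        y = copy.deepcopy(y)
    return X, y


def fit_transform_fixed(X, y, copy_data=True):
    # Renaming the flag leaves `copy` bound to the module.
    if copy_data:
        X = copy.deepcopy(X)
        y = copy.deepcopy(y)
    return X, y


X, y = [[1, 2], [3, 4]], [0, 1]

try:
    fit_transform_buggy(X, y)
    buggy_error = None
except AttributeError as exc:
    buggy_error = str(exc)

X_copy, y_copy = fit_transform_fixed(X, y)
print(buggy_error)
print(X_copy == X, X_copy is X)
```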

DeepTables with Tensorflow 2.2

Hi,

Would it be possible to use DeepTables with TF v2.2?
If so could you please update requirements and pip distribution.

thanks!

DataConversionWarning from example code

Hi, first of all thanks for this library. I am getting a DataConversionWarning from the example in readme:

sklearn/utils/validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  return f(**kwargs)

on the line preds = dt.predict(df_test). I tried to call it with df_test.values.ravel() which resulted in a
ValueError: Number of features of the input must be equal to or greater than that of the fitted transformer. Transformer n_features is 17 and input n_features is 1.

collect problems

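How to set up multiple GPUs on a single machine

When using the GPU version of deeptables on the Baidu platform, I found that GPU utilization is fairly low.
How do I configure multiple GPUs on a single machine?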

Do you have a benchmark for all models?

Hi,
do you have a benchmark for all models?

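DeepFM output probability question

DeepFM outputs two-dimensional data. Do I need to apply logistic regression myself to compute the final probability?
How can I select the top N by output probability for recall recommendation? Thanks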
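
For the top-N part of the question, a minimal sketch follows. It assumes the two output columns are already class probabilities in the order [P(class 0), P(class 1)], which the issue does not confirm. The helper `top_n_by_positive_proba`, the item IDs, and the numbers are hypothetical.

```python
import heapq


def top_n_by_positive_proba(item_ids, proba_rows, n, positive_index=1):
    """Return the n (item_id, score) pairs with the highest positive-class score."""
    scored = [(item_id, row[positive_index]) for item_id, row in zip(item_ids, proba_rows)]
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Hypothetical candidates and model output, one row per candidate.
item_ids = ["a", "b", "c", "d"]
proba = [[0.7, 0.3], [0.1, 0.9], [0.4, 0.6], [0.8, 0.2]]

top2 = top_n_by_positive_proba(item_ids, proba, n=2)
print(top2)
```

`heapq.nlargest` avoids sorting the full candidate list, which matters when N is small relative to the number of candidates.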

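Sampling strategy of the deeptables algorithm: how do GPU and CPU differ?

The CPU and GPU versions used the same training dataset of about 700,000 samples.

During training, the logs show that the amount of training data used differs:
the CPU version used 5488 samples
the GPU version used 702439 samples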

Adding tensorboard

Hi, this could be a feature request because I can't find a way to add a TensorBoard callback to the model. I need to profile a deeptables model for instrumentation. I'm using deeptables version 0.2.3.1.

It seems this class has a mechanism to inject only early stopping callbacks. Can you please suggest a way I can use TensorBoard, or add that as a feature in a future release?

Model name always changes when saved

Is it possible to save the model with a static name? Right now when I save a model multiple times, after every training the model files keep changing. There is the dt.pkl file and another .h5 file. The latter changes its name depending on the ordering of the layers that are in the model. For example, one model might be named dnn_nets+linear.h5 and after the second training it's called linear+dnn_nets.h5. I would love an option where I can set the model name myself.
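
The ordering problem described above can be illustrated without DeepTables: joining the net names in whatever order they arrive gives different file names, while sorting them first gives one stable name. The helper below is hypothetical and is not part of the library; whether renaming the saved .h5 file is compatible with loading the model later has not been verified here.

```python
def stable_model_filename(net_names, suffix=".h5"):
    """Build an order-independent file name from a collection of net names."""
    return "+".join(sorted(net_names)) + suffix


# The two orderings mentioned in the issue map to the same name.
first = stable_model_filename(["dnn_nets", "linear"])
second = stable_model_filename(["linear", "dnn_nets"])
print(first, second)
```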

some bugs in CategoricalFocalLoss function

super(BinaryFocalLoss, self).__init__(reduction=reduction, name=name)
The BinaryFocalLoss here may need to be CategoricalFocalLoss.
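
The effect of naming the wrong class in `super()` can be shown with plain stand-in classes. In the sketch below, `Loss` is a hypothetical minimal base class, not the Keras one, and the class bodies are illustrative only. If the categorical class does not inherit from `BinaryFocalLoss`, the call raises a TypeError at construction time.

```python
class Loss:
    """Hypothetical stand-in for a loss base class."""

    def __init__(self, reduction="auto", name=None):
        self.reduction = reduction
        self.name = name


class BinaryFocalLoss(Loss):
    def __init__(self, reduction="auto", name="binary_focal_loss"):
        super(BinaryFocalLoss, self).__init__(reduction=reduction, name=name)


class CategoricalFocalLossBuggy(Loss):
    def __init__(self, reduction="auto", name="categorical_focal_loss"):
        # Copy-paste slip: self is not a BinaryFocalLoss, so super() rejects it.
        super(BinaryFocalLoss, self).__init__(reduction=reduction, name=name)


class CategoricalFocalLoss(Loss):
    def __init__(self, reduction="auto", name="categorical_focal_loss"):
        # Zero-argument super() always refers to the enclosing class.
        super().__init__(reduction=reduction, name=name)


try:
    CategoricalFocalLossBuggy()
    buggy_error = None
except TypeError as exc:
    buggy_error = str(exc)

fixed = CategoricalFocalLoss()
print(buggy_error)
print(fixed.name)
```

Using the zero-argument form of `super()` removes the class name from the call, so this kind of copy-paste slip cannot occur.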

SHAP values with DeepTables

I want to use SHAP values (https://github.com/slundberg/shap) to get feature importances. I thought of using the KernelExplainer. The problem that I encounter is that the embeddings of categorical variables are done on the fly, but I can only pass the non-embedded test set. How can I gain access to the embedded data? Is there a way?

Avoid early stopping while fitting

Here is my code

config = deeptable.ModelConfig(nets=deepnets.DeepFM)
dt = deeptable.DeepTable(config=config)
model, history = dt.fit(train, y_train,callbacks=None, epochs=200)

I don't want to have early stopping, what should I do?

This might help

2 class detected, {0, 1}, so inferred as a [binary classification] task
Preparing features taken 30.83131170272827s
Imputation taken 18.713809967041016s
Categorical encoding taken 0.0s
Injected a callback [EarlyStopping]. monitor:val_accuracy, patience:1, mode:max
1 Physical GPUs, 1 Logical GPUs

Load saved models doesn't find MultiColumnEmbedding layer

I tried loading a saved model but it can't load it and instead gives an error:

from tensorflow.keras.models import load_model
mod_path = 'model_1.h5'
load_model(mod_path)

The error:

ValueError: Unknown layer: MultiColumnEmbedding

How to output the loss after/during training?

Enable custom cv iterator

Hi, it appears that a train/test split is used as cross validation. I would like to see DT support a custom CV iterator for more CV folds.

System information

DeepTables version (you are using): 0.2.3.1
Are you willing to contribute it (Yes/No): Yes

Thank you for your consideration

X_val, y_val = validation_data(0), validation_data(1)

On deeptable.py there is a line:

X_val, y_val = validation_data(0), validation_data(1)

which should instead be:

X_val, y_val = validation_data[0], validation_data[1]

Otherwise, when you fit a model with a validation sample, the API won't accept a list or tuple of validation data and target as input.

dt.predict failed on multi-GPU mode

When ModelConfig.distribution_strategy is turned on, calling predict on the trained model reports an error:

“ValueError: predict is not supported in multi-worker mode”

Please unpin dependencies

Please unpin the Python dependencies, from pkg==ver to the usual pkg>=ver.

For instance, pinning catboost in requirements.txt to the obsolete version 0.20.2 causes pip to downgrade this package after installing deeptables. Our CI/CD scripts then conclude that catboost needs an upgrade, leading to an unnecessary rebuild of our Python containers during every version check, i.e. every hour.

16:35:32  catboost :
16:35:32  - version installed:  0.20.2
16:35:32  - latest  available:  0.24.3
16:35:32  - package upgradeable:  True

16:44:20  Collecting catboost
16:44:20    Downloading catboost-0.24.3-cp38-none-manylinux1_x86_64.whl (66.2 MB)
[...]
16:44:25  Collecting deeptables
16:44:25    Downloading deeptables-0.1.12-py3-none-any.whl (2.2 MB)
16:44:25  Requirement already satisfied: lightgbm in /opt/conda/lib/python3.8/site-packages (from causalml->-r /tmp/python-packages/pypi-packages.txt (line 12)) (3.0.0)
16:44:25  Collecting catboost
16:44:25    Downloading catboost-0.20.2-cp38-none-manylinux1_x86_64.whl (63.9 MB)

As you see from the log above, the other GBDT dependency, lightgbm, does not cause this issue (even though it is pinned to a previous version 3.0.0 on our side), because it is not pinned in your requirements file here.

DeepFM example: AttributeError: module 'sklearn.utils' has no attribute '_deprecate_positional_args'

Hey,

got this error:

AttributeError: module 'sklearn.utils' has no attribute '_deprecate_positional_args'
I believe it is due to the new version of scikit-learn.

complete the unit test

Model cannot be saved

from deeptables.models import deeptable

dt = deeptable.DeepTable()
dt.fit(x,y)
dt.save('/Users/wjq/Downloads/dt')

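When I save the model with dt.save('/Users/wjq/Downloads/dt'), the program reports no error, but the path '/Users/wjq/Downloads/dt' is empty and the model has not been saved there.

What is the correct way to save a dt model?

The deeptables version is 0.2.5.

Version information

Hello, is there any version documentation? Which versions do stable and latest refer to? Which version of deeptables is officially recommended?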

How could I add a custom metric?

I would be interested in adding a custom metric. Could you give me some hints on how to do that? Thanks!

Saved model unable to be loaded in docker

I saved a model running in a conda environment and when I load it to run in a docker image using the same version of tensorflow, deeptables, scikit learn, pandas I get the following error regarding the configs:

 File "/usr/local/lib/python3.7/site-packages/deeptables/models/deeptable.py", line 336, in fit
    X, y = self.preprocessor.fit_transform(X, y)
  File "/usr/local/lib/python3.7/site-packages/deeptables/models/preprocessor.py", line 127, in fit_transform
    X = self.__prepare_features(X)
  File "/usr/local/lib/python3.7/site-packages/deeptables/models/preprocessor.py", line 200, in __prepare_features
    raise ValueError(f'"cat_expoent" must be less than 1, not {self.config.cat_exponent} .')
ValueError: "cat_expoent" must be less than 1, not True .

I made sure the same version of deeptables is running on both platforms, yet I still get this error. It seems like from the config.py the values get mixed up somehow.... but I don't know how.

How to get class assignment for columns for predict_proba?

predict_proba returns probabilities for corresponding classes and user needs to know the class correspondence.
In other frameworks this information is accessible through .classes_ instance variable of estimator.
Is there anything like this for DeepTables models? I am currently working with DCN.

Regards.

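Can the model do clustering?

Hello, I see that the examples on the official site cover binary classification, multiclass classification, and regression. Is clustering supported? How is it used, and which parameters need to be set?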
