Comments (7)
Could you please provide a data sample to reproduce the problem?
from gan-for-tabular-data.
Similar issue. Dataset attached.
The problem can be reproduced using this example df:
import numpy as np
import pandas as pd
num_samples = 2500
df_random = pd.DataFrame({"A": 100 + 100 * np.random.rand(num_samples), "B": 100 * np.random.rand(num_samples), "C": 10 * np.random.rand(num_samples),
                          "D": 500 * np.random.rand(num_samples), "E": 200 * np.random.rand(num_samples), "F": 50 * np.random.rand(num_samples)})
Hi, I was able to solve my problem by normalizing the data before passing it to the GAN generator.
However, it also doesn't work with train_test_split from scikit-learn, so I separated the dataset manually.
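The reporter does not say which normalization was used. A minimal sketch, assuming per-column min-max scaling fitted on the training split (the helpers `minmax_fit`, `minmax_apply` and `minmax_inverse` are hypothetical and not part of tabgan): train and test are scaled with the same statistics before calling the generator, and generated rows are mapped back to the original units afterwards.

```python
import numpy as np
import pandas as pd


def minmax_fit(df: pd.DataFrame):
    """Per-column minimum and range, computed on the training data only."""
    col_min = df.min()
    col_range = (df.max() - col_min).replace(0, 1)  # constant columns: avoid dividing by zero
    return col_min, col_range


def minmax_apply(df: pd.DataFrame, stats) -> pd.DataFrame:
    """Scale columns with previously fitted stats (use the same stats for train and test)."""
    col_min, col_range = stats
    return (df - col_min) / col_range


def minmax_inverse(scaled: pd.DataFrame, stats) -> pd.DataFrame:
    """Map scaled (e.g. generated) rows back to the original units."""
    col_min, col_range = stats
    return scaled * col_range + col_min


rng = np.random.default_rng(0)
train = pd.DataFrame({"A": 100 + 100 * rng.random(200), "D": 500 * rng.random(200)})
stats = minmax_fit(train)
train_scaled = minmax_apply(train, stats)      # what would be passed to the generator
restored = minmax_inverse(train_scaled, stats)  # generated rows would be mapped back the same way
```

Fitting the statistics on the training split only, and reusing them for the test split, keeps both frames on the same scale.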
import pandas as pd
from sklearn.model_selection import train_test_split
from tabgan.sampler import GANGenerator

data = pd.read_excel(path_data)
start_index = data.columns.get_loc("start_column")
end_index = data.columns.get_loc("end_column")
columns_between = data.columns[start_index:end_index]

df = data[columns_between]
df = df.dropna()
train, test = train_test_split(df, test_size=0.2, random_state=42)
# None of the rows in the dataset are generated, so I suppose I can just make a target df of ones
target = pd.DataFrame({'Y': [1.0] * train.shape[0]})

new_train3, new_target3 = GANGenerator(
    gen_x_times=1.1, cat_cols=None,
    bot_filter_quantile=0.001, top_filter_quantile=0.999, is_post_process=True,
    adversarial_model_params={
        "metrics": "AUC", "max_depth": 2, "max_bin": 100,
        "learning_rate": 0.02, "random_state": 42, "n_estimators": 500,
    },
    pregeneration_frac=2, only_generated_data=False,
    gan_params={"batch_size": 500, "patience": 25, "epochs": 500},
).generate_data_pipe(train, target, test, deep_copy=True,
                     only_adversarial=False, use_adversarial=True)
This code works without any problem:
import numpy as np
import pandas as pd
# _sampler is assumed to be the internal helper defined in tabgan's sampler module
from tabgan.sampler import GANGenerator, _sampler

num_samples = 2500
df_random = pd.DataFrame(
    {"A": 100 + 100 * np.random.rand(num_samples),
     "B": 100 * np.random.rand(num_samples),
     "C": 10 * np.random.rand(num_samples),
     "D": 500 * np.random.rand(num_samples),
     "E": 200 * np.random.rand(num_samples),
     "F": 50 * np.random.rand(num_samples)})
_sampler(
    GANGenerator(gen_x_times=10, only_generated_data=False,
                 gan_params={"batch_size": 500, "patience": 25, "epochs": 500}),
    df_random, None, None,
)
Anyway, the Excel example really does reproduce the error; I will fix it. Thank you for pointing it out!
P.S. You can pass target as None without any problem.
The problem is with the indexes: target should have the same index as train, so either build it that way or drop train's index with reset_index:
new_train3, new_target3 = GANGenerator(gen_x_times=1.1, cat_cols=None,
bot_filter_quantile=0.001, top_filter_quantile=0.999, is_post_process=True,
adversarial_model_params={
"metrics": "AUC", "max_depth": 2, "max_bin": 100,
"learning_rate": 0.02, "random_state": 42, "n_estimators": 500,
}, pregeneration_frac=2, only_generated_data=False,
gan_params={"batch_size": 500, "patience": 25,
"epochs": 500, }).generate_data_pipe(train.reset_index(drop=True), target,
test, deep_copy=True,
only_adversarial=False,
use_adversarial=True)
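The index mismatch can be illustrated with plain pandas. This is a sketch that assumes `train` and `target` are combined by index alignment somewhere in the pipeline (tabgan's internals are not shown in this thread). The explicit label list passed to `df.loc[...]` stands in for the rows `train_test_split` would keep, since that function also preserves the original index labels, while a target built with `pd.DataFrame({'Y': [...]})` gets a fresh 0..n-1 index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(10.0), "B": np.arange(10.0) * 2})

# Stand-in for train_test_split: a shuffled subset that keeps the original index labels.
train = df.loc[[9, 2, 6, 1, 8, 3, 0, 5]]

# Target built as in the report: fresh RangeIndex 0..7.
target = pd.DataFrame({"Y": [1.0] * train.shape[0]})

# Index-aligned concatenation pairs rows by label, not by position,
# so labels present in only one frame produce NaNs.
misaligned = pd.concat([train, target], axis=1)

# Fix 1 (as in the reply above): give train a fresh 0..n-1 index too.
fixed_reset = pd.concat([train.reset_index(drop=True), target], axis=1)

# Fix 2: build the target on train's own index instead.
target_aligned = pd.DataFrame({"Y": 1.0}, index=train.index)
fixed_index = pd.concat([train, target_aligned], axis=1)
```

Fix 1 mirrors the `train.reset_index(drop=True)` call above; Fix 2 avoids the mismatch by creating `target` on `train.index` from the start.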
Related Issues (20)
- is it ok for regression type task?
- generated Cov is not that close
- all sample codes not working till epoch end
- second args in generate_data_pipe cannot be left None
- TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'
- training CTGAN stops in the middle (around 24%)
- Difference between OriginalGenerator and GANGenerator
- Getting this error when trying to install load
- check
- ContextualVersionConflict: (scikit-learn 1.0.2 (/usr/local/lib/python3.7/dist-packages), Requirement.parse('scikit-learn==0.23.2'), {'tabgan'})
- Dear Author, May I know the ctgan version for the installation? I am getting error. from ctgan import _CTGANSynthesizer ImportError: cannot import name '_CTGANSynthesizer'
- pip install scikit-learn version issue
- Mistake in Readme
- Some issues araised when running Tab-GAN: 1) Manage Categorical Variables. 2) Batch size problem
- Reproducibility issue
- IntCastingNaNError Despite No NaN values
- LGBMClassifier.fit() got an unexpected keyword argument 'early_stopping_rounds'
- Dependency issue with ForestDiffusion Generator
- TypeError w/ Boolean Data