Describe the bug Although I filter all NaN from my df, I get a Va


ValueError: Input X contains NaN although NaN filtered about gan-for-tabular-data HOT 7 CLOSED

corinnawegner commented on May 13, 2024

ValueError: Input X contains NaN although NaN filtered

from gan-for-tabular-data.

Comments (7)

Diyago commented on May 13, 2024

Could you please provide a data sample to reproduce the problem?


GenomicGandalf commented on May 13, 2024

Similar issue. Dataset attached.

Nan_Error_Data.xlsx


corinnawegner commented on May 13, 2024

The problem can be reproduced using this example df:

import numpy as np
import pandas as pd

num_samples = 2500
df_random = pd.DataFrame({
    "A": 100 + 100 * np.random.rand(num_samples),
    "B": 100 * np.random.rand(num_samples),
    "C": 10 * np.random.rand(num_samples),
    "D": 500 * np.random.rand(num_samples),
    "E": 200 * np.random.rand(num_samples),
    "F": 50 * np.random.rand(num_samples),
})


corinnawegner commented on May 13, 2024

Hi, I was able to solve my problem by normalizing the data before passing it to the GAN generator.

However, it still doesn't work with train_test_split from scikit-learn, so I separated the data set manually.
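
For anyone trying the same workaround, here is a minimal sketch of what "normalize, then split manually" could look like. The thread does not say which normalization or split was actually used, so the min-max scaling, the 80/20 ratio and the column names below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real data (hypothetical columns).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "A": 100 + 100 * rng.random(2500),
    "D": 500 * rng.random(2500),
})

# Min-max scale every column to [0, 1]; this is just one possible normalization.
df_norm = (df - df.min()) / (df.max() - df.min())

# Manual 80/20 split: shuffle, cut, and reset the index on both parts
# so that each part ends up with a clean 0..n-1 index.
shuffled = df_norm.sample(frac=1.0, random_state=42)
n_train = int(0.8 * len(shuffled))
train = shuffled.iloc[:n_train].reset_index(drop=True)
test = shuffled.iloc[n_train:].reset_index(drop=True)
```

Resetting the index after the split is the part that matters if the frames are later combined with another frame that has a default index.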


Diyago commented on May 13, 2024

import pandas as pd
from sklearn.model_selection import train_test_split
from tabgan.sampler import GANGenerator

data = pd.read_excel(path_data)

start_index = data.columns.get_loc("start_column")
end_index = data.columns.get_loc("end_column")
columns_between = data.columns[start_index:end_index]

df = data[columns_between]
df = df.dropna()
train, test = train_test_split(df, test_size=0.2, random_state=42)
# As no row in the dataset is generated, I suppose I can just make a target df with ones only
target = pd.DataFrame({'Y': [1.0] * train.shape[0]})

new_train3, new_target3 = GANGenerator(
    gen_x_times=1.1,
    cat_cols=None,
    bot_filter_quantile=0.001,
    top_filter_quantile=0.999,
    is_post_process=True,
    adversarial_model_params={
        "metrics": "AUC", "max_depth": 2, "max_bin": 100,
        "learning_rate": 0.02, "random_state": 42, "n_estimators": 500,
    },
    pregeneration_frac=2,
    only_generated_data=False,
    gan_params={"batch_size": 500, "patience": 25, "epochs": 500},
).generate_data_pipe(train, target, test, deep_copy=True,
                     only_adversarial=False, use_adversarial=True)

This code works without any problem:

import numpy as np
import pandas as pd
from tabgan.sampler import GANGenerator, _sampler

num_samples = 2500
df_random = pd.DataFrame({
    "A": 100 + 100 * np.random.rand(num_samples),
    "B": 100 * np.random.rand(num_samples),
    "C": 10 * np.random.rand(num_samples),
    "D": 500 * np.random.rand(num_samples),
    "E": 200 * np.random.rand(num_samples),
    "F": 50 * np.random.rand(num_samples),
})
_sampler(
    GANGenerator(gen_x_times=10, only_generated_data=False,
                 gan_params={"batch_size": 500, "patience": 25, "epochs": 500}),
    df_random, None, None,
)


Diyago commented on May 13, 2024

Anyway, the example with the Excel file really does generate the error; I will fix it. Thank you for pointing it out!

PS: You may pass target as None without any problem.


Diyago commented on May 13, 2024

The problem is with the indexes in target: make them the same as in train, or drop the index of train, like this:

new_train3, new_target3 = GANGenerator(
    gen_x_times=1.1,
    cat_cols=None,
    bot_filter_quantile=0.001,
    top_filter_quantile=0.999,
    is_post_process=True,
    adversarial_model_params={
        "metrics": "AUC", "max_depth": 2, "max_bin": 100,
        "learning_rate": 0.02, "random_state": 42, "n_estimators": 500,
    },
    pregeneration_frac=2,
    only_generated_data=False,
    gan_params={"batch_size": 500, "patience": 25, "epochs": 500},
).generate_data_pipe(train.reset_index(drop=True), target, test,
                     deep_copy=True,
                     only_adversarial=False,
                     use_adversarial=True)
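
To illustrate why mismatched indexes can produce NaN even after dropna(), here is a small pandas-only sketch. It assumes the NaN values come from pandas aligning rows by index label when train and target are combined; the library's internals are not shown in this thread, so treat it as an illustration of the mechanism and not as the actual code path. The toy frame and the row selection are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(10.0), "B": np.arange(10.0) * 2})

# Stand-in for the train part of a shuffled split: rows keep their
# original labels, so the index is neither ordered nor contiguous.
train = df.iloc[[9, 3, 7, 1, 8, 5, 2, 6]]

# Target built as in the issue: it gets a fresh 0..7 index.
target = pd.DataFrame({"Y": [1.0] * train.shape[0]})

# Combining by column aligns on index labels, so labels present in only
# one of the two frames yield NaN cells.
misaligned = pd.concat([train, target], axis=1)
n_nan_before = int(misaligned.isna().sum().sum())

# Fix 1: drop the old index on train so both frames use 0..7.
aligned_reset = pd.concat([train.reset_index(drop=True), target], axis=1)

# Fix 2: build the target with the same index as train.
target_like_train = pd.DataFrame({"Y": 1.0}, index=train.index)
aligned_index = pd.concat([train, target_like_train], axis=1)

print(n_nan_before > 0)                      # NaN appear without any NaN in the inputs
print(int(aligned_reset.isna().sum().sum()))
print(int(aligned_index.isna().sum().sum()))
```

Either fix keeps the row count at eight and leaves no NaN, which matches the advice above to reset the index on train or to give target the same index.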
