I am trying to impute the missing values of one specific column, 'Comercializadora_encoded', with datawig's SimpleImputer. The column is numeric because I previously encoded the original object-type column with LabelEncoder() from sklearn. Fitting the imputer produces the following log output, warning and error:
2020-11-30 09:57:37,860 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 47 occurrences of value 16.0
2020-11-30 09:57:37,860 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 40 occurrences of value 7.0
2020-11-30 09:57:37,860 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 27 occurrences of value 44.0
2020-11-30 09:57:37,865 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 23 occurrences of value 66.0
2020-11-30 09:57:37,866 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 19 occurrences of value 29.0
2020-11-30 09:57:37,868 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 18 occurrences of value 28.0
2020-11-30 09:57:37,869 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 17 occurrences of value 56.0
2020-11-30 09:57:37,870 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 17 occurrences of value 21.0
2020-11-30 09:57:37,871 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 16 occurrences of value 81.0
2020-11-30 09:57:37,872 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 16 occurrences of value 34.0
2020-11-30 09:57:37,873 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 16 occurrences of value 74.0
2020-11-30 09:57:37,874 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 13 occurrences of value 43.0
2020-11-30 09:57:37,875 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 12 occurrences of value 1.0
2020-11-30 09:57:37,876 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 9 occurrences of value 52.0
2020-11-30 09:57:37,877 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 9 occurrences of value 38.0
2020-11-30 09:57:37,878 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 9 occurrences of value 9.0
2020-11-30 09:57:37,880 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 8 occurrences of value 12.0
2020-11-30 09:57:37,881 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 8 occurrences of value 25.0
2020-11-30 09:57:37,882 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 7 occurrences of value 69.0
2020-11-30 09:57:37,884 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 7 occurrences of value 79.0
2020-11-30 09:57:37,885 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 7 occurrences of value 63.0
2020-11-30 09:57:37,886 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 7 occurrences of value 6.0
2020-11-30 09:57:37,887 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 7 occurrences of value 76.0
2020-11-30 09:57:37,888 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 6 occurrences of value 67.0
2020-11-30 09:57:37,888 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 6 occurrences of value 54.0
2020-11-30 09:57:37,889 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 5 occurrences of value 26.0
2020-11-30 09:57:37,890 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 5 occurrences of value 20.0
2020-11-30 09:57:37,890 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 5 occurrences of value 48.0
2020-11-30 09:57:37,891 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 5 occurrences of value 49.0
2020-11-30 09:57:37,892 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 5 occurrences of value 10.0
2020-11-30 09:57:37,893 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 4 occurrences of value 23.0
2020-11-30 09:57:37,894 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 4 occurrences of value 53.0
2020-11-30 09:57:37,896 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 4 occurrences of value 5.0
2020-11-30 09:57:37,897 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 4 occurrences of value 36.0
2020-11-30 09:57:37,899 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 3 occurrences of value 57.0
2020-11-30 09:57:37,900 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 3 occurrences of value 27.0
2020-11-30 09:57:37,902 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 3 occurrences of value 0.0
2020-11-30 09:57:37,903 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 3 occurrences of value 17.0
2020-11-30 09:57:37,904 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 3 occurrences of value 2.0
2020-11-30 09:57:37,906 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 2 occurrences of value 45.0
2020-11-30 09:57:37,907 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 2 occurrences of value 71.0
2020-11-30 09:57:37,908 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 2 occurrences of value 46.0
2020-11-30 09:57:37,909 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 2 occurrences of value 4.0
2020-11-30 09:57:37,910 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 2 occurrences of value 50.0
2020-11-30 09:57:37,911 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 2 occurrences of value 14.0
2020-11-30 09:57:37,912 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 2 occurrences of value 68.0
2020-11-30 09:57:37,913 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 2 occurrences of value 22.0
2020-11-30 09:57:37,914 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 59.0
2020-11-30 09:57:37,916 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 65.0
2020-11-30 09:57:37,917 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 42.0
2020-11-30 09:57:37,919 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 72.0
2020-11-30 09:57:37,920 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 77.0
2020-11-30 09:57:37,921 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 60.0
2020-11-30 09:57:37,922 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 8.0
2020-11-30 09:57:37,923 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 3.0
2020-11-30 09:57:37,924 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 82.0
2020-11-30 09:57:37,925 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 13.0
2020-11-30 09:57:37,926 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 33.0
2020-11-30 09:57:37,927 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 15.0
2020-11-30 09:57:37,928 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 37.0
2020-11-30 09:57:37,930 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 62.0
2020-11-30 09:57:37,931 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 75.0
2020-11-30 09:57:37,932 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 40.0
2020-11-30 09:57:37,933 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 41.0
2020-11-30 09:57:37,934 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 30.0
2020-11-30 09:57:37,935 [INFO] CategoricalEncoder for column Comercializadora_encoded found only 1 occurrences of value 39.0
C:\Users\rcruz\Anaconda3\lib\site-packages\pandas\core\frame.py:3509: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-55-55b90ff782c9> in <module>
10
11 ## Fit an imputer model on the train data
---> 12 imputer.fit(train_df=df_train, num_epochs=50)
13
14 ## Impute missing values and return original dataframe with predictions
~\AppData\Roaming\Python\Python38\site-packages\datawig\simple_imputer.py in fit(self, train_df, test_df, ctx, learning_rate, num_epochs, patience, test_split, weight_decay, batch_size, final_fc_hidden_units, calibrate, class_weights, instance_weights)
384 self.output_path = self.imputer.output_path
385
--> 386 self.imputer = self.imputer.fit(train_df, test_df, ctx, learning_rate, num_epochs, patience,
387 test_split,
388 weight_decay, batch_size,
~\AppData\Roaming\Python\Python38\site-packages\datawig\imputer.py in fit(self, train_df, test_df, ctx, learning_rate, num_epochs, patience, test_split, weight_decay, batch_size, final_fc_hidden_units, calibrate)
261 train_df, test_df = random_split(train_df, [1.0 - test_split, test_split])
262
--> 263 iter_train, iter_test = self.__build_iterators(train_df, test_df, test_split)
264
265 self.__check_data(test_df)
~\AppData\Roaming\Python\Python38\site-packages\datawig\imputer.py in __build_iterators(self, train_df, test_df, test_split)
590
591 logger.debug("Building Train Iterator with {} elements".format(len(train_df)))
--> 592 iter_train = ImputerIterDf(
593 data_frame=train_df,
594 data_columns=self.data_encoders,
~\AppData\Roaming\Python\Python38\site-packages\datawig\iterators.py in __init__(self, data_frame, data_columns, label_columns, batch_size)
221 numerical_columns = [c for c in data_frame.columns if is_numeric_dtype(data_frame[c])]
222 string_columns = list(set(data_frame.columns) - set(numerical_columns))
--> 223 data_frame = data_frame.fillna(value={x: "" for x in string_columns})
224 data_frame = data_frame.fillna(value={x: np.nan for x in numerical_columns})
225
~\Anaconda3\lib\site-packages\pandas\core\frame.py in fillna(self, value, method, axis, inplace, limit, downcast, **kwargs)
4250 **kwargs
4251 ):
-> 4252 return super().fillna(
4253 value=value,
4254 method=method,
~\Anaconda3\lib\site-packages\pandas\core\generic.py in fillna(self, value, method, axis, inplace, limit, downcast)
6272 continue
6273 obj = result[k]
-> 6274 obj.fillna(v, limit=limit, inplace=True, downcast=downcast)
6275 return result if not inplace else None
6276
~\Anaconda3\lib\site-packages\pandas\core\series.py in fillna(self, value, method, axis, inplace, limit, downcast, **kwargs)
4339 **kwargs
4340 ):
-> 4341 return super().fillna(
4342 value=value,
4343 method=method,
~\Anaconda3\lib\site-packages\pandas\core\generic.py in fillna(self, value, method, axis, inplace, limit, downcast)
6255 )
6256
-> 6257 new_data = self._data.fillna(
6258 value=value, limit=limit, inplace=inplace, downcast=downcast
6259 )
~\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in fillna(self, **kwargs)
573
574 def fillna(self, **kwargs):
--> 575 return self.apply("fillna", **kwargs)
576
577 def downcast(self, **kwargs):
~\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
436 kwargs[k] = obj.reindex(b_items, axis=axis, copy=align_copy)
437
--> 438 applied = getattr(b, f)(**kwargs)
439 result_blocks = _extend_blocks(applied, result_blocks)
440
~\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in fillna(self, value, limit, inplace, downcast)
1950 def fillna(self, value, limit=None, inplace=False, downcast=None):
1951 values = self.values if inplace else self.values.copy()
-> 1952 values = values.fillna(value=value, limit=limit)
1953 return [
1954 self.make_block_same_class(
~\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
206 else:
207 kwargs[new_arg_name] = new_arg_value
--> 208 return func(*args, **kwargs)
209
210 return wrapper
~\Anaconda3\lib\site-packages\pandas\core\arrays\categorical.py in fillna(self, value, method, limit)
1871 elif is_hashable(value):
1872 if not isna(value) and value not in self.categories:
-> 1873 raise ValueError("fill value must be in categories")
1874
1875 mask = codes == -1
ValueError: fill value must be in categories
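The failing step can be reproduced with pandas alone. This is a minimal sketch, assuming the label column had been converted to the pandas 'category' dtype (one of the attempts described in this question; the CategoricalEncoder messages in the log suggest the column was being treated as categorical in this run). datawig's iterator puts every non-numeric column into string_columns and fills it with "" (lines 221-223 of iterators.py in the traceback). A categorical column rejects that fill value because "" is not one of its categories. The toy frame and the column some_numeric_feature are hypothetical.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy frame: the label column has been converted to the 'category' dtype.
df_train = pd.DataFrame({
    "Comercializadora_encoded": pd.Series([16.0, 7.0, np.nan, 16.0], dtype="category"),
    "some_numeric_feature": [1.0, 2.0, 3.0, 4.0],  # hypothetical input column
})

# Same column split as iterators.py lines 221-222 in the traceback:
# a 'category' column is not a numeric dtype, so it ends up in string_columns.
numerical_columns = [c for c in df_train.columns if is_numeric_dtype(df_train[c])]
string_columns = list(set(df_train.columns) - set(numerical_columns))
print(string_columns)  # ['Comercializadora_encoded']

# Line 223: filling with "" fails because "" is not one of the categories.
# Older pandas raises ValueError("fill value must be in categories");
# later versions word the error differently and may raise TypeError instead.
fill_failed = False
try:
    df_train.fillna(value={x: "" for x in string_columns})
except (ValueError, TypeError) as err:
    fill_failed = True
    print(type(err).__name__, err)
```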
I have also tried using categorical columns as input columns and converting the output column to the category dtype.
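If the 'category' dtype is what triggers the error, one possible workaround is to hand datawig plain string columns instead. The sketch below is untested against datawig itself; the helper prepare_for_datawig is hypothetical and not part of datawig's API. It converts every 'category' column to strings while keeping missing values as NaN, so the fillna call at iterators.py line 223 no longer fails. It also takes an explicit .copy(), which should avoid the SettingWithCopyWarning if df_train was created by slicing a larger DataFrame (the question does not show how df_train is built).

```python
import numpy as np
import pandas as pd


def prepare_for_datawig(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return an independent copy of `frame` in which every
    'category' column is converted to plain strings, keeping missing values as NaN."""
    out = frame.copy()  # explicit copy, so later writes do not target a slice of another frame
    for col in out.columns:
        if isinstance(out[col].dtype, pd.CategoricalDtype):
            values = out[col]
            # astype(str) can turn NaN into the text "nan"; .where() puts the NaN back.
            out[col] = values.astype(str).where(values.notna(), np.nan)
    return out


# Hypothetical toy frame with a 'category' label column.
df_train = pd.DataFrame({
    "Comercializadora_encoded": pd.Series([16.0, 7.0, np.nan, 16.0], dtype="category"),
    "some_numeric_feature": [1.0, 2.0, 3.0, 4.0],
})
prepared = prepare_for_datawig(df_train)

# The fillna call from iterators.py line 223 now succeeds on the converted column.
filled = prepared.fillna(value={"Comercializadora_encoded": ""})
print(filled["Comercializadora_encoded"].tolist())  # ['16.0', '7.0', '', '16.0']
```

Passing the original object-type column instead of the LabelEncoder codes may have the same effect, since the problem appears to be the 'category' dtype rather than the values themselves.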
Am I missing something?
Thank you very much.
Regards,
Rubén.