Comments (3)
It looks like this is just a warning, not an error: the code runs through and returns a dataframe, right?
It looks like some columns contain values that are very rare. For those classes it is difficult to make high-precision imputations.
To avoid low-precision imputations, I'd recommend setting the precision_threshold argument to a value higher than 0.0, for instance 0.8, when calling complete. With a threshold of 0.8, you could expect a precision of 0.8 for the imputed values.
Values that are still missing afterwards could not be imputed with high enough precision.
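To check whether a column really is dominated by rare values, something like the following could help. It is only a sketch using plain pandas: the helper name rare_value_report, the min_count cutoff, and the toy data are assumptions made for illustration and are not part of datawig.

```python
import pandas as pd


def rare_value_report(df: pd.DataFrame, min_count: int = 5) -> pd.DataFrame:
    """Summarise, per non-numeric column, how much of the data consists of rare values.

    A value counts as "rare" here if it occurs fewer than `min_count` times
    (an arbitrary cutoff chosen for this sketch).
    """
    rows = []
    for col in df.select_dtypes(exclude="number").columns:
        counts = df[col].value_counts(dropna=True)
        n_observed = int(counts.sum())
        # Number of rows whose value is rare in this column.
        rare_rows = int(counts[counts < min_count].sum())
        rows.append(
            {
                "column": col,
                "n_unique": int(counts.size),
                "share_rare": rare_rows / n_observed if n_observed else 0.0,
            }
        )
    return pd.DataFrame(rows).set_index("column")


# Made-up toy data: one column with two frequent classes,
# one column where every value is unique.
example = pd.DataFrame(
    {
        "HomePlanet": ["Earth"] * 6 + ["Mars"] * 6,
        "Cabin": [f"C/{i}/P" for i in range(12)],
    }
)
report = rare_value_report(example)
print(report)
```

A column whose share_rare is close to 1.0 consists almost entirely of classes seen only a handful of times, which is the situation described above.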
Closing this for now, feel free to reopen if more problems come up.
from datawig.
I hope this message finds you well. I have been trying to impute missing values in my dataset using the datawig library. However, it imputes every column except two. Both of these columns are of dtype object, yet it does impute other object columns. I tried your recommendation of setting precision_threshold = 0.80, but that did not help either. Do you have any recommendation for improving this? Here is the code along with a view of my dataset:
df.tail(155).
The code to impute the missing values is as follows:
import datawig
df = datawig.SimpleImputer.complete(df, precision_threshold=0.80)
df.isnull().sum()
PassengerId 0
HomePlanet 0
CryoSleep 0
Cabin 199
Destination 0
Age 0
VIP 0
RoomService 0
FoodCourt 0
ShoppingMall 0
Spa 0
VRDeck 0
Name 200
Transported 0
dtype: int64
The missing values in the Cabin and Name columns were left unimputed, and I do not know why. The number of missing values in Name and Cabin was the same before applying datawig imputation. Any help would be appreciated. Thanks!
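To make the before/after comparison explicit, the missing counts can be put side by side. This is a minimal sketch in plain pandas; the helper name missing_before_after and the toy frames are assumptions for illustration, and the fillna call only stands in for a real imputer.

```python
import numpy as np
import pandas as pd


def missing_before_after(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Tabulate per-column missing counts before and after imputation."""
    summary = pd.DataFrame(
        {
            "missing_before": before.isnull().sum(),
            "missing_after": after.isnull().sum(),
        }
    )
    # Number of cells that the imputation step actually filled.
    summary["filled"] = summary["missing_before"] - summary["missing_after"]
    return summary


# Made-up toy data; fillna stands in for the imputation step.
before = pd.DataFrame({"Age": [30.0, np.nan, 41.0], "Name": ["A B", None, None]})
after = before.copy()
after["Age"] = after["Age"].fillna(35.0)

summary = missing_before_after(before, after)
print(summary)
```

Columns with filled equal to 0 are the ones the imputer left untouched.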
I have exactly the same problem. I installed datawig in my conda environment with Python 3.7 (because higher versions cause problems with mxnet). I downgraded numpy because I got this error after installation:
ERROR: mxnet 1.4.0 has requirement numpy<1.15.0,>=1.8.2, but you'll have numpy 1.17.2 which is incompatible.
Next, I tried to impute 3 columns from the Titanic dataset using
datawig.SimpleImputer.complete(df, precision_threshold = 0.8, inplace=True)
Got a value error:
ValueError: fill value must be in categories
So I cast all columns to string type and then converted the resulting "nan" strings back to np.nan. When I ran it again, only "Embarked" was imputed.
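The casting step described above could look roughly like this. It is a sketch in plain pandas, not code from the original post: the helper name to_string_with_nan and the toy frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def to_string_with_nan(df: pd.DataFrame) -> pd.DataFrame:
    """Cast every column to string, then restore missing values as np.nan.

    Casting removes categorical dtypes, but it can also turn missing values
    into the literal string "nan", so those are mapped back to np.nan.
    Caveat: a genuine value spelled "nan" would be treated as missing too.
    """
    as_str = df.astype(str)
    return as_str.replace("nan", np.nan)


# Made-up toy data with a categorical column and a numeric column.
titanic_like = pd.DataFrame(
    {
        "Embarked": pd.Categorical(["S", "C", None, "S"]),
        "Age": [22.0, np.nan, 35.0, 54.0],
    }
)
cleaned = to_string_with_nan(titanic_like)
print(cleaned.isnull().sum())
```

Note that this also turns numeric columns such as Age into strings, so an imputer would see them as text rather than numbers.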
I repeated the same steps with precision_threshold = 0.1 and in Colab with the same result.
Is this how datawig should work or am I missing something?