
Comments (3)

felixbiessmann commented on June 3, 2024

Looks like this is just a warning, not an error: the code runs through and returns a dataframe, right?

It looks like some values in some columns are very rare. For those classes it is difficult to make high-precision imputations.
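
To see which columns are dominated by rare values, a quick pandas check can help. This is only a rough sketch and is not part of datawig: the helper name rare_value_share, the min_count cutoff, and the toy data are all assumptions made for illustration.

```python
import pandas as pd


def rare_value_share(df: pd.DataFrame, min_count: int = 5) -> pd.Series:
    """Per column, the fraction of non-missing cells whose value
    occurs fewer than `min_count` times in that column."""
    shares = {}
    for col in df.columns:
        counts = df[col].value_counts(dropna=True)
        n_total = int(counts.sum())
        n_rare = int(counts[counts < min_count].sum())
        shares[col] = n_rare / n_total if n_total else 0.0
    # Columns with the largest share of rare values come first.
    return pd.Series(shares).sort_values(ascending=False)


# Hypothetical toy data, not taken from the thread.
toy = pd.DataFrame({
    "HomePlanet": ["Earth", "Earth", "Mars", "Mars", "Earth", "Mars"],
    "Cabin": ["A/1/P", "B/2/S", "C/3/P", None, "D/4/S", "E/5/P"],
})
shares = rare_value_share(toy, min_count=2)
print(shares)
```

A column whose share is close to 1.0 consists almost entirely of values seen only a handful of times, which is the situation described above.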

To avoid low-precision imputations, I'd recommend setting the precision_threshold argument to a value higher than 0.0, for instance 0.8, when calling complete. With a threshold of 0.8, you could expect a precision of 0.8 for the imputed values.

Values that are still missing after that could not be imputed with high enough precision.

Closing this for now, feel free to reopen if more problems come up.


SAMNaqvi1212 commented on June 3, 2024

I hope this message finds you well. I have been trying to impute missing values in my dataset using the datawig library. However, it imputes every column except two. Both of those columns are of dtype object, yet datawig does impute other object columns. I tried your recommendation of raising precision_threshold to 0.80, which did not help either. Do you have any recommendations for improving this? Here is a view of my dataset, from df.tail(155):
[screenshot of the df.tail(155) output]

The code to impute the missing values is as follows:
import datawig
df = datawig.SimpleImputer.complete(df, precision_threshold=0.80)

df.isnull().sum()
PassengerId       0
HomePlanet        0
CryoSleep         0
Cabin           199
Destination       0
Age               0
VIP               0
RoomService       0
FoodCourt         0
ShoppingMall      0
Spa               0
VRDeck            0
Name            200
Transported       0
dtype: int64

The missing values in the Cabin and Name columns were left unimputed, and I do not know why. Before applying datawig imputation, the number of missing values in the Name and Cabin columns was the same as it is now. Any help would be appreciated. Thanks!
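
For anyone wanting to confirm which columns were actually filled, comparing missing counts before and after imputation can be sketched as follows. This assumes a copy of the dataframe is kept before calling complete; the helper name missing_report and the toy data are hypothetical, and the sketch does not call datawig itself.

```python
import pandas as pd


def missing_report(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Summarise, per column, how many missing values were filled.

    Both frames are assumed to have the same columns."""
    report = pd.DataFrame({
        "dtype": before.dtypes.astype(str),
        "missing_before": before.isna().sum(),
        "missing_after": after.isna().sum(),
    })
    report["filled"] = report["missing_before"] - report["missing_after"]
    # Only columns that had gaps to begin with are of interest.
    return report[report["missing_before"] > 0]


# Hypothetical toy data standing in for the frames before and after imputation.
before = pd.DataFrame({
    "Age": [22.0, None, 30.0],
    "Name": ["A B", None, "C D"],
})
after = before.copy()
after.loc[1, "Age"] = 26.0  # pretend the imputer filled Age but not Name

report = missing_report(before, after)
print(report)
```

Columns where filled is 0 are the ones the imputer left untouched.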


ioakeim-h commented on June 3, 2024

I have exactly the same problem. I installed datawig in my conda environment with Python 3.7 (because higher versions cause problems with mxnet). I downgraded numpy because I got an error after installation:
ERROR: mxnet 1.4.0 has requirement numpy<1.15.0,>=1.8.2, but you'll have numpy 1.17.2 which is incompatible.

Next, I tried to impute 3 columns from the Titanic dataset using:
datawig.SimpleImputer.complete(df, precision_threshold = 0.8, inplace=True)


I got a ValueError:
ValueError: fill value must be in categories

So I forced all columns to string type and then converted the resulting "nan" strings back to np.nan. Then I ran it again, and only "Embarked" was imputed.
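
The workaround above can be sketched roughly like this. The error text matches what some pandas versions raise when a categorical column is filled with a value outside its categories; that this is the cause here is an assumption, not something confirmed in the thread. The helper name cast_to_string_keep_nan and the toy data are hypothetical. Instead of replacing the literal string "nan", this variant restores missing values from the original mask, so a genuine "nan" string would not be lost.

```python
import numpy as np
import pandas as pd


def cast_to_string_keep_nan(df: pd.DataFrame) -> pd.DataFrame:
    """Cast every column to plain Python strings, keeping missing cells as NaN."""
    missing = df.isna()                      # remember where the gaps were
    out = df.astype(str).astype(object)      # categoricals and numbers become str
    return out.mask(missing, np.nan)         # put NaN back where values were missing


# Hypothetical toy data with a categorical column, not taken from the thread.
toy = pd.DataFrame({
    "Embarked": pd.Categorical(["S", None, "C"]),
    "Age": [22.0, np.nan, 30.0],
})
cleaned = cast_to_string_keep_nan(toy)
print(cleaned)
print(cleaned.isna().sum())
```

Note that this turns numeric columns such as Age into strings as well, so they would be treated as categorical or text rather than numeric afterwards.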

I repeated the same steps with precision_threshold = 0.1, and also in Colab, with the same result.

Is this how datawig should work or am I missing something?
