Comments (4)

ExpandingMan commented on June 3, 2024

I agree this is problematic as it seems like column names should be respected. It's not quite obvious what would be the best way to fix this since the libxgboost objects don't store that kind of metadata, though Booster now stores feature names.

I agree that this should "just work" or, at worst, throw an error.

from xgboost.jl.

bobaronoff commented on June 3, 2024

just a thought .....

Looking at booster.jl and dmatrix.jl, it appears that the feature_names property of the Booster object is set by calling getfeaturenames(dm), which calls getfeatureinfo(dm), which in turn calls libxgboost. So it looks to me as though the DMatrix does store the data column names. Perhaps XGBoost.predict could compare the booster's feature_names property to the result of getfeaturenames on the DMatrix object. If the names are out of order or some are missing, an error can be thrown. I sure wish libxgboost had an API manual :-)

I apologize for my sloppy code but this is the gist of what I am thinking.

fnames1 = booster.feature_names
fnames2 = getfeaturenames(dm)
# Vector inequality covers a length mismatch as well as any differing name or order
if fnames1 != fnames2
    error("prediction data column names and/or order do not match booster")
end


bobaronoff commented on June 3, 2024

The above suggestion is for scenarios where XGBoost.predict is called with a DMatrix object. This can only throw an exception but not fix the problem. 'predict' can only fix the problem if it is called with (booster, df::DataFrame).

possible code might include something like ...

xnames=setdiff(booster.feature_names,names(df))
if length(xnames)>0
   msg="prediction data missing columns: " * join(xnames,",")
   error(msg)
end
if length(booster.feature_names)==length(names(df)) && sum(booster.feature_names .!= names(df))==0
   dm=DMatrix(df)
   XGBoost.predict(booster,dm)
else
   df2=copy(df)
   select!(df2,Symbol.(xnames))
   dm=DMatrix(df2)
   XGBoost.predict(booster,dm)
end

The downside is the need to make a copy of the DataFrame so that XGBoost.predict does not alter the original. However, more than likely the user would need to make a copy anyway, unless they are fine with altering the original DataFrame. One way or another, the columns need to coincide in order to score.
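As a side note on the copy: a minimal sketch of how the reordering might be done without copying the column data, assuming DataFrames.jl is available. The helper name align_columns is hypothetical and not part of XGBoost.jl; it relies on select (the non-mutating counterpart of select!) with copycols=false, which builds a new DataFrame that reuses the existing column vectors.

```julia
using DataFrames

# Hypothetical helper (not part of XGBoost.jl): return a DataFrame whose columns
# follow `feature_names`, leaving `df` itself unchanged.
function align_columns(df::DataFrame, feature_names::AbstractVector{<:AbstractString})
    missing_cols = setdiff(feature_names, names(df))
    isempty(missing_cols) ||
        error("prediction data missing columns: " * join(missing_cols, ", "))
    # copycols=false reuses the existing column vectors instead of copying them
    return select(df, feature_names; copycols=false)
end

# Example data: columns out of order, plus one column the model does not use
df = DataFrame(b = [1.0, 2.0], a = [3.0, 4.0], extra = [5.0, 6.0])
aligned = align_columns(df, ["a", "b"])
```

The aligned result could then be passed to DMatrix in place of df2. Because the columns are shared with the original, anything that mutates them in place would also affect df.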


bobaronoff commented on June 3, 2024

I need to correct a typo in the code above (see the fourth line from the bottom):

# Columns the booster expects that are absent from the prediction data
xnames = setdiff(booster.feature_names, names(df))
if !isempty(xnames)
    error("prediction data missing columns: " * join(xnames, ", "))
end
if booster.feature_names == names(df)
    # Names and order already match, so the DataFrame can be used directly
    dm = DMatrix(df)
    XGBoost.predict(booster, dm)
else
    # Work on a copy so the caller's DataFrame is not modified
    df2 = copy(df)
    select!(df2, Symbol.(booster.feature_names))
    dm = DMatrix(df2)
    XGBoost.predict(booster, dm)
end

