Comments (4)
I agree this is problematic, as it seems like column names should be respected. It's not obvious what the best fix would be, since the libxgboost objects don't store that kind of metadata, though Booster now stores feature names.
I agree that this should "just work" or, at worst, throw an error.
from xgboost.jl.
Just a thought...
Looking at booster.jl and dmatrix.jl, it appears that the feature_names property of the Booster object is set by calling getfeaturenames(dm), which calls getfeatureinfo(dm), which in turn calls libxgboost. So it looks to me as though the DMatrix does store the data column names. Perhaps XGBoost.predict could compare the booster's feature_names property to the result of getfeaturenames on the DMatrix object. If the names are out of order or some are missing, an error can be thrown. I sure wish libxgboost had an API manual :-)
I apologize for my sloppy code but this is the gist of what I am thinking.
# names the booster was trained with vs. names stored on the DMatrix
fnames1 = booster.feature_names
fnames2 = getfeaturenames(dm)
# vectors compare equal only if length, names and order all match
if fnames1 != fnames2
    error("prediction data column names and/or order do not match booster")
end
The above suggestion is for scenarios where XGBoost.predict is called with a DMatrix object. In that case predict can only throw an exception; it cannot fix the problem. predict can only fix the problem if it is called with (booster, df::DataFrame), because the columns can then be reordered before the DMatrix is built.
Possible code might look something like this:
xnames = setdiff(booster.feature_names, names(df))
if length(xnames) > 0
    msg = "prediction data missing columns: " * join(xnames, ",")
    error(msg)
end
if length(booster.feature_names) == length(names(df)) && sum(booster.feature_names .!= names(df)) == 0
    dm = DMatrix(df)
    XGBoost.predict(booster, dm)
else
    df2 = copy(df)
    select!(df2, Symbol.(xnames))
    dm = DMatrix(df2)
    XGBoost.predict(booster, dm)
end
The downside is the need to make a copy of the DataFrame so that XGBoost.predict does not alter the original. However, more than likely the user would need to make a copy anyway, unless they are fine with altering the original DataFrame. One way or another, the columns need to coincide in order to score.
I need to correct a typo in the above code (see the fourth line from the bottom):
# columns the booster expects that are absent from df
xnames = setdiff(booster.feature_names, names(df))
if length(xnames) > 0
    msg = "prediction data missing columns: " * join(xnames, ",")
    error(msg)
end
if length(booster.feature_names) == length(names(df)) && sum(booster.feature_names .!= names(df)) == 0
    # names and order already match the booster
    dm = DMatrix(df)
    XGBoost.predict(booster, dm)
else
    # work on a copy so the caller's DataFrame is not altered
    df2 = copy(df)
    select!(df2, Symbol.(booster.feature_names))
    dm = DMatrix(df2)
    XGBoost.predict(booster, dm)
end
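The same check could also be applied when the prediction data is a plain matrix with a separate vector of column names, without copying a DataFrame. The following is only a minimal sketch in Base Julia: feature_permutation is a hypothetical helper, not part of XGBoost.jl, and the feature names and data are made up for illustration.

```julia
# Hypothetical helper (not part of XGBoost.jl): locate each trained
# feature among the supplied column names.
function feature_permutation(trained::Vector{String}, supplied::Vector{String})
    idx = indexin(trained, supplied)   # `nothing` where a name is absent
    absent = trained[isnothing.(idx)]
    isempty(absent) || error("prediction data missing columns: " * join(absent, ","))
    return Int.(idx)                   # safe: no `nothing` entries remain
end

# Made-up example: the supplied columns are shuffled and include an extra one.
trained  = ["age", "income", "score"]
supplied = ["score", "extra", "age", "income"]
perm = feature_permutation(trained, supplied)

X = [0.9 7.0 31.0 52_000.0;
     0.4 8.0 45.0 61_000.0]
X_ordered = X[:, perm]   # columns now in the order age, income, score
```

Indexing with the permutation allocates a new matrix, so the original data is left untouched, and extra columns are dropped simply because they never appear in the permutation.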