
Comments (10)

ExpandingMan commented on June 12, 2024

Btw, a really quick and minimal-effort way of getting this working, which I would be happy to merge, is if we just added a `type` keyword arg which, if not `nothing`, overrides all other options. We'd have to make any future keyword args compatible with it, but I don't think that would be hard.

from xgboost.jl.

bobaronoff commented on June 12, 2024

Perhaps adding the 'type' keyword is the best approach. It seems most flexible, particularly if more options are added in the future. I am willing to take a try at a PR (it would be my first ever), but it would need to be heavily edited; my programming skills are nowhere near yours. I am most concerned about how best to handle all the return-shape configurations this creates.

In R, there is a way to specify a list of parameter options. I see julia packages that do this (Plots.jl comes to mind) but I don't know how to code this.
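For concreteness, here is the kind of thing I have in mind (a sketch only; `PREDICT_TYPE_CODES` and `predict_type_code` are hypothetical names, not anything in XGBoost.jl — the integer codes follow the libxgboost 'type' parameter):

```julia
# Hypothetical sketch: map user-facing symbols to libxgboost `type` codes.
# Nothing here is part of XGBoost.jl; it only illustrates how a `type`
# keyword could override the other prediction options.
const PREDICT_TYPE_CODES = Dict(
    :value => 0,              # normal prediction
    :margin => 1,             # output margin
    :contrib => 2,            # feature contributions (SHAP)
    :approxcontrib => 3,      # approximated contributions
    :interaction => 4,        # feature interactions
    :approxinteraction => 5,  # approximated interactions
    :leaf => 6,               # leaf indices
)

function predict_type_code(; margin::Bool=false, type::Union{Symbol,Nothing}=nothing)
    # `type`, if given, overrides all other keyword options
    type === nothing || return PREDICT_TYPE_CODES[type]
    return margin ? 1 : 0
end
```

This way existing keywords like `margin` keep working, and `type` simply wins when both are supplied.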


ExpandingMan commented on June 12, 2024

I don't see any additional options that we can pass to XGBoosterPredict...

To be clear, the parameters we already have in that opts dict are the only ones I see documented.

I'm also not seeing any reference anywhere to TreeSHAP, can you show specifically how this would be called?


bobaronoff commented on June 12, 2024

I just found the following at XGBoost C Package

Make prediction from DMatrix, replacing [XGBoosterPredict](https://xgboost.readthedocs.io/en/stable/c.html#group__Prediction_1ga3e4d11089d266ae4f913ab43864c6b12).

“type”: [0, 6]

0: normal prediction
1: output margin
2: predict contribution
3: predict approximated contribution
4: predict feature interaction
5: predict approximated feature interaction
6: predict leaf

“training”: bool — whether the prediction function is used as part of a training loop. Not used for inplace prediction.
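If I read this right, the C function takes these options as a JSON-encoded config string, and building one in Julia is simple string work. A sketch using only the "type" and "training" fields quoted above (I am assuming any remaining fields can be left to their defaults):

```julia
# Sketch: build the JSON config string passed to the C prediction function.
# Only the "type" and "training" fields quoted from the docs are shown;
# any other fields (if required) would be appended the same way.
predict_config(type::Int; training::Bool=false) =
    """{"type": $type, "training": $training}"""

# e.g. predict_config(2) for SHAP-style contribution predictions
```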

Looking it over, I think the parameters I saw reflect how they are named in the Python package, but according to the page I referenced they are implemented through the 'type' parameter, which is not true/false but rather 0–6.

I apologize if I have this incorrect.


bobaronoff commented on June 12, 2024

Here is the proper link: XGBoosterPredict


ExpandingMan commented on June 12, 2024

Ah, I was looking at the wrong one; indeed we are using XGBoosterPredictFromDMatrix. I think the documentation is also out of sync; maybe I should have been looking at this page instead of the one I linked.

I assume you are interested in additional values for type? Should be easy enough, though we'll have to think about what the options should look like on the Julia side, since the type integer by itself is pretty opaque. It looks like currently the only type option I handle is margin.

I'll probably get to this eventually. Of course, a PR would be welcome.


bobaronoff commented on June 12, 2024

I am attempting a version of predict that allows for the differing type values 0–6. I am able to get the differing returns from libxgboost, but I am getting confused about how to process the results into a proper Julia array.

Here are 3 lines in the current routine that I think I understand, but am not certain:

```julia
dims = reverse(unsafe_wrap(Array, oshape[], odim[]))
o = unsafe_wrap(Array, o[], tuple(dims...))
length(dims) > 1 ? transpose(o) : o
```

It seems that the `reverse` function effects a reshape when `unsafe_wrap` converts the C array to a Julia `Array`. The last line applies a transpose if there is more than one dimension. I understand this for 2 dimensions (it completes the conversion from row-major to column-major), but I am not familiar with how `transpose` works and what would happen if it were applied to a 3-dimensional array, as might come from type=4 (i.e., interaction) or from type=2 (i.e., contribution) in a multi: model.
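To test my understanding of the 2-dimensional case, I put together this small standalone demo (no libxgboost involved; `reshape` of a plain vector stands in for the `unsafe_wrap` of the C buffer):

```julia
# A 2×3 matrix [1 2 3; 4 5 6] laid out row-major, as libxgboost returns it
buf = Float32[1, 2, 3, 4, 5, 6]
dims = reverse((2, 3))        # the C side reports shape (2, 3); we wrap as (3, 2)
o = reshape(buf, dims)        # column-major wrap of the raw buffer: [1 4; 2 5; 3 6]
m = transpose(o)              # indices now map correctly: m == [1 2 3; 4 5 6]
```

So for 2 dimensions the reverse-then-transpose trick does recover the intended matrix; my question is what replaces `transpose` for rank 3.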

Any thoughts would be greatly appreciated.


ExpandingMan commented on June 12, 2024

These lines merely adapt libxgboost's internal memory format (in which results are returned) to the memory format of Julia arrays (in particular, the former is row-major and the latter is column-major). If the other type returns are implemented correctly, they should return the array metadata in exactly the same way as for type=0. Therefore, I don't think any of these lines should be touched at all.


bobaronoff commented on June 12, 2024

I must not be conveying the issue correctly. Here is my understanding and working with my data bears out that understanding. unsafe_wrap takes the C pointer and uses it to specify a Julia object stored at that pointer with the array dimensions supplied. It does nothing to remap the data in memory from row major to column major. For a two dimensional array if one reshapes by reversing the dimensions and transposing, the indices will map to the proper locations in memory. Theoretically this works for 3, 4, or any dimensional array. However, transpose is only designed for a 2 dimensional array. It throws an error if you try to use it for a 3 dimensional array.

libxgboost returns 3-dimensional arrays for type 4 and 5 ALWAYS, and for type 2 and 3 when the objective is multi:softprob/multi:softmax. The current format (i.e., transpose) will fail every time for type 4 and 5, and sometimes (i.e., with multi: objectives) for type 2 and 3. I have confirmed this on my data sets!!

Rather than modify a function in a way that creates situations that would fail, I think it better to leave the current XGBoost.predict() as it is and create a new function (perhaps XGBoost.predictbytype()) that includes permutedims to handle all contingencies. The only reason to specify type is for the Shapley values, which is a one-time call, so the reallocation cost would be less impactful and known to the user upfront.

I will change the function name. Since I am proposing a new function, there is no need for backward compatibility, and keeping margin is redundant. It will take me a bit to figure out how to roll back my fork so the current XGBoost.predict() remains untouched.
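Here is a sketch of the rank-general conversion I have in mind (`from_row_major` is a hypothetical helper, not part of XGBoost.jl; `permutedims`, unlike `transpose`, works for any rank but copies the data):

```julia
# Hypothetical helper: convert a row-major buffer with the C-reported shape
# into a correctly indexed Julia array of any rank. Not part of XGBoost.jl.
function from_row_major(buf::Vector{T}, cdims::NTuple{N,Int}) where {T,N}
    a = reshape(buf, reverse(cdims))  # wrap with reversed dims (column-major view)
    permutedims(a, N:-1:1)            # reverse the axis order; copies the data
end

# 2×2×2 row-major data: element (i, j, k) stored at (i-1)*4 + (j-1)*2 + k
a3 = from_row_major(Float32.(1:8), (2, 2, 2))

# rank 2 behaves exactly like the reverse-then-transpose path, minus laziness
a2 = from_row_major(Float32.(1:6), (2, 3))
```

For rank 2 this allocates a copy where `transpose` would not, but since type 2–5 predictions are occasional calls, that cost seems acceptable.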


ExpandingMan commented on June 12, 2024

I'm a bit confused... why not just check if dims == 2 in the existing predict function? That way you can know whether transpose works or you have to do permutedims?

I'm not necessarily opposed to adding a new, lower-level function; that might have some advantages. However, the only thing I can think of stopping us from just returning whatever array is appropriate here is type stability, and, again, that's already pretty compromised, so I'm not sure it makes sense to try to keep it narrowed down.
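Concretely, the branch I am picturing is a one-line change to the return quoted earlier in the thread (a sketch with stand-in values for the wrapped buffer and shape):

```julia
# Sketch: generalize the final branch of the existing return so rank > 2 works.
# `dims` and `o` below are stand-ins for the values produced by `unsafe_wrap`.
dims = (3, 2)                         # reversed C shape for a 2×3 result
o = reshape(Float32.(1:6), dims)      # stand-in for the wrapped raw buffer
result = length(dims) > 2 ? permutedims(o, length(dims):-1:1) :
         length(dims) > 1 ? transpose(o) : o
```

That keeps the lazy `transpose` on the common 2-D path and only pays for the `permutedims` copy when libxgboost actually hands back a higher-rank array.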

