pbiecek / ema
Explanatory Model Analysis. Explore, Explain and Examine Predictive Models
Home Page: http://ema.drwhy.ai
In subsection 4.2 there is a comparison of CP and LIME; however, you present LIME 8 chapters later (chapter 12).
Wouldn't it be better to place this comparison in chapter 12?
If someone reads the chapters in order, they would not have to jump between chapters.
or
How about adding a separate chapter to point out similarities and differences between local methods? e.g. CP and LIME, LIME and SHAP, LIME and localModel, etc.
In Chapter 17:
There is a different definition of Ceteris Paribus than in Chapter 6
https://github.com/pbiecek/PM_VEE/blob/c246b72a9725ec4c2250379d36b42983b6ac9183/17-Partial-dependency.Rmd#L49
In the picture, the title of the chart is cut off
https://github.com/pbiecek/PM_VEE/blob/c246b72a9725ec4c2250379d36b42983b6ac9183/17-Partial-dependency.Rmd#L142
The title of the chart is not the same as the content of the plot
https://github.com/pbiecek/PM_VEE/blob/c246b72a9725ec4c2250379d36b42983b6ac9183/17-Partial-dependency.Rmd#L172
How about naming interpretable models "glass-boxes" instead of "white-boxes"?
The interior of a black-box is unknown because of its opaque walls, so the opposite should be something that highlights transparency. If the walls of the box are white, you still can't look inside. You can if they are made of glass.
I think that it would be good to show variable-importance measures for 2d Ceteris Paribus plots. Especially as 2d vip is described in Section 6.3 Method.
When you show 2D CP profiles for age, sibsp, and parch, you say that "both variables age and sibsp are important and influence the model response."
Why did you omit the variable parch? When I look at the plot of age vs parch, I would say that age and parch have the largest mutual influence. Maybe I am wrong in assessing their influence, so a VIP plot would be helpful, as I mentioned in 1) 😉
In 1.8 The structure of the book you describe the division of sections for methods into subsections named:
While the subsections in sections 1-7 are:
After 15 and before 16, I would use ModelOriented/auditor#130.
In chapter 1.5 should be
"(though maybe not by every person)"
instead of
"(though maybe not by every pearson)"
ema/02-Model-Development-Process.Rmd
Line 128 in 8435e54
The loading of objects using archivist fails; for example:
aread("pbiecek/models/58b24")
results in:
Error in function (type, msg, asError = TRUE) : error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
I think the cause of the problem is a change in this repository: https://github.com/pbiecek/models
The archived objects seem to have moved down one level, into the gallery folder.
However, when I change the code to
archivist::aread("pbiecek/models/gallery/58b24")
I still get an error, albeit a different one:
Error in downloadDB(remoteHook) : Such a repo: https://raw.githubusercontent.com/pbiecek/models/master/gallery does not exist or there is no archivist-like Repository on this repo.
You've described how to interpret CP for a continuous variable (Figure 4.2).
Moreover, for this instance (observation), the prediction would increase substantially if the value of the explanatory variable became lower than 20.
I think that there should be a similar short interpretation for CP for a discrete variable in Figure 4.3.
Side note:
For continuous variables, a model response is on the y-axis, for discrete on the x-axis. Can this be unified?
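The interpretation discussed above can be illustrated with a minimal sketch of how a 1-D ceteris-paribus profile is computed (a toy model, not the book's titanic example; the model, instance, and function names here are all made up for illustration):

```python
# Toy sketch of a 1-D ceteris-paribus profile: vary one variable over a
# grid while all other variables of the instance stay fixed.

def model(x):
    # hypothetical "model": prediction from age and fare
    age, fare = x["age"], x["fare"]
    return 0.8 if age < 20 else 0.3 + 0.002 * fare

def cp_profile(model, instance, variable, grid):
    """Predictions for copies of `instance` with `variable` set to each grid value."""
    profile = []
    for v in grid:
        x = dict(instance)   # copy; all other variables stay fixed
        x[variable] = v
        profile.append((v, model(x)))
    return profile

henry = {"age": 47, "fare": 25}
profile = cp_profile(model, henry, "age", [10, 15, 20, 30, 47, 60])
# For grid values below 20 the prediction jumps, matching interpretations of
# the form "the prediction would increase substantially if age became lower
# than 20"; at the instance's own value the profile passes through its prediction.
```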
When you describe 2d Ceteris Paribus (Section 6.3), you say:
"Such multi-dimensional extensions are useful to check if, for instance, the model involves interactions. "
It would be helpful to explain when we can say that a Ceteris Paribus 2D plot indicates interaction.
How about adding a subsection in every chapter where different scenarios are shown with explanations what the reader should deduce from each plot?
Or at least extensively explain plots presented in "Method" sections in every chapter.
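One possible rule of thumb (my own sketch, not taken from the book): a 2D CP surface indicates an interaction when the "double difference" of the model over a rectangle of settings is non-zero; for a purely additive model it is exactly zero.

```python
# Double-difference check for interactions between two variables.

def double_difference(f, x1, x2, z1, z2):
    """f(x1,x2) - f(x1,z2) - f(z1,x2) + f(z1,z2); zero for additive models."""
    return f(x1, x2) - f(x1, z2) - f(z1, x2) + f(z1, z2)

additive = lambda a, b: 2 * a + 3 * b          # no interaction term
interact = lambda a, b: 2 * a + 3 * b + a * b  # contains an interaction

assert double_difference(additive, 1, 1, 0, 0) == 0  # flat: no interaction
assert double_difference(interact, 1, 1, 0, 0) == 1  # non-zero: interaction
```

Visually, this corresponds to the 2D CP surface bending rather than being a sum of two 1D profiles.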
pdp_part_7: in section 17.5, the plot caption seems not to match the legend and what is actually shown in the plot?
This sentence in https://github.com/pbiecek/ema/blob/master/09-LIME.Rmd:
"Thus, we will use the predict_surrogate() method from the localModel package."
is not correct. predict_surrogate() has been moved to the DALEXtra package.
In section 6.6.2 code for ceteris paribus plots gives 4 plots (for age, fare, sibsp, parch).
Figure 6.8 presents only two of them (age and fare).
In section "7.4 Example" there is a plot for variable-importance measures and last paragraph of this section describes the plot. However, the description ("If Henry were older ...") looks more like a description of ceteris paribus profiles, than of oscillations.
1.4 Terminology
This part could address "coefficient" - term heavily used in the following section about black/glass boxes.
2.3 How to work with archivist?
Last sentence could also mention that data for model explainers (for which archivist hooks are used) is described in detail in chapter 4.
Typos:
1.8 notation, first paragraph
In some cases this may result in formulae wth a fairly complex system of indices.
(should be "with")
1.9 The structure, 5th paragraph
Chapter 12 presens a different approach to explanation
(should be "presents")
I am opening the issue here, but I do not know whether it should rather be in the repository of the ingredients package.
I think that the first parameter of the select_neighbours() function should be an observation instead of data.
Now, the convention in ingredients is that the name of the function is somehow related to the first parameter.
plot() (what?) ceteris_paribus_explainer
print() (what?) ceteris_paribus_explainer
calculate_oscilations() (of what?) ceteris_paribus_explainer
and for select_neighbours() it should be the same:
calculate_neighbours() (of what?) observation
I was wondering why I don't see any difference between my break-down plot for additive attributions and my break-down plot for interactions.
library(DALEX)
library(ranger)
train <- titanic_imputed
set.seed(1234)
titanic_rf <- ranger(survived ~ ., data = train,
classification = TRUE, probability = TRUE)
exp_rf <- explain(titanic_rf,
data = train[, 1:7],
y = train$survived,
label = "Random Forest")
lucy <- data.frame(
class = factor("1st", levels = c("1st", "2nd", "3rd",
"deck crew", "engineering crew",
"restaurant staff", "victualling crew")),
gender = factor("female", levels = c("female", "male")),
age = 18, sibsp = 0, parch = 0, fare = 70,
embarked = factor("Southampton", levels = c("Belfast",
"Cherbourg","Queenstown","Southampton")))
bd_lucy <- predict_parts(exp_rf, lucy, type = "break_down")
plot(bd_lucy)
bd_lucy_order <- predict_parts(exp_rf, lucy, type = "break_down",
order = c("age", "gender", "fare", "class",
"parch", "sibsp", "embarked"))
plot(bd_lucy_order)
bd_lucy_int <- predict_parts(exp_rf, lucy, type = "break_down_interactions")
plot(bd_lucy_int)
bd_lucy is different from bd_lucy_order due to model interactions, I assume. So I expected bd_lucy to also be different from bd_lucy_int, but this isn't the case. What am I overlooking?
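To illustrate why variable order matters for break-down when interactions are present (a toy sketch of the idea, not DALEX/iBreakDown internals; the model and data here are made up), the contributions can be computed as changes in the mean prediction while fixing variables one by one:

```python
# Toy break-down: with an interaction present, the attribution split
# depends on the order in which variables are conditioned on.

from itertools import product

f = lambda a, b: a + b + 2 * a * b            # model with an a*b interaction
data = list(product([0, 1], [0, 1]))          # tiny uniform "dataset"

def mean_pred(fixed):
    """Average prediction over the data with some variables fixed."""
    vals = [f(fixed.get("a", a), fixed.get("b", b)) for a, b in data]
    return sum(vals) / len(vals)

def break_down(instance, order):
    """Contribution of each variable = change in mean prediction when fixed."""
    fixed, contrib = {}, {}
    before = mean_pred(fixed)
    for var in order:
        fixed[var] = instance[var]
        after = mean_pred(fixed)
        contrib[var] = after - before
        before = after
    return contrib

x = {"a": 1, "b": 1}
first_a = break_down(x, ["a", "b"])   # a gets 1.0, b gets 1.5
first_b = break_down(x, ["b", "a"])   # b gets 1.0, a gets 1.5
# The totals agree, but the per-variable split differs: that is the
# order-dependence caused by the interaction.
```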
Figures 9.5 and 9.8 are not correct because of a bug in localModel.
When I apply the workaround as described in the issue, I get a LIME explanation for Johnny D's prediction from the random forest model that is much more intuitive and consistent with the other explanations for the same prediction.
I think that all objects presented in subsections "Code snippets for R" should have archivist hooks.
In subsection 1 about Break-down for linear models you denote observation as
Maybe related to Call Center?
Statistics | Machine learning | Notes |
---|---|---|
data point, record, row of data | example, instance | Both domains also use "observation," which can refer to a single measurement or an entire vector of attributes depending on context. |
response variable, dependent variable | label, output | Both domains also use "target." Since practically all variables depend on other variables, the term "dependent variable" is potentially misleading. |
variable, covariate, predictor, independent variable | feature, side information, input | The term "independent variable" exists for historical reasons but is usually misleading--such a variable typically depends on other variables in the model. |
regressions | supervised learners, machines | Both estimate output(s) in terms of input(s). |
estimation | learning | Both translate data into quantitative claims, becoming more accurate as the supply of relevant data increases. |
hypothesis ≠ classifier | hypothesis | In both statistics and ML, a hypothesis is a scientific statement to be scrutinized, such as "The true value of this parameter is zero." In ML (but not in statistics), a hypothesis can also refer to the prediction rule that is output by a classifier algorithm. |
bias ≠ regression intercept | bias | Statistics distinguishes between (a) bias as a form of estimation error and (b) the default prediction of a linear model in the special case where all inputs are 0. ML sometimes uses "bias" to refer to both of these concepts, although the best ML researchers certainly understand the difference. |
Maximize the likelihood to estimate model parameters | If your target distribution is discrete (such as in logistic regression), minimize the entropy to derive the best parameters. If your target distribution is continuous, fine, just maximize the likelihood. | For discrete distributions, maximizing the likelihood is equivalent to minimizing the entropy. |
Apply Occam's razor, or encode missing prior information with suitably uninformative priors. | Apply the principle of maximum entropy. | The principle of maximum entropy is conceptual and does not refer to maximizing a concrete objective function. The principle is that models should be conservative in the sense that they be no more confident in the predictions than is thoroughly justified by the data. In practice this works out as deriving an estimation procedure in terms of a bare-minimum set of criteria as exemplified here or here. |
logistic/multinomial regression | maximum entropy, MaxEnt | They are equivalent except in special multinomial settings like ordinal logistic regression. Note that maximum entropy here refers to the principle of maximum entropy, not the form of the objective function. Indeed, in MaxEnt, you minimize rather than maximize the entropy expression. |
X causes Y if surgical (or randomized controlled) manipulations in X are correlated with changes in Y | X causes Y if it doesn't obviously not cause Y. For example, X causes Y if X precedes Y in time (or is at least contemporaneous) | The stats definition is more aligned with common-sense intuition than the ML one proposed here. In fairness, not all ML practitioners are so abusive of causation terminology, and some of the blame belongs with even earlier abuses such as Granger causality. |
structural equations model | Bayesian network | These are nearly equivalent mathematically, although interpretations differ by use case, as discussed. |
sequential experimental design | active learning, reinforcement learning, hyperparameter optimization | Although these four subfields are very different from each other in terms of their standard use cases, they all address problems of optimization via a sequence of queries/experiments. |
XAI - post-hoc explanations
IML -
I am trying to follow the tutorial on ceteris paribus. Is there code showing how to plot the diagram in Figure 7.1(A), titanic_lmr_v6?
https://pbiecek.github.io/PM_VEE/ceterisParibus.html
Subsection 7.3.1 Nearest neighbors
Are there any heuristics related to the number of neighbors?
what is
I thought that in the Gower distance, numerical and categorical variables are treated differently. But I'm not sure.
If the Gower distance is the default measure, what are the alternatives? And why is it the default?
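For reference, here is my understanding of the standard Gower definition (a sketch, not the implementation used by ingredients): numerical variables contribute a range-normalised absolute difference, categorical ones a 0/1 mismatch, and the per-variable dissimilarities are averaged.

```python
# Sketch of the Gower distance between two mixed-type observations.

def gower(x, y, ranges, categorical):
    """Mean per-variable dissimilarity; `ranges` maps numeric vars to (max - min)."""
    total = 0.0
    for var in x:
        if var in categorical:
            total += 0.0 if x[var] == y[var] else 1.0   # categorical: 0/1 mismatch
        else:
            total += abs(x[var] - y[var]) / ranges[var]  # numeric: normalised diff
    return total / len(x)

a = {"age": 20, "fare": 10, "gender": "male"}
b = {"age": 40, "fare": 10, "gender": "female"}
d = gower(a, b, ranges={"age": 80, "fare": 500}, categorical={"gender"})
# (20/80 + 0 + 1) / 3 ≈ 0.4167
```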
I really don't like the 3d plot A in Figure 4.1. 😉
Currently, the z-axis (model response) is nearly useless; I can't read the values.
And because of the 3d structure of the plot, I can't see what happens if class == 'victualling crew' or class == 'restaurant staff'.
How would it look as a heatmap?
Or, as we discussed during the seminar, the 3d plot could be interactive, or at least a gif. But what about a paper version of the book?
Hi,
this book is super interesting, so I just want to contribute by proofreading.
The text in Figure 7 contains two typos:
the dirstribution and the average of the predictions when fixing values of subseqeunt explanatory
Please correct:
In section 7.5 (pros) it is mentioned that "The methodology can easily be extended to two or more variables." Could this idea be described in a bit more detail?
Plot produces warnings. From what I see, "only_numerical" is not a plot.ceteris_paribus_explainer parameter anymore.
maybe it could be changed to:
plot(cp_titanic_rf, variables = c("class", "embarked"), variable_type = "categorical") +
  ggtitle("Ceteris Paribus Profiles", "For the random forest model and the Titanic dataset")
Also here is reference to only_numerical as plot parameter:
https://github.com/pbiecek/PM_VEE/blob/76c1580556f90b207b6f2d96fc9b7110fb0263bc/06-Ceteris-paribus-Profiles.Rmd#L169
Second thing:
https://github.com/pbiecek/PM_VEE/blob/76c1580556f90b207b6f2d96fc9b7110fb0263bc/06-Ceteris-paribus-Profiles.Rmd#L57
repeats
https://github.com/pbiecek/PM_VEE/blob/76c1580556f90b207b6f2d96fc9b7110fb0263bc/06-Ceteris-paribus-Profiles.Rmd#L59
Link to chapter 4 in table of contents seems to be broken.
Is:
.../dataSetsIntro.html
Should be:
.../DataSetsIntro.html
https://pbiecek.github.io/ema/residualDiagnostic.html links to the residualDiagnostics topic, which might not be intended.
In line 90 of 01-Introduction.Rmd there is a missing space, so the HTML output is:
"For example:\index{Model-agnostic approach}"
In Subsection 7.3.1 "Nearest neighbors" you mention the R function select_neighbours().
I think that "Method" sections should be code free.
There are "Code snippets for R" sections for code and description of R functions and parameters.
Especially since you also plan to provide sections with python code.
In Chapter 17:
1. aggregate_profiles{ingredients} in https://github.com/pbiecek/PM_VEE/blob/785718809256d524d0aee244a595ce6c5b718764/17-Partial-dependency.Rmd#L70 could be written as ingredients::aggregate_profiles, to be consistent with places like https://github.com/pbiecek/PM_VEE/blob/785718809256d524d0aee244a595ce6c5b718764/06-Ceteris-paribus-Profiles.Rmd#L116
2. ingredients::show_rugs() is inconsistent with ingredients::show_observations and more
3. More of 1. and 2. through the rest of this chapter.
In Chapter 18 :
More of 2. through the rest of this chapter.
In Chapter 19 :
Hello @pbiecek,
Many thanks for your great work. Recently I have been learning how to use the DALEX and DALEXtra packages to conduct XAI analysis. May I ask you a question about the explain() function?
When doing machine learning, we first split the whole dataset into training and testing data. I have seen that most tutorials use the testing data to build the explainer (for example this one). This is appropriate when we want to evaluate the model's overall performance.
However, if the analysis purpose is to investigate the relationship between the input features and the target variable (i.e., feature importance, feature effects, and feature interaction), should we use the training data to build the explainer?
Your kind guidance is much appreciated!
Best regards,
Xiaochi
Hi,
I was trying to use the grouped partial-dependence profiles model_profile() function in Python, but unfortunately it cannot handle floats/integers as grouping variables. The problem is that most RF/logit models cannot handle text data, therefore variables need to be encoded as integers (e.g., out of the variable gender, I create gender_female with values 0, 1). So when I try to apply your aggregation function, I keep getting this error:
sequence item 0: expected str instance, int found
Could you transform variables with n < 10 categories into strings to be able to get aggregated results for categorical variables? Or just allow integers if, for example, n_cat < 10?
Thanks!
PS; Cool library!
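Until something like that is supported, one possible user-side workaround is to cast low-cardinality integer columns to strings before passing them as grouping variables (a plain-Python sketch, independent of dalex; the helper name is made up):

```python
# Hypothetical helper: treat an integer-coded column as categorical when it
# has few distinct values, and convert it to strings for grouping APIs.

def stringify_if_categorical(column, max_categories=10):
    """Return a str-typed copy if the column has at most `max_categories` distinct values."""
    if len(set(column)) <= max_categories:
        return [str(v) for v in column]
    return column  # too many distinct values: leave as numeric

gender_encoded = [0, 1, 1, 0, 1]
print(stringify_if_categorical(gender_encoded))  # ['0', '1', '1', '0', '1']

fare = list(range(100))  # 100 distinct values: left untouched
assert stringify_if_categorical(fare) == fare
```

The same idea applies to a pandas column via `astype(str)` before building the explainer.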
Is there currently a PDF version of this book?
intro to modeling
before we explain, we need to model
as in ModelOriented/DALEX#237
I think that the section with pros should be extended; currently the only highlighted advantage is ease of understanding. If that is the only advantage, why should we use CP, if at all?
When you point out the cons of 1d CP, it would be good to give examples for better understanding.
For example, when you mention "unrealistic settings", you could describe a situation where CP considers an apartment with 4 rooms and 20 m2.
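A sketch of what such an "unrealistic settings" check could look like (toy apartment data, assumed for illustration): flag CP grid points for which no training observation with the same number of rooms has a similar area.

```python
# Toy data: (rooms, area_m2) pairs; rooms and area are strongly related,
# so a CP profile that varies area freely creates impossible apartments.
train = [(1, 25), (2, 45), (2, 50), (3, 70), (4, 95), (5, 120)]

def realistic(rooms, area, data, tol=15):
    """Is there a training apartment with the same rooms and similar area?"""
    return any(r == rooms and abs(a - area) <= tol for r, a in data)

# CP profile for "area" of a 4-room apartment walks area over the full range,
# producing combinations like (4 rooms, 20 m2) that never occur in the data.
instance_rooms = 4
grid = [20, 40, 60, 80, 100]
unrealistic = [a for a in grid if not realistic(instance_rooms, a, train)]
# → [20, 40, 60]: most of this CP profile is evaluated on impossible apartments.
```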