pbiecek / ema

Explanatory Model Analysis. Explore, Explain and Examine Predictive Models

Home Page: http://ema.drwhy.ai

TeX 0.11% CSS 0.01% R 0.04% HTML 16.08% Jupyter Notebook 83.77%
predictive-models explanatory-model-analysis explore explain examine machine-learning xai model-visualization

ema's People

Contributors: friesewoudloper, hbaniecki, jtr13, martinholdrege, mvwestendorp, pbiecek, tomaszbur, xiaochi-liu


ema's Issues

Chapter 4: similarity of ceteris paribus and lime

In subsection 4.2 there is a comparison of CP and LIME; however, LIME is presented eight chapters later (chapter 12).
Wouldn't it be better to place this comparison in chapter 12?
Readers who go through the chapters in order would then not have to jump between them.
Or how about adding a separate chapter that points out similarities and differences between local methods? E.g. CP and LIME, LIME and SHAP, LIME and localModel, etc.

Chapter 17 - minor remarks

Chapter 1: glass-box instead of white-box

How about calling interpretable models "glass-boxes" instead of "white-boxes"?
The interior of a black-box is unknown because its walls are opaque, so the opposite term should be something that highlights transparency. If the walls of the box are white, you still can't look inside; you can if they are made of glass.

Chapter 6: plots in Code snippets for R

  1. I think it would be good to show variable-importance measures for 2D Ceteris Paribus plots, especially since 2D VIP is described in Section 6.3 Method.

  2. When you show 2D CP profiles for age, sibsp and parch, you say that "both variables age and sibsp are important and influence the model response."
    Why did you omit the variable parch? Looking at the age vs. parch plot, I would say that age and parch have the largest mutual influence. Maybe I am wrong in assessing their influence, so a VIP plot would be helpful, as mentioned in 1) 😉


Chapter 1: outdated description of the chapter structure

In 1.8 The structure of the book you describe the division of sections for methods into subsections named:

  • Introduction
  • The Algorithm
  • Example
  • Pros and Cons
  • Code snippets

While the subsections in sections 1-7 are:

  • Introduction
  • Intuition
  • Method
  • Pros and cons
  • Code snippets for R

typo in word prediction (predction -> prediction)

Consider predction of a new observation for which the vector of explanatory variables assumes the value of $\underline{x}_*$, i.e., $f(\underline{\hat{\theta}};\underline{x}_*)$. Assume that $E_{Y | \underline{x}_*}(Y) = f(\underline{\theta};\underline{x}_*)$. It can be shown [@Hastie2009; @Shmueli2010] that the expected squared-error of prediction can be expressed as follows: \index{Expected squared-error of prediction}
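For context (the formula itself is cut off in this excerpt): the expected squared-error decomposition that sentences like this usually introduce, following Hastie et al. (2009), has the form

$$
E\left\{\left[Y - f(\underline{\hat{\theta}};\underline{x}_*)\right]^2\right\}
= \mathrm{Var}(Y \,|\, \underline{x}_*)
+ \left[f(\underline{\theta};\underline{x}_*) - E\, f(\underline{\hat{\theta}};\underline{x}_*)\right]^2
+ \mathrm{Var}\left[f(\underline{\hat{\theta}};\underline{x}_*)\right],
$$

i.e., irreducible noise plus squared bias plus estimation variance. This is a reconstruction of the standard result, not a quote from the book.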

Loading of objects with {archivist} fails

The loading of objects using archivist fails, for example:
aread("pbiecek/models/58b24")
results in
Error in function (type, msg, asError = TRUE) : error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version

I think the cause of the problem is a change in this repository: https://github.com/pbiecek/models

The archived objects seem to have moved down one level, into the gallery folder.

However, when I change the code to
archivist::aread("pbiecek/models/gallery/58b24")
I still get an error, albeit a different one:
Error in downloadDB(remoteHook) : Such a repo: https://raw.githubusercontent.com/pbiecek/models/master/gallery does not exist or there is no archivist-like Repository on this repo.

Chapter 4: how to interpret CP for discrete vars

You've described how to interpret CP for a continuous variable (Figure 4.2).

Moreover, for this instance (observation), the prediction would increase substantially if the value of the explanatory variable became lower than 20.

I think that there should be a similar short interpretation for CP for a discrete variable in Figure 4.3.

Side note:
For continuous variables the model response is on the y-axis; for discrete variables it is on the x-axis. Could this be unified?
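For readers trying to picture what a CP profile computes, here is a minimal pure-Python sketch (not the ingredients/DALEX implementation; the toy model and observation are made up for illustration): fix the observation, vary one variable over a grid, and record the model response.

```python
def ceteris_paribus(predict, obs, var, grid):
    """1D CP profile: vary `var` over `grid`, keep all other
    variables of `obs` fixed, record the model response."""
    profile = []
    for value in grid:
        modified = dict(obs, **{var: value})
        profile.append((value, predict(modified)))
    return profile

# hypothetical model and observation (made up for illustration)
predict = lambda o: 0.8 - 0.01 * o["age"] + 0.005 * o["fare"]
henry = {"age": 47, "fare": 25}
print(ceteris_paribus(predict, henry, "age", [10, 30, 50]))
```

Plotted with the grid on the x-axis and the response on the y-axis, this is exactly the continuous-variable convention of Figure 4.2.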

Refactoring of the first part of the book

As in the figure below.
The proposed order:

  • aim of the book
  • three laws
  • the structure of the book (with pyramid)
  • glass box vs black box
  • model agnostic vs model specific

  • model development
  • example data sets
  • do-it-yourself in R and python


Chapter 6 and others: explanations of plots

When you describe 2d Ceteris Paribus (Section 6.3), you say:
"Such multi-dimensional extensions are useful to check if, for instance, the model involves interactions. "
It would be helpful to explain when we can say that a Ceteris Paribus 2D plot indicates interaction.

How about adding a subsection to every chapter in which different scenarios are shown, with explanations of what the reader should deduce from each plot?
Or at least explain extensively the plots presented in the "Method" section of every chapter.
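One concrete way to answer "when does a 2D CP plot indicate an interaction": the 1D profile in one variable has a different shape for different fixed values of the other. A minimal pure-Python sketch of that check (toy models made up for illustration, not the book's code):

```python
def cp_2d(predict, obs, var1, grid1, var2, grid2):
    """2D CP grid: model response for every (var1, var2) combination,
    other variables fixed at the observation's values."""
    return {v2: [predict(dict(obs, **{var1: v1, var2: v2})) for v1 in grid1]
            for v2 in grid2}

def has_interaction(grid):
    # slopes along var1 for each fixed var2; if they differ across
    # var2 values, the two variables interact in the model
    rows = [tuple(b - a for a, b in zip(r, r[1:])) for r in grid.values()]
    return len(set(rows)) > 1

additive = lambda o: o["x1"] + 3 * o["x2"]   # no interaction
interact = lambda o: o["x1"] * o["x2"]       # pure interaction
obs = {"x1": 0, "x2": 0}

g_add = cp_2d(additive, obs, "x1", [0, 1, 2], "x2", [0, 1])
g_int = cp_2d(interact, obs, "x1", [0, 1, 2], "x2", [0, 1])
print(has_interaction(g_add), has_interaction(g_int))
```

Visually: parallel profile curves mean no interaction; curves that change shape or slope as the second variable moves indicate one.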

pdp - missing glm model

pdp_part_7: in Section 17.5, the plot caption does not seem to match the legend or what is actually shown in the plot?

Chapter 6: plot/code slight mismatch

In section 6.6.2 the code for ceteris-paribus plots produces 4 plots (for age, fare, sibsp, parch).
Figure 6.8 presents only two of them (age and fare).

Chapter 7: plot description

In section "7.4 Example" there is a plot of variable-importance measures, and the last paragraph of the section describes the plot. However, the description ("If Henry were older ...") reads more like a description of ceteris-paribus profiles than of oscillations.

Chapter 1-2: misc suggestions

1.4 Terminology
This part could address "coefficient", a term heavily used in the following section about black/glass boxes.

2.3 How to work with archivist?
Last sentence could also mention that data for model explainers (for which archivist hooks are used) is described in detail in chapter 4.

Typos:
1.8 notation, first paragraph
In some cases this may result in formulae wth a fairly complex system of indices.
(should be "with")

1.9 The structure, 5th paragraph
Chapter 12 presens a different approach to explanation
(should be "presents")

Chapter 7: order of params in `select_neighbours()` function

I am opening the issue here, but perhaps it belongs in the repository of the ingredients package.

I think that the first parameter of the select_neighbours() function should be an observation instead of data.

Currently, the convention in ingredients is that the name of a function relates to its first parameter:
plot() (what?) a ceteris_paribus_explainer
print() (what?) a ceteris_paribus_explainer
calculate_oscillations() (of what?) a ceteris_paribus_explainer

and select_neighbours() should follow the same pattern:
select_neighbours() (of what?) an observation

No difference between break-down plot for additive attributions and that for interactions

I was wondering why I don't see any difference between my break-down plot for additive attributions and my break-down plot for interactions.

library(DALEX)
library(ranger)
train <- titanic_imputed
set.seed(1234)
titanic_rf <- ranger(survived ~ ., data = train,
                     classification = TRUE, probability = TRUE)
exp_rf <- explain(titanic_rf,
                  data = train[, 1:7],
                  y = train$survived,
                  label = "Random Forest")
lucy <- data.frame(
      class = factor("1st", levels = c("1st", "2nd", "3rd",
                     "deck crew", "engineering crew",
                     "restaurant staff", "victualling crew")),
      gender = factor("female", levels = c("female", "male")),
      age = 18, sibsp = 0, parch = 0, fare = 70,
      embarked = factor("Southampton", levels = c("Belfast",
                        "Cherbourg","Queenstown","Southampton")))

bd_lucy <- predict_parts(exp_rf, lucy, type = "break_down")
plot(bd_lucy)

bd_lucy_order <- predict_parts(exp_rf, lucy, type = "break_down",
                               order = c("age", "gender", "fare", "class",
                                         "parch", "sibsp", "embarked"))
plot(bd_lucy_order)

bd_lucy_int <- predict_parts(exp_rf, lucy, type = "break_down_interactions")
plot(bd_lucy_int)

bd_lucy is different from bd_lucy_order due to model interactions, I assume.
So I expected bd_lucy to also be different from bd_lucy_int, but this isn't the case.
What am I overlooking?
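To build intuition for why bd_lucy differs from bd_lucy_order, here is a minimal pure-Python sketch of greedy break-down attributions on a toy model (a simplification for illustration, not the DALEX implementation): when the model contains an interaction, the per-variable contributions depend on the order in which variables are fixed. break_down_interactions is meant to attribute such pairs jointly; if the plots look identical, one possibility worth checking is that no interaction was strong enough to be picked up for this particular observation.

```python
# toy model with an interaction between x1 and x2
def model(x1, x2):
    return x1 + x2 + 2 * x1 * x2

# small "background" sample standing in for the training data
background = [(0, 0), (0, 1), (1, 0), (1, 1)]

def expected_value(fixed):
    """Mean model response with some variables fixed at the explained
    observation's values and the rest taken from the background."""
    vals = [model(fixed.get("x1", b1), fixed.get("x2", b2))
            for b1, b2 in background]
    return sum(vals) / len(vals)

def break_down(obs, order):
    """Greedy break-down: fix variables one by one in the given order;
    each contribution is the change in the expected response."""
    contributions, fixed = {}, {}
    prev = expected_value(fixed)
    for var in order:
        fixed[var] = obs[var]
        cur = expected_value(fixed)
        contributions[var] = cur - prev
        prev = cur
    return contributions

obs = {"x1": 1, "x2": 1}
print(break_down(obs, ["x1", "x2"]))  # attributions depend on order...
print(break_down(obs, ["x2", "x1"]))  # ...because the model interacts
```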

Figure 9.5 and 9.8 are not correct

Figure 9.5 and 9.8 are not correct because of a bug in localModel.

When I apply the workaround as described in the issue, I get a LIME explanation for Johnny D's prediction from the random forest model that is much more intuitive and consistent with the other explanations for the same prediction.


Chapter 9, Section 2

In subsection 1, about break-down for linear models, you denote the observation as $x^*$, but in subsection 2, where you introduce break-down for the general case, you denote it as $x_*$. Maybe it's worth making this consistent 😉

Translating Between Statistics and Machine Learning

https://insights.sei.cmu.edu/sei_blog/2018/11/translating-between-statistics-and-machine-learning.html

Statistics term, machine-learning term, and notes:

  • Statistics: data point, record, row of data. ML: example, instance. Notes: both domains also use "observation", which can refer to a single measurement or an entire vector of attributes depending on context.
  • Statistics: response variable, dependent variable. ML: label, output. Notes: both domains also use "target". Since practically all variables depend on other variables, the term "dependent variable" is potentially misleading.
  • Statistics: variable, covariate, predictor, independent variable. ML: feature, side information, input. Notes: the term "independent variable" exists for historical reasons but is usually misleading; such a variable typically depends on other variables in the model.
  • Statistics: regressions. ML: supervised learners, machines. Notes: both estimate output(s) in terms of input(s).
  • Statistics: estimation. ML: learning. Notes: both translate data into quantitative claims, becoming more accurate as the supply of relevant data increases.
  • Statistics: hypothesis. ML: hypothesis (not the same concept). Notes: in both statistics and ML, a hypothesis is a scientific statement to be scrutinized, such as "The true value of this parameter is zero." In ML (but not in statistics), a hypothesis can also refer to the prediction rule that is output by a classifier algorithm.
  • Statistics: bias; regression intercept (two distinct concepts). ML: bias. Notes: statistics distinguishes between (a) bias as a form of estimation error and (b) the default prediction of a linear model in the special case where all inputs are 0. ML sometimes uses "bias" for both of these concepts, although the best ML researchers certainly understand the difference.
  • Statistics: maximize the likelihood to estimate model parameters. ML: if the target distribution is discrete (as in logistic regression), minimize the entropy to derive the best parameters; if it is continuous, just maximize the likelihood. Notes: for discrete distributions, maximizing the likelihood is equivalent to minimizing the entropy.
  • Statistics: apply Occam's razor, or encode missing prior information with suitably uninformative priors. ML: apply the principle of maximum entropy. Notes: the principle of maximum entropy is conceptual and does not refer to maximizing a concrete objective function; the principle is that models should be conservative in the sense of being no more confident in their predictions than is thoroughly justified by the data. In practice this works out as deriving an estimation procedure from a bare-minimum set of criteria, as exemplified here or here.
  • Statistics: logistic/multinomial regression. ML: maximum entropy, MaxEnt. Notes: they are equivalent except in special multinomial settings like ordinal logistic regression. "Maximum entropy" here refers to the principle of maximum entropy, not the form of the objective function; indeed, in MaxEnt you minimize rather than maximize the entropy expression.
  • Statistics: X causes Y if surgical (or randomized controlled) manipulations in X are correlated with changes in Y. ML: X causes Y if it doesn't obviously not cause Y, e.g. if X precedes Y in time (or is at least contemporaneous). Notes: the statistics definition is more aligned with common-sense intuition than the ML one proposed here. In fairness, not all ML practitioners are so abusive of causation terminology, and some of the blame belongs with even earlier abuses such as Granger causality.
  • Statistics: structural equations model. ML: Bayesian network. Notes: these are nearly equivalent mathematically, although interpretations differ by use case, as discussed.
  • Statistics: sequential experimental design. ML: active learning, reinforcement learning, hyperparameter optimization. Notes: although these four subfields differ greatly in their standard use cases, they all address optimization via a sequence of queries/experiments.

Chapter 7: extended description of Nearest neighbors

Subsection 7.3.1 Nearest neighbors

  1. Are there any heuristics related to the number of neighbors?

  2. What is the $d_k$ distance? A normalized Euclidean distance?
    I thought that in the Gower distance, numerical and categorical variables are treated differently, but I'm not sure.

  3. If the Gower distance is the default measure, what are the alternatives? And why is it the default?
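Regarding point 2: here is my reading of the standard Gower distance as a minimal pure-Python sketch (not necessarily what ingredients implements; the variables, values and range are made up). Numerical variables contribute a range-normalized absolute difference, categorical variables a 0/1 mismatch, and the per-variable distances are averaged, so the two variable types are indeed treated differently:

```python
def gower_distance(a, b, num_ranges):
    """a, b: dicts of variable -> value. num_ranges: dict giving the
    observed range of each numerical variable; variables absent from
    it are treated as categorical."""
    parts = []
    for var in a:
        if var in num_ranges:
            # numerical: range-normalized absolute difference in [0, 1]
            parts.append(abs(a[var] - b[var]) / num_ranges[var])
        else:
            # categorical: 0 if equal, 1 otherwise
            parts.append(0.0 if a[var] == b[var] else 1.0)
    return sum(parts) / len(parts)

henry = {"age": 47, "class": "1st"}
other = {"age": 25, "class": "3rd"}
print(gower_distance(henry, other, num_ranges={"age": 74}))
```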

Chapter 4: Panel A of Figure 4.1

I really don't like the 3D plot in panel A of Figure 4.1. 😉
Currently the z-axis (model response) is nearly useless; I can't read the values.
And because of the 3D structure of the plot, I can't see what happens if class == 'victualling crew' or class == 'restaurant staff'.
How would it look as a heatmap?

Or, as we discussed during the seminar, the 3D plot could be interactive, or at least a GIF. But what about the paper version of the book?


Chapter 7: notation in Section 7.3.3 Residuals

You wrote:
[screenshot of the quoted formula]
It is inconsistent with notation introduced in Section 4.3.
The index of the observation should be a subscript, not a superscript.

You also wrote:
"In addition we can add examination of the local residuals that assess local model fit."
What examination?
I think that this section should be extended.

Figure 7: minor spelling errors

Hi,
this book is super interesting, so I just want to contribute by proofreading.
The text in Figure 7 contains two typos:
the dirstribution and the average of the predictions when fixing values of subseqeunt explanatory
Please correct:

  • distribution
  • subsequent

Chapter 7: Pros and cons

In section 7.5 (pros) it is mentioned that "The methodology can easily be extended to two or more variables." Could this idea be described in a bit more detail?

Chapter 7: misc

https://github.com/pbiecek/PM_VEE/blob/76c1580556f90b207b6f2d96fc9b7110fb0263bc/06-Ceteris-paribus-Profiles.Rmd#L172

Plot produces warnings. From what I see, "only_numerical" is not a plot.ceteris_paribus_explainer parameter anymore.

maybe it could be changed to:

plot(cp_titanic_rf, variables = c("class", "embarked"), variable_type = "categorical") +
  ggtitle("Ceteris Paribus Profiles", "For the random forest model and the Titanic dataset")

Also, there is a reference to only_numerical as a plot parameter here:
https://github.com/pbiecek/PM_VEE/blob/76c1580556f90b207b6f2d96fc9b7110fb0263bc/06-Ceteris-paribus-Profiles.Rmd#L169


Second thing:

https://github.com/pbiecek/PM_VEE/blob/76c1580556f90b207b6f2d96fc9b7110fb0263bc/06-Ceteris-paribus-Profiles.Rmd#L57
repeats
https://github.com/pbiecek/PM_VEE/blob/76c1580556f90b207b6f2d96fc9b7110fb0263bc/06-Ceteris-paribus-Profiles.Rmd#L59

toc link to chapter 4

Link to chapter 4 in table of contents seems to be broken.
Is:
.../dataSetsIntro.html
Should be:
.../DataSetsIntro.html

Introduction, section 1.5 typo

In line 90 of 01-Introduction.Rmd a space is missing, so the HTML output is:
"For example:\index{Model-agnostic approach}"

Chapter 7: mixing methodology and R code

In Subsection 7.3.1 "Nearest neighbors" you mention the R function select_neighbours().
I think that "Method" sections should be code-free.

There are "Code snippets for R" sections for code and for descriptions of R functions and parameters, especially since you also plan to provide sections with Python code.

Chapters 17-19 - minor typos

In Chapter 17 :

  1. The reference prints as plain text: https://github.com/pbiecek/PM_VEE/blob/785718809256d524d0aee244a595ce6c5b718764/17-Partial-dependency.Rmd#L46
  2. Here aggregate_profiles{ingredients} https://github.com/pbiecek/PM_VEE/blob/785718809256d524d0aee244a595ce6c5b718764/17-Partial-dependency.Rmd#L70 could be written as ingredients::aggregate_profiles to be consistent with places like https://github.com/pbiecek/PM_VEE/blob/785718809256d524d0aee244a595ce6c5b718764/06-Ceteris-paribus-Profiles.Rmd#L116
  3. In the last link ingredients::show_rugs() is inconsistent with ingredients::show_observations and more

Items 1 and 2 recur throughout the rest of this chapter.

In Chapter 18 :

  1. $number.or.rooms$ => $number.of.rooms$ https://github.com/pbiecek/PM_VEE/blob/785718809256d524d0aee244a595ce6c5b718764/18-Conditional-dependency.Rmd#L7
  2. Reference prints "??"
    https://github.com/pbiecek/PM_VEE/blob/785718809256d524d0aee244a595ce6c5b718764/18-Conditional-dependency.Rmd#L58
  3. Figure 18.1 axis labels wrong? x_{1} and x_{-i} do not add up

More of 2. through the rest of this chapter.

In Chapter 19 :

  1. $number.or.rooms$ => $number.of.rooms$
    https://github.com/pbiecek/PM_VEE/blob/785718809256d524d0aee244a595ce6c5b718764/19-Accumulated-local-dependency.Rmd#L9

Which dataset (training/testing) should be used to build the explainer?

Hello @pbiecek,

Many thanks for your great work. Recently I'm learning how to use DALEX and DALEXtra packages to conduct XAI analysis. May I ask you a question about the explain() function?

When doing machine learning, we first split the whole dataset into training and testing data. Most of the tutorials I have seen use the testing data to build the explainer (for example this one). This is appropriate when we want to evaluate the model's overall performance.

However, if the analysis purpose is to investigate the relationship between the input features and the target variable (i.e., feature importance, feature effects, and feature interaction), should we use the training data to build the explainer?

Your kind guidance is much appreciated!

Best regards,
Xiaochi
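To make the question concrete: permutation-style variable importance (the idea behind DALEX's model_parts) is computed against whatever data and y are stored in the explainer, so the train/test choice directly changes the numbers. A minimal pure-Python sketch of that computation (an illustration with a made-up model, not the DALEX implementation):

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_features, seed=0):
    """X, y: whichever split (training or testing) the explainer holds."""
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    importance = {}
    for j in range(n_features):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        # importance = loss increase after permuting feature j
        importance[j] = mse(y, [predict(row) for row in X_perm]) - base
    return importance

# hypothetical model that uses only feature 0
predict = lambda row: 2 * row[0]
X = [[i, i % 3] for i in range(20)]
y = [2 * i for i in range(20)]
print(permutation_importance(predict, X, y, n_features=2))
```

On training data this describes what the model learned (including any overfit structure); on testing data it describes what generalizes, which is why the two can disagree.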

Grouped partial-dependence profiles do not work with dummy variables

Hi,
I was trying to use grouped partial-dependence profiles (the model_profile() function) in Python, but unfortunately it cannot handle floats/integers as grouping variables. The problem is that most RF/logit models cannot handle text data, so variables need to be encoded as integers (e.g., from the variable gender I create gender_female with values 0, 1). So when I try to apply your aggregation function, I keep getting this error:

sequence item 0: expected str instance, int found

Could you transform variables with fewer than 10 categories into strings, so that aggregated results are available for categorical variables? Or just allow integers if, for example, n_cat < 10?

Thanks!

PS; Cool library!
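Until such a change lands, the workaround requested above can be sketched in plain Python (the column names and threshold are illustrative; this helper is not part of the dalex API): cast integer-coded categorical columns to strings before calling model_profile().

```python
def stringify_low_cardinality(columns, max_cat=10):
    """columns: dict mapping column name -> list of values.
    Returns a copy where any column with fewer than max_cat unique
    values is cast to strings, so grouping code receives str instances."""
    out = {}
    for name, values in columns.items():
        if len(set(values)) < max_cat:
            out[name] = [str(v) for v in values]
        else:
            out[name] = list(values)
    return out

# dummy-coded column stays 0/1 integers in the raw data
data = {"gender_female": [0, 1, 1, 0], "fare": [7.25, 71.3, 8.05, 53.1]}
print(stringify_low_cardinality(data, max_cat=3))
```

The same idea applies to a pandas DataFrame by casting the selected columns with astype(str) before building the explainer.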

Chapter 4: extending 'Pros and cons' section

I think that the section with pros should be extended; currently the only highlighted advantage is ease of understanding. If that is the only advantage, why should we use CP, if at all?

When you point out the cons of 1D CP, it would be good to give examples for better understanding.
For example, when you mention "unrealistic settings", you could describe a situation where CP considers an apartment with 4 rooms and 20 m².
