oegedijk / explainerdashboard

Quickly build Explainable AI dashboards that show the inner workings of so-called "blackbox" machine learning models.

Home Page: http://explainerdashboard.readthedocs.io

License: MIT License

CSS 0.89% Python 99.11%
dash shap-values dashboard model-predictions data-scientists explainer interactive-dashboards permutation-importances shap plotly

explainerdashboard's Introduction


explainerdashboard

by: Oege Dijk

This package makes it convenient to quickly deploy a dashboard web app that explains the workings of a (scikit-learn compatible) machine learning model. The dashboard provides interactive plots on model performance, feature importances, feature contributions to individual predictions, "what if" analysis, partial dependence plots, SHAP (interaction) values, visualization of individual decision trees, etc.

You can also interactively explore components of the dashboard in a notebook/colab environment (or just launch a dashboard straight from there). Or design a dashboard with your own custom layout and explanations (thanks to the modular design of the library). And you can combine multiple dashboards into a single ExplainerHub.

Dashboards can be exported to static html directly from a running dashboard, or programmatically as an artifact as part of an automated CI/CD deployment process.

Examples deployed at: titanicexplainer.herokuapp.com, detailed documentation at explainerdashboard.readthedocs.io, example notebook on how to launch dashboard for different models here, and an example notebook on how to interact with the explainer object here.

Works with scikit-learn, xgboost, catboost, lightgbm, and skorch (sklearn wrapper for tabular PyTorch models) and others.

Installation

You can install the package through pip:

pip install explainerdashboard

or conda-forge:

conda install -c conda-forge explainerdashboard

Demonstration:


(for live demonstration see titanicexplainer.herokuapp.com)

Background

In a lot of organizations, especially governmental ones, but with the GDPR also increasingly in the private sector, it is becoming more and more important to be able to explain the inner workings of your machine learning algorithms. Customers have, to some extent, a right to an explanation of why they received a certain prediction, and more and more internal and external regulators require it. With recent innovations in explainable AI (e.g. SHAP values) the old black box trope is no longer valid, but it can still take quite a bit of data wrangling and plot manipulation to get the explanations out of a model. This library aims to make this easy.

The goals are manifold:

  • Make it easy for data scientists to quickly inspect the workings and performance of their model in a few lines of code
  • Make it possible for non-data-scientist stakeholders such as managers, directors, and internal and external watchdogs to interactively inspect the inner workings of the model without having to depend on a data scientist to generate every plot and table
  • Make it easy to build an application that explains individual predictions of your model for customers that ask for an explanation
  • Explain the inner workings of the model to the people working with it (human-in-the-loop) so that they gain an understanding of what the model does and doesn't do. This is important so that they can gain an intuition for when the model is likely missing information and may have to be overruled.

The library includes:

  • Shap values (i.e. what is the contribution of each feature to each individual prediction?)
  • Permutation importances (how much does the model metric deteriorate when you shuffle a feature?)
  • Partial dependence plots (how does the model prediction change when you vary a single feature?)
  • Shap interaction values (decompose the shap value into a direct effect and interaction effects)
  • For Random Forests and xgboost models: visualisation of individual decision trees
  • Plus for classifiers: precision plots, confusion matrix, ROC AUC plot, PR AUC plot, etc
  • For regression models: goodness-of-fit plots, residual plots, etc.
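
To make the first two items concrete, here is a minimal, dependency-free sketch of what they measure. It is not how explainerdashboard computes them: `toy_model`, `shapley_values` and `permutation_importance` are hypothetical names, "missing" features are assumed to be replaced by a fixed baseline row, and the exact Shapley computation enumerates every feature subset, which is only feasible for a handful of features.

```python
import itertools
import math
import random


def toy_model(row):
    """Hypothetical stand-in for a model prediction: uses features 0 and 1, ignores feature 2."""
    return 2 * row[0] + 3 * row[1]


def shapley_values(predict, row, baseline):
    """Exact Shapley values for one row, replacing absent features by the baseline values."""
    n = len(row)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                with_i = [row[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [row[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def permutation_importance(predict, X, y, col, seed=0):
    """Increase in mean absolute error after shuffling a single column."""
    def mae(rows):
        return sum(abs(predict(r) - t) for r, t in zip(rows, y)) / len(y)

    shuffled = [r[col] for r in X]
    random.Random(seed).shuffle(shuffled)
    X_shuffled = [r[:col] + [v] + r[col + 1:] for r, v in zip(X, shuffled)]
    return mae(X_shuffled) - mae(X)


row, baseline = [1, 2, 5], [0, 0, 0]
phi = shapley_values(toy_model, row, baseline)  # one contribution per feature

X = [[i, i % 3, 10 - i] for i in range(10)]
y = [toy_model(r) for r in X]  # the toy model is "perfect" on this data
importance_used = permutation_importance(toy_model, X, y, col=0)
importance_unused = permutation_importance(toy_model, X, y, col=2)
```

The contributions in `phi` add up to the difference between the prediction for `row` and the prediction for `baseline`, and the feature that the toy model ignores gets zero contribution and zero permutation importance.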

The library is designed to be modular, so it should be easy to design your own interactive dashboards with plotly dash. Most of the work of calculating and formatting data and rendering plots and tables is handled by explainerdashboard, so that you can focus on the layout and project-specific textual explanations (i.e. design the dashboard so that it is interpretable for business users in your organization, not just data scientists).

Alternatively, there is a built-in standard dashboard with pre-built tabs (that you can switch off individually)

Examples of use

Fitting a model, building the explainer object, building the dashboard, and then running it can be as simple as:

ExplainerDashboard(ClassifierExplainer(RandomForestClassifier().fit(X_train, y_train), X_test, y_test)).run()

Below is a multi-line example, adding a few extra parameters. You can group onehot encoded categorical variables together using the cats parameter. You can either pass a dict specifying a list of onehot cols per categorical feature, or if you encode using e.g. pd.get_dummies(df.Name, prefix=['Name']) (resulting in column names 'Name_Adam', 'Name_Bob') you can simply pass the prefix 'Name':

from sklearn.ensemble import RandomForestClassifier
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_survive, titanic_names

feature_descriptions = {
    "Sex": "Gender of passenger",
    "Gender": "Gender of passenger",
    "Deck": "The deck the passenger had their cabin on",
    "PassengerClass": "The class of the ticket: 1st, 2nd or 3rd class",
    "Fare": "The amount of money people paid", 
    "Embarked": "the port where the passenger boarded the Titanic. Either Southampton, Cherbourg or Queenstown",
    "Age": "Age of the passenger",
    "No_of_siblings_plus_spouses_on_board": "The sum of the number of siblings plus the number of spouses on board",
    "No_of_parents_plus_children_on_board" : "The sum of the number of parents plus the number of children on board",
}

X_train, y_train, X_test, y_test = titanic_survive()
train_names, test_names = titanic_names()
model = RandomForestClassifier(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)

explainer = ClassifierExplainer(model, X_test, y_test, 
                                cats=['Deck', 'Embarked',
                                    {'Gender': ['Sex_male', 'Sex_female', 'Sex_nan']}],
                                cats_notencoded={'Embarked': 'Stowaway'}, # defaults to 'NOT_ENCODED'
                                descriptions=feature_descriptions, # adds a table and hover labels to dashboard
                                labels=['Not survived', 'Survived'], # defaults to ['0', '1', etc]
                                idxs = test_names, # defaults to X.index
                                index_name = "Passenger", # defaults to X.index.name
                                target = "Survival", # defaults to y.name
                                )

db = ExplainerDashboard(explainer, 
                        title="Titanic Explainer", # defaults to "Model Explainer"
                        shap_interaction=False, # you can switch off tabs with bools
                        )
db.run(port=8050)
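
As an illustration of what grouping by prefix means, the sketch below maps onehot columns back to a single categorical value. It is only a conceptual sketch, not the library's implementation: `group_onehot_columns` and `decode_category` are hypothetical helpers, and the fallback label mirrors the 'NOT_ENCODED' default mentioned for cats_notencoded above.

```python
def group_onehot_columns(columns, prefixes):
    """Map each categorical prefix to the onehot columns named '<prefix>_<value>'."""
    return {p: [c for c in columns if c.startswith(p + "_")] for p in prefixes}


def decode_category(row, onehot_cols, prefix, not_encoded="NOT_ENCODED"):
    """Return the category whose onehot column is 1, or a fallback label if none is set."""
    for col in onehot_cols:
        if row.get(col) == 1:
            return col[len(prefix) + 1:]
    return not_encoded


columns = ["Age", "Name_Adam", "Name_Bob"]
groups = group_onehot_columns(columns, ["Name"])

encoded_bob = {"Age": 30, "Name_Adam": 0, "Name_Bob": 1}
all_zero = {"Age": 30, "Name_Adam": 0, "Name_Bob": 0}

name_bob = decode_category(encoded_bob, groups["Name"], "Name")
name_missing = decode_category(all_zero, groups["Name"], "Name")
```

A row in which every onehot column is 0 (for example because one level was dropped during encoding) has no recoverable category, which is the case the fallback label covers.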

For a regression model you can also pass the units of the target variable (e.g. dollars):

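from sklearn.ensemble import RandomForestRegressor
from explainerdashboard import RegressionExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_fare

X_train, y_train, X_test, y_test = titanic_fare()
model = RandomForestRegressor().fit(X_train, y_train)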

explainer = RegressionExplainer(model, X_test, y_test, 
                                cats=['Deck', 'Embarked', 'Sex'],
                                descriptions=feature_descriptions, 
                                units = "$", # defaults to ""
                                )

ExplainerDashboard(explainer).run()

y_test is actually optional, although some parts of the dashboard like performance metrics will obviously not be available: ExplainerDashboard(ClassifierExplainer(model, X_test)).run().

You can export a dashboard to static html with db.save_html('dashboard.html').

You can pass a specific index for the static dashboard to display:

ExplainerDashboard(explainer, index=0).save_html('dashboard.html')

or

ExplainerDashboard(explainer, index='Cumings, Mrs. John Bradley (Florence Briggs Thayer)').save_html('dashboard.html')

For a simplified single page dashboard try ExplainerDashboard(explainer, simple=True).


ExplainerHub

You can combine multiple dashboards and host them in a single place using ExplainerHub:

db1 = ExplainerDashboard(explainer1, title="Classifier Explainer", 
         description="Model predicting survival on H.M.S. Titanic")
db2 = ExplainerDashboard(explainer2, title="Regression Explainer",
         description="Model predicting ticket price on H.M.S. Titanic")
hub = ExplainerHub([db1, db2])
hub.run()

You can adjust titles and descriptions, manage users and logins, store and load from config, manage the hub through a CLI and more. See the ExplainerHub documentation.


Dealing with slow calculations

Some of the calculations for the dashboard such as calculating SHAP (interaction) values and permutation importances can be slow for large datasets and complicated models. There are a few tricks to make this less painful:

  1. Switching off the interactions tab (shap_interaction=False) and disabling permutation importances (no_permutations=True). Especially SHAP interaction values can be very slow to calculate, and often are not needed for analysis. For permutation importances you can set the n_jobs parameter to speed up the calculation in parallel.
  2. Calculate approximate shap values. You can pass approximate=True as a shap parameter by passing shap_kwargs=dict(approximate=True) to the explainer initialization.
  3. Storing the explainer. The calculated properties are only calculated once for each instance, however each time when you instantiate a new explainer instance they will have to be recalculated. You can store them with explainer.dump("explainer.joblib") and load with e.g. ClassifierExplainer.from_file("explainer.joblib"). All calculated properties are stored along with the explainer.
  4. Using a smaller (test) dataset, or using smaller decision trees. TreeShap computational complexity is O(TLD^2), where T is the number of trees, L is the maximum number of leaves in any tree and D the maximal depth of any tree. So reducing the number of leaves or average depth in the decision tree can really speed up SHAP calculations.
  5. Pre-computing shap values. Perhaps you already have calculated the shap values somewhere, or you can calculate them off on a giant cluster somewhere, or your model supports GPU generated shap values. You can simply add these pre-calculated shap values to the explainer with explainer.set_shap_values() and explainer.set_shap_interaction_values() methods.
  6. Plotting only a random sample of points. When you have a lot of observations, simply rendering the plots may get slow as well. You can pass the plot_sample parameter to render a (different each time) random sample of observations for the various scatter plots in the dashboard. E.g.: ExplainerDashboard(explainer, plot_sample=1000).run()
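
To get a feel for the O(TLD^2) scaling mentioned in point 4, the following back-of-the-envelope sketch compares two hypothetical forests. The function name is made up, the result is a unitless relative cost that ignores constant factors, and the leaf counts assume fully grown binary trees (at most 2^D leaves at depth D).

```python
def treeshap_relative_cost(n_trees, max_leaves, max_depth):
    """Relative TreeShap cost T * L * D^2 per explained row (constants ignored)."""
    return n_trees * max_leaves * max_depth ** 2


# Same number of trees, but depth 5 versus depth 10
shallow = treeshap_relative_cost(n_trees=50, max_leaves=2 ** 5, max_depth=5)
deep = treeshap_relative_cost(n_trees=50, max_leaves=2 ** 10, max_depth=10)
ratio = deep / shallow
print(ratio)  # 128.0
```

Under these assumptions, doubling the maximum depth makes the SHAP calculation roughly two orders of magnitude more expensive, which is why limiting tree depth pays off so quickly.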

Launching from within a notebook

When working inside Jupyter or Google Colab you can use ExplainerDashboard(mode='inline'), ExplainerDashboard(mode='external') or ExplainerDashboard(mode='jupyterlab') to run the dashboard inline in the notebook, or in a separate tab, while keeping the notebook interactive. (db.run(mode='inline') now also works.)

There is also a specific interface for quickly displaying interactive components inline in your notebook: InlineExplainer(). For example you can use InlineExplainer(explainer).shap.dependence() to display the shap dependence component interactively in your notebook output cell.

Command line tool

You can store explainers to disk with explainer.dump("explainer.joblib") and then run them from the command-line:

$ explainerdashboard run explainer.joblib

Or store the full configuration of a dashboard to .yaml with e.g. dashboard.to_yaml("dashboard.yaml", explainerfile="explainer.joblib", dump_explainer=True) and run it with:

$ explainerdashboard run dashboard.yaml

You can also build explainers from the command line with explainerdashboard build. See the explainerdashboard CLI documentation for details.

Customizing your dashboard

The dashboard is highly modular and customizable so that you can adjust it to your own needs and project.

Changing bootstrap theme

You can change the bootstrap theme by passing a link to the appropriate css file. You can use the convenient themes module of dash_bootstrap_components to generate the css url for you:

import dash_bootstrap_components as dbc

ExplainerDashboard(explainer, bootstrap=dbc.themes.FLATLY).run()

See the dbc themes documentation and the Bootswatch website for the different themes that are supported.

Switching off tabs

You can switch off individual tabs using boolean flags. This also makes sure that expensive calculations for that tab don't get executed:

ExplainerDashboard(explainer,
                    importances=False,
                    model_summary=True,
                    contributions=True,
                    whatif=True,
                    shap_dependence=True,
                    shap_interaction=False,
                    decision_trees=True)

Hiding components

You can also hide individual components on the various tabs:

    ExplainerDashboard(explainer, 
        # importances tab:
        hide_importances=True,
        # classification stats tab:
        hide_globalcutoff=True, hide_modelsummary=True, 
        hide_confusionmatrix=True, hide_precision=True, 
        hide_classification=True, hide_rocauc=True, 
        hide_prauc=True, hide_liftcurve=True, hide_cumprecision=True,
        # regression stats tab:
        # hide_modelsummary=True, 
        hide_predsvsactual=True, hide_residuals=True, 
        hide_regvscol=True,
        # individual predictions tab:
        hide_predindexselector=True, hide_predictionsummary=True,
        hide_contributiongraph=True, hide_pdp=True, 
        hide_contributiontable=True,
        # whatif tab:
        hide_whatifindexselector=True, hide_whatifprediction=True,
        hide_inputeditor=True, hide_whatifcontributiongraph=True, 
        hide_whatifcontributiontable=True, hide_whatifpdp=True,
        # shap dependence tab:
        hide_shapsummary=True, hide_shapdependence=True,
        # shap interactions tab:
        hide_interactionsummary=True, hide_interactiondependence=True,
        # decisiontrees tab:
        hide_treeindexselector=True, hide_treesgraph=True, 
        hide_treepathtable=True, hide_treepathgraph=True,
        ).run()

Hiding toggles and dropdowns inside components

You can also hide individual toggles and dropdowns using **kwargs. However they are not individually targeted, so if you pass hide_cats=True then the group cats toggle will be hidden on every component that has one:

ExplainerDashboard(explainer, 
                    no_permutations=True, # do not show or calculate permutation importances
                    hide_poweredby=True, # hide the poweredby:explainerdashboard footer
                    hide_popout=True, # hide the 'popout' button from each graph
                    hide_depth=True, # hide the depth (no of features) dropdown
                    hide_sort=True, # hide sort type dropdown in contributions graph/table
                    hide_orientation=True, # hide orientation dropdown in contributions graph/table
                    hide_type=True, # hide shap/permutation toggle on ImportancesComponent 
                    hide_dropna=True, # hide dropna toggle on pdp component
                    hide_sample=True, # hide sample size input on pdp component
                    hide_gridlines=True, # hide gridlines on pdp component
                    hide_gridpoints=True, # hide gridpoints input on pdp component
                    hide_cats_sort=True, # hide the sorting option for categorical features
                    hide_cutoff=True, # hide cutoff selector on classification components
                    hide_percentage=True, # hide percentage toggle on classification components
                    hide_log_x=True, # hide x-axis logs toggle on regression plots
                    hide_log_y=True, # hide y-axis logs toggle on regression plots
                    hide_ratio=True, # hide the residuals type dropdown
                    hide_points=True, # hide the show violin scatter markers toggle
                    hide_winsor=True, # hide the winsorize input
                    hide_wizard=True, # hide the wizard toggle in lift curve component
                    hide_range=True, # hide the range subscript on feature input
                    hide_star_explanation=True, # hide the '* indicates observed label' text
)

Setting default values

You can also set default values for the various dropdowns and toggles. All the components with their parameters can be found in the documentation. Some examples of useful parameters to pass:

ExplainerDashboard(explainer, 
                    higher_is_better=False, # flip green and red in contributions graph
                    n_input_cols=3, # divide feature inputs into 3 columns on what if tab
                    col='Fare', # initial feature in shap graphs
                    color_col='Age', # color feature in shap dependence graph
                    interact_col='Age', # interaction feature in shap interaction
                    depth=5, # only show top 5 features
                    sort = 'low-to-high', # sort features from lowest shap to highest in contributions graph/table
                    cats_topx=3, # show only the top 3 categories for categorical features
                    cats_sort='alphabet', # sort categorical features alphabetically
                    orientation='horizontal', # horizontal bars in contributions graph
                    index='Rugg, Miss. Emily', # initial index to display
                    pdp_col='Fare', # initial pdp feature
                    cutoff=0.8, # cutoff for classification plots
                    round=2, # rounding to apply to floats
                    show_metrics=['accuracy', 'f1', custom_metric], # only show certain metrics 
                    plot_sample=1000, # only display a 1000 random markers in scatter plots
                    )

Designing your own layout

All the components in the dashboard are modular and re-usable, which means that you can build your own custom dash dashboards around them.

By using the built-in ExplainerComponent class it is easy to build your own layouts, with just a bare minimum of knowledge of HTML and bootstrap. For example if you only wanted to display the ConfusionMatrixComponent and ShapContributionsGraphComponent, but hide a few toggles:

from explainerdashboard.custom import *

class CustomDashboard(ExplainerComponent):
    def __init__(self, explainer, name=None):
        super().__init__(explainer, title="Custom Dashboard")
        self.confusion = ConfusionMatrixComponent(explainer, name=self.name+"cm",
                            hide_selector=True, hide_percentage=True,
                            cutoff=0.75)
        self.contrib = ShapContributionsGraphComponent(explainer, name=self.name+"contrib",
                            hide_selector=True, hide_cats=True, 
                            hide_depth=True, hide_sort=True,
                            index='Rugg, Miss. Emily')
        
    def layout(self):
        return dbc.Container([
            dbc.Row([
                dbc.Col([
                    html.H1("Custom Demonstration:"),
                    html.H3("How to build your own layout using ExplainerComponents.")
                ])
            ]),
            dbc.Row([
                dbc.Col([
                    self.confusion.layout(),
                ]),
                dbc.Col([
                    self.contrib.layout(),
                ])
            ])
        ])

ExplainerDashboard(explainer, CustomDashboard, hide_header=True).run()

You can use this to define your own layouts, specifically tailored to your own model, project and needs. You can use the ExplainerComposites that are used for the tabs of the default dashboard as a starting point, and edit them to reorganize components, add text, etc. See the custom dashboard documentation for more details. A deployed custom dashboard can be found here (source code).

Deployment

If you wish to use e.g. gunicorn or waitress to deploy the dashboard you should add app = db.flask_server() to your code to expose the Flask server. You can then start the server with e.g. gunicorn dashboard:app (assuming the file you defined the dashboard in was called dashboard.py). See also the ExplainerDashboard section and the deployment section of the documentation.

It can be helpful to store your explainer and dashboard layout to disk, and then reload, e.g.:

generate_dashboard.py:

from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.custom import *

explainer = ClassifierExplainer(model, X_test, y_test)

# building an ExplainerDashboard ensures that all necessary properties 
# get calculated:
db = ExplainerDashboard(explainer, [ShapDependenceComposite, WhatIfComposite],
                        title='Awesome Dashboard', hide_whatifpdp=True)

# store both the explainer and the dashboard configuration:
db.to_yaml("dashboard.yaml", explainerfile="explainer.joblib", dump_explainer=True)

You can then reload it in dashboard.py:

from explainerdashboard import ClassifierExplainer, ExplainerDashboard

# you can override params during load from_config:
db = ExplainerDashboard.from_config("dashboard.yaml", title="Awesomer Title")

app = db.flask_server()

And then run it with:

    $ gunicorn dashboard:app

or with waitress (also works on Windows):

    $ waitress-serve dashboard:app

Minimizing memory usage

When you deploy a dashboard with a dataset with a large number of rows (n) and columns (m), the memory usage of the dashboard can be substantial. You can check the (approximate) memory usage with explainer.memory_usage(). (as a side note: if you have lots of rows, you probably want to set the plot_sample parameter as well)
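
A rough estimate shows how the size of the stored arrays scales with n and m. This is only a sketch: `shap_array_bytes` is a hypothetical helper, it assumes a single model output (a multiclass classifier would multiply the result by the number of classes), and it counts raw array values only, so explainer.memory_usage() remains the reference for actual numbers.

```python
def shap_array_bytes(n_rows, n_cols, bytes_per_value=8, interactions=False):
    """Raw size of an (n*m) shap array, or an (n*m*m) shap interaction array, in bytes."""
    n_values = n_rows * n_cols * (n_cols if interactions else 1)
    return n_values * bytes_per_value


n, m = 100_000, 50
shap_float64 = shap_array_bytes(n, m)                                        # 8 bytes per float64
inter_float64 = shap_array_bytes(n, m, interactions=True)
inter_float32 = shap_array_bytes(n, m, bytes_per_value=4, interactions=True)  # 4 bytes per float32

print(shap_float64 / 1e6, "MB")   # 40.0 MB
print(inter_float64 / 1e9, "GB")  # 2.0 GB
print(inter_float32 / 1e9, "GB")  # 1.0 GB
```

With these example sizes the interaction values are m = 50 times larger than the plain shap values, so they dominate the memory footprint.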

In order to reduce the memory footprint there are a number of things you can do:

  1. Not including shap interaction tab: shap interaction values have shape (n*m*m), so they can take up a substantial amount of memory.

  2. Setting a lower precision. By default shap values are stored as 'float64', but you can store them as 'float32' instead and save half the space: ClassifierExplainer(model, X_test, y_test, precision='float32'). You can also set a lower precision on your X_test dataset yourself of course.

  3. For multiclass classifiers, by default ClassifierExplainer calculates shap values for all classes. If you're only interested in a single class you can drop the other shap values: explainer.keep_shap_pos_label_only(pos_label)

  4. Storing data externally. You can for example only store a subset of 10,000 rows in the explainer itself (enough to generate importance and dependence plots), and store the rest of your millions of rows of input data in an external file or database:

    • with explainer.set_X_row_func() you can set a function that takes an index as argument and returns a single row dataframe with model compatible input data for that index. This function can include a query to a database or fileread.
    • with explainer.set_y_func() you can set a function that takes an index as argument and returns the observed outcome y for that index.
    • with explainer.set_index_list_func() you can set a function that returns a list of available indexes that can be queried. Only gets called upon start of the dashboard.

    If you have a very large number of indexes and the user is able to look them up elsewhere, you can also replace the index dropdowns with a simple free text field with index_dropdown=False. Only valid indexes (i.e. in the get_index_list() list) get propagated to other components by default, but this can be overridden with index_check=False. Instead of an index_list_func you can also set an explainer.set_index_check_func(func) which should return a bool indicating whether the index exists or not.

    Important: these functions can be called multiple times by multiple independent components, so it is probably best to implement some kind of caching functionality. The functions you pass can also be methods, so you have access to all of the internals of the explainer.
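
One simple way to add such caching is functools.lru_cache from the standard library. The sketch below is an assumption-laden illustration, not library code: the in-memory SQLite table, its column names, the index values and the helper `fetch_row` are all made up, and the helper returns a plain dict where a real row function would return a single-row dataframe.

```python
import sqlite3
from functools import lru_cache

# Hypothetical external store, filled with made-up example rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rows (idx TEXT PRIMARY KEY, age REAL, fare REAL)")
conn.executemany(
    "INSERT INTO rows VALUES (?, ?, ?)",
    [("idx_1", 21.0, 10.5), ("idx_2", 40.0, 7.25)],
)

query_count = 0  # counts how often the database is actually hit


@lru_cache(maxsize=1024)
def fetch_row(index):
    """Fetch the features for one index; repeated lookups are served from the cache."""
    global query_count
    query_count += 1
    row = conn.execute(
        "SELECT age, fare FROM rows WHERE idx = ?", (index,)
    ).fetchone()
    if row is None:
        raise KeyError(index)
    return {"Age": row[0], "Fare": row[1]}


first = fetch_row("idx_1")   # queries the database
second = fetch_row("idx_1")  # served from the cache, no second query
```

Because lru_cache hands back the same cached object on every hit, a wrapper that builds a fresh single-row dataframe from the cached dict keeps callers from accidentally mutating the cached value.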

Documentation

Documentation can be found at explainerdashboard.readthedocs.io.

Example notebook on how to launch dashboards for different model types here: dashboard_examples.ipynb.

Example notebook on how to interact with the explainer object here: explainer_examples.ipynb.

Example notebook on how to design a custom dashboard: custom_examples.ipynb.

Deployed example:

You can find an example dashboard at titanicexplainer.herokuapp.com

(source code at https://github.com/oegedijk/explainingtitanic)

Citation:

A DOI can be found at Zenodo.

explainerdashboard's People

Contributors

absynthe, achimgaedke, brandonserna, haizadtarik, hugocool, jenoovchi, mekomlusa, oegedijk, oegesam, rajgupt, raybellwaves, sa-so, salomonj11, simon-free, tunayokumus, woochan-jang, yanhong-zhao-ef


explainerdashboard's Issues

Understanding depth/topx in ShapContributionsGraphComponent

Depth (resp. topx) is supposed to show the top contributing variables only. However, in my example, only 9 variables are relevant (i.e. abs(SHAP)>0 in total), so that if I choose depth >= 10 every variable is displayed. Is this intended? Shouldn't it show depth variables anyway, e.g. randomly choosing among the variables tying for the last place?

Plot vs. feature empty

(Tab "model_summary" in vanilla dashboard)
I'm using a non-special self-made dataset so I'm not sure why it's broken. Here is the error:

Exception on /_dash-update-component [POST]
Traceback (most recent call last):
  File "c:\users\...\venv\lib\site-packages\flask\app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "c:\users\...\venv\lib\site-packages\flask\app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "c:\users\...\venv\lib\site-packages\flask\app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "c:\users\...\venv\lib\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "c:\users\...\venv\lib\site-packages\flask\app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "c:\users\...\venv\lib\site-packages\flask\app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "c:\users\...\venv\lib\site-packages\dash\dash.py", line 1076, in dispatch
    response.set_data(func(*args, outputs_list=outputs_list))
  File "c:\...\venv\lib\site-packages\dash\dash.py", line 1007, in add_context
    output_value = func(*args, **kwargs)  # %% callback invoked %%
  File "c:\...\venv\lib\site-packages\explainerdashboard\dashboard_components\regression_components.py", line 370, in update_residuals_graph
    winsor=winsor, dropna=True)
  File "c:\users\...\venv\lib\site-packages\explainerdashboard\explainers.py", line 2551, in plot_residuals_vs_feature
    self.y[na_mask], self.preds[na_mask], col_vals[na_mask],
  File "c:\users\...\venv\lib\site-packages\pandas\core\series.py", line 902, in __getitem__
    key = check_bool_indexer(self.index, key)
  File "c:\users\...\venv\lib\site-packages\pandas\core\indexing.py", line 2183, in check_bool_indexer
    "Unalignable boolean Series provided as "
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Including feature dropped in One-Hot-Encoding to cats / FeatureInputComponent

I just realized that there is a parameter cats - grouping one-hot-encoded variables back together is a thing I'm missing in Python since forever, great stuff! In particular, it simplifies the otherwise huge FeatureInputComponent.

However, I'm usually dropping one level when encoding, which is of course missing in the respective dropdown menu in FeatureInputComponent. Is it possible to include the dropped level so that it can be chosen in FeatureInputComponent as well?

Showing observed label in ClassifierPredictionSummaryComponent

Hi,

I am using feature_input_component to link ClassifierPredictionSummaryComponent to FeatureInputComponent. Changing the index in FeatureInputComponent correctly changes the table and pie chart in ClassifierPredictionSummaryComponent. However, the observed label * indicator is not working in this case.

Thanks,

Addition: A SimplifiedClassifierDashboard

The default dashboard can be very overwhelming with lots of tabs, toggles and dropdowns. It would be nice to offer a simplified version. This can be built as a custom ExplainerComponent and included in custom, so that you could e.g.:

from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.custom import SimplifiedClassifierDashboard

explainer = ClassifierExplainer(model, X, y)
ExplainerDashboard(explainer, SimplifiedClassifierDashboard).run()

It should probably include at least:

  • Confusion matrix + one other model quality indicator
  • Shap importances
  • Shap dependence
  • Shap contributions graph

And ideally would add in some dash_bootstrap_components sugar to make it look extra nice, plus perhaps some extra information on how to interpret the various graphs.

ModuleNotFoundError: No module named 'numba.serialize'

What is the correct version of the numba package for running ClassifierExplainer.from_file? When I try to run the code below, I get the following message: ModuleNotFoundError: No module named 'numba.serialize'. My current version of numba is 0.52.0.

attempted code:
from flask import Flask
from explainerdashboard import ClassifierExplainer, ExplainerDashboard

app = Flask(name)

explainer = ClassifierExplainer.from_file("explainer.joblib")

db = ExplainerDashboard(explainer, server=app, url_base_pathname="/dashboard/")

@app.route('/dashboard')
def return_dashboard():
    return db.app.index()

app.run()

RegressionExplainer.from_file Q

I believe there is a typo in the release notes

https://github.com/oegedijk/explainerdashboard/blob/93a798995d1910774d6423c47c02013c0e51d862/RELEASE_NOTES.md#new-features-7

explainer.from_file() should be BaseExplainer.from_file()

Also, I'm not sure whether this command is in the docs anywhere. Ah, I think I found it:

https://github.com/oegedijk/explainerdashboard/blob/22972df6c7bd0edc69838578235e11048df425a5/docs/source/cli.rst#explainerfrom_file

I know it would be breaking changes but any reason it was named "from_file" and not "load"?

Feature Request: Option to rename "index"

I imagine the term "index" is confusing to the lay person.

Could the term "index" be changed by the user? e.g. for the titanic example "index" could be changed to "person".

image

Temporary failure in name resolution - linux

Tested on my linux machine

Installation:

$ conda create -n test_env python=3.8
$ conda activate test_env
$ pip install explainerdashboard

and tested this https://explainerdashboard.readthedocs.io/en/latest/index.html#a-more-extended-example

This may be outside of explainerdashboard

>>> from sklearn.ensemble import RandomForestClassifier
>>> 
>>> from explainerdashboard import ClassifierExplainer, ExplainerDashboard
>>> from explainerdashboard.datasets import titanic_survive
>>> 
>>> X_train, y_train, X_test, y_test = titanic_survive()
>>> 
>>> model = RandomForestClassifier(n_estimators=50, max_depth=5)
>>> model.fit(X_train, y_train)
RandomForestClassifier(max_depth=5, n_estimators=50)
>>> 
>>> explainer = ClassifierExplainer(
...                 model, X_test, y_test,
...                 # optional:
...                 cats=['Sex', 'Deck', 'Embarked'],
...                 labels=['Not survived', 'Survived'])
Note: shap=='guess' so guessing for RandomForestClassifier shap='tree'...
Note: model_output=='probability', so assuming that raw shap output of RandomForestClassifier is in probability space...
Generating self.shap_explainer = shap.TreeExplainer(model)
Detected RandomForestClassifier model: Changing class type to RandomForestClassifierExplainer...
>>> 
>>> db = ExplainerDashboard(explainer, title="Titanic Explainer",
...                     whatif=False, # you can switch off tabs with bools
...                     shap_interaction=False,
...                     decision_trees=False)
Building ExplainerDashboard..
Generating layout...
Calculating shap values...
Calculating dependencies...
Calculating permutation importances (if slow, try setting n_jobs parameter)...
Calculating categorical permutation importances (if slow, try setting n_jobs parameter)...
Calculating prediction probabilities...
Calculating predictions...
Calculating pred_percentiles...
Registering callbacks...
>>> db.run(port=8051)
Starting ExplainerDashboard on http://localhost:8051
Dash is running on http://x86_64-conda_cos6-linux-gnu:8051/

 * Serving Flask app "explainerdashboard.dashboards" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ray/local/bin/anaconda3/envs/test_env/lib/python3.8/site-packages/explainerdashboard/dashboards.py", line 674, in run
    self.app.run_server(port=port, **kwargs)
  File "/home/ray/local/bin/anaconda3/envs/test_env/lib/python3.8/site-packages/dash/dash.py", line 1716, in run_server
    self.server.run(host=host, port=port, debug=debug, **flask_run_options)
  File "/home/ray/local/bin/anaconda3/envs/test_env/lib/python3.8/site-packages/flask/app.py", line 990, in run
    run_simple(host, port, self, **options)
  File "/home/ray/local/bin/anaconda3/envs/test_env/lib/python3.8/site-packages/werkzeug/serving.py", line 1052, in run_simple
    inner()
  File "/home/ray/local/bin/anaconda3/envs/test_env/lib/python3.8/site-packages/werkzeug/serving.py", line 996, in inner
    srv = make_server(
  File "/home/ray/local/bin/anaconda3/envs/test_env/lib/python3.8/site-packages/werkzeug/serving.py", line 847, in make_server
    return ThreadedWSGIServer(
  File "/home/ray/local/bin/anaconda3/envs/test_env/lib/python3.8/site-packages/werkzeug/serving.py", line 740, in __init__
    HTTPServer.__init__(self, server_address, handler)
  File "/home/ray/local/bin/anaconda3/envs/test_env/lib/python3.8/socketserver.py", line 452, in __init__
    self.server_bind()
  File "/home/ray/local/bin/anaconda3/envs/test_env/lib/python3.8/http/server.py", line 138, in server_bind
    socketserver.TCPServer.server_bind(self)
  File "/home/ray/local/bin/anaconda3/envs/test_env/lib/python3.8/socketserver.py", line 466, in server_bind
    self.socket.bind(self.server_address)
socket.gaierror: [Errno -3] Temporary failure in name resolution
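
The log line "Dash is running on http://x86_64-conda_cos6-linux-gnu:8051/" suggests the server tried to bind to a host name taken from the environment (Dash reads a `HOST` environment variable when no host is passed, and some conda setups export one). The sketch below is stdlib-only and assumes that diagnosis. The helper name `pick_bind_host` and the fallback address are hypothetical.

```python
import os
import socket


def pick_bind_host(env=os.environ, resolver=socket.getaddrinfo, fallback="127.0.0.1"):
    """Return the HOST environment value if it resolves, else `fallback`."""
    host = env.get("HOST")
    if not host:
        return fallback
    try:
        resolver(host, None)
    except socket.gaierror:
        # Same error class as in the traceback above: the name cannot be resolved.
        return fallback
    return host


# An IP literal resolves without any DNS lookup.
print(pick_bind_host(env={"HOST": "127.0.0.1"}))
```

Since the traceback shows `run` forwarding extra keyword arguments to `run_server`, something like `db.run(port=8051, host=pick_bind_host())` may work. Unsetting `HOST` in the shell before starting Python is another option to try.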

Problem with ClassifierRandomIndexComponent

I am just recreating the custom Dashboard for Titanic for a Regression Model (same thing I wrote about in the last post).
However, using ClassifierRandomIndexComponent in class CustomPredictionsTab throws an error:

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in
----> 1 ExplainerDashboard(explainer, [CustomModelTab, CustomPredictionsTab], title='Stuff').run(port=8051)

c:\users...\venv\lib\site-packages\explainerdashboard\dashboards.py in __init__(self, explainer, tabs, title, hide_header, header_hide_title, header_hide_selector, block_selector_callbacks, pos_label, fluid, mode, width, height, external_stylesheets, server, url_base_pathname, responsive, logins, port, importances, model_summary, contributions, whatif, shap_dependence, shap_interaction, decision_trees, **kwargs)
414 block_selector_callbacks=block_selector_callbacks,
415 pos_label=pos_label,
--> 416 fluid=fluid, **kwargs)
417 else:
418 tabs = self._convert_str_tabs(tabs)

c:\users...\venv\lib\site-packages\explainerdashboard\dashboards.py in __init__(self, explainer, tabs, title, hide_title, hide_selector, block_selector_callbacks, pos_label, fluid, **kwargs)
110
111 self.selector = PosLabelSelector(explainer, pos_label=pos_label)
--> 112 self.tabs = [instantiate_component(tab, explainer, **kwargs) for tab in tabs]
113 assert len(self.tabs) > 0, 'When passing a list to tabs, need to pass at least one valid tab!'
114

c:\users...\venv\lib\site-packages\explainerdashboard\dashboards.py in <listcomp>(.0)
110
111 self.selector = PosLabelSelector(explainer, pos_label=pos_label)
--> 112 self.tabs = [instantiate_component(tab, explainer, **kwargs) for tab in tabs]
113 assert len(self.tabs) > 0, 'When passing a list to tabs, need to pass at least one valid tab!'
114

c:\users...\venv\lib\site-packages\explainerdashboard\dashboards.py in instantiate_component(component, explainer, **kwargs)
54
55 if inspect.isclass(component) and issubclass(component, ExplainerComponent):
---> 56 return component(explainer, **kwargs)
57 elif isinstance(component, ExplainerComponent):
58 return component

in __init__(self, explainer)
91 hide_slider=True, hide_labels=True,
92 hide_pred_or_perc=True,
---> 93 hide_selector=True, hide_button=False)
94
95 self.contributions = ShapContributionsGraphComponent(explainer,

c:\users...\venv\lib\site-packages\explainerdashboard\dashboard_components\connectors.py in __init__(self, explainer, title, name, hide_title, hide_index, hide_slider, hide_labels, hide_pred_or_perc, hide_selector, hide_button, pos_label, index, slider, labels, pred_or_perc, **kwargs)
65
66 if self.labels is None:
---> 67 self.labels = self.explainer.labels
68
69 if self.explainer.y_missing:

AttributeError: 'RegressionExplainer' object has no attribute 'labels'

I'm slightly confused about that since it should be the same Component as on the vanilla "What if..." tab, which is working.
Anyway, is there a way to add the "What if..." Tab without e.g. the PDP-plot to the custom dashboard?

option to change colors of waterfall plot

I understand shap plots use red for positive (up) and blue for negative (down).

From a lay person's point of view, they are probably used to seeing green for positive (up), red for negative (down) and blue for final.

It would be nice to have an option to ExplainerDashboard to do this. e.g. waterfall_plot_colors='rg'

image

Error due to check_additivity

Hi,

I am getting the following error message for the random forest model:
Exception: Additivity check failed in TreeExplainer! Please ensure the data matrix you passed to the explainer is the same shape that the model was trained on. If your data shape is correct then please report this on GitHub. Consider retrying with the feature_perturbation='interventional' option. This check failed because for one of the samples the sum of the SHAP values was 0.626204, while the model output was 0.710000. If this difference is acceptable you can set check_additivity=False to disable this check.

Do you have any suggestion to solve the issue?

Thanks,
Saman
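
For context, the additivity check compares each row's model output with the base (expected) value plus the sum of that row's SHAP values. The following is a minimal plain-Python sketch of that idea, not shap's internal code. The function names and the tolerance of 1e-2 are assumptions, and the first set of numbers is purely illustrative.

```python
def additivity_gap(base_value, shap_values, model_output):
    """Absolute difference between base value + SHAP contributions and the model output."""
    return abs(base_value + sum(shap_values) - model_output)


def passes_additivity(base_value, shap_values, model_output, tol=1e-2):
    """True when the explanation adds up to the prediction within `tol` (assumed tolerance)."""
    return additivity_gap(base_value, shap_values, model_output) <= tol


# Illustrative numbers only: a base value of 0.4 plus three feature contributions.
print(passes_additivity(0.4, [0.2, 0.1, 0.01], 0.71))

# The two totals quoted in the error message (0.626204 vs 0.710000) differ by
# about 0.084, which is far outside this tolerance.
print(passes_additivity(0.0, [0.626204], 0.71))
```

The error message itself names two things to try: feature_perturbation='interventional', or check_additivity=False if the gap is acceptable.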

how to hide partial dependence plots?

AFAIK the key hide_pdp should hide the pdp plot (from all components?)

When running

db = ExplainerDashboard(
    explainer,
    hide_pdp=True,
)

It doesn't seem to remove the pdp plots from any tabs (Individual Predictions, What if...)

denied access to dashboard from explainer hub

Code is similar to

from explainerdashboard import ExplainerDashboard, ExplainerHub
import os

db1 = ExplainerDashboard.from_config("config/dashboard_ds-team.yaml")
db2 = ExplainerDashboard.from_config("config/dashboard_business.yaml")

hub = ExplainerHub([db1, db2])
hub_file = "config/hub.yaml"
hub.to_yaml(hub_file)

Then

hub_file = "config/hub.yaml"
hub = ExplainerHub.from_config(hub_file)
hub.run()

When clicking on one dashboard I see the image below

In the logs I see

Registering callbacks...
 * Serving Flask app "explainerdashboard.dashboards" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
127.0.0.1 - - [22/Dec/2020 15:31:24] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /_dash-component-suites/dash_renderer/[email protected]_8_3m1608565601.14.0.min.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /_dash-component-suites/dash_renderer/[email protected]_8_3m1608565601.7.2.min.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /_dash-component-suites/dash_renderer/[email protected]_8_3m1608565601.8.7.min.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /assets/bootstrap.min.css?m=1608565640.6276863 HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /_dash-component-suites/dash_renderer/[email protected]_8_3m1608565601.14.0.min.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /_dash-component-suites/dash_html_components/dash_html_components.v1_1_1m1608565602.min.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /_dash-component-suites/dash_core_components/dash_core_components-shared.v1_14_1m1608565602.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /_dash-component-suites/dash_table/bundle.v4_11_1m1608565601.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /_dash-component-suites/dash_core_components/dash_core_components.v1_14_1m1608565602.min.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /_dash-component-suites/dash_bootstrap_components/_components/dash_bootstrap_components.v0_11_1m1608565637.min.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /_dash-component-suites/dash_renderer/dash_renderer.v1_8_3m1608565601.min.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /_dash-dependencies HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /_dash-layout HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:31:25] "GET /assets/favicon.ico?m=1608565640.628687 HTTP/1.1" 200 -
127.0.0.1 - - [22/Dec/2020 15:32:25] "GET /dashboard1 HTTP/1.1" 308 -
127.0.0.1 - - [22/Dec/2020 15:32:25] "GET /dashboard1/ HTTP/1.1" 403 -

image

hide pdp on What if... tab

Thanks for working on this (#41).

Just tested and the pdp plot remained in the What if... tab. Is there a way to remove it, similar to removing it from the Individual Predictions tab?

from explainerdashboard.datasets import (
    titanic_fare,
    titanic_names,
    feature_descriptions,
)
from sklearn.ensemble import RandomForestRegressor
from explainerdashboard import RegressionExplainer, ExplainerDashboard

X_train, y_train, X_test, y_test = titanic_fare()

model = RandomForestRegressor(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)

train_names, test_names = titanic_names()

explainer = RegressionExplainer(
    model,
    X_test,
    y_test,
    cats=["Sex", "Deck", "Embarked"],
    idxs=test_names,
    target="Fare",
    descriptions=feature_descriptions,
    units="$",
)

db = ExplainerDashboard(
    explainer,
    importances=False,
    model_summary=False,
    decision_trees=False,
    no_permutations=True,
    hide_depth=True,
    hide_pdp=True,
)
db.run()

Error with sklearn.ensemble.GradientBoostingClassifier

With a model using GradientBoostingClassifier, get an error

AssertionError: len(shap_explainer.expected_value)=1and len(labels)={len(self.labels)} do not match!

Code:

from explainerdashboard.explainers import *
from explainerdashboard.dashboards import *
from explainerdashboard.datasets import *
 
import plotly.io as pio
import os
 
# load classifier data
 
X_train, y_train, X_test, y_test = titanic_survive()
train_names, test_names = titanic_names()
 
# one-line example
 
#from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
 
 
#model = RandomForestClassifier(n_estimators=50, max_depth=5)
model = GradientBoostingClassifier(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)
 
explainer = ClassifierExplainer(model, X_test, y_test)
 
explainer.plot_shap_contributions(index=0)

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I keep getting this error for every kind of model I run. I even tried running the example Titanic dataset, and that threw the same error.
Is there an update in progress that is causing this? explainerdashboard was working fine until 2 days ago. I would appreciate a quick response. :)
This is really cool implementation and is very useful in my current work environment, thank you very much for working on this.

RandomIndex and FeatureInput not usable at the same time (anymore)

I updated to 0.2.13 and I'm really liking the cards, thanks for that! Actually, I wrapped my components into dbc.Cards before...

Reworking my WhatIfTab I found that RegressionRandomIndexComponent and FeatureInputComponent are not getting along. Is this intended? More precisely, if my class looks like

self.index = RegressionRandomIndexComponent(explainer,
                                            hide_title=False, hide_index=False,
                                            hide_slider=True, hide_labels=True,
                                            hide_pred_or_perc=True,
                                            hide_selector=True, hide_button=False)

self.input = FeatureInputComponent(explainer)

self.contributions = ShapContributionsGraphComponent(explainer, depth=7,
                                                     hide_title=False, hide_index=False,
                                                     hide_depth=True, hide_sort=True,
                                                     hide_orientation=True, hide_cats=True,
                                                     hide_selector=True,
                                                     feature_input_component=self.input)

self.connector = IndexConnector(self.index, [self.contributions])

the SHAP plot only refreshes after changing in the FeatureInputComponent. If I remove feature_input_component=self.input, it (obviously) only refreshes after hitting "Random Index".

Edit: This is probably connected with forcing hide_index=True when using FeatureInputComponent as well. Is this strictly necessary or only convenient for implementation reasons?

Question regarding deployment on Heroku

I just tried to deploy my app on Heroku by directly importing the github project.
However, I did not manage to "add the buildpack" correctly - I'm still generating a slug larger than 500MB. I did

  • add the folder bin from https://github.com/niteoweb/heroku-buildpack-shell.git to my project folder,
  • add the folder .heroku including the file run.sh containing "pip install -y xgboost".
    What am I doing wrong, do I have to add the buildpack somewhere in Heroku itself?

logins with one username and one password

Checking https://explainerdashboard.readthedocs.io/en/latest/deployment.html?highlight=auth#setting-logins-and-password

I was testing with one login and one password.

logins=["U", "P"] doesn't work (see below) but logins=[["U", "P"]] does.

I don't suppose there is a login kwarg? Or could it handle a list of length 2? It seems this is coming from dash_auth, so I could upstream it there.

  File "src/dashboard_cel.py", line 24, in <module>
    logins=["Celebrity", "Beyond"],
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\e\lib\site-packages\explainerdashboard\dashboards.py", line 369, in __init__
    self.auth = dash_auth.BasicAuth(self.app, logins)
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\e\lib\site-packages\dash_auth\basic_auth.py", line 11, in __init__
    else {k: v for k, v in username_password_list}
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\e\lib\site-packages\dash_auth\basic_auth.py", line 11, in <dictcomp>
    else {k: v for k, v in username_password_list}
ValueError: too many values to unpack (expected 2)
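
The traceback shows the logins being unpacked as (username, password) pairs, which is why a flat list of two strings fails. The sketch below shows how a wrapper could accept both shapes. `normalize_logins` is a hypothetical helper, not part of explainerdashboard or dash_auth.

```python
def normalize_logins(logins):
    """Accept ['user', 'password'] or [['user', 'password'], ...] and return a list of pairs."""
    if len(logins) == 2 and all(isinstance(item, str) for item in logins):
        # A flat [username, password] list: wrap it into a single pair.
        return [list(logins)]
    pairs = [list(pair) for pair in logins]
    for pair in pairs:
        if len(pair) != 2:
            raise ValueError(f"each login must be a [username, password] pair, got {pair!r}")
    return pairs


print(normalize_logins(["U", "P"]))
print(normalize_logins([["U", "P"], ["U2", "P2"]]))

# The dict comprehension from the traceback fails on a flat list because each
# string is unpacked character by character:
try:
    {k: v for k, v in ["Celebrity", "Beyond"]}
except ValueError as err:
    print(err)
```

One design caveat: a list of exactly two strings is always treated as a single login here, so the flat form stays ambiguous by nature and the nested form remains the safer input.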

Internal Server Error

First time trying the dashboard.

I have a small python file which looks as follows

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from explainerdashboard import RegressionExplainer, ExplainerDashboard

X_train = pd.read_pickle("X_train.pkl")
X_test = pd.read_pickle("X_test.pkl")
y_train = pd.read_pickle("y_train.pkl")
y_test = pd.read_pickle("y_test.pkl")

model = RandomForestRegressor(n_jobs=-1, random_state=42)

model = model.fit(X_train, y_train)

explainer = RegressionExplainer(model, X_test, y_test, shap="tree")
ExplainerDashboard(explainer).run()

Also, I'm on Windows and installed explainerdashboard using

conda create -n test_env python=3.8
conda activate test_env
pip install explainerdashboard

When running the code I get

(test_env) C:\Users\131416\python\modelling>python explainer_dashboard.py
Generating self.shap_explainer = shap.TreeExplainer(model)
Changing class type to RandomForestRegressionExplainer...
Building ExplainerDashboard..
Generating ShadowDecTree for each individual decision tree...
Generating layout...
Calculating shap values...
Calculating predictions...
Calculating residuals...
Calculating absolute residuals...
Calculating dependencies...
Calculating importances...
Calculating shap interaction values...
Registering callbacks...
Starting ExplainerDashboard on http://localhost:8050
Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "explainerdashboard.dashboards" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
Exception on / [GET]
Traceback (most recent call last):
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\test_env\lib\site-packages\flask\app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\test_env\lib\site-packages\flask\app.py", line 1945, in full_dispatch_request
    self.try_trigger_before_first_request_functions()
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\test_env\lib\site-packages\flask\app.py", line 1993, in try_trigger_before_first_request_functions
    func()
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\test_env\lib\site-packages\dash\dash.py", line 1093, in _setup_server
    _validate.validate_layout(self.layout, self._layout_value())
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\test_env\lib\site-packages\dash\_validate.py", line 348, in validate_layout
    raise exceptions.DuplicateIdError(
dash.exceptions.DuplicateIdError: Duplicate component id found in the initial layout: `whatif-SL-input-qHjwzsAtGC`
127.0.0.1 - - [11/Nov/2020 07:48:30] "GET / HTTP/1.1" 500 -
Exception on /favicon.ico [GET]
Traceback (most recent call last):
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\test_env\lib\site-packages\flask\app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\test_env\lib\site-packages\flask\app.py", line 1945, in full_dispatch_request
    self.try_trigger_before_first_request_functions()
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\test_env\lib\site-packages\flask\app.py", line 1993, in try_trigger_before_first_request_functions
    func()
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\test_env\lib\site-packages\dash\dash.py", line 1093, in _setup_server
    _validate.validate_layout(self.layout, self._layout_value())
  File "C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\test_env\lib\site-packages\dash\_validate.py", line 348, in validate_layout
    raise exceptions.DuplicateIdError(
dash.exceptions.DuplicateIdError: Duplicate component id found in the initial layout: `whatif-SL-input-qHjwzsAtGC`
127.0.0.1 - - [11/Nov/2020 07:48:30] "GET /favicon.ico HTTP/1.1" 500 -

Updating to 0.2.19 broke custom dashboard

After updating to the most recent version, my custom dashboard isn't working anymore, neither after "inplace construction" nor after loading from disk. More precisely:

test_explainer = RegressionExplainer(dec_tree, X_small, y_small, cats=['Day_of_week', 'Hour', 'Vehicle', 'Position'])
ExplainerDashboard(test_explainer).run()

works fine but

explainer = RegressionExplainer(dec_tree, X_small, y_small, cats=['Day_of_week', 'Hour', 'Vehicle', 'Position'])
db = ExplainerDashboard(explainer, [VecPos, CustomFeatImpTabv2, CustomWhatIfTabv2])

yields

Building ExplainerDashboard..
Detected notebook environment, consider setting mode='external', mode='inline' or mode='jupyterlab' to keep the notebook interactive while the dashboard is running...
Generating layout...
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-d2ec058da33d> in <module>
      1 db = ExplainerDashboard(explainer,
----> 2                         [VecPos, CustomFeatImpTabv2, CustomWhatIfTabv2],
      3                         # external_stylesheets=[FLATLY],
      4                         # title='Predictive Maintenance'
      5                        )

c:\users\hkoppen\...\site-packages\explainerdashboard\dashboards.py in __init__(self, explainer, tabs, title, name, description, hide_header, header_hide_title, header_hide_selector, hide_poweredby, block_selector_callbacks, pos_label, fluid, mode, width, height, bootstrap, external_stylesheets, server, url_base_pathname, responsive, logins, port, importances, model_summary, contributions, whatif, shap_dependence, shap_interaction, decision_trees, **kwargs)
    510                             block_selector_callbacks=self.block_selector_callbacks,
    511                             pos_label=self.pos_label,
--> 512                             fluid=fluid))
    513         else:
    514             tabs = self._convert_str_tabs(tabs)

c:\users\hkoppen\...\site-packages\explainerdashboard\dashboards.py in __init__(self, explainer, tabs, title, description, header_hide_title, header_hide_selector, hide_poweredby, block_selector_callbacks, pos_label, fluid, **kwargs)
    128 
    129         self.selector = PosLabelSelector(explainer, name="0", pos_label=pos_label)
--> 130         self.tabs  = [instantiate_component(tab, explainer, name=str(i+1), **kwargs) for i, tab in enumerate(tabs)]
    131         assert len(self.tabs) > 0, 'When passing a list to tabs, need to pass at least one valid tab!'
    132 

c:\users\hkoppen\...\site-packages\explainerdashboard\dashboards.py in <listcomp>(.0)
    128 
    129         self.selector = PosLabelSelector(explainer, name="0", pos_label=pos_label)
--> 130         self.tabs  = [instantiate_component(tab, explainer, name=str(i+1), **kwargs) for i, tab in enumerate(tabs)]
    131         assert len(self.tabs) > 0, 'When passing a list to tabs, need to pass at least one valid tab!'
    132 

c:\users\hkoppen\...\site-packages\explainerdashboard\dashboards.py in instantiate_component(component, explainer, name, **kwargs)
     64 
     65     if inspect.isclass(component) and issubclass(component, ExplainerComponent):
---> 66         component = component(explainer, name=name, **kwargs)
     67         return component
     68     elif isinstance(component, ExplainerComponent):

TypeError: __init__() got an unexpected keyword argument 'name'

Offtopic: I am going to deploy the dashboard using Docker in one or two weeks, hopefully I am able to give some feedback on the respective issue then.
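
The last frame of the traceback calls `component(explainer, name=name, **kwargs)`, so a custom tab whose `__init__` only takes `explainer` can no longer be instantiated. The sketch below reproduces this with stand-in classes. `BaseComponent`, `OldStyleTab`, `NewStyleTab` and `instantiate` are hypothetical and only mimic the call shape shown above; they are not the library's classes.

```python
class BaseComponent:
    """Stand-in for a dashboard component base class (hypothetical, not the library's)."""

    def __init__(self, explainer, title=None, name=None):
        self.explainer = explainer
        self.title = title
        self.name = name


class OldStyleTab(BaseComponent):
    # Signature without `name`: breaks once the caller starts passing name=...
    def __init__(self, explainer):
        super().__init__(explainer, title="Old")


class NewStyleTab(BaseComponent):
    # Accept `name` (and any other keyword arguments) and forward `name` to the base class.
    def __init__(self, explainer, title="New", name=None, **kwargs):
        super().__init__(explainer, title=title, name=name)


def instantiate(component_cls, explainer, name, **kwargs):
    """Mimics the call in the traceback: component(explainer, name=name, **kwargs)."""
    return component_cls(explainer, name=name, **kwargs)


tab = instantiate(NewStyleTab, explainer=object(), name="1")
print(tab.name)

try:
    instantiate(OldStyleTab, explainer=object(), name="1")
except TypeError as err:
    print(err)
```

Under that assumption, adding `name=None` (and ideally `**kwargs`) to each custom tab's `__init__` and passing `name` on to the parent constructor would be the change to try.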

to_yaml misses some kwargs e.g. no_permutations

I'm adding a few settings to the dashboard then saving as a yaml file

db = ExplainerDashboard(
    explainer,
    importances=False,
    model_summary=False,
    decision_trees=False,
    no_permutations=True,
    hide_depth=True,
    hide_pdp=True,
    title=title,
)
db.to_yaml("config/dashboard.yaml", explainerfile="config/explainer.joblib")

When loading the yaml, it is missing some of these keywords.

e.g. no_permutations, hide_depth, hide_pdp

I'm aware these are **kwargs passed on to the ExplainerComponents, but it would be nice if to_yaml captured them as well: https://explainerdashboard.readthedocs.io/en/latest/custom.html?highlight=no_permutations#passing-parameters-as-kwargs

Feature request: Decision Trees tab: adjust location of labels for decision trees

Again, loving this package.

Using

explainer = RegressionExplainer(model, X_test, y_test, cats=cats, shap="tree")
ExplainerDashboard(explainer).run()

There are a couple of instances where the y label is on top of the avg pred label (see below) on the decision trees tab. Would be good if somehow they could be slightly separated. In addition, perhaps y could be renamed to a friendlier common term 'observed'?

image

Request: Plotting parameters in WhatifComponent

As far as I can see, WhatifComponent includes ShapContributionsGraphComponent, but does not include plot styling parameters such as "sort" and "orientation". I think it makes sense to add these :-)

Remove units for R2

I believe R-squared is unitless.

When doing

explainer = RegressionExplainer(..., units="$")

I see the snapshot below on the Model Performance tab.

capture
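
A short worked example of why R-squared carries no unit: it is one minus the ratio of two sums of squares, so any rescaling of the target cancels out. The fare values below are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up fares and predictions in dollars.
fares_usd = [10.0, 20.0, 30.0, 40.0]
preds_usd = [12.0, 18.0, 33.0, 37.0]

# Converting both series to cents rescales numerator and denominator equally.
fares_cents = [100 * v for v in fares_usd]
preds_cents = [100 * v for v in preds_usd]

print(r_squared(fares_usd, preds_usd))
print(r_squared(fares_cents, preds_cents))
```

Both calls give the same value (up to floating-point rounding), unlike metrics such as mean absolute error, which scale with the unit.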

decision_trees plot fails for RandomForestRegressor with sklearn 0.24

With the just released sklearn 0.24, the .classes_ attribute has been deprecated for sklearn.tree.DecisionTreeRegressor,
resulting in the following error in plot_rf_trees(...):

>       if model.estimators_[0].classes_[0] is not None: #if classifier
E       AttributeError: 'DecisionTreeRegressor' object has no attribute 'classes_'
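
A defensive version of that check could look the attribute up with a default instead of accessing it directly. This is a sketch with stand-in classes rather than real scikit-learn estimators. `looks_like_classifier` and both fake classes are hypothetical.

```python
def looks_like_classifier(estimator):
    """True if the estimator exposes a non-None `classes_` attribute."""
    return getattr(estimator, "classes_", None) is not None


class FakeTreeClassifier:
    # Stand-in for a fitted classifier tree (hypothetical).
    classes_ = [0, 1]


class FakeTreeRegressor:
    # Stand-in for a fitted regressor tree that has no `classes_` attribute.
    pass


print(looks_like_classifier(FakeTreeClassifier()))
print(looks_like_classifier(FakeTreeRegressor()))
```

scikit-learn also ships `sklearn.base.is_classifier`, which may be the more direct way to tell the two estimator types apart.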

Failed to guess the type of shap explainer to use

First time using this code. Thanks.

When I try

explainer = RegressionExplainer(model, X_test, y_test)

I get

ValueError: Failed to guess the type of shap explainer to use. Please explicitly pass either shap='tree', 'linear', deep' or 'kernel'.

I'm not sure where to put this.

Addition: SimplifiedRegressionDashboard

The default dashboard can be very overwhelming with lots of tabs, toggles and dropdowns. It would be nice to offer a simplified version. This can be built as a custom ExplainerComponent and included in custom, so that you could e.g.:

from explainerdashboard import RegressionExplainer, ExplainerDashboard
from explainerdashboard.custom import SimplifiedRegressionDashboard

explainer = RegressionExplainer(model, X, y)
ExplainerDashboard(explainer, SimplifiedRegressionDashboard).run()

It should probably include at least:

  • Predicted vs actual plot
  • Shap importances
  • Shap dependence
  • Shap contributions graph

And ideally would add in some dash_bootstrap_components sugar to make it look extra nice, plus perhaps some extra information on how to interpret the various graphs.

Layout questions

I built (well, it's highly unfinished) a dashboard motivated by a use case in Predictive Maintenance (https://pm-dashboard-2020.herokuapp.com/). However, I wasn't able to align the (grey background of the) header of my cards (containing the description) with the header of the cards of the built-in plots. Did you set any global configuration there?
Something similar happened when I included self-built plotly plots - creating them in the main file resulted in a different layout compared with the creation in a separate notebook ...

Plus, I have two wishes regarding the feature input component:

  1. Fix the spacing between variable name and field resp. dropdown menu
  2. Allow more than two columns (and one as well I guess) such that the component isn't unnecessary large if I, for instance, remove the card containing the map

SHAPError with Explainer

Hi!

When trying to replicate the example dashboard, I get the following error:

Here is the code:

from explainerdashboard.explainers import ClassifierExplainer
from explainerdashboard.dashboards import ExplainerDashboard

from explainerdashboard.datasets import titanic_survive, titanic_names
from sklearn.ensemble import RandomForestClassifier

X_train, y_train, X_test, y_test = titanic_survive()
train_names, test_names = titanic_names()

model = RandomForestClassifier(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)

explainer = ClassifierExplainer(
                model, X_test, y_test,
                # optional:
                cats=['Sex', 'Deck', 'Embarked'],
                labels=['Not survived', 'Survived'])

ExplainerDashboard(explainer).run()

SHAPError: Additivity check failed in TreeExplainer! Please ensure the data matrix you passed to the explainer is the same shape that the model was trained on. If your data shape is correct then please report this on GitHub. Consider retrying with the feature_perturbation='interventional' option. This check failed because for one of the samples the sum of the SHAP values was 0.898516, while the model output was -3111090361071495795172390141309047352589574977119107048305877198218705179454949444899246221946253350348130845905799474606115208759423898693485753827793299777443235521296469068241830714822432908526887716104261809192590482006041690112.000000. If this difference is acceptable you can set check_additivity=False to disable this check.

Problem with deploying a custom dashboard using waitress

I am currently trying to deploy explainer dashboards using waitress. Whereas it's working fine using the vanilla regression dashboard, there is an error when running waitress-serve --call "dashboard:create_c_app" where dashboard.py is

from explainerdashboard import ExplainerDashboard
from custom_tabs import *

def create_c_app():
    db = ExplainerDashboard.from_config("dashboard.yaml")
    app = db.flask_server()
    return app

The error is

Traceback (most recent call last):
  File "C:\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\...\venv\Scripts\waitress-serve.exe\__main__.py", line 7, in <module>
  File "c:\...\venv\lib\site-packages\waitress\runner.py", line 280, in run
    app = app()
  File "C:\...\dashboard.py", line 10, in create_c_app
    db = ExplainerDashboard.from_config("dashboard.yaml")
  File "c:\...\venv\lib\site-packages\explainerdashboard\dashboards.py", line 483, in from_config
    tabs = cls._yamltabs_to_tabs(dashboard_params['tabs'], explainer)
  File "c:\...\venv\lib\site-packages\explainerdashboard\dashboards.py", line 611, in _yamltabs_to_tabs
    return [instantiate_tab(tab, explainer)  for tab in yamltabs]
  File "c:\...\venv\lib\site-packages\explainerdashboard\dashboards.py", line 611, in <listcomp>
    return [instantiate_tab(tab, explainer)  for tab in yamltabs]
  File "c:\...\venv\lib\site-packages\explainerdashboard\dashboards.py", line 600, in instantiate_tab
    tab_class = getattr(import_module(tab['module']), tab['name'])
AttributeError: module '__main__' has no attribute 'CustomDescriptiveTab'

The .yaml file looks like this:

[...]
    decision_trees: true
    tabs:
    - name: CustomDescriptiveTab
      module: __main__
      params: null
    - name: CustomFeatImpTab
      module: __main__
      params: null
    - name: CustomWhatIfTab
      module: __main__
      params: null

I don't understand why running via waitress is not working whereas loading the dashboard using ExplainerDashboard.from_config("dashboard.yaml") and running it locally works fine.
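
The traceback shows that each tab is resolved with getattr(import_module(tab['module']), tab['name']), and that under waitress-serve the entry point is waitress's own __main__.py rather than dashboard.py. The sketch below reproduces that lookup with the standard library only. It is an illustration, not the library's code: resolve_tab is a hypothetical helper, and the custom_tabs module is simulated in memory as a stand-in for the custom_tabs.py file imported in dashboard.py.

```python
import sys
import types
from importlib import import_module

# Hypothetical in-memory stand-in for custom_tabs.py, which defines the tab class.
custom_tabs = types.ModuleType("custom_tabs")
custom_tabs.CustomDescriptiveTab = type("CustomDescriptiveTab", (), {})
sys.modules["custom_tabs"] = custom_tabs


def resolve_tab(tab):
    """Same lookup as in the traceback: import tab['module'], then fetch tab['name']."""
    return getattr(import_module(tab["module"]), tab["name"])


# Naming the module that really defines the class works from any entry point.
found = resolve_tab({"name": "CustomDescriptiveTab", "module": "custom_tabs"})

# '__main__' is whatever script started the process. If that script does not
# define the class itself, the lookup fails with AttributeError.
try:
    resolve_tab({"name": "CustomDescriptiveTab", "module": "__main__"})
    main_lookup_ok = True
except AttributeError:
    main_lookup_ok = False

print(found.__name__, main_lookup_ok)
```

If this reading is right, pointing the module entries in dashboard.yaml at the module that defines the tabs (custom_tabs here), instead of __main__, should make the config independent of how the app is launched.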

Catboost classifier to work with ClassifierExplainer

Naively passing a CatBoost classifier to ClassifierExplainer leads to this error message:

Exception: Currently TreeExplainer can only handle models with categorical splits when feature_perturbation="tree_path_dependent" and no background data is passed. Please try again using shap.TreeExplainer(model, feature_perturbation="tree_path_dependent").

I dug into the source code of shap (https://github.com/slundberg/shap/blob/master/shap/explainers/_tree.py). Before calculating the SHAP values, they seem to do something to correct the conditional sampling, and CatBoost categorical variables are not supported there for now. So these lines in the explainer might need a bit of tweaking to yield something like this:
self._shap_explainer = shap.TreeExplainer(self.model)
so that the internals of the SHAP tree explainer handle CatBoost out of the box.

A few fixes come to mind:

  1. Expose model_output and feature_perturbation as inputs that users can set.
  2. Have a special CatBoost classifier class.

I will hotfix this issue in my fork and continue on. Maybe I will discover some new caveats that warrant a new class (for tree plotting perhaps).

Bug: RegressionRandomIndexComponent not robust

I have just managed to kill the whole dashboard because of an error with RegressionRandomIndexComponent: everything works fine without this component, but enabling it results in the dashboard only displaying "Error loading layout.". The traceback is below.

I fixed it by appending ".astype('float')" to the data which goes into RegressionExplainer.

Exception on /_dash-layout [GET]
Traceback (most recent call last):
  File "c:\...\lib\site-packages\flask\app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "c:\...\lib\site-packages\flask\app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "c:\...\lib\site-packages\flask\app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "c:\...\lib\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "c:\...\lib\site-packages\flask\app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "c:\...\lib\site-packages\flask\app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "c:\...\lib\site-packages\dash\dash.py", line 531, in serve_layout
    json.dumps(layout, cls=plotly.utils.PlotlyJSONEncoder),
  File "C:\lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "c:\...\lib\site-packages\_plotly_utils\utils.py", line 45, in encode
    encoded_o = super(PlotlyJSONEncoder, self).encode(o)
  File "C:\lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
TypeError: keys must be str, int, float, bool or None, not numpy.int64

(this shows multiple times)
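
The final TypeError comes from Python's json module, which only accepts dict keys of type str, int, float, bool or None. numpy.int64 is not a subclass of the built-in int, whereas numpy.float64 is a subclass of float, which may be why the astype('float') workaround helps. The sketch below shows the failure and one possible key conversion using only the standard library. FakeInt64 is a hypothetical stand-in for numpy.int64 and with_plain_keys is a hypothetical helper; neither is part of explainerdashboard.

```python
import json
import operator


class FakeInt64:
    """Hypothetical stand-in for numpy.int64: integer-like, but not an int subclass."""

    def __init__(self, value):
        self._value = int(value)

    def __index__(self):
        return self._value

    def __hash__(self):
        return hash(self._value)

    def __eq__(self, other):
        return isinstance(other, FakeInt64) and other._value == self._value


def with_plain_keys(mapping):
    """Return a copy of mapping whose integer-like keys are built-in ints."""
    plain = {}
    for key, value in mapping.items():
        if key is None or isinstance(key, (str, int, float, bool)):
            plain[key] = value
        else:
            # operator.index raises TypeError for keys that are not integer-like.
            plain[operator.index(key)] = value
    return plain


options = {FakeInt64(0): "first", FakeInt64(1): "second"}

try:
    json.dumps(options)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(json.dumps(with_plain_keys(options)))  # {"0": "first", "1": "second"}
```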

ImportError: cannot import name 'XGBExplainer' from 'explainerdashboard'

ImportError: cannot import name 'XGBExplainer' from 'explainerdashboard' (C:\Users\131416\AppData\Local\Continuum\anaconda3\envs\test_env\lib\site-packages\explainerdashboard\__init__.py)

I'll investigate more when I have time. I'm also on windows.

Edit:

This works: from explainerdashboard.explainers import XGBExplainer. However, RegressionExplainer can be imported from the top-level package (from explainerdashboard import RegressionExplainer), and I would expect the other explainers to be importable from the top level as well.

Coming from

https://github.com/oegedijk/explainerdashboard/blob/master/explainerdashboard/__init__.py#L1

Probably just a design thing (no right or wrong way) so closing.
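
The behaviour described in this issue follows from how Python packages work: a name can be imported from the top-level package only if the package's __init__.py re-exports it. The sketch below simulates this with in-memory modules. demo_pkg and its contents are hypothetical stand-ins and do not reflect the actual contents of explainerdashboard's __init__.py.

```python
import sys
import types

# Hypothetical submodule that defines both explainer classes.
explainers = types.ModuleType("demo_pkg.explainers")
explainers.RegressionExplainer = type("RegressionExplainer", (), {})
explainers.XGBExplainer = type("XGBExplainer", (), {})

# Hypothetical top-level package that re-exports only one of them.
demo_pkg = types.ModuleType("demo_pkg")
demo_pkg.explainers = explainers
demo_pkg.RegressionExplainer = explainers.RegressionExplainer

sys.modules["demo_pkg"] = demo_pkg
sys.modules["demo_pkg.explainers"] = explainers

# Both of these succeed.
from demo_pkg import RegressionExplainer
from demo_pkg.explainers import XGBExplainer

# This fails, because the top-level package never re-exported the name.
try:
    from demo_pkg import XGBExplainer as top_level_xgb
    top_level_import_ok = True
except ImportError:
    top_level_import_ok = False

print(top_level_import_ok)  # False
```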

Question: provide cats when data is already OHE?

I have a dataframe which has been one-hot-encoded.

The original data could be
NAME
Ray
Oege

Therefore, the data going in looks like
Ray Oege
1 0
0 1

I'm not sure how to pass this in to the explainer.
Somehow it needs to know that Ray and Oege were once associated with a column called NAME.

I like the 'group cats' button in the dashboard
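
The information the question asks about is a mapping from the original column name to its one-hot columns. The sketch below writes that mapping down in plain Python and uses it to rebuild the NAME column from the example data. onehot_groups and decode_onehot are hypothetical names, and the code does not call explainerdashboard. Whether the explainer's cats parameter accepts a mapping of this shape is an assumption to check against the documentation of the installed version.

```python
# Hypothetical mapping: original categorical column -> its one-hot columns.
onehot_groups = {"NAME": ["Ray", "Oege"]}

# The one-hot-encoded rows from the question.
rows = [
    {"Ray": 1, "Oege": 0},
    {"Ray": 0, "Oege": 1},
]


def decode_onehot(rows, groups):
    """Rebuild each original categorical column from its one-hot columns."""
    grouped = {col for cols in groups.values() for col in cols}
    decoded = []
    for row in rows:
        # Keep columns that do not belong to any one-hot group unchanged.
        out = {col: val for col, val in row.items() if col not in grouped}
        for original, cols in groups.items():
            # The active category is the one-hot column holding a 1
            # (None if no column in the group is set).
            out[original] = next((col for col in cols if row[col] == 1), None)
        decoded.append(out)
    return decoded


names = decode_onehot(rows, onehot_groups)
print(names)  # [{'NAME': 'Ray'}, {'NAME': 'Oege'}]
```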
