packtpublishing / interpretable-machine-learning-with-python

Interpretable Machine Learning with Python, published by Packt

License: MIT License

Jupyter Notebook 100.00%

interpretable-machine-learning-with-python's Introduction

Interpretable Machine Learning with Python

This is the code repository for Interpretable Machine Learning with Python, published by Packt.

Learn to build interpretable high-performance models with hands-on real-world examples

What is this book about?

Do you want to understand your models and mitigate the risks associated with poor predictions using practical machine learning (ML) interpretation? Interpretable Machine Learning with Python can help you overcome these challenges, using interpretation methods to build fairer and safer ML models.

This book covers the following exciting features:

  • Recognize the importance of interpretability in business
  • Study models that are intrinsically interpretable such as linear models, decision trees, and Naïve Bayes
  • Become well-versed in interpreting models with model-agnostic methods
  • Visualize how an image classifier works and what it learns
  • Understand how to mitigate the influence of bias in datasets

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders. For example, Chapter02.

The code will look like the following:

base_classifier = KerasClassifier(model=base_model,
                                  clip_values=(min_, max_))
y_test_mdsample_prob = np.max(y_test_prob[sampl_md_idxs], axis=1)
y_test_smsample_prob = np.max(y_test_prob[sampl_sm_idxs], axis=1)

Following is what you need for this book: This book is for data scientists, machine learning developers, and data stewards who have an increasingly critical responsibility to explain how the AI systems they develop work, their impact on decision making, and how they identify and manage bias. Working knowledge of machine learning and the Python programming language is expected.

With the following software and hardware list you can run all code files present in the book (Chapter 1-14).

Software and Hardware List

You can install the required software on any operating system by first installing Jupyter Notebook or JupyterLab with the most recent version of Python, or by installing Anaconda, which can install everything at once. While hardware requirements for Jupyter are relatively modest, we recommend a machine with at least 4 cores at 2 GHz and 8 GB of RAM.

As an alternative to installing the software locally, you can run the code in the cloud using Google Colab or another cloud notebook service.

Either way, the following packages are required to run the code in all the chapters (Google Colab has all the packages denoted with a ^):

| Chapter | Software required | OS required |
|---|---|---|
| 1 - 13 | ^ Python 3.6+ | Windows, Mac OS X, and Linux (Any) |
| 1 - 13 | ^ matplotlib 3.2.2+ | Windows, Mac OS X, and Linux (Any) |
| 1 - 13 | ^ scikit-learn 0.22.2+ | Windows, Mac OS X, and Linux (Any) |
| 1 - 12 | ^ pandas 1.1.5+ | Windows, Mac OS X, and Linux (Any) |
| 2 - 13 | machine-learning-datasets 0.01.16+ | Windows, Mac OS X, and Linux (Any) |
| 2 - 13 | ^ numpy 1.19.5+ | Windows, Mac OS X, and Linux (Any) |
| 3 - 13 | ^ seaborn 0.11.1+ | Windows, Mac OS X, and Linux (Any) |
| 3 - 13 | ^ tensorflow 2.4.1+ | Windows, Mac OS X, and Linux (Any) |
| 5 - 12 | shap 0.38.1+ | Windows, Mac OS X, and Linux (Any) |
| 1, 5, 10, 12 | ^ scipy 1.4.1+ | Windows, Mac OS X, and Linux (Any) |
| 5, 10-12 | ^ xgboost 0.90+ | Windows, Mac OS X, and Linux (Any) |
| 6, 11, 12 | ^ lightgbm 2.2.3+ | Windows, Mac OS X, and Linux (Any) |
| 7 - 9 | alibi 0.5.5+ | Windows, Mac OS X, and Linux (Any) |
| 10 - 13 | ^ tqdm 4.41.1+ | Windows, Mac OS X, and Linux (Any) |
| 2, 9 | ^ statsmodels 0.10.2+ | Windows, Mac OS X, and Linux (Any) |
| 3, 5 | rulefit 0.3.1+ | Windows, Mac OS X, and Linux (Any) |
| 6, 8 | lime 0.2.0.1+ | Windows, Mac OS X, and Linux (Any) |
| 7, 12 | catboost 0.24.4+ | Windows, Mac OS X, and Linux (Any) |
| 8, 9 | ^ Keras 2.4.3+ | Windows, Mac OS X, and Linux (Any) |
| 11, 12 | ^ pydot 1.3.0+ | Windows, Mac OS X, and Linux (Any) |
| 11, 12 | xai 0.0.4+ | Windows, Mac OS X, and Linux (Any) |
| 1 | ^ beautifulsoup4 4.6.3+ | Windows, Mac OS X, and Linux (Any) |
| 1 | ^ requests 2.23.0+ | Windows, Mac OS X, and Linux (Any) |
| 3 | cvae 0.0.3+ | Windows, Mac OS X, and Linux (Any) |
| 3 | interpret 0.2.2+ | Windows, Mac OS X, and Linux (Any) |
| 3 | ^ six 1.15.0+ | Windows, Mac OS X, and Linux (Any) |
| 3 | skope-rules 1.0.1+ | Windows, Mac OS X, and Linux (Any) |
| 4 | PDPbox 0.2.0+ | Windows, Mac OS X, and Linux (Any) |
| 4 | pycebox 0.0.1+ | Windows, Mac OS X, and Linux (Any) |
| 5 | alepython 0.1+ | Windows, Mac OS X, and Linux (Any) |
| 5 | tensorflow-docs 0.0.02+ | Windows, Mac OS X, and Linux (Any) |
| 6 | ^ nltk 3.2.5+ | Windows, Mac OS X, and Linux (Any) |
| 7 | witwidget 1.7.0+ | Windows, Mac OS X, and Linux (Any) |
| 8 | ^ opencv-python 4.1.2.30+ | Windows, Mac OS X, and Linux (Any) |
| 8 | ^ scikit-image 0.16.2+ | Windows, Mac OS X, and Linux (Any) |
| 8 | tf-explain 0.2.1+ | Windows, Mac OS X, and Linux (Any) |
| 8 | tf-keras-vis 0.5.5+ | Windows, Mac OS X, and Linux (Any) |
| 9 | SALib 1.3.12+ | Windows, Mac OS X, and Linux (Any) |
| 9 | distython 0.0.3+ | Windows, Mac OS X, and Linux (Any) |
| 10 | ^ mlxtend 0.14.0+ | Windows, Mac OS X, and Linux (Any) |
| 10 | sklearn-genetic 0.3.0+ | Windows, Mac OS X, and Linux (Any) |
| 11 | aif360==0.3.0 | Windows, Mac OS X, and Linux (Any) |
| 11 | BlackBoxAuditing==0.1.54 | Windows, Mac OS X, and Linux (Any) |
| 11 | dowhy 0.5.1+ | Windows, Mac OS X, and Linux (Any) |
| 11 | econml 0.9.0+ | Windows, Mac OS X, and Linux (Any) |
| 11 | ^ networkx 2.5+ | Windows, Mac OS X, and Linux (Any) |
| 12 | bayesian-optimization 1.2.0+ | Windows, Mac OS X, and Linux (Any) |
| 12 | ^ graphviz 0.10.1+ | Windows, Mac OS X, and Linux (Any) |
| 12 | tensorflow-lattice 2.0.7+ | Windows, Mac OS X, and Linux (Any) |
| 13 | adversarial-robustness-toolbox 1.5.0+ | Windows, Mac OS X, and Linux (Any) |

NOTE: the library machine-learning-datasets is the official name of what in the book is referred to as mldatasets. Due to naming conflicts, it had to be changed.
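To see whether an existing environment already satisfies the minimum versions listed above, a quick check along the following lines may help. This is only a sketch, not part of the book's code: the helper names (version_tuple, meets_minimum, report) are hypothetical, only three packages from the list are included as a sample, and it assumes Python 3.8+ so that importlib.metadata is available. Pre-release or post-release suffixes such as ".post1" are simply ignored.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '0.22.2.post1' into (0, 22, 2); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted versions numerically, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# A small sample of minimum versions copied from the list above.
MINIMUMS = {"scikit-learn": "0.22.2", "pandas": "1.1.5", "shap": "0.38.1"}


def report(minimums):
    """Print one status line per package: installed version, or 'not installed'."""
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed (need {minimum}+)")
            continue
        status = "ok" if meets_minimum(installed, minimum) else "too old"
        print(f"{name}: {installed} ({status}, need {minimum}+)")


report(MINIMUMS)
```

Extending MINIMUMS with the remaining rows gives a per-chapter checklist before opening a notebook.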

The exact versions of each library, as tested, can be found in the requirements.txt file and installed like this should you have a dedicated environment for them:

> pip install -r requirements.txt

You might get some conflicts specifically with libraries cvae, alepython, pdpbox and xai. If this is the case, try:

> pip install --no-deps -r requirements.txt

Alternatively, you can install libraries one chapter at a time inside of a local Jupyter environment using cells with !pip install or run all the code in Google Colab with the following links:

Remember to click the menu item "File > Save a copy in Drive" as soon as you open each link so that your notebook is saved as you run it. Notebooks denoted with a plus sign (+) are relatively compute-intensive and will take an extremely long time to run on Google Colab. If you must run them there, go to "Runtime > Change runtime type" and select "High-RAM" for the runtime shape. Otherwise, a better cloud environment or a local environment is preferable.

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Summary

The book does much more than explain technical topics, but here's a summary of the chapters:

Chapters topics

Related products

Get to Know the Authors

Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. Currently, he's a Climate and Agronomic Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a startup, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events. Whether it pertains to leisure activities, plant diseases, or customer lifetime value, Serg is passionate about providing the often-missing link between data and decision-making — and machine learning interpretation helps bridge this gap more robustly.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781800203907

interpretable-machine-learning-with-python's People

Contributors

eraseri, packt-itservice, packtutkarshr, roshank10, smasis001, sonam-packt

interpretable-machine-learning-with-python's Issues

Heights weights doesn't work

Bought book. Cloned this git. First cell after importing libraries is broken.

url = 'http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights'
page = requests.get(url)
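The report does not include the error message, so it may help to separate "the page could not be fetched" from "the page was fetched but parsing failed". The sketch below is one way to do that. Note that it swaps requests for the standard library's urllib.request so that it runs without extra packages; fetch_page and its opener parameter are hypothetical names introduced here, not part of the book's code.

```python
import urllib.request


def fetch_page(url, timeout=10, opener=urllib.request.urlopen):
    """Return the page body as text, or None (with a printed reason) on failure."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError as exc:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        print(f"Could not fetch {url}: {exc}")
        return None
```

Calling fetch_page(url) with the URL from the cell above returns None and prints the reason if the server is unreachable or answers with an HTTP error, which narrows down where the cell breaks.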

Chapter 03 - Flight delays

In Chapter 03 I have the following problem. The scikit-learn version I have installed is 0.22.2.post1.

I printed out the model name where it stops (logistic regression):

`for model_name in class_models.keys():
    print(model_name)
    fitted_model = class_models[model_name]['model'].fit(X_train, y_train_class)
    y_train_pred = fitted_model.predict(X_train.values)
    if model_name == 'ridge':
        y_test_pred = fitted_model.predict(X_test.values)
    else:
        y_test_prob = fitted_model.predict_proba(X_test.values)[:,1]
        y_test_pred = np.where(y_test_prob > 0.5, 1, 0)
    class_models[model_name]['fitted'] = fitted_model
    class_models[model_name]['probs'] = y_test_prob
    class_models[model_name]['preds'] = y_test_pred
    class_models[model_name]['Accuracy_train'] = metrics.accuracy_score(y_train_class, y_train_pred)
    class_models[model_name]['Accuracy_test'] = metrics.accuracy_score(y_test_class, y_test_pred)
    class_models[model_name]['Recall_train'] = metrics.recall_score(y_train_class, y_train_pred)
    class_models[model_name]['Recall_test'] = metrics.recall_score(y_test_class, y_test_pred)
    if model_name != 'ridge':
        class_models[model_name]['ROC_AUC_test'] = metrics.roc_auc_score(y_test_class, y_test_prob)
    else:
        class_models[model_name]['ROC_AUC_test'] = 0
    class_models[model_name]['F1_test'] = metrics.f1_score(y_test_class, y_test_pred)
    class_models[model_name]['MCC_test'] = metrics.matthews_corrcoef(y_test_class, y_test_pred)
logistic

AttributeError Traceback (most recent call last)
in
1 for model_name in class_models.keys():
2 print(model_name)
----> 3 fitted_model = class_models[model_name]['model'].fit(X_train, y_train_class)
4 y_train_pred = fitted_model.predict(X_train.values)
5 if model_name == 'ridge':

~\miniconda3\envs\tensorflow\lib\site-packages\sklearn\linear_model\_logistic.py in fit(self, X, y, sample_weight)
1589 else:
1590 prefer = 'processes'
-> 1591 fold_coefs_ = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
1592 **_joblib_parallel_args(prefer=prefer))(
1593 path_func(X, y, pos_class=class_, Cs=[C_],

~\miniconda3\envs\tensorflow\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1039 # remaining jobs.
1040 self._iterating = False
-> 1041 if self.dispatch_one_batch(iterator):
1042 self._iterating = self._original_iterator is not None
1043

~\miniconda3\envs\tensorflow\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
857 return False
858 else:
--> 859 self._dispatch(tasks)
860 return True
861

~\miniconda3\envs\tensorflow\lib\site-packages\joblib\parallel.py in _dispatch(self, batch)
775 with self._lock:
776 job_idx = len(self._jobs)
--> 777 job = self._backend.apply_async(batch, callback=cb)
778 # A job can complete so quickly than its callback is
779 # called before we get here, causing self._jobs to

~\miniconda3\envs\tensorflow\lib\site-packages\joblib\_parallel_backends.py in apply_async(self, func, callback)
206 def apply_async(self, func, callback=None):
207 """Schedule a func to be run"""
--> 208 result = ImmediateResult(func)
209 if callback:
210 callback(result)

~\miniconda3\envs\tensorflow\lib\site-packages\joblib\_parallel_backends.py in __init__(self, batch)
570 # Don't delay the application, to avoid keeping the input
571 # arguments in memory
--> 572 self.results = batch()
573
574 def get(self):

~\miniconda3\envs\tensorflow\lib\site-packages\joblib\parallel.py in __call__(self)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264

~\miniconda3\envs\tensorflow\lib\site-packages\joblib\parallel.py in <listcomp>(.0)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264

~\miniconda3\envs\tensorflow\lib\site-packages\sklearn\linear_model\_logistic.py in _logistic_regression_path(X, y, pos_class, Cs, fit_intercept, max_iter, tol, verbose, solver, coef, class_weight, dual, penalty, intercept_scaling, multi_class, random_state, check_input, max_squared_sum, sample_weight, l1_ratio)
936 options={"iprint": iprint, "gtol": tol, "maxiter": max_iter}
937 )
--> 938 n_iter_i = _check_optimize_result(
939 solver, opt_res, max_iter,
940 extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)

~\miniconda3\envs\tensorflow\lib\site-packages\sklearn\utils\optimize.py in _check_optimize_result(solver, result, max_iter, extra_warning_msg)
241 " https://scikit-learn.org/stable/modules/"
242 "preprocessing.html"
--> 243 ).format(solver, result.status, result.message.decode("latin1"))
244 if extra_warning_msg is not None:
245 warning_msg += "\n" + extra_warning_msg

AttributeError: 'str' object has no attribute 'decode'`
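The last frame of the traceback calls result.message.decode("latin1"), and the error says that message is already a str. A likely reading, offered here as an assumption rather than a confirmed diagnosis, is that the installed SciPy returns the optimizer message as text while this scikit-learn version still expects bytes. If so, aligning the two library versions (for example with the pins in requirements.txt) would be the practical fix. The snippet below only illustrates the defensive pattern that avoids this kind of failure. It is not scikit-learn's code, message_as_text is a hypothetical helper, and the message string is a made-up example.

```python
def message_as_text(message, encoding="latin1"):
    """Return an optimizer status message as str, whether it arrives as bytes or str."""
    if isinstance(message, bytes):
        return message.decode(encoding)
    return str(message)


# Both inputs yield the same text; the message content is illustrative only.
print(message_as_text(b"ITERATION LIMIT REACHED"))
print(message_as_text("ITERATION LIMIT REACHED"))
```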

Can't create environment with requirements.txt

I tried to create an environment with Python 3.6, 3.7, 3.8, and 3.9 using requirements.txt:

pip install -r requirements.txt
but none of the attempts succeeded. The reason is that pip cannot resolve the dependency conflicts between the packages.
I also tried installing with the '--no-deps' option. The installation completed, but I could not run any example because of missing dependencies.

Is there any method to set up the environment?
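One way to narrow down which pin triggers the conflict is to install the requirements one at a time in a fresh environment and note the first failure. The sketch below only builds the pip commands and does not run them. It assumes requirements.txt holds one requirement per line (its exact contents are not shown here), per_package_commands is a hypothetical helper, and the two sample pins are taken from the software list above.

```python
import sys


def per_package_commands(requirements_text):
    """Build one 'pip install' command per requirement line, skipping comments and options."""
    commands = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r
            continue
        commands.append([sys.executable, "-m", "pip", "install", line])
    return commands


sample = """\
# sample pins only
aif360==0.3.0
BlackBoxAuditing==0.1.54

"""
for cmd in per_package_commands(sample):
    print(" ".join(cmd[1:]))
```

Each command list can be passed to subprocess.run(cmd, check=True), which raises at the first requirement that fails to install.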

Chapter 02 - plotting

Hi,

In the following part of the book I get the error shown below:

plt.rcParams.update({'font.size': 14})
fig, axarr = plt.subplots(2, 2, figsize=(12,8), sharex=True, sharey=False)
mldatasets.create_decision_plot(X_test, y_test, log_result, [5, 1], ['ap_hi [mmHg]', 'age [years]'],
                                X_highlight, filler_feature_values, filler_feature_ranges, ax=axarr.flat[0])
mldatasets.create_decision_plot(X_test, y_test, log_result, [5, 7], ['ap_hi [mmHg]', 'cholesterol [1-3]'],
                                X_highlight, filler_feature_values, filler_feature_ranges, ax=axarr.flat[1])
mldatasets.create_decision_plot(X_test, y_test, log_result, [5, 6], ['ap_hi [mmHg]', 'ap_lo [mmHg]'],
                                X_highlight, filler_feature_values, filler_feature_ranges, ax=axarr.flat[2])
mldatasets.create_decision_plot(X_test, y_test, log_result, [5, 4], ['ap_hi [mmHg]', 'weight [kg]'],
                                X_highlight, filler_feature_values, filler_feature_ranges, ax=axarr.flat[3])
plt.subplots_adjust(top = 1, bottom=0, hspace=0.2, wspace=0.2)
plt.show()

`

TypeError Traceback (most recent call last)
in
1 plt.rcParams.update({'font.size': 14})
2 fig, axarr = plt.subplots(2, 2, figsize=(12,8), sharex=True, sharey=False)
----> 3 mldatasets.create_decision_plot(X_test, y_test, log_result, [5, 1], ['ap_hi [mmHg]', 'age [years]'],
4 X_highlight, filler_feature_values, filler_feature_ranges, ax=axarr.flat[0])
5 mldatasets.create_decision_plot(X_test, y_test, log_result, [5, 7], ['ap_hi [mmHg]', 'cholesterol [1-3]'],

c:\users***\miniconda3\lib\site-packages\machine_learning_datasets\common.py in create_decision_plot(X, y, model, feature_index, feature_names, X_highlight, filler_feature_values, filler_feature_ranges, ax)
399 filler_values = dict((k, filler_feature_values[k]) for k in filler_feature_values.keys() if k not in feature_index)
400 filler_ranges = dict((k, filler_feature_ranges[k]) for k in filler_feature_ranges.keys() if k not in feature_index)
--> 401 ax = plot_decision_regions(sm.add_constant(X).to_numpy(), y.to_numpy(), clf=model,
402 feature_index=feature_index,
403 X_highlight=X_highlight,

c:\users***\miniconda3\lib\site-packages\mlxtend\plotting\decision_regions.py in plot_decision_regions(X, y, clf, feature_index, filler_feature_values, filler_feature_ranges, ax, X_highlight, res, legend, hide_spines, markers, colors, scatter_kwargs, contourf_kwargs, scatter_highlight_kwargs)
242 antialiased=True)
243
--> 244 ax.axis(xmin=xx.min(), xmax=xx.max(), y_min=yy.min(), y_max=yy.max())
245
246 # Scatter training data samples

c:\users***\miniconda3\lib\site-packages\matplotlib\axes\_base.py in axis(self, emit, *args, **kwargs)
1933 self.set_ylim(ymin, ymax, emit=emit, auto=yauto)
1934 if kwargs:
-> 1935 raise TypeError(f"axis() got an unexpected keyword argument "
1936 f"'{next(iter(kwargs))}'")
1937 return (*self.get_xlim(), *self.get_ylim())

TypeError: axis() got an unexpected keyword argument 'y_min'
`

CH 2 Failing

Chapter 2 is failing because I can't install CVAE or Rulefit. I tried to downgrade TensorFlow (currently 2.12.0) so that I could install CVAE (the error says it requires TF version <2, >=1). When I tried to downgrade TF, it said that Tensorboard also needs to be downgraded, which in turn requires the Python version to be downgraded to >2.7, <2.8.

This seems awfully outdated. I feel myself going down rabbit holes in order to get this working. Not sure how I'm expected to make it through this book if the code for the book isn't at all maintained.

Please update or add instructions for how to get dependencies installed. I am currently working with conda version 23.11.0 and Python 3.11.5

Thank you

pip install machine-learning-datasets not working

While trying to pip install machine-learning-datasets I got the following error:
ERROR: Cannot uninstall 'llvmlite'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
I tried uninstalling and reinstalling llvmlite, but to no avail.
Please advise. Or can I just forget about machine-learning-datasets?

typo in textbook

The ROC-AUC curve explained on page 81 is described as plotting the true positive rate (Recall) on the x-axis and the false positive rate on the y-axis. It should instead plot the true positive rate (Recall) on the y-axis and the false positive rate on the x-axis.
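The corrected axis assignment can be checked with a small hand-rolled example. This is a minimal sketch with made-up labels and scores, not the book's code: roc_points is a hypothetical helper, and it assumes binary labels with at least one positive and one negative sample. Each returned pair is (x, y) = (false positive rate, true positive rate).

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) pairs, one per distinct score threshold, starting at (0, 0)."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_score), reverse=True):
        # Everything scored at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


y_true = [0, 0, 1, 1]             # illustrative labels
y_score = [0.1, 0.4, 0.35, 0.8]   # illustrative predicted probabilities
for fpr, tpr in roc_points(y_true, y_score):
    print(f"x = FPR = {fpr:.2f}, y = TPR = {tpr:.2f}")
```

The curve starts at (0, 0) and ends at (1, 1), and a better classifier bends toward the top-left corner, where the recall is high and the false positive rate is low.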

index error in chapter 7

It seems like the original dataset has changed. When we follow the steps and try to index the values of interest (idx1 = 5231, idx2 = 2726, idx3 = 10127), only index 10127 is returned, and even then it has different values than in the book/GitHub notebook.
