reiinakano / scikit-plot

An intuitive library to add plotting functionality to scikit-learn objects.

License: MIT License

Python 100.00%

scikit-learn visualization machine-learning data-science plotting plot

scikit-plot's Introduction

Welcome to Scikit-plot

Single line functions for detailed visualizations

The quickest and easiest way to go from analysis...

...to this.

Scikit-plot is the result of an unartistic data scientist's dreadful realization that visualization is one of the most crucial components in the data science process, not just a mere afterthought.

Gaining insights is simply a lot easier when you're looking at a colored heatmap of a confusion matrix complete with class labels rather than a single-line dump of numbers enclosed in brackets. Besides, if you ever need to present your results to someone (virtually any time anybody hires you to do data science), you show them visualizations, not a bunch of numbers in Excel.

That said, there are a number of visualizations that frequently pop up in machine learning. Scikit-plot is a humble attempt to provide aesthetically-challenged programmers (such as myself) the opportunity to generate quick and beautiful graphs and plots with as little boilerplate as possible.

Okay then, prove it. Show us an example.

Say we use Naive Bayes in multi-class classification and decide we want to visualize the results of a common classification metric, the Area under the Receiver Operating Characteristic curve. Since the ROC is only valid in binary classification, we want to show the respective ROC of each class if it were the positive class. As an added bonus, let's show the micro-averaged and macro-averaged curve in the plot as well.

Let's use scikit-plot with the sample digits dataset from scikit-learn.

# The usual train-test split mumbo-jumbo
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
nb = GaussianNB()
nb.fit(X_train, y_train)
predicted_probas = nb.predict_proba(X_test)

# The magic happens here
import matplotlib.pyplot as plt
import scikitplot as skplt
skplt.metrics.plot_roc(y_test, predicted_probas)
plt.show()

Pretty.

And... that's it. Captured in that small example is the entire philosophy of Scikit-plot: single line functions for detailed visualization. You simply browse the plots available in the documentation, and call the function with the necessary arguments. Scikit-plot tries to stay out of your way as much as possible. No unnecessary bells and whistles. And when you do need the bells and whistles, each function offers a myriad of parameters for customizing various elements in your plots.

Finally, compare and view the non-scikit-plot way of plotting the multi-class ROC curve. Which one would you rather do?
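For a sense of what the manual route involves, here is a minimal sketch that computes the per-class, micro-averaged and macro-averaged ROC curves directly with scikit-learn. It is an illustration written for this comparison, not the example the sentence above links to; the fixed random_state and the variable names are assumptions, and the plotting calls are left out.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import label_binarize

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)
nb = GaussianNB().fit(X_train, y_train)
probas = nb.predict_proba(X_test)

classes = nb.classes_
# One indicator column per class: column i is 1 where the true label is classes[i].
y_onehot = label_binarize(y_test, classes=classes)

# One-vs-rest ROC curve for each class treated as the positive class.
fpr, tpr, roc_auc = {}, {}, {}
for i, label in enumerate(classes):
    fpr[label], tpr[label], _ = roc_curve(y_onehot[:, i], probas[:, i])
    roc_auc[label] = auc(fpr[label], tpr[label])

# Micro-average: pool every (sample, class) decision into one binary problem.
fpr_micro, tpr_micro, _ = roc_curve(y_onehot.ravel(), probas.ravel())
auc_micro = auc(fpr_micro, tpr_micro)

# Macro-average: average the per-class TPR curves on a common FPR grid.
grid = np.unique(np.concatenate([fpr[label] for label in classes]))
mean_tpr = np.mean(
    [np.interp(grid, fpr[label], tpr[label]) for label in classes], axis=0)
auc_macro = auc(grid, mean_tpr)
```

Each curve would still need its own plot call, legend entry and colour, which is the boilerplate the single-line function is meant to replace.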

Maximum flexibility. Compatibility with non-scikit-learn objects.

Although Scikit-plot is loosely based around the scikit-learn interface, you don't actually need Scikit-learn objects to use the available functions. As long as you provide the functions what they're asking for, they'll happily draw the plots for you.

Here's a quick example to generate the precision-recall curves of a Keras classifier on a sample dataset.

# Import what's needed for the Functions API
import matplotlib.pyplot as plt
import scikitplot as skplt

# This is a Keras classifier (assumed to be built and compiled already).
# We'll generate probabilities on the test set.
keras_clf.fit(X_train, y_train, batch_size=64, epochs=10, verbose=2)
probas = keras_clf.predict_proba(X_test, batch_size=64)

# Now plot.
skplt.metrics.plot_precision_recall_curve(y_test, probas)
plt.show()

You can see clearly here that skplt.metrics.plot_precision_recall_curve needs only the ground truth y-values and the predicted probabilities to generate the plot. This lets you use anything you want as the classifier, from Keras NNs to NLTK Naive Bayes to that groundbreaking classifier algorithm you just wrote.
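To make the "anything you want as the classifier" point concrete, the sketch below builds the two inputs by hand with NumPy only. The raw scores, the softmax helper and the plot_pr wrapper are all hypothetical names introduced for illustration; the only scikit-plot call is the one already shown above.

```python
import numpy as np


def softmax(scores):
    """Turn raw per-class scores into rows of probabilities that sum to 1."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical raw scores from any model: 4 samples, 3 classes.
raw_scores = np.array([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3],
                       [0.1, 0.2, 3.0],
                       [1.0, 1.0, 1.0]])
y_true = np.array([0, 1, 2, 0])
probas = softmax(raw_scores)  # shape (n_samples, n_classes)


def plot_pr(y_true, probas):
    # Imported lazily so the data preparation above runs without scikit-plot.
    import matplotlib.pyplot as plt
    import scikitplot as skplt
    skplt.metrics.plot_precision_recall_curve(y_true, probas)
    plt.show()
```

Calling plot_pr(y_true, probas) would then draw the curves, provided scikit-plot and Matplotlib are installed.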

The possibilities are endless.

Installation

Installation is simple! First, make sure you have the dependencies Scikit-learn and Matplotlib installed.

Then just run:

pip install scikit-plot

Or if you want the latest development version, clone this repo and run

python setup.py install

at the root folder.

If using conda, you can install Scikit-plot by running:

conda install -c conda-forge scikit-plot

Documentation and Examples

Explore the full features of Scikit-plot.

You can find detailed documentation here.

Examples are found in the examples folder of this repo.

Contributing to Scikit-plot

Reporting a bug? Suggesting a feature? Want to add your own plot to the library? Visit our contributor guidelines.

Citing Scikit-plot

Are you using Scikit-plot in an academic paper? You should be! Reviewers love eye candy.

If so, please consider citing Scikit-plot with DOI

APA

Reiichiro Nakano. (2018). reiinakano/scikit-plot: 0.3.7 [Data set]. Zenodo. http://doi.org/10.5281/zenodo.293191

IEEE

[1]Reiichiro Nakano, “reiinakano/scikit-plot: 0.3.7”. Zenodo, 19-Feb-2017.

ACM

[1]Reiichiro Nakano 2018. reiinakano/scikit-plot: 0.3.7. Zenodo.

Happy plotting!

scikit-plot's Issues

Is this project still maintained?

Hi,

I think this project is a great idea, are you still working on it?

Customization of Precision Recall curves

It would be best to let the user select which curves to display. For example, some people might not want to display the macro- and micro-averaged curves, or might want to display only a specific class's ROC curve.

This approach could then be extended to Precision Recall curves as well

EDIT: ROC Curves now has curves argument thanks to @doug-friedman

Is there a problem in precision_recall_curve?

Dear sir,
Thanks for your excellent work on scikit-plot. I am confused by the precision_recall_curve function. As far as I know, the PR curve goes through two points: (0, 1) and (1, 0). But when I used the test code to draw the curve, I found that it does not go through the point (1, 0). I used the code below to get the curve.

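import matplotlib.pyplot as plt
import scikitplot as skplt
from sklearn.naive_bayes import GaussianNB

# X_train, X_test, y_train, y_test come from an earlier train/test split.
rf = GaussianNB()
rf = rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
# Note: this call draws a confusion matrix, not a precision-recall curve.
skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)
plt.show()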

n_jobs plot_learning_curve

I'm concerned about how long it takes, given that the model must be trained within the wrapper for the Factory API. The plot_learning_curve method has an n_jobs parameter, but plot_precision_recall_curve doesn't seem to. Am I missing something?

Too many indices For Array

I am getting an IndexError and I don't know why: my train and test sets have exactly the shape required by cross_val_score, but I still get this error. Any suggestions?

K-NN for Amazon Fud Review.html.pdf

ValueError: Found input variables with inconsistent numbers of samples

I'm trying to plot the ROC curve, but I get ValueError: Found input variables with inconsistent numbers of samples.
Here's the code I use:

skplt.metrics.plot_roc(labels_test.values, pred_w2v_cnn.values)
plt.show()

Both labels_test.values and pred_w2v_cnn.values have the same length and both are of type np.ndarray. I'd be thankful if anyone can help me to solve this problem.

How to stratify the data when using the classifier factory?

Hello,

Thanks so much for your great work on scikit-plot. I've found it quite useful in my ML workflows.

I'm wondering: I work with imbalanced datasets pretty frequently, so it's important for me to be able to stratify my train/test splits. When I use the classifier factory to generate plots directly from the classifier object, I don't see any options to stratify my data (e.g. in the plot_confusion_matrix function). How can I accomplish this?
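One possible approach, offered as a sketch rather than a confirmed answer: do the split yourself with scikit-learn's stratify option and hand the results to the standalone functions (such as skplt.metrics.plot_confusion_matrix(y_test, y_pred), which takes only labels and predictions). The data below is made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 90 samples of class 0, 10 of class 1.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

train_ratio = y_train.mean()  # share of class 1 in the training set
test_ratio = y_test.mean()    # share of class 1 in the test set
```

After fitting any classifier on X_train and predicting on X_test, the predictions can be passed to the plotting function together with y_test.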

Not working in virtualenv

Hey,

It would be cool if this worked in a virtual environment.
It's generally possible by using a different matplotlib backend, such as 'AGG'. This would only allow saving the plots as figures, though (I think).

The specific error I get when installing through virtualenv is:

RuntimeError: Python is not installed as a framework. 
The Mac OS X backend will not be able to function correctly if Python is not installed as a framework.
See the Python documentation for more information on installing Python as a framework on Mac OS X.
Please either reinstall Python as a framework, or try one of the other backends.
If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. 
See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.
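The workaround mentioned above can be sketched as follows: select the non-GUI Agg backend before pyplot is imported (and therefore before importing scikitplot, which imports pyplot itself according to the tracebacks elsewhere on this page), then save to a file instead of showing a window. The file name and temporary directory are arbitrary choices for the illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # select a non-GUI backend before importing pyplot
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
ax.set_title("Rendered without a GUI backend")

# Agg cannot open a window, so write the figure to a file instead of plt.show().
out_path = os.path.join(tempfile.mkdtemp(), "plot.png")
fig.savefig(out_path)
plt.close(fig)
```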

Add cross-validation curve for a specified parameter over param_range

@reiinakano Hi, I have found scikit-plot really helpful for those who want to do data analysis or machine learning, and I use it a lot. In my work, I have found that parameter selection could also be plotted for visualization. I have written a new method that plots the cross-validation curve for a specified parameter. I would like to create a new branch with the added method. Is that OK?

Make plotting functions work with array-like inputs

Plots such as plot_roc_curve must be able to take any array-like objects. As of today, they only take numpy arrays as input, otherwise an exception is raised. This numpy array conversion must be done inside the function itself.

Example:

skplt.plot_roc_curve([0, 1], [[0.2, 0.8], [0.8, 0.2]])

does not work while

skplt.plot_roc_curve(np.array([0, 1]), np.array([[0.2, 0.8], [0.8, 0.2]]))

does
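A minimal sketch of the conversion being requested, assuming it would sit at the top of each plotting function. The helper name as_arrays and the shape check are illustrative and are not part of scikit-plot.

```python
import numpy as np


def as_arrays(y_true, y_probas):
    """Convert array-like inputs (lists, tuples, ...) to NumPy arrays."""
    y_true = np.asarray(y_true)
    y_probas = np.asarray(y_probas, dtype=float)
    if y_probas.ndim != 2 or y_probas.shape[0] != y_true.shape[0]:
        raise ValueError("y_probas must have shape (n_samples, n_classes)")
    return y_true, y_probas


# Plain Python lists now behave like the NumPy version of the call.
y_true, y_probas = as_arrays([0, 1], [[0.2, 0.8], [0.8, 0.2]])
```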

Create a function for plotting Lift Chart?

I mean a lift chart of a different kind than the "lift chart" in the cumulative gain curve and lift curve.

Here is a sample:
https://medium.com/@inlinecoder/disrupting-the-entrance-point-to-a-predictive-data-analytics-12676aa91a8d
https://cran.r-project.org/web/packages/datarobot/vignettes/AdvancedVignette.html

I think this lift chart is quite common in machine learning and data science industry.

I wrote one for binary classification but not sure if it can be extended to multiclass.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plotLiftChart(actual, predicted):
    df_dict = {'actual': list(actual), 'pred': list(predicted)}
    df = pd.DataFrame(df_dict)
    # Split the samples into 100 equal-sized bins by rank.
    pred_ranks = pd.qcut(df['pred'].rank(method='first'), 100, labels=False)
    actual_ranks = pd.qcut(df['actual'].rank(method='first'), 100, labels=False)
    # Mean prediction and mean actual value within each bin.
    pred_percentiles = df.groupby(pred_ranks).mean()
    actual_percentiles = df.groupby(actual_ranks).mean()  # not used in the plot below
    plt.title('Lift Chart')
    plt.plot(np.arange(.01, 1.01, .01), np.array(pred_percentiles['pred']),
             color='darkorange', lw=2, label='Prediction')
    plt.plot(np.arange(.01, 1.01, .01), np.array(pred_percentiles['actual']),
             color='navy', lw=2, linestyle='--', label='Actual')
    plt.ylabel('Target Percentile')
    plt.xlabel('Population Percentile')
    plt.xlim([0.0, 1.0])
    plt.ylim([-0.05, 1.05])
    plt.legend(loc="best")

Add Jupyter notebook examples

It would be nice to have Jupyter notebooks in the "examples" folder showing the different plots as used in a Jupyter notebook. It could contain the same exact code as the examples in the .py files, but adjusted for size (Jupyter notebook plots tend to come out much smaller).

Develop "Functions" API

The current method of appending plotting methods to scikit-learn objects may feel a little restrictive. Work is currently ongoing to develop a "Functions" API where stand-alone functions are exposed for maximum flexibility and compatibility with even non-scikit-learn objects. The current API will then need to be refactored to use the "Functions" API to prevent redundancy.

Add Jupyter notebook examples for plot_cumulative_gain and plot_lift_curve

Add Jupyter notebook examples for metrics.plot_cumulative_gain and metrics.plot_lift_curve

Problems with installing on Anaconda (OS X)

After

sudo pip install scikit-plot

I get

RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.

None of the other libraries I know and use require this setup.

Plot precision-recall curve for support vector machine classifier

Hello I want to plot a precision-recall curve for SVC (support vector machine classifier), but the scikit-learn svm classifier does not implement a predict_proba method. How can I do that in scikit-plot (as far as I can see in the documentation it accepts prediction probabilities to plot the curve)?

Note that the scikit-learn documentation page has an example of precision-recall curve for SVC

Thank you,
Nikos
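Two possible routes, sketched with plain scikit-learn and not confirmed by the maintainer: rank the test samples with decision_function, or fit the SVC with probability=True so that predict_proba becomes available. The synthetic dataset and the linear kernel are assumptions made for the illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Option 1: rank samples by their signed distance to the separating hyperplane.
svc = SVC(kernel="linear", random_state=0).fit(X_train, y_train)
scores = svc.decision_function(X_test)
precision, recall, thresholds = precision_recall_curve(y_test, scores)

# Option 2: let SVC fit an internal probability calibration (slower to train),
# which yields an (n_samples, n_classes) array like other classifiers provide.
svc_proba = SVC(kernel="linear", probability=True, random_state=0)
svc_proba.fit(X_train, y_train)
probas = svc_proba.predict_proba(X_test)
```

The array from option 2 has the shape that the probability-based plotting functions described in the README expect.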

Example Request: PyTorch Precision Recall

Thank you for the Keras example in README: scikit-plot looks like a very elegant solution for plotting ML / DL curves.

I was wondering if you have tried plotting PyTorch optimizers with scikit-plot? If you have an example of PyTorch it will help me out a lot.

Thank you again for an awesome library!

multilabel-indicator format is not supported

While plotting the ROC curve I'm getting this error. Please help.

Add new features of plotting Gain Charts and Lift Charts?

Hi Team!

I would like to know if you have any plan of adding new functions to plot Gain Charts and Lift Charts since they are popular in data science projects.

https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/tutorials/mlp_bankloan_outputtype_02.html

https://docs.microsoft.com/en-us/sql/analysis-services/data-mining/lift-chart-analysis-services-data-mining

Thank you!

add more argument pass through from plot_cumulative_gain to matplotlib

I would like to control the color in plot_cumulative_gain, and I think that, as in pandas, extra arguments could be passed through to matplotlib.

0.2.3 to 0.2.6 update failed

I've just tried to upgrade the package, but it gave the following error:

Collecting scikit-plot
  Using cached scikit-plot-0.2.6.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-7wut1485/scikit-plot/setup.py", line 9, in <module>
        import scikitplot
      File "/tmp/pip-build-7wut1485/scikit-plot/scikitplot/__init__.py", line 5, in <module>
        from scikitplot.classifiers import classifier_factory
      File "/tmp/pip-build-7wut1485/scikit-plot/scikitplot/classifiers.py", line 5, in <module>
        import matplotlib.pyplot as plt
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/pyplot.py", line 115, in <module>
        _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/backends/__init__.py", line 32, in pylab_setup
        globals(),locals(),[backend_name],0)
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/backends/backend_tkagg.py", line 6, in <module>
        from six.moves import tkinter as Tk
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 92, in __get__
        result = self._resolve()
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 115, in _resolve
        return _import_module(self.mod)
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 82, in _import_module
        __import__(name)
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/lib/python3.6/tkinter/__init__.py", line 36, in <module>
        import _tkinter # If this fails your Python may not be configured for Tk
    ModuleNotFoundError: No module named '_tkinter'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-7wut1485/scikit-plot/

Unfortunately, I don't know how to debug this problem. If you need some info, please don't hesitate to ask!

Adding a parameter to plot_confusion_matrix() to hide overlaid counts

Hi @reiinakano,

Thank you for this great repo! I am using plot_confusion_matrix() but my counts are quite large so the overlaid counts end up overlapping each other and result in a cluttered plot. I was wondering if I could submit a pull request to update this function to add a hide_counts parameter to give the option to not plot the counts? I've already forked and created a branch with the changes. Thank you!

matplotlib deprecation warning

Hello I have installed scikitplot (version 0.3.1) from pip and get the following warning when I plot a confusion matrix plot

C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\site-packages\matplotlib\cbook\deprecation.py:106: MatplotlibDeprecationWarning: The spectral and spectral_r colormap was deprecated in version 2.0. Use nipy_spectral and nipy_spectral_r instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)

Cheers,
Nikos

Throws error "IndexError: too many indices for array" when trying to plot roc for binary classification

For binary classification, when I input numpy arrays having test label and test probabilities, it throws the following error :


y_true = np.array(ytest)
y_probas = np.array(p_test)
skplt.metrics.plot_roc_curve(y_true,y_probas)
plt.show()

IndexError                                Traceback (most recent call last)
<ipython-input-49-1b02f082006a> in <module>()
----> 1 skplt.metrics.plot_roc_curve(y_true,y_probas)
      2 plt.show()


/Users/tarun/anaconda/envs/gl-env/lib/python2.7/site-packages/scikitplot/metrics.pyc in plot_roc_curve(y_true, y_probas, title, curves, ax, figsize, cmap, title_fontsize, text_fontsize)
    247     roc_auc = dict()
    248     for i in range(len(classes)):
--> 249         fpr[i], tpr[i], _ = roc_curve(y_true, probas[:, i],
    250                                       pos_label=classes[i])
    251         roc_auc[i] = auc(fpr[i], tpr[i])

IndexError: too many indices for array
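The traceback indexes probas[:, i], which requires a two-dimensional array. If p_test holds only the positive-class probability for each sample (an assumption, since the report does not show its shape), one possible fix is to build one column per class first. The numbers below are made up.

```python
import numpy as np

y_true = np.array([0, 1, 1, 0])
p_positive = np.array([0.1, 0.8, 0.65, 0.3])  # 1-D: P(class 1) only

# Indexing with probas[:, i] needs two dimensions, so build one column per class:
# column 0 is P(class 0), column 1 is P(class 1).
y_probas = np.column_stack([1.0 - p_positive, p_positive])
```

Classifiers with a predict_proba method already return this two-column layout for binary problems, so slicing out a single column before plotting is what usually triggers the error.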

Class mismatch in skplt.plot_confusion_matrix when test has fewer classes than training

Hello,
I have an issue when trying to plot a confusion matrix with fewer classes in my test set than in training.
The class with 12,000+ occurrences in my sample should be labelled 'O'.
Is it possible to get around this, or to include the label set manually as an input?

It's not a big issue, but it would be nice if we could fix it.
Thanks for your help
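For reference, scikit-learn's own confusion_matrix accepts an explicit label set, which fixes both which classes appear and their order. The sketch below uses made-up labels; whether your installed scikit-plot version exposes an equivalent argument on plot_confusion_matrix is something to check in its documentation.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical label set seen during training; 'B-LOC' never occurs in the test data.
all_labels = ["O", "B-PER", "B-LOC"]
y_test = ["O", "O", "B-PER", "O"]
y_pred = ["O", "B-PER", "B-PER", "O"]

# Passing labels fixes both the set of classes and their order in the matrix,
# so the missing class still gets its own (all-zero) row and column.
cm = confusion_matrix(y_test, y_pred, labels=all_labels)
```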

Any interest in moving towards pytest testing framework?

Noticed most of the tests use unittest.
Is there any interest in porting this over to pytest eventually? This gives several benefits, such as parametrization and monkey patching, whilst maintaining compatibility with unittest and thus allowing for a gradual overhaul.
An added benefit is that we would be more in line with the testing practices used by scikit-learn, which would increase compatibility between the two libraries!

Confusion Matrix plots are confusing with long categories

When you create a confusion matrix plot for a classification problem that has a lot of categories with long names, the names for the "Predicted label" axis can overlap, causing the axis to become unreadable. Even using large dimensions for the figsize parameter isn't enough in some cases.

Here is an example.

Perhaps there could be a new optional parameter added to allow the labels on the "Predicted label" axis to be rotated 90 degrees in order to make them easier to read.
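Until such a parameter exists, a possible workaround is to rotate the tick labels on the Axes object directly; the documentation snippet quoted elsewhere on this page shows the plotting methods returning a matplotlib Axes. The sketch below uses a bare Axes with made-up category names in place of a real confusion matrix.

```python
import matplotlib
matplotlib.use("Agg")  # non-GUI backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical long category names.
names = ["a_very_long_category_name_%d" % i for i in range(5)]

fig, ax = plt.subplots()
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_xlabel("Predicted label")

# Rotate every x tick label by 90 degrees so long names no longer overlap.
for label in ax.get_xticklabels():
    label.set_rotation(90)
fig.tight_layout()
```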

MatplotlibDeprecationWarning: Passing one of 'on', 'true', 'off', 'false' as a boolean is deprecated

skplot 0.3.5
matplotlib 2.2.3

skplt.metrics.plot_confusion_matrix(y_test, prediction)

/home/chris/.local/lib/python3.5/site-packages/matplotlib/cbook/deprecation.py:107: MatplotlibDeprecationWarning: Passing one of 'on', 'true', 'off', 'false' as a boolean is deprecated; use an actual boolean (True/False) instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)

Little Error in official document

http://scikit-plot.readthedocs.io/en/stable/apidocs.html#classifier-plots

in plot_confusion_matrix paragraph, the code is

rf = classifier_factory(RandomForestClassifier())
rf.plot_learning_curve(X, y, normalize=True)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
plt.show()

it should be

rf = classifier_factory(RandomForestClassifier())
rf.plot_confusion_matrix(X, y, normalize=True)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
plt.show()

conda-forge build!

scikit-plot's so amazing, I decided to make a conda-forge recipe for it! It should be available via conda-forge within a few hours.

installation error

Hi, I tried to pip install scikit-plot but got the following error message:
Command "python setup.py egg_info" failed with error code 1.

Any suggestion?

Thanks very much.

Elena

`plot_confusion_matrix` has weird white grid lines with Seaborn

As pointed out by @frankherfert , when Seaborn is used (import seaborn) with scikit-plot, the confusion matrix tends to have weird white grid lines. Suggestions on how to get rid of this (especially from experienced Seaborn users) without adding a Seaborn dependency would be much appreciated.

Update Jupyter notebook examples

With v0.3.0, the Jupyter notebook examples are now outdated. It should be trivial to change the examples to the v0.3.0 format.

Add numerical digit precision parameter

Hi there,

I was wondering if there is a way of defining the digit numerical precision of values such as roc_auc.

To see what I mean, let me point you to sklearn API such as for Classification Report, where the parameter digits defines to what precision the values are presented.

This is especially important, for example, when one is training classifiers that are already in the top, say, +99.5% of accuracy/precision/recall/auc and we want to study differences amongst classifiers that are competing at the 0.1% level.

In particular, I noticed that digit precision is not consistent throughout scikit-plot: roc_auc is presented with three-digit precision, while precision_recall is presented with four-digit precision.

As you can imagine, for scientific publication purposes it's a bit inelegant to present bound metrics with different precision.

Thanks!
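The request amounts to threading a digits value into the label formatting. A tiny sketch of that idea follows; the helper name format_metric and the metric values are hypothetical, and this is not scikit-plot's existing code.

```python
def format_metric(name, value, digits=3):
    """Render a metric with a caller-chosen number of decimal places."""
    return f"{name} = {value:.{digits}f}"


# The same digits setting gives both labels a consistent precision.
roc_label = format_metric("ROC AUC", 0.99731, digits=4)
pr_label = format_metric("PR AUC", 0.99648, digits=4)
```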

add classes_to_plot option to plot_cumulative_gain

I think it could be useful, when one wants to plot only e.g. class 1, to have an option to produce consistent plots for both plot_cumulative_gain and plot_roc

At the moment, only plot_roc supports such an option.

Thanks a lot

Problems with colours

The following methods return wrong colour ranges for plotting:

skplt.plot_pca_2d_projection(pca, X, y)
plt.show()

Returns a single colour for all classes.

skplt.plot_precision_recall_curve(y_true=y, y_probas=probas)
plt.show()

Returns repetition of colours for large classes.

Why is the AUC calculated by plot_roc_curve different from the one I calculate manually?

Calculated using plot_roc_curve:

pred = clf.predict_proba(data_test)
skplt.plot_roc_curve(target_test, pred)
plt.show()

Its result is 0.81.

Calculated manually:

pred_y = clf.predict(data_test)
false_positive_rate, true_positive_rate, thresholds = roc_curve(target_test, pred_y)
roc_auc = auc(false_positive_rate, true_positive_rate)

plt.title('Receiver Operating Characteristic')
plt.plot(false_positive_rate, true_positive_rate, 'b', label='AUC = %0.2f'% roc_auc)
plt.legend(loc='lower right')
plt.plot([0,1],[0,1],'r--')
plt.xlim([0,1.0])
plt.ylim([0,1.0])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

Its result is 0.72.

The problem I encountered is with binary classification, not multi-class classification.

Thank you for your help!
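A likely explanation, sketched below on synthetic data: the first snippet scores with predict_proba while the manual one scores with predict. Hard 0/1 predictions leave roc_curve a single operating point, so the two AUC values generally differ. The dataset and the choice of LogisticRegression are assumptions made for the illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Continuous scores: every distinct probability is a candidate threshold.
proba_pos = clf.predict_proba(X_test)[:, 1]
fpr_p, tpr_p, _ = roc_curve(y_test, proba_pos)
auc_from_probas = auc(fpr_p, tpr_p)

# Hard 0/1 predictions: only one real threshold, so the "curve" is a single
# operating point joined to (0, 0) and (1, 1) by straight lines.
fpr_h, tpr_h, _ = roc_curve(y_test, clf.predict(X_test))
auc_from_labels = auc(fpr_h, tpr_h)
```

Passing the positive-class column of predict_proba to roc_curve in the manual version should bring the two numbers into agreement.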

AttributeError: module 'scikitplot' has no attribute 'metrics'

When I use the updated version I get this error. It also seems that some of the sample code needs updating.

Plot ONLY one class

Hello, I have a precision-recall curve that I plot as follows:

skplt.metrics.plot_precision_recall_curve(y_test, y_probas, curves=['each_class'])

I have two classes in the data (one positive and one negative class, with labels 1 and -1 respectively). Question: how can I plot ONLY the positive class?

Thank you
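One workaround, sketched with scikit-learn directly: take the probability column of the positive class and compute that single curve yourself. The data is made up, and the column order assumes the classifier sorts its classes as [-1, 1].

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_test = np.array([-1, 1, 1, -1, 1])
# Hypothetical probabilities; columns follow sorted class order: [-1, 1].
y_probas = np.array([[0.9, 0.1],
                     [0.2, 0.8],
                     [0.4, 0.6],
                     [0.7, 0.3],
                     [0.45, 0.55]])

# Column 1 holds P(class == 1); pos_label tells scikit-learn which label is positive.
precision, recall, thresholds = precision_recall_curve(
    y_test, y_probas[:, 1], pos_label=1)
```

Plotting recall against precision with matplotlib then shows only the positive class.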

Deprecation of 'spectral' colormap in matplotlib

Several plots in the metrics package trigger the matplotlib deprecation warning as they use the 'spectral' colormap by default.

skplt.metrics.plot_roc_curve(y_train, y_probas)

MatplotlibDeprecationWarning: The spectral and spectral_r colormap was deprecated in version 2.0. 
Use nipy_spectral and nipy_spectral_r instead.

Needs to be changed to 'nipy_spectral'

ModuleNotFoundError: No module named 'scikitplot'

Hi. I'm a beginner in Python and I got this error when I tried to run code that uses scikitplot on Windows 7. I'm using Spyder. I used the command "import scikitplot as skplt".
I checked that scikitplot is installed on my computer. I ran this code on Ubuntu last year, but I need to run it again on Windows 7 and it does not work on this OS. How can I solve this issue, please?

Custom Scorer for CV inside plot_learning_curve

Hello,

I am using cross-validation with a particular metric, Kappa score, rather than the standard accuracy metric.

cross_val_score(clf, x_train, y_train, scoring=kappa_scorer, cv=kf, n_jobs=-1)

I would like the CV done inside the plot_learning_curve method, for each set of train_sizes, to use the Kappa scorer rather than the accuracy score. I would also like to use the Kappa scorer to evaluate the model's performance on the training set. Is there any way to set this in the plot_learning_curve method?
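As a point of reference, scikit-learn's own learning_curve function accepts a scoring argument and reports scores for both the training and the validation folds. The sketch below uses it directly with a Kappa scorer; the dataset, classifier and fold settings are assumptions for the illustration, and whether plot_learning_curve forwards such an argument depends on the installed scikit-plot version.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import KFold, learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
kappa_scorer = make_scorer(cohen_kappa_score)
kf = KFold(n_splits=3, shuffle=True, random_state=0)

# One row per training-set size, one column per CV fold.
train_sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y, cv=kf, scoring=kappa_scorer,
    train_sizes=np.linspace(0.2, 1.0, 4))

# Mean Kappa per training-set size, for the training and validation folds.
train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)
```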

Code to integrate

Dear Reiinakano,

first thank you for your really helpful scikit-plot package. I use it a lot.
I have just written a module with a function that creates an all-in-one gain + lift + probability plot for all classes,
which would fit well into your package.
So feel free to integrate my code into scikit-plot. I would be very pleased.
You can have a look under:
https://sourceforge.net/projects/gains-chart

Best regards, Erich

Likely an import error?

When trying to install scikit-plot with pip3 I got this error:

Collecting scikit-plot
  Downloading scikit-plot-0.1.dev3.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/9g/pv05gc7d5zvb2n_pr_0tmn8r0000gn/T/pip-build-6c8p8iqs/scikit-plot/setup.py", line 9, in <module>
        import scikitplot
      File "/private/var/folders/9g/pv05gc7d5zvb2n_pr_0tmn8r0000gn/T/pip-build-6c8p8iqs/scikit-plot/scikitplot/__init__.py", line 5, in <module>
        from scikitplot.classifiers import classifier_factory
      File "/private/var/folders/9g/pv05gc7d5zvb2n_pr_0tmn8r0000gn/T/pip-build-6c8p8iqs/scikit-plot/scikitplot/classifiers.py", line 9, in <module>
        from sklearn.model_selection import learning_curve
    ModuleNotFoundError: No module named 'sklearn'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/9g/pv05gc7d5zvb2n_pr_0tmn8r0000gn/T/pip-build-6c8p8iqs/scikit-plot/

My guess is that classifiers.py simply doesn't import sklearn.

Add `figsize`, `title_fontsize`, and `text_fontsize` parameter for existing plots.

As discussed in #11 and pointed out by @frankherfert , plots generated by scikit-plot are quite small on Jupyter notebook. Adding a figsize, title_fontsize, and text_fontsize parameter will let the user adjust the size of the plot based on his/her preferences.

figsize should accept a 2-tuple and be kept at a default value of None and passed to the plt.subplots function during figure creation. title_fontsize should be kept to default value "large" and text_fontsize to default value "medium"

PCA: biplot

The plot_pca_2d_projection method plots scores and colors by target. Biplots usually plot the scores and vectors with different scalings (scaling 1: distance, and scaling 2: correlation). Biplots could be included in plot_pca_2d_projection or as a separate method. The target argument would preferably be optional. The ecopy library includes a nice biplot interface.

Oh, and if anybody has suggestions for what should be included in v0.3.0, please do say here.

Error installing No module named sklearn.metrics

Hi there,
I am getting an error installing it

pip install scikit-plot
Collecting scikit-plot
  Downloading scikit-plot-0.2.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\setup.py", line 9, in <module>
        import scikitplot
      File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\__init__.py", line 5, in <module>
        from scikitplot.classifiers import classifier_factory
      File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\classifiers.py", line 7, in <module>
        from scikitplot import plotters
      File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\plotters.py", line 9, in <module>
        from sklearn.metrics import confusion_matrix
    ImportError: No module named sklearn.metrics

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\