This project forked from ydataai/ydata-profiling

Create HTML profiling reports from pandas DataFrame objects

License: MIT License

Pandas Profiling

Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great but a little basic for serious exploratory data analysis. pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.

For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:

  • Type inference: detect the types of columns in a dataframe.
  • Essentials: type, unique values, missing values
  • Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
  • Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
  • Most frequent values
  • Histogram
  • Correlations: highlighting of highly correlated variables; Spearman, Pearson and Kendall matrices
  • Missing values: matrix, count, heatmap and dendrogram of missing values
  • Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.
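
As a rough illustration of what several of these statistics mean, the sketch below computes a few of them for a single numeric column using plain pandas. It is not how pandas-profiling computes them internally; the function name summarize_numeric and the sample values are made up for this example.

```python
import pandas as pd


def summarize_numeric(series: pd.Series) -> dict:
    """Compute a handful of the listed statistics with plain pandas.

    Illustrative only: this is not pandas-profiling's implementation.
    Missing values are ignored by pandas in every statistic below.
    """
    q1, median, q3 = series.quantile([0.25, 0.5, 0.75]).tolist()
    minimum, maximum = series.min(), series.max()
    mean, std = series.mean(), series.std()
    return {
        "n_missing": int(series.isna().sum()),
        "n_unique": int(series.nunique()),
        "min": minimum,
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": maximum,
        "range": maximum - minimum,
        "iqr": q3 - q1,
        "mean": mean,
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "skewness": series.skew(),
        "kurtosis": series.kurt(),
    }


# Hypothetical sample column with one missing value.
stats = summarize_numeric(pd.Series([1.0, 2.0, 2.0, 3.0, 10.0, None]))
for name, value in stats.items():
    print(f"{name}: {value}")
```

The report presents such values per column, together with the plots and matrices listed above.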

Announcements

With your help, we got approved for GitHub Sponsors! It's extra exciting that GitHub matches your contribution for the first year. Therefore, we welcome you to support the project through GitHub!

The v2.4 release includes many new features (performance, exporting, GUI and datasets) and stability improvements.

January 7, 2020


Contents: Examples | Installation | Documentation | Large datasets | Command line usage | Advanced usage | Types | How to contribute | Editor Integration | Dependencies


Examples

The following examples can give you an impression of what the package can do:

Installation

Using pip

You can install using the pip package manager by running

pip install pandas-profiling[notebook,html]

Alternatively, you can install directly from GitHub:

pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

Using conda

You can install using the conda package manager by running

conda install -c conda-forge pandas-profiling

From source

Download the source code by cloning the repository or by pressing 'Download ZIP' on this page. Install by navigating to the proper directory and running

python setup.py install

Documentation

The documentation for pandas_profiling can be found here.

Getting started

Start by loading in your pandas DataFrame, e.g. by using

import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=['a', 'b', 'c', 'd', 'e']
)

To generate the report, run:

profile = ProfileReport(df, title='Pandas Profiling Report', html={'style':{'full_width':True}})

Jupyter Notebook

We recommend generating reports interactively in a Jupyter notebook. There are two interfaces: through widgets and through an HTML report.

Notebook Widgets

This is achieved by simply displaying the report. In the Jupyter Notebook, run:

profile

HTML report

The HTML report can be embedded in a Jupyter notebook. Run the following code:

profile.to_notebook_iframe()

Saving the report

To generate an HTML report file, assign the ProfileReport to a variable and call its to_file() method:

profile.to_file(output_file="your_report.html")

Alternatively, you can obtain the data as JSON:

# As a string
json_data = profile.to_json()

# As a file
profile.to_file(output_file="your_report.json")
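
The layout of the JSON output depends on the installed version, so the sketch below assumes nothing about specific keys. It only shows how a saved report file could be read back with the standard library to see which top-level sections it contains. The helper name top_level_keys is hypothetical.

```python
import json
from pathlib import Path


def top_level_keys(report_path):
    """Return the sorted top-level keys of a saved JSON report file."""
    with Path(report_path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Only a JSON object has keys; any other JSON value yields an empty list.
    return sorted(data) if isinstance(data, dict) else []
```

For a file written as shown above, this would be called as top_level_keys("your_report.json").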

Large datasets

Version 2.4 introduces minimal mode. This is a default configuration that disables expensive computations (such as correlations and dynamic binning). Use the following syntax:

profile = ProfileReport(large_dataset, minimal=True)
profile.to_file(output_file="output.html")
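
Independently of minimal mode, a common way to keep profiling time manageable is to profile a random sample of the rows. This is a general pandas technique, not a pandas-profiling feature; the helper downsample and its default values are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd


def downsample(df: pd.DataFrame, max_rows: int = 10_000, seed: int = 0) -> pd.DataFrame:
    """Return df unchanged if it is small enough, else a random sample of rows."""
    if len(df) <= max_rows:
        return df
    # A fixed random_state makes the sample reproducible between runs.
    return df.sample(n=max_rows, random_state=seed)


# Hypothetical large dataset for demonstration purposes.
large_dataset = pd.DataFrame(np.random.rand(50_000, 3), columns=["a", "b", "c"])
sample = downsample(large_dataset, max_rows=5_000)

# With pandas-profiling installed, the sample can then be profiled as above:
# profile = ProfileReport(sample, minimal=True)
```

Statistics computed on a sample are estimates, so rare values and exact counts may differ from those of the full dataset.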

Command line usage

For standard formatted CSV files that can be read immediately by pandas, you can use the pandas_profiling executable. Run

pandas_profiling -h

for information about options and arguments.

Advanced usage

A set of options is available to adapt the generated report.

  • title (str): Title for the report ('Pandas Profiling Report' by default).
  • pool_size (int): Number of workers in thread pool. When set to zero, it is set to the number of CPUs available (0 by default).
  • progress_bar (bool): If True, pandas-profiling will display a progress bar.
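
The pool_size convention described above (zero means "use all available CPUs") can be sketched as follows. This only illustrates the documented behaviour; it is not the library's code, and the function name resolve_pool_size is hypothetical.

```python
import os


def resolve_pool_size(pool_size: int = 0) -> int:
    """Map the pool_size option to an actual worker count.

    Zero is treated as "number of CPUs available", as described above.
    """
    if pool_size < 0:
        raise ValueError("pool_size must be zero or a positive integer")
    if pool_size == 0:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        return os.cpu_count() or 1
    return pool_size
```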

More settings can be found in the default configuration file, minimal configuration file and dark themed configuration file.

Example

profile = df.profile_report(title='Pandas Profiling Report', plot={'histogram': {'bins': 8}})
profile.to_file(output_file="output.html")

Types

Types are a powerful abstraction for effective data analysis that goes beyond the logical data types (integer, float, etc.). pandas-profiling currently recognizes the following types:

  • Boolean
  • Numerical
  • Date
  • Categorical
  • URL
  • Path

We have developed a type system for Python, tailored for data analysis: visions. Selecting the right typeset drastically reduces the complexity of your analysis code. Future versions of pandas-profiling will have extended type support through visions!
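
To give a feel for the distinction between storage dtypes and the analysis types listed above, here is a deliberately simple heuristic built on pandas' own dtype checks. It is neither the logic used by pandas-profiling nor that of visions, and it does not attempt URL or Path detection; the function name rough_type is made up for this sketch.

```python
import pandas as pd
from pandas.api import types as ptypes


def rough_type(series: pd.Series) -> str:
    """Assign one of four coarse analysis types based on the pandas dtype."""
    # Booleans must be checked first: pandas also counts bool as numeric.
    if ptypes.is_bool_dtype(series):
        return "Boolean"
    if ptypes.is_numeric_dtype(series):
        return "Numerical"
    if ptypes.is_datetime64_any_dtype(series):
        return "Date"
    return "Categorical"


# Hypothetical example data with one column per coarse type.
df = pd.DataFrame(
    {
        "flag": [True, False],
        "x": [1.5, 2.5],
        "when": pd.to_datetime(["2020-01-07", "2020-01-08"]),
        "city": ["a", "b"],
    }
)
inferred = {column: rough_type(df[column]) for column in df.columns}
print(inferred)
```

A real type system also has to look at the values themselves, for example to recognize that a column of strings holds URLs or file paths.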

How to contribute

Questions: Stack Overflow "pandas-profiling"

The package is actively maintained and developed as open-source software. If pandas-profiling was helpful or interesting to you, you might want to get involved. There are several ways of contributing and helping our thousands of users. If you would like to be an industry partner or sponsor, please drop us a line.

The documentation is generated using pdoc3. If you are contributing to this project, you can rebuild the documentation using:

make docs

or on Windows:

make.bat docs

Read more on getting involved in the Contribution Guide.

Editor integration

PyCharm integration

  1. Install pandas-profiling via the instructions above

  2. Locate your pandas-profiling executable.

    On macOS / Linux / BSD:

    $ which pandas_profiling
    (example) /usr/local/bin/pandas_profiling
    

    On Windows:

    $ where pandas_profiling
    (example) C:\ProgramData\Anaconda3\Scripts\pandas_profiling.exe
    
  3. In PyCharm, go to Settings (or Preferences on macOS) > Tools > External tools

  4. Click the + icon to add a new external tool

  5. Insert the following values

    • Name: Pandas Profiling
    • Program: The location obtained in step 2
    • Arguments: "$FilePath$" "$FileDir$/$FileNameWithoutAllExtensions$_report.html"
    • Working Directory: $ProjectFileDir$

To use the PyCharm integration, right-click any dataset file and choose External Tools > Pandas Profiling.
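
The external tool configured above passes two arguments to the executable: the input file and the path of the HTML report to write. A minimal sketch of the same invocation from Python follows. The helper names are hypothetical, and the output naming simply imitates the $FileNameWithoutAllExtensions$_report.html pattern.

```python
import shutil
import subprocess
from pathlib import Path


def build_profile_command(input_file, executable="pandas_profiling"):
    """Build the argument list: executable, input file, output report path."""
    src = Path(input_file)
    # Strip every extension (e.g. "data.csv.gz" -> "data"), like the PyCharm macro.
    stem = src.name.split(".")[0] or src.name
    output = src.with_name(f"{stem}_report.html")
    return [executable, str(src), str(output)]


def run_profile(input_file):
    """Run the command-line tool on input_file and return the report path."""
    command = build_profile_command(input_file)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} was not found on PATH")
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(command, check=True)
    return command[2]
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.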

Other integrations

Other editor integrations may be contributed via pull requests.

Dependencies

The profile report is written in HTML and CSS, which means pandas-profiling requires a modern browser.

You need Python 3 to run this package. Other dependencies can be found in the requirements files:

Filename                 Requirements
requirements.txt         Package requirements
requirements-dev.txt     Requirements for development
requirements-test.txt    Requirements for testing
setup.py                 Requirements for Widgets etc.
