dvdjlaw / pyscagnostics

This project forked from uschilaa/binostics

A Python package to calculate graph theoretic scagnostics

License: Other

C++ 83.55% Python 16.45%

pyscagnostics

Python wrapper for computing graph theoretic scatterplot diagnostics.

Scagnostics describe various measures of interest for pairs of variables, based on their appearance on a scatterplot. They are a useful tool for discovering interesting or unusual scatterplots in a scatterplot matrix without having to look at every individual plot.

Wilkinson, L., Anand, A., and Grossman, R. (2006). High-dimensional visual analytics: Interactive exploration guided by pairwise views of point distributions. IEEE Transactions on Visualization and Computer Graphics, November/December 2006 (Vol. 12, No. 6), pp. 1363-1372.

Installation

pip install pyscagnostics

Usage

from pyscagnostics import scagnostics

# Using NumPy arrays or lists
measures, _ = scagnostics(x, y)
print(measures)

# Using Pandas DataFrame
all_measures = scagnostics(df)
# Each item is (x, y, (measures, bins)), one per column pair
for x, y, (measures, _) in all_measures:
    print(x, y, measures)
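
To pick out the most interesting column pairs, the generator output can be collected and sorted by a single measure. The following is a minimal sketch, assuming each yielded item has the form (x, y, (measures, bins)) as described in the documentation below. `rank_pairs` is a hypothetical helper, not part of pyscagnostics, and the stubbed results and the 'Monotonic' key are placeholders for illustration; check pyscagnostics.measure_names for the actual keys.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Shape of one generator item: (x, y, (measures, bins))
Result = Tuple[str, str, Tuple[Dict[str, float], Any]]


def rank_pairs(
    results: Iterable[Result], measure: str, top: int = 3
) -> List[Tuple[str, str, float]]:
    """Return the `top` column pairs with the highest score for `measure`."""
    scored = [(x, y, measures[measure]) for x, y, (measures, _bins) in results]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top]


# Stand-in for `scagnostics(df)`; the keys and scores here are made up.
fake_results = [
    ("x", "y", ({"Monotonic": 0.10}, None)),
    ("x", "z", ({"Monotonic": 0.85}, None)),
    ("y", "z", ({"Monotonic": 0.40}, None)),
]

ranked = rank_pairs(fake_results, "Monotonic", top=2)
print(ranked)
```

With real data, `fake_results` would be replaced by `scagnostics(df)`. Because the helper consumes the generator once, call `scagnostics(df)` again to rank by a different measure, or store the results in a list first.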

Documentation

def scagnostics(
    *args,
    bins: int = 50,
    remove_outliers: bool = True
) -> Tuple[dict, np.ndarray]:
    """Scatterplot diagnostic (scagnostic) measures

    Scagnostics describe various measures of interest for pairs of variables,
    based on their appearance on a scatterplot.  They are a useful tool for
    discovering interesting or unusual scatterplots from a scatterplot matrix,
    without having to look at every individual plot.

    Example:
        `scagnostics` can take an x, y pair of iterables (e.g. lists or NumPy arrays):
        ```
            from pyscagnostics import scagnostics
            import numpy as np

            # Simulate data for example
            x = np.random.uniform(0, 1, 100)
            y = np.random.uniform(0, 1, 100)

            measures, bins = scagnostics(x, y)
        ```

        A Pandas DataFrame can also be passed as the single required argument. The
        output will be a generator of results, one per column pair:
        ```
            from pyscagnostics import scagnostics
            import numpy as np
            import pandas as pd

            # Simulate data for example
            x = np.random.uniform(0, 1, 100)
            y = np.random.uniform(0, 1, 100)
            z = np.random.uniform(0, 1, 100)
            df = pd.DataFrame({
                'x': x,
                'y': y,
                'z': z
            })

            results = scagnostics(df)
            for x, y, result in results:
                measures, bins = result
                print(measures)
        ```

    Args:
        *args:
            x, y: Lists or numpy arrays
            df: A Pandas DataFrame
        bins: Max number of bins for the hexagonal grid axis
            The data are internally binned starting with a (bins x bins) hexagonal grid
            and re-binned on progressively coarser grids until fewer than 250 non-empty
            bins remain.
        remove_outliers: If True, will remove outliers before calculations

    Returns:
        (measures, bins)
            measures is a dict with scores for each of 9 scagnostic measures.
                See pyscagnostics.measure_names for a list of measures

            bins is a 3 x n numpy array of x-coordinates, y-coordinates, and
                counts for the hex-bin grid. The x and y coordinates are re-scaled
                between 0 and 1000. This is returned for debugging and inspection purposes.

        If the input is a DataFrame, the output will be a generator yielding tuples of
        scagnostic results for each column pair:
            (x, y, (measures, bins))
    """
