
isabella232 / autonormalize


This project forked from alteryx/autonormalize


python library for automated dataset normalization

Home Page: https://blog.featurelabs.com/automatic-dataset-normalization-for-feature-engineering-in-python/

License: BSD 3-Clause "New" or "Revised" License

Python 98.06% Makefile 1.30% Shell 0.63%


AutoNormalize


AutoNormalize is a Python library for automated data table normalization. It allows you to build an EntitySet from a single denormalized table and generate features for machine learning using Featuretools.
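To make "normalization" concrete: a denormalized table repeats attributes that depend on something other than the row itself, such as a customer name copied onto every order. The sketch below splits a list of row dicts along one dependency key --> dependents into a child table and a deduplicated parent table, the kind of parent/child pair an EntitySet relationship links. `split_on_dependency` is a hypothetical helper written for illustration; it is not autonormalize's implementation, which works on pandas DataFrames and handles many dependencies at once.

```python
def split_on_dependency(rows, key, dependents):
    """Split one table into (child, parent) along the dependency key --> dependents.

    Hypothetical helper for illustration only. rows is a list of dicts,
    key is a list of column names, and dependents are the columns that
    key determines.
    """
    parent = {}
    for row in rows:
        k = tuple(row[c] for c in key)
        # Keep the first row seen for each key value. If the dependency is
        # only approximate, later conflicting dependent values are dropped.
        parent.setdefault(k, {c: row[c] for c in list(key) + list(dependents)})
    # The child keeps the key columns as a foreign key to the parent.
    child = [{c: v for c, v in row.items() if c not in dependents} for row in rows]
    return child, list(parent.values())


orders = [
    {"order_id": 1, "customer_id": "a", "customer_name": "Ann"},
    {"order_id": 2, "customer_id": "a", "customer_name": "Ann"},
    {"order_id": 3, "customer_id": "b", "customer_name": "Bob"},
]
child, parent = split_on_dependency(orders, ["customer_id"], ["customer_name"])
print(child)   # customer_name removed; customer_id kept as a foreign key
print(parent)  # [{'customer_id': 'a', 'customer_name': 'Ann'}, {'customer_id': 'b', 'customer_name': 'Bob'}]
```

Keeping the first value seen is an arbitrary choice for this sketch; a majority vote per key would be an equally reasonable policy when the dependency holds only approximately.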

Getting Started

Install

pip install featuretools[autonormalize]

Uninstall

pip uninstall autonormalize

Demos

API Reference

auto_entityset

auto_entityset(df, accuracy=0.98, index=None, name=None, time_index=None)

Creates a normalized EntitySet from a dataframe.

Arguments:

  • df (pd.DataFrame) : the dataframe containing the data

  • accuracy (0 < float <= 1.00; default = 0.98) : the accuracy threshold required to conclude that a dependency holds (i.e. with accuracy = 0.98, at least 98% of the rows must satisfy the dependency LHS --> RHS)

  • index (str, optional) : name of the column intended to be the index of df

  • name (str, optional) : name of the created EntitySet

  • time_index (str, optional) : name of the time column in the dataframe

Returns:

  • entityset (ft.EntitySet) : the created EntitySet
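The `accuracy` threshold can be illustrated with a small, self-contained sketch. It assumes that "rows satisfying the dependency" is measured the way approximate functional dependencies are commonly scored: within each distinct LHS value, the rows carrying the most frequent RHS value count as satisfying it, and the rest count as violations. `dependency_accuracy` and `holds` are hypothetical helpers, not part of the autonormalize API, and the library's internal computation may differ.

```python
from collections import Counter, defaultdict


def dependency_accuracy(rows, lhs, rhs):
    """Fraction of rows consistent with the dependency lhs --> rhs.

    Hypothetical helper, not part of autonormalize. For each distinct LHS
    value, rows carrying the most common RHS value count as satisfying the
    dependency; the remaining rows in that group violate it.
    """
    if not rows:
        return 1.0
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[col] for col in lhs)][row[rhs]] += 1
    satisfied = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return satisfied / len(rows)


def holds(rows, lhs, rhs, accuracy=0.98):
    # Accept the dependency when its accuracy meets the threshold.
    return dependency_accuracy(rows, lhs, rhs) >= accuracy


# 50 rows in which zip determines city, except for one misspelled city.
rows = [{"zip": z, "city": c} for z, c in [("02139", "Cambridge"), ("10001", "New York")] * 25]
rows[0]["city"] = "Cambrdge"
print(dependency_accuracy(rows, ["zip"], "city"))  # 0.98
print(holds(rows, ["zip"], "city"))                # True
print(holds(rows, ["zip"], "city", accuracy=1.0))  # False
```

With the default threshold, one typo in 50 rows does not hide the zip --> city dependency, while accuracy=1.0 only accepts dependencies that hold exactly.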

find_dependencies

find_dependencies(df, accuracy=0.98, index=None)

Finds dependencies within a dataframe using the DFD search algorithm.

Returns:

  • dependencies (Dependencies) : the dependencies found in the data within the constraints provided
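What a dependency search returns can be sketched with a deliberately naive brute-force search. This is plainly not DFD: it enumerates candidate LHS sets in order of size and keeps only minimal ones, whereas DFD traverses the attribute lattice with a depth-first random walk and prunes the search space. `naive_find_dependencies` and its `max_lhs` parameter are hypothetical, and it returns a plain dict `{rhs: [lhs_tuple, ...]}` instead of autonormalize's Dependencies object.

```python
from collections import Counter, defaultdict
from itertools import combinations


def _accuracy(rows, lhs, rhs):
    # Fraction of rows that agree with the majority RHS value of their LHS group.
    if not rows:
        return 1.0
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[c] for c in lhs)][row[rhs]] += 1
    return sum(c.most_common(1)[0][1] for c in groups.values()) / len(rows)


def naive_find_dependencies(rows, columns, accuracy=0.98, max_lhs=2):
    """Brute-force search for minimal dependencies LHS --> RHS.

    Hypothetical stand-in for illustration, NOT the DFD algorithm.
    Empty LHS sets (constant columns) are not considered.
    """
    found = {}
    for rhs in columns:
        others = [c for c in columns if c != rhs]
        minimal = []
        for size in range(1, max_lhs + 1):
            for lhs in combinations(others, size):
                # A superset of an LHS that already determines rhs is not minimal.
                if any(set(m) <= set(lhs) for m in minimal):
                    continue
                if _accuracy(rows, lhs, rhs) >= accuracy:
                    minimal.append(lhs)
        found[rhs] = minimal
    return found


orders = [
    {"order_id": 1, "customer_id": "a", "customer_name": "Ann"},
    {"order_id": 2, "customer_id": "a", "customer_name": "Ann"},
    {"order_id": 3, "customer_id": "b", "customer_name": "Bob"},
]
deps = naive_find_dependencies(orders, ["order_id", "customer_id", "customer_name"])
print(deps)
# {'order_id': [], 'customer_id': [('order_id',), ('customer_name',)],
#  'customer_name': [('order_id',), ('customer_id',)]}
```

The cost grows combinatorially with the number of columns, which is why a pruning strategy such as DFD matters on real tables. On tiny samples like this one, any unique column trivially determines every other column, so spurious dependencies are expected.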

normalize_dataframe

normalize_dataframe(df, dependencies)

Normalizes a dataframe based on the given dependencies. Keys for the newly created DataFrames can only be columns that are strings, ints, or categories. Keys are chosen according to the following priority:

  1. shortest length
  2. has "id" in some form in the name of an attribute
  3. has the attribute furthest to the left in the table

Returns:

  • new_dfs (list[pd.DataFrame]) : list of new dataframes
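The key-selection priority used by normalize_dataframe can be sketched as a sort key. Two interpretations here are assumptions: "length" is taken to mean the number of attributes in a key, and "has id in some form" is treated as a case-insensitive substring test. The restriction to string, int, or category columns is not modeled. `choose_key` is a hypothetical helper, and autonormalize may break ties differently.

```python
def choose_key(candidate_keys, columns):
    """Pick one key from candidate_keys using the documented priority.

    Hypothetical helper. candidate_keys is an iterable of tuples of column
    names; columns lists the table's columns from left to right.
    """
    position = {name: i for i, name in enumerate(columns)}

    def mentions_id(key):
        # Assumption: "id in some form" is a case-insensitive substring test.
        return any("id" in attr.lower() for attr in key)

    def rank(key):
        return (
            len(key),                             # 1. fewest attributes first
            0 if mentions_id(key) else 1,         # 2. then keys mentioning "id"
            min(position[attr] for attr in key),  # 3. then the leftmost attribute
        )

    return min(candidate_keys, key=rank)


columns = ["name", "email", "customer_id", "zip", "street"]
candidates = [("zip", "street"), ("email",), ("customer_id",), ("name",)]
print(choose_key(candidates, columns))               # ('customer_id',)
print(choose_key([("email",), ("name",)], columns))  # ('name',)
```

Encoding the priority as a tuple lets Python's lexicographic tuple comparison apply the three rules in order, so a later rule only matters when all earlier ones tie.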

make_entityset

make_entityset(df, dependencies, name=None, time_index=None)

Creates a normalized EntitySet from a dataframe based on the given dependencies. Keys are chosen in the same fashion as for normalize_dataframe, and a new index is created if any key has more than a single attribute.

Returns:

  • entityset (ft.EntitySet) : created EntitySet
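The rule that a new index is created when a key has more than one attribute can be sketched as replacing a composite key with a surrogate integer column. The helper `add_surrogate_index`, the index column name, and the 0-based numbering are all assumptions for illustration; the name and values autonormalize actually generates may differ.

```python
def add_surrogate_index(child, parent, key, index_name):
    """Replace a multi-column key with one integer index column.

    Hypothetical helper. parent must hold one row per distinct key value,
    and every key value in child must appear in parent.
    """
    if len(key) == 1:
        return child, parent  # a single-column key can serve as the index as-is
    ids = {}
    new_parent = []
    for row in parent:
        k = tuple(row[c] for c in key)
        ids[k] = len(ids)  # number distinct key values 0, 1, 2, ...
        new_parent.append({index_name: ids[k], **row})
    new_child = []
    for row in child:
        k = tuple(row[c] for c in key)
        # Drop the key columns from the child and reference the new index instead.
        rest = {c: v for c, v in row.items() if c not in key}
        new_child.append({**rest, index_name: ids[k]})
    return new_child, new_parent


orders = [
    {"order_id": 1, "zip": "02139", "street": "Main St"},
    {"order_id": 2, "zip": "02139", "street": "Main St"},
    {"order_id": 3, "zip": "10001", "street": "Broadway"},
]
addresses = [
    {"zip": "02139", "street": "Main St", "city": "Cambridge"},
    {"zip": "10001", "street": "Broadway", "city": "New York"},
]
new_orders, new_addresses = add_surrogate_index(orders, addresses, ["zip", "street"], "zip_street_id")
print(new_orders)
# [{'order_id': 1, 'zip_street_id': 0}, {'order_id': 2, 'zip_street_id': 0},
#  {'order_id': 3, 'zip_street_id': 1}]
```

A single integer column gives the parent table a simple index and gives the child a single foreign-key column to relate on.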

normalize_entity

normalize_entity(es, accuracy=0.98)

Returns a new normalized EntitySet from an EntitySet with a single entity.

Arguments:

  • es (ft.EntitySet) : EntitySet with a single entity to normalize

Returns:

  • new_es (ft.EntitySet) : new normalized EntitySet

Built at Alteryx Innovation Labs



Contributors

allisonportis, thehomebrewnerd, jeff-hernandez, kmax12, rwedge, tuethan1999, gsheni, dependabot[bot]
