GithubHelp home page GithubHelp logo

dsx's Introduction

<< Data Science Utilities (DSX)>>

The dsx package contains a collection of wrapper functions to simplify common operations in data analytics tasks. The core module ds_utils (data science utilities) is designed to work with DataFrame in Pandas to simplify common tasks.

The package can be can be used in the following setup:

  • Jupyter Notebook
  • Jupyter Lab
  • PyCharm's Python Console
  • iPython Console
  • Python Script

xqrid

Intallation

  • Installation using Pip:
    pip install dsx

Documentation

Full Documentation Site: Documentation

1. Core Module: "ds_utils"

The core module is "ds_utils". The module contains a list of functions that can accomplish common data analytics tasks with less codes. Basically, these functions are wrappers for commonly-used methods in Pandas, particularly methods of DataFrame object.

Some of the key features of the DataFrame utility functions are as following:

  • Generate metadata of columns in a DataFrame
    • Number & percentage of missing values
    • Number & percentage of unique values
    • Data Type
  • Generate accumulated percentage of values in a column
  • Quick Rename of a single column
  • Reorder columns of a DataFrame
  • Standardize column names into iPython-friendly names
  • Retrieve column name(s) by a partial keyword
  • Expand concatenated string in a column into child table
  • Visualize DataFrame object
    • DataGrid Viewer
    • Pivot Table Viewer
    • Quick Analyzer (Pivot table and visualizations)

1.1 Usage

Below is example codes for importing the module:

    from dsx.ds_utils import *

There are 2 categories of methods in dsx's classes, which are to be called in different ways:

  • Methods: Dynamic functions of the class's instance
    • Invoke through the extended domain ('ds') of the native DataFrame object
    df = pd.read_excel(os.path.join(os.getcwd(), "data.xlsx"))
    df.ds.isnull("Column_Name")
  • Static functions Static functions from the class's object
    • Invoke as a static function of pd_utils class
    df = pd.read_excel(os.path.join(os.getcwd(), "data.xlsx"))
    dsx.isnull(df, "Column_Name")

xpvt

2. Data Science Workflow "ds_workflow" (Active Development / Work-In-Progress)

The "ml_utils" module contains the methods for simplifying common tasks in a data science workflow. The methods are built on top of the functions in the core module "pd_utils".

Some of the key features of the module are as the following:

  • Get the column name of the features that are categorical
  • Get the column name of the features that are numerical
  • Create or merge the dummy variables created from categorical features with option to use k-1 dummification
  • Data Exploration
    • Generate barplot and accumulated percentage report for all the categorical features
    • Generate distribution plot for all the numerical features
    • Generate heatmap of the the correlation matrix
  • Preprocessing
    • Create a dataframe with all standardized features merged with other features
    • Generate features list
  • Model Assessment
    • Generate Recall-Precision-Threshold Curve
    • Generate truepositive_falsepositive Curve

2.1 Usage

The methods in the module are only callable as the extended domain 'ml' in the native Pandas DataFrame object.

Calling a method in "ml_workflow":

    df = pd.read_excel(os.path.join(os.getcwd(), "data.xlsx"))
    
    cols_categorical = df.ml.get_features_categorical()

dsx's People

Contributors

nick-dsaid avatar nictsyen avatar

Stargazers

 avatar

Watchers

 avatar

Forkers

nick-dsaid

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.