canimus / auto-eda

This project forked from darenasc/auto-eda

Automated Exploratory Data Analysis. Simplifying Data Exploration

Jupyter Notebook 10.91% Python 11.54% PLpgSQL 48.37% TSQL 29.19%

Auto-EDA

You can check some examples in the documentation.

Basic data exploration for databases. Currently supported engines:

  • MSSQL Server
  • MySQL
  • SQLite
  • PostgreSQL
  • Oracle

Given two connections, one to a source database and one to a target (metadata) database, it collects metadata for exploration, such as:

  • Number of rows and columns.
  • Number of distinct values and nulls per column.
  • Distribution of the categorical variables.
  • Statistics of the numerical variables.
  • Trends from time series data.
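As an illustration of the first two items, the following is a minimal sketch of how row counts, column counts, and per-column distinct and null counts could be gathered. It is not the project's own code: the function name `profile_table` and the `orders` table are hypothetical, and it uses SQLite through Python's standard `sqlite3` module so that it runs without any external database.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier; identifiers cannot be bound as parameters."""
    return '"' + name.replace('"', '""') + '"'


def profile_table(conn, table):
    """Return row count, column count, and distinct/null counts per column."""
    cur = conn.cursor()
    qtable = quote_ident(table)

    n_rows = cur.execute(f"SELECT COUNT(*) FROM {qtable}").fetchone()[0]

    # PRAGMA table_info is SQLite-specific. Other engines expose column
    # lists through information_schema or their own system catalogs.
    columns = [row[1] for row in cur.execute(f"PRAGMA table_info({qtable})")]

    column_stats = {}
    for col in columns:
        qcol = quote_ident(col)
        distinct, nulls = cur.execute(
            f"SELECT COUNT(DISTINCT {qcol}), "
            f"SUM(CASE WHEN {qcol} IS NULL THEN 1 ELSE 0 END) "
            f"FROM {qtable}"
        ).fetchone()
        # SUM returns NULL on an empty table, so fall back to 0.
        column_stats[col] = {"distinct": distinct, "nulls": nulls or 0}

    return {"rows": n_rows, "columns": len(columns), "column_stats": column_stats}


# Hypothetical demo data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "open", 10.0), (2, "closed", None), (3, "open", 7.5), (4, None, 7.5)],
)
profile = profile_table(conn, "orders")
print(profile)
```

Note that `COUNT(DISTINCT ...)` ignores NULLs, which is why nulls are counted separately.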

The metadata from the source database is stored in a metadata database, where it is accessible to any visualization tool.
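A minimal sketch of that storage step follows, again using SQLite for portability. The table name `column_metadata`, its columns, and the function `store_column_metadata` are assumptions made for this example and may not match the schema the project actually creates.

```python
import sqlite3


def store_column_metadata(meta_conn, source_name, table, column_stats):
    """Write per-column statistics for one source table into the metadata DB."""
    # Hypothetical schema: one row per (source, table, column).
    meta_conn.execute(
        """
        CREATE TABLE IF NOT EXISTS column_metadata (
            source_name TEXT NOT NULL,
            table_name  TEXT NOT NULL,
            column_name TEXT NOT NULL,
            n_distinct  INTEGER,
            n_nulls     INTEGER,
            PRIMARY KEY (source_name, table_name, column_name)
        )
        """
    )
    rows = [
        (source_name, table, col, stats["distinct"], stats["nulls"])
        for col, stats in column_stats.items()
    ]
    # INSERT OR REPLACE is SQLite syntax; other engines use MERGE or
    # INSERT ... ON CONFLICT / ON DUPLICATE KEY UPDATE instead.
    meta_conn.executemany(
        "INSERT OR REPLACE INTO column_metadata VALUES (?, ?, ?, ?, ?)", rows
    )
    meta_conn.commit()


# Hypothetical statistics for a source table called "orders".
meta_conn = sqlite3.connect(":memory:")
store_column_metadata(
    meta_conn,
    "sales_db",
    "orders",
    {"status": {"distinct": 2, "nulls": 1}, "amount": {"distinct": 2, "nulls": 1}},
)

# A visualization tool would read the stored metadata with plain SQL.
for row in meta_conn.execute(
    "SELECT table_name, column_name, n_distinct, n_nulls "
    "FROM column_metadata ORDER BY column_name"
):
    print(row)
```

Because the statistics live in ordinary tables, any tool that can query the metadata database can chart them without touching the source database again.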

How to use AutoEDADB

  • Clone or download the package.
  • Create two connections as described here: one to a source database and one to the metadata database.
    • Source database: This is the database you want to explore. No additional information is needed, just a valid connection to the database.
    • Metadata database: It can be created if it does not exist. This database stores the information collected from the source databases.
  • Edit the two connection strings, then edit the call to describe_server(<YOUR_SERVER>) in explorer.py.
  • Run it with python explorer.py

To Do

  • Use samples for large tables.
  • Update frequencies at once after collecting all the distinct values.
  • Encapsulate SQL code and reference it by engine: 'sqlserver', 'mysql', 'postgres', 'sqlite', etc.
  • Add multithreaded processing of the queries.
  • Resume mode; currently it deletes everything and inserts it again.
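
One possible shape for the "reference SQL by engine" item is a dictionary of query templates keyed by engine name. The sketch below is only an assumption about how this could look, not the project's design: `ROW_LIMIT_TEMPLATES` and `limited_query` are hypothetical names. It uses row limiting as the example because the syntax genuinely differs between engines.

```python
# Row-limiting syntax per engine. The keys follow the engine names listed
# in the To Do item; "oracle" is added here for completeness.
ROW_LIMIT_TEMPLATES = {
    "sqlserver": "SELECT TOP {n} * FROM {table}",
    "mysql": "SELECT * FROM {table} LIMIT {n}",
    "postgres": "SELECT * FROM {table} LIMIT {n}",
    "sqlite": "SELECT * FROM {table} LIMIT {n}",
    # FETCH FIRST requires Oracle 12c or later.
    "oracle": "SELECT * FROM {table} FETCH FIRST {n} ROWS ONLY",
}


def limited_query(engine, table, n):
    """Build a query returning at most n rows, using the engine's own syntax."""
    try:
        template = ROW_LIMIT_TEMPLATES[engine]
    except KeyError:
        raise ValueError(f"Unsupported engine: {engine!r}") from None
    # The table name is interpolated as-is, so it must come from a trusted
    # source such as the database catalog, never from user input.
    return template.format(table=table, n=int(n))


print(limited_query("sqlserver", "orders", 100))
print(limited_query("postgres", "orders", 100))
```

These queries return the first rows the engine happens to produce, not a random sample; a real sampling feature for large tables would need engine-specific sampling clauses on top of this lookup.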

Contributors

  • darenasc
