postgres-madlib

This image provides a PostgreSQL 11.5 database with the MADlib extension installed. The instance comes with preloaded datasets.

Automated builds are available on Docker Hub.

Installation

Prerequisites

Docker and docker-compose, which the commands below rely on.

Connect to database

docker-compose.yml

version: '3.7'

services:
  postgres:
    image: jonixis/postgres-madlib:latest
    container_name: postgres-madlib
    ports:
      - "5432:5432"
    restart: unless-stopped
    volumes:
      - data:/var/lib/postgresql/data

volumes:
  data:

Start the container: save the file above as 'docker-compose.yml' and run the following command in the same directory. The image is pulled from Docker Hub automatically.

sudo docker-compose up -d

Connect to PostgreSQL from the host, e.g. with pgcli.

pgcli -h localhost -U postgres -d postgres
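On the first startup the container is still loading the datasets, so the connection may be refused for a short while. A minimal sketch of a retry helper follows; the function name `retry` and the attempt and delay values are illustrative, and the usage line assumes `pg_isready` from the PostgreSQL client tools is installed on the host.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on success, 1 if every attempt failed.
# (Hypothetical helper, not part of this image or of Docker.)
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example: wait up to about 60 seconds, then connect.
# retry 30 2 pg_isready -h localhost -p 5432 \
#     && pgcli -h localhost -U postgres -d postgres
```

The helper only checks the exit status of the command it is given, so any other readiness check can be substituted for `pg_isready`.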

Data

The datasets are loaded automatically on the first startup of the container. The database is persisted in a Docker volume, so it survives even if the container is deleted. To reset the database, remove the container, delete the volume, and start the container again:

Stop and remove the container.

sudo docker-compose down

Delete the Docker volume that holds the postgres data (run 'sudo docker volume ls' to check the name on your machine).

sudo docker volume rm docker-postgres-madlib_data

Start the container again.

sudo docker-compose up -d
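The volume name used above follows from how docker-compose names volumes: the project name, an underscore, and the volume key from the compose file. The project name defaults to the name of the directory that holds 'docker-compose.yml'. The sketch below approximates that rule; the function name `compose_volume_name` is hypothetical, and the normalization (lowercase, keep only letters, digits, '-' and '_') is an assumption that may not match every docker-compose version exactly.

```shell
#!/bin/sh
# compose_volume_name PROJECT_DIR VOLUME_KEY
# Prints the volume name docker-compose is expected to create for
# VOLUME_KEY when the compose file lives in PROJECT_DIR.
# Assumption: no -p flag or COMPOSE_PROJECT_NAME override is in use.
compose_volume_name() {
    project=$(basename "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9_-')
    printf '%s_%s\n' "$project" "$2"
}

# Example: remove the data volume of the project in the current directory.
# sudo docker volume rm "$(compose_volume_name "$PWD" data)"
```

Alternatively, `sudo docker-compose down -v` stops the container and removes the named volumes declared in the compose file in one step.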

Example Query

To run a logistic regression on the admission table, run the following queries.

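Convert 'chance_of_admit' to an integer (0 or 1), because logistic regression models a binary outcome. The cast rounds each value to the nearest integer and overwrites the original values in the table.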

ALTER TABLE admission ALTER COLUMN chance_of_admit TYPE integer;

Train:

DROP TABLE IF EXISTS admission_logregr, admission_logregr_summary;
SELECT logregr_train( 'admission',                                                                       -- Source table
                      'admission_logregr',                                                               -- Output table
                      'chance_of_admit',                                                                 -- Dependent variable
                      'ARRAY[1, gre_score, toefl_score, university_rating, sop, lor, cgpa, research]',   -- Feature vector
                      NULL,                                                                              -- Grouping
                      20,                                                                                -- Max iterations
                      'irls'                                                                             -- Optimizer to use
                    );

Predict:

-- Display prediction value along with the original value
SELECT a.serial_no, logregr_predict(coef, ARRAY[1, gre_score, toefl_score, university_rating, sop, lor, cgpa, research]),
       a.chance_of_admit::BOOLEAN
FROM admission a, admission_logregr m
ORDER BY a.serial_no;
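To make the prediction step more concrete, the sketch below computes the textbook logistic function of the dot product between a coefficient vector and a feature vector. This is the standard logistic regression formula, not code taken from MADlib, and the function name `logistic_prob` and the sample numbers are purely illustrative. `logregr_predict` returns a boolean; MADlib also documents `logregr_predict_prob` for obtaining the probability itself.

```shell
#!/bin/sh
# logistic_prob "COEFFICIENTS" "FEATURES"
# Both arguments are space-separated lists of numbers of equal length.
# Prints 1 / (1 + exp(-z)) with z = sum(coef[i] * feature[i]),
# rounded to four decimals. (Hypothetical helper for illustration.)
logistic_prob() {
    awk -v coef="$1" -v feat="$2" 'BEGIN {
        n = split(coef, c, " ")
        m = split(feat, x, " ")
        if (n != m) {
            print "length mismatch" > "/dev/stderr"
            exit 1
        }
        z = 0
        for (i = 1; i <= n; i++) z += c[i] * x[i]
        printf "%.4f\n", 1 / (1 + exp(-z))
    }'
}

# The leading 1 in the feature list plays the role of the intercept term,
# mirroring ARRAY[1, gre_score, ...] in the queries above.
# logistic_prob "0 0" "1 1"    # z = 0
# logistic_prob "1 2" "1 1"    # z = 3
```

A probability of 0.5 or more corresponds to a predicted value of true under the usual decision threshold.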

MADlib Documentation for Logistic Regression: https://madlib.apache.org/docs/latest/group__grp__logreg.html#examples
Source dataset admission_table.csv: https://www.kaggle.com/mohansacharya/graduate-admissions

Source dataset rhc.csv: http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets
