minio-hive-trino-superset-airflow's Introduction

Get things running

  1. Clone the repo
  2. Install Docker and docker-compose
  3. Run docker-compose up -d
  4. Run bash superset_init.sh
  5. Done! Check out the service endpoints:

Trino: http://localhost:8080/ui/ (username can be anything)
MinIO: http://localhost:9001/ (username: minio_access_key, password: minio_secret_key)
Superset: http://localhost:8088/ (username: admin, password: admin)
Airflow: https://localhost:9090/ (username: admin, password: airflow)
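
Once the containers are up, a quick reachability probe can confirm that each UI answers before you log in. The sketch below is not part of the repo: the `ENDPOINTS` mapping copies the URLs listed above, while the helper name `is_reachable` and its 3-second timeout are assumptions made for illustration.

```python
import http.client
import urllib.error
import urllib.request

# URLs copied from the endpoint list above.
ENDPOINTS = {
    "Trino": "http://localhost:8080/ui/",
    "MinIO": "http://localhost:9001/",
    "Superset": "http://localhost:8088/",
    "Airflow": "https://localhost:9090/",
}


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A 4xx/5xx reply still means the service is up (e.g. a login wall).
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, TLS failure, or unknown URL scheme.
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        status = "up" if is_reachable(url) else "not reachable"
        print(f"{name:<10} {url:<30} {status}")
```

An HTTPS endpoint that serves a self-signed certificate will be reported as not reachable by this probe, because `urlopen` verifies certificates by default.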

Connect to Trino in Superset:

  1. Open the Data dropdown and click Databases
  2. Click the + Database button
  3. For Select a database to connect, choose presto
  4. In the SQLALCHEMY URI field, enter trino://hive@trino-coordinator:8080/hive
  5. Switch to the Advanced tab
  6. Under SQL Lab, select all options
  7. Under Security, select Allow data upload
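
The SQLALCHEMY URI in step 4 packs several settings into one string. As a rough illustration, the sketch below splits it with Python's standard `urllib.parse`. Reading the parts as user, host, port, and catalog follows the usual `trino://<user>@<host>:<port>/<catalog>` layout, and the helper name `describe_trino_uri` is made up for this example.

```python
from urllib.parse import urlsplit


def describe_trino_uri(uri):
    """Split a trino:// SQLAlchemy URI into its named parts."""
    parts = urlsplit(uri)
    return {
        "dialect": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        # The first path segment is the catalog; a second one would be the schema.
        "catalog": parts.path.strip("/").split("/")[0],
    }


info = describe_trino_uri("trino://hive@trino-coordinator:8080/hive")
print(info)
```

The host `trino-coordinator` is presumably the service name on the Docker network, which would explain why Superset connects to it rather than to `localhost`.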

Trino CLI

docker exec -it trino-hive-superset-docker_trino-coordinator_1 trino

Upload a Parquet file to the MinIO bucket datalake, then run these commands:

CREATE SCHEMA IF NOT EXISTS hive.LPD_datasets_metadata
WITH (location = 's3a://datalake/');

-- The path s3a://datalake/ is the directory that holds the data files. Do not give the full file path, only the parent directory.
CREATE TABLE IF NOT EXISTS hive.LPD_datasets_metadata.LPD_datasets_metadata (
  img_name VARCHAR,
  size_img INTEGER,
  img_w INTEGER,
  img_h INTEGER,
  area_img INTEGER,
  x_min DOUBLE,
  y_min DOUBLE,
  x_max DOUBLE,
  y_max DOUBLE,
  bbox_w DOUBLE,
  bbox_h DOUBLE,
  area_bbox DOUBLE,
  xmin_scale DOUBLE,
  ymin_scale DOUBLE,
  xmax_scale DOUBLE,
  ymax_scale DOUBLE,
  area_scale DOUBLE,
  bbox_wscale DOUBLE,
  bbox_hscale DOUBLE
)
WITH (
  external_location = 's3a://datalake/',
  format = 'PARQUET'
);
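
After the table is created, a couple of read queries can confirm that Trino sees the Parquet data. The sketch below builds non-interactive `docker exec ... trino --execute` invocations from Python. The container name is the one used in the Trino CLI section, the two queries are illustrative, and the helper names `build_trino_command` and `run_query` are assumptions rather than anything shipped with the repo.

```python
import shlex
import subprocess

# Container name taken from the Trino CLI section; adjust if yours differs.
CONTAINER = "trino-hive-superset-docker_trino-coordinator_1"
TABLE = "hive.LPD_datasets_metadata.LPD_datasets_metadata"

# Illustrative checks: a row count and a small sample of known columns.
CHECK_QUERIES = [
    f"SELECT count(*) FROM {TABLE}",
    f"SELECT img_name, img_w, img_h FROM {TABLE} LIMIT 5",
]


def build_trino_command(sql, container=CONTAINER):
    """Return the argv list that runs one SQL statement through the Trino CLI."""
    return ["docker", "exec", container, "trino", "--execute", sql]


def run_query(sql):
    """Run a statement in the container and return the CLI's stdout."""
    result = subprocess.run(
        build_trino_command(sql), capture_output=True, text=True, check=True
    )
    return result.stdout


# Print the shell-quoted commands so they can be inspected or pasted as-is.
for query in CHECK_QUERIES:
    print(shlex.join(build_trino_command(query)))
```

On a machine where the stack is running, `run_query(CHECK_QUERIES[0])` would execute the row count. The `-it` flags are left out on purpose, since a subprocess has no TTY to attach.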

Airflow

Before setting up and running tasks, attach to the Airflow container and install the libraries those tasks require.
