geekyme / taxianalytics

A casual exercise to build a simple data processing application

Dockerfile 1.82% Shell 20.41% Go 77.77%

Taxi Analytics

Above is the system design for this application. For simplicity, the 'Worker' logic and the 'Visualizer' logic are both handled in the same Go program.

Major design decisions

Pull-based subscription, so that multiple workers can consume from the same subscription and writes can scale up with the number of workers.
A time-series database (InfluxDB), for efficient queries, since we are interested in a metric over time.
Batched writes, so that we minimize I/O overhead to the database.
Separate data processing, storage, and visualization. We may want to visualize the same data points in various ways (rides / hr, average meter reading) or scale each of these concerns independently.

Other considerations

At larger scale or with more complex requirements, we may want to ingest data into a pipeline of Apache Spark functions.
At larger scale, a single InfluxDB node may not be sufficient for high availability and resilience. For that, an InfluxDB Enterprise installation adds clustering across multiple nodes.

Setup Minikube

Ensure you have minikube set up.
Add helm and tiller for an easier install of InfluxDB.

Setup InfluxDB

helm install --name v1 stable/influxdb
You should now be able to get the hostname of InfluxDB, e.g. http://v1-influxdb.default:8086
Create a database called taxianalytics
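The steps above do not say how to create the database. One option, assuming the chart installs InfluxDB 1.x, is to POST an InfluxQL statement to its HTTP /query endpoint. The sketch below is an illustration under that assumption and is not taken from the repository; the helper name createDatabaseRequest is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strings"
	"time"
)

// createDatabaseRequest builds a POST to the InfluxDB 1.x /query endpoint
// carrying a CREATE DATABASE statement in the form-encoded "q" parameter.
func createDatabaseRequest(host, db string) (*http.Request, error) {
	form := url.Values{}
	form.Set("q", fmt.Sprintf("CREATE DATABASE %q", db))
	endpoint := strings.TrimRight(host, "/") + "/query"
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	host := os.Getenv("DB_HOST")
	if host == "" {
		// Nothing is sent unless a host was provided.
		fmt.Println("DB_HOST is not set; skipping database creation")
		return
	}
	req, err := createDatabaseRequest(host, "taxianalytics")
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not build request:", err)
		return
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("InfluxDB responded with", resp.Status)
}
```

The host has to be reachable from wherever this runs, for example from inside the cluster or through a port-forward.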

Setup App

Set up your Google Cloud project and topic for PubSub. We will need the project ID and topic name later.
Get your Google Cloud service account key and save it in <project_root>/key.json.
Set the key. export GCLOUD_KEY=$(cat key.json)
Set your Google Cloud project. export TAXI_PROJECT=<gcloud_project_id>
Set the PubSub subscription name. export TAXI_SUB_NAME=<gcloud_pubsub_topic>
Set the database host name. export DB_HOST=<influx_db_host>
Run the app. go run main.go
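The app depends on all four environment variables being present. As a minimal sketch of how a Go program might read and check them at startup (the Config type and loadConfig function are hypothetical and not taken from main.go):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Config holds the four settings exported in the steps above.
type Config struct {
	GCloudKey string // contents of key.json
	Project   string // TAXI_PROJECT
	SubName   string // TAXI_SUB_NAME
	DBHost    string // DB_HOST
}

// loadConfig reads the variables through lookup (os.LookupEnv in normal use)
// and reports every variable that is missing or empty in a single error.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	var missing []string
	get := func(name string) string {
		v, ok := lookup(name)
		if !ok || v == "" {
			missing = append(missing, name)
		}
		return v
	}
	cfg := Config{
		GCloudKey: get("GCLOUD_KEY"),
		Project:   get("TAXI_PROJECT"),
		SubName:   get("TAXI_SUB_NAME"),
		DBHost:    get("DB_HOST"),
	}
	if len(missing) > 0 {
		return Config{}, fmt.Errorf("missing environment variables: %s", strings.Join(missing, ", "))
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// The key itself is deliberately not printed.
	fmt.Printf("project=%s subscription=%s db=%s\n", cfg.Project, cfg.SubName, cfg.DBHost)
}
```

Reporting all missing variables at once avoids fixing them one failed run at a time.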

Deploy App in Minikube

All of the environment variables above must be set.
Run the deploy script using sh deploy.sh
