Usage-Based Data Discovery Tool - A simple web application

Setting up the Neptune graph database

The easiest way to set up the Neptune graph is through with a pre-defined AWS CloudFormation Stack. Easy startup instructions can be found at https://docs.aws.amazon.com/neptune/latest/userguide/get-started-create-cluster.html. The Region is very important: make sure you are using a region where you have an Internet Gateway. Also, note that some regions require HTTPS (vs. HTTP). In any case, HTTPS is better practice.) Once you decide on Region, make sure to create the EC2 key pair in that Region.

With Region in mind and Key Pair in hand, choose one of the Launch Stack buttons in the section "Using an AWS CloudFormation Stack to Create a Neptune DB Cluster".

To Do: Add where to find important values...

At the end of this, you will have a Neptune DB Cluster and at least one DB Instance. You will also have an EC2 host through which to access the Neptune Database (it has no externally reachable endpoint).

Setting up to run the application components

The following instructions are verified for the EC2 host created by the Neptune stack.

SSH to the EC2 host

Run this locally:

Clone this repository.

$ cd into the project directory.

Install Python virtual environments , if not already installed

$ pip3 install virtualenv

Create a Python virtual environment with the command

$ python3 -m venv ./venv.

For simplicity, we name it 'venv'.

Start up the virtual environment using the following command

$ source venv/bin/activate.

Install the required python packages

$ pip3 install -r requirements.txt

Loading the graph data

Loading the graph data is currently a two step process: Parse the CSV file with the relationships, outputting a new CSV with additiona information and in the expected structure:

$ python3 parse_csv.pl -i <input_file> -o algo-output.csv

Load the graph into the database

$ python load_graph.pl

To Do: make the filename a command line switch instead of requiring "algo-output.csv".

Deploying the Web User Interface application

The Web Interface for Usage-based Discovery is written in Python Flask, and deployed using Zappa.

Load your AWS credentials into the environment

$ aws configure

Initialize Zappa

$ zappa init

Modify the zappa_settings.json file

You will need to add the vpc_config section and environment_variables. The Neptune environment variable can be obtained from the Neptune console: click on the Neptune cluster name. The value for NEPTUNEDBRO will be under the Connectivity & security tab in the Endpoints table. Choose the one where Type=Reader. (Note that the wss prefix indicates it is using secure web sockets.) To get the subnet_ids, click the Subnet groups on the side panel of the Neptune cluster.

{
    "dev": {
        "app_function": "app.app",
        "aws_region" : "us-west-1",
        "slim_handler" : true,
        "manage_roles" : true,
        "profile_name": null,
        "project_name": "ubd-tool"
        "runtime": "python3.7",
        "s3_bucket": "zappa-qdzy6v21k"
        "vpc_config": {
                "SubnetIds": [ "subnet-0a4cf776ec14a1ad1", "subnet-05c38176413aed035"],
                "SecurityGroupIds": [ "sg-04932af6ea6405a98" ]
        }
        "environment_variables": {
                "NEPTUNEDBRO": "wss://neptunedbcluster-dijacguygxid.cluster-ro-copyeo4gkrow.us-west-1.neptune.amazonaws.com:8182/gremlin"
        }
    }
}

To deploy, now type:

$ zappa deploy dev

This will create a lambda function with the code from app.py and instantiate an API gateway. Static files will go into the S3 bucket listed in the config file.

(Q: does bucket need to be pre-created? Need to find out...)

To run the application in local developer mode

$ python3 app.py

Browse to http://localhost:5000

Any questions or suggestions for improvement?

Questions - email [email protected]
Suggestions - submit a pull request

Ways to contribute to this project: coming soon!

clynnes / ubd-tool-remotedb Goto Github PK

ubd-tool-remotedb's Introduction

Usage-Based Data Discovery Tool - A simple web application

Setting up the Neptune graph database

Setting up to run the application components

Loading the graph data

Deploying the Web User Interface application

To run the application in local developer mode

ubd-tool-remotedb's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs