KnowledgeGraph-Terraform-Flask-app

  • Description: Deployment framework for secure, autoscaling, high-availability microservices on AWS
  • Provided microservice: in this example, an API that generates Knowledge Graphs from arxiv.org papers
  • Use:
    • with the provided Flask app, no modifications required
    • with your own Flask app (a minimal sketch is given right after this list):
      • replace the contents of the ./app/ folder with your own microservice
      • update requirements.txt
      • update the Dockerfile
      • update the Terraform files config.tf and variables.tf
      • That's it!
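
A minimal sketch of the kind of Flask app the ./app/ folder is expected to contain, assuming a single JSON route and the port 5000 used throughout this README; the route name and payload are illustrative assumptions, not the actual API contract:

# app.py -- minimal illustrative Flask microservice (route is hypothetical)
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")  # hypothetical route, not part of the real API contract
def health():
    # Simple liveness check so the ALB (or a local curl) gets a 200 response.
    return jsonify(status="ok")

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the app is reachable from outside the Docker container.
    app.run(host="0.0.0.0", port=5000)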

NB: Feel free to contribute to this project by creating issues :)


Table of Contents

  • General info
  • Flask App
  • Install
  • Deploy
  • Use
  • Code testing libraries
  • Monitor
  • Work with generated ontology


General info

This project deploys an API on AWS according to the following workflow (see the DevOps workflow diagram in the repository).

Flask App

C4 Diagram

(see the C4 diagram image in the repository)

./app folder structure

  • downloads/ - temporary folder for PDFs downloaded from arxiv
  • models/ - helper functions for app.py
  • ontologies/ - stores the generated ontology world.owl
  • templates/ - HTML templates for rendering in a web browser (not supported in this version)
  • tests/ - test scripts for pytest (not supported in this version)
  • uploads/ - folder for manually staging PDF documents for upload (not supported in this version)
  • app.py - Flask app and main routes
  • Dockerfile - builds the container image
  • requirements.txt - project dependencies, generated with the pipreqs package

Functional blocks

  • Web scraping
  • Natural language processing
  • Ontology / Knowledge Graph
    • uses owlready2 (a minimal sketch is given right after this list)
    • currently does not import the full FOAF model (due to an import bug in Protégé)
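
A minimal sketch, for orientation only, of how an ontology can be built and saved to world.owl with owlready2; the class and property names below are illustrative assumptions, not the app's actual model:

# Illustrative owlready2 usage (class and property names are hypothetical).
from owlready2 import get_ontology, Thing, DataProperty

onto = get_ontology("http://example.org/arxiv-kg.owl")  # hypothetical ontology IRI

with onto:
    class Paper(Thing):  # hypothetical class
        pass

    class has_title(DataProperty):  # hypothetical data property
        domain = [Paper]
        range = [str]

# Create an individual and persist the whole world to an OWL file.
paper = onto.Paper("paper_0001")
paper.has_title = ["Example arxiv paper title"]
onto.save(file="ontologies/world.owl", format="rdfxml")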

Install

Quickstart

Global dependencies (please refer to their installation tutorials if necessary): Python 3, Docker, the AWS CLI, and Terraform (plus Java if you want to run DynamoDB locally).

Clone the repository and change into the newly created directory:

$ git clone <project https address>
$ cd KnowledgeGraph-Terraform-Flask-app

Create a deployment virtualenv and activate it:

# for UNIX systems:
$ python -m venv deploy_venv
$ source deploy_venv/bin/activate

# for Windows systems:
$ python -m venv deploy_venv
$ deploy_venv\Scripts\activate

Install requirements from txt file:

$ pip install -r requirements.txt

Select endpoint for database

Two DB options are available:

- local DynamoDB, for integration testing
- hosted AWS DynamoDB, for production

Select the chosen option by commenting/uncommenting the related lines in models/model.py, as sketched below.
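
A hedged sketch of what that toggle in models/model.py might look like, assuming the app uses boto3; the table name arxivTable and the local port 8000 come from the commands below, everything else is an assumption:

# models/model.py -- illustrative endpoint selection (the actual code in the repo may differ)
import boto3

# Option 1: local DynamoDB, for integration testing
dynamodb = boto3.resource("dynamodb", endpoint_url="http://localhost:8000")

# Option 2: hosted AWS DynamoDB, for production
# (comment the line above and uncomment the one below before building the container)
# dynamodb = boto3.resource("dynamodb", region_name="eu-west-3")

table = dynamodb.Table("arxivTable")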

If you wish to use a local DynamoDB, configure it with the following steps (refer to this tutorial for details):

  1. download the DynamoDB .zip package from the tutorial

  2. extract the package to a chosen location

  3. from a bash shell, at that location, launch DynamoDBLocal.jar with:

     $ java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb
    
  4. keep this shell window open to use your DB

  5. in another shell tab, create your table:

     $ aws dynamodb create-table --table-name arxivTable --attribute-definitions AttributeName=_id,AttributeType=S --key-schema AttributeName=_id,KeyType=HASH --billing-mode PAY_PER_REQUEST --endpoint-url http://localhost:8000
    
  6. check that the table exists (a quick Python sanity check of the table is also sketched after this list):

     $ aws dynamodb list-tables --endpoint-url http://localhost:8000
    
  7. When needed, you can destroy the table using the command:

     $ aws dynamodb delete-table --table-name arxivTable --endpoint-url http://localhost:8000
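
As an aside, once the table is created you can sanity-check it from Python; a minimal boto3 sketch against the local endpoint (item fields other than the _id key are illustrative):

# Quick sanity check of the local arxivTable (field names besides _id are hypothetical).
import boto3

dynamodb = boto3.resource("dynamodb", endpoint_url="http://localhost:8000")
table = dynamodb.Table("arxivTable")

# Write a test item keyed on the _id string attribute defined at table creation.
table.put_item(Item={"_id": "test-0001", "title": "hello"})

# Read it back.
response = table.get_item(Key={"_id": "test-0001"})
print(response.get("Item"))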
    

Launch microservice on localhost

$ cd app/
$ python app.py

Open http://localhost:5000 in a browser to interact with the API

Docker locally

Build and run the container using the following commands:

$ docker build -t knowledgegraph-terraform-flask-app .
$ docker run -d -p 5000:5000 knowledgegraph-terraform-flask-app
$ curl http://localhost:5000

Deploy

Resulting architecture generated in AWS (see the Flask-Microservice architecture diagram in the repository).

Refer to this tutorial for more details. Use the commands below to ensure a proper deployment.

Docker push to AWS


NB: This step assumes you already have configured programmatic CLI access to an active AWS account. Refer to this tutorial for more details.

Make sure to select the proper DB endpoint (AWS-hosted DynamoDB) in models/model.py before building your container.


Create repository on AWS ECR:

$ aws ecr create-repository --repository-name knowledgegraph-terraform-flask-app --image-scanning-configuration scanOnPush=true --region eu-west-3 

NB: Insert your actual AWS ID in place of <AWS_ID> in the following command lines.


Get credentials:

$ aws ecr get-login-password --region eu-west-3 | docker login --username AWS --password-stdin <AWS_ID>.dkr.ecr.eu-west-3.amazonaws.com/knowledgegraph-terraform-flask-app

From your browser, open the AWS Console and go to Services > Elastic Container Registry.

Select the knowledgegraph-terraform-flask-app repository; its ECR URI will be needed later on.

Back in the shell, tag your image and push it to ECR (use your own AWS_ID):

$ docker tag knowledgegraph-terraform-flask-app:latest <AWS_ID>.dkr.ecr.eu-west-3.amazonaws.com/knowledgegraph-terraform-flask-app:latest

$ docker push <AWS_ID>.dkr.ecr.eu-west-3.amazonaws.com/knowledgegraph-terraform-flask-app:latest

Deploy Terraform plan

$ cd ../terraform
$ terraform init

The Terraform code will deploy the following configuration:

  • IAM: Identity access management policy configuration
  • VPC: Public and private subnets, routes, and a NAT Gateway
  • EC2: Autoscaling implementation
  • ECS: Cluster configuration
  • ALB: Load balancer configuration
  • DynamoDB: Table configuration
  • CloudWatch: Alert metrics configuration

# check configuration files:
$ terraform validate 

# prepare and review execution plan:
# this command prompts for a valid ECR URI (see AWS console)
$ terraform plan  

# deploy plan to AWS:
# this command prompts for a valid ECR URI (see AWS console)
# then type 'yes' when prompted to launch execution
$ terraform apply 

The execution may take a while. If successful, the output will be the newly created URI for our API endpoint. Copy and paste this URI into your browser to access the API.

Remove deployed architecture

Delete the API completely from AWS:

$ terraform destroy

Finally, you can delete the ECR repository directly from the AWS Console in your browser.

If errors occur during deletion, check manually in the AWS Console for services that are still up and running.

Use

API manager

An API contract is provided through the Postman API Platform, based on the OpenAPI specification.

See the API contracts for information on the KnowledgeGraph-Terraform-Flask-app API and its available routes.

See these resources for more content on how to document APIs.

Use scenarios

To do: programmatic access for testers in the fully hosted scenario? Possibly an AWS IAM role with associated access keys for DynamoDB?

Test fully hosted microservice

  • Go to the provided endpoint
  • Security, access restriction: TBD
  • Upload a single file
  • Batch upload is not supported
  • Generate the ontology

OR

Deploy your own cloud hosted microservice

  • Follow the Deploy section
  • With your endpoint, follow the same steps as for the fully hosted microservice
  • Launch the API from your machine to perform batch imports

OR

Test your own microservice on localhost

  • Launch a local API instance (with a local DynamoDB instance)
  • With your endpoint, follow the same steps as for the fully hosted microservice
  • Perform batch imports (for instance, batch sizes in increasing multiples of 10)

NB:

  • The fully hosted Flask app relies extensively on network connectivity (timeouts may occur)
  • Prefer launching batch imports from a local API instance
  • An area of improvement could be to offload long-running work to a task queue such as Celery
  • Another option would be to tweak the architecture's parameters, especially the limits on:
    • Internet Gateway,
    • NAT Gateway,
    • Application Load Balancer.

Example of a successful batch request from a local API instance, 10 documents, elapsed time: 3 min (see the batch10success screenshot in the repository).
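
For orientation, a hedged sketch of what such a batch import might look like from a local machine, using the requests library with explicit timeouts; the /upload route and the arxiv_id parameter are hypothetical placeholders, not the documented API contract:

# Hypothetical batch import against a local API instance (route and payload are assumptions).
import requests

API_URL = "http://localhost:5000/upload"  # hypothetical route
ARXIV_IDS = [f"2101.{i:05d}" for i in range(1, 11)]  # batch of 10 illustrative IDs

for arxiv_id in ARXIV_IDS:
    try:
        # Generous timeout: downloading and processing a PDF can be slow.
        resp = requests.post(API_URL, json={"arxiv_id": arxiv_id}, timeout=120)
        print(arxiv_id, resp.status_code)
    except requests.exceptions.RequestException as exc:
        # Network failures (timeouts, connection errors) are reported and skipped.
        print(arxiv_id, "failed:", exc)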

Code testing libraries

Testing is not yet maintained in this version. Tech stack to use:

black: automatically formats the app files using the black package

$ black <filename>.py 

pylint: rates code quality and suggests improvements

$ python -m pylint <filename>.py

pytest: runs unit tests from the tests folder and checks coverage (a minimal example is sketched below)

$ python -m pytest --cov
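
A minimal sketch of what a pytest test in tests/ could look like, assuming Flask's built-in test client and a Flask instance named app in app.py; the /health route checked here is the hypothetical one sketched earlier, not a documented route:

# tests/test_app.py -- illustrative pytest example (route name is hypothetical).
import pytest

from app import app  # assumes app.py exposes a Flask instance named `app`


@pytest.fixture
def client():
    # Flask's test client lets us call routes without running a server.
    app.config["TESTING"] = True
    with app.test_client() as client:
        yield client


def test_health_returns_ok(client):
    response = client.get("/health")  # hypothetical route
    assert response.status_code == 200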

Monitor

Monitor your microservice from AWS CloudWatch.

Follow this tutorial to implement monitoring.

Work with generated ontology

  • Install Protégé on your machine
  • Open the downloaded file world.owl
  • Launch the reasoner in Protégé (Pellet)
  • Visualize the graph using the Protégé plug-in OntoGraf

Example of a Knowledge Graph obtained in Protégé (see the ontology screenshot in the repository).
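
If you prefer to inspect the result programmatically rather than in Protégé, a minimal owlready2 sketch (running the bundled Pellet reasoner requires Java on the path):

# Load the generated ontology and run the Pellet reasoner with owlready2.
import os
from owlready2 import get_ontology, sync_reasoner_pellet

path = os.path.abspath("ontologies/world.owl")
onto = get_ontology("file://" + path).load()

# Run Pellet (requires Java); inferred facts are added to the loaded world.
with onto:
    sync_reasoner_pellet()

# List the classes and individuals found in the ontology.
print(list(onto.classes()))
print(list(onto.individuals()))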

