bloomtech-labs / bridges-to-prosperity-ds-b


Features an AWS-hosted API for the visualization and prediction of bridge sites for Bridges to Prosperity

Home Page: https://bridgestoprosperity.org/

License: MIT License

Topics: fastapi, docker-image, data-visualizations, aws


Bridges To Prosperity

1️⃣ Bridges to Prosperity Data Science API

You can find the deployed project frontend at https://b.bridgestoprosperity.dev/

You can find the deployed data science API at http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/

4️⃣ Contributors

Alex Kaiser, Jake Krafczyk, Ping Ao

Project Overview

1️⃣ Trello Board

1️⃣ Web Backend

1️⃣ Web Frontend

Data Sets

Final Datasets in either CSV or XLSX

Description

Our API provides several endpoints serving merged and integrated Bridges to Prosperity bridge data, passing Rwandan bridge site data to the web backend/frontend application. The API is based on the FastAPI framework and hosted via AWS Elastic Beanstalk.

Detailed instructions on how to get started with FastAPI, Docker and AWS web deployment via Elastic Beanstalk can be found in this ds starter readme.

Tech stack

  • AWS Elastic Beanstalk: Platform as a service, hosts your API.
  • Docker: Containers, for reproducible environments.
  • FastAPI: Web framework. Like Flask, but faster, with automatic interactive docs.
  • Pandas: Open source data analysis and manipulation tool.
  • Flake8: Linter, enforces PEP8 style guide.
  • FuzzyWuzzy: Fuzzy string matching like a boss (see the short example after this list).
  • Plotly: Visualization library, for Python & JavaScript.
  • Pytest: Testing framework, runs your unit tests.
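
FuzzyWuzzy is listed above for fuzzy string matching, which is useful when reconciling place names across datasets. A minimal illustrative sketch is shown below; the names and threshold are hypothetical and not taken from the project notebooks.

# Illustrative only: match a possibly misspelled site name against official
# village names with FuzzyWuzzy; names and threshold are hypothetical.
from fuzzywuzzy import process

official_villages = ["Kagarama", "Ryabitana", "Karama", "Rugogwe", "Karehe"]

# Best match and similarity score (0-100) for a candidate name
best_match, score = process.extractOne("Kagaramaa", official_villages)

if score >= 90:           # accept only high-confidence matches
    print(best_match)     # -> "Kagarama"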

API Endpoints

Getting started

Create a new repository from this template.

Clone the repo

git clone https://github.com/YOUR-GITHUB-USERNAME/YOUR-REPO-NAME.git

cd YOUR-REPO-NAME

Build the Docker image

docker-compose build

Run the Docker image

docker-compose up

Go to localhost:8000 in your browser.




There you'll see the API documentation as well as several distinct endpoints:

  • An endpoint for GET requests, /raw: Endpoint returning the raw site assessment data as provided by B2P, for initial probing by the web backend.

  • An endpoint for GET requests, /sites: Endpoint returning cleaned site assessment data.

  • An endpoint for GET requests, /villages: Endpoint returning cleaned village name and ID data as provided by the Government of Rwanda.

  • An endpoint for GET requests, /final-data: Endpoint returning the merged data in the agreed-upon format, for example:


{
  "project_code": "1014328",
  "province": "Southern Province",
  "district": "Kamonyi",
  "sector": "Gacurabwenge",
  "cell": "Gihinga",
  "village": "Kagarama",
  "village_id": "28010101",
  "name": "Kagarama",
  "type": "Suspension",
  "stage": "Rejected",
  "sub_stage": "Technical",
  "Individuals_directly_served": 0,
  "span": 0,
  "lat": -1.984548,
  "long": 29.931428,
  "communities_served": "['Kagarama', 'Ryabitana', 'Karama', 'Rugogwe', 'Karehe']"
},...

  • An endpoint for GET requests, /final-data/extended: Similar to the /final-data endpoint, but provides additional information on district_id, sector_id, cell_id, form, case_safe_id, opportunity_id, and country:

{
  "project_code": "1014107",
  "province": "Western Province",
  "district": "Rusizi",
  "district_id": 36,
  "sector": "Giheke",
  "sector_id": "3605",
  "cell": "Gakomeye",
  "cell_id": "360502",
  "village": "Buzi",
  "village_id": "36050201",
  "name": "Buzi",
  "type": "Suspended",
  "stage": "Rejected",
  "sub_stage": "Technical",
  "Individuals_directly_served": 0,
  "span": 0,
  "lat": -2.42056,
  "long": 28.9662,
  "communities_served": "['Buzi', 'Kabuga', 'Kagarama', 'Gacyamo', 'Gasheke']",
  "form": "Project Assessment - 2018.10.29",
  "case_safe_id": "a1if1000002e51bAAA",
  "opportunity_id": "006f100000d1fk1",
  "country": "Rwanda"
},

File structure

Overall, the file structure is intuitive and easy to follow.

data/ contains anything related to datasets or images.

notebooks/ stores the additional notebooks used for initial data exploration, data cleaning, and the extensive data merging procedures.

/project/requirements.txt is where you add Python packages that your app requires. Then run docker-compose build to re-build your Docker image.

├── data
|    ├── edit
|    ├── final
|    ├── image
|    └── raw
|
├── notebooks
|
└── project
     ├── requirements.txt
     └── app
          ├── __init__.py
          ├── main.py
          ├── api
          │   ├── __init__.py
          │   ├── raw.py
          │   ├── sites.py
          │   ├── villages.py
          │   ├── final_data.py
          │   └── final_data_extended.py
          └── tests
                ├── __init__.py
                ├── test_main.py
                ├── test_predict.py
                └── test_viz.py

Non-compiled Endpoints

For the three non-compiled endpoints /raw, /sites and /villages, we used pandas to load the respective datasets found in data/raw and converted them into JSON objects using the standard json library.
An example of this simple endpoint setup is shown below.

./project/app/api/villages.py

# Imports
from fastapi import APIRouter
import pandas as pd
import json

router = APIRouter()

names_codes = "https://raw.githubusercontent.com/Lambda-School-Labs/Labs25-Bridges_to_Prosperity-TeamB-ds/main/data/edit/Rwanda_Administrative_Levels_and_Codes_Province_through_Village_clean_2020-08-25.csv"
names_codes = pd.read_csv(names_codes)

# /villages endpoint
@router.get("/villages")
async def villages():
    output = names_codes.to_json(orient="records")
    parsed = json.loads(output)
    return parsed

Merged dataset Endpoints

The two deployed production endpoints /final-data and /final-data/extended follow a slightly different approach, as the returned JSON data requires a specific structure in order to be integrated with the web backend application.

The CSV dataset was loaded with the requests library:

import csv
import io
import requests

request = requests.get(
    "https://raw.githubusercontent.com/Lambda-School-Labs/Labs25-Bridges_to_Prosperity-TeamB-ds/main/data/edit/B2P_Rwanda_Sites%2BIDs_full_2020-09-21.csv"
)
buff = io.StringIO(request.text)
directread = csv.DictReader(buff)

The data objects/dictionaries were then assembled by looping over directread:

# Dictionary keyed by project_code, built up row by row
output = {}

# Loop over rows and return according to the desired format
for row in directread:

    # split "communities_served" into a list of strings on every iteration
    if len(row["communities_served"]) == 0:
        communities_served = ["unavailable"]
    else:
        communities_served = list(row["communities_served"].split(", "))

    # Set key for dictionary
    key = row["project_code"]

    # Set output format
    output[key] = {
        "project_code": row["project_code"],
        "province": row["province"],
        "district": row["district"],
        "sector": row["sector"],
        "cell": row["cell"],
        "village": row["village"],
        "village_id": row["village_id"],
        "name": row["name"],
        "type": row["type"],
        "stage": row["stage"],
        "sub_stage": row["sub_stage"],
        "Individuals_directly_served": int(row["Individuals_directly_served"]),
        "span": int(row["span"]),
        "lat": float(row["lat"]),
        "long": float(row["long"]),
        "communities_served": communities_served,
    }
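
Put together, a minimal sketch of how these pieces could live in an endpoint module such as project/app/api/final_data.py is shown below. This is illustrative only (it assumes a simple GET route and abbreviates the output dictionary); the deployed module may differ in detail.

# Illustrative sketch only; see the snippets above for the full output format
import csv
import io

import requests
from fastapi import APIRouter

router = APIRouter()

CSV_URL = "https://raw.githubusercontent.com/Lambda-School-Labs/Labs25-Bridges_to_Prosperity-TeamB-ds/main/data/edit/B2P_Rwanda_Sites%2BIDs_full_2020-09-21.csv"


@router.get("/final-data")
async def final_data():
    # Download the merged CSV and read it row by row
    request = requests.get(CSV_URL)
    directread = csv.DictReader(io.StringIO(request.text))

    output = {}
    for row in directread:
        # Build each record exactly as in the loop shown above
        output[row["project_code"]] = {
            "project_code": row["project_code"],
            "province": row["province"],
            "district": row["district"],
            # ... remaining fields as in the loop above ...
        }
    return output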

Additional files

app/main.py is where you edit your app's title and description, which are displayed at the top of your automatically generated documentation. This file also configures "Cross-Origin Resource Sharing" (CORS), which you shouldn't need to edit.
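
A minimal sketch of what that file could look like is shown below; the title and description strings are placeholders, and the router imports assume the api/ modules shown in the file structure above.

# Simplified sketch of ./project/app/main.py; strings are placeholders
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.api import final_data, final_data_extended, raw, sites, villages

app = FastAPI(
    title="Bridges to Prosperity DS API",        # shown at the top of the docs
    description="Bridge site data for Rwanda",   # placeholder description
    docs_url="/",
)

# "Cross-Origin Resource Sharing": allow the web frontend to call this API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# Register the endpoint routers
app.include_router(raw.router)
app.include_router(sites.router)
app.include_router(villages.router)
app.include_router(final_data.router)
app.include_router(final_data_extended.router)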

app/api/predict.py defines the Machine Learning endpoint. /predict accepts POST requests and responds with random predictions. In a notebook, train your model and pickle it. Then in this source code file, unpickle your model and edit the predict function to return real predictions.

When your API receives a POST request, FastAPI automatically parses and validates the request body JSON, using the Item class attributes and functions. Edit this class so it's consistent with the column names and types from your training dataframe.
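
A hedged sketch of that pattern is shown below; the Item fields, model filename, and feature order are placeholders for illustration, not the project's actual schema.

# Illustrative sketch of a /predict endpoint; field names and model file are
# placeholders, not taken from this repository.
import pickle

from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()

# Hypothetical pickled model trained in a notebook
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


class Item(BaseModel):
    """Request body; keep fields consistent with the training dataframe."""
    span: float
    individuals_directly_served: int


@router.post("/predict")
async def predict(item: Item):
    # FastAPI has already parsed and validated the JSON body into `item`
    features = [[item.span, item.individuals_directly_served]]
    prediction = model.predict(features)[0]
    return {"prediction": float(prediction)}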

Deployment to AWS

Web deployment of the API was done analogously to the procedure described in the ds starter readme.

We used Docker to build the image locally, test it, and then push it to Docker Hub.

docker build -f project/Dockerfile -t YOUR-DOCKER-HUB-ID/YOUR-IMAGE-NAME ./project

docker login

docker push YOUR-DOCKER-HUB-ID/YOUR-IMAGE-NAME

Then we used the EB CLI:

git add --all

git commit -m "Your commit message"

eb init -p docker YOUR-APP-NAME --region us-east-1

eb create YOUR-APP-NAME

eb open

To redeploy:

  • git commit ...
  • docker build ...
  • docker push ...
  • eb deploy
  • eb open

URLs to Deployed Endpoints

  • http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/raw
  • http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/sites
  • http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/villages
  • http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/final-data
  • http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/final-data/extended

Testing

We used FastAPI's built-in TestClient to test the endpoints.
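
For example, a test of the /villages endpoint could look roughly like the sketch below; the test name is illustrative and assumes app/main.py exposes the FastAPI instance as app.

# Illustrative test using FastAPI's built-in TestClient
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_villages_returns_records():
    response = client.get("/villages")
    assert response.status_code == 200
    # The endpoint returns a list of village records
    assert isinstance(response.json(), list)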

