bloomtech-labs / bridges-to-prosperity-ds-b


Features an AWS-hosted API for the visualization and prediction of bridge sites for Bridges to Prosperity

Home Page: https://bridgestoprosperity.org/

License: MIT License

Topics: fastapi, docker-image, data-visualizations, aws


Bridges To Prosperity

1️⃣ Bridges to Prosperity Data Science API

You can find the deployed project frontend at https://b.bridgestoprosperity.dev/

You can find the deployed data science API at http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/

4️⃣ Contributors

Alex Kaiser, Jake Krafczyk, Ping Ao

Project Overview

1️⃣ Trello Board

1️⃣ Web Backend

1️⃣ Web Frontend

Data Sets

Final Datasets in either CSV or XLSX

Description

Our API provides several endpoints serving merged and integrated Bridges to Prosperity bridge data, passing Rwandan bridge site data to the web backend/frontend application. The API is based on the FastAPI framework and hosted via AWS Elastic Beanstalk.

Detailed instructions on how to get started with FastAPI, Docker and AWS web deployment via Elastic Beanstalk can be found in this ds starter readme.

Tech stack

  • AWS Elastic Beanstalk: Platform as a service, hosts your API.
  • Docker: Containers, for reproducible environments.
  • FastAPI: Web framework. Like Flask, but faster, with automatic interactive docs.
  • Pandas: Open source data analysis and manipulation tool.
  • Flake8: Linter, enforces PEP8 style guide.
  • FuzzyWuzzy: Fuzzy string matching like a boss (see the short example after this list).
  • Plotly: Visualization library, for Python & JavaScript.
  • Pytest: Testing framework, runs your unit tests.
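
FuzzyWuzzy is listed above for fuzzy string matching, which is useful when reconciling place names across datasets. A minimal illustrative sketch is shown below; the names and threshold are hypothetical and not taken from the project notebooks.

# Illustrative only: match a possibly misspelled site name against official
# village names with FuzzyWuzzy; names and threshold are hypothetical.
from fuzzywuzzy import process

official_villages = ["Kagarama", "Ryabitana", "Karama", "Rugogwe", "Karehe"]

# Best match and similarity score (0-100) for a candidate name
best_match, score = process.extractOne("Kagaramaa", official_villages)

if score >= 90:           # accept only high-confidence matches
    print(best_match)     # -> "Kagarama"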

API Endpoints

Getting started

Create a new repository from this template.

Clone the repo

git clone https://github.com/YOUR-GITHUB-USERNAME/YOUR-REPO-NAME.git

cd YOUR-REPO-NAME

Build the Docker image

docker-compose build

Run the Docker image

docker-compose up

Go to localhost:8000 in your browser.




There you'll see the API documentation as well as several distinct endpoints:

  • An endpoint for GET requests, /raw: Endpoint returning the raw site assessment data as provided by B2P, for initial probing by the web backend.

  • An endpoint for GET requests, /sites: Endpoint returning cleaned site assessment data.

  • An endpoint for GET requests, /villages: Endpoint returning cleaned village name and ID data as provided by the Government of Rwanda.

  • An endpoint for GET requests, /final-data: Endpoint returning the merged data in the agreed-upon format, for example:


{
  "project_code": "1014328",
  "province": "Southern Province",
  "district": "Kamonyi",
  "sector": "Gacurabwenge",
  "cell": "Gihinga",
  "village": "Kagarama",
  "village_id": "28010101",
  "name": "Kagarama",
  "type": "Suspension",
  "stage": "Rejected",
  "sub_stage": "Technical",
  "Individuals_directly_served": 0,
  "span": 0,
  "lat": -1.984548,
  "long": 29.931428,
  "communities_served": "['Kagarama', 'Ryabitana', 'Karama', 'Rugogwe', 'Karehe']"
},...

  • An endpoint for GET requests, /final-data/extended: Similar to the /final-data endpoint, but provides additional information on district_id, sector_id, cell_id, form, case_safe_id, opportunity_id, and country:

{
  "project_code": "1014107",
  "province": "Western Province",
  "district": "Rusizi",
  "district_id": 36,
  "sector": "Giheke",
  "sector_id": "3605",
  "cell": "Gakomeye",
  "cell_id": "360502",
  "village": "Buzi",
  "village_id": "36050201",
  "name": "Buzi",
  "type": "Suspended",
  "stage": "Rejected",
  "sub_stage": "Technical",
  "Individuals_directly_served": 0,
  "span": 0,
  "lat": -2.42056,
  "long": 28.9662,
  "communities_served": "['Buzi', 'Kabuga', 'Kagarama', 'Gacyamo', 'Gasheke']",
  "form": "Project Assessment - 2018.10.29",
  "case_safe_id": "a1if1000002e51bAAA",
  "opportunity_id": "006f100000d1fk1",
  "country": "Rwanda"
},

File structure

Overall, the file structure is intuitive and easy to follow.

data/ contains anything related to datasets or images.

notebooks/ stores the additional notebooks used for initial data exploration, data cleaning, and the extensive data merging procedures.

/project/requirements.txt is where you add Python packages that your app requires. Then run docker-compose build to re-build your Docker image.

├── data
|    ├── edit
|    ├── final
|    ├── image
|    └── raw
|
├── notebooks
|
└── project
     ├── requirements.txt
     └── app
          ├── __init__.py
          ├── main.py
          ├── api
          │   ├── __init__.py
          │   ├── raw.py
          │   ├── sites.py
          │   ├── villages.py
          │   ├── final_data.py
          │   └── final_data_extended.py
          └── tests
                ├── __init__.py
                ├── test_main.py
                ├── test_predict.py
                └── test_viz.py

Non-compiled Endpoints

For the three non-compiled endpoints /raw, /sites and /villages, we used pandas to load the respective datasets found in data/raw and converted them into JSON objects using the standard json library.
An example of this simple endpoint setup is shown below.

./project/app/api/villages.py

# Imports
from fastapi import APIRouter
import pandas as pd
import json

router = APIRouter()

names_codes = "https://raw.githubusercontent.com/Lambda-School-Labs/Labs25-Bridges_to_Prosperity-TeamB-ds/main/data/edit/Rwanda_Administrative_Levels_and_Codes_Province_through_Village_clean_2020-08-25.csv"
names_codes = pd.read_csv(names_codes)

# /villages endpoint
@router.get("/villages")
async def villages():
    output = names_codes.to_json(orient="records")
    parsed = json.loads(output)
    return parsed

Merged dataset Endpoints

The two deployed production endpoints /final-data and /final-data/extended follow a slightly different approach, as the returned JSON data requires a specific structure in order to be integrated with the web backend application.

The CSV dataset was loaded with the requests library:

import csv
import io
import requests

request = requests.get(
    "https://raw.githubusercontent.com/Lambda-School-Labs/Labs25-Bridges_to_Prosperity-TeamB-ds/main/data/edit/B2P_Rwanda_Sites%2BIDs_full_2020-09-21.csv"
)
buff = io.StringIO(request.text)
directread = csv.DictReader(buff)

The data objects/dictionaries were then assembled by looping over directread:

# Dictionary keyed by project_code, built up row by row
output = {}

# Loop over rows and return according to the desired format
for row in directread:

    # split "communities_served" into a list of strings on every iteration
    if len(row["communities_served"]) == 0:
        communities_served = ["unavailable"]
    else:
        communities_served = list(row["communities_served"].split(", "))

    # Set key for dictionary
    key = row["project_code"]

    # Set output format
    output[key] = {
        "project_code": row["project_code"],
        "province": row["province"],
        "district": row["district"],
        "sector": row["sector"],
        "cell": row["cell"],
        "village": row["village"],
        "village_id": row["village_id"],
        "name": row["name"],
        "type": row["type"],
        "stage": row["stage"],
        "sub_stage": row["sub_stage"],
        "Individuals_directly_served": int(row["Individuals_directly_served"]),
        "span": int(row["span"]),
        "lat": float(row["lat"]),
        "long": float(row["long"]),
        "communities_served": communities_served,
    }
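
Put together, a minimal sketch of how these pieces could live in an endpoint module such as project/app/api/final_data.py is shown below. This is illustrative only (it assumes a simple GET route and abbreviates the output dictionary); the deployed module may differ in detail.

# Illustrative sketch only; see the snippets above for the full output format
import csv
import io

import requests
from fastapi import APIRouter

router = APIRouter()

CSV_URL = "https://raw.githubusercontent.com/Lambda-School-Labs/Labs25-Bridges_to_Prosperity-TeamB-ds/main/data/edit/B2P_Rwanda_Sites%2BIDs_full_2020-09-21.csv"


@router.get("/final-data")
async def final_data():
    # Download the merged CSV and read it row by row
    request = requests.get(CSV_URL)
    directread = csv.DictReader(io.StringIO(request.text))

    output = {}
    for row in directread:
        # Build each record exactly as in the loop shown above
        output[row["project_code"]] = {
            "project_code": row["project_code"],
            "province": row["province"],
            "district": row["district"],
            # ... remaining fields as in the loop above ...
        }
    return output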

Additional files

app/main.py is where you edit your app's title and description, which are displayed at the top of your automatically generated documentation. This file also configures "Cross-Origin Resource Sharing" (CORS), which you shouldn't need to edit.
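
A minimal sketch of what that file could look like is shown below; the title and description strings are placeholders, and the router imports assume the api/ modules shown in the file structure above.

# Simplified sketch of ./project/app/main.py; strings are placeholders
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.api import final_data, final_data_extended, raw, sites, villages

app = FastAPI(
    title="Bridges to Prosperity DS API",        # shown at the top of the docs
    description="Bridge site data for Rwanda",   # placeholder description
    docs_url="/",
)

# "Cross-Origin Resource Sharing": allow the web frontend to call this API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# Register the endpoint routers
app.include_router(raw.router)
app.include_router(sites.router)
app.include_router(villages.router)
app.include_router(final_data.router)
app.include_router(final_data_extended.router)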

app/api/predict.py defines the Machine Learning endpoint. /predict accepts POST requests and responds with random predictions. In a notebook, train your model and pickle it. Then in this source code file, unpickle your model and edit the predict function to return real predictions.

When your API receives a POST request, FastAPI automatically parses and validates the request body JSON, using the Item class attributes and functions. Edit this class so it's consistent with the column names and types from your training dataframe.
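
A hedged sketch of that pattern is shown below; the Item fields, model filename, and feature order are placeholders for illustration, not the project's actual schema.

# Illustrative sketch of a /predict endpoint; field names and model file are
# placeholders, not taken from this repository.
import pickle

from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()

# Hypothetical pickled model trained in a notebook
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


class Item(BaseModel):
    """Request body; keep fields consistent with the training dataframe."""
    span: float
    individuals_directly_served: int


@router.post("/predict")
async def predict(item: Item):
    # FastAPI has already parsed and validated the JSON body into `item`
    features = [[item.span, item.individuals_directly_served]]
    prediction = model.predict(features)[0]
    return {"prediction": float(prediction)}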

Deployment to AWS

Web deployment of the API was done analogously to the procedure described in the ds starter readme.

We used Docker to build the image locally, test it, and then push it to Docker Hub.

docker build -f project/Dockerfile -t YOUR-DOCKER-HUB-ID/YOUR-IMAGE-NAME ./project

docker login

docker push YOUR-DOCKER-HUB-ID/YOUR-IMAGE-NAME

Then we used the EB CLI:

git add --all

git commit -m "Your commit message"

eb init -p docker YOUR-APP-NAME --region us-east-1

eb create YOUR-APP-NAME

eb open

To redeploy:

  • git commit ...
  • docker build ...
  • docker push ...
  • eb deploy
  • eb open

URLs to Deployed Endpoints

  • http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/raw
  • http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/sites
  • http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/villages
  • http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/final-data
  • http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/final-data/extended

Testing

We used FastAPI's built-in TestClient to test the endpoints.
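
For example, a test of the /villages endpoint could look roughly like the sketch below; the test name is illustrative and assumes app/main.py exposes the FastAPI instance as app.

# Illustrative test using FastAPI's built-in TestClient
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_villages_returns_records():
    response = client.get("/villages")
    assert response.status_code == 200
    # The endpoint returns a list of village records
    assert isinstance(response.json(), list)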

