kids-first / kf-portal-etl-task-service

This project forked from overture-stack/microservice-template-java

A template for a microservice written in Java and a resource server under Ego with JWT authorization

License: GNU Affero General Public License v3.0

Java 54.08% Shell 45.19% Dockerfile 0.73%

kf-portal-etl-task-service's Introduction

Kids First ETL Task Runner


Microservice to execute ETL as an FSM process when requested by the Kids First Release Coordinator.

Introduction

This application will initiate ETL tasks as requested by the Kids First Release Coordinator. Tasks are modeled as Finite State Machines (FSMs) and specific state transitions (initialize, run, publish) can be started by HTTPS messages from the Release Coordinator.

ETL processes run in Docker containers, which are created when the task is run and terminate once the ETL results are staged. This allows multiple tasks to run simultaneously in distinct containers. The ETL publish step is performed by the Java application.

Although this application is built to accomplish the specific work of the KF ETL, it uses a configuration model that makes it easy to adopt any other dockerized task that the Release Coordinator wants to manage.

Authorization for requests made to the Task Runner is performed via JWTs from an EGO server. A valid token with Admin permissions for a recognized Task Coordinator application is required for any action to be taken on any request to this ETL Task Runner.

Task Model

Each Task is an FSM with the following state transitions (yellow and green boxes), which can be initiated by the given requests (white diamonds).

State Transition Diagram for Tasks

The possible Task Commands from the Release Coordinator are: Initialize, Status, Run, Publish. These steps match the requests outlined in the Kids First Task Coordinator documentation.

  1. Initialize - This will check if the task runner is able to run a new task. Any checks that are needed ahead of starting a task are performed here. For any ETL tasks, this will include checking permissions to access the studies that will be requested in the ETL feed. On success this will return a unique task code to be used for this task.

  2. Run - Start the task running and stage the results. Given the ID of a task currently in PENDING status from the Initialize step, and any required variables for the task, a Docker container will be created and the task run.

  3. Publish - Given a task ID that is currently in STAGED status after completing the Run step, this will publish the results.
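The transitions described above can be sketched as a small state machine. This is only an illustration, not the service's actual implementation: the README names the PENDING and STAGED statuses, while the `RUNNING` and `PUBLISHED` states and all class and method names below are assumptions made for the example.

```java
import java.util.EnumMap;
import java.util.Map;

// Only PENDING and STAGED are named in the README; RUNNING and PUBLISHED are assumed.
enum TaskState { PENDING, RUNNING, STAGED, PUBLISHED }

// Commands that move a task between states (hypothetical names).
enum TaskCommand { RUN, PUBLISH }

final class TaskFsm {
    // State a task must be in for a command to be accepted.
    private static final Map<TaskCommand, TaskState> REQUIRED = new EnumMap<>(TaskCommand.class);
    // State a task moves to once the command is accepted.
    private static final Map<TaskCommand, TaskState> RESULT = new EnumMap<>(TaskCommand.class);

    static {
        REQUIRED.put(TaskCommand.RUN, TaskState.PENDING);
        RESULT.put(TaskCommand.RUN, TaskState.RUNNING);
        REQUIRED.put(TaskCommand.PUBLISH, TaskState.STAGED);
        RESULT.put(TaskCommand.PUBLISH, TaskState.PUBLISHED);
    }

    // A task starts in PENDING after a successful Initialize.
    private TaskState state = TaskState.PENDING;

    TaskState state() {
        return state;
    }

    // Applies a command; returns false and leaves the state unchanged if it is not allowed.
    boolean apply(TaskCommand command) {
        if (state != REQUIRED.get(command)) {
            return false;
        }
        state = RESULT.get(command);
        return true;
    }

    // Called when the ETL container has finished and its results are staged.
    boolean markStaged() {
        if (state != TaskState.RUNNING) {
            return false;
        }
        state = TaskState.STAGED;
        return true;
    }
}
```

In this sketch a Publish request for a task that is still PENDING is rejected, which mirrors the requirement that Publish only applies to a task in STAGED status.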

Requirements

The application can be run locally or in a Docker container; the requirements for each setup are listed below.

Auth0

This application requires a valid Auth0 token. Auth0 can be configured through the following configuration:

auth0:
  issuer: "https://kids-first.auth0.com/"
  apiAudience: "https://kf-release-coord.kidsfirstdrc.org"

Only tokens granted with the client-credentials grant type are considered valid (see the Auth0 documentation).
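As a rough sketch of what these rules mean for an already decoded and signature-verified token, the check below compares the issuer, audience, and grant type against the configured values. The class and method names are hypothetical, and reading the grant type from a `gty` claim is an assumption about how Auth0 marks client-credentials tokens; the README does not say how the service performs this check.

```java
import java.util.Collection;
import java.util.Map;

// Hypothetical helper; signature verification is assumed to have happened already.
final class Auth0ClaimCheck {

    // Returns true only if issuer, audience, and grant type all match.
    static boolean isAcceptable(Map<String, Object> claims, String issuer, String audience) {
        if (!issuer.equals(claims.get("iss"))) {
            return false;
        }
        // The "aud" claim may be a single string or a list of strings.
        Object aud = claims.get("aud");
        boolean audienceMatches = aud instanceof Collection
                ? ((Collection<?>) aud).contains(audience)
                : audience.equals(aud);
        if (!audienceMatches) {
            return false;
        }
        // Assumption: the grant type is carried in a "gty" claim.
        return "client-credentials".equals(claims.get("gty"));
    }
}
```

The `issuer` and `audience` arguments would correspond to the `auth0.issuer` and `auth0.apiAudience` values shown in the configuration above.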

Local

Docker

Quick Start

Make sure the JWT Verification Key URL is configured, then you can run the server in a docker container or on your local machine.

Configure JWT Verification Key

Update application.yml. Set auth.jwt.publicKeyUrl to the URL to fetch the JWT verification key. The application will not start if it can't set the verification key for the JWTConverter.

The default value in the application.yml file is set to connect to EGO running locally on its default port 8081.

Run Local

$ mvn spring-boot:run

Application will run by default on port 1234

Configure the port by changing server.port in application.yml

Run Docker

First build the image:

$ docker-compose build

When ready, run it:

$ docker-compose up

Application will run by default on port 1234

Configure the port by changing services.api.ports in docker-compose.yml. Port 1234 was used by default so the value is easy to identify and change in the configuration file.

Testing

TODO: Additional instructions for testing the application.

API

The Task Runner provides two services that can be called by the Release Coordinator:

  • /status - General health and version status of the Task Runner
  • /tasks - Commands to create and interact with tasks:
    • Initiate new task
    • Begin Staging or Publishing existing Task
    • Query Status of Task by task ID
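A caller might build a request to the `/status` service along these lines, using the JDK's `java.net.http` API (Java 11 or later). This is a minimal sketch: the class and method names are hypothetical, the base URL assumes the default port 1234 on localhost, and the exact form of the Authorization header value is an assumption, so it is passed through unchanged.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side helper for building a /status request.
final class StatusRequestSketch {

    // Builds a GET request for <baseUrl>/status with the given Authorization header value.
    static HttpRequest statusRequest(String baseUrl, String authorizationHeaderValue) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/status"))
                .header("Authorization", authorizationHeaderValue)
                .GET()
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The shape of the response body is defined by the API specifications and is not assumed here.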

API specifications can be found here.

Acknowledgements

Services are provided to accomplish the task coordination process as defined in the Kids First Task Coordinator.

The JWT authentication model is provided by Overture's Ego service. This microservice is built on a fork of the Ego Microservice Template for Java.

kf-portal-etl-task-service's Issues

Remove "All rights reserved" in the copyright notice

We need to remove the mention of "All rights reserved" from the copyright notice in the header of every file. We will need the authorization of the copyright owner, OICR, in order to do so. It's not true that all rights are reserved. Some are, but some aren't, and they are detailed in the license we use. This mention of "All rights reserved" interferes with the terms of the open source license.

Example: https://github.com/kids-first/kf-portal-etl-task-service/blob/master/src/test/java/io/kf/coordinator/ETLCoordinatorTaskMainTests.java#L2

We should probably also add "Centre de recherche du CHU Sainte-Justine" as a new co-owner of the copyright, for their contributions.

Remove hardcoded Bearer

Currently, KPETS prefixes the incoming token with Bearer. This is incorrect; it should use the Authorization field as is.

Timeout for hanging docker kf-etl

Sometimes Spark in the kf-etl Docker container errors out but does not exit the container. There should be a timeout or some other way to detect that Spark errored out.
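One possible shape for such a timeout, sketched with plain `java.util.concurrent` rather than the service's actual Docker client code, is to wait on the work with a deadline and cancel it when the deadline passes. The class and method names are hypothetical, and `work` stands in for whatever call blocks until the container exits.

```java
import java.time.Duration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs blocking work with an upper bound on how long it may take.
final class TimeoutRunner {

    // Returns true if the work finished in time, false if it timed out or the wait was interrupted.
    static boolean runWithTimeout(Runnable work, Duration timeout) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(work);
        try {
            future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupt the hanging work so the caller can mark the task as failed.
            future.cancel(true);
            return false;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the caller and stop waiting.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("work failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Interrupting the waiting thread does not by itself stop a container; in a real fix the timeout branch would also need to stop or remove the container through whatever Docker client the service uses.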

Application based JWT

The task service should be able to request its own JWT and use it when making authorized requests to rollcall and the coordinator. This means the task service would have its own credentials when requesting a JWT from Ego, and these credentials need to be stored in Vault. In addition, rollcall has to be aware of the shape of the incoming JWT.
