kids-first / kf-api-dataservice

:file_cabinet: Primary API for interacting with the Kids First data

Home Page: http://kf-api-dataservice.kidsfirstdrc.org

License: Apache License 2.0

Python 99.65% Shell 0.04% Mako 0.06% HTML 0.07% Dockerfile 0.17%

api flask python rest

kf-api-dataservice's Issues

Fix Participant put and delete methods in resource class

Problem:
When trying to put/delete a participant that does not exist (kf_id does not exist in db), an exception is thrown because these methods are not returning when they should be.

Solution:
Add return statement before self._not_found() method in both put and delete methods in:
dataservice.api.participant resources.py

Validate:
Write tests for put and delete to verify that a request for a kf_id that does not exist returns the correct response (content and status code 404).

Example code change:

    @participant.expect(participant_fields)
    def put(self, kf_id):
        """
        Update an existing participant
        """
        body = request.json
        participant = models.participant.query.filter_by(kf_id=kf_id).one_or_none()
        if not participant:
            # Return here so the update below is skipped for an unknown kf_id
            return self._not_found(kf_id)

        participant.external_id = body.get('external_id')
        db.session.commit()

        return kf_response(participant, 201, 'participant updated')
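The effect of the missing return can be reproduced without Flask or a database. The sketch below is a stand-in, not the real resource class: FakeParticipantResource, its in-memory store, and the tuple responses are hypothetical, and only the _not_found, put and delete names come from this issue.

```python
class FakeParticipantResource:
    """Hypothetical stand-in for the participant resource (no Flask, no DB)."""

    def __init__(self, store):
        # store maps kf_id -> participant dict and replaces the real database
        self.store = store

    def _not_found(self, kf_id):
        body = {"message": "participant with kf_id '{}' not found".format(kf_id)}
        return body, 404

    def put(self, kf_id, body):
        participant = self.store.get(kf_id)
        if participant is None:
            # Without this `return`, execution falls through and the item
            # assignment below raises TypeError because participant is None.
            return self._not_found(kf_id)
        participant["external_id"] = body.get("external_id")
        return participant, 201

    def delete(self, kf_id):
        if kf_id not in self.store:
            return self._not_found(kf_id)
        return self.store.pop(kf_id), 200


resource = FakeParticipantResource(
    {"AABB1123": {"kf_id": "AABB1123", "external_id": None}}
)
missing_put = resource.put("ZZZZ0000", {"external_id": "x"})
missing_delete = resource.delete("ZZZZ0000")
```

The real tests would make the same two checks through the Flask test client: the status code is 404 and the body names the missing kf_id.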

Deploy to Development

This depends on Alex getting a DB up. Still trying to figure out how to mark a ticket in another repo as a blocker, but for now, you get the point.

Populate data model with Dummy Data

Create Dummy Data Generator Application

Since we will continually need to generate dummy data, we will create an app that accompanies the data service and populates our data model/service with dummy data.

Make sure we populate data model with dummy data when we're done.

Create Initial Aliquot Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with an Aliquot.

Tasks

Create Model
Resource
API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Use Marshmallow schemas to deserialize objects

Objects are still being created by manually parsing request bodies. We should move to using marshmallow's load to do input validation and construction of ORM objects.

Add field masks to endpoints

Users should be able to select which fields they want the api to return by specifying them in the request.

Eg:
GET /participants?fields=kf_id,demographic,diagnoses.pathological_diagnosis

{
  "results": [
    {
      "kf_id": "AABB1123",
      "demographic": {
        "race": "white",
        "age": 324,
        "gender": "male"
      },
      "diagnoses": {
        "pathological_diagnosis": "medulloblastoma"
      }
    }
  ]
}

Here, the user specifies they want the kf_id of the participant, the participant's demographic with default fields, and the pathological_diagnosis field from the participant's diagnosis.
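One way the mask could be applied to an already-serialized participant is sketched below. This is an illustration only: apply_field_mask and _select are hypothetical names, the external_id and age_at_event fields in the sample data are made up, and the real service might instead restrict the fields at the schema or query level.

```python
def apply_field_mask(obj, mask):
    """Return a copy of obj that keeps only the fields named in mask.

    mask is the raw `fields` query value, e.g.
    "kf_id,demographic,diagnoses.pathological_diagnosis".
    """
    tree = {}
    for path in filter(None, (p.strip() for p in mask.split(","))):
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})
    return _select(obj, tree)


def _select(value, tree):
    if not tree:
        # No sub-selection given: keep the value with all of its fields.
        return value
    if isinstance(value, list):
        return [_select(item, tree) for item in value]
    if isinstance(value, dict):
        return {key: _select(value[key], sub)
                for key, sub in tree.items() if key in value}
    return value


participant = {
    "kf_id": "AABB1123",
    "external_id": "p1",  # made-up field, dropped by the mask
    "demographic": {"race": "white", "age": 324, "gender": "male"},
    "diagnoses": {
        "pathological_diagnosis": "medulloblastoma",
        "age_at_event": 300,  # made-up field, dropped by the mask
    },
}
masked = apply_field_mask(
    participant, "kf_id,demographic,diagnoses.pathological_diagnosis"
)
```

Dotted paths select nested fields, and a bare name such as demographic keeps the nested object with its default fields, matching the response shown in the example.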

To connect to Postgres in our deployment environments, we will need to maintain a Postgres password. We are using Vault, so we will need to retrieve the password from the Vault cluster, probably using the hvac package.

Add Investigator to the data model

Per AOC feedback of the portal mockups:
"When they do a query can they?
They want to know who the investigators are
They also want to do a query based on an investigator"

This will require us to add investigator to the data model.

Create Dockerfile

A Dockerfile is needed to build a container with our API so that it can be shipped to our container registry during deployment.

Acceptance Criteria:
Be deployable to our environment
Be usable by other groups (e.g. OICR) to set up a local dev environment.

History for entities

For each entity type we'll want to capture a history of updates. Exact implementation TBD as we should lay out some of the key use cases first.

Some built-in concepts that might help settle the TBD: trigger functions that provide features similar to Postgres' old time travel feature (https://www.postgresql.org/docs/current/static/contrib-spi.html), or the temporal tables extension (https://github.com/arkhipov/temporal_tables).

We should take care, as time travel was originally removed because of its storage and performance costs: https://www.postgresql.org/docs/6.3/static/c0503.htm

Have Dataservice available via ECS Service

This needs more conversation with Alex & Dan.

Need Sub domain... Route53 etc

Create Initial 'Sample' Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Sample.

Tasks

Create Model
Resource
API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Create MVP ERD

Note: This is due EOD Thursday 1/25/18

We need to create an ERD for OICR of the MVP entities

Participant
- Rename Person to Participant
Sample
File
Demographic
Diagnosis
~Participant Relationships
- (Needs to be spec'd out)
~Phenotype
- Just HPO ID
- Age at observation (Optional)
~Dataset
- Needs to be spec'd out
~Outcome/Encounter/Event
- Needs to be spec’d out

Change participant_id type in sample model

The participant_id foreign key constraint should be changed to match the kf_id format, String(8), otherwise postgres will fail to create the tables.

Sync updates to Gen3

When relevant updates come in, they should be synchronously updated to Gen3 to provide authn/authz for the files.

Move terraform from Jenkinsfile to service type definition

The Jenkinsfile currently exposes environment variables and other information about our infrastructure. This content should be moved to the kid-first/aws-ecs-service-type-1 repository, inside a dataservice directory.

Rename person to participant

After surveying a few people, including the AOC, a better term for people whose data are in Kids First is participant. This aligns with what people in clinical trials and population studies are called.

This will also help distinguish between people who are users versus people whose data is in the Kids First dataset (although these two could overlap in the future).

Confirm Swagger Specs for Resources created in Sprint 0

-Person
-Sample

Create initial SequencingExperiment Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with an Experiment.

Tasks

Create Model
Resource
API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Create FamilyRelationship Model

The vast majority of current Kids First data is trio based (proband/mother/father). However, some cohorts are more generally family based and may lack the mother/father, containing instead, for example, two participants who are siblings. Although this is a rare use case, it is important to support because it can affect analysis, especially for rare diseases where more unusual family relationships may be the only ones available.

The initial model in #43 has a structure like this:

This would let us capture all of the existing relationships to help verify that the above is actually true. However, it makes a lot of the typical queries of just getting the trio or getting everyone in a family a bit more complex. A proposed structure to support those queries might look like:

But then that can't support relationships where the mother/father isn't present. One could imagine a hybrid of the two, perhaps leaving mother/father in the participant and relegating the FamilyRelationship to only non parent/child relationships.

Need to determine how we want to move forward for an initial MVP.

Automatically generate ERD from database schema

We could generate ERDs manually and commit them to an internal documents folder, or have a diagram automatically created and served over an endpoint.

Here are some tools people use:
dbvis
ERAlchemy

Create initial 'Demographic' Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Demographic.

Tasks

Create Model
Resource
API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Create ID service/mechanism to assign kf_id's

A standard should be defined for what a Kids First ID looks like and how it is generated.

Fix "/" path to return 200

Create list of DevOps questions for OICR

Create a list of DevOps questions to get AWS ready for dev deployment. We would like to bring this to the Wednesday tech meetings.

Add pagination to participant resource

The top level resources that return lists should return items in a paginated format using a configurable number of results.

This should be implemented using good practices for pagination against the database such as use of a cursor.

Information about the page number, total number of results, etc. should also be returned either through the header or in a standardized envelope.

Examples:

GET /persons

{
  "_links": {
    "prev": "/persons?page=1",
    "self": "/persons?page=2",
    "next": "/persons?page=3"
  },
  "total": 23,
  "limit": 10,
  "results": [
    {"kf_id": "001"},
    {"kf_id": "002"},
    ...
    {"kf_id": "010"}
  ]
}
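A minimal sketch of building such an envelope follows. The paginate helper and its parameters are hypothetical, and it slices an in-memory list purely for illustration; the service itself would push the limit and offset (or a cursor, as suggested above) into the database query rather than load every row.

```python
def paginate(items, page=1, limit=10, base_url="/persons"):
    """Wrap one page of items in a links/total/limit/results envelope."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive")
    total = len(items)
    start = (page - 1) * limit
    results = items[start:start + limit]
    last_page = max(1, -(-total // limit))  # ceiling division

    links = {"self": "{}?page={}".format(base_url, page)}
    if page > 1:
        links["prev"] = "{}?page={}".format(base_url, page - 1)
    if page < last_page:
        links["next"] = "{}?page={}".format(base_url, page + 1)
    return {"_links": links, "total": total, "limit": limit, "results": results}


# 23 dummy records with kf_ids "001" to "023"
persons = [{"kf_id": "{:03d}".format(i)} for i in range(1, 24)]
envelope = paginate(persons, page=2, limit=10)
```

The prev and next links are omitted on the first and last pages, so clients can follow links until none remains instead of computing page counts themselves.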

List Desired Enums

Participant Fields

Race
Gender
Ethnicity

Diagnosis Fields

diagnosis_category

Research Spike: Other Industry Research Papers

Research other industries with larger data sets to see how they are tackling some of the problems we are solving.

Create Initial 'Person' Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Person.

Tasks

Create Model
Resource
API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Add Jenkinsfile for CI/CD

We need to define a Jenkinsfile that outlines the pipeline for testing, building, and deployment in Jenkins. It should look something like the example @alubneuski laid out here

Implement optional routing parameters to consolidate resources

Most resources will require routes with optional parameters, for example:

POST /persons - to create a person
GET /persons - to get a list of persons
GET /persons/1 - to get person with id 1

This requires a single persons resource with an optional id parameter. Our Person Resource should
be able to support these different routes using a single Resource class definition.
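The idea can be sketched without any web framework. Everything below is hypothetical (PersonResource, dispatch, the regular expression route); in Flask the same effect is commonly achieved by registering two URL rules for one view, but this sketch does not use Flask.

```python
import re

# One pattern covers both /persons and /persons/<id>; the id group is optional.
ROUTE = re.compile(r"^/persons(?:/(?P<person_id>\d+))?$")


class PersonResource:
    """Hypothetical resource: one class serves the list and the single item."""

    def __init__(self):
        self.persons = {}
        self.next_id = 1

    def get(self, person_id=None):
        if person_id is None:
            return list(self.persons.values()), 200
        person = self.persons.get(person_id)
        if person is None:
            return {"message": "not found"}, 404
        return person, 200

    def post(self, body=None):
        person = dict(body or {}, id=self.next_id)
        self.persons[self.next_id] = person
        self.next_id += 1
        return person, 201


def dispatch(resource, method, path, body=None):
    match = ROUTE.match(path)
    if match is None:
        return {"message": "no such route"}, 404
    raw_id = match.group("person_id")
    person_id = int(raw_id) if raw_id is not None else None
    if method == "GET":
        return resource.get(person_id)
    if method == "POST" and person_id is None:
        return resource.post(body)
    return {"message": "method not allowed"}, 405


resource = PersonResource()
created = dispatch(resource, "POST", "/persons", {"name": "example"})
listing = dispatch(resource, "GET", "/persons")
single = dispatch(resource, "GET", "/persons/1")
```

The handler signature get(person_id=None) is what lets one class answer both the collection route and the item route.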

Abstract common models to the base API

Many responses are formed inside common envelopes, for example, status messages and pagination.

These should be defined on the api level so that they may be shared among all resources.

Create Initial Outcome Model

User Story
As a Kids First developer, I would like basic CRUD functionality to work with an Outcome/encounter/event. Note - this needs to be spec'd out from a data model perspective. We also need to define the name.

Tasks

Create Model
Resource
API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Create initial Dataset Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Dataset. Note - this needs to be spec'd out from a data model perspective.

Tasks

Create Model
Resource
API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Deploy multiple branches into dev

It would be nice to have the latest commit from every active branch deployed inside the dev environment, as the deployment process is somewhat slow and different feature branches will likely overwrite one another.

Eg:
If I want to view the API for the new add-entities branch, I should be able to navigate to
add-entities.kf-api-dataservice-dev.kids-first.io and view that api deployment.
Alternatively, I may want to go to the hash directly, such as:
abc1234.kf-api-dataservice-dev.kids-first.io

Issues to consider:
Any features that make changes to the data model, and therefore the database, will break other branches. Perhaps we need to create a new database for each branch?

Create Initial Diagnosis Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Diagnosis.

Tasks

Create Model
Resource
API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Get CI Working for our data service API

As a Data Engineering developer, we need to get continuous integration working for this API. @dankolbman to work with @alubneuski

Create Dummy kf_id service

Create a dummy ID service to assign kf_id until a more robust service can be created.
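A placeholder generator could be as small as the sketch below. The format is an assumption, since the kf_id standard is still to be defined: 8 characters (matching the String(8) column mentioned in the sample model issue) drawn from Crockford's base-32 alphabet, which omits the easily confused letters I, L, O and U. The name new_kf_id is hypothetical.

```python
import secrets

# Assumed format, not a defined standard: 8 characters from Crockford's
# base-32 alphabet (digits plus uppercase letters without I, L, O, U).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
KF_ID_LENGTH = 8


def new_kf_id(existing=()):
    """Return a random kf_id that is not already in `existing`."""
    taken = set(existing)
    while True:
        candidate = "".join(
            secrets.choice(ALPHABET) for _ in range(KF_ID_LENGTH)
        )
        if candidate not in taken:
            return candidate


kf_id = new_kf_id()
another = new_kf_id(existing={kf_id})
```

Passing the known ids in memory only works for a dummy service; a more robust one would rely on a unique constraint in the database and retry on conflict.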

Re-write serialization layer in Marshmallow

Flask-RESTPlus gets in the way with regards to error handling and doesn't provide very powerful request parsing. Marshmallow is a popular (de)serialization library that supports inheriting schemas from sqlalchemy models. There is also apispec which will generate much of the swagger documentation using marshmallow schemas.
Replacing Flask-RESTPlus should be straightforward in terms of functionality and will give us much more power in parsing requests and responses as well as reduce the number of hacky work-arounds needed.

Create & Configure Repo

Create Initial 'Demographic' Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with Person Demographics.

Tasks
[ ] Create Model
[ ] Resource
[ ] API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Create Initial Phenotype Model

User Story
As a Kids First developer, I would like basic CRUD functionality to work with Phenotypes. Note - this needs to be spec'd out from a data model perspective.

Tasks

Create Model
Resource
API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Create initial GenomicFile Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with a File.

Tasks

Create Model
Resource
API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Switch to use Postgres as primary backend

Although SQLite reduces dependency requirements for development, we'll need to move to Postgres for our deployments. We should at least support this in the ProductionConfig SQLALCHEMY_DATABASE_URI.
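A sketch of how the two configurations might sit side by side, assuming the connection details come from environment variables. The variable names (PG_USER, PG_PASS, PG_HOST, PG_PORT, PG_NAME), their defaults, and the SQLite file name are all hypothetical; only ProductionConfig and SQLALCHEMY_DATABASE_URI come from this issue.

```python
import os
from urllib.parse import quote


def postgres_uri(env=None):
    """Build a Postgres connection URI from environment-style settings."""
    env = os.environ if env is None else env
    user = env.get("PG_USER", "postgres")
    # Percent-encode the password so characters like '@' or '/' stay valid.
    password = quote(env.get("PG_PASS", ""), safe="")
    host = env.get("PG_HOST", "localhost")
    port = env.get("PG_PORT", "5432")
    name = env.get("PG_NAME", "dataservice")
    return "postgresql://{}:{}@{}:{}/{}".format(user, password, host, port, name)


class Config:
    # SQLite keeps local development free of extra dependencies.
    SQLALCHEMY_DATABASE_URI = "sqlite:///dataservice.db"


class ProductionConfig(Config):
    # Evaluated once, when this module is imported.
    SQLALCHEMY_DATABASE_URI = postgres_uri()


example_uri = postgres_uri({
    "PG_USER": "kf",
    "PG_PASS": "p@ss/word",
    "PG_HOST": "db.example",
    "PG_PORT": "5433",
    "PG_NAME": "kfdb",
})
```

Keeping the URI construction in a function makes it testable with a plain dict and leaves room to fetch the password from a secret store later.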

Create initial Workflow Model

Create initial Study model

The externally existing concept is one of a "dbGaP Study". In dbGaP two major use cases for the study are:

Access control, once you're approved for access you have access to all data in the study
Consent groups for DUL and reviewing of access requests

The second use case is typically not discussed outside of dbGaP but is an important component for Kids First. In reality, consent is tied to a specific study protocol as determined by an IRB. So someone can participate in multiple different studies, resulting in physical samples, genomic data, clinical data, etc. that are covered under the consent of one or more of the studies they participated in.

Separately in portal design discussions we had developed the concept of Dataset being
"a set of data created by a particular entity tied to access, data use limitations, IRB/institutions (specific X01 cohorts (dbGaP))".

So we need to resolve the above into what we're going to track in our data model. The immediate use cases I see are:

Tracking what dbGaP study each file (or perhaps entity as well) belongs to for authorized access
Tracking a participant's consent tied to a specific IRB study protocol and which of the entities related to that participant are covered under that consent, especially if a participant is under more than one dbGaP study

There are some foreseeable use cases if we want to bring in datasets that aren't managed under the purview of dbGaP. These would include designating "data ownership" and the ability for that data owner to grant access (this is really how I think we should think about dbGaP; it just happens in an automated way from our point of view). Examples include consortium-based or foundation-based datasets. This is longer term, but perhaps important to consider so we don't get backed into supporting only the dbGaP use cases.

Research Best Methods for Swagger Documentation

Switching away from RESTPlus cost us the automatic swagger documentation.
There are still a couple of tools that make documentation easier with Marshmallow:

apispec
flasgger

Time-box research for Flask-APIspec to no more than 4 hours. If it is not straightforward, we need to proceed with manual documentation for MVP, which would require a ticket for each resource (~2 pts each).

Create Postgres in AWS

We need our development Postgres stood up for the dataservice.

Change uuid fields to postgres UUID type

Once we are using postgres, the uuid field should be changed to use the UUID type.
