kids-first / kf-api-dataservice
:file_cabinet: Primary API for interacting with the Kids First data
Home Page: http://kf-api-dataservice.kidsfirstdrc.org
License: Apache License 2.0
Problem:
When trying to put/delete a participant that does not exist (the kf_id does not exist in the db), an exception is thrown because these methods do not return when they should.
Solution:
Add return statement before self._not_found() method in both put and delete methods in:
dataservice.api.participant resources.py
Validate:
Write tests for put and delete to verify that a put/delete for a kf_id that does not exist returns the correct response (content and code = 404).
Example code change:
    @participant.expect(participant_fields)
    def put(self, kf_id):
        """
        Update an existing participant
        """
        body = request.json
        participant = models.participant.query.filter_by(kf_id=kf_id).one_or_none()
        if not participant:
            return self._not_found(kf_id)
        participant.external_id = body.get('external_id')
        db.session.commit()
        return kf_response(participant, 201, 'participant updated')
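The failure mode and the fix can be reproduced without Flask or a database. The following is a minimal sketch, not the dataservice code: ParticipantResource, the in-memory store, and the put_buggy/put_fixed method names are hypothetical stand-ins for the real resource and its SQLAlchemy query.

```python
class ParticipantResource:
    """Hypothetical stand-in for the real resource; a dict replaces the database."""

    def __init__(self, store):
        self.store = store  # maps kf_id -> participant dict

    def _not_found(self, kf_id):
        # Build the 404 response; the caller still has to return it.
        message = "participant with kf_id '{}' not found".format(kf_id)
        return {"message": message}, 404

    def put_buggy(self, kf_id, body):
        participant = self.store.get(kf_id)
        if not participant:
            self._not_found(kf_id)  # response is built, then discarded
        # Execution continues with participant = None and raises TypeError.
        participant["external_id"] = body.get("external_id")
        return participant, 201

    def put_fixed(self, kf_id, body):
        participant = self.store.get(kf_id)
        if not participant:
            return self._not_found(kf_id)  # stop here with a 404
        participant["external_id"] = body.get("external_id")
        return participant, 201
```

A test for the real resource would follow the same shape: request an unknown kf_id and assert on both the status code and the message content.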
This depends on Alex getting a DB up. Trying to figure out how to mark a ticket in another repo as a blocker... But for now - you get the point.
Since we will continuously be needing to generate dummy data, we will create an app that goes with the data service to populate our data model/service with Dummy Data.
Make sure we populate data model with dummy data when we're done.
User Story
As a Kids First developer, I would like basic CRUD functionality to work with an Aliquot.
Tasks
Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API
Objects are still being created by manually parsing request bodies. We should move to using marshmallow's load to do input validation and construction of orm objects.
Users should be able to select which fields they want the api to return by specifying them in the request.
Eg:
GET /participants?fields=kf_id,demographic,diagnoses.pathological_diagnosis
{
  "results": [
    {
      "kf_id": "AABB1123",
      "demographic": {
        "race": "white",
        "age": 324,
        "gender": "male"
      },
      "diagnoses": {
        "pathological_diagnosis": "medulloblastoma"
      }
    }
  ]
}
Here, the user specifies they want the kf_id of the participant, the participant's demographic with default fields, and the pathological_diagnosis field from the participant's diagnosis.
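One way to apply such a selection to an already serialized participant is sketched below. The function names select_fields and parse_fields are hypothetical, and the sketch assumes that a field requested without sub-fields (such as demographic) is returned whole, since the issue does not define what the "default fields" are.

```python
def select_fields(record, paths):
    """Return a copy of `record` limited to the requested dotted paths."""
    # Group sub-paths by their first segment, for example
    # "diagnoses.pathological_diagnosis" -> {"diagnoses": ["pathological_diagnosis"]}
    grouped = {}
    for path in paths:
        head, _, rest = path.partition(".")
        grouped.setdefault(head, []).append(rest)

    result = {}
    for head, rests in grouped.items():
        if head not in record:
            continue  # unknown fields are silently ignored in this sketch
        value = record[head]
        if "" in rests:
            # The field was requested without sub-fields: return it whole.
            result[head] = value
        elif isinstance(value, dict):
            result[head] = select_fields(value, rests)
        elif isinstance(value, list):
            # Apply the same selection to each nested object in a list.
            result[head] = [select_fields(item, rests)
                            for item in value if isinstance(item, dict)]
    return result


def parse_fields(query_value):
    """Split the raw `fields` query parameter into a list of paths."""
    return [f.strip() for f in query_value.split(",") if f.strip()]
```

Filtering after serialization keeps the sketch simple; a real implementation might instead restrict the database query or the serializer so that unrequested fields are never loaded.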
To be able to connect to Postgres in our deployment environments, we will need to maintain a password to Postgres. We are using Vault, so we will need to retrieve our password from the Vault cluster, probably using the hvac package.
Per AOC feedback of the portal mockups:
"When they do a query can they?
They want to know who the investigators are
They also want to do a query based on an investigator"
This will require us to add investigator to the data model.
A Dockerfile is needed to build a container with our api so that it may be shipped to our container registry during our deployment.
Acceptance Criteria:
Be deployable to our environment
Be usable by other groups (IE OICR) to set up a local dev environment.
For each entity type we'll want to capture a history of updates. Exact implementation TBD as we should lay out some of the key use cases first.
Some builtin concepts that might help with the TBD are trigger functionality to provide similar features to postgres' old time travel feature: https://www.postgresql.org/docs/current/static/contrib-spi.html or temporal tables extensions: https://github.com/arkhipov/temporal_tables
Should use care, as time travel was originally removed because of storage and performance: https://www.postgresql.org/docs/6.3/static/c0503.htm
This needs more conversation with Alex & Dan.
Need Sub domain... Route53 etc
User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Sample.
Tasks
Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API
Note: This is due EOD Thursday 1/25/18
We need to create an ERD for OICR of the MVP entities
The participant_id foreign key constraint should be changed to match the kf_id format, String(8), otherwise postgres will fail to create the tables.
When relevant updates come in, they should be synchronously updated to Gen3 to provide authn/authz for the files.
The Jenkins file currently exposes environment variables and other information about our infrastructure. This content should be relocated to the kid-first/aws-ecs-service-type-1 repository inside a dataservice directory.
After surveying a few people, including the AOC, a better term for people whose data are in Kids First is participant. This aligns with what people in clinical trials and population studies are called.
This will also help distinguish between people who are users versus people whose data is in the Kids First dataset (although these two could overlap in the future).
-Person
-Sample
User Story
As a Kids First developer, I would like basic CRUD functionality to work with an Experiment.
Tasks
Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API
The vast majority of current Kids First data is trio based (proband/mother/father). However, some of the cohorts are more generally family based and potentially would not have the mother/father, but perhaps two participants that are siblings. So while this is a rare use case, it is important to support because it can impact analysis, especially for rare diseases where more unusual family relationships may be the only ones available.
The initial model in #43 has a structure like this:
This would let us capture all of the existing relationships to help verify that the above is actually true. However, it makes a lot of the typical queries of just getting the trio or getting everyone in a family a bit more complex. A proposed structure to support those queries might look like:
But then that can't support relationships where the mother/father isn't present. One could imagine a hybrid of the two, perhaps leaving mother/father in the participant and relegating the FamilyRelationship to only non parent/child relationships.
Need to determine how we want to move forward for an initial MVP.
User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Demographic.
Tasks
Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API
A standard should be defined for what a Kids First ID looks like and how it is generated.
Create a list of DevOps questions to get AWS ready for dev deployment. We would like to bring this to the Wednesday tech meetings.
The top level resources that return lists should return items in a paginated format using a configurable number of results.
This should be implemented using good practices for pagination against the database such as use of a cursor.
Information about the page number, total number of results, etc. should also be returned either through the header or in a standardized envelope.
Examples:
GET /persons
{
  "_links": {
    "prev": "/persons?page=1",
    "self": "/persons?page=2",
    "next": "/persons?page=3"
  },
  "total": 23,
  "limit": 10,
  "results": [
    {"kf_id": "001"},
    {"kf_id": "002"},
    ...
    {"kf_id": "010"}
  ]
}
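An envelope of this shape can be produced by a small helper. This is a sketch under assumptions: paginate and its parameters are hypothetical names, and it slices an in-memory list by page number for brevity, whereas the issue recommends cursor-based pagination against the database.

```python
import math


def paginate(items, page, limit, base_url):
    """Wrap one page of `items` in a links/total/limit/results envelope."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive integers")
    total = len(items)
    last_page = max(1, math.ceil(total / limit))
    start = (page - 1) * limit

    # "prev" and "next" are only included when those pages exist.
    links = {"self": "{}?page={}".format(base_url, page)}
    if page > 1:
        links["prev"] = "{}?page={}".format(base_url, page - 1)
    if page < last_page:
        links["next"] = "{}?page={}".format(base_url, page + 1)

    return {
        "_links": links,
        "total": total,
        "limit": limit,
        "results": items[start:start + limit],
    }
```

Because the helper only builds the envelope, the slicing step could later be swapped for a cursor-based query without changing the response format that clients see.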
Participant Fields
Diagnosis Fields
Research other industries with larger data sets to see how they are tackling some of the problems we are solving.
User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Person.
Tasks
Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API
We need to define a Jenkins file that will outline the pipeline for testing, building, and deployment in Jenkins. It should look something like the example @alubneuski laid out here
Most resources will require routes with optional parameters, for example:
POST /persons - to create a person
GET /persons - to get a list of persons
GET /persons/1 - to get person with id 1
This requires a resource persons with optional parameter id. Our Resource, Person, should be able to support these different parameters using a single Resource class definition.
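The idea of one class serving all three routes can be sketched without a web framework. Everything here is hypothetical (PersonResource, dispatch, the in-memory dict); in Flask the same effect is commonly achieved by registering two URL rules, with and without the id, against one view class.

```python
import re

# One pattern covers both /persons and /persons/<id>; the id group is optional.
PERSONS_ROUTE = re.compile(r"^/persons(?:/(?P<person_id>\d+))?$")


class PersonResource:
    """Hypothetical resource backed by an in-memory dict instead of a database."""

    def __init__(self):
        self.persons = {}
        self.next_id = 1

    def get(self, person_id=None):
        # Without an id, return the whole collection; with one, a single person.
        if person_id is None:
            return list(self.persons.values()), 200
        person = self.persons.get(person_id)
        if person is None:
            return {"message": "person {} not found".format(person_id)}, 404
        return person, 200

    def post(self, body):
        person = dict(body, id=self.next_id)
        self.persons[person["id"]] = person
        self.next_id += 1
        return person, 201


def dispatch(resource, method, path, body=None):
    """Route a request to the single resource class, passing the id if present."""
    match = PERSONS_ROUTE.match(path)
    if match is None:
        return {"message": "no such route"}, 404
    raw_id = match.group("person_id")
    if method == "GET":
        return resource.get(int(raw_id) if raw_id is not None else None)
    if method == "POST" and raw_id is None:
        return resource.post(body or {})
    return {"message": "method not allowed"}, 405
```

The optional parameter defaults to None, so the same get method distinguishes the list request from the single-item request without a second class.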
Many responses are formed inside common envelopes, for example, status messages and pagination.
These should be defined on the api level so that they may be shared among all resources.
User Story
As a Kids First developer, I would like basic CRUD functionality to work with an Outcome/encounter/event. Note - this needs to be spec'd out from a data model perspective. We also need to define the name.
Tasks
Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API
User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Dataset. Note - this needs to be spec'd out from a data model perspective.
Tasks
Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API
It would be nice to have the latest commit from every active branch deployed inside the dev environment, as the deployment process is somewhat slow and different feature branches will likely overwrite one another.
Eg:
If I want to view the api with the new add-entities branch, I should be able to navigate to add-entities.kf-api-dataservice-dev.kids-first.io and view that api deployment.
Alternatively, I may want to go to the hash directly, such as:
abc1234.kf-api-dataservice-dev.kids-first.io
Issues to consider:
Any features that make changes to the data model and therefore the database will break other branches. Perhaps we need to create a new database for each branch?
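Deriving the per-branch hostname is the mechanical part of this proposal. The sketch below assumes the domain shown in the examples and a hypothetical deployment_host helper; it normalizes branch names because a DNS label may contain only letters, digits, and hyphens, and is limited to 63 characters.

```python
import re

# Assumed from the examples in this issue; the real domain may differ.
DEV_DOMAIN = "kf-api-dataservice-dev.kids-first.io"


def deployment_host(ref, domain=DEV_DOMAIN):
    """Turn a branch name or commit hash into a per-deployment hostname."""
    # Replace every run of characters that is not valid in a DNS label
    # with a single hyphen, then trim to the 63-character label limit.
    label = re.sub(r"[^a-z0-9-]+", "-", ref.lower()).strip("-")[:63].strip("-")
    if not label:
        raise ValueError("cannot derive a hostname from {!r}".format(ref))
    return "{}.{}".format(label, domain)
```

Two branch names that differ only in punctuation or case map to the same label under this scheme, so a deployment pipeline would need to detect such collisions.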
User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Diagnosis.
Tasks
Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API
As a Data Engineering developer, we need to get continuous integration working for this API. @dankolbman to work with @alubneuski
Create a dummy ID service to assign kf_id until a more robust service can be created.
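A placeholder generator could look like the sketch below. The length of 8 follows the String(8) kf_id format mentioned in another issue; the alphabet (uppercase letters and digits) and the generate_kf_id name are assumptions, since the id standard has not been defined yet.

```python
import secrets
import string

# Assumption: ids are drawn from uppercase letters and digits.
KF_ID_ALPHABET = string.ascii_uppercase + string.digits
KF_ID_LENGTH = 8


def generate_kf_id(existing_ids=()):
    """Return a random 8-character id that is not already in `existing_ids`."""
    while True:
        candidate = "".join(
            secrets.choice(KF_ID_ALPHABET) for _ in range(KF_ID_LENGTH))
        # Retry on the (unlikely) event of a collision with a known id.
        if candidate not in existing_ids:
            return candidate
```

Checking against existing_ids only guards against ids the caller already knows about; a unique constraint in the database would still be needed as the final safeguard.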
Flask-RESTPlus gets in the way with regards to error handling and doesn't provide very powerful request parsing. Marshmallow is a popular (de)serialization library that supports inheriting schemas from sqlalchemy models. There is also apispec which will generate much of the swagger documentation using marshmallow schemas.
Replacing Flask-RESTPlus should be straightforward in terms of functionality and will give us much more power in parsing requests and responses as well as reduce the number of hacky work-arounds needed.
User Story
As a Kids First developer, I would like basic CRUD functionality to work with Person Demographics.
Tasks
[ ] Create Model
[ ] Resource
[ ] API: Put, Patch, Delete, Get
Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API
User Story
As a Kids First developer, I would like basic CRUD functionality to work with Phenotypes. Note - this needs to be spec'd out from a data model perspective.
Tasks
Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API
User Story
As a Kids First developer, I would like basic CRUD functionality to work with a File.
Tasks
Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API
Although SQLite reduces dependency requirements for development, we'll need to move to Postgres for our deployments. We should at least support this in the ProductionConfig SQLALCHEMY_DATABASE_URI.
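One possible shape for this is sketched below. The environment variable names (PG_USER, PG_PASS, PG_HOST, PG_PORT, PG_NAME), their defaults, and the postgres_uri helper are all hypothetical; only ProductionConfig and SQLALCHEMY_DATABASE_URI come from the issue.

```python
import os
from urllib.parse import quote_plus


def postgres_uri(env=os.environ):
    """Build a SQLAlchemy-style Postgres URI from environment variables."""
    user = env.get("PG_USER", "postgres")
    password = env.get("PG_PASS", "")
    host = env.get("PG_HOST", "localhost")
    port = env.get("PG_PORT", "5432")
    name = env.get("PG_NAME", "dataservice")
    # Quote the credentials so characters such as '@' or '/' cannot break the URI.
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, name)


class Config:
    # SQLite keeps local development free of external dependencies.
    SQLALCHEMY_DATABASE_URI = "sqlite:///dev.db"


class ProductionConfig(Config):
    # Evaluated once, at import time, from the deployment environment.
    SQLALCHEMY_DATABASE_URI = postgres_uri()
```

Reading the connection details from the environment keeps credentials out of the repository, and the password could later be supplied by a secrets store instead of a plain variable.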
The externally existing concept is one of a "dbGaP Study". In dbGaP two major use cases for the study are:
The second use case is typically not discussed outside of dbGaP but is an important component for Kids First. In reality, consent is tied to a specific study protocol as determined by an IRB. So someone can participate in multiple different studies resulting in physical samples, genomic data, clinical data, etc. that are covered under the consent of one or more of the studies they participated in.
Separately, in portal design discussions we had developed the concept of Dataset being "a set of data created by a particular entity tied to access, data use limitations, IRB/institutions (specific X01 cohorts (dbGaP))".
So we need to resolve the above into what we're going to track in our data model. The immediate use cases I see are:
There are some foreseeable use cases if we want to bring datasets that aren't managed under the purview of dbGaP. This would include designating "data ownership" and the ability for that data owner to grant access (this is really how I think we should think about dbGaP, it just happens in an automated way from our point of view). For example, including consortium-based or foundation based datasets. This is longer term, but perhaps important to consider so we don't get backed into only supporting the dbGaP use cases.
Switching away from RESTPlus cost us the automatic swagger documentation.
There are still a couple of tools to make documentation easier with Marshmallow:
Time box research for Flask-APIspec to no more than 4 hours. If it is not straightforward, then we need to proceed with manual documentation for MVP, which would require a ticket for each resource (~2 pts each).
We need our development Postgres stood up for the dataservice.
Once we are using postgres, the uuid field should be changed to use the UUID type.