
nmdp-bioinformatics / phycus


Service used for curation of Haplotype Frequency

License: GNU Lesser General Public License v3.0

Shell 0.35% Java 92.28% Perl 4.00% Makefile 0.26% Python 2.93% Dockerfile 0.17%

phycus's Introduction

Haplotype Frequency Curation

Service to help curate Haplotype Frequencies.

Development Overview

Pre-requisites:

Build the application

Note: the build requires Java 8.

mvn clean package

Setup Database

The project is set up to use a MySQL Docker instance for local development.

cd db/
docker-compose up -d

The phpMyAdmin page should be available at http://localhost:9999/. On Windows, you will need to use the IP of the Docker VM instead of localhost. You can retrieve the IP with docker-machine ip; usually it is 192.168.99.100. Log in with the user hfcus_user and the password hfcus_user1.

Use docker-compose stop and docker-compose rm to stop and remove the db containers.

Run the application

Special step on Windows

On Windows, you need to start the server using the IP address Docker has assigned. Use docker-machine ip to retrieve it. You'll use it to replace localhost in jdbc:mysql://localhost:3306/hfcusdb when you start your server. (See the command below.)

Starting the application

Start your server as a simple java application
Check your JDK version with java -version

With Java 8:

 java -jar target/service-haplotype-frequency-curation-0.0.1.jar

With Java 8 on Windows:

 java  -Dspring.datasource.url="jdbc:mysql://your-ip-address-here:3306/hfcusdb" -jar target/service-haplotype-frequency-curation-0.0.1.jar

With Java 9:

java --add-modules java.xml.bind -jar target/service-haplotype-frequency-curation-0.0.1.jar

You can view the API documentation in swagger-ui by pointing your browser to http://localhost:8080

Using the service

See the client/ directory for examples of using the service from various languages, e.g. Perl.

Read the User Guide

See docs/ directory for a user guide and examples of how the service is used in practice.

phycus's People

Contributors

dependabot[bot], fscheel, hofmannj, hpeberhard, jbrelsf2-nmdp, kaeaton, lgragert, mmaiers-nmdp, mpresteg, pbashyal-nmdp, sauter

phycus's Issues

Missing Access Control

Currently, there is no type of authentication or authorization on the service. Adding access control might lead to higher quality of the data and should stop forms of spam if they pose a problem. It can also make sure that everyone using the frequencies agreed to certain terms. On the other hand, it poses a higher barrier of entry that could block people from contributing or using this service.

We should decide:

  • whether we need to add those kinds of access control
  • which kinds of access control we want
  • how we implement them
  • how we manage them
  • which endpoints should be secured

perl client

The description of client/perl should read Perl client and not Perl cleint.

SAM_POP should be an attribute of the population not an HFC

SAM_POP is a quality metric that reflects the size of the actual population. For example, US African American has a (census) size of 3.77E+07. This should be removed as a quality metric and added as an attribute of the population (optional in POST /population).

How to handle database migrations?

In connection with #74, I thought about how to handle database migrations. Changes in the database are not a problem for now as we are still in a development stage, but in the future, with a dockerized setup, there is no way to handle migrations without manually running docker exec or removing the whole database.

One option would be to use Flyway, which has Spring support and allows for automated schema migration in both plain SQL and Java. I have no experience with it, but we may want to have a look at this or similar tools.
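The idea behind such tools can be sketched without committing to any one of them: keep an ordered list of versioned scripts and record which versions have already been applied. The following is a minimal Python sketch against an in-memory SQLite database. It is not Flyway itself, and the schema_version table, the MIGRATIONS list, and the population columns are assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical versioned migrations; in practice these would live in files.
MIGRATIONS = [
    (1, "CREATE TABLE population (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE population ADD COLUMN description TEXT"),
]


def migrate(conn):
    """Apply every migration newer than the recorded schema version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0  # MAX() yields NULL on an empty table
    applied = []
    for version, statement in sorted(MIGRATIONS):
        if version > current:
            conn.execute(statement)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
            applied.append(version)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies every pending version
second_run = migrate(conn)  # nothing left to apply
```

Running the same function at every service start would make the step idempotent, which is the property a dockerized setup needs.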

Swagger-codegen can't make api/bin #2

make

rm -rf api
swagger-codegen generate --lang python -DpackageName=pyhfcus --input-spec ../../curation-swagger-spec.yaml --output api
Available languages: [android, aspnet5, async-scala, csharp, cpprest, dart, flash, python-flask, go, groovy, java, jaxrs, jaxrs-cxf, jaxrs-resteasy, jaxrs-spec, inflector, javascript, javascript-closure-angular, jmeter, nancyfx, nodejs-server, objc, perl, php, python, qt5cpp, ruby, scala, scalatra, silex-PHP, sinatra, rails5, slim, spring, dynamic-html, html, swagger, swagger-yaml, swift, tizen, typescript-angular2, typescript-angular, typescript-node, typescript-fetch, akka-scala, CsharpDotNet2, clojure, haskell, lumen, go-server]
cp setup.py api
mkdir api/bin
mkdir: cannot create directory 'api/bin': Not a directory
make: *** [Makefile:13: generate] Error 1
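
One plausible reading of this log, offered as a guess rather than a confirmed diagnosis: swagger-codegen printed its language list instead of generating code, so the api directory was never created, and cp setup.py api then produced a regular file named api. The Python sketch below reproduces that sequence in a temporary directory. The file names mirror the log; everything else is illustrative.

```python
import os
import shutil
import tempfile


def reproduce():
    """Copy a file onto a missing directory name, then try to mkdir inside it."""
    with tempfile.TemporaryDirectory() as work:
        setup_py = os.path.join(work, "setup.py")
        api = os.path.join(work, "api")
        with open(setup_py, "w") as handle:
            handle.write("# placeholder\n")
        # 'api' does not exist yet, so the copy creates a regular file named 'api'.
        shutil.copy(setup_py, api)
        api_is_file = os.path.isfile(api)
        try:
            os.mkdir(os.path.join(api, "bin"))
            mkdir_failed = False
        except OSError:
            # The parent path is a file, so the directory cannot be created.
            mkdir_failed = True
        return api_is_file, mkdir_failed


api_is_file, mkdir_failed = reproduce()
```

If this reading holds, having the Makefile stop as soon as the generate step produces no output directory would surface the real failure earlier.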

How to run PHYCUS locally?

I've been trying to run the service locally to test #69, but I could not get it to run. The README says to use docker-compose up -d but I get the following error:

$ docker-compose up
Pulling web (curation:latest)...
ERROR: The image for the service you're trying to recreate has been removed. If you continue, volume data could be lost. Consider backing up your data before continuing.

Continue with the new image? [yN]y
Pulling web (curation:latest)...
ERROR: pull access denied for curation, repository does not exist or may require 'docker login'

I could use docker build --tag curation:latest . to generate the image, but that results in

curation-interface | standard_init_linux.go:190: exec user process caused "no such file or directory"
curation-interface | standard_init_linux.go:190: exec user process caused "no such file or directory"
curation-interface | standard_init_linux.go:190: exec user process caused "no such file or directory"
curation-interface exited with code 1

No documentation

Currently, there is no documentation about setting the service up, which dependencies are needed or how to call the service. Even the swagger spec is hard to find.

Add constraints to labels

  1. we want to constrain label types
    For example, a labelType of "ICCBBA ION" could refer to the data here
    We don't want label types of ICCCBBAAA ION etc.

  2. we want an explicit way to create new label types (and show what label types are in the database)
    Have a REST endpoint GET/POST LabelType
    Other label types:
    DOI - reference to a manuscript
    PMID - PubMed ID
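
The two points above could be served by a small registry of allowed label types. The following Python sketch is only an illustration of that idea: the class name LabelTypeRegistry and its method names are hypothetical and not taken from the service, and the initial types are the three named in this issue.

```python
class LabelTypeRegistry:
    """Holds the set of label types the service accepts (illustrative only)."""

    def __init__(self, initial=()):
        self._types = set(initial)

    def list_types(self):
        # Would back a GET on the label type endpoint.
        return sorted(self._types)

    def add_type(self, label_type):
        # Would back a POST on the label type endpoint.
        name = label_type.strip()
        if not name:
            raise ValueError("label type must not be empty")
        self._types.add(name)
        return name

    def validate(self, label_type):
        # Reject anything that was not registered explicitly.
        if label_type not in self._types:
            raise ValueError(f"unknown label type: {label_type!r}")
        return label_type


registry = LabelTypeRegistry(["ICCBBA ION", "DOI", "PMID"])
```

With this in place, a misspelled type such as "ICCCBBAAA ION" is rejected at submission time instead of silently creating a new category.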

error when starting db

This looks like something worth investigating:

running in /Users/mmaiers/src/git/phycus-feature/db1

$ docker-compose up -d
[+] Running 1/2
 ⠿ Container phpmyadmin   Started                                                                                                                            3.9s
 ⠿ Container hfcus_mysql  Starting                                                                                                                           3.9s
Error response from daemon: error while creating mount source path '/Users/mmaiers/src/git/phycus-feature/db': mkdir /Users/mmaiers/src: file exists

Why is it trying to mkdir on /Users/mmaiers/src ?

Consider / implement a strategy for avoiding duplicate entities

Consider / implement a strategy for avoiding duplicate entities (Population, Cohort, etc). Perhaps a hash scheme to be implemented in the service(?). Need to define default behavior when a duplicate is detected (silently use existing OR return error)
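
A minimal Python sketch of the hash idea, assuming that entities are compared on normalized field values and that the duplicate policy is configurable. The names entity_key, EntityStore, and on_duplicate are hypothetical, and the normalization (trim and lowercase) is just one possible choice.

```python
import hashlib
import json


def entity_key(fields):
    """Hash the normalized fields of an entity into a stable key."""
    normalized = {k: str(v).strip().lower() for k, v in fields.items()}
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DuplicateEntityError(Exception):
    pass


class EntityStore:
    """Assigns IDs and applies one of the two duplicate policies."""

    def __init__(self, on_duplicate="reuse"):
        if on_duplicate not in ("reuse", "error"):
            raise ValueError("on_duplicate must be 'reuse' or 'error'")
        self.on_duplicate = on_duplicate
        self._ids_by_key = {}

    def create(self, fields):
        key = entity_key(fields)
        if key in self._ids_by_key:
            if self.on_duplicate == "error":
                raise DuplicateEntityError(
                    f"duplicate of id {self._ids_by_key[key]}"
                )
            return self._ids_by_key[key]  # silently use the existing entity
        new_id = len(self._ids_by_key) + 1
        self._ids_by_key[key] = new_id
        return new_id
```

For example, EntityStore().create({"name": "US African American"}) followed by the same name with different spacing or case returns the same ID, while EntityStore(on_duplicate="error") raises on the second call.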

Label and method list modeling is inconvenient

Both lists define their own type that only contains the list, which leads to an awkward nesting of types when generating server and client code. We could take out the middle layer of LabelList and MethodList.

Missing input validation

The current state of the service is a prototype. As such almost no input validation, cross checks and sanity checks are performed. They need to be implemented.
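
As an illustration of what such checks might look like, here is a Python sketch that validates a list of (haplotype, frequency) pairs. The rules it applies (non-empty and unique haplotypes, each frequency in (0, 1], total not above 1) are assumptions about what a sanity check could cover, not rules taken from the service, and validate_frequencies is a hypothetical name.

```python
import math


def validate_frequencies(entries, tolerance=1e-6):
    """Return a list of problems found in (haplotype, frequency) pairs."""
    problems = []
    seen = set()
    total = 0.0
    for index, (haplotype, frequency) in enumerate(entries):
        if not haplotype or not haplotype.strip():
            problems.append(f"entry {index}: empty haplotype")
        elif haplotype in seen:
            problems.append(f"entry {index}: duplicate haplotype {haplotype!r}")
        seen.add(haplotype)
        if not isinstance(frequency, (int, float)) or math.isnan(frequency):
            problems.append(f"entry {index}: frequency is not a number")
            continue
        if not 0.0 < frequency <= 1.0:
            problems.append(f"entry {index}: frequency {frequency} outside (0, 1]")
        total += frequency
    # Cross check over the whole submission rather than a single entry.
    if total > 1.0 + tolerance:
        problems.append(f"frequencies sum to {total}, which exceeds 1")
    return problems
```

Returning a list of problems instead of raising on the first one lets a submitter fix everything in one round trip.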

Create Cohort API in the same pattern of Population API

Create a Cohort API in the same pattern as the Population API, except that the POST for cohort requires one (and only one) population ID.

GET cohort/
POST cohort/ (name and population ID are required)
GET cohort/{cohort ID}
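
The three operations listed above can be sketched as an in-memory Python stand-in. The class name CohortService, the method names, and the shape of the returned dictionaries are hypothetical; only the rule that a cohort needs a name and exactly one population ID comes from this issue.

```python
class CohortService:
    """In-memory stand-in for the three cohort operations."""

    def __init__(self, known_population_ids):
        self._population_ids = set(known_population_ids)
        self._cohorts = {}

    def list_cohorts(self):
        # GET cohort/
        return list(self._cohorts.values())

    def create_cohort(self, name, population_id):
        # POST cohort/ (name and population ID are required)
        if not name or not name.strip():
            raise ValueError("name is required")
        if population_id not in self._population_ids:
            raise ValueError(f"unknown population ID: {population_id}")
        cohort_id = len(self._cohorts) + 1
        cohort = {
            "id": cohort_id,
            "name": name.strip(),
            "populationId": population_id,
        }
        self._cohorts[cohort_id] = cohort
        return cohort

    def get_cohort(self, cohort_id):
        # GET cohort/{cohort ID}; None stands in for a 404 here.
        return self._cohorts.get(cohort_id)
```

Taking a single population_id argument, rather than a list, is what enforces the one-and-only-one rule in this sketch.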

HFCurationRequest PopulationID required

I am struggling with the fact that the PopulationID of the HFCurationRequest is required, whereas I believe the request uses either the PopulationID (provided by the server) or a reference to PopulationData (upload use case).
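
If that reading is right, the rule is "exactly one of the two". A small Python sketch of such a check follows; it treats the request as a plain dictionary, and the function name resolve_population and the returned tuples are hypothetical.

```python
def resolve_population(request):
    """Accept exactly one of PopulationID or PopulationData."""
    has_id = request.get("PopulationID") is not None
    has_data = request.get("PopulationData") is not None
    if has_id == has_data:
        # Either both were supplied or neither was.
        raise ValueError("provide exactly one of PopulationID or PopulationData")
    if has_id:
        return ("existing", request["PopulationID"])
    return ("new", request["PopulationData"])
```

Under this rule neither field would be marked as required in the schema on its own; the constraint would be enforced by the service.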

Connect an actual consuming client

We've built out some clients to interact with the service, but we should connect a client that actually uses frequencies (HaploStats, etc) to work through the consumer (user) sort of interactions and questions we'll ultimately face on usability of the service.

Allow additional information in error responses

Currently, errors only return an HTTP error number (like 404), but no body at all. Some services may fail on multiple levels to find data in the database (like get /hfc/{submissionId}/cohort may fail to find the submission or it may fail to find an associated cohort). There is no way to convey which part could not be resolved.
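
A Python sketch of what a more informative error could carry for the example above. The body fields (status, error, resource, message, path) and both function names are assumptions for illustration, and the submissions dictionary stands in for the database.

```python
def not_found(resource, message, path):
    """Build a 404 response body naming the resource that was missing."""
    return {
        "status": 404,
        "error": "Not Found",
        "resource": resource,
        "message": message,
        "path": path,
    }


def get_cohort_for_submission(submissions, submission_id):
    """Return (status, body), distinguishing the two ways the lookup can fail."""
    path = f"/hfc/{submission_id}/cohort"
    submission = submissions.get(submission_id)
    if submission is None:
        return 404, not_found(
            "submission", f"no submission with ID {submission_id}", path
        )
    cohort = submission.get("cohort")
    if cohort is None:
        return 404, not_found(
            "cohort", f"submission {submission_id} has no associated cohort", path
        )
    return 200, cohort
```

Both failures still answer with 404, but the resource field tells the client which lookup failed.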

docker-compose error

This is strange.
Running docker-compose up -d in the db directory fails because it is trying to create a directory ~/src that already exists!

[+] Running 1/2
 ⠿ Container phpmyadmin   Running                                          0.0s
 ⠙ Container hfcus_mysql  Starting                                         0.2s
Error response from daemon: error while creating mount source path '/Users/mmaiers/src/git/phycus/db': mkdir /Users/mmaiers/src: file exists

PHYCUS instance currently returns error 500

When trying to use the perl client to retrieve the current submissions (perl HFCuS-client.pl --action=get_all), I get the following output:

Connecting to http://phycus.b12x.org:8080
Exception when calling DefaultApi->hfc_get: API Exception(500): Internal Server Error
{"timestamp":1523625033744,"status":500,"error":"Internal Server Error","exception":"org.springframework.transaction.CannotCreateTransactionException","message":"Could not open JPA EntityManager for transaction; nested exception is javax.persistence.PersistenceException: org.hibernate.TransactionException: JDBC begin transaction failed: ","path":"/hfc"} at lib/WWW/SwaggerClient/DefaultApi.pm line 90.

Test issue

This is a test issue to demonstrate the webhook

Quality Tags

Categorize the list (HH2016) further.
Some are just "descriptive statistics" or "features".
Some are indicators of how "good" the data is.

At DaSH8 we should implement a few of these using "AWS Lambda"

Simple examples:
RES_MISS_LOCI - depends on GT
Wn Statistic - global 2-locus pairwise LD (depends on GT)
DIV_50_REL - depends on HT only

javaGui is not in master

Somehow the master branch does not have the JavaGui client
(as a side issue this code should live under client/java)

Do we need a more general mechanism for selecting / identifying frequency sets

This started with contemplating how to assign population and cohort information to a frequency set. For example, a population could be associated with a race group (HIS, AFA, CAU, etc) which could transcend cohort boundaries. A cohort could be associated with a set of donors recruited to a registry (BTM / NMDP, etc) which could transcend population boundaries (race group in this case). This implies the need for a many to many relationship between cohort and population. Then, if we consider other means of defining a population (geography, etc), it may be useful to have a more general criteria by which a frequency set can be tagged / annotated (like a selection criteria). Perhaps the label could serve this purpose, but then it may be useful to categorize a set of 'label types'. So, the question is, how can we sufficiently define a means of annotating any frequency set, such that the frequency sets are selectable given any available/reasonable criteria (e.g. shoe size, eye color, etc)? This may be better discussed in person (DaSH 7 - 2017!!). @fscheel @sauter @mmaiers-nmdp @HofmannJ @mhalagan-nmdp @hpeberhard

Should the clients be checked in with the service?

It implies that maintenance of the clients be synchronized with the service, which increases the burden for anyone modifying the service. There is value to the synchronization, but separation seems more appropriate.

Remove "CohortData" from HFCurationRequest and make CohortID(integer) required

HFCurationRequest {
AccessData (AccessData, optional),
AccessID (integer, optional): References a access controls ,
CohortData (CohortData, optional),
CohortID (integer, optional): Cohort ID or genotype list ,
...

change to
HFCurationRequest {
AccessData (AccessData, optional),
AccessID (integer, optional): References a access controls ,
CohortID (integer): Cohort ID or genotype list ,
...
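
On the consuming side, the proposed change could look like the following Python sketch. It models only the fields visible in the listing above (the rest is elided there and is left out here), and the class layout and the parse_request helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HFCurationRequest:
    cohort_id: int                      # CohortID, now required
    access_id: Optional[int] = None     # AccessID, still optional
    access_data: Optional[dict] = None  # AccessData, still optional

    def __post_init__(self):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(self.cohort_id, bool) or not isinstance(self.cohort_id, int):
            raise TypeError("CohortID must be an integer")


def parse_request(payload):
    """Build a request from a JSON-like dict, rejecting the removed field."""
    if "CohortData" in payload:
        raise ValueError("CohortData is no longer accepted; send CohortID")
    if "CohortID" not in payload:
        raise ValueError("CohortID is required")
    return HFCurationRequest(
        cohort_id=payload["CohortID"],
        access_id=payload.get("AccessID"),
        access_data=payload.get("AccessData"),
    )
```

A client would then create the cohort first and pass only the resulting ID.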
