pip install --user --upgrade https://github.com/pepkit/pepkit/archive/master.zip
pip install --user --upgrade https://github.com/pepkit/pepkit/archive/dev.zip
A web API and database for biological sample metadata
Home Page: https://pephub.databio.org
License: BSD 2-Clause "Simplified" License
The custom validator gives an error for any input:
TypeError: NetworkError when attempting to fetch resource.
The original validator still works successfully. This only happens in the deployed instance, not when running locally, where it works fine. After spending several hours troubleshooting, I noticed in the browser's Network tab that it mentioned something about a redirect and mixed content being returned:
"POST /eido/validate/pep HTTP/1.1" 307 Temporary Redirect
whenever I would try to use the new validator. This led me to some searching about FastAPI and 307 redirects, and I realized that a trailing-slash mismatch triggers this. Sure enough:
Trailing slash:
Line 170 in cb109b6
Line 231 in cb109b6
It has something to do with the redirect not being HTTPS-aware, maybe? Anyway, I think the fix is just to remove that slash in the endpoint definition.
@nsheff As I work on these endpoints, I am just returning raw data with nothing else. For example, this is the API response for /pep/demo/subsample1/config:
pep_version: "2.0.0"
sample_table: sample_table.csv
subsample_table: subsample_table.csv
looper:
  output_dir: $HOME/example_results
It's the YAML config file, but nothing else. Should we be standardizing the return data packages? Something like...
{
  "status": "ok",
  "message": "success",
  "data": {
    "pep_version": "2.0.0",
    "sample_table": "sample_table.csv",
    "subsample_table": "subsample_table.csv",
    "looper": {
      "output_dir": "$HOME/example_results"
    }
  }
}
Does FastAPI have a way to standardize responses, sort of how it has a way to standardize request handling (with dependencies, route prefixes, etc)?
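FastAPI does let you attach a `response_model` per route or per router, and a `default_response_class` app-wide, but as far as I know it has no built-in envelope. A shared helper may be the simplest way to standardize the shape proposed above (a sketch; the function name is invented):

```python
from typing import Any

def wrap_response(data: Any, status: str = "ok", message: str = "success") -> dict:
    """Wrap raw endpoint data in a standard response envelope."""
    return {"status": status, "message": message, "data": data}

# Example: wrapping the config payload shown above.
config = {
    "pep_version": "2.0.0",
    "sample_table": "sample_table.csv",
}
envelope = wrap_response(config)
```

Each endpoint would then `return wrap_response(...)` instead of returning the raw object.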
Right now, tables are showing Python-syntax lists when there are multiple values for a given entry.
For example, http://pephub.databio.org/pep/nfcore/demo_rna_pep/view
I don't think we should show embedded values like this; we should think of some other way to display them.
Is there a reason you're pinning the version of every dependency here ? https://github.com/pepkit/pephub/blob/master/requirements/requirements-all.txt
I am not a fan of pinning versions in requirements files. I generally prefer minimum bounds, which keeps things flexible when dependencies get upgraded and stops pip from yelling at me.
Also, this dependency list is excessive; many of these would be covered by sub-dependencies. See, for example, how we usually do it:
https://github.com/refgenie/refgenie/blob/master/requirements/requirements-all.txt
https://pephub.databio.org/pep/geo_recent/GSE179805/view?tag=raw
Now if there are 10k entries in, say, the geofetch namespace, it takes a long time to load these pages because there are so many entries. We need to avoid retrieving all of those, somehow.
Right now the namespace output is an object, with keys corresponding to projects in that namespace, and the values are, I guess, the file path on the local server to the project config file?
I see how the keys are useful, because they are the identifiers that can be used to get further information about the project. But what's the point of those values?
I think this endpoint should be serving an overview about the projects. What more useful information could be served here? Maybe number of samples in the project?
Or maybe it's just a list of the keys. The local paths just confuse things.
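For discussion, a hypothetical overview shape the namespace endpoint could serve instead of the local paths (all names and numbers here are invented):

```python
# Hypothetical namespace response: project identifiers plus a small amount
# of useful metadata, with no server-local file paths.
namespace_overview = {
    "namespace": "demo",
    "num_projects": 2,
    "projects": [
        {"name": "subsample1", "num_samples": 4},
        {"name": "derive", "num_samples": 3},
    ],
}
```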
As heavily discussed in #14, I can't seem to add route example request data to /docs when the path parameters are defined through dependency injection. I thought I'd pull it out into its own issue here to track progress instead of leaving the PR in purgatory.
I am having an issue declaring request-data examples for specific endpoint/route parameters when those parameters are defined through dependency injection. In the example below, I need to verify that the namespace actually exists prior to returning data about it. To do that, I followed this example in the FastAPI docs. It works great.
However, in /docs, I'd love to give examples for namespaces. To try to do that, I followed this example in the FastAPI docs. Visiting /docs, however, provides the user with no example request data:
If I comment out the global dependencies like so:
router = APIRouter(
    prefix="/pep/{namespace}",
    # dependencies=[Depends(verify_namespace)],
)
The example is shown. I can also verify that when I declare requirements in the Path() instance, like min_length or max_length, these requirements are honored by FastAPI (even though I can't see them in /docs).
# main.py
from fastapi import FastAPI

from .routers import namespace

app = FastAPI()
app.include_router(namespace.router)
# routers/namespace.py
from fastapi import APIRouter, Depends
from fastapi.responses import JSONResponse

from ..dependencies import *
from ..main import _PEP_STORES
from ..route_examples import example_namespace
router = APIRouter(
    prefix="/pep/{namespace}",
    dependencies=[Depends(verify_namespace)],
)
@router.get("/", summary="Fetch details about a particular namespace.")
async def get_namespace(namespace: str = example_namespace):
    """Fetch namespace. Returns a JSON representation of the namespace and the projects inside it."""
    return JSONResponse(content=_PEP_STORES[namespace])
# dependencies.py
from fastapi import HTTPException

from .main import _PEP_STORES
from .route_examples import example_namespace

def verify_namespace(namespace: str = example_namespace) -> None:
    if namespace not in _PEP_STORES:
        raise HTTPException(status_code=404, detail=f"namespace '{namespace}' not found.")
# route_examples.py
from fastapi import Path
example_namespace = Path(
    ...,
    description="A namespace that holds projects.",
    example="demo",
)
Example request:
GET https://pephub.databio.org/pep/geo/GSE124224/convert?filter=csv&DATA=some/local/path&IMPORTANT_PARAMETER=value_of_this_parameter&VARIABLE=test
Please make pepdbagent and pephub use consistent loggers. Probably logmuse will help with this.
- Remove print messages; always use a logger.
- Use logmuse if it makes sense.
- Fix the "Invalid token type" message.

I see these messages littered all over the logs:
Not all environment variables were populated in derived attribute source: $BEDBASE_DATA_PATH_HOST/bed_files/{file_name}
Not all environment variables were populated in derived attribute source: $BEDBASE_DATA_PATH_HOST/outputs/bedstat_output/bedstat_pipeline_logs/submission/{sample_name}_sample.yaml
Not all environment variables were populated in derived attribute source: $BEDBASE_DATA_PATH_HOST/openSignalMatrix_{genome}_percentile99_01_quantNormalized_round4d.txt.gz
So basically the point here is: run this service, look at the logs, and make them consistent, useful, and understandable, correcting any errors.
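As a point of comparison, a minimal consistent-logger setup with the standard library; logmuse wraps a similar configuration and could replace this. The logger name and format here are assumptions:

```python
import logging

# One named logger, configured once, used everywhere instead of print().
logger = logging.getLogger("pephub")
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("server started")  # instead of print("server started")
```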
I can't get this endpoint to return anything. I'm getting a 404.
It's strange to me that the links on the pephub landing page all open targets in new windows. Is there a reason you did it that way?
To me it's frustrating, as I generally don't expect links to do that. I don't see a reason for doing it here.
The tag parameter is not added to most of the links in pephub.
E.g.: https://pephub.databio.org/pep/geo/GSE175232/convert?filter=csv || should be: https://pephub.databio.org/pep/geo/GSE175232/convert?filter=csv&tag=series
I need a place to keep track of endpoints that are working with the new PepAgent class.
PEP:
/pep/view
Namespace:
/pep/{namespace}
/pep/{namespace}/view
/pep/{namespace}/projects
Project:
/pep/{namespace}/{project}
/pep/{namespace}/{project}/view
/pep/{namespace}/{project}/zip
/pep/{namespace}/{project}/config
/pep/{namespace}/{project}/samples
/pep/{namespace}/{project}/samples/{sample_name}
/pep/{namespace}/{project}/samples/{sample_name}/view
/pep/{namespace}/{project}/subsamples
/pep/{namespace}/{project}/convert
Eido:
Complete
The PepAgent class is nice and mature for the project-level data, but less so for metadata about PEPs and for namespaces.
I thought it would be efficient to have a list of all endpoints we should have for pephub in an issue. This is what I have so far:
/<namespaceid>/<projectid>
/<namespaceid>/<projectid>/config
/<namespaceid>/<projectid>/samples
/<namespaceid>/<projectid>/samples/<sampleid>
/<namespaceid>/<projectid>/subsamples/<sampleid>
/<namespaceid>/<projectid>/zip
@nsheff Feel free to add more that you see as necessary.
@nsheff I noticed that when hitting /v1/ChangLab/PEP_1 you get an internal server error, as the PEP isn't compatible with PEP 2.0.0:
NotImplementedError: The attribute implications section (implied_columns) follows the old format.
Reformatting is not implemented.
Edit the config file manually (add 'sample_modifiers.imply') to comply with PEP 2.0.0 specification:
http://pep.databio.org/en/latest/specification/#sample-modifier-imply
Is there an incentive to support both versions of PEP? Also, how would that be implemented?
Just so I don't forget --
We need to look into performance of peppy. It takes too long to process/return a large project. It needs to be very fast.
Right now if you hit a /samples endpoint, you get the samples, but each sample includes a pointer back to the project under _project.
We shouldn't duplicate the project information like that.
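A sketch of one way to avoid the duplication, assuming each sample is available as a plain dict carrying the `_project` key described above (the helper name is invented):

```python
def serialize_sample(sample: dict) -> dict:
    """Return a copy of the sample without the duplicated project pointer."""
    return {k: v for k, v in sample.items() if k != "_project"}

sample = {
    "sample_name": "s1",
    "protocol": "RNA-seq",
    "_project": {"name": "demo"},  # back-reference we don't want to serve
}
clean = serialize_sample(sample)
```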
Line 1 in 317fe5d
this should be #!, not !#, right?
Right now, pephub shows the versions of peppy and Python used, but not the version of pephub itself.
This sample has asterisks in the file name: http://pephub.databio.org/pep/changlab/pep_2/samples/ACCx-2A5AE757-20D5-49B6-95FF-CAE08E8197A0-X012-S05-L033-B1-T1-P024
These can't be populated by pephub.
Derived attributes with reference to other sample attributes are correctly derived: http://pephub.databio.org/pep/demo/derive/samples
I think the splash page for a project should have links to show how to actually get some stuff.
For example, this page:
http://pephub.databio.org/pep/nfcore/demo_rna_pep/view
should have a link to:
part of the point of this is to make it easy to access the particular endpoints for a given project, by browsing to the project page and then having those links be there, already populated for you.
Two suggestions:
Eido provides a CLI to convert a PEP into different output formats: http://eido.databio.org/en/latest/filters/
pephub should provide an API to retrieve sample metadata in various formats by specifying an eido filter.
Thus we should provide:
- An endpoint that applies an eido filter (obviously from within Python, not on the CLI), equivalent to eido convert config.yaml -f <filter>.

With user authentication and more features, we decided that it would make more sense to have a better, more stylish landing page.
The original idea was to emulate docker hub, but this is flexible.
@nsheff Should we enable searching for PEPs? @khoroshevskyi said it would be nice to search through PEPs. We could add a /search endpoint?
Maybe this is an issue for pepdb, but originally it was quite easy to serve a config file for any PEP, since the actual files were stored next to the server. Now that we are migrating to the database representation of PEPs, it is not so trivial to serve config files through the API.
What can we do to retain the file-serving capability of pephub? Some ideas:
- peppy
Currently, trying to access a /namespace/project endpoint only works if the case matches, e.g. "biocproject" != "BiocProject". I think that the project endpoints (at least for identifiers) should be case-insensitive.
http://pephub.databio.org/pep/geo/gse74180/convert?filter=yaml
The output of the yaml filter is an escaped string representation of the YAML inside a JSON response:
Instead, it should just return the YAML file.
It has been discussed a lot that we should open an endpoint for submitting PEPs to PEPhub. However, it needs to involve some sort of authentication. I believe the plan is to use Auth0 to do this. We should protect the submission endpoint to only allow authenticated users to submit a new PEP.
There are two endpoints now to help with PEP submission:
GET /pep/{namespace}/submit:
This returns a webpage with a form that users can use to name their PEP, upload files, and add a tag. I believe the idea is that the user's username will be used to populate the namespace.
POST /pep/{namespace}/submit:
This accepts a multipart/form-data POST request and uses the submission to insert a new PEP into the database using pepdbagent. At present, this is only available if there is a SERVER_ENV=development environment variable set when running the server.
I am assigning this to @nsheff since he has experience with Auth0 and next.js deployment. @rafalstepien, since you are familiar with Django, I thought I'd assign you too, as you may have insight here.
I am close to another release of pephub, so once that is done, I think we can start to work on this.
Wouldn't it be cool if you could pass variables through query params to adjust attributes in the PEP?
For example, say a PEP has a derived column that uses an environment variable like $DATA, because the files are stored in $DATA/subfolder/{sample_name}.fq, or something.
If, when you hit the endpoint to get some file attribute, you could pass /endpoint?DATA=/my/local/path, then the server could return a path that was useful for your local environment.
I just added some new PEPs to the pephub.github.org repo. I did it blind without testing. When I try to access those endpoints, I get this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 375, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 208, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
raise exc
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/cors.py", line 84, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
raise exc
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 656, in __call__
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 259, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 61, in app
response = await func(request)
File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 250, in app
response = actual_response_class(response_data, **response_args)
File "/usr/local/lib/python3.8/site-packages/starlette/responses.py", line 49, in __init__
self.body = self.render(content)
File "/usr/local/lib/python3.8/site-packages/starlette/responses.py", line 157, in render
return json.dumps(
File "/usr/local/lib/python3.8/json/__init__.py", line 234, in dumps
return cls(
File "/usr/local/lib/python3.8/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/local/lib/python3.8/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
ValueError: Out of range float values are not JSON compliant
In addition to the validation on pephub submission (a GitHub action in pephub, see #15), we should also do some defensive programming here to catch these errors and return an informative message to the user, in case something sneaks by the validation script.
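One defensive option is to sanitize out-of-range floats (NaN, inf) before JSON encoding, so one bad value yields a clear response instead of a 500. A sketch:

```python
import math

def sanitize(obj):
    """Recursively replace non-JSON-compliant floats with None."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {k: sanitize(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [sanitize(v) for v in obj]
    return obj

# NaN and inf are exactly the values json.dumps refuses by default.
clean = sanitize({"score": float("nan"), "reads": [1.0, float("inf")]})
```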
After one day of using PEPhub as a logged-in user, the token expires. After that, the page stops working and an error pops up:
{"detail":"The token has expired, please log in again."}
This function assumes that every file uploaded is a PEP:
Lines 185 to 190 in cb109b6
It creates a Project object from each file. This is incorrect; many PEPs are made up of multiple files.
This means the validation will only work for single-file PEPs.
Right now, pephub looks for a dotfile called ".pephub.yaml" in a folder to identify the config file, and expects it to look like:
config_file: path/to/config.yaml
Interestingly, looper does something very similar... it wants a file called .looper.yaml in the folder, and wants it to look like:
config_file_path: path/to/config.yaml
We should standardize.
- Use config_file_path for .pephub.yaml as well?
- Merge .looper.yaml and .pephub.yaml into a single file?

Right now, there's no way to get to the new validators. Can you please add in some interface whereby users can discover the new validators from the home page? I believe there are multiple types of new validators, so there needs to be some way to access these various tools.
@nsheff Another thought while loading the /v1/ChangLab/PEP_2 endpoint... I see this in the server output logs:
Couldn't find matching sample for subsample: BRCA-6F22B7DA-85CA-4E9C-93A3-859878775DDB-X005-S06-L012-B1-T1-PMRG
Couldn't find matching sample for subsample: BRCA-6F22B7DA-85CA-4E9C-93A3-859878775DDB-X005-S06-L012-B1-T1-PMRG
Couldn't find matching sample for subsample: LGGx-A50A1CE2-549C-4BCF-8E04-F846B09BEA95-X006-S06-L040-B1-T1-PMRG
Couldn't find matching sample for subsample: LGGx-A50A1CE2-549C-4BCF-8E04-F846B09BEA95-X006-S06-L040-B1-T1-PMRG
Couldn't find matching sample for subsample: GBMx-CBA5FDBB-E848-4B2D-82D5-8A33D7A3D205-X005-S11-L025-B1-T2-PMRG
Couldn't find matching sample for subsample: GBMx-CBA5FDBB-E848-4B2D-82D5-8A33D7A3D205-X005-S11-L025-B1-T2-PMRG
Should this be returned to the user requesting the PEP via the API?
Piggybacking off #31
It might be nice to have an HTML, user-friendly page that shows the same info you're putting into this endpoint. So, it would be a list of the projects for a namespace, with some metadata (number of samples), and a link to the project, some API endpoint links, etc. For browsability.
I was checking how pephub works, and I think we should disable heavy queries, e.g. "get all projects" or similar. We will get a lot of errors there, and it can cause server overload.
One cause of this issue is loading projects into peppy itself.
I would suggest:
Looking for discussion on this issue
@nsheff @nleroy917
http://pephub.databio.org/pep/nfcore/demo_rna/view
what's wrong with this PEP?
We should think about adding a column to the database, e.g. is_private, which will be a boolean and will handle the privacy of projects.
Following instructions for local development:
docker compose up --build
open /home/nsheff/code/pephub/.env: no such file or directory
It seems that this code:
app.mount(
    "/",
    StaticFiles(directory=STATICS_PATH),
    name="static",
)
breaks the routing. When this code is run, I can no longer access endpoints like /pep/{namespace} or /pep/{namespace}/{project}. Base routes remain intact, but the routers no longer function. It seems like they are mutually exclusive, as in you can either mount static files or serve routers? I am not familiar enough with FastAPI to know. @nsheff any ideas?
With the rapid changes to pephub and pepdbagent, some features and APIs got misaligned, and when navigating the UI it seems that some features of PEPs are missing. As such, we need to go through each page and ensure that there are no missing values and that all links work.
Right now, any uploaded PEP is publicly visible. Instead, a user should be able to mark PEPs as public or private.
As a simple permission system to start:
To do this will require: