crim-ca / weaver

Weaver: Workflow Execution Management Service (EMS); Application, Deployment and Execution Service (ADES); OGC API - Processes; WPS; CWL Application Package

Home Page: https://pavics-weaver.readthedocs.io

License: Apache License 2.0

Makefile 1.04% Python 97.61% Common Workflow Language 1.20% Mako 0.11% Shell 0.03% Dockerfile 0.01%
ogc wps cwl workflow remote-execution ems ades common-workflow-language web-application web-processing-service

weaver's People

Contributors

cehbrecht, chaamc, cwcummings, davidcaron, dbyrns, dependabot[bot], elacoursiere-crim, f-plt, fbachandcrim, fderue, fmigneault, fmigneault-crim, francisplt, mishaschwartz, perronld, pyup-bot, trapsidanadir, zvax

weaver's Issues

Route of job inputs, outputs and result

When a job completes, it is currently difficult to know where its result came from, since the REST API gives no indication of the job inputs and offers no way to retrieve them.

We should add jobs/{id}/inputs (and all other job route variants) that would return the list of input values received during the job submission.

A corresponding outputs route should also be added to match the inputs/outputs fields specified in the process description. This would make searching for results more natural, since the field names would map to exactly the same keywords.

These routes should be provided in links as per #58.

The jobs/{id}/result and jobs/{id}/results routes (with and without the s) should be kept as aliases to remain backward-compatible with existing specifications, which themselves use both variants interchangeably:

Also add a 'request' link that preserves the literal body received for the job execution.
Highlighted by @matprov
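A minimal sketch of the link list a job description could return once these routes exist. The helper names (`job_links`, `canonical_job_route`) and the `rel` values are hypothetical; the actual relations would follow whatever #58 settles on.

```python
from typing import Dict, List


def job_links(base_url: str, job_id: str) -> List[Dict[str, str]]:
    """Build the links of a job, including the proposed inputs/outputs/request routes (hypothetical rels)."""
    job_url = f"{base_url.rstrip('/')}/jobs/{job_id}"
    return [
        {"rel": "self", "href": job_url},
        {"rel": "inputs", "href": f"{job_url}/inputs"},
        {"rel": "outputs", "href": f"{job_url}/outputs"},
        # both spellings kept as aliases for backward compatibility
        {"rel": "results", "href": f"{job_url}/results"},
        {"rel": "result", "href": f"{job_url}/result"},
        # literal execution body received at submission
        {"rel": "request", "href": f"{job_url}/request"},
    ]


def canonical_job_route(segment: str) -> str:
    """Resolve the 'result' alias to the canonical 'results' route; other segments pass through."""
    return {"result": "results"}.get(segment, segment)
```

Resolving aliases to a single canonical route keeps only one handler per resource while both URLs stay valid.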

Provide "builtin" apps as process building blocks

Builtin apps are currently loaded directly, with no way to discover them from the API.

todo

  • builtin apps must be "converted" to Process that will be returned via GET /processes
  • builtin apps should be automatically added to db if not already present

implementation notes:

  • Process.type = "builtin"
  • Process.package = <loaded CWL in weaver/process/builtin>
  • Process.visibility = "public"
  • remove the deprecated if conditions in wps_package used to resolve builtin apps, as they will be automatically resolved and loaded by Process.id.
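A minimal sketch of the registration step described in the notes above, assuming a plain dict stands in for the process database and that the process ID is derived from the CWL file name (both assumptions; `builtin_process` and `register_builtins` are hypothetical names).

```python
import os
from typing import Any, Dict


def builtin_process(cwl_path: str, cwl_package: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a loaded builtin CWL package into a process record."""
    process_id = os.path.splitext(os.path.basename(cwl_path))[0]  # assumed naming convention
    return {
        "id": process_id,
        "type": "builtin",
        "package": cwl_package,
        "visibility": "public",
    }


def register_builtins(store: Dict[str, Dict[str, Any]], packages: Dict[str, Dict[str, Any]]) -> int:
    """Add builtin processes to the store only if not already present; return how many were added."""
    added = 0
    for path, package in packages.items():
        process = builtin_process(path, package)
        if process["id"] not in store:
            store[process["id"]] = process
            added += 1
    return added
```

Because registration is idempotent, it could run at every startup without duplicating entries.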

Use class enums instead of segregated constants

applicable classes:

  • status
  • status compliant
  • status category
  • sort
  • order
  • formats
  • content-type
  • execute mode
  • execute control option
  • execute response
  • execute transmission mode
  • visibility

for "status" enums, also use improvements from other projects
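A sketch of what a class enum could look like for two of the listed groups. The status values are an assumption for illustration (the exact set, e.g. "succeeded" vs "successful", depends on the targeted spec). Mixing in `str` keeps the members comparable to the raw strings already stored in the database.

```python
from enum import Enum


class Status(str, Enum):
    """Job status values (assumed subset, for illustration)."""
    ACCEPTED = "accepted"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    DISMISSED = "dismissed"

    @classmethod
    def values(cls):
        """List the raw string values, e.g. for schema validation."""
        return [member.value for member in cls]


class Visibility(str, Enum):
    PUBLIC = "public"
    PRIVATE = "private"


# "status category" grouping expressed on top of the enum instead of separate constants
STATUS_FINISHED = frozenset({Status.SUCCEEDED, Status.FAILED, Status.DISMISSED})
```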

TES-API server routes to allow CWL from Rabix-composer UI

Description

Rabix Composer (http://rabix.io/) allows building CWL workflows with a user-friendly UI and testing the execution of such workflows. Furthermore, it supports TES-API (Task Execution Schema) servers to dispatch remote jobs.

To investigate

If weaver supports TES schemas, it could theoretically be possible to execute CWL workflow jobs from the composer's run command.

platform

We would need to allow adding a new "Platform" type "OGC", since only a limited, fixed set of values is currently permitted (see image).
The OGC WPS-REST API standard could be provided as new feature to


remote fetching

CWL definitions could be imported if the feature to fetch them from a URL is implemented:
rabix/composer#393
Weaver processes could then be imported to build the workflow, and deployed with the resulting code (from the "Code" tab).

temporary testing

Re-configure the local files of Composer to load the CWL of remote Weaver processes as if they were local files.

Required endpoints for TES

(see TES swagger-ui)
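One piece any TES server implementation would need is a mapping from Weaver job statuses to TES task states. The sketch below assumes the GA4GH TES v1.0 state names (to be checked against the TES swagger-ui) and assumed Weaver status strings; the chosen correspondences are a judgment call, not a defined mapping.

```python
from typing import Dict

# Assumed job status strings (left) mapped to TES v1.0 task states (right).
JOB_STATUS_TO_TES_STATE: Dict[str, str] = {
    "accepted": "QUEUED",
    "running": "RUNNING",
    "succeeded": "COMPLETE",
    "failed": "EXECUTOR_ERROR",
    "dismissed": "CANCELED",
}


def tes_state(job_status: str) -> str:
    """Return the TES state for a job status, falling back to UNKNOWN."""
    return JOB_STATUS_TO_TES_STATE.get(job_status.lower(), "UNKNOWN")


def tes_task_view(job_id: str, job_status: str) -> Dict[str, str]:
    """Minimal task view (id and state only) that a GET on a task could return."""
    return {"id": job_id, "state": tes_state(job_status)}
```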

References Packages

simplify the whole project

remove supervisor, nginx and mongodb; only run the pyramid app

will save a LOT of space in the image size
will simplify the image/config setup by a LOT

todo

  • remove buildout and related mongodb, supervisor, etc. parts
  • use directly gunicorn instance in docker
  • link to external mongodb
  • remove the installs in the following make target according to the removed dependencies (Makefile:181):
      "$(CONDA_HOME)/bin/conda" install -y -n "$(CONDA_ENV)" "setuptools=$(SETUPTOOLS_VERSION)" supervisor nginx mongodb
  • figure out how to handle celery... an external one would duplicate a lot of code...
  • figure out how to serve pywps endpoint, pywps-data-output dir endpoint and weaver app all under the same pyramid app (maybe using [composite:main] in weaver.ini ?)
  • update the docker-compose files of other repos to edit the weaver.ini file directly instead of custom.cfg, since the latter won't be used anymore, and add links / other required images
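For the point about serving pywps, the output directory and weaver under one app: a `[composite:main]` section typically delegates to a URL-prefix dispatcher. The sketch below shows that idea in plain WSGI, as an illustration only; the `/ows/wps` and `/wpsoutputs` prefixes and the `make_app` demo helper are assumptions.

```python
from typing import Callable, Dict, Iterable

WSGIApp = Callable[[dict, Callable], Iterable[bytes]]


class PrefixDispatcher:
    """Route requests to WSGI apps by URL path prefix (longest prefix wins)."""

    def __init__(self, default: WSGIApp, mounts: Dict[str, WSGIApp]):
        self.default = default
        # try longer prefixes first so nested mounts are matched correctly
        self.mounts = sorted(mounts.items(), key=lambda item: len(item[0]), reverse=True)

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        for prefix, app in self.mounts:
            if path == prefix or path.startswith(prefix + "/"):
                # move the matched prefix from PATH_INFO to SCRIPT_NAME, as WSGI mounting expects
                environ = dict(environ)
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return app(environ, start_response)
        return self.default(environ, start_response)


def make_app(name: str) -> WSGIApp:
    """Demo app echoing its name and the resolved SCRIPT_NAME/PATH_INFO."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [f"{name}:{environ['SCRIPT_NAME']}:{environ['PATH_INFO']}".encode()]
    return app
```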

WPS hostname not replaced in XML response when behind server proxy

For example, calling:
${SERVER_HOSTNAME}/ows/wps?request=GetCapabilities&service=WPS

The returned XML document contains https://localhost:4000 where ${SERVER_HOSTNAME} is expected, instead of the actual server hostname defined by the weaver.url configuration parameter.
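A stop-gap sketch that post-processes the XML response by swapping the internal base URL for the configured weaver.url. This is only a workaround under the assumption that the internal URL is known; a cleaner fix is probably to feed weaver.url into the PyWPS server URL configuration. `public_wps_xml` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def public_wps_xml(xml_text: str, internal_url: str, weaver_url: str) -> str:
    """Replace the internal base URL reported in WPS XML with the public weaver.url value."""
    internal = internal_url.rstrip("/")
    public = weaver_url.rstrip("/")
    if urlsplit(public).scheme not in ("http", "https"):
        raise ValueError(f"weaver.url must be an absolute HTTP(S) URL: {weaver_url!r}")
    return xml_text.replace(internal, public)
```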

Process Package generation from Jupyter Notebook definition

Explore the possibility to simplify the generation of new process packages directly from a notebook.

Points to explore (non-exhaustive list) :

  • using type annotations in a main function to define/auto-detect the I/O of the CWL package, and therefore of the WPS process.
  • automatically defining a new docker image with the additional python package requirements specified by the notebook
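A sketch of the first point: deriving CWL inputs from the annotated signature of a notebook's main function. The Python-to-CWL type mapping is an assumption and deliberately incomplete; a dedicated tool such as IPython2CWL handles far more cases.

```python
import inspect
import typing
from typing import Any, Dict, List, Optional

# Assumed mapping from Python annotations to CWL types (illustrative, not exhaustive).
PY_TO_CWL = {str: "string", int: "int", float: "float", bool: "boolean"}


def cwl_type(annotation: Any) -> Any:
    """Convert a Python annotation into a CWL type, handling Optional[...] and List[...]."""
    origin = typing.get_origin(annotation)
    args = typing.get_args(annotation)
    if origin is typing.Union and type(None) in args:
        # Optional[X] is Union[X, None]: nullable CWL type
        (inner,) = [arg for arg in args if arg is not type(None)]
        return ["null", cwl_type(inner)]
    if origin is list:
        return {"type": "array", "items": cwl_type(args[0])}
    return PY_TO_CWL[annotation]


def cwl_inputs(func) -> Dict[str, Any]:
    """Derive CWL inputs (with non-None defaults) from the annotated parameters of a function."""
    hints = typing.get_type_hints(func)
    inputs = {}
    for name, param in inspect.signature(func).parameters.items():
        inputs[name] = {"type": cwl_type(hints[name])}
        if param.default is not inspect.Parameter.empty and param.default is not None:
            inputs[name]["default"] = param.default
    return inputs


def main(dataset: str, threshold: float = 0.5, bands: Optional[List[int]] = None) -> None:
    """Example notebook entry point (hypothetical)."""
```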

Good candidate for notebook → CWL generation:

IPython2CWL

To Do

  • provide deploy from Jupyter Notebook URL or Git URL
  • add README references to relevant OGC Testbed-16: Earth Observation Application Packages with Jupyter Notebooks Engineering Report (http://docs.opengeospatial.org/per/20-035.html)

Migrate to uvicorn-gunicorn-fastapi Docker Image

Pre-built, optimized image base using all of the latest FastAPI improvements:
https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker

  • async calls & concurrency
  • faster json encode/decode
  • builtin OpenAPI /docs (Swagger-UI) and /redocs (ReDoc) endpoints
  • enforce Python 3.6+ (in line with the deprecation of Python 2.7)
  • automatic input validation
  • Flask-like flavor for defining routes

This requires migrating to the FastAPI application structure.

Cookiecutter with all of the above + setups for:
https://github.com/tiangolo/full-stack-fastapi-couchbase

  • Web admin page (port 8091)
  • Celery workers (Flower task monitor, port 5555)
  • Traefik UI (route usage monitoring, port 8090)
  • Vue.js frontend
  • Email notifications integrated
  • Backend integrated test execution (on running instance)

Conformance route

Conformance route should be implemented to link standards that the application conforms to and/or supports.

GET /conformance

see example:
https://github.com/opengeospatial/oapi_common/blob/master/standard/examples/conformance_response_JSON_1.adoc

Also update the documentation reference links that are returned (the old PDF reference seems to be missing):

The following is implemented for now:

    from pyramid.httpexceptions import HTTPOk
    from pyramid.settings import asbool

    # weaver-internal names used below (module paths assumed; they may differ between versions)
    from weaver import __meta__
    from weaver.formats import (
        CONTENT_TYPE_APP_JSON,
        CONTENT_TYPE_APP_PDF,
        CONTENT_TYPE_APP_XML,
        CONTENT_TYPE_TEXT_HTML,
        OUTPUT_FORMAT_JSON,
    )
    from weaver.utils import get_weaver_url
    from weaver.wps_restapi import swagger_definitions as sd
    from weaver.wps_restapi.utils import get_wps_restapi_base_url


    def api_frontpage_body(settings):
        # type: (SettingsType) -> JSON
        """Generates the JSON body describing the Weaver API and documentation references."""
        # import here to avoid circular import errors
        from weaver.config import get_weaver_configuration
        from weaver.wps import get_wps_url
        weaver_url = get_weaver_url(settings)
        weaver_config = get_weaver_configuration(settings)
        weaver_api = asbool(settings.get("weaver.wps_restapi"))
        weaver_api_url = get_wps_restapi_base_url(settings) if weaver_api else None
        weaver_api_def = weaver_api_url + sd.api_swagger_ui_service.path if weaver_api else None
        weaver_api_spec = weaver_api_url + sd.api_swagger_json_service.path if weaver_api else None
        weaver_wps = asbool(settings.get("weaver.wps"))
        weaver_wps_url = get_wps_url(settings) if weaver_wps else None
        weaver_conform_url = weaver_url + sd.api_conformance_service.path
        weaver_process_url = weaver_url + sd.processes_service.path
        weaver_links = [
            {"href": weaver_url, "rel": "self", "type": CONTENT_TYPE_APP_JSON, "title": "This document"},
            {"href": weaver_conform_url, "rel": "conformance", "type": CONTENT_TYPE_APP_JSON,
             "title": "WPS conformance classes implemented by this service."},
        ]
        if weaver_api:
            weaver_links.extend([
                {"href": weaver_api_url,
                 "rel": "service", "type": CONTENT_TYPE_APP_JSON,
                 "title": "WPS REST API endpoint of this service."},
                {"href": weaver_api_def,
                 "rel": "swagger", "type": CONTENT_TYPE_TEXT_HTML,
                 "title": "WPS REST API definition of this service."},
                {"href": weaver_api_spec,
                 "rel": "OpenAPI", "type": CONTENT_TYPE_APP_JSON,
                 "title": "WPS REST API specification of this service."},
                {"href": "https://raw.githubusercontent.com/opengeospatial/wps-rest-binding/develop/docs/18-062.pdf",
                 "rel": "documentation", "type": CONTENT_TYPE_APP_PDF,
                 "title": "API documentation about this service."},
                {"href": "https://app.swaggerhub.com/apis/geoprocessing/WPS/",
                 "rel": "wps-rest-swagger", "type": CONTENT_TYPE_TEXT_HTML,
                 "title": "API reference specification of this service."},
                {"href": weaver_process_url,
                 "rel": "processes", "type": CONTENT_TYPE_APP_JSON,
                 "title": "Processes offered by this service."}
            ])
        if weaver_wps:
            weaver_links.extend([
                # the link must point to the WPS URL, not to the boolean 'weaver_wps' flag
                {"href": weaver_wps_url,
                 "rel": "wps", "type": CONTENT_TYPE_APP_XML,
                 "title": "WPS 1.0.0/2.0 XML endpoint of this service."},
                {"href": "http://docs.opengeospatial.org/is/14-065/14-065.html",
                 "rel": "wps-xml-specification", "type": CONTENT_TYPE_TEXT_HTML,
                 "title": "WPS 1.0.0/2.0 definition of this service."},
                {"href": "http://schemas.opengis.net/wps/",
                 "rel": "wps-xml-schema", "type": CONTENT_TYPE_APP_XML,
                 "title": "WPS 1.0.0/2.0 XML validation schemas."}
            ])
        return {
            "message": "Weaver Information",
            "configuration": weaver_config,
            "parameters": [
                {"name": "api", "enabled": weaver_api,
                 "url": weaver_api_url,
                 "api": weaver_api_def},
                {"name": "wps", "enabled": weaver_wps,
                 "url": weaver_wps_url},
            ],
            "links": weaver_links,
        }


    @sd.api_versions_service.get(tags=[sd.TAG_API], renderer=OUTPUT_FORMAT_JSON,
                                 schema=sd.VersionsEndpoint(), response_schemas=sd.get_api_versions_responses)
    def api_versions(request):  # noqa: F811
        # type: (Request) -> HTTPException
        """Weaver versions information."""
        weaver_info = {"name": "weaver", "version": __meta__.__version__, "type": "api"}
        return HTTPOk(json={"versions": [weaver_info]})


    @sd.api_conformance_service.get(tags=[sd.TAG_API], renderer=OUTPUT_FORMAT_JSON,
                                    schema=sd.ConformanceEndpoint(), response_schemas=sd.get_api_conformance_responses)
    def api_conformance(request):  # noqa: F811
        # type: (Request) -> HTTPException
        """Weaver specification conformance information."""
        # TODO: follow updates with https://github.com/geopython/pygeoapi/issues/198
        conformance = {"conformsTo": [
            # "http://www.opengis.net/spec/wfs-1/3.0/req/core",
            # "http://www.opengis.net/spec/wfs-1/3.0/req/oas30",
            # "http://www.opengis.net/spec/wfs-1/3.0/req/html",
            # "http://www.opengis.net/spec/wfs-1/3.0/req/geojson",
            "http://schemas.opengis.net/wps/1.0.0/",
            "http://schemas.opengis.net/wps/2.0/",
            "http://www.opengis.net/spec/WPS/2.0/req/service/binding/rest-json/core",
            # "http://www.opengis.net/spec/WPS/2.0/req/service/binding/rest-json/oas30",
            # "http://www.opengis.net/spec/WPS/2.0/req/service/binding/rest-json/html",
            "https://github.com/opengeospatial/wps-rest-binding",
        ]}
        return HTTPOk(json=conformance)

The following must still be added:

Slightly more official/recent conformance definitions:
https://htmlpreview.github.io/?https://github.com/opengeospatial/ogcapi-processes/blob/master/docs/18-062.html#_conformance

Handle 'default' values and "minOccurs=0" in CWL from WPS-1

ref #12, #11

CWL: how to set default values
(1) https://www.biostars.org/p/221531/

Common Workflow Language User Guide - Essential Input Parameters
(2) https://www.commonwl.org/user_guide/03-input/

More details here:
(3) https://www.biostars.org/p/286135/
"default" generates "type": ["null", <other-real-type>], and "<type>?" is shorthand for that same type definition

todo

  • add test when "default" is present for a LiteralData value
  • add test when "default" is present for a "File" reference
  • add test when "minOccurs"=0 is present for a LiteralData value ("default" as "null")
  • add test when "minOccurs"=0 is present for a "File" reference ("default" as "null")
  • add test when "maxOccurs">1 is present for a LiteralData value (converted to "type": "array")
  • add test when "maxOccurs">1 is present for a "File" reference (converted to "type": "array")
  • add test for shorthand "<type>?" by itself, and with every combination of above cases, as well as with multiple cases combined (validate all situations are handled correctly)
  • add test when "minOccurs">1: the type should be converted to an array automatically, and maxOccurs updated accordingly, using the same array handling as above.
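The cases above can be pinned down by a small conversion function that the tests would exercise. This is a sketch of the rules as listed (nullable when minOccurs=0, array when maxOccurs>1 or "unbounded", array forced when minOccurs>1), not Weaver's actual conversion code; `wps_to_cwl_type` is a hypothetical name.

```python
from typing import Any, Union

Occurs = Union[int, str]  # int, numeric string, or "unbounded"


def _as_max(value: Occurs) -> float:
    """Normalize maxOccurs: "unbounded" becomes infinity, numeric strings become int."""
    if isinstance(value, str) and value.lower() == "unbounded":
        return float("inf")
    return int(value)


def wps_to_cwl_type(base_type: str, min_occurs: Occurs = 1, max_occurs: Occurs = 1) -> Any:
    """Convert WPS minOccurs/maxOccurs into a CWL type definition."""
    min_occ = int(min_occurs)
    max_occ = _as_max(max_occurs)
    if min_occ > 1:
        max_occ = max(max_occ, min_occ)  # an array is needed to hold minOccurs values
    cwl_type: Any = base_type
    if max_occ > 1:
        cwl_type = {"type": "array", "items": base_type}
    if min_occ == 0:
        cwl_type = ["null", cwl_type]  # same as the "<type>?" shorthand
    return cwl_type
```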

extra

  • add test when "allowedValues" is present with a list of allowed values, which should resolve as enum type (relates to #41)
  • add test when "allowedValues" + ("default" and/or "minOccurs"=0 and/or "maxOccurs">1) are present, to validate that everything still resolves properly [see ref (3) for more details about potential failure for "enum?"]

edit
Could introduce new CWL methods to define int/float Range or string AllowedValues (similar to enum), depending on the outcome of common-workflow-language/common-workflow-language#764

For Workflows, see developments related to the following spec:

Support of BoundingBox I/O

Processes that use BoundingBox inputs or outputs currently raise a NotImplementedError, as only Literal and Complex I/O are implemented.

The CWL conversion of this format also has to be considered, to define how the information should be passed down to an application. Maybe as an array of strings/floats? Otherwise, a "polygon" file that uses the same conversion as Complex I/O (see the link below for an example).

We probably also need to support the WKT format, which defines a bounding box as a string (also in the link below).

examples:
https://github.com/bird-house/emu/pull/93/files/0328215027ba7cdbdf4f07a800b5b34ccacd5f42..e588febe06eaf96ffa3a8ba0b05b0db697de5ea0
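A sketch of the two representations discussed above: a flat array of floats and a WKT POLYGON string. The [minx, miny, maxx, maxy] ordering is an assumption, and CRS handling and antimeridian-crossing boxes are deliberately ignored.

```python
from typing import List, Sequence


def bbox_to_array(bbox: Sequence[float]) -> List[float]:
    """Validate a 2D bounding box [minx, miny, maxx, maxy] and return it as floats."""
    if len(bbox) != 4:
        raise ValueError("expected 4 values: minx, miny, maxx, maxy")
    minx, miny, maxx, maxy = (float(v) for v in bbox)
    if minx > maxx or miny > maxy:
        raise ValueError("lower corner must not exceed upper corner")
    return [minx, miny, maxx, maxy]


def bbox_to_wkt(bbox: Sequence[float]) -> str:
    """Return the bounding box as a closed WKT POLYGON ring."""
    minx, miny, maxx, maxy = bbox_to_array(bbox)
    ring = [(minx, miny), (maxx, miny), (maxx, maxy), (minx, maxy), (minx, miny)]
    # repr() keeps full float precision, unlike the 'g' format
    return "POLYGON((" + ", ".join(f"{x!r} {y!r}" for x, y in ring) + "))"
```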

Adjust the dodgy values for valid ones

During OGC-Testbed14, some values were left as-is to allow conformance or interoperability tests to pass, but they are not valid in such situations. The following should be corrected:

  • 200 => 201 status from successful process deployment (see #87)
  • allow int for minOccurs/maxOccurs (while preserving str to allow "unbounded")
  • (?)
    202 status (accepted) on POST job when async mode since not guaranteed to be running yet
    201 status (created) kept on created sync job [current status applied for any mode]
    in either case, the job is created and can be fetched with GET and job ID
  • ...?
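A sketch of the proposed corrections, assuming the still-open (?) option is adopted (202 for async execution, 201 for sync). `execute_status`, `DEPLOY_STATUS` and `parse_occurs` are hypothetical names.

```python
from http import HTTPStatus
from typing import Union

DEPLOY_STATUS = HTTPStatus.CREATED  # 201 instead of 200 on successful deployment


def execute_status(async_mode: bool) -> HTTPStatus:
    """202 when the job is only queued (async), 201 when it was created synchronously."""
    return HTTPStatus.ACCEPTED if async_mode else HTTPStatus.CREATED


def parse_occurs(value: Union[int, str]) -> Union[int, str]:
    """Accept an int or numeric str for min/maxOccurs, keeping "unbounded" as a str."""
    if isinstance(value, bool):
        raise TypeError("occurs value cannot be a boolean")
    if isinstance(value, str):
        if value.strip().lower() == "unbounded":
            return "unbounded"
        value = int(value)
    if value < 0:
        raise ValueError(f"occurs value must be non-negative: {value}")
    return value
```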

[gitCret] Sensitive Credentials Detected [1349cb7053068b2783011cec0bc24cc6f6835ece]

{"line":" - secure: MteZFKkISMErROvLnjoZipPFBREDACTED/oK59O4pxSYN6+yqP8KEp4K1maNOsV2BcE5WGkH5R6jfdezVfvxKbKIoIECWhgK2e0rR4Cp4833cw4F6maSqcLkJL5zDeHyHbofzLLVd/1qugz07vZx2GJGTgt2KJtN+j0zeVp0SC7BQwY8RZZ0BSZB06S...","offender":"REDACTED","commit":"74114473f4472f5dc866a70f65f3e1e2bd537457","repo":"workspace","rule":"Generic Credential","commitMessage":"Merge pull request #76 from crim-ca/docker-gunicorn\n\nfailing tests/actions are expected","author":"Francis Charette Migneault","email":"[email protected]","file":".travis.yml","date":"2020-02-11T14:28:16-05:00","tags":"key, API, generic"}
{"line":" - secure: MteZFKkISMErROvLnjoZipPFBPWoaF0xxncTD76qN1AO2N/oK59O4pxSYN6+yqP8KEp4K1maNOsV2BcE5WGkH5R6jfdezVfvxKbKIoIECWhgK2e0rR4Cp4833cw4F6maSqcLkJL5zDeHyHbofzLLVd/1qugz07vZx2GJGTgt2KJtN+j0zeVp0SC7BQw...","offender":"REDACTED","commit":"74114473f4472f5dc866a70f65f3e1e2bd537457","repo":"workspace","rule":"Generic Credential","commitMessage":"Merge pull request #76 from crim-ca/docker-gunicorn\n\nfailing tests/actions are expected","author":"Francis Charette Migneault","email":"[email protected]","file":".travis.yml","date":"2020-02-11T14:28:16-05:00","tags":"key, API, generic"}
{"line":" # ref: https://github.com/svdarren/REDACTED/.github/workflows/secrets.yml","offender":"REDACTED","commit":"a028564f379cb04d9fe0acee7fd6547614b55044","repo":"workspace","rule":"Generic Credential","commitMessage":"Merge pull request #69 from crim-ca/actions\n\nadd github util actions for validation","author":"Francis Charette Migneault","email":"[email protected]","file":".github/workflows/secret-scan.yml","date":"2020-01-30T13:56:00-05:00","tags":"key, API, generic"}
{"line":" - secure: BZBVvKMZaMuMBq3LvdwpBEnyk2OO9yDu+Izxqvoba3/QT2zD0lA8bOFFUYErjuhKecmda1OyNnjdejs2wpGxEEal21AR/BoNHeVTH7euaQzjhWo9MaBnnpFqsKABxy4EFdHZHGD90LrrFoCj7HgI4w+BxABQSsdSuWNYHl4oCyAF9TsPnKGFIlW0HlA...","offender":"REDACTED","commit":"25ab1492d6f99ab6c5e9988fc63a08d2aec239a3","repo":"workspace","rule":"Generic Credential","commitMessage":"Merge pull request #28 from crim-ca/codacy\n\ncodacy config and analysis","author":"Francis Charette Migneault","email":"[email protected]","file":".travis.yml","date":"2019-03-26T17:46:19-04:00","tags":"key, API, generic"}
{"line":" AnyValue, AnyREDACTED, AnyRegistryContainer, AnyHeadersContainer,","offender":"REDACTED","commit":"fc51045de645613755122f007b21dff78913836f","repo":"workspace","rule":"Generic Credential","commitMessage":"Merge pull request #12 from crim-ca/wps1-auto-deploy\n\nWPS1 auto deploy\r\ncloses #2, #20 \r\nref #3, #11, #17 ","author":"Francis Charette Migneault","email":"[email protected]","file":"weaver/utils.py","date":"2019-03-26T17:09:21-04:00","tags":"key, API, generic"}
{"line":"twitcher.REDACTED","offender":"REDACTED","commit":"e22ad8c7cc9bf18002f893ed8d7f76455bf90c81","repo":"workspace","rule":"Generic Credential","commitMessage":"setting username and password for rpc interface\n","author":"Carsten Ehbrecht","email":"[email protected]","file":"templates/twitcher.ini","date":"2015-11-26T17:37:28+01:00","tags":"key, API, generic"}
{"line":"pywpsproxy.REDACTED","offender":"REDACTED","commit":"e57adf9fdc603c37e0425f6a2a727994619551cf","repo":"workspace","rule":"Generic Credential","commitMessage":"renamed pywps-proxy to twitcher\n","author":"Carsten Ehbrecht","email":"[email protected]","file":"templates/pywpsproxy.ini","date":"2015-10-29T12:52:27+01:00","tags":"key, API, generic"}
{"line":"twitcher.REDACTED","offender":"REDACTED","commit":"e57adf9fdc603c37e0425f6a2a727994619551cf","repo":"workspace","rule":"Generic Credential","commitMessage":"renamed pywps-proxy to twitcher\n","author":"Carsten Ehbrecht","email":"[email protected]","file":"templates/twitcher.ini","date":"2015-10-29T12:52:27+01:00","tags":"key, API, generic"}
{"line":"pywpsproxy.REDACTED","offender":"REDACTED","commit":"3f636aabb1ee4caeb48d19e66522fe7939f76085","repo":"workspace","rule":"Generic Credential","commitMessage":"added security checks\n","author":"Carsten Ehbrecht","email":"[email protected]","file":"templates/pywpsproxy.ini","date":"2015-10-28T18:18:06+01:00","tags":"key, API, generic"}
{"line":" 'auth_tkt': 'd7890d6644880ae5ca30c6663b345694b5b90073d3dec2a6925e888b37d3211aa10168d15b441ef2d2cd8f70064519fda06REDACTED!userid_type:int;',","offender":"REDACTED","commit":"af59643f46d0b97921259a3f3499bed901f485d0","repo":"workspace","rule":"Facebook Secret Key","commitMessage":"Remove wps_workflow which is not usefull for now\n","author":"David Byrns","email":"[email protected]","file":"twitcher/processes/wps_workflow.py","date":"2018-10-10T10:58:20-04:00","tags":"key, Facebook"}
{"line":" 'auth_tkt': 'd7890d6644880ae5ca30c6663b345694b5b90073d3dec2a6925e888b37d3211aa10168d15b441ef2d2cd8f70064519fda06REDACTED!userid_type:int;',","offender":"REDACTED","commit":"1ffa2ece9161df5484f4e5a49e6852135be316ae","repo":"workspace","rule":"Facebook Secret Key","commitMessage":"Synch with dynamic-wps_processes\n","author":"David Byrns","email":"[email protected]","file":"twitcher/processes/wps_workflow.py","date":"2018-10-10T10:24:32-04:00","tags":"key, Facebook"}
{"line":" 'auth_tkt': 'd7890d6644880ae5ca30c6663b345694b5b90073d3dec2a6925e888b37d3211aa10168d15b441ef2d2cd8f70064519fda06REDACTED!userid_type:int;',","offender":"REDACTED","commit":"02c2cdc4e1fca2154eca6e22801985b3f7c9e787","repo":"workspace","rule":"Facebook Secret Key","commitMessage":"Use the package handler as workflow execution entrypoint\n","author":"David Byrns","email":"[email protected]","file":"twitcher/processes/wps_workflow.py","date":"2018-10-03T21:06:10-04:00","tags":"key, Facebook"}
{"line":" 'auth_tkt': 'd7890d6644880ae5ca30c6663b345694b5b90073d3dec2a6925e888b37d3211aa10168d15b441ef2d2cd8f70064519fda06REDACTED!userid_type:int;',","offender":"REDACTED","commit":"02c2cdc4e1fca2154eca6e22801985b3f7c9e787","repo":"workspace","rule":"Facebook Secret Key","commitMessage":"Use the package handler as workflow execution entrypoint\n","author":"David Byrns","email":"[email protected]","file":"twitcher/wps_restapi/processes/workflows.py","date":"2018-10-03T21:06:10-04:00","tags":"key, Facebook"}
{"line":" 'auth_tkt': 'd7890d6644880ae5ca30c6663b345694b5b90073d3dec2a6925e888b37d3211aa10168d15b441ef2d2cd8f70064519fda06REDACTED!userid_type:int;',","offender":"REDACTED","commit":"6786ef08939194d1a80d57cc7ed7ffdce91aac9b","repo":"workspace","rule":"Facebook Secret Key","commitMessage":"Add a job type switch ready to process local workflow, local process or proxy process\n","author":"David Byrns","email":"[email protected]","file":"twitcher/wps_restapi/processes/workflows.py","date":"2018-10-02T15:41:14-04:00","tags":"key, Facebook"}
{"line":" cookie = {'auth_tkt': 'd7890d6644880ae5ca30c6663b345694b5b90073d3dec2a6925e888b37d3211aa10168d15b441ef2d2cd8f70064519fda06REDACTED!userid_type:int;',","offender":"REDACTED","commit":"d5e7467500b14bdafbcbd0afc4ea2ce6dd9c3871","repo":"workspace","rule":"Facebook Secret Key","commitMessage":"Workflows are working with still some stub and TODOs\n","author":"David Byrns","email":"[email protected]","file":"twitcher/processes/wps_process.py","date":"2018-10-12T16:48:34-04:00","tags":"key, Facebook"}
{"line":" cookie = {'auth_tkt': 'd7890d6644880ae5ca30c6663b345694b5b90073d3dec2a6925e888b37d3211aa10168d15b441ef2d2cd8f70064519fda06REDACTED!userid_type:int;',","offender":"REDACTED","commit":"5ffd7a454a887086de4adbac749d75cec04af8a9","repo":"workspace","rule":"Facebook Secret Key","commitMessage":"Work in progress\n","author":"David Byrns","email":"[email protected]","file":"twitcher/cwl_wps_workflows/workflow_runner.py","date":"2018-10-09T11:31:56-04:00","tags":"key, Facebook"}
{"line":" cookie = {'auth_tkt': 'd7890d6644880ae5ca30c6663b345694b5b90073d3dec2a6925e888b37d3211aa10168d15b441ef2d2cd8f70064519fda06REDACTED!userid_type:int;',","offender":"REDACTED","commit":"5ffd7a454a887086de4adbac749d75cec04af8a9","repo":"workspace","rule":"Facebook Secret Key","commitMessage":"Work in progress\n","author":"David Byrns","email":"[email protected]","file":"twitcher/cwl_wps_workflows/wps_process.py","date":"2018-10-09T11:31:56-04:00","tags":"key, Facebook"}
{"line":" cookie = {'auth_tkt': 'd7890d6644880ae5ca30c6663b345694b5b90073d3dec2a6925e888b37d3211aa10168d15b441ef2d2cd8f70064519fda06REDACTED!userid_type:int;',","offender":"REDACTED","commit":"5ffd7a454a887086de4adbac749d75cec04af8a9","repo":"workspace","rule":"Facebook Secret Key","commitMessage":"Work in progress\n","author":"David Byrns","email":"[email protected]","file":"twitcher/processes/wps_process.py","date":"2018-10-09T11:31:56-04:00","tags":"key, Facebook"}
{"line":" cookie = {'auth_tkt': 'd7890d6644880ae5ca30c6663b345694b5b90073d3dec2a6925e888b37d3211aa10168d15b441ef2d2cd8f70064519fda06REDACTED!userid_type:int;',","offender":"REDACTED","commit":"4232674cb7e0b9038e7bfc69326e599a5bc97670","repo":"workspace","rule":"Facebook Secret Key","commitMessage":"CWL Workflow engine adapter for remote execution through WPS\n","author":"Eric Lacoursière","email":"[email protected]","file":"twitcher/cwl_wps_workflows/workflow_runner.py","date":"2018-09-13T16:22:56-04:00","tags":"key, Facebook"}
{"line":" cookie = {'auth_tkt': 'd7890d6644880ae5ca30c6663b345694b5b90073d3dec2a6925e888b37d3211aa10168d15b441ef2d2cd8f70064519fda06REDACTED!userid_type:int;',","offender":"REDACTED","commit":"4232674cb7e0b9038e7bfc69326e599a5bc97670","repo":"workspace","rule":"Facebook Secret Key","commitMessage":"CWL Workflow engine adapter for remote execution through WPS\n","author":"Eric Lacoursière","email":"[email protected]","file":"twitcher/cwl_wps_workflows/wps_process.py","date":"2018-09-13T16:22:56-04:00","tags":"key, Facebook"}
{"line":" AnyValue, AnyREDACTED, AnyRegistryContainer, AnyHeadersContainer,","offender":"REDACTED","commit":"8a69556cae199bd2ceb62ea2d44568701cc7376b","repo":"workspace","rule":"Generic Credential","commitMessage":"API utils/fixes\n","author":"Francis Charette Migneault","email":"[email protected]","file":"weaver/utils.py","date":"2019-03-19T17:04:33-04:00","tags":"key, API, generic"}
{"line":" - secure: BZBVvKMZaMuMBq3LvdwpBEnyk2OO9yDu+Izxqvoba3/QT2zD0lA8bOFFUYErjuhKecmda1OyNnjdejs2wpGxEEal21AR/BoNHeVTH7euaQzjhWo9MaBnnpFqsKABxy4EFdHZHGD90LrrFoCj7HgI4w+BxABQSsdSuWNYHl4oCyAF9TsPnKGFIlW0HlA...","offender":"REDACTED","commit":"7ec5789111efaf2b932aa00de42f13eeebacaeef","repo":"workspace","rule":"Generic Credential","commitMessage":"codacy config and analysis\n","author":"Francis Charette Migneault","email":"[email protected]","file":".travis.yml","date":"2019-03-14T21:17:09-04:00","tags":"key, API, generic"}
WARN[2020-04-06T18:07:10Z] 22 leaks detected. 1902 commits audited in 55 seconds 217 milliseconds 984 microseconds

Complete usage documentation

Complete the TODOs in weaver/docs (see also #101)
(use make fixme-list to find them)

  • Introduction (the basics)

    • Foreword about WPS-1/2/3, XML/JSON, and Transactional/Quote/Billing additions.
    • Foreword about EMS/ADES.
    • Basic notes/pointers about informative endpoints
      • /api
      • /version
      • frontpage
    • Interfaces (i.e., PyWPS-1/2 route vs WPS-REST / OGC-API)
  • Type of Processes

    • What are builtin processes, and which ones are available? (#20)
    • What are WPS-1/2 processes (remote/provider processes pointer).
    • What are ESGF-CWT processes? (#23) [@matprov maybe can fill better at a later date]
    • CWL Application Package (docker app)
    • Other remote providers? (#130)
    • WPS endpoints (WPS-1/2 XML/JSON vs WPS-REST JSON).
      Maybe even bidirectional support? (#125, #126)
    • EMS-specific execution using EOImage with AOI/TOI/CollectionId for OpenSearch
  • CWL-related (some kind of "My first CWL for WPS Application Package" tutorial)

    • How are WPS I/O mapped/merged with CWL I/O + some important differences
      • add note about arrays (the most confusing part, since CWL uses a list under the same ID while WPS repeats the ID for each individual entry of the list)
      • variants of enum/supported values/formats and to which type they apply
      • WPS Complex/Literal vs CWL File/others
    • Add details/example about typical steps: deploy app, update visibility, execute, get status
    • Add details related to CWL Application within WPS (where to put the definition - many places, variations supported - href vs json unit, ...)
    • Add details related to CWL Workflow. How are steps related to WPS processes IDs?
  • Configuration (see: configuration)

    • how does this run? (gunicorn/pserve/celery)
    • details about each setting in weaver.ini
      • comment about each 'group' of settings (e.g. path/url/dir of various wps config)
      • details about the effect of each item
      • add any missing ones?
    • details about data_sources.json and wps_processes.yml
      • what do they do?
      • their respective schemas (one of them here)
      • how to provide them via weaver.ini (i.e., none, single file, directory auto-listing)
      • default behavior (copy .example) + supported formats (full path vs relative vs name only looking in app/config directory)
    • Request Caching settings, invalidation, regions, etc.
      The current docs only describe generic Request Options, not the specific INI settings for caching, which are mostly different but could also be combined with cache=True|False via request_extra handling.
    • Example Request Options
      • Fix broken link
      • Embed example within the page
      • Provide more parameters details
  • Quotation/Billing documentation
    (see #531)

  • Utils and Use-Cases

  • Other

    • Add extra OGC-TB15 & TB16 reference notes to README (bottom).

Support embedded step definition in CWL

It is not currently possible to embed steps in a CWL workflow definition in weaver.
According to the CWL spec, it is possible to have step definitions in the workflow itself.
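A sketch of the first thing Weaver would need: telling apart steps whose `run` references another process from steps that embed their tool definition inline. The workflow below and the `convert.cwl` reference are hypothetical; how embedded steps would then be deployed or executed is left open.

```python
from typing import Any, Dict, Tuple

WORKFLOW = {
    "cwlVersion": "v1.0",
    "class": "Workflow",
    "inputs": {"message": "string"},
    "outputs": {"result": {"type": "File", "outputSource": "convert/output"}},
    "steps": {
        # embedded: the tool definition lives inside the workflow itself
        "echo": {
            "run": {
                "class": "CommandLineTool",
                "baseCommand": "echo",
                "inputs": {"message": {"type": "string", "inputBinding": {"position": 1}}},
                "outputs": {"text": "stdout"},
            },
            "in": {"message": "message"},
            "out": ["text"],
        },
        # referenced: "convert.cwl" stands in for a deployed process (hypothetical)
        "convert": {"run": "convert.cwl", "in": {"input": "echo/text"}, "out": ["output"]},
    },
}


def split_steps(workflow: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, Dict[str, Any]]]:
    """Separate steps referencing another definition from steps embedding their own."""
    steps = workflow.get("steps", {})
    if isinstance(steps, list):  # CWL also allows a list of steps with explicit "id"
        steps = {step["id"]: step for step in steps}
    referenced, embedded = {}, {}
    for step_id, step in steps.items():
        run = step["run"]
        if isinstance(run, dict):
            embedded[step_id] = run
        else:
            referenced[step_id] = run
    return referenced, embedded
```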

A job's 'percentCompleted' is not the same percent as in the status message

During a WPS1 execution, the percentage appears in two places: in the percentCompleted field and in the message field. The value in the message field seems to be the correct one.

At one point, the percent shown in the message was 90% (right at the end of the processing, so REMOTE_JOB_PROGRESS_FETCH_OUT was the right value), and in the percentCompleted field, it was 30%.

After a quick look at the code, I think the weaver.process.utils.map_progress function is applied twice (in two different places) to the value reported in percentCompleted, but only once to the value in the status message.

So this one is ok:

map_progress(execution.percentCompleted,

But I believe this one should treat the progress value as the final percentage (at least in this case):

progress=map_progress(progress, start_step_progress, end_step_progress),
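A sketch showing why applying the mapping twice shrinks the reported value. The linear remap below is written to match the described behavior of map_progress (0-100 progress rescaled into a step range); its exact signature, and the 10-50 range used here, are assumptions for illustration only.

```python
def map_progress(progress: float, range_min: float, range_max: float) -> float:
    """Linearly map a 0-100 progress value into [range_min, range_max]."""
    return range_min + (progress * (range_max - range_min)) / 100.0


remote_progress = 90                          # progress reported by the remote job
once = map_progress(remote_progress, 10, 50)  # 46.0: the value the message would show
twice = map_progress(once, 10, 50)            # about 28.4: the value after a second mapping
```

Mapping exactly once, at a single place, keeps percentCompleted and the status message consistent.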

How to handle "multiple output"

According to owslib.wps, minOccurs/maxOccurs can only be parsed for inputs; outputs are always a single reference or literal data.

In pywps, the outputs have a reference to min_occurs/max_occurs because of inheritance from base classes. They are also returned in their json property.

For CWL, there is no restriction to have multiple outputs (array of string, file, etc.).
Actually, when using a glob pattern such as glob: *.<ext>, a list of results is automatically created with all matches.

How can we produce the CWL <-> WPS conversion in these cases?
Also, should we enforce explicit values of minOccurs/maxOccurs:

  • minOccurs="1" enforced since there should always be an output
    update: since CWL allows optional outputs (e.g., File?), minOccurs=0|1 should be set according to nullable detection
  • maxOccurs="1" | "<number>" | "unbounded", with "1" being the default (same reason as min), or one of "<number>" | "unbounded" if specified during deployment
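A sketch of the defaults proposed above for outputs: minOccurs from nullable detection and maxOccurs "unbounded" for array types (a value given at deployment would override it). It handles the common shorthands and the list/dict forms only; `output_occurs` is a hypothetical helper.

```python
from typing import Any, Dict, Union


def output_occurs(cwl_type: Any) -> Dict[str, Union[int, str]]:
    """Derive default WPS minOccurs/maxOccurs from a CWL output type."""
    min_occurs: int = 1
    max_occurs: Union[int, str] = 1
    if isinstance(cwl_type, str):
        if cwl_type.endswith("?"):  # e.g. "File?" or "File[]?"
            min_occurs, cwl_type = 0, cwl_type[:-1]
        if cwl_type.endswith("[]"):  # e.g. "File[]"
            max_occurs = "unbounded"
    elif isinstance(cwl_type, list):  # e.g. ["null", "File"]
        if "null" in cwl_type:
            min_occurs = 0
        others = [t for t in cwl_type if t != "null"]
        if len(others) == 1:
            max_occurs = output_occurs(others[0])["maxOccurs"]
    elif isinstance(cwl_type, dict) and cwl_type.get("type") == "array":
        max_occurs = "unbounded"
    return {"minOccurs": min_occurs, "maxOccurs": max_occurs}
```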

ref #17
relates to opengeospatial/ogcapi-processes#37

To Do

  • check any test with commented references to #25

Properly handle glob patterns and corresponding WPS apps

When a CWL output glob pattern (ex: glob: "*.nc") is employed, the output of the WPS app should automatically define maxOccurs="unbounded", or any overriding maxOccurs value specified in the Deploy body.

Also, this output should be considered as an array (list), so that wps_package collects multiple files/data here:

def make_outputs(self, cwl_result):
    # type: (CWL_Results) -> None
    """
    Maps `CWL` result outputs to corresponding `WPS` outputs.
    """
    for output_id in self.request.outputs:  # iterate over original WPS outputs, extra such as logs are dropped
        # TODO: adjust output for glob patterns (https://github.com/crim-ca/weaver/issues/24)
        if isinstance(cwl_result[output_id], list) and not isinstance(self.response.outputs[output_id], list):
            if len(cwl_result[output_id]) > 1:
                self.logger.warning(
                    "Dropping additional output values (%s total), only 1 supported per identifier.",
                    len(cwl_result[output_id])
                )
            # provide more details than poorly descriptive IndexError
            if not len(cwl_result[output_id]):
                raise PackageExecutionError(
                    "Process output '{}' expects at least one value but none was found. "
                    "Possible incorrect glob pattern definition in CWL Application Package.".format(output_id)
                )
            cwl_result[output_id] = cwl_result[output_id][0]  # expect only one output
        if "location" not in cwl_result[output_id] and os.path.isfile(str(cwl_result[output_id])):
            raise PackageTypeError("Process output '{}' defines CWL type other than 'File'. ".format(output_id) +
                                   "Application output results must use 'File' type to return file references.")
        if "location" in cwl_result[output_id]:
            self.make_location_output(cwl_result, output_id)
            continue
        # data output
        self.response.outputs[output_id].data = cwl_result[output_id]
        self.response.outputs[output_id].as_reference = False
        self.logger.info("Resolved WPS output [%s] as literal data", output_id)
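Instead of always keeping only the first value, the list returned by CWL could be resolved according to the output's `maxOccurs`. The sketch below is a hypothetical, framework-free version of that decision (`resolve_output_values` and `OutputResolutionError` are illustrative names, not Weaver's API); it keeps the current "first value plus warning" behavior only when `maxOccurs` is 1.

```python
import logging
from typing import Any, List, Union

LOGGER = logging.getLogger(__name__)


class OutputResolutionError(ValueError):
    """Hypothetical stand-in for PackageExecutionError."""


def resolve_output_values(output_id: str, values: Union[Any, List[Any]],
                          max_occurs: Union[int, str]) -> Union[Any, List[Any]]:
    """Return a list for multi-valued outputs, or a single value when maxOccurs is 1."""
    if not isinstance(values, list):
        values = [values]
    if not values:
        raise OutputResolutionError(
            "Process output '{}' expects at least one value but none was found. "
            "Possible incorrect glob pattern definition in CWL Application Package.".format(output_id)
        )
    if max_occurs == "unbounded":
        return values  # keep every matched file
    limit = int(max_occurs)
    if limit == 1:
        if len(values) > 1:
            LOGGER.warning("Dropping additional output values (%s total) for [%s].", len(values), output_id)
        return values[0]  # unique reference, same as the current behavior
    if len(values) > limit:
        raise OutputResolutionError(
            "Process output '{}' allows at most {} values but {} were found.".format(output_id, limit, len(values))
        )
    return values
```

Raising when more than a finite `maxOccurs` is found (rather than silently truncating) is a design choice of this sketch.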

We could explore the use of a builtin app array2single for cases where glob is used to collect an expected unique output.

When a unique file glob pattern is used (ex: glob: "output.nc"), the process should automatically define maxOccurs=1, and the wps_package output should resolve any CWL glob list (as required) to a unique reference.

Furthermore, the file extension in the glob pattern could be used to automatically set mediaType when no more specific value is provided.
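One way to do this is with Python's standard `mimetypes` module. This is a sketch under the assumption that the standard mapping is acceptable as a fallback; `media_type_from_glob` is a hypothetical helper, and its result would only be used when the Deploy body gives no mediaType.

```python
import mimetypes
import os
from typing import Optional


def media_type_from_glob(glob_pattern: str) -> Optional[str]:
    """Hypothetical helper: guess a mediaType from the extension of a CWL glob."""
    _, ext = os.path.splitext(glob_pattern)
    if not ext or any(char in ext for char in "*?["):
        return None  # no usable extension, e.g. "*" or "*.[nt]c"
    # guess_type works on names, so build a dummy file name with that extension
    media_type, _ = mimetypes.guess_type("output" + ext)
    return media_type
```

The guessed value depends on the platform's MIME tables, so a curated mapping may be preferable for domain formats such as NetCDF.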

Relates to #25

Back-propagate CWL changes from converted WPS fields

Description

When complementary details are provided in the WPS deploy body, they are correctly merged with CWL definitions to update the resulting (combined) WPS inputs/outputs information.

On the other hand, these complementary details are not back-propagated into the CWL package definition. Even worse, updated I/O that have their type overridden (ex: from single value to array type because of min/max values) incorrectly preserve the "old" type [#17].

WPS fields that provide missing or overriding information on the CWL side should be applied to update the resulting CWL package.
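For the min/max case, back-propagation could rewrite the CWL type as sketched below. `back_propagate_occurs` is hypothetical and assumes the CWL I/O starts with a single, non-nullable type (e.g. `"File"` or `"string"`); already-nullable or union types would need extra handling.

```python
import copy
from typing import Any, Dict


def back_propagate_occurs(cwl_io: Dict[str, Any], wps_io: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: update a CWL I/O type from WPS minOccurs/maxOccurs.

    Assumes cwl_io["type"] is a single, non-nullable type such as "File" or "string".
    """
    updated = copy.deepcopy(cwl_io)  # never mutate the original package
    cwl_type = updated["type"]
    max_occurs = wps_io.get("maxOccurs", 1)
    min_occurs = wps_io.get("minOccurs", 1)
    if max_occurs == "unbounded" or int(max_occurs) > 1:
        cwl_type = {"type": "array", "items": cwl_type}  # single value -> array
    if int(min_occurs) == 0:
        cwl_type = ["null", cwl_type]  # optional I/O
    updated["type"] = cwl_type
    return updated
```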

Addresses issues #17, #31

Extra considerations

  • When multiple formats are specified on the WPS side, updating the CWL formats could be challenging, since unknown schema references (i.e.: not found in IANA/EDAM) cannot be applied (they raise a validation error). This means that if any format cannot be resolved, none of the WPS supported_formats can be back-propagated, because a partial definition on the CWL side is not acceptable: it would guarantee an error whenever the given file matches one of the unknown schemas that is represented as valid on the WPS side.
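The all-or-nothing rule above can be sketched as follows. `back_propagate_formats` is hypothetical, and `known_formats` stands for whatever lookup resolves a media type to a CWL format reference (its contents here are illustrative only).

```python
from typing import Dict, List, Optional


def back_propagate_formats(wps_media_types: List[str], known_formats: Dict[str, str]) -> Optional[List[str]]:
    """Hypothetical helper: return CWL formats only if every WPS media type resolves."""
    resolved = [known_formats.get(media_type) for media_type in wps_media_types]
    if not resolved or any(fmt is None for fmt in resolved):
        return None  # partial definitions are not allowed: leave the CWL 'format' untouched
    return resolved
```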

To Do

  • check any tests with comments referring to #17
  • check any tests with comments referring to #31
  • check any tests with comments referring to #50

WPS XML responses are missing many metadata fields

When calling GetCapabilities, most of the metadata that should be filled by PyWPS configuration are missing (or more often replaced by generic defaults).

Examples:

<ows:ServiceContact>
  <ows:IndividualName>Lastname, Firstname</ows:IndividualName>
  <ows:PositionName>Position Title</ows:PositionName>
  <ows:ContactInfo>
    <ows:Phone>
      <ows:Voice>+xx-xxx-xxx-xxxx</ows:Voice>
      <ows:Facsimile />
    </ows:Phone>
    <ows:Address>
      <ows:DeliveryPoint />
      <ows:City>City</ows:City>
      <ows:AdministrativeArea />
      <ows:PostalCode>Zip or Postal Code</ows:PostalCode>
      <ows:Country>Country</ows:Country>
      <ows:ElectronicMailAddress>Email Address</ows:ElectronicMailAddress>
    </ows:Address>
  </ows:ContactInfo>
</ows:ServiceContact>
<wps:Languages>
  <wps:Default>
    <ows:Language>en-US</ows:Language>
  </wps:Default>
  <wps:Supported>
    <ows:Language>lang</ows:Language>
  </wps:Supported>
</wps:Languages>

We need to fill this information from the metadata provided in the config.ini file (done via the code below), but maybe we need to enforce it, or otherwise log warnings when it is missing. Some values could perhaps also be inferred from other configuration values.

weaver/weaver/wps.py (lines 120 to 128 in 3372fb3):

for setting_name, setting_value in settings.items():
    if setting_name.startswith("weaver.wps_metadata"):
        WEAVER_PYWPS_CFG.set("metadata:main", setting_name.replace("weaver.wps_metadata", ""), setting_value)
# add weaver configuration keyword if not already provided
wps_keywords = WEAVER_PYWPS_CFG.get("metadata:main", "identification_keywords")
weaver_mode = get_weaver_configuration(settings)
if weaver_mode not in wps_keywords:
    wps_keywords += ("," if wps_keywords else "") + weaver_mode
    WEAVER_PYWPS_CFG.set("metadata:main", "identification_keywords", wps_keywords)
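To log warnings when metadata is missing, a check could run after the configuration is filled. This sketch uses the standard `configparser` module; `check_wps_metadata` and the logger name are hypothetical, the placeholder set is taken from the GetCapabilities output shown above, and the option names in the usage are pywps-style names chosen for illustration.

```python
import configparser
import logging
from typing import List

LOGGER = logging.getLogger("weaver.wps.metadata")  # hypothetical logger name

# generic defaults observed in the GetCapabilities response above
PLACEHOLDER_VALUES = {
    "Lastname, Firstname", "Position Title", "+xx-xxx-xxx-xxxx", "City",
    "Zip or Postal Code", "Country", "Email Address", "lang",
}


def check_wps_metadata(cfg: configparser.ConfigParser, required: List[str],
                       section: str = "metadata:main") -> List[str]:
    """Warn about required metadata options that are missing or left at a placeholder."""
    problems = []
    for option in required:
        value = cfg.get(section, option, fallback="").strip()
        if not value or value in PLACEHOLDER_VALUES:
            LOGGER.warning("WPS metadata [%s] is missing or uses a placeholder value: %r", option, value)
            problems.append(option)
    return problems
```

Whether to only warn or to refuse startup when the returned list is non-empty is the "enforce it" decision raised above.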

unicode in wps status

Using testbed14-twitcher, I got a mysterious result from a wps request. The status was successful, but the results contained an empty list.

I found the bug in wps_restapi.processes.processes.execute_process(): calling job.save_log(logger=task_logger) raises an exception when job.status_message contains unicode characters.

Also, I think it would be a good idea to refactor the nested try ... except blocks in the monitoring code a bit. The error was hard to find because it wasn't logged anywhere except in a debug log entry that should have been an exception log.
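A flattened version of such monitoring could look like the sketch below: one try/except around the whole loop, with failures reported through `logger.exception` (which logs at ERROR level with the traceback) instead of a debug entry. `monitor_job` and its step callables are hypothetical; under Python 3, `%`-style logging accepts non-ASCII status messages as-is.

```python
import logging
from typing import Callable, List


def monitor_job(steps: List[Callable[[], str]], logger: logging.Logger) -> List[str]:
    """Run hypothetical monitoring steps, recording each returned status message."""
    messages = []
    try:
        for step in steps:
            message = step()
            messages.append(message)
            logger.info("Job status: %s", message)
    except Exception:
        # single handler: the traceback is always logged at ERROR level, then re-raised
        logger.exception("Job monitoring failed after %d step(s)", len(messages))
        raise
    return messages
```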

Support `/results/{id}` and `/outputs/{id}` routes

Only "/result" is currently supported (same as "/outputs") due to Testbed14 conformance.
To ensure back-compatibility with similar WPS REST interfaces, these routes should be added.

see: https://52north.github.io/tamis-rest-api/

See /conf/core/job-results-async-one in https://github.com/opengeospatial/ogcapi-processes/blob/d011fe92750a0843106fcb87b6292db6e2b0fa51/core/requirements/core/REQ_job-results-success-async-one.adoc

  • add endpoint definitions
  • add conformance entries
  • test that individual items can be retrieved as defined by the specification (consider File vs. literals and which content each must support/return)
  • ensure #511 is respected

relates to #240, GD-534
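A framework-free sketch of the lookup both aliases could share. The route templates, `get_job_output` and the shape of `results` (File outputs as dicts with `href`, literals as plain values) are assumptions for illustration; the exact response content required for each output type should be taken from the linked OGC requirement, which this sketch does not claim to reproduce.

```python
from typing import Any, Dict, Optional, Tuple

# both paths would be registered on the same view (hypothetical route templates)
SINGLE_OUTPUT_ROUTES = (
    "/jobs/{job_id}/results/{output_id}",
    "/jobs/{job_id}/outputs/{output_id}",
)


def get_job_output(results: Dict[str, Any], output_id: str) -> Tuple[int, Optional[Dict[str, Any]]]:
    """Return (HTTP status, body) for a single output of a completed job."""
    if output_id not in results:
        return 404, None
    value = results[output_id]
    if isinstance(value, dict) and "href" in value:
        # File output: expose the reference, keeping the media type when known
        body = {"id": output_id, "href": value["href"]}
        if "type" in value:
            body["type"] = value["type"]
        return 200, body
    return 200, {"id": output_id, "value": value}  # literal output
```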

Create WPS3-ADES remote instance for testing

  • use ogc-ades.crim.ca instance to allow full workflow evaluations (#3)
  • run make start directly on vm (bare metal) to avoid docker-in-docker deployments
  • adjust proxy to map /weaver to bare metal instance

AllowedValues missing from process descriptions

The structure of literal inputs stored in the database is not the same as the one used in testbed14.

See:

  • weaver.wps_restapi.swagger_definitions.LiteralInputType: testbed14
  • weaver.wps_restapi.swagger_definitions.ProcessInputDescriptionSchema: the one in the database (it's missing the allowedValues parameter, but adding it here will not add it in the process description)

So it basically comes down to 3 options I think:

  • Ignore the inputs description in the testbed14's format, and use ours instead
  • Convert our database schema to conform to testbed14's schema
  • Write a json function to support both schemas when sending the process description

I'm not sure how much work would be involved in any of these cases...
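For scale, option 3 could start as small as the converter below. All field names are assumptions: `allowed_values` stands for however the database stores the restriction, and `literalDataDomains` / `allowedValues` / `anyValue` approximate a testbed14-style literal description; the real schemas in `swagger_definitions` should be checked first.

```python
import copy
from typing import Any, Dict


def to_literal_input_description(db_input: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical converter from a stored literal input to a testbed14-like description."""
    description = copy.deepcopy({key: value for key, value in db_input.items() if key != "allowed_values"})
    allowed = db_input.get("allowed_values")
    # a literal input either restricts its values or accepts any value
    domain = {"allowedValues": list(allowed)} if allowed else {"anyValue": True}
    description["literalDataDomains"] = [domain]
    return description
```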

functional tests for process deployment

relates to #3

  • add test to deploy process with CWL DockerRequirement
    • reference .cwl in href instead of ProcessDescription (see DockerRequirement note)
    • reference .cwl in owsContext within ProcessDescription -> Process
    • reference .cwl in executionUnit (combined with deploymentProfileName) (see DockerRequirement note)
  • add test to deploy process with CWL WPS1Requirement (added in #372)
    (must provide process in requirement)
    • reference .cwl in href instead of ProcessDescription
    • reference .cwl in owsContext within ProcessDescription -> Process
    • reference .cwl in executionUnit (combined with deploymentProfileName)
  • add test to deploy process with WPS1 DescribeProcess endpoint
    • reference WPS1 in href instead of ProcessDescription
    • reference WPS1 in owsContext within ProcessDescription -> Process
    • reference WPS1 in executionUnit (combined with deploymentProfileName)
  • add test to deploy process with WPS1 service GetCapabilities endpoint
    (must include identifier=... to derive DescribeProcess)
    • reference WPS1 in href instead of ProcessDescription
    • reference WPS1 in owsContext within ProcessDescription -> Process
    • reference WPS1 in executionUnit (combined with deploymentProfileName)
  • add test to deploy process with WPS3 (REST) GET Process endpoint
    • reference WPS3 in href instead of ProcessDescription
    • reference WPS3 in owsContext within ProcessDescription -> Process
    • reference WPS3 in executionUnit (combined with deploymentProfileName)
  • #434
    • package directly provided as JSON
    • reference to remote CWL package
  • test with Workflow using deployed sub-processes

DockerRequirement note

The implementation is missing from the tests below, but the feature is covered by many other tests since it is the most common use case.
They should still be implemented to make sure this case is always evaluated explicitly, for consistency.

# FIXME: implement
@pytest.mark.skip(reason="not implemented")
def test_deploy_process_CWL_DockerRequirement_href(self):
    raise NotImplementedError

# FIXME: implement
@pytest.mark.skip(reason="not implemented")
def test_deploy_process_CWL_DockerRequirement_owsContext(self):
    raise NotImplementedError

# FIXME: implement
@pytest.mark.skip(reason="not implemented")
def test_deploy_process_CWL_DockerRequirement_executionUnit(self):
    raise NotImplementedError

Remove double process execution monitoring layers

Context & Feature

At the moment, processes are executed by monitoring the WPS-1 endpoint provided by pywps with an owslib.wps.WebProcessingService. pywps itself also monitors the sub-job execution, which creates unnecessary layers of job monitoring, makes debugging harder and complicates logging.

A single celery job monitor should be implemented, directly monitoring the real job execution of the process (cwltool). Also, because celery only monitors the status/result of the WPS Execute request sent to pywps, which resides in the same app as the API, the cwltool operations are actually executed by a thread-worker of the API (weaver manager) instead of a celery worker. This is simply not the desired behavior: it cannot be scaled and doesn't use the full worker/queue capabilities.

The full stack of process/monitor is as follows:

  1. [EMS] POST Weaver /job, submits job to db
  2. A Celery worker picks up the job, calls WPS Execute on PyWPS and monitors the status until success/failure
  3. PyWPS receives the job (Weaver manager API side) and creates its internal workers to run it
  4. PyWPS handler runs job (cwltool or remote wps)

In the case of an EMS workflow, this whole stack is repeated for each step, as the underlying ADES is called the same way.

Considerations

We need to consider what to do about validation of WPS I/O. This was the reason the pywps layer was originally employed.

  • Since the executed cwltool or the called remote WPS service already validates the received inputs against the package/process definition on its side, we could simply report the invalid inputs they reject when the ADES/EMS attempts to execute the process. We would only need to return this error from the job execution; no preemptive validation of I/O would be done.
  • other approaches?

The WPS-1/2 endpoint with pywps should be preserved for the sole purpose of providing back-compatibility with WPS-1/2, by redirecting job submissions just as the WPS-3 REST-API does, using the following correspondence (see also #126):

  • GetCapabilities => GET /processes
  • DescribeProcess => GET /processes/{id}
  • Job Execute => POST /processes/{id}/jobs
  • Job GetStatus (n/a) use GET /processes/{id}/jobs/{id} instead
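The correspondence above could be expressed as a small dispatch function for WPS-1/2 KVP requests. `wps_request_to_rest` is a hypothetical sketch: it lowercases parameter names (WPS KVP keys are case-insensitive), returns the REST method and path, and does not handle a comma-separated list of identifiers in DescribeProcess.

```python
from typing import Dict, Optional, Tuple


def wps_request_to_rest(params: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Hypothetical mapping of a WPS-1/2 KVP request onto the equivalent REST route."""
    query = {key.lower(): value for key, value in params.items()}
    request = query.get("request", "").lower()
    process_id = query.get("identifier")
    if request == "getcapabilities":
        return "GET", "/processes"
    if request == "describeprocess" and process_id:
        return "GET", "/processes/{}".format(process_id)
    if request == "execute" and process_id:
        return "POST", "/processes/{}/jobs".format(process_id)
    return None  # e.g. GetStatus: no WPS-1 equivalent, use GET /processes/{id}/jobs/{id}
```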

With celery>=4.3, the task result supports the result_extended option, which allows storing additional metadata about received inputs/function name/etc. This should be enabled to have even better tracking of executed/pending tasks.
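Assuming Celery is configured from a Python settings module (the module name and backend URL below are hypothetical placeholders; Weaver's actual configuration path may differ), enabling the option is a one-line setting:

```python
# celeryconfig.py (hypothetical module name), loaded e.g. with app.config_from_object("celeryconfig")
result_backend = "mongodb://localhost:27017/celery"  # hypothetical backend URL
result_extended = True  # Celery >= 4.3: also store task name, args, kwargs, etc. with the result
```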

Helpful References

Paging response results and metadata

Results returned from the following routes should support paging and include additional metadata:

  • GET /processes
  • GET /processes/{id}/jobs

Additional paging metadata should be defined as follows:

{
  <...>,
  "links": [
    { "href": "<this_url_path>?page=<page>&limit=<limit>",
      "rel": "self",
      "type": "application/json",
      "title": "This page results" },
    { "href": "<this_url_path>?page=0&limit=<limit>",
      "rel": "first",
      "type": "application/json",
      "title": "First page results" },
    { "href": "<this_url_path>?page=<page-1>&limit=<limit>",
      "rel": "prev",
      "type": "application/json",
      "title": "Previous page results" },
    { "href": "<this_url_path>?page=<page+1>&limit=<limit>",
      "rel": "next",
      "type": "application/json",
      "title": "Next page results" },
    { "href": "<this_url_path>?page=<max(page)>&limit=<limit>",
      "rel": "last",
      "type": "application/json",
      "title": "Last page results" }
  ]
}

see example:
https://github.com/opengeospatial/oapi_common/blob/master/standard/examples/collections_metadata_JSON_1.adoc
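These links could be generated from the current page, the page size and the total item count. The sketch below is hypothetical (`paging_links` is not an existing Weaver function); it follows the 0-indexed `page` of the template above, omits `prev`/`next` at the boundaries, and assumes `limit > 0`.

```python
from typing import Dict, List
from urllib.parse import urlencode


def paging_links(base_url: str, page: int, limit: int, total: int) -> List[Dict[str, str]]:
    """Hypothetical builder of self/first/prev/next/last paging links (0-indexed pages)."""
    last_page = max((total - 1) // limit, 0)

    def link(target_page: int, rel: str, title: str) -> Dict[str, str]:
        query = urlencode({"page": target_page, "limit": limit})
        return {"href": "{}?{}".format(base_url, query), "rel": rel, "type": "application/json", "title": title}

    links = [link(page, "self", "This page results"), link(0, "first", "First page results")]
    if page > 0:
        links.append(link(page - 1, "prev", "Previous page results"))
    if page < last_page:
        links.append(link(page + 1, "next", "Next page results"))
    links.append(link(last_page, "last", "Last page results"))
    return links
```

For example, page 1 of 25 processes with `limit=10` yields `self`, `first`, `prev`, `next` and `last` links, the last one pointing to `page=2`.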

WPS Execute error due to Twitcher proxy

Execution of the WPS-1 ColibriFlyingpigeon_SubsetBbox process fails because the POST Execute request (XML body, the default used by owslib) is not authorized by Twitcher. Submitting the same request directly to the WPS URI, without the proxy, works just fine.

Oddly, according to the permissions of flyingpigeon (see image), not even GetCapabilities should be allowed, but it is. The GET Execute request (KVP encoding in the URI) is also permitted, even though the permission is explicitly not granted to anonymous.

things to check

  • out-of-date Magpie [0.7.4] (master is 0.9.x) / Twitcher [pavics-0.3.17 w/ MagpieAdapter 0.7.3]
    Will an upgrade fix the issue?
  • maybe incorrectly parsed wps typed Magpie service (Ouranosinc/Magpie#157)

[Image: colibri Magpie permissions (anonymous)]
