attx-project / graph-component


Graph Manager component that handles state of the internal graph/data.

graph-component's Issues

Extends the linkstrategy API to include more data from the individual strategy

Description

Currently /linkstrategy returns only the "uri" field. This should be extended with at least "title" information so that it can be used more easily in the UI (see ATTX-project/workflow-component#43).

DoD

/linkstrategy also returns a "title" value for every strategy (and perhaps something else too).
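
A minimal sketch of what the extended response could look like, assuming each strategy is held as a dictionary. The `describe_strategies` helper, the example URIs, and the fallback to the URI when no title is stored are illustrative assumptions, not the actual GM API.

```python
def describe_strategies(strategies):
    """Return one entry per strategy carrying both "uri" and "title"."""
    result = []
    for strategy in strategies:
        uri = strategy["uri"]
        # Hypothetical fallback: reuse the URI when no title is stored.
        result.append({"uri": uri, "title": strategy.get("title", uri)})
    return result


example = describe_strategies([
    {"uri": "http://example.org/strategy/ids", "title": "IDs based linking"},
    {"uri": "http://example.org/strategy/other"},
])
```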

Testing

Update integration tests.

First version of RMLService

Description

Describe the issue or task at hand with dependencies and possible links to code/external/internal sources.

DoD

Docker image
Swagger
Pipeline for building
Pipeline for image

Testing

How one would test that the things/artifacts have been achieved. Options (Peer review, Unit Test, BDD testing, Integration Test).

Implement Graph Manager V1

Description

Graph Manager needs to work with the WF-API and insert data in the Graph Store in a scheduled manner.

DoD

Cron script which provides the functionality:

getting data from the WF-API in a scheduled manner
updating data in a specific Graph Store context
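
The two steps above could be sketched as a small polling loop. This is only an illustration: `run_scheduled`, its parameters, and the example context URI are hypothetical, and the real WF-API and Graph Store calls are replaced by injected callables.

```python
import time


def run_scheduled(fetch, update, context, interval_seconds, max_runs, sleep=time.sleep):
    """Call fetch() and hand its result to update(context, data) on a fixed interval."""
    for run in range(max_runs):
        data = fetch()            # stand-in for a request to the WF-API
        update(context, data)     # stand-in for an update of the Graph Store context
        if run < max_runs - 1:
            sleep(interval_seconds)
    return max_runs


# Usage with in-memory stand-ins instead of real services.
store = {}
run_scheduled(
    fetch=lambda: ["workflow-1"],
    update=lambda ctx, data: store.setdefault(ctx, []).extend(data),
    context="http://example.org/context/workflow",
    interval_seconds=0,
    max_runs=2,
)
```

In a deployment the loop would more likely be driven by cron, as the DoD suggests, with the function body reduced to a single fetch and update.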

Testing

Unit and BDD tests

Implement asynchronous-like features to GMApi for calling services

Description

GM needs to call long-running services such as RMLService and ProvService and needs to be able to do this efficiently.

DoD

Non-blocking service orchestration.
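
One way to picture non-blocking orchestration is with `asyncio`, where calls to slow services run concurrently. The sketch below is an assumption about the approach, not the chosen design; `call_service` is a hypothetical stand-in that sleeps instead of issuing an HTTP request.

```python
import asyncio


async def call_service(name, delay):
    # Stand-in for a long-running request to a service.
    await asyncio.sleep(delay)
    return f"{name} done"


async def orchestrate():
    # Both calls are awaited concurrently; results keep the argument order.
    return await asyncio.gather(
        call_service("RMLService", 0.02),
        call_service("ProvService", 0.01),
    )


results = asyncio.run(orchestrate())
```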

Testing

Automatic provenance graph update

Description

There is an endpoint in the GMapi to update the state of the provenance data graph with the data from the UV's database. Currently the responsibility to call that endpoint is outside of GMapi.

We should come up with a way to schedule or trigger the same endpoint from within the GMApi. The simplest solution would be scheduled execution, as planned earlier.

DoD

Update to the docs and possibly some related implementation

Testing

Testing depends on what we decide to do.

Document/implement dataset operations

Description

Graphmanager provides the following operations for datasets:

replace
query
add
retrieve
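
To make the intended semantics of the four operations concrete, here is a toy in-memory sketch. The `DatasetStore` class, its method signatures, and the wildcard-style `query` are assumptions for illustration; the actual service works against the Graph Store and its inputs and outputs are still to be documented.

```python
class DatasetStore:
    """Toy in-memory store mapping a dataset URI to a set of triples."""

    def __init__(self):
        self._datasets = {}

    def replace(self, uri, triples):
        # Discard whatever the dataset held and store the new triples.
        self._datasets[uri] = set(triples)

    def add(self, uri, triples):
        # Merge new triples into the dataset, creating it if needed.
        self._datasets.setdefault(uri, set()).update(triples)

    def retrieve(self, uri):
        # Return all triples of the dataset in a stable order.
        return sorted(self._datasets.get(uri, set()))

    def query(self, uri, subject=None, predicate=None, obj=None):
        # None acts as a wildcard for that position of the triple.
        return [
            t for t in self.retrieve(uri)
            if subject in (None, t[0])
            and predicate in (None, t[1])
            and obj in (None, t[2])
        ]
```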

DoD

Documented input and output of the operations. New version of the service.

Testing

Unit testing

Implement ID clustering in Graph Manager

Description

The current implementation of ID clustering is done in a UnifiedViews DPU. Move that implementation to Graph Manager. The DPU implementation uses hardcoded data graphs; this implementation should work on all graphs. Such information can be queried from the provenance graph.

Add this functionality to the GM API.
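
As an illustration of what ID clustering amounts to, the sketch below groups identifiers connected through equivalence pairs using a union-find structure. This is a swapped-in generic technique, not the DPU's algorithm; `cluster_ids` and the sample identifiers are hypothetical.

```python
def cluster_ids(pairs):
    """Group identifiers that are connected through equivalence pairs."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    for left, right in pairs:
        # Merge the two clusters by linking their representatives.
        parent[find(left)] = find(right)

    clusters = {}
    for item in list(parent):
        clusters.setdefault(find(item), set()).add(item)
    return sorted(sorted(cluster) for cluster in clusters.values())


clusters = cluster_ids([("a", "b"), ("b", "c"), ("x", "y")])
```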

DoD

Content of the IDs graph is updated automatically based on a ~~schedule~~ (on API call, schedule will be done from Unified Views). a.k.a. extending the graph manager.

Testing

Unit test and/or BDD tests.

PoC DPU that provides workflow api functionality

Description

Implement a DPU that provides not just the working data, but also the execution-related provenance data to the graph store.

This is related to other DPU issues.

DoD

Working DPU implementation & documentation in the wiki.

Testing

Peer-review

Ontology extension for Linking Strategy for GM API

Description

Follow up for #17
Current implementation of the Linking Strategy does not support the extension to multiple types of strategies (e.g. IDs strategy, NER, etc.), each type of strategy having multiple and distinct types of parameters (e.g. for SPARQL we can add parameters for the type of skos predicate to be used; for NER we can add multiple languages or training sets etc.).

These changes need to be also reflected in the workflow, more precisely the steps (e.g. a workflow and associated steps might contain additional parameters for each strategy). The implementation also needs to take into account that some ETL systems cannot provide this type of data.
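
The idea of strategy types with their own parameter sets can be sketched as a small data structure. Everything here is hypothetical: the `LinkStrategy` class, the `REQUIRED` table, and the parameter names are illustrative and not part of the ontology.

```python
from dataclasses import dataclass, field


@dataclass
class LinkStrategy:
    uri: str
    strategy_type: str  # e.g. "IDs", "SPARQL", "NER"
    parameters: dict = field(default_factory=dict)


# Hypothetical required parameters per strategy type.
REQUIRED = {"SPARQL": {"predicate"}, "NER": {"languages"}, "IDs": set()}


def missing_parameters(strategy):
    """Return the required parameter names the strategy does not provide."""
    required = REQUIRED.get(strategy.strategy_type, set())
    return sorted(required - set(strategy.parameters))
```

A check like `missing_parameters` would also be a natural place to handle ETL systems that cannot supply such parameters, for example by falling back to defaults.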

DoD

Extend the current Ontology to adjust for the changes described above and add follow up issue for the implementation in the GM API.

Testing

Peer Review and Unit Tests

Swagger documentation of the Graph API

Decision on what should be exposed from the graph adjacent to the SPARQL endpoint.
This API will be consumed by the Public APIs component.

DoD:
Swagger specifications file and wiki documentation.

Testing:
Peer review

Provide examples for platform data model

Examples should combine data from workflow, provenance, working and dissemination data.
How are different THINGS named? Where do the identifiers come from?

Outcomes:
Documentation
Hand-crafted example documents

Investigate Prov endpoint issues in connection with Graph Store

Description

Sometimes activities cannot be retrieved from the WF API. Also, the GM API cannot connect to a specific dataset (test or ds).

DoD

Investigate and solve the issues

Testing

Unit/BDD tests

JSON-LD Elasticsearch Indexing JSON-LD framing interpreter/editor

Description

DoD

Testing

Unit Test and BDD Tests

Persistence GM API

Description

GM api DB needs to have persistence.
This will affect deployment and everything down the line.

DoD

Image and GM api are ready to have persistent data

Testing

Integration tests

Investigate/decide/plan asynchronous version of the GMapi

Description

DoD

Documentation

Testing

Peer-review

Plan GM linking functionality in the API

Description

We need to investigate how the linking functionality would be implemented in the short and long run. For this we need to identify parameters and options, taking into consideration the strategy described at https://github.com/ATTX-project/project-management/wiki/Linking-graphs

DoD

Update swagger definition and wiki documentation.

Testing

Peer review.

GM linking functionality

Description

Provide an endpoint to create and retrieve links between Working Data Graphs based on different linking implementations.
Described at: https://github.com/ATTX-project/project-management/wiki/Linking-graphs

GM API and Link DPU specific endpoint (ATTX-project/workflow-component#43).

DoD

Implementation of the functionality in the GM component

Testing

Unit test and BDD tests

GM mapping implementation based on the existing code

Description

Component that turns RDF Data into JSON documents based on a mapping.
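
A minimal sketch of mapping-driven transformation, assuming the properties of one resource are already available as a dictionary from predicate URI to values. The `map_resource` helper and the mapping format are hypothetical and do not reflect the existing mapping config.

```python
def map_resource(properties, mapping):
    """Rename predicate URIs to JSON field names, dropping unmapped predicates."""
    return {mapping[p]: values for p, values in properties.items() if p in mapping}


document = map_resource(
    {
        "http://purl.org/dc/terms/title": ["Example dataset"],
        "http://example.org/internal": ["not exported"],
    },
    {"http://purl.org/dc/terms/title": "title"},
)
```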

DoD

The transformation can be called through the updated GM API.

Testing

Unit Tests, BDD tests

Configure Fuseki to fit our Graph Store structure

Description:

We need to customise Fuseki to fit our configuration of the graph store.
There should be both in-memory and TDB backed services available.

DoD:

All the graphs will be merged into default graph via the configuration.
Add ontology data model to Fuseki container by default:

Design the ontology
include ontology part of the graph
include missing ontologies such as provenance, working data, etc.

Testing:

BDD, SPARQL test

JSON-LD based mapping implementation of GM

Description

The output format is compact and flattened JSON-LD, which is ingested into the latest version of ES.
Implementation has:

Mapping capabilities
API endpoints
ES indexing

Requires investigation of the old mapping config.

DoD

The transformation can be called through the updated GM API.

Testing

Unit Tests, BDD tests

GM and GM-API improvements to JSON-LD like structure in Elasticsearch Indexing

Description

GM and GM api are lacking in some of the needed features to complete the whole data processing pipeline.
Missing and not included in the original implementation definition:
Elasticsearch

indexing 1 ID per document. The Document will be constructed based on all the properties related to a subject
new IDs structure (using the URL of the subject unless specified otherwise)
new post parameter which specifies the property to be used for the ID (retrieve the value of that property).
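
The three points above can be sketched as follows: triples are grouped into one document per subject, the subject URL is the default ID, and an optional property overrides it. The `build_documents` function and its `id_property` parameter are illustrative assumptions, not the GM API's actual interface.

```python
def build_documents(triples, id_property=None):
    """Group (subject, predicate, object) triples into one document per subject."""
    grouped = {}
    for subject, predicate, obj in triples:
        grouped.setdefault(subject, {}).setdefault(predicate, []).append(obj)

    documents = {}
    for subject, properties in grouped.items():
        doc_id = subject  # default: the URL of the subject
        if id_property and properties.get(id_property):
            # Use the first value of the chosen property as the document ID.
            doc_id = properties[id_property][0]
        documents[doc_id] = properties
    return documents
```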

DoD

Complete Functionality and points addressed as described above.

Testing

Additional Unit and BDD tests

Linking strategies loaded to the Graph Store

Description

The graph store will store the linking strategies as discussed in #19 and for that to happen the graph store needs to be initialised with the linking strategies in the config.ttl.

DoD

Graph store contains linking strategies.

Testing

Peer review, and to be tested via the /linkstrategy GM API endpoint

Document the v1 of the internal data model for UC1

Description

Document the v1 of the internal data model. Document also the 1st use case.

DoD

Wiki page updated/created for the documentation. See ATTX-project/workflow-component#28

Testing

Peer Review

GM Ontology based ID clustering

Description

GM and GM api are lacking in some of the needed features to complete the whole data processing pipeline.
Missing and not included in the original implementation definition:

Clustering:

cluster data after a specific date
additional options in the post requests (related to scheduling and endpoints) - get the parameters from the mapping. Clustering should happen after mapping.
ontology based clustering of IDs

DoD

Perform Ontology based ID clustering.

Testing

Additional Unit and BDD tests

Provenance service Implementation derived from GM-API

Description

Moving to architecture v2.0 requires changes to the GM-API and adding new services to the Semantic Broker.

DoD

Changes to GM-API and first version of the provenance service as illustrated in https://attx-project.github.io/ATTX-Architecture-Overview.html

Testing

Unit tests and Feature tests.

Add RDF mapping functionality to the GraphManager swagger specification

Description

We need to extend the functionality of the GM with mapping of RDF data to public data (JSON, JSON-LD).

DoD

We need to design the API for supporting this functionality.
Updated Swagger specification.

Testing

Peer Review

GM-API: Handle outputs and inputs from connected workflow steps

Description

Wait for the output data from the previous step to become available before calling the next step that uses it for its input. Or throw an error.
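
The wait-or-fail behaviour described here could be sketched as a polling helper with a timeout. The `wait_for_output` name, the `is_ready` callable, and the choice of `TimeoutError` are assumptions for illustration.

```python
import time


def wait_for_output(is_ready, timeout_seconds, poll_seconds=0.01):
    """Block until is_ready() returns True, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while not is_ready():
        if time.monotonic() >= deadline:
            raise TimeoutError("output of the previous step is not available")
        time.sleep(poll_seconds)
```

The next step would be called only after `wait_for_output` returns; a raised `TimeoutError` corresponds to the "throw an error" branch.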

DoD

New version of the GM-API image that can handle outputs from new services.

Testing

GM API to support java implementation

Description

GM and GM api are lacking in some of the needed features to complete the whole data processing pipeline.
Missing and not included in the original implementation definition:

Functionality:

Running *.jar from Python for the Java based mapping
Docker image with Java8 support
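
Running a jar from Python is typically done with the standard `subprocess` module. The sketch below assumes a `java` executable on the PATH; the helper names and any jar path or arguments passed in are hypothetical.

```python
import subprocess


def build_jar_command(jar_path, args=()):
    """Build the argument list for running a jar with the java launcher."""
    return ["java", "-jar", jar_path, *args]


def run_jar(jar_path, args=(), timeout=None):
    """Run the jar and return its standard output; raises on a non-zero exit."""
    completed = subprocess.run(
        build_jar_command(jar_path, args),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout
```

Passing the command as a list avoids shell quoting issues, and `check=True` turns a failed mapping run into a `CalledProcessError` that the API can report.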

DoD

API can run Java jar in the specification.
Bundle jar to gm-API with gradle and docker file change.

Testing

Additional Unit and BDD tests

GM Managing update Provenance in Graph Store

Description

GM and GM api are lacking in some of the needed features to complete the whole data processing pipeline.
Missing and not included in the original implementation definition:

Related to WF-API:

update Prov from Workflow to Graph Store
standardise IDs and URIs for some of the ontologies endpoints
Check if the provenance graph is in the Graph Store - BDD Test

DoD

Complete Functionality and points addressed as described above.
Two HTTP requests to respective endpoints.

Testing

Additional Unit and BDD tests

Public and Private Workflows and Activities

Description

We now handle both public and private workflows, and we need a way to manage them in the graph component, in the Graph Store, and in the ontology.

DoD

~~Add necessary parameters to the ontology.~~
Configure Graph Store.
Determine Graph Manager changes.

Testing

Peer review

Add license data to datasets

Description

Every dataset in the platform must have some kind of licensing information attached to it.
TODO:

initial vocabulary/model for licensing data
changes to the platform's data model
changes to ATTX components
- WF API
- new WF API tests
- ATTX Transformer

DoD

First version of documentation and implementation of license information handling in ATTX platform.

Testing

Tests for licensing information when creating new datasets.