graph-component's People

Contributors

attx-github, blankdots, jkesanie

graph-component's Issues

First version of RMLService

Description

Describe the issue or task at hand with dependencies and possible links to code/external/internal sources.

DoD

  • Docker image
  • Swagger
  • Pipeline for building
  • Pipeline for image

Testing

How one would test that the things/artifacts have been achieved. Options (Peer review, Unit Test, BDD testing, Integration Test).

Implement Graph Manager V1

Description

Graph Manager needs to work with the WF-API and insert data in the Graph Store in a scheduled manner.

DoD

Cron script which provides the functionality:

  • getting data from the WF-API in a scheduled manner
  • updating data in a specific Graph Store context
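
The fetch-and-update cycle listed above could be sketched roughly as follows. This is only an illustration under assumptions, not the Graph Manager implementation: `fetch_from_wf_api`, `update_graph_store`, the context name and the sample triple are all hypothetical placeholders, and a real deployment would more likely leave the timing to cron itself.

```python
import time
from typing import Callable, Dict, List


def run_scheduled(
    fetch: Callable[[], List[str]],
    update: Callable[[str, List[str]], None],
    context: str,
    interval_seconds: float,
    runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Fetch data and push it to one graph context, `runs` times.

    `fetch` and `update` are injected so the loop can be exercised
    without a running WF-API or Graph Store. Returns the number of
    completed cycles.
    """
    completed = 0
    for i in range(runs):
        data = fetch()
        update(context, data)
        completed += 1
        if i < runs - 1:
            sleep(interval_seconds)  # wait before the next cycle
    return completed


# Hypothetical stand-ins for the WF-API client and the Graph Store writer.
store: Dict[str, List[str]] = {}


def fetch_from_wf_api() -> List[str]:
    return ['<urn:wf:1> <urn:p:status> "done" .']


def update_graph_store(context: str, triples: List[str]) -> None:
    store[context] = list(triples)  # overwrite the context with fresh data


cycles = run_scheduled(
    fetch_from_wf_api,
    update_graph_store,
    context="urn:graph:workflow",
    interval_seconds=0,
    runs=2,
    sleep=lambda _seconds: None,  # skip real waiting in this demo
)
```

Passing the fetch and update steps in as callables keeps the scheduling logic testable on its own, which fits the unit and BDD testing named for this issue.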

Testing

Unit and BDD tests

Automatic provenance graph update

Description

There is an endpoint in the GM API to update the state of the provenance data graph with the data from the UV's database. Currently the responsibility to call that endpoint lies outside of the GM API.

We should come up with a way to schedule or trigger the same endpoint within the GM API. The simplest solution would be scheduled execution, as planned earlier.

DoD

Update to the docs and possibly some related implementation

Testing

Testing depends on what we decide to do.

Document/implement dataset operations

Description

Graphmanager provides the following operations for datasets:

  • replace
  • query
  • add
  • retrieve
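
Since the input and output of these operations are still to be documented, the following is only one possible reading of them, sketched against an in-memory stand-in for the Graph Store. The class name `InMemoryDatasets`, the method signatures and the assumed semantics (`add` keeps existing triples, `replace` overwrites them, `query` matches a simple triple pattern) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class InMemoryDatasets:
    """Toy stand-in for the Graph Store, keyed by named graph URI."""

    def __init__(self) -> None:
        self._graphs: Dict[str, Set[Triple]] = defaultdict(set)

    def add(self, graph: str, triples: Iterable[Triple]) -> None:
        """Add triples to a graph, keeping what is already there."""
        self._graphs[graph].update(triples)

    def replace(self, graph: str, triples: Iterable[Triple]) -> None:
        """Drop the graph's current content and store the new triples."""
        self._graphs[graph] = set(triples)

    def retrieve(self, graph: str) -> List[Triple]:
        """Return every triple in a graph, sorted for stable output."""
        return sorted(self._graphs.get(graph, set()))

    def query(
        self,
        graph: str,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> List[Triple]:
        """Return triples matching a pattern; None acts as a wildcard."""
        return [
            t
            for t in self.retrieve(graph)
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]


datasets = InMemoryDatasets()
datasets.add("urn:g:1", [("urn:a", "urn:p:name", "Alice")])
datasets.add("urn:g:1", [("urn:b", "urn:p:name", "Bob")])
names = datasets.query("urn:g:1", p="urn:p:name")
datasets.replace("urn:g:1", [("urn:c", "urn:p:name", "Carol")])
after_replace = datasets.retrieve("urn:g:1")
```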

DoD

Documented input and output of the operations. New version of the service.

Testing

Unit testing

Implement ID clustering in Graph Manager

Description

The current implementation of ID clustering is done in a UnifiedViews DPU. Move that implementation to Graph Manager. The DPU implementation uses hardcoded data graphs, whereas this implementation should work on all graphs. Such information can be queried from the provenance graph.

Add this functionality to the GM API.

DoD

The content of the IDs graph is updated automatically based on a schedule (the update runs on an API call; the scheduling itself will be done from UnifiedViews), i.e. by extending the Graph Manager.

Testing

Unit test and/or BDD tests.

PoC DPU that provides workflow api functionality

Description

Implement a DPU that provides not just the working data, but also the execution-related provenance data to the graph store.

This is related to other DPU issues.

DoD

Working DPU implementation & documentation in the wiki.

Testing

Peer-review

Ontology extension for Linking Strategy for GM API

Description

Follow up for #17
Current implementation of the Linking Strategy does not support the extension to multiple types of strategies (e.g. IDs strategy, NER, etc.), each type of strategy having multiple and distinct types of parameters (e.g. for SPARQL we can add parameters for the type of skos predicate to be used; for NER we can add multiple languages or training sets etc.).

These changes need to be also reflected in the workflow, more precisely the steps (e.g. a workflow and associated steps might contain additional parameters for each strategy). The implementation also needs to take into account that some ETL systems cannot provide this type of data.

DoD

Extend the current Ontology to adjust for the changes described above and add follow up issue for the implementation in the GM API.

Testing

Peer Review and Unit Tests

Swagger documentation of the Graph API

Decision on what should be exposed from the graph adjacent to the SPARQL endpoint.
This API will be consumed by the Public APIs component.

DoD:
Swagger specifications file and wiki documentation.

Testing:
Peer review

Provide examples for platform data model

Examples should combine data from workflow, provenance, working and dissemination data.
How are different THINGS named? Where do the identifiers come from?

Outcomes:
Documentation
Hand-crafted example documents

Persistence GM API

Description

The GM API database needs to be persistent.
This will affect deployment and everything down the line.

DoD

Image and GM api are ready to have persistent data

Testing

Integration tests

Configure Fuseki to fit our Graph Store structure

Description:

We need to customise Fuseki to fit our configuration of the graph store.
There should be both in-memory and TDB backed services available.

DoD:

All the graphs will be merged into default graph via the configuration.
Add ontology data model to Fuseki container by default:

  • Design the ontology
  • include ontology part of the graph
  • include missing ontologies such as provenance, working data, etc.

Testing:

BDD, SPARQL test

JSON-LD based mapping implementation of GM

Description

The output, in compact and flattened JSON-LD, is ingested into the latest version of Elasticsearch (ES).
Implementation has:

  • Mapping capabilities
  • API endpoints
  • ES indexing

Requires investigation of the old mapping config.

DoD

Transformation can be called using the updated GM API.

Testing

Unit Tests, BDD tests

GM and GM-API improvements to JSON-LD like structure in Elasticsearch Indexing

Description

GM and GM api are lacking in some of the needed features to complete the whole data processing pipeline.
Missing and not included in the original implementation definition:
Elasticsearch

  • indexing 1 ID per document. The Document will be constructed based on all the properties related to a subject
  • new IDs structure (using the URL of the subject unless specified otherwise)
  • new POST parameter which specifies the property to be used for the ID (retrieve the value of that property).
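
The three points above can be illustrated with a small sketch that groups triples by subject into one document per ID. It is an assumption-laden illustration rather than the GM code: the function name `build_documents`, the `id_property` argument (standing in for the new POST parameter) and the fallback to the subject URL when that property is missing are all hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Document = Dict[str, List[str]]


def build_documents(
    triples: Iterable[Triple], id_property: Optional[str] = None
) -> Dict[str, Document]:
    """Build one document per subject from all of its properties.

    The document ID is the subject URL, unless `id_property` is given
    and the subject has a value for it (assumed fallback: subject URL).
    """
    by_subject: Dict[str, Document] = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        by_subject[subject][predicate].append(obj)

    documents: Dict[str, Document] = {}
    for subject, props in by_subject.items():
        doc_id = subject
        if id_property is not None and props.get(id_property):
            doc_id = props[id_property][0]  # first value of the ID property
        documents[doc_id] = dict(props)
    return documents


triples = [
    ("http://example.org/a", "title", "A"),
    ("http://example.org/a", "identifier", "id-1"),
    ("http://example.org/b", "title", "B"),
]
by_url = build_documents(triples)
by_property = build_documents(triples, id_property="identifier")
```

Each resulting entry could then be sent to Elasticsearch as a single index request, with the dictionary key used as the document ID.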

DoD

Complete Functionality and points addressed as described above.

Testing

Additional Unit and BDD tests

Linking strategies loaded to the Graph Store

Description

The graph store will store the linking strategies as discussed in #19 and for that to happen the graph store needs to be initialised with the linking strategies in the config.ttl.

DoD

Graph store contains linking strategies.

Testing

Peer review and to be tested via the /linkstrategygmapi endpoint

GM Ontology based ID clustering

Description

GM and GM api are lacking in some of the needed features to complete the whole data processing pipeline.
Missing and not included in the original implementation definition:

Clustering:

  • cluster data after a specific date
  • additional options in the post requests (related to scheduling and endpoints) - get the parameters from the mapping. Clustering should happen after mapping.
  • ontology based clustering of IDs
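
As a rough illustration of the clustering step, the sketch below groups resources that share an identifier value and can skip records added on or before a given date. It assumes that "ID clustering" means merging resources with a common identifier, uses a plain union-find for that, and does not cover the ontology-driven selection of identifier properties; the record layout and the name `cluster_ids` are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (resource URI, identifier value, date the record was added)
Record = Tuple[str, str, date]


def cluster_ids(
    records: Iterable[Record], after: Optional[date] = None
) -> List[Set[str]]:
    """Cluster resources that share an identifier value (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen: Dict[str, str] = {}  # identifier -> first resource carrying it
    for resource, identifier, added in records:
        if after is not None and added <= after:
            continue  # only cluster data added after the given date
        find(resource)  # register the resource even if it stays alone
        if identifier in first_seen:
            union(first_seen[identifier], resource)
        else:
            first_seen[identifier] = resource

    clusters: Dict[str, Set[str]] = {}
    for resource in parent:
        clusters.setdefault(find(resource), set()).add(resource)
    return sorted(clusters.values(), key=lambda c: sorted(c))


records = [
    ("urn:a", "orcid:1", date(2017, 5, 1)),
    ("urn:b", "orcid:1", date(2017, 5, 2)),
    ("urn:b", "isni:9", date(2017, 5, 2)),
    ("urn:c", "isni:9", date(2017, 5, 3)),
    ("urn:d", "orcid:2", date(2017, 1, 1)),
]
all_clusters = cluster_ids(records)
recent_clusters = cluster_ids(records, after=date(2017, 4, 1))
```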

DoD

Perform Ontology based ID clustering.

Testing

Additional Unit and BDD tests

GM API to support java implementation

Description

GM and GM api are lacking in some of the needed features to complete the whole data processing pipeline.
Missing and not included in the original implementation definition:

Functionality:

  • Running *.jar from Python for the Java based mapping
  • Docker image with Java8 support
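
Running a jar from Python usually comes down to a `subprocess` call to the `java` binary. The sketch below shows one way this might look; the jar name `mapping.jar` and its `--input` argument are hypothetical, and `run_jar` assumes that `java` is available on the PATH of the Docker image.

```python
import subprocess
from typing import List, Sequence


def build_jar_command(
    jar_path: str, args: Sequence[str] = (), java_bin: str = "java"
) -> List[str]:
    """Build the argument list for `java -jar <jar> <args...>`."""
    return [java_bin, "-jar", jar_path, *args]


def run_jar(jar_path: str, args: Sequence[str] = (), timeout: float = 60.0) -> str:
    """Run the jar and return its standard output.

    Raises subprocess.CalledProcessError on a non-zero exit code and
    subprocess.TimeoutExpired if the process runs longer than `timeout`.
    """
    result = subprocess.run(
        build_jar_command(jar_path, args),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout


# Hypothetical jar name and arguments, shown without executing anything.
command = build_jar_command("mapping.jar", ["--input", "data.json"])
```

Passing the command as a list instead of a shell string avoids quoting problems when file names contain spaces.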

DoD

API can run Java jar in the specification.
Bundle jar to gm-API with gradle and docker file change.

Testing

Additional Unit and BDD tests

GM Managing update Provenance in Graph Store

Description

GM and GM api are lacking in some of the needed features to complete the whole data processing pipeline.
Missing and not included in the original implementation definition:

Related to WF-API:

  • update Prov from Workflow to Graph Store
  • standardise IDs and URIs for some of the ontologies endpoints
  • Check if the provenance graph is in the Graph Store - BDD Test

DoD

Complete Functionality and points addressed as described above.
Two HTTP requests to respective endpoints.

Testing

Additional Unit and BDD tests

Public and Private Workflows and Activities

Description

We are now handling both public and private workflows and we need a way to manage them in the graph component and in the graph store and last but not least in the ontology.

DoD

Add necessary parameters to the ontology.
Configure Graph Store.
Determine Graph Manager changes.

Testing

Peer review

Add license data to datasets

Description

Every dataset in the platform must have some kind of licensing information attached to it.
TODO:

  • initial vocabulary/model for licensing data
  • changes to the platform's data model
  • changes to ATTX components
    • WF API
    • new WF API tests
    • ATTX Transformer

DoD

First version of documentation and implementation of license information handling in ATTX platform.

Testing

Tests for licensing information when creating new datasets.
