attx-project / graph-component
Graph Manager component that handles state of the internal graph/data.
Currently /linkstrategy returns only the "uri" field. This should be extended with at least "title" information so that it can be used more easily in the UI (see ATTX-project/workflow-component#43).
/linkstrategy also returns a "title" value for every strategy (and perhaps other fields too).
Update integration tests.
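A minimal sketch of what the extended response could look like, assuming each strategy is available as a dictionary with a "uri" and an optional "title". The function name `strategies_response` and the fallback to the last URI segment are illustrative assumptions, not the GM API's actual code.

```python
from typing import Dict, List


def strategies_response(strategies: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return one entry per strategy, carrying both "uri" and "title"."""
    response = []
    for strategy in strategies:
        uri = strategy["uri"]
        title = strategy.get("title")
        if not title:
            # Assumption: fall back to the last segment of the URI when no
            # title is recorded for the strategy.
            title = uri.rstrip("/#").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
        response.append({"uri": uri, "title": title})
    return response
```

An integration test could then assert that every item in the /linkstrategy payload has a non-empty "title" next to its "uri".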
Describe the issue or task at hand with dependencies and possible links to code/external/internal sources.
How one would test that the things/artifacts have been achieved. Options (Peer review, Unit Test, BDD testing, Integration Test).
Graph Manager needs to work with the WF-API and insert data in the Graph Store in a scheduled manner.
Cron script which provides the functionality:
Unit and BDD tests
GM needs to call long-running services such as RMLService and ProvService and needs to be able to do this efficiently.
Non-blocking service orchestration.
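One possible shape for non-blocking orchestration is sketched below with Python's `asyncio`. The `call_service` coroutine is a stand-in for a real HTTP request to RMLService or ProvService, and the durations are made up for illustration.

```python
import asyncio
from typing import List


async def call_service(name: str, seconds: float) -> str:
    # Stand-in for a long-running HTTP call; the event loop is free to run
    # other work while this coroutine is waiting.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def orchestrate() -> List[str]:
    # Both calls are started together, so the total wall time is roughly
    # that of the slower call rather than the sum of the two.
    return await asyncio.gather(
        call_service("RMLService", 0.05),
        call_service("ProvService", 0.02),
    )


def run_orchestration() -> List[str]:
    return asyncio.run(orchestrate())
```

`asyncio.gather` returns results in the order the calls were passed in, regardless of which one finishes first.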
There is an endpoint in the GMapi to update the state of the provenance data graph with the data from the UV's database. Currently the responsibility to call that endpoint is outside of GMapi.
We should come up with a way to schedule or trigger the same endpoint from within the GMApi. The simplest solution would be scheduled execution, as planned earlier.
Update to the docs and possibly some related implementation
Testing depends on what we decide to do.
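If scheduled execution is chosen, it could be sketched with the standard-library `sched` module as below. The helper `schedule_every`, the bounded number of runs, and the idea that `task` wraps the provenance-update call are all assumptions made for illustration.

```python
import sched
from typing import Callable


def schedule_every(scheduler: sched.scheduler, interval: float,
                   task: Callable[[], None], runs: int) -> None:
    """Run `task` every `interval` seconds, `runs` times in total."""

    def run_and_reschedule(remaining: int) -> None:
        task()
        if remaining > 1:
            # Re-enter the event so the next run happens one interval later.
            scheduler.enter(interval, 1, run_and_reschedule, (remaining - 1,))

    scheduler.enter(interval, 1, run_and_reschedule, (runs,))
```

In a service, `sched.scheduler(time.monotonic, time.sleep)` running in a background thread would drive this. Injecting a fake clock instead makes the schedule testable without waiting.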
Graphmanager provides the following operations for datasets:
Documented Input and output of the operations. New version of the service.
Unit testing
The current implementation of ID clustering is done in a UnifiedViews DPU. Move that implementation to the Graph Manager. The DPU implementation uses hardcoded data graphs; the new implementation should work on all graphs. The list of graphs can be queried from the provenance graph.
Add this functionality to the GM API.
Content of the IDs graph is updated automatically based on a schedule (on API call; scheduling will be done from UnifiedViews), i.e. by extending the Graph Manager.
Unit test and/or BDD tests.
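The issue does not say which clustering algorithm the DPU uses. As an illustration only, identifiers that are pairwise declared equivalent can be grouped with a union-find structure; the input format (pairs of identifier strings) is an assumption.

```python
from typing import Dict, Iterable, List, Set, Tuple


def cluster_ids(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group identifiers into clusters of transitively equivalent IDs."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters: Dict[str, Set[str]] = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    # Sort for a deterministic result order.
    return sorted(clusters.values(), key=lambda c: sorted(c))
```

The pairs themselves would come from querying every working data graph listed in the provenance graph, which is outside the scope of this sketch.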
Implement a dpu that provides not just the working data, but also the execution related provenance data to the graph store.
This is related to other DPU issues.
Working DPU implementation & documentation in the wiki.
Peer-review
Follow up for #17
Current implementation of the Linking Strategy does not support the extension to multiple types of strategies (e.g. IDs strategy, NER, etc.), each type of strategy having multiple and distinct types of parameters (e.g. for SPARQL we can add parameters for the type of skos predicate to be used; for NER we can add multiple languages or training sets etc.).
These changes need to be also reflected in the workflow, more precisely the steps (e.g. a workflow and associated steps might contain additional parameters for each strategy). The implementation also needs to take into account that some ETL systems cannot provide this type of data.
Extend the current Ontology to adjust for the changes described above and add follow up issue for the implementation in the GM API.
Peer Review and Unit Tests
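To make the requirement concrete, here is a hedged sketch of a strategy model in which each strategy type accepts its own set of parameters. The type names and parameter names in `ALLOWED_PARAMETERS` are hypothetical and do not come from the ATTX ontology.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical parameter names per strategy type, for illustration only.
ALLOWED_PARAMETERS = {
    "ids": {"graphs"},
    "sparql": {"predicate"},
    "ner": {"languages", "training_set"},
}


@dataclass
class LinkingStrategy:
    uri: str
    strategy_type: str
    # Parameters are optional, since some ETL systems cannot provide them.
    parameters: Dict[str, Any] = field(default_factory=dict)

    def validate(self) -> None:
        allowed = ALLOWED_PARAMETERS.get(self.strategy_type)
        if allowed is None:
            raise ValueError(f"unknown strategy type: {self.strategy_type}")
        unexpected = set(self.parameters) - allowed
        if unexpected:
            raise ValueError(f"unexpected parameters: {sorted(unexpected)}")
```

Keeping the allowed parameters in one table means a new strategy type can be added without touching the validation logic.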
Decision on what should be exposed from the graph adjacent to the SPARQL endpoint.
This API will be consumed by the Public APIs component.
DoD:
Swagger specifications file and wiki documentation.
Testing:
Peer review
Examples should combine data from workflow, provenance, working and dissemination data.
How are different THINGS named? Where do the identifiers come from?
Outcomes:
Documentation
Hand-crafted example documents
Sometimes activities cannot be retrieved from the WF API; also, the GM API cannot connect to a specific dataset (test or ds).
Investigate and solve the issues
Unit/BDD tests
Unit Test and BDD Tests
GM api DB needs to have persistence.
This will affect deployment and everything down the line.
Image and GM api are ready to have persistent data
Integration tests
Documentation
Peer-review
We need to investigate how the linking functionality would be implemented in the short and long run. For this we need to identify parameters and options by taking into consideration the strategy https://github.com/ATTX-project/project-management/wiki/Linking-graphs
Update swagger definition and wiki documentation.
Peer review.
Provide an endpoint to create and retrieve links between Working Data Graphs based on different linking implementations.
Described at: https://github.com/ATTX-project/project-management/wiki/Linking-graphs
GM API and Link DPU specific endpoint (ATTX-project/workflow-component#43).
Implementation of the functionality in the GM component
Unit test and BDD tests
Component that turns RDF Data into JSON documents based on a mapping.
Transformation can be called using the updated GM API.
Unit Tests, BDD tests
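A simplified sketch of mapping-driven transformation follows. It assumes triples arrive as plain `(subject, predicate, object)` tuples and that the mapping is a dictionary from predicate URI to JSON field name; the real component's mapping format is not described in the issue.

```python
from typing import Any, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def triples_to_documents(triples: Iterable[Triple],
                         mapping: Dict[str, str]) -> List[Dict[str, Any]]:
    """Build one JSON-ready document per subject, using only mapped predicates."""
    documents: Dict[str, Dict[str, Any]] = {}
    for subject, predicate, obj in triples:
        field_name = mapping.get(predicate)
        if field_name is None:
            continue  # predicates without a mapping entry are ignored
        doc = documents.setdefault(subject, {"id": subject})
        # Values are collected in lists, since a predicate may repeat.
        doc.setdefault(field_name, []).append(obj)
    return list(documents.values())
```

The resulting list can be serialised with `json.dumps` before it is handed on for indexing.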
We need to customise Fuseki to fit our configuration of the graph store.
There should be both in-memory and TDB backed services available.
All the graphs will be merged into default graph via the configuration.
Add ontology data model to Fuseki container by default:
BDD, SPARQL test
The output, in compact and flattened JSON-LD, is ingested into the latest version of ES.
Implementation has:
Require investigation of the old mapping config.
Transformation can be called using the updated GM API.
Unit Tests, BDD tests
GM and GM api are lacking in some of the needed features to complete the whole data processing pipeline.
Missing and not included in the original implementation definition:
Elasticsearch
Complete Functionality and points addressed as described above.
Additional Unit and BDD tests
The graph store will store the linking strategies as discussed in #19, and for that to happen the graph store needs to be initialised with the linking strategies in the config.ttl.
Graph store contains linking strategies.
Peer review, and to be tested via the /linkstrategy gmapi endpoint.
Document the v1 of the internal data model. Document also the 1st use case.
Wiki page updated/created for the documentation. See ATTX-project/workflow-component#28
Peer Review
GM and GM api are lacking in some of the needed features to complete the whole data processing pipeline.
Missing and not included in the original implementation definition:
Clustering:
Perform Ontology based ID clustering.
Additional Unit and BDD tests
Moving to architecture v2.0 requires changes to the GM-API and adding new services to the Semantic Broker.
Changes to GM-API and first version of the provenance service as illustrated in https://attx-project.github.io/ATTX-Architecture-Overview.html
Unit tests and Feature tests.
We need to extend the functionality of the GM with mapping of RDF data to public data (JSON, JSON-LD).
We need to design the API for supporting this functionality.
Updated Swagger specification.
Peer Review
Wait for the output data from the previous step to become available before calling the next step that uses it for its input. Or throw an error.
New version of the GM-API image that can handle outputs from new services.
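The "wait or throw an error" behaviour could be sketched as a polling loop with a deadline. The `is_available` callback, the default timeout and the polling interval are assumptions; how the GM-API actually detects finished output is not stated here.

```python
import time
from typing import Callable


def wait_for_output(is_available: Callable[[], bool],
                    timeout: float = 30.0,
                    interval: float = 0.5) -> None:
    """Block until `is_available()` is true, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        if is_available():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("output of the previous step did not become available")
        time.sleep(interval)
```

A pipeline would call `wait_for_output(...)` between two steps and only start the next step when it returns without raising.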
GM and GM api are lacking in some of the needed features to complete the whole data processing pipeline.
Missing and not included in the original implementation definition:
Functionality:
API can run Java jar in the specification.
Bundle jar to gm-API with gradle and docker file change.
Additional Unit and BDD tests
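Running a bundled jar from the API could look roughly like the sketch below, which uses `subprocess.run`. The jar path and arguments are placeholders, and a `java` executable is assumed to be on the PATH inside the image.

```python
import subprocess
from typing import List, Optional, Sequence


def jar_command(jar_path: str, args: Sequence[str] = ()) -> List[str]:
    """Build the argument list for `java -jar <jar> <args...>`."""
    return ["java", "-jar", jar_path, *args]


def run_jar(jar_path: str, args: Sequence[str] = (),
            timeout: Optional[float] = None) -> str:
    # check=True raises CalledProcessError on a non-zero exit code, so the
    # caller can turn a failed run into an error response.
    completed = subprocess.run(
        jar_command(jar_path, args),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return completed.stdout
```

Passing the command as a list, not as a shell string, avoids quoting problems with user-supplied arguments.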
GM and GM api are lacking in some of the needed features to complete the whole data processing pipeline.
Missing and not included in the original implementation definition:
Related to WF-API:
Complete Functionality and points addressed as described above.
Two HTTP requests to respective endpoints.
Additional Unit and BDD tests
We are now handling both public and private workflows and we need a way to manage them in the graph component and in the graph store and last but not least in the ontology.
Add necessary parameters to the ontology.
Configure Graph Store.
Determine Graph Manager changes.
Peer review
Every dataset in the platform must have some kind of licensing information attached to it.
TODO:
First version of documentation and implementation of license information handling in ATTX platform.
Tests for licensing information when creating new datasets.
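A test for licensing information could be built around a check like the one below. This is a sketch only: the `create_dataset` function and the `"license"` metadata key are hypothetical names, not the platform's actual interface.

```python
from typing import Any, Dict


def create_dataset(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Accept dataset metadata only if it carries licensing information."""
    license_value = str(metadata.get("license") or "").strip()
    if not license_value:
        raise ValueError("dataset must declare a license")
    # Return a copy with the normalised license value.
    return {**metadata, "license": license_value}
```

Rejecting the dataset at creation time keeps unlicensed data out of the graph store, so no later clean-up is needed.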