
jvalue / ods


Open Data Service - Make consuming open data easy, safe, and reliable

License: GNU Affero General Public License v3.0

Dockerfile 1.76% JavaScript 22.60% Java 18.25% TypeScript 41.75% Shell 0.89% HTML 0.09% Vue 14.34% Python 0.32%
cloud-database microservices opendata

ods's People

Contributors

9dt, acasadoquijada, andreas-bauer, f3l1x98, felix-oq, georg-schwarz, hmartinez69, jenswaechtler, jsone-studios, ke45xumo, kexplx, knusperkrone, lechodecho, lunedis, mathiaszinnen, nxmyoz, sonallux


ods's Issues

Document/Specify GitHub workflow guidelines

After migrating from GitLab to GitHub we need to explicitly define the work processes of ODS development, i.e. the usage of the issue board, project board, pull requests, etc.
A first draft of the guidelines should be added, which can then be discussed and refined.

Core/Transformation: Move remaining pipelineConfig to transformation service

Currently, configurations for data transformations and notifications are stored and handled in the core service. This should be handled by the transformation service instead. The exact implementation depends heavily on whether we already have RabbitMQ or not. I suggest implementing RabbitMQ first so we do not have to implement event handling twice.

Adapter Datasource Deserializing Test Sometimes Fails

Sometimes the datasource deserialization unit test fails for no apparent reason (other runs pass successfully).


java.lang.AssertionError: expected: org.jvalue.ods.adapterservice.datasource.model.Datasource<Datasource{id=123, protocol=AdapterProtocolConfig {type='HTTP', parameters='{location=http://www.the-inder.net}'}, format=AdapterFormatConfig {type='XML', parameters='{}'}, metadata=PipelineMetadata{displayName='TestName', author='icke', license='none', creationTimestamp=13 May 2020 14:13:39 GMT, description='Describing...'}, trigger=PipelineTriggerConfig{periodic=true, firstExecution=Fri Dec 01 03:30:00 CET 1905, interval=50000}}> but was: org.jvalue.ods.adapterservice.datasource.model.Datasource<Datasource{id=123, protocol=AdapterProtocolConfig {type='HTTP', parameters='{location=http://www.the-inder.net}'}, format=AdapterFormatConfig {type='XML', parameters='{}'}, metadata=PipelineMetadata{displayName='TestName', author='icke', license='none', creationTimestamp=13 May 2020 14:13:39 GMT, description='Describing...'}, trigger=PipelineTriggerConfig{periodic=true, firstExecution=Fri Dec 01 03:30:00 CET 1905, interval=50000}}>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at org.jvalue.ods.adapterservice.datasource.model.DatasourceTest.testDeserialization(DatasourceTest.java:29)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
	at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
	at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
	at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:412)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
	at java.base/java.lang.Thread.run(Thread.java:830)


Fix picture embedding in README.md

Pictures embedded in the root level README.md currently have relative file-system paths, e.g. doc/configuration-example/01_overview.jpg.

This worked when the repository was on GitLab but does not work on GitHub anymore.

The relative paths have to be replaced with absolute paths to where the pictures are accessible via GitHub, e.g. for the example above: https://github.com/jvalue/open-data-service/blob/master/doc/configuration-example/01_overview.jpg.

Combine multiple data sources

We should enable the combination of data from multiple sources. This can be done either in the adapter service or in the transformation service. I suggest doing it in the latter since data combination is modeled better by a transformation than by an adaptation.

It is probably best to refactor the data flow in the ODS. Currently, a pipeline consists of one adapter, followed by multiple transformations, storage, and optional notifications; in pseudo-EBNF: A T* S N*. What we want are multiple stages of one adapter each, followed by one transformation each, and at the end optional storage and optional notifications, i.e.: {A T}+ [S] N*.

Since the transformations are Turing complete, modeling multiple transformations after one adapter is redundant and can be simplified to just one. If it turns out to be more convenient to split a data transformation into multiple parts, we can still offer that in the UI and simply concatenate the transformations before passing them to the scheduler.

The AMSE projects revealed that a common use case for the ODS is getting URLs from one source that are then used to fetch data from another. To enable this process we need dynamic adapter configuration. The easiest implementation is to add the parameters for subsequent adapters to the data field so they can be used there.
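
The proposed shape {A T}+ [S] N* could be sketched roughly like this in TypeScript; all names are illustrative, not the actual ODS model:

```typescript
// Hypothetical sketch of the proposed pipeline shape {A T}+ [S] N*:
// one adapter plus one transformation per stage, optional storage,
// zero or more notifications.

interface AdapterConfig { protocol: string; location: string; }
interface TransformationConfig { func: string; }
interface NotificationConfig { type: string; url: string; }

// A stage pairs exactly one adapter with exactly one transformation (A T).
interface Stage {
  adapter: AdapterConfig;
  transformation: TransformationConfig;
}

interface Pipeline {
  stages: Stage[];                      // {A T}+ : at least one stage
  storage?: boolean;                    // [S]    : optional storage
  notifications: NotificationConfig[];  // N*     : zero or more notifications
}

// The one structural constraint the type system cannot express:
// {A T}+ requires at least one stage.
function isValidPipeline(p: Pipeline): boolean {
  return p.stages.length >= 1;
}
```

The optional storage and the notification array fall out of the grammar directly; only the "at least one stage" rule needs a runtime check.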

Integration of Event-Driven Architecture PR

I just went through all the changes and came up with the following consecutive work packages, to keep the PRs as small as possible.

WP A: Events for Notification Trigger

  • Notification service reacts to the trigger event (already implemented in #120)
  • Scheduler sends an event via RabbitMQ to the notification service instead of using the trigger endpoint

WP B: Decouple Storage via Events

  • Integrate storage_mq from #102
  • Scheduler sends event via RabbitMQ to trigger storage_mq

WP C: Trigger Storage and Notification by Transformation

  • Move trigger functionality from scheduler to transformation to trigger notification and storage

WP D: Remove Core

  • Move pipeline config logic from core to transformation
  • Add trigger endpoint to transformation service
  • Remove pipeline config polling from scheduler
  • Use trigger endpoint for pipelines in scheduler instead of sending whole pipeline configurations for execution
    Note: stateless execution interface stays untouched!
    Note: Integration tests can be copied from core to transformation; they should still run through after the URL change

WP E: Trigger Pipelines via Trigger Event

  • Publish event after successful adapter execution in adapter service
  • Listen to adapter events in transformation service to trigger pipelines
  • Remove trigger functionality from scheduler

Up next:

  • rename transformation service to pipeline service
  • event communication between adapter and scheduler

Adapter/Scheduler RabbitMQ

The scheduler's event polling at the adapter service needs to be replaced by RabbitMQ communication.
This entails:

  • Integrating RabbitMQ into the scheduler
  • Triggering data imports via datasource id instead of sending the whole configuration
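
The trigger event could then be as small as the following sketch; the field names and the message shape are assumptions, not the actual ODS wire format:

```typescript
// Hypothetical sketch of the trigger event the scheduler could publish via
// RabbitMQ. Only the datasource id travels over the wire; the adapter service
// looks the full configuration up itself.

interface DatasourceTriggerEvent {
  datasourceId: number;
  triggeredAt: string; // ISO timestamp
}

function buildTriggerMessage(datasourceId: number, now: Date = new Date()): string {
  const event: DatasourceTriggerEvent = {
    datasourceId,
    triggeredAt: now.toISOString(),
  };
  // The resulting string would be published, e.g. with amqplib:
  // channel.publish(exchange, routingKey, Buffer.from(message))
  return JSON.stringify(event);
}
```

Keeping the payload to an id avoids the current problem of stale configurations being sent around, since the adapter always reads the latest state.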

Remove top-level package-lock.json

A leftover /package-lock.json file from earlier days still sits in the root of the repository. It is no longer necessary and should be removed.

Integration tests and rabbitMQ

We need to decide how to perform end-to-end tests using RabbitMQ instead of HTTP calls; suggestions for frameworks etc. are welcome.

UI: Use imported data in transformation stepper

Instead of displaying unrelated sample data, pipeline transformation input data should be preview data from the corresponding adapter (if there is one). The newly added manual adapter trigger could be used for that.

UI: Paginate pipeline data

Currently, all data a pipeline has produced in its lifetime is loaded once the user clicks the "data" button. For long-running pipelines, this is very slow. We need some kind of pagination to limit the amount of data loaded at once.
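
A minimal offset/limit scheme could look like the sketch below; the parameter names and the `Page` shape are assumptions, not the actual ODS API:

```typescript
// Hypothetical pagination sketch: return one slice of the pipeline data
// plus enough metadata for the UI to render paging controls.

interface Page<T> {
  items: T[];
  total: number;  // total number of records, for computing page count
  offset: number;
  limit: number;
}

function paginate<T>(data: T[], offset: number, limit: number): Page<T> {
  return {
    items: data.slice(offset, offset + limit),
    total: data.length,
    offset,
    limit,
  };
}
```

On the server side the slice would of course be a `LIMIT`/`OFFSET` query against storage rather than an in-memory `slice`, so only one page is ever loaded.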

Refactor UI: datasources to own directory

In PR #28 we separated datasources and pipelines in the UI. Since we kept the changes to a minimum, we still need to extract the datasource-related files into a separate directory. This means a lot of import-path fixing and file renaming.

Update README to new UI

The pictures in the howTo show an outdated UI. We need to update the pictures and description to match the current workflow for pipeline creation (Datasource and Pipeline separated).

Transformation: Pass data references

Instead of passing the actual data to the transformation service, the scheduler should pass a reference to the data. The transformation service should then fetch the data, transform it, and pass a reference to the transformed data as a result. We need to decide whether the transformation service needs its own persistence to save its results or whether we want to use some kind of shared solution.
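
The reference-passing idea could be sketched as follows; the in-memory `Map` stands in for whatever persistence is eventually chosen, and all names are hypothetical:

```typescript
// Hypothetical sketch of reference passing: the scheduler hands the
// transformation service only a reference (id). The service fetches the data,
// transforms it, stores the result, and returns a new reference.

type DataRef = string;

const store = new Map<DataRef, unknown>();
let nextId = 0;

// Persist a piece of data and return its reference.
function putData(data: unknown): DataRef {
  const ref = `data-${nextId++}`;
  store.set(ref, data);
  return ref;
}

// Resolve a reference, apply the transformation, and return a reference
// to the result instead of the result itself.
function transformByReference(ref: DataRef, transform: (d: unknown) => unknown): DataRef {
  const data = store.get(ref);
  if (data === undefined) throw new Error(`unknown data reference: ${ref}`);
  return putData(transform(data));
}
```

The key property is that services only ever exchange references, so large payloads never travel through the scheduler.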

Storage: Data references

Analogous to the transformation service, the storage service should receive a reference to the data instead of the actual data. It should then fetch the data via that reference and persist it. Again, we need to decide whether this requires its own persistence or some kind of shared solution.

Adapter: Improve Exception Logging

Currently, exceptions are just being dumped to the console. Instead, we should log a meaningful message including context information.

Cleanup package.json

Over time, a lot of clutter has accumulated in the project's package.json files. Remove everything unnecessary and update the repo URL to GitHub.

System-Test: Service logs are only shown for successful tests

With our current CI configuration, in integration and system tests, the logs of services other than the test itself are only shown when the tests succeed. But we need them precisely when tests fail, to simplify debugging.
This is because after a failing system/integration test, the CI job is abandoned (--exit-code-from flag).
We should think of a way to show logs for failing tests.

Adapter: Get imports and imported data of datasource via API

An import does not contain the data itself, only an id, a timestamp, and a link (URL) to the data.

API suggestion:

  • /datasources/{id}/imports -> map of all imports (id, timestamp, link to data url)
  • /datasources/{id}/imports/{id} -> import (id, timestamp, link to data url)
  • /datasources/{id}/imports/latest -> latest data import
  • /datasources/{id}/imports/{id}/data -> the real data

Implementation:

  • Managing imports and import data should be handled in a sub-package of datasources (not adapters). This probably requires a fair bit of refactoring.
  • /dataImport should return the imported data directly; no caching for this kind of data import
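
The suggested endpoints could be modeled as plain functions before wiring them into HTTP routes; the field names below follow the issue text, everything else is an assumption:

```typescript
// Hypothetical sketch of the suggested imports API, as functions instead of
// HTTP routes. An import carries only id, timestamp, and a link to the data.

interface DataImport {
  id: number;
  timestamp: string;  // ISO timestamp; lexicographic order matches time order
  dataUrl: string;    // link to the real data, served by a separate endpoint
}

const importsByDatasource = new Map<number, DataImport[]>();

// GET /datasources/{id}/imports
function listImports(datasourceId: number): DataImport[] {
  return importsByDatasource.get(datasourceId) ?? [];
}

// GET /datasources/{id}/imports/latest
function latestImport(datasourceId: number): DataImport | undefined {
  return listImports(datasourceId).reduce<DataImport | undefined>(
    (latest, i) => (!latest || i.timestamp > latest.timestamp ? i : latest),
    undefined,
  );
}
```

Because the import metadata is tiny, listing imports stays cheap even for long-running datasources; only the `/data` endpoint touches the real payload.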

Reload UI sites

In #56 we already made the first step towards making the UI reloadable on every page.

Top-level reloading, e.g. on /datasources, works now, but not on datasources/new.
So I guess the issue is somewhere else? Any guesses?

Originally posted by @georg-schwarz in #56 (comment)
