
jvalue / ods


Open Data Service - Make consuming open data easy, safe, and reliable

License: GNU Affero General Public License v3.0

Dockerfile 1.76% JavaScript 22.60% Java 18.25% TypeScript 41.75% Shell 0.89% HTML 0.09% Vue 14.34% Python 0.32%
cloud-database microservices opendata

ods's People

Contributors

9dt, acasadoquijada, andreas-bauer, f3l1x98, felix-oq, georg-schwarz, hmartinez69, jenswaechtler, jsone-studios, ke45xumo, kexplx, knusperkrone, lechodecho, lunedis, mathiaszinnen, nxmyoz, sonallux


ods's Issues

Document/Specify GitHub workflow guidelines

After migrating from GitLab to GitHub we need to explicitly define the work processes of ODS development, i.e. the usage of the issue board, project board, pull requests, etc.
A first draft of the guidelines should be added, which can then be discussed and refined.

Core/Transformation: Move remaining pipelineConfig to transformation service

Currently, configurations for data transformations and notifications are stored and handled in the core service. This should be handled by the transformation service instead. The exact implementation depends heavily on whether we already have RabbitMQ or not. I suggest implementing RabbitMQ first so we do not have to implement event handling twice.

Adapter Datasource Deserializing Test Sometimes Fails

Sometimes the datasource deserialization unit test fails for no apparent reason (other runs pass successfully).


java.lang.AssertionError: expected: org.jvalue.ods.adapterservice.datasource.model.Datasource<Datasource{id=123, protocol=AdapterProtocolConfig {type='HTTP', parameters='{location=http://www.the-inder.net}'}, format=AdapterFormatConfig {type='XML', parameters='{}'}, metadata=PipelineMetadata{displayName='TestName', author='icke', license='none', creationTimestamp=13 May 2020 14:13:39 GMT, description='Describing...'}, trigger=PipelineTriggerConfig{periodic=true, firstExecution=Fri Dec 01 03:30:00 CET 1905, interval=50000}}> but was: org.jvalue.ods.adapterservice.datasource.model.Datasource<Datasource{id=123, protocol=AdapterProtocolConfig {type='HTTP', parameters='{location=http://www.the-inder.net}'}, format=AdapterFormatConfig {type='XML', parameters='{}'}, metadata=PipelineMetadata{displayName='TestName', author='icke', license='none', creationTimestamp=13 May 2020 14:13:39 GMT, description='Describing...'}, trigger=PipelineTriggerConfig{periodic=true, firstExecution=Fri Dec 01 03:30:00 CET 1905, interval=50000}}>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at org.jvalue.ods.adapterservice.datasource.model.DatasourceTest.testDeserialization(DatasourceTest.java:29)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
	at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
	at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
	at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:412)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
	at java.base/java.lang.Thread.run(Thread.java:830)


Fix picture embedding in README.md

Pictures embedded in the root level README.md currently have relative file-system paths, e.g. doc/configuration-example/01_overview.jpg.

This worked when the repository was on GitLab but does not work on GitHub anymore.

The relative paths have to be replaced with absolute paths to where the pictures are accessible via GitHub, e.g. for the example above: https://github.com/jvalue/open-data-service/blob/master/doc/configuration-example/01_overview.jpg.

Combine multiple data sources

We should enable the combination of data from multiple sources. This can be done either in the adapter service or in the transformation service. I suggest doing it in the latter since data combination is modeled better by a transformation than by an adaptation.

It is probably best to refactor the data flow in the ODS. Currently, a pipeline consists of one adapter, followed by multiple transformations, storage, and optional notifications; in pseudo-EBNF: A T* S N*. What we want are multiple stages of one adapter each, followed by one transformation each, and at the end optional storage and optional notifications, i.e.: {A T}+ [S] N*.

Since the transformations are Turing complete, modeling multiple transformations after one adapter is redundant and can be simplified to just one. If it turns out to be more convenient to split a data transformation into multiple parts, we can still offer that in the UI and simply concatenate the transformations before passing them to the scheduler.

The AMSE projects revealed that a common use case for the ODS is getting URLs from one source that are then used to fetch data from another. To enable this process we need dynamic adapter configuration. The easiest implementation is to add the parameters for subsequent adapters to the data field so they can be used there.
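
The proposed shape {A T}+ [S] N* could be sketched roughly like this in TypeScript; all names are illustrative, not the actual ODS model:

```typescript
// Hypothetical sketch of the proposed pipeline shape {A T}+ [S] N*:
// one adapter plus one transformation per stage, optional storage,
// zero or more notifications.

interface AdapterConfig { protocol: string; location: string; }
interface TransformationConfig { func: string; }
interface NotificationConfig { type: string; url: string; }

// A stage pairs exactly one adapter with exactly one transformation (A T).
interface Stage {
  adapter: AdapterConfig;
  transformation: TransformationConfig;
}

interface Pipeline {
  stages: Stage[];                      // {A T}+ : at least one stage
  storage?: boolean;                    // [S]    : optional storage
  notifications: NotificationConfig[];  // N*     : zero or more notifications
}

// The one structural constraint the type system cannot express:
// {A T}+ requires at least one stage.
function isValidPipeline(p: Pipeline): boolean {
  return p.stages.length >= 1;
}
```

The optional storage and the notification array fall out of the grammar directly; only the "at least one stage" rule needs a runtime check.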

Integration of Event-Driven Architecture PR

I just went through all the changes and came up with the following consecutive work packages, to keep the PRs as small as possible.

WP A: Events for Notification Trigger

  • Notification service reacts to the trigger event (already implemented in #120)
  • Scheduler sends an event via RabbitMQ to the notification service instead of using the trigger endpoint

WP B: Decouple Storage via Events

  • Integrate storage_mq from #102
  • Scheduler sends event via RabbitMQ to trigger storage_mq

WP C: Trigger Storage and Notification by Transformation

  • Move trigger functionality from scheduler to transformation to trigger notification and storage

WP D: Remove Core

  • Move pipeline config logic from core to transformation
  • Add trigger endpoint to transformation service
  • Remove pipeline config polling from scheduler
  • Use trigger endpoint for pipelines in scheduler instead of sending whole pipeline configurations for execution
    Note: stateless execution interface stays untouched!
    Note: Integration tests can be copied from core to transformation; they should still run through after the URL change

WP E: Trigger Pipelines via Trigger Event

  • Publish event after successful adapter execution in adapter service
  • Listen to adapter events in transformation service to trigger pipelines
  • Remove trigger functionality from scheduler

Up next:

  • rename transformation service to pipeline service
  • event communication between adapter and scheduler

Adapter/Scheduler RabbitMQ

The scheduler's event polling at the adapter service needs to be replaced by RabbitMQ communication.
This entails:

  • Integrating RabbitMQ into the scheduler
  • Triggering data imports via datasource id instead of sending the whole configuration
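
The trigger event could then be as small as the following sketch; the field names and the message shape are assumptions, not the actual ODS wire format:

```typescript
// Hypothetical sketch of the trigger event the scheduler could publish via
// RabbitMQ. Only the datasource id travels over the wire; the adapter service
// looks the full configuration up itself.

interface DatasourceTriggerEvent {
  datasourceId: number;
  triggeredAt: string; // ISO timestamp
}

function buildTriggerMessage(datasourceId: number, now: Date = new Date()): string {
  const event: DatasourceTriggerEvent = {
    datasourceId,
    triggeredAt: now.toISOString(),
  };
  // The resulting string would be published, e.g. with amqplib:
  // channel.publish(exchange, routingKey, Buffer.from(message))
  return JSON.stringify(event);
}
```

Keeping the payload to an id avoids the current problem of stale configurations being sent around, since the adapter always reads the latest state.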

Remove top-level package-lock.json

A leftover /package-lock.json file from earlier days still sits in the root of the repository. It is no longer necessary and should be removed.

Integration tests and rabbitMQ

We need to decide how to perform end-to-end tests using RabbitMQ instead of HTTP calls; suggestions for frameworks etc. are welcome.

UI: Use imported data in transformation stepper

Instead of displaying unrelated sample data, pipeline transformation input data should be preview data from the corresponding adapter (if there is one). The newly added manual adapter trigger could be used for that.

UI: Paginate pipeline data

Currently, all data a pipeline has produced in its lifetime is loaded once the user clicks the "data" button. For long-running pipelines, this is very slow. We need some kind of pagination to limit the amount of data loaded at once.
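
A minimal offset/limit scheme could look like the sketch below; the parameter names and the `Page` shape are assumptions, not the actual ODS API:

```typescript
// Hypothetical pagination sketch: return one slice of the pipeline data
// plus enough metadata for the UI to render paging controls.

interface Page<T> {
  items: T[];
  total: number;  // total number of records, for computing page count
  offset: number;
  limit: number;
}

function paginate<T>(data: T[], offset: number, limit: number): Page<T> {
  return {
    items: data.slice(offset, offset + limit),
    total: data.length,
    offset,
    limit,
  };
}
```

On the server side the slice would of course be a `LIMIT`/`OFFSET` query against storage rather than an in-memory `slice`, so only one page is ever loaded.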

Refactor UI: datasources to own directory

In PR #28 we separated datasources and pipelines in the UI. Since we kept the changes to a minimum, we still need to extract the datasource-related files into a separate directory. This means a lot of import-path fixing and file renaming.

Update README to new UI

The pictures in the howTo show an outdated UI. We need to update the pictures and description to match the current workflow for pipeline creation (Datasource and Pipeline separated).

Transformation: Pass data references

Instead of passing the actual data to the transformation service, the scheduler should pass a reference to the data. The transformation service should then fetch the data, transform it, and pass a reference to the transformed data as a result. We need to decide whether the transformation service needs its own persistence to save its results or whether we want to use some kind of shared solution.
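
The reference-passing idea could be sketched as follows; the in-memory `Map` stands in for whatever persistence is eventually chosen, and all names are hypothetical:

```typescript
// Hypothetical sketch of reference passing: the scheduler hands the
// transformation service only a reference (id). The service fetches the data,
// transforms it, stores the result, and returns a new reference.

type DataRef = string;

const store = new Map<DataRef, unknown>();
let nextId = 0;

// Persist a piece of data and return its reference.
function putData(data: unknown): DataRef {
  const ref = `data-${nextId++}`;
  store.set(ref, data);
  return ref;
}

// Resolve a reference, apply the transformation, and return a reference
// to the result instead of the result itself.
function transformByReference(ref: DataRef, transform: (d: unknown) => unknown): DataRef {
  const data = store.get(ref);
  if (data === undefined) throw new Error(`unknown data reference: ${ref}`);
  return putData(transform(data));
}
```

The key property is that services only ever exchange references, so large payloads never travel through the scheduler.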

Storage: Data references

Analogous to the transformation service, the storage service should receive a reference to the data instead of the actual data. It should then fetch the data via that reference and persist it. Again, we need to decide whether this requires its own persistence or some kind of shared solution.

Adapter: Improve Exception Logging

Currently, exceptions are just being dumped to the console. Instead, we should log a meaningful message including context information.

Cleanup package.json

Over time, a lot of clutter has accumulated in the project's package.json files. Remove everything unnecessary and update the repo URL to GitHub.

System-Test: Service logs are only shown for successful tests

With our current CI configuration, in integration and system tests, the logs of services other than the test itself are only shown when the tests succeed. But we need them precisely when tests fail, to simplify debugging.
This is because after a failing system/integration test, the CI job is abandoned (--exit-code-from flag).
We should think of a way to show logs for failing tests.

Adapter: Get imports and imported data of datasource via API

An import does not contain the data itself, only an id, a timestamp, and a link (URL) to the data.

API suggestion:

  • /datasources/{id}/imports -> map of all imports (id, timestamp, link to data url)
  • /datasources/{id}/imports/{id} -> import (id, timestamp, link to data url)
  • /datasources/{id}/imports/latest -> latest data import
  • /datasources/{id}/imports/{id}/data -> the real data

Implementation:

  • Managing imports and import data should be handled in a sub-package of datasources (not adapters). This probably requires a fair bit of refactoring.
  • /dataImport should return the imported data directly; no caching for this kind of data import
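
The suggested endpoints could be modeled as plain functions before wiring them into HTTP routes; the field names below follow the issue text, everything else is an assumption:

```typescript
// Hypothetical sketch of the suggested imports API, as functions instead of
// HTTP routes. An import carries only id, timestamp, and a link to the data.

interface DataImport {
  id: number;
  timestamp: string;  // ISO timestamp; lexicographic order matches time order
  dataUrl: string;    // link to the real data, served by a separate endpoint
}

const importsByDatasource = new Map<number, DataImport[]>();

// GET /datasources/{id}/imports
function listImports(datasourceId: number): DataImport[] {
  return importsByDatasource.get(datasourceId) ?? [];
}

// GET /datasources/{id}/imports/latest
function latestImport(datasourceId: number): DataImport | undefined {
  return listImports(datasourceId).reduce<DataImport | undefined>(
    (latest, i) => (!latest || i.timestamp > latest.timestamp ? i : latest),
    undefined,
  );
}
```

Because the import metadata is tiny, listing imports stays cheap even for long-running datasources; only the `/data` endpoint touches the real payload.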

Reload UI sites

In #56 we already made the first step towards making the UI reloadable on every page.

Top-level reloading, e.g. on /datasources, works now, but not on datasources/new.
So I guess the issue is somewhere else? Any guesses?

Originally posted by @georg-schwarz in #56 (comment)
