jeffersonlab / epics2kafka

Kafka Connect Source Connector for EPICS CA

License: MIT License

Java 86.93% Dockerfile 1.76% Shell 11.31%
ace epics jaws kafka kafka-connect

epics2kafka's Issues

Investigate ways to handle no channels configured scenario

The connector reads the command topic for a list of channels to monitor. If the list is empty (it starts empty, or becomes empty after users remove all channels), the connector currently fails, complaining that there is no work to be done, because it cannot divide an empty set of tasks among Connect Workers. This might be reasonable behavior, but we could probably handle it more gracefully. Perhaps we can provide more concise error messages and find a way to avoid requiring users to either maintain at least one channel or stop (or pause) the connector. Could we programmatically pause the connector and resume it once a non-empty list of channels is provided again? The real issue is that when this happens the connector moves to a FAILED status. For a hacky solution, see:

https://rmoff.net/2019/06/06/automatically-restarting-failed-kafka-connect-tasks/
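The pause-and-resume idea raised above could be sketched roughly as follows, assuming an external supervisor process (not the connector itself) talks to the Kafka Connect REST API, which exposes `PUT /connectors/{name}/pause` and `PUT /connectors/{name}/resume`. The class name `ConnectorPauser`, the base URL, and the connector name are hypothetical, and the sketch only builds the requests; sending them and dealing with a connector that is already FAILED is left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch: builds Kafka Connect REST requests to pause or resume a connector. */
final class ConnectorPauser {
    private final String baseUrl;       // assumed Connect REST address, e.g. http://localhost:8083
    private final String connectorName; // assumed connector name, e.g. ca-source

    ConnectorPauser(String baseUrl, String connectorName) {
        this.baseUrl = baseUrl;
        this.connectorName = connectorName;
    }

    HttpRequest pauseRequest() {
        return put("pause");
    }

    HttpRequest resumeRequest() {
        return put("resume");
    }

    /** Picks pause when no channels are configured, resume otherwise. */
    HttpRequest requestFor(int channelCount) {
        return channelCount == 0 ? pauseRequest() : resumeRequest();
    }

    private HttpRequest put(String action) {
        // Both endpoints take a PUT with an empty body.
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/connectors/" + connectorName + "/" + action))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

A supervisor could count the channels currently present in the command topic and send `requestFor(count)` with `java.net.http.HttpClient` whenever the count changes between zero and non-zero.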

Embedded IOC Doesn't Support Dynamic UDP Port

epics-base/jca#62

  • Might need to create my own fork of JCA or something. Right now, if something is already running on the standard EPICS ports (like the compose project) and you try to run the build, it will fail because the embedded IOC test cannot bind to the fixed ports (already in use).

Consider moving outkey field to command message key

Currently outkey is an optional field in the value portion of the command topic message (key=value). We should consider moving it to the key portion. It might be possible for it to remain optional, but we need to dig a little into optional fields in a key: this could be problematic because an empty field does not equal the channel name, which is the logical default (maybe it doesn't matter). The benefit would be that the same PV could map to multiple unique keys in a given output topic. You can already do the inverse, where multiple PVs map to the same key in the output topic.

Investigate whether to use message header timestamp vs payload timestamp

Currently we place the EPICS monitor event timestamp in the Kafka message payload. This works. However, should we be using the built-in timestamp found in the Kafka message header to avoid an unnecessary additional field in each message? See:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message

Looks like even if you choose the native "CreateTime" timestamp type, the timestamp still may have some implications on topic compaction / partitioning. More investigation is needed.
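If the built-in timestamp were used, the EPICS time would first have to be converted, since Kafka record timestamps are milliseconds since the Unix epoch while EPICS timestamps count seconds and nanoseconds from 1990-01-01 00:00:00 UTC (631152000 seconds later). A minimal sketch of that conversion follows; the class and method names are hypothetical, and the two arguments are assumed to come from the monitor event's timestamp. Sub-millisecond precision is lost, which may be one reason to keep the payload field.

```java
/** Sketch: converts an EPICS timestamp to Kafka-style Unix epoch milliseconds. */
final class EpicsTime {
    /** Seconds between the Unix epoch (1970-01-01) and the EPICS epoch (1990-01-01), both UTC. */
    static final long EPICS_EPOCH_OFFSET_SECONDS = 631_152_000L;

    /**
     * @param secPastEpicsEpoch whole seconds since the EPICS epoch
     * @param nanos             nanoseconds within that second
     * @return milliseconds since the Unix epoch (nanoseconds truncated to milliseconds)
     */
    static long toUnixMillis(long secPastEpicsEpoch, long nanos) {
        return (secPastEpicsEpoch + EPICS_EPOCH_OFFSET_SECONDS) * 1000L + nanos / 1_000_000L;
    }
}
```

The resulting value could then be supplied as the `timestamp` argument of the Kafka Connect `SourceRecord` constructor that accepts one, instead of (or in addition to) the payload field.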

Common Errors Stop Connector Task

Currently, common errors result in the Connector Task stopping (requiring the bad PV to be fixed, followed by a manual restart). What we need is for the connector to log errors (possibly to a Kafka topic, otherwise to a log file) but keep working.

An example that causes the Connector task to stop is configuring a PV that doesn't exist or otherwise can't be reached. Doing so results in:

ERROR Error while trying to create CAJContext (org.jlab.kafka.connect.CASourceTask:218)
ERROR WorkerSourceTask{id=ca-source-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:187)
org.apache.kafka.connect.errors.ConnectException: gov.aps.jca.TimeoutException: pendIO timed out
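The "log errors but keep working" behavior could look something like the sketch below. It assumes each PV can be connected independently, which may not match how the task currently creates its CAJContext; the `Opener` interface is a hypothetical stand-in for the real connection code, and `openAll` is an illustrative name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: open each PV independently so one bad PV cannot stop the rest. */
final class TolerantChannelOpener {

    /** Hypothetical stand-in for the code that connects a single PV. */
    interface Opener {
        void open(String pv) throws Exception;
    }

    /**
     * Tries every PV and returns the failures (PV name -> error message)
     * instead of rethrowing, so the caller can log them and keep working.
     */
    static Map<String, String> openAll(List<String> pvs, Opener opener) {
        Map<String, String> failures = new LinkedHashMap<>();
        for (String pv : pvs) {
            try {
                opener.open(pv);
            } catch (Exception e) {
                // Record the problem; a real task might also write it to
                // the log file or publish it to an error topic.
                failures.put(pv, String.valueOf(e.getMessage()));
            }
        }
        return failures;
    }
}
```

The caller would report the returned map and continue monitoring the PVs that did connect, rather than letting the exception reach the Connect worker and kill the task.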

Problems with transferring large arrays

I am trying to transfer large arrays, but the maximum array size that can be accessed through a channel is limited to the default value of 16384 bytes. Is it possible to provide a max_array_bytes property in the ca-source.json file to make it easier to change the JCA default value, or is there another approach you would recommend?

Cleanup distribution

We may want to figure out how to limit the distribution plugin to only package the dependency jars:

  • epics2kafka.jar
  • jca-2.4.6.jar
  • kafka-common.jar

All the other dependencies are "provided" (already in the Kafka libs directory). Packaging them doesn't appear to cause an issue, but it is unnecessary and could possibly become a problem.

monitor.addr.list cannot be modified

monitor.addr.list cannot be modified. I changed softioc to the real IP address of the IOC, but the connection cannot be established. Is its value a fixed string or an array of strings? Base 7.0.5 and Base 3.15.8 were each used in testing.

Startup Timeout needs to be configurable

Currently the connector will fail if it takes too long to start up. The limit is hard-coded to 10 poll attempts without reaching the command topic high water mark:

if(++tries > 10) {
    // We only poll a few times before saying enough is enough.
    throw new RuntimeException("Took too long to obtain initial list of channels");
}

This needs to be configurable because Kafka may limit batch polls to, for example, 1000 messages, which would mean 25,000 commands (25,000 EPICS PVs) would take 25 polls.
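One way the limit could be made configurable is sketched below. The property name `startup.max.polls` is hypothetical (it is not an existing connector option); the default of 10 mirrors the current hard-coded value, and `pollsNeeded` simply reproduces the arithmetic above.

```java
import java.util.Map;

/** Sketch: reads the startup poll limit from connector properties instead of hard-coding it. */
final class StartupLimit {
    /** Hypothetical property name; not an existing epics2kafka option. */
    static final String CONFIG_KEY = "startup.max.polls";
    /** Matches the currently hard-coded limit. */
    static final int DEFAULT_MAX_POLLS = 10;

    /** Returns the configured limit, or the default when the property is absent or blank. */
    static int maxPolls(Map<String, String> props) {
        String raw = props.get(CONFIG_KEY);
        if (raw == null || raw.isBlank()) {
            return DEFAULT_MAX_POLLS;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 1) {
            throw new IllegalArgumentException(CONFIG_KEY + " must be at least 1");
        }
        return value;
    }

    /** Polls required to read all commands at a given batch size (rounded up). */
    static int pollsNeeded(int commands, int batchSize) {
        return (commands + batchSize - 1) / batchSize;
    }

    /** Same check as the existing loop, but against the configured limit. */
    static void checkTries(int tries, int maxPolls) {
        if (tries > maxPolls) {
            throw new RuntimeException("Took too long to obtain initial list of channels");
        }
    }
}
```

With 25,000 commands and a batch size of 1000, `pollsNeeded` gives 25, so the limit would need to be raised above the default of 10 for that deployment.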

Quick start examples should not rely on auto-created topics

The new Bitnami container used by the quick start does not honor the CREATE_TOPICS environment variable, so the example topics are auto-created with a warning. An entrypoint script should probably provide this now-missing capability to avoid the auto-create warnings.

How to fill in the IP addresses of multiple IOCs

If I have PVs on multiple IOCs that need to be written to Kafka, how should the monitor.addr.list property be filled in? With spaces between the IP addresses? That doesn't seem to be supported.
