jeffersonlab / epics2kafka
Kafka Connect Source Connector for EPICS CA
License: MIT License
The connector reads the command topic for a list of channels to monitor. If that list is empty (it starts empty, or becomes empty after users remove all channels), the connector currently becomes very upset: it complains that there is no work to be done and is unable to divide the empty set of tasks among Connect workers. That may be reasonable behavior, but we could handle it more gracefully. At minimum we could provide more concise error messages, and determine whether we can avoid requiring users to either maintain at least one channel or stop (or pause) the connector. Perhaps we could programmatically pause the connector and resume it once a non-empty list of channels is provided again. The real issue is that when the connector becomes unhappy it moves to a FAILED status. For a hacky workaround, see:
https://rmoff.net/2019/06/06/automatically-restarting-failed-kafka-connect-tasks/
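The workaround in that post can be sketched as a small script against the standard Kafka Connect REST API (`/connectors?expand=status` and the per-task `restart` endpoint are real Connect endpoints; the host/port default is an assumption):

```shell
#!/bin/sh
# Hedged sketch of the "automatically restart FAILED tasks" workaround.
# CONNECT_URL is an assumption about the deployment, not connector config.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# jq filter: emit "connectorName taskId" for every FAILED task
FAILED_FILTER='.[] | .status as $s | $s.tasks[] | select(.state == "FAILED") | "\($s.name) \(.id)"'

curl -s "$CONNECT_URL/connectors?expand=status" \
  | jq -r "$FAILED_FILTER" \
  | while read -r name id; do
      echo "Restarting task $id of connector $name"
      curl -s -X POST "$CONNECT_URL/connectors/$name/tasks/$id/restart"
    done
```

Run periodically (e.g. from cron). The pause/resume idea mentioned above would use the same style of call: PUT to /connectors/&lt;name&gt;/pause and /connectors/&lt;name&gt;/resume.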
If I try to add a large set of channels (25,500), the connector runs (status RUNNING), but the tasks array stays empty (no tasks are started), and the log file shows no errors. One workaround is the config-from-file patch branch, which writes/reads the configuration from a file; using a smaller config also works.
See:
https://stackoverflow.com/questions/72296071/kafka-connect-tasks-empty-with-large-config
The SnapShotConsumer class could save a few lines of code by using the kafka-common EventSourceTable.
Currently outkey is an optional field in the value portion of the command topic message. We should consider moving it to the key portion. It might be able to remain optional, but optional fields in a key need some investigation: an empty field is not equal to the channel name, which is the logical default, though that may not matter. The benefit would be that the same PV could map to multiple unique keys in a given output topic. The inverse is already possible: multiple PVs can map to the same key in the output topic.
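For concreteness, the current versus proposed layouts might look like this (the field names and mask value here are illustrative, reconstructed from this issue rather than a confirmed schema):

```
# Today: outkey rides in the value
key:   {"topic": "channels", "channel": "iocA:temperature"}
value: {"mask": "va", "outkey": "tempA"}

# Proposed: outkey moves into the key, so one PV can be registered
# under multiple unique output keys in the same topic
key:   {"topic": "channels", "channel": "iocA:temperature", "outkey": "tempA"}
value: {"mask": "va"}
```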
Currently we place the EPICS monitor event timestamp in the Kafka message payload. This works. However, should we be using the built-in timestamp carried in the Kafka message itself, to avoid an unnecessary additional field in each message? See:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message
It looks like even with the default "CreateTime" timestamp type, the timestamp may still have implications for topic compaction and partition retention. More investigation is needed.
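For reference, the timestamp type is a real per-topic Kafka config, and brokers can reject records whose CreateTime timestamp is too far from the broker clock, which is exactly the kind of retention/compaction interaction that would bite old EPICS monitor timestamps:

```
# Per-topic settings (shown with their stock Kafka names):
# CreateTime    = keep the producer-supplied timestamp (could carry the EPICS time)
# LogAppendTime = broker overwrites the timestamp with its own clock on append
message.timestamp.type=CreateTime

# With CreateTime, brokers may reject messages whose timestamp differs
# from the broker clock by more than this many milliseconds:
message.timestamp.difference.max.ms=9223372036854775807
```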
Currently, common errors cause the Connector task to stop, requiring the bad PV to be fixed and the task manually restarted. What we need is for the connector to log errors (possibly to a Kafka topic, otherwise to the log file) but keep working.
One example that causes the Connector task to stop is configuring a PV that doesn't exist or otherwise can't be reached. Doing so results in:
ERROR Error while trying to create CAJContext (org.jlab.kafka.connect.CASourceTask:218)
ERROR WorkerSourceTask{id=ca-source-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:187)
org.apache.kafka.connect.errors.ConnectException: gov.aps.jca.TimeoutException: pendIO timed out
I am trying to deal with large arrays, but the maximum array size that can be accessed through a channel is limited to the JCA default of 16384 bytes. Would it be possible to expose a max_array_bytes property in the ca-source.json file to make the JCA default easy to change, or is there another recommended approach?
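Until such a connector property exists, the limit can typically be raised through the standard EPICS/CAJ knobs; the connector-level key at the end is the proposed (hypothetical) addition, not something the code supports today:

```
# Existing knobs (real EPICS / CAJ names):
#   environment variable honored by Channel Access clients
EPICS_CA_MAX_ARRAY_BYTES=10000000
#   or the CAJ JVM system property, e.g. appended to KAFKA_OPTS:
#   -Dcom.cosylab.epics.caj.CAJContext.max_array_bytes=10000000

# Proposed (hypothetical) connector config key in ca-source.json:
#   "monitor.max.array.bytes": "10000000"
```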
We may want to figure out how to limit the distribution plugin to package only the dependency jars that are actually needed:
All the other dependencies are "provided" (already present in the Kafka libs directory). Shipping them doesn't appear to cause an issue, but it is unnecessary and could potentially cause problems.
monitor.addr.list cannot be modified: I changed softioc to the real IP address of the IOC, but the connection could not be established. Is its value a fixed string, or an array of strings? Tested against both Base 7.0.5 and Base 3.15.8.
Since moving to the bitnami container, some scripts throw warnings about missing links to logger jars.
Currently the connector will fail if startup takes too long. The limit is hard-coded to 10 poll attempts without reaching the command topic high water mark:
epics2kafka/src/main/java/org/jlab/kafka/connect/ChannelManager.java
Lines 114 to 117 in 8318e95
This needs to be configurable: Kafka may limit batch polls to 1,000 messages, for example, which means 25,000 commands (25,000 EPICS PVs) would require 25 polls.
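A hypothetical connector config exposing the limit might look like this (the property name and the connector class name are invented here for illustration; only CASourceTask appears in the logs above, so the connector class is a guess):

```
{
  "name": "ca-source",
  "config": {
    "connector.class": "org.jlab.kafka.connect.CASourceConnector",
    "command.max.poll.attempts": "25"
  }
}
```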
The new bitnami container used by the quick start does not honor the CREATE_TOPICS env var, so example topics are auto-created with a warning. An entrypoint script should probably restore this now-missing capability to avoid the auto-create warnings.
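An entrypoint fragment restoring CREATE_TOPICS support could be sketched as follows. The "name:partitions:replicas" format is an assumption (it matches the wurstmeister-style convention), and here the kafka-topics.sh commands are only echoed so the sketch is runnable anywhere; a real entrypoint would execute them against the broker:

```shell
#!/bin/sh
# Hedged sketch: recreate CREATE_TOPICS handling in an entrypoint script.
# Assumed env format: "name1:partitions:replicas,name2:partitions:replicas,..."
create_topic_cmds() {
  topics="$1"
  bootstrap="${2:-localhost:9092}"
  echo "$topics" | tr ',' '\n' | while IFS=':' read -r name parts reps; do
    # --create --if-not-exists avoids failing on topics that already exist
    echo "kafka-topics.sh --bootstrap-server $bootstrap --create --if-not-exists" \
         "--topic $name --partitions $parts --replication-factor $reps"
  done
}

create_topic_cmds "channels:1:1,commands:1:1"
```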
If I have PVs on multiple IOCs that need to be written to Kafka, how should the monitor.addr.list property be set? Spaces between multiple IP addresses? That doesn't seem to be supported.
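For reference, the underlying EPICS_CA_ADDR_LIST that Channel Access clients use is a space-separated list of addresses, so if monitor.addr.list is passed straight through to JCA, a space-separated string would be the expected form (unverified for this connector; addresses below are placeholders):

```
"monitor.addr.list": "129.57.255.4 129.57.255.7 192.168.1.10"
```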
The Docker image should include the JLab certificate, to avoid having to embed the cert in the jaws-epics2kafka image:
https://github.com/JeffersonLab/jaws-effective-processor/blob/main/Dockerfile