jeffersonlab / epics2kafka
Kafka Connect Source Connector for EPICS CA
License: MIT License
The connector reads the command topic for a list of channels to monitor. If that list is empty (it starts empty, or becomes empty after users remove all channels), the connector currently becomes very upset: it complains that there is no work to be done and is unable to divide the empty set of tasks among Connect workers. That may be reasonable behavior, but we could handle it more gracefully. At minimum we could provide more concise error messages, and determine whether we can avoid requiring users to either maintain at least one channel or stop (or pause) the connector. Perhaps we could programmatically pause the connector and resume it once a non-empty list of channels is provided again. The real issue is that when the connector becomes unhappy it moves to a FAILED status. For a hacky workaround, see:
https://rmoff.net/2019/06/06/automatically-restarting-failed-kafka-connect-tasks/
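The workaround in that post can be sketched as a small script against the standard Kafka Connect REST API (`/connectors?expand=status` and the per-task `restart` endpoint are real Connect endpoints; the host/port default is an assumption):

```shell
#!/bin/sh
# Hedged sketch of the "automatically restart FAILED tasks" workaround.
# CONNECT_URL is an assumption about the deployment, not connector config.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# jq filter: emit "connectorName taskId" for every FAILED task
FAILED_FILTER='.[] | .status as $s | $s.tasks[] | select(.state == "FAILED") | "\($s.name) \(.id)"'

curl -s "$CONNECT_URL/connectors?expand=status" \
  | jq -r "$FAILED_FILTER" \
  | while read -r name id; do
      echo "Restarting task $id of connector $name"
      curl -s -X POST "$CONNECT_URL/connectors/$name/tasks/$id/restart"
    done
```

Run periodically (e.g. from cron). The pause/resume idea mentioned above would use the same style of call: PUT to /connectors/&lt;name&gt;/pause and /connectors/&lt;name&gt;/resume.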
If I try to add a large set of channels (25,500), the connector runs (status RUNNING), but the tasks array stays empty (no tasks are started), and the log file shows no errors. One workaround is the config-from-file patch branch, which writes/reads the configuration from a file; using a smaller config also works.
See:
https://stackoverflow.com/questions/72296071/kafka-connect-tasks-empty-with-large-config
The SnapShotConsumer class could save a few lines of code by using the kafka-common EventSourceTable.
Currently outkey is an optional field in the value portion of the command topic message. We should consider moving it to the key portion. It might be able to remain optional, but optional fields in a key need some investigation: an empty field is not equal to the channel name, which is the logical default, though that may not matter. The benefit would be that the same PV could map to multiple unique keys in a given output topic. The inverse is already possible: multiple PVs can map to the same key in the output topic.
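For concreteness, the current versus proposed layouts might look like this (the field names and mask value here are illustrative, reconstructed from this issue rather than a confirmed schema):

```
# Today: outkey rides in the value
key:   {"topic": "channels", "channel": "iocA:temperature"}
value: {"mask": "va", "outkey": "tempA"}

# Proposed: outkey moves into the key, so one PV can be registered
# under multiple unique output keys in the same topic
key:   {"topic": "channels", "channel": "iocA:temperature", "outkey": "tempA"}
value: {"mask": "va"}
```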
Currently we place the EPICS monitor event timestamp in the Kafka message payload. This works. However, should we be using the built-in timestamp carried in the Kafka message itself, to avoid an unnecessary additional field in each message? See:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message
It looks like even with the default "CreateTime" timestamp type, the timestamp may still have implications for topic compaction and partition retention. More investigation is needed.
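For reference, the timestamp type is a real per-topic Kafka config, and brokers can reject records whose CreateTime timestamp is too far from the broker clock, which is exactly the kind of retention/compaction interaction that would bite old EPICS monitor timestamps:

```
# Per-topic settings (shown with their stock Kafka names):
# CreateTime    = keep the producer-supplied timestamp (could carry the EPICS time)
# LogAppendTime = broker overwrites the timestamp with its own clock on append
message.timestamp.type=CreateTime

# With CreateTime, brokers may reject messages whose timestamp differs
# from the broker clock by more than this many milliseconds:
message.timestamp.difference.max.ms=9223372036854775807
```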
Currently, common errors cause the Connector task to stop, requiring the bad PV to be fixed and the task manually restarted. What we need is for the connector to log errors (possibly to a Kafka topic, otherwise to the log file) but keep working.
One example that causes the Connector task to stop is configuring a PV that doesn't exist or otherwise can't be reached. Doing so results in:
ERROR Error while trying to create CAJContext (org.jlab.kafka.connect.CASourceTask:218)
ERROR WorkerSourceTask{id=ca-source-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:187)
org.apache.kafka.connect.errors.ConnectException: gov.aps.jca.TimeoutException: pendIO timed out
I am trying to deal with large arrays, but the maximum array size that can be accessed through a channel is limited to the JCA default of 16384 bytes. Would it be possible to expose a max_array_bytes property in the ca-source.json file to make the JCA default easy to change, or is there another recommended approach?
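Until such a connector property exists, the limit can typically be raised through the standard EPICS/CAJ knobs; the connector-level key at the end is the proposed (hypothetical) addition, not something the code supports today:

```
# Existing knobs (real EPICS / CAJ names):
#   environment variable honored by Channel Access clients
EPICS_CA_MAX_ARRAY_BYTES=10000000
#   or the CAJ JVM system property, e.g. appended to KAFKA_OPTS:
#   -Dcom.cosylab.epics.caj.CAJContext.max_array_bytes=10000000

# Proposed (hypothetical) connector config key in ca-source.json:
#   "monitor.max.array.bytes": "10000000"
```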
We may want to figure out how to limit the distribution plugin to package only the dependency jars that are actually needed:
All the other dependencies are "provided" (already present in the Kafka libs directory). Shipping them doesn't appear to cause an issue, but it is unnecessary and could potentially cause problems.
monitor.addr.list cannot be modified: I changed softioc to the real IP address of the IOC, but the connection could not be established. Is its value a fixed string, or an array of strings? Tested against both Base 7.0.5 and Base 3.15.8.
Since moving to the bitnami container, some scripts throw warnings about missing links to logger jars.
Currently the connector will fail if startup takes too long. The limit is hard-coded to 10 poll attempts without reaching the command topic high water mark:
epics2kafka/src/main/java/org/jlab/kafka/connect/ChannelManager.java
Lines 114 to 117 in 8318e95
This needs to be configurable: Kafka may limit batch polls to 1,000 messages, for example, which means 25,000 commands (25,000 EPICS PVs) would require 25 polls.
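A hypothetical connector config exposing the limit might look like this (the property name and the connector class name are invented here for illustration; only CASourceTask appears in the logs above, so the connector class is a guess):

```
{
  "name": "ca-source",
  "config": {
    "connector.class": "org.jlab.kafka.connect.CASourceConnector",
    "command.max.poll.attempts": "25"
  }
}
```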
The new bitnami container used by the quick start does not honor the CREATE_TOPICS env var, so example topics are auto-created with a warning. An entrypoint script should probably restore this now-missing capability to avoid the auto-create warnings.
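An entrypoint fragment restoring CREATE_TOPICS support could be sketched as follows. The "name:partitions:replicas" format is an assumption (it matches the wurstmeister-style convention), and here the kafka-topics.sh commands are only echoed so the sketch is runnable anywhere; a real entrypoint would execute them against the broker:

```shell
#!/bin/sh
# Hedged sketch: recreate CREATE_TOPICS handling in an entrypoint script.
# Assumed env format: "name1:partitions:replicas,name2:partitions:replicas,..."
create_topic_cmds() {
  topics="$1"
  bootstrap="${2:-localhost:9092}"
  echo "$topics" | tr ',' '\n' | while IFS=':' read -r name parts reps; do
    # --create --if-not-exists avoids failing on topics that already exist
    echo "kafka-topics.sh --bootstrap-server $bootstrap --create --if-not-exists" \
         "--topic $name --partitions $parts --replication-factor $reps"
  done
}

create_topic_cmds "channels:1:1,commands:1:1"
```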
If I have PVs on multiple IOCs that need to be written to Kafka, how should the monitor.addr.list property be set? Spaces between multiple IP addresses? That doesn't seem to be supported.
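For reference, the underlying EPICS_CA_ADDR_LIST that Channel Access clients use is a space-separated list of addresses, so if monitor.addr.list is passed straight through to JCA, a space-separated string would be the expected form (unverified for this connector; addresses below are placeholders):

```
"monitor.addr.list": "129.57.255.4 129.57.255.7 192.168.1.10"
```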
The Docker image should include the JLab certificate, to avoid having to embed the cert in the jaws-epics2kafka image:
https://github.com/JeffersonLab/jaws-effective-processor/blob/main/Dockerfile