Comments (14)
The main difficulty in supporting collection types is supporting non-frozen types. In Scylla there are two types of collections/UDTs: frozen and non-frozen. When you update a frozen collection, its entire contents after the update are stored in the CDC log. On the other hand, you can partially update non-frozen collections (such as appending items to a list). In the CDC log, only the added/removed elements would be saved in such a case.
We (cc: @haaawk) have decided not to overcomplicate the generated Kafka message to accommodate the different operations on non-frozen collections (appending, removing, overwriting), especially since this is not what the Debezium model expects and most sink connectors would not support it. However, if we implemented support for postimages (#8, which we plan to do), the state of a non-frozen collection/UDT after an update would be known, with the additional requirement that postimages be enabled on the CDC table. That would let us support non-frozen collection types.
(You can read https://docs.scylladb.com/using-scylla/cdc/cdc-advanced-types/ for more info)
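To illustrate why a delta alone is not enough for a non-frozen collection, here is a minimal Python sketch of rebuilding the state of a set column from its previous value plus one delta. The `SetDelta` class and `apply_set_delta` function are hypothetical names made up for this example; they are not part of Scylla, the connector, or scylla-cdc-java, and the real CDC log layout is described in the docs linked above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetDelta:
    """Hypothetical model of one CDC delta for a non-frozen set column."""
    added: frozenset = frozenset()             # elements appended by the update
    deleted_elements: frozenset = frozenset()  # elements removed by the update
    overwritten: bool = False                  # True if the whole collection was replaced


def apply_set_delta(previous: set, delta: SetDelta) -> set:
    """Return the collection state after the update described by `delta`."""
    # An overwrite discards the old contents; a partial update builds on them.
    base = set() if delta.overwritten else set(previous)
    return (base - delta.deleted_elements) | delta.added


state = {1, 2}
state = apply_set_delta(state, SetDelta(added=frozenset({3})))             # append
state = apply_set_delta(state, SetDelta(deleted_elements=frozenset({1})))  # remove
print(sorted(state))
```

A consumer that only sees the delta must keep `previous` somewhere to do this, which is the bookkeeping a postimage avoids. For a frozen collection the delta already carries the full new value, so no such step is needed.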
In the meantime, I have pushed a (very early) implementation of support for frozen collections: #12. To support post-images, we plan to implement a higher-level abstraction in scylla-cdc-java repo, that combines pre-images, delta and post-image rows and parses delta information of non-frozen collection updates.
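As a rough idea of what such an abstraction might do, the sketch below groups CDC log rows that share the same `cdc$time` into one combined change holding the pre-image, the delta rows, and the post-image. This is only an illustration in Python, not the planned scylla-cdc-java design. It assumes rows are plain dicts from a single stream, that `cdc$operation` uses 0 for pre-image and 9 for post-image as described in the Scylla CDC documentation, and that each change touches a single row; `combine_rows` is a hypothetical name.

```python
from itertools import groupby

# Assumed cdc$operation codes (per the Scylla CDC docs): 0 = pre-image, 9 = post-image.
PREIMAGE, POSTIMAGE = 0, 9


def combine_rows(log_rows):
    """Group log rows of one stream by cdc$time into combined change records."""
    ordered = sorted(log_rows, key=lambda r: (r["cdc$time"], r["cdc$batch_seq_no"]))
    changes = []
    for time, batch in groupby(ordered, key=lambda r: r["cdc$time"]):
        change = {"time": time, "preimage": None, "deltas": [], "postimage": None}
        for row in batch:
            op = row["cdc$operation"]
            if op == PREIMAGE:
                change["preimage"] = row
            elif op == POSTIMAGE:
                change["postimage"] = row
            else:
                # Inserts, updates, and deletes are all treated as delta rows here.
                change["deltas"].append(row)
        changes.append(change)
    return changes


rows = [
    {"cdc$time": 1, "cdc$batch_seq_no": 0, "cdc$operation": 0},
    {"cdc$time": 1, "cdc$batch_seq_no": 1, "cdc$operation": 1},
    {"cdc$time": 1, "cdc$batch_seq_no": 2, "cdc$operation": 9},
    {"cdc$time": 2, "cdc$batch_seq_no": 0, "cdc$operation": 2},
]
combined = combine_rows(rows)
```

With post-images enabled, a consumer of `combined` could read the collection's final state from `postimage` directly instead of replaying the delta rows. The integer `cdc$time` values are placeholders; in the real log this column is a timeuuid.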
from scylla-cdc-source-connector.
@avelanarius and @Lorak-mmk are working on support for frozen and non-frozen collections.
I’d be interested in assisting with this if no one else is.
(apologies for issue title rename, wrong browser tab -> please ignore)
Hi @avelanarius is there an ETA for post-image support?
Alternatively could the support of frozen collections #12 be completed and merged any time soon?
To support post-images, we plan to implement a higher-level abstraction in scylla-cdc-java repo, that combines pre-images, delta and post-image rows and parses delta information of non-frozen collection updates.
@avelanarius is this already in the making, are you also looking for contributors?
Are there any dependencies on an upcoming Scylla release? (4.6+/5.0)
@avelanarius @hartmut-co-uk
Can we merge #12 to get support for collection types / UDTs? Is there something blocking us from going ahead with this?
I have done more code changes on my fork last week to accommodate using UDT with Avro, but haven't had time to test them yet.
I'll try to make time this week to progress this further.
hi @hartmut-co-uk @avelanarius
can I create a fork out of #12 and use it? Did you do any testing for this or shall I do it?
If I remember correctly, #12 contains a performance problem. If you want to use a non-merged version, then #21 should be better: it is based on #12, supports non-frozen collections too, and doesn't have the performance problem I mentioned.
track
Hi, is there a plan to merge #21? We could really use this feature.
+1. This is an important feature
+1