Comments (2)
According to https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll, when a BQ write partially fails, the response contains a list of "insertErrors", each of which carries the index of a failed row. (I think "index" here means the position in the list of rows sent in the insert request, rather than a row number in the BQ table.) So our retry logic can always filter out the rows that succeeded and resend only the failed ones, which eliminates duplication.
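To make the idea concrete, here is a minimal sketch of that filtering step. It works on the plain JSON shape of the insertAll response ("insertErrors" entries with an "index" field, as described in the linked reference) rather than on any client library. The function name `rows_to_retry`, the sample rows, and the error message are made up for illustration; this is not how the connector currently does it.

```python
from typing import Any, Dict, List


def rows_to_retry(rows: List[Dict[str, Any]], response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return only the rows that the insertAll response reported as failed.

    Assumes each entry in response["insertErrors"] has an "index" that is the
    position of the row in the request's row list.
    """
    # "insertErrors" is treated as empty when the key is missing,
    # i.e. when every row in the request was accepted.
    failed_indexes = {err["index"] for err in response.get("insertErrors", [])}
    return [row for i, row in enumerate(rows) if i in failed_indexes]


# Hypothetical request rows and a hypothetical partial-failure response.
rows = [{"json": {"id": 1}}, {"json": {"id": 2}}, {"json": {"id": 3}}]
response = {
    "kind": "bigquery#tableDataInsertAllResponse",
    "insertErrors": [
        {"index": 1, "errors": [{"reason": "invalid", "message": "hypothetical error"}]}
    ],
}

print(rows_to_retry(rows, response))  # [{'json': {'id': 2}}]
```

With this, a retry would resend only the second row, so the rows that already succeeded are not written again.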
from kafka-connect-bigquery.
Another interesting source, provided by @criccomini, on how BQ handles deduplication internally:
https://cloud.google.com/blog/big-data/2017/06/life-of-a-bigquery-streaming-insert
Related Issues (20)
- Timestamp conversion issue for big query connector
- Support for exactly-once
- Timestamp Partition For Date Outside Range
- GCSToBQLoadRunnable doesn't respect GCS folder
- Allow disabling of best effort de-duplication
- BigQueryConnectException should log offsets for failed rows
- de-duplicate the data from Kafka into Bigquery on the fly
- Replace Docker-based integration tests with embedded integration tests
- MicroTimestampConverter drops leading zero when calculating the microRemainder
- Failed to update table schema
- Too strong config validation for clustering option
- Support for renaming fields
- debezium.time.Time Integer cast failure.
- Topicname as tablename but kafka meta data does not show originating topic names in 2.0
- Error getting access token for service account: oauth2.googleapis.com
- Convert io.debezium.time.MicroTimestamp to DATETIME
- Ignore additional field in kafka record if not present in BQ table
- Resource usage limits via configuration - Tasks crashing due to high load
- GCSToBQLoadRunnable does not detect error during load and removes blobs even though they were not loaded
- Not able to send kafka json messages to bigquery