
apache / flink-cdc


Flink CDC is a streaming data integration tool

Home Page: https://nightlies.apache.org/flink/flink-cdc-docs-stable

License: Apache License 2.0

Java 99.32% JavaScript 0.29% Dockerfile 0.04% Shell 0.13% C 0.16% PLSQL 0.06%
batch cdc change-data-capture data-integration data-pipeline distributed elt etl flink kafka mysql paimon postgresql real-time schema-evolution

flink-cdc's Introduction

Flink CDC


Flink CDC is a distributed data integration tool for real-time and batch data. It brings simplicity and elegance to data integration by using YAML to describe the data movement and transformation of a data pipeline.

Flink CDC prioritizes efficient end-to-end data integration and offers enhanced functionality such as full-database synchronization, sharded-table synchronization, schema evolution, and data transformation.

Flink CDC framework design

Getting Started

  1. Prepare an Apache Flink cluster and set the FLINK_HOME environment variable.
  2. Download the Flink CDC tar, unzip it, and put the pipeline connector jars into the Flink lib directory.
  3. Create a YAML file to describe the data source and data sink; the following example synchronizes all tables under the MySQL app_db database to Doris:
  source:
     type: mysql
     name: MySQL Source
     hostname: 127.0.0.1
     port: 3306
     username: admin
     password: pass
     tables: app_db.\.*
     server-id: 5401-5404
  
  sink:
    type: doris
    name: Doris Sink
    fenodes: 127.0.0.1:8030
    username: root
    password: pass
  
  pipeline:
     name: MySQL to Doris Pipeline
     parallelism: 4
  4. Submit the pipeline job using the flink-cdc.sh script.
 bash bin/flink-cdc.sh /path/mysql-to-doris.yaml
  5. View the job execution status through the Flink Web UI or the downstream database.

Try it out yourself with our more detailed tutorial. You can also see the connector overview for a comprehensive catalog of the connectors currently provided and their detailed configuration options.

Join the Community

There are many ways to participate in the Apache Flink CDC community. The mailing lists are the primary place where all Flink committers are present. For user support and questions, use the user mailing list. If you've found a problem with Flink CDC, please create a Flink JIRA issue and tag it with the Flink CDC tag.
Bugs and feature requests can be discussed on the dev mailing list or on JIRA.

Contributing

Contributions to Flink CDC are welcome; please see our Developer Guide and APIs Guide.

License

Apache 2.0 License.

Special Thanks

The Flink CDC community welcomes everyone who is willing to contribute, whether by submitting bug reports, improving the documentation, or contributing code for bug fixes, additional tests, or new features.
Thanks to all contributors for their enthusiastic contributions.

flink-cdc's People

Contributors

aiwenmo, amber1990zhang, ashulin, banmoy, cleverdada, e-mhui, fsk119, fuyun2024, gong, goodboy008, gtk96, jiabao-sun, joycurry30, leonardbang, loserwang1024, luoyuxia, lvyanquan, minchowang, molsionmo, patrickren, paul8263, ruanhang1993, shawn-hx, snuyanzin, teckick, whhe, wuchong, yuxiqian, zhaomin1423, zhongqishang


flink-cdc's Issues

Synchronization fails on MySQL 8.0.* when the database uses the caching_sha2_password authentication plugin

Is the caching_sha2_password authentication plugin not supported?

Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1719)
at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:74)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1699)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1681)
at org.datacloud.flinksql.cdc.FlinkMysqlCDC.main(FlinkMysqlCDC.java:28)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
at org.apache.flink.client.program.PerJobMiniClusterFactory$PerJobMiniClusterJobClient.lambda$getJobExecutionResult$2(PerJobMiniClusterFactory.java:186)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:229)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:892)
at akka.dispatch.OnComplete.internal(Future.scala:264)
at akka.dispatch.OnComplete.internal(Future.scala:261)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116)
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78)
at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192)
at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185)
at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179)
at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503)
at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:386)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
... 4 more
Caused by: org.apache.kafka.connect.errors.ConnectException: Failed to authenticate to the MySQL database at dcmaster02:3306 with user 'root'
at io.debezium.connector.mysql.BinlogReader.doStart(BinlogReader.java:441)
at io.debezium.connector.mysql.AbstractReader.start(AbstractReader.java:116)
at io.debezium.connector.mysql.ChainedReader.startNextReader(ChainedReader.java:206)
at io.debezium.connector.mysql.ChainedReader.readerCompletedPolling(ChainedReader.java:158)
at io.debezium.connector.mysql.AbstractReader.cleanupResources(AbstractReader.java:309)
at io.debezium.connector.mysql.AbstractReader.poll(AbstractReader.java:288)
at io.debezium.connector.mysql.ChainedReader.poll(ChainedReader.java:146)
at io.debezium.connector.mysql.MySqlConnectorTask.doPoll(MySqlConnectorTask.java:443)
at io.debezium.connector.common.BaseSourceTask.poll(BaseSourceTask.java:131)
at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:779)
at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:170)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.github.shyiko.mysql.binlog.network.AuthenticationException: Client does not support authentication protocol requested by server; consider upgrading MySQL client
at com.github.shyiko.mysql.binlog.BinaryLogClient.authenticate(BinaryLogClient.java:728)
at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:515)
at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:860)
... 1 more

How does Flink CDC save state?

How does Flink store all of this data when there is a lot of it?
Is everything kept in memory from start to finish?
If part of it is persisted to the state backend, how is it read back the next time it is needed?
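
As background (a minimal sketch, not part of the original question): Flink CDC's Debezium-based sources keep their binlog offsets, like any other operator state, in Flink state, which is snapshotted to the configured state backend on every checkpoint rather than held entirely in memory. The class name and checkpoint path below are placeholders, assuming a Flink 1.11-era DataStream job like the ones elsewhere in these issues:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedCdcJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source offsets and downstream operator state are snapshotted on every
        // checkpoint instead of living only on the heap.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        // Keep working state in RocksDB and write (incremental) checkpoints to this
        // URI; on recovery the state is read back from here.
        env.setStateBackend(new RocksDBStateBackend("file:///tmp/flink-checkpoints", true));

        // ... add the CDC source and sinks here, then call env.execute(...)
    }
}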

Received DML 'xxx' for processing, binlog probably contains events generated with statement or mixed based replication format

The job starts up normally, but when inserting or updating data it reports the following exception:

2020-09-30 10:46:37.607 [debezium-engine] ERROR com.alibaba.ververica.cdc.debezium.DebeziumSourceFunction  - Reporting error:
org.apache.kafka.connect.errors.ConnectException: Received DML 'update orders set product_id = 1122 where order_number = 10001' for processing, binlog probably contains events generated with statement or mixed based replication format
at io.debezium.connector.mysql.AbstractReader.wrap(AbstractReader.java:230)
at io.debezium.connector.mysql.AbstractReader.failed(AbstractReader.java:207)
at io.debezium.connector.mysql.BinlogReader.handleEvent(BinlogReader.java:600)
at com.github.shyiko.mysql.binlog.BinaryLogClient.notifyEventListeners(BinaryLogClient.java:1130)
at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:978)
at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:581)
at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:860)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.ConnectException: Received DML 'update orders set product_id = 1122 where order_number = 10001' for processing, binlog probably contains events generated with statement or mixed based replication format
at io.debezium.connector.mysql.BinlogReader.handleQueryEvent(BinlogReader.java:785)
at io.debezium.connector.mysql.BinlogReader.handleEvent(BinlogReader.java:583)
... 5 common frames omitted

flink-cdc sql-client

Hello 云邪, I would like to use the sql-client with flink-cdc to aggregate several of our upstream tables using SQL only and write the result to a downstream MySQL database. In my tests, however, I found that even when checkpoints are saved, the sql-client cannot be restarted from a specified checkpoint. If my GROUP BY job then has to be restarted for some other reason, could that corrupt the data? Is there a good solution for this situation?

How can exactly-once reading of data be achieved?

#24 My question is similar to this issue: when actually running mysql-cdc, I find that every restart reads the full table data again, even though I have already set the following in my code:
properties.setProperty("debezium.snapshot.mode", "never"); // schema_only behaves the same way
The full code is as follows:
import com.alibaba.ververica.cdc.connectors.mysql.MySQLSource;
import com.alibaba.ververica.cdc.debezium.DebeziumSourceFunction;
import com.alibaba.ververica.cdc.debezium.StringDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Properties;

public class MySqlBinlogSourceExample {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        // properties.setProperty("snapshot.new.tables", "parallel");
        // properties.setProperty("offset.flush.interval.ms", "0");
        // properties.setProperty("debezium.snapshot.mode", "schema_only");
        properties.setProperty("debezium.snapshot.mode", "never");

        DebeziumSourceFunction<String> sourceFunction = MySQLSource.<String>builder()
                .debeziumProperties(properties)
                .hostname("localhost")
                .port(3306)
                .databaseList("sensor_offset")
                .username("root")
                .password("123")
                .deserializer(new StringDebeziumDeserializationSchema()) // converts SourceRecord to String
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env
                .addSource(sourceFunction)
                .print().setParallelism(1); // use parallelism 1 for sink to keep message ordering

        env.execute();
    }
}

In the logs I noticed a few things that seem worth mentioning; I am not sure whether they are related to my configuration:
[debezium-engine] INFO io.debezium.connector.mysql.MySqlConnectorTask - Found no existing offset, so preparing to perform a snapshot
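
For reference, a minimal sketch (built on assumptions, not the original poster's code): properties passed via debeziumProperties() on the DataStream builder are assumed to reach Debezium without the "debezium." prefix, and the recorded binlog offset is only reused when the job is restored from a checkpoint or savepoint; a plain restart without one falls back to the configured snapshot mode.

import com.alibaba.ververica.cdc.connectors.mysql.MySQLSource;
import com.alibaba.ververica.cdc.debezium.DebeziumSourceFunction;
import com.alibaba.ververica.cdc.debezium.StringDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Properties;

public class ResumeFromOffsetSketch {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        // Assumption: the raw Debezium key (no "debezium." prefix) is what the
        // DataStream builder forwards; the prefixed form is for SQL DDL options.
        properties.setProperty("snapshot.mode", "schema_only");

        DebeziumSourceFunction<String> source = MySQLSource.<String>builder()
                .hostname("localhost")
                .port(3306)
                .databaseList("sensor_offset")
                .username("root")
                .password("123")
                .debeziumProperties(properties)
                .deserializer(new StringDebeziumDeserializationSchema())
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The offset is snapshotted into Flink state on every checkpoint; it is only
        // picked up again when the job is started from that checkpoint or savepoint,
        // e.g. bin/flink run -s <savepoint-path> ...
        env.enableCheckpointing(10_000);
        env.addSource(source).print().setParallelism(1);
        env.execute("resume-from-offset sketch");
    }
}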

Error when data read with mysql-cdc is joined across multiple tables and written to MySQL via JDBC

The TaskManager log reports the following error:
java.io.IOException: Writing records to JDBC failed.
at org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.writeRecord(JdbcBatchingOutputFormat.java:157)
at org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.invoke(OutputFormatSinkFunction.java:87)
at org.apache.flink.streaming.api.functions.sink.SinkFunction.invoke(SinkFunction.java:52)
at org.apache.flink.table.runtime.operators.sink.SinkOperator.processElement(SinkOperator.java:86)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:717)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
at StreamExecCalc$4504.processElement(Unknown Source)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:717)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:53)
at org.apache.flink.table.runtime.operators.join.stream.StreamingJoinOperator.output(StreamingJoinOperator.java:305)
at org.apache.flink.table.runtime.operators.join.stream.StreamingJoinOperator.processElement(StreamingJoinOperator.java:278)
at org.apache.flink.table.runtime.operators.join.stream.StreamingJoinOperator.processElement1(StreamingJoinOperator.java:115)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processRecord1(StreamTwoInputProcessor.java:132)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.lambda$new$0(StreamTwoInputProcessor.java:99)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor$StreamTaskNetworkOutput.emitRecord(StreamTwoInputProcessor.java:364)
at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178)
at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:179)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:345)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:558)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:530)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.flink.table.data.GenericRowData.getInt(GenericRowData.java:149)
at org.apache.flink.table.data.RowData.lambda$createFieldGetter$245ca7d1$6(RowData.java:334)
at org.apache.flink.connector.jdbc.table.JdbcDynamicOutputFormatBuilder.getPrimaryKey(JdbcDynamicOutputFormatBuilder.java:216)
at org.apache.flink.connector.jdbc.table.JdbcDynamicOutputFormatBuilder.lambda$createRowKeyExtractor$7(JdbcDynamicOutputFormatBuilder.java:193)
at org.apache.flink.connector.jdbc.table.JdbcDynamicOutputFormatBuilder.lambda$createKeyedRowExecutor$3fd497bb$1(JdbcDynamicOutputFormatBuilder.java:128)
at org.apache.flink.connector.jdbc.internal.executor.KeyedBatchStatementExecutor.executeBatch(KeyedBatchStatementExecutor.java:71)
at org.apache.flink.connector.jdbc.internal.executor.BufferReduceStatementExecutor.executeBatch(BufferReduceStatementExecutor.java:99)
at org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.attemptFlush(JdbcBatchingOutputFormat.java:200)
at org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.flush(JdbcBatchingOutputFormat.java:171)
at org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.writeRecord(JdbcBatchingOutputFormat.java:154)
... 32 more

The Print connector output of the upsert stream (screenshot in the original issue) shows that the second column is the unique key.

Could this be caused by multiple parallel upsert streams contending for the same unique key?

Impact of multiple sinks on checkpointing

We currently pull incremental data from MySQL and, after processing, write it to both Kafka and Pika. Because Pika's write throughput is far lower than Kafka's, the Kafka sink ends up waiting a long time for the Pika sink during checkpoints. We considered splitting the Kafka sink and the Pika sink into two separate jobs, but two jobs would mean two Debezium connections to MySQL, which would put considerable pressure on it. Is there a good way to handle this situation? Thanks. @wuchong
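
As a hedged aside (not from the original thread): if the slow Pika sink is back-pressuring the checkpoint barriers, unaligned checkpoints and a longer checkpoint timeout, both available since Flink 1.11, may keep the single-job, single-Debezium-connection layout workable. A minimal sketch with placeholder values:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointTuningSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.enableCheckpointing(60_000);
        // Allow checkpoint barriers to overtake in-flight records under backpressure,
        // so barrier alignment does not wait on the slowest branch.
        env.getCheckpointConfig().enableUnalignedCheckpoints(true);
        // Give the slow sink more time before the checkpoint is declared failed.
        env.getCheckpointConfig().setCheckpointTimeout(10 * 60_000);

        // ... one CDC source fanning out to both the Kafka and the Pika sink ...
        // env.execute("single-job fan-out sketch");
    }
}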

Flink Connector Postgres CDC reports an error in sql-client mode

env version:
flink 1.11.2
flink-connector-postgres-cdc 1.0 & 1.1

DDL:

CREATE TABLE order_cdc(
    id BIGINT,
    user_id BIGINT,
    create_time TIMESTAMP(0),
    operate_time TIMESTAMP(0),
    province_id INT,
    order_status STRING,
    total_amount DECIMAL(10, 5)
  ) WITH (
    'connector' = 'postgres-cdc',
    'hostname' = '***',
    'port' = '5432',
    'username' = 'postgres',
    'password' = '***',
    'schema-name' = 'public',
    'database-name' = 'cdc',
    'table-name' = 'order_cdc'
);

Running a query produces the following error:

[ERROR] Could not execute SQL statement. Reason:
java.lang.ClassNotFoundException: com.alibaba.ververica.cdc.connectors.postgres.table.PostgresValueValidator

I tried both the 1.0 and 1.1 connector jars and both report this error; could someone please take a look?

Postgres join synchronization error

Flink SQL

  • Join two tables from the same database with Flink and write the result set into another table
-- source tables
create table store_caches
(
    id      bigint primary key,
    geo0    bigint,
    geo1    bigint,
    geo2    bigint,
    branch0 bigint,
    branch1 bigint,
    branch2 bigint
)
WITH (
    'connector' = 'postgres-cdc',
    'hostname' = '127.0.0.1',
    'port' = '5432',
    'username' = 'postgres',
    'password' = 'postgres',
    'database-name' = 'seesaw_boh_test_report',
    'schema-name' = 'public',
    'table-name' = 'store_caches',
    'debezium.slot.name' = 'store_caches'
);

create table sales_ticket_amounts
(
    id              bigint        primary key,
    partner_id      bigint          ,
    scope_id        bigint          ,
    bus_date        date            ,
    bus_date_week   date            ,
    bus_date_month  date            ,
    bus_date_year   date            ,
    store_id        bigint          ,
    eticket_id      bigint          ,
    channel_id      bigint          ,
    channel_name    varchar(50)     ,
    order_type      varchar(15)     ,
    order_type_name varchar(15)     ,
    order_time      timestamp       ,
    refunded        boolean         ,
    gross_amount    numeric(38, 16) ,
    net_amount      numeric(38, 16) ,
    discount_amount numeric(38, 16) ,
    tip             numeric(38, 16) ,
    package_fee     numeric(38, 16) ,
    delivery_fee    numeric(38, 16) ,
    service_fee     numeric(38, 16) ,
    tax_fee         numeric(38, 16) ,
    other_fee       numeric(38, 16) ,
    pay_amount      numeric(38, 16) ,
    rounding        numeric(38, 16) ,
    overflow_amount numeric(38, 16) ,
    change_amount   numeric(38, 16) ,
    product_count   bigint          ,
    accessory_count bigint          ,
    eticket_count   bigint          ,
    created         TIMESTAMP       
)
WITH (
    'connector' = 'postgres-cdc',
    'hostname' = '127.0.0.1',
    'port' = '5432',
    'username' = 'postgres',
    'password' = 'postgres',
    'database-name' = 'seesaw_boh_test_report',
    'schema-name' = 'public',
    'table-name' = 'sales_ticket_amounts',
    'debezium.slot.name' = 'sales_ticket_amounts'
);


-- target table


create table store_sales
(
bus_date date,
store_id bigint,
gross_amount numeric(38, 16),
net_amount numeric(38, 16),
discount_amount numeric(38, 16),
tip numeric(38, 16),
package_fee numeric(38, 16),
delivery_fee numeric(38, 16),
service_fee numeric(38, 16),
tax_fee numeric(38, 16),
other_fee numeric(38, 16),
pay_amount numeric(38, 16),
rounding numeric(38, 16),
overflow_amount numeric(38, 16),
change_amount numeric(38, 16),
order_count bigint,
product_count bigint,
accessory_count bigint,
gross_amount_returned numeric(38, 16),
net_amount_returned numeric(38, 16),
discount_amount_returned numeric(38, 16),
tip_returned numeric(38, 16),
package_fee_returned numeric(38, 16),
delivery_fee_returned numeric(38, 16),
service_fee_returned numeric(38, 16),
tax_fee_returned numeric(38, 16),
other_fee_returned numeric(38, 16),
pay_amount_returned numeric(38, 16),
rounding_returned numeric(38, 16),
overflow_amount_returned numeric(38, 16),
change_amount_returned numeric(38, 16),
order_count_returned bigint,
PRIMARY KEY(bus_date, store_id) NOT ENFORCED
)
WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:postgresql://127.0.0.1:5432/seesaw_boh_test_report',
    'username' = 'postgres',
    'password' = 'postgres',
    'table-name' = 'store_sales'
)

-- Flink job
INSERT INTO store_sales(
    bus_date,store_id,gross_amount,net_amount,discount_amount,tip,package_fee,delivery_fee,
    service_fee,tax_fee,other_fee,pay_amount,rounding,overflow_amount,change_amount,order_count,
    product_count,accessory_count,gross_amount_returned,net_amount_returned,discount_amount_returned,
    tip_returned,package_fee_returned,delivery_fee_returned,service_fee_returned,tax_fee_returned,
    other_fee_returned,pay_amount_returned,rounding_returned,overflow_amount_returned,change_amount_returned,
    order_count_returned
)
SELECT
    sales_ticket_amounts.bus_date AS bus_date,
    store_caches.id AS store_id,
    SUM(CASE WHEN refunded THEN 0 ELSE gross_amount END) AS gross_amount,
    SUM(net_amount) AS net_amount,
    SUM(CASE WHEN refunded THEN 0 ELSE discount_amount END) AS discount_amount,
    SUM(CASE WHEN refunded THEN 0 ELSE tip END) AS tip,
    SUM(CASE WHEN refunded THEN 0 ELSE package_fee END) AS package_fee,
    SUM(CASE WHEN refunded THEN 0 ELSE delivery_fee END) AS delivery_fee,
    SUM(CASE WHEN refunded THEN 0 ELSE service_fee END) AS service_fee,
    SUM(CASE WHEN refunded THEN 0 ELSE tax_fee END) AS tax_fee,
    SUM(CASE WHEN refunded THEN 0 ELSE other_fee END) AS other_fee,
    SUM(CASE WHEN refunded THEN 0 ELSE pay_amount END) AS pay_amount,
    SUM(CASE WHEN refunded THEN 0 ELSE rounding END) AS rounding,
    SUM(CASE WHEN refunded THEN 0 ELSE overflow_amount END) AS overflow_amount,
    SUM(CASE WHEN refunded THEN 0 ELSE change_amount END) AS change_amount,
    SUM(CASE WHEN refunded THEN 0 ELSE eticket_count END) AS order_count,
    SUM(CASE WHEN refunded THEN 0 ELSE product_count END) AS product_count,
    SUM(CASE WHEN refunded THEN 0 ELSE accessory_count END) AS accessory_count,
    SUM(CASE WHEN refunded THEN gross_amount ELSE 0 END) AS gross_amount_returned,
    SUM(CASE WHEN refunded THEN net_amount ELSE 0 END) AS net_amount_returned,
    SUM(CASE WHEN refunded THEN discount_amount ELSE 0 END) AS discount_amount_returned,
    SUM(CASE WHEN refunded THEN tip ELSE 0 END) AS tip_returned,
    SUM(CASE WHEN refunded THEN package_fee ELSE 0 END) AS package_fee_returned,
    SUM(CASE WHEN refunded THEN delivery_fee ELSE 0 END) AS delivery_fee_returned,
    SUM(CASE WHEN refunded THEN service_fee ELSE 0 END) AS service_fee_returned,
    SUM(CASE WHEN refunded THEN tax_fee ELSE 0 END) AS tax_fee_returned,
    SUM(CASE WHEN refunded THEN other_fee ELSE 0 END) AS other_fee_returned,
    SUM(CASE WHEN refunded THEN pay_amount ELSE 0 END) AS pay_amount_returned,
    SUM(CASE WHEN refunded THEN rounding ELSE 0 END) AS rounding_returned,
    SUM(CASE WHEN refunded THEN overflow_amount ELSE 0 END) AS overflow_amount_returned,
    SUM(CASE WHEN refunded THEN change_amount ELSE 0 END) AS change_amount_returned,
    SUM(CASE WHEN refunded THEN eticket_count ELSE 0 END) AS order_count_returned
FROM
    sales_ticket_amounts
        LEFT JOIN store_caches ON sales_ticket_amounts.store_id = store_caches.id
GROUP BY
    sales_ticket_amounts.bus_date,
    store_caches.id

Versions

  • Java : 1.8.0_181
  • System: MacOS 10.15.5
  • Flink : 1.11.1
  • Flink-CDC:
flink-sql-connector-postgres-cdc-1.1.0-SNAPSHOT.jar
flink-format-changelog-json-1.0.0.jar

Error message


⋊> ~/D/f/log cat flink-dusen-sql-client-dusendeMacBook-Pro.local.log                                                                 11:09:13
2020-08-26 11:08:39,060 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.address, localhost
2020-08-26 11:08:39,064 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
2020-08-26 11:08:39,064 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1600m
2020-08-26 11:08:39,064 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2020-08-26 11:08:39,064 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 3
2020-08-26 11:08:39,064 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 3
2020-08-26 11:08:39,065 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2020-08-26 11:08:39,100 INFO  org.apache.flink.core.fs.FileSystem                          [] - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2020-08-26 11:08:44,160 INFO  org.apache.flink.client.cli.CliFrontend                      [] - Loading FallbackYarnSessionCli
2020-08-26 11:08:44,161 INFO  org.apache.flink.table.client.gateway.local.LocalExecutor    [] - Using default environment file: file:/Users/dusen/Downloads/flink-1.11.1/conf/sql-client-defaults.yaml
2020-08-26 11:08:44,397 INFO  org.apache.flink.table.client.config.entries.ExecutionEntry  [] - Property 'execution.restart-strategy.type' not specified. Using default value: fallback
2020-08-26 11:08:45,246 INFO  org.apache.flink.table.client.gateway.local.ExecutionContext [] - Executor config: {taskmanager.memory.process.size=1728m, jobmanager.execution.failover-strategy=region, jobmanager.rpc.address=localhost, execution.target=remote, jobmanager.memory.process.size=1600m, jobmanager.rpc.port=6123, execution.savepoint.ignore-unclaimed-state=false, execution.attached=true, execution.shutdown-on-attached-exit=false, pipeline.jars=[file:/Users/dusen/Downloads/flink-1.11.1/opt/flink-sql-client_2.11-1.11.1.jar], parallelism.default=3, taskmanager.numberOfTaskSlots=3, pipeline.classpaths=[]}
2020-08-26 11:08:45,249 INFO  org.apache.flink.client.deployment.DefaultClusterClientServiceLoader [] - Could not load factory due to missing dependencies.
2020-08-26 11:08:45,469 INFO  org.apache.flink.table.client.cli.CliClient                  [] - Command history file path: /Users/dusen/.flink-sql-history
2020-08-26 11:09:10,518 WARN  org.apache.flink.table.client.cli.CliClient                  [] - Could not execute SQL statement.
org.apache.flink.table.client.gateway.SqlExecutionException: Invalid SQL statement.
	at org.apache.flink.table.client.gateway.local.LocalExecutor.executeUpdateInternal(LocalExecutor.java:579) ~[flink-sql-client_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.client.gateway.local.LocalExecutor.executeUpdate(LocalExecutor.java:515) ~[flink-sql-client_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.client.cli.CliClient.callInsert(CliClient.java:596) ~[flink-sql-client_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.client.cli.CliClient.callCommand(CliClient.java:315) ~[flink-sql-client_2.11-1.11.1.jar:1.11.1]
	at java.util.Optional.ifPresent(Optional.java:159) [?:1.8.0_181]
	at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:212) [flink-sql-client_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:142) [flink-sql-client_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.client.SqlClient.start(SqlClient.java:114) [flink-sql-client_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.client.SqlClient.main(SqlClient.java:201) [flink-sql-client_2.11-1.11.1.jar:1.11.1]
Caused by: java.lang.NoSuchMethodError: com.alibaba.ververica.cdc.debezium.table.RowDataDebeziumDeserializeSchema.<init>(Lorg/apache/flink/table/types/logical/RowType;Lorg/apache/flink/api/common/typeinfo/TypeInformation;Lcom/alibaba/ververica/cdc/debezium/table/RowDataDebeziumDeserializeSchema$ValueValidator;Ljava/time/ZoneId;)V
	at com.alibaba.ververica.cdc.connectors.postgres.table.PostgreSQLTableSource.getScanRuntimeProvider(PostgreSQLTableSource.java:101) ~[flink-sql-connector-postgres-cdc-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
	at org.apache.flink.table.planner.plan.nodes.common.CommonPhysicalTableSourceScan.createSourceTransformation(CommonPhysicalTableSourceScan.scala:69) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.translateToPlanInternal(StreamExecTableSourceScan.scala:91) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.translateToPlanInternal(StreamExecTableSourceScan.scala:44) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.translateToPlan(StreamExecTableSourceScan.scala:44) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:54) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:39) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalcBase.translateToPlan(StreamExecCalcBase.scala:38) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlanInternal(StreamExecExchange.scala:76) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlanInternal(StreamExecExchange.scala:44) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlan(StreamExecExchange.scala:44) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecJoin.translateToPlanInternal(StreamExecJoin.scala:121) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecJoin.translateToPlanInternal(StreamExecJoin.scala:49) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecJoin.translateToPlan(StreamExecJoin.scala:49) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:54) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:39) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalcBase.translateToPlan(StreamExecCalcBase.scala:38) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlanInternal(StreamExecExchange.scala:76) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlanInternal(StreamExecExchange.scala:44) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlan(StreamExecExchange.scala:44) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecGroupAggregate.translateToPlanInternal(StreamExecGroupAggregate.scala:119) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecGroupAggregate.translateToPlanInternal(StreamExecGroupAggregate.scala:52) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecGroupAggregate.translateToPlan(StreamExecGroupAggregate.scala:52) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:79) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:43) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlan(StreamExecSink.scala:43) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:67) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:66) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at scala.collection.Iterator$class.foreach(Iterator.scala:891) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at scala.collection.AbstractTraversable.map(Traversable.scala:104) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:66) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:166) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:1264) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.api.internal.TableEnvironmentImpl.translateAndClearBuffer(TableEnvironmentImpl.java:1256) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.api.bridge.java.internal.StreamTableEnvironmentImpl.getPipeline(StreamTableEnvironmentImpl.java:327) ~[flink-table-blink_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.client.gateway.local.ExecutionContext.lambda$createPipeline$1(ExecutionContext.java:284) ~[flink-sql-client_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:255) ~[flink-sql-client_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.client.gateway.local.ExecutionContext.createPipeline(ExecutionContext.java:281) ~[flink-sql-client_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.table.client.gateway.local.LocalExecutor.executeUpdateInternal(LocalExecutor.java:576) ~[flink-sql-client_2.11-1.11.1.jar:1.11.1]
	... 8 more

Postgres DELETE/UPDATE operations are not synchronized

Problem description

  • I created the corresponding database and tables in PG, created the corresponding tables in Flink, and then submitted a Flink job. Inserted data is synchronized correctly, but data changed via DELETE and UPDATE is not synchronized.

Environment

Postgres DDL

create database seesaw_boh_test_report;

\c seesaw_boh_test_report;

create table store_caches
(
    id      bigserial not null primary key,
    geos    bigint[],
    geo0    bigint,
    geo1    bigint,
    geo2    bigint,
    branchs bigint[],
    branch0 bigint,
    branch1 bigint,
    branch2 bigint
);
create table sales_ticket_amounts
(
    id              bigserial                not null primary key,
    partner_id      bigint                   not null,
    scope_id        bigint                   not null,
    bus_date        date                     not null,
    bus_date_week   date                     not null,
    bus_date_month  date                     not null,
    bus_date_year   date                     not null,
    store_id        bigint                   not null,
    eticket_id      bigint                   not null unique,
    channel_id      bigint                   not null,
    channel_name    varchar(50)              not null,
    order_type      varchar(15)              not null,
    order_type_name varchar(15)              not null,
    order_time      timestamp                not null,
    refunded        boolean                  not null,
    gross_amount    numeric(38, 16)          not null,
    net_amount      numeric(38, 16)          not null,
    discount_amount numeric(38, 16)          not null,
    tip             numeric(38, 16)          not null,
    package_fee     numeric(38, 16)          not null,
    delivery_fee    numeric(38, 16)          not null,
    service_fee     numeric(38, 16)          not null,
    tax_fee         numeric(38, 16)          not null,
    other_fee       numeric(38, 16)          not null,
    pay_amount      numeric(38, 16)          not null,
    rounding        numeric(38, 16)          not null,
    overflow_amount numeric(38, 16)          not null,
    change_amount   numeric(38, 16)          not null,
    product_count   bigint                   not null,
    accessory_count bigint                   not null,
    eticket_count   bigint                   not null,
    created         TIMESTAMP                not null
);

-- store sales
create table store_sales
(
    bus_date date,
    store_id   bigint,
    gross_amount numeric(38, 16),
    net_amount numeric(38, 16),
    discount_amount numeric(38, 16),
    tip numeric(38, 16),
    package_fee numeric(38, 16),
    delivery_fee numeric(38, 16),
    service_fee numeric(38, 16),
    tax_fee numeric(38, 16),
    other_fee numeric(38, 16),
    pay_amount numeric(38, 16),
    rounding numeric(38, 16),
    overflow_amount numeric(38, 16),
    change_amount numeric(38, 16),
    order_count bigint,
    product_count bigint,
    accessory_count bigint,
    gross_amount_returned numeric(38, 16),
    net_amount_returned numeric(38, 16),
    discount_amount_returned numeric(38, 16),
    tip_returned numeric(38, 16),
    package_fee_returned numeric(38, 16),
    delivery_fee_returned numeric(38, 16),
    service_fee_returned numeric(38, 16),
    tax_fee_returned numeric(38, 16),
    other_fee_returned numeric(38, 16),
    pay_amount_returned numeric(38, 16),
    rounding_returned numeric(38, 16),
    overflow_amount_returned numeric(38, 16),
    change_amount_returned numeric(38, 16),
    order_count_returned bigint,
    primary key (bus_date, store_id)
)

Postgres test data

INSERT INTO public.store_caches (id, geos, geo0, geo1, geo2, branchs, branch0, branch1, branch2) VALUES (4373457990955565057, '{4372623784113340417}', 4372623784113340417, 0, 0, '{4373058638818836481,4373058975772442625}', 4373058638818836481, 4373058975772442625, 0);
INSERT INTO public.store_caches (id, geos, geo0, geo1, geo2, branchs, branch0, branch1, branch2) VALUES (4374825247166169089, '{4372623784113340417}', 4372623784113340417, 0, 0, '{4373058638818836481}', 4373058638818836481, 0, 0);
INSERT INTO public.store_caches (id, geos, geo0, geo1, geo2, branchs, branch0, branch1, branch2) VALUES (4374825441484079105, '{4372623784113340417}', 4372623784113340417, 0, 0, '{4373058638818836481}', 4373058638818836481, 0, 0);
INSERT INTO public.store_caches (id, geos, geo0, geo1, geo2, branchs, branch0, branch1, branch2) VALUES (4374826876435169281, '{4372623784113340417}', 4372623784113340417, 0, 0, '{4373058638818836481}', 4373058638818836481, 0, 0);
INSERT INTO public.store_caches (id, geos, geo0, geo1, geo2, branchs, branch0, branch1, branch2) VALUES (4374828784109486081, '{4372623784113340417}', 4372623784113340417, 0, 0, '{4373058638818836481}', 4373058638818836481, 0, 0);
INSERT INTO public.store_caches (id, geos, geo0, geo1, geo2, branchs, branch0, branch1, branch2) VALUES (4374830781730652161, '{4372623784113340417}', 4372623784113340417, 0, 0, '{4373058638818836481,4373059076356046849,4395459768047632385}', 4373058638818836481, 4373059076356046849, 4395459768047632385);
INSERT INTO public.sales_ticket_amounts (id, partner_id, scope_id, bus_date, bus_date_week, bus_date_month, bus_date_year, store_id, eticket_id, channel_id, channel_name, order_type, order_type_name, order_time, refunded, gross_amount, net_amount, discount_amount, tip, package_fee, delivery_fee, service_fee, tax_fee, other_fee, pay_amount, rounding, overflow_amount, change_amount, product_count, accessory_count, eticket_count, created) VALUES (4388631539257212928, 0, 0, '0001-01-01', '0001-01-01', '0001-01-01', '0001-01-01', 4373457990955565057, 20200728042, 4378501562742341633, 'koubei', 'DINEIN', 'DINEIN', '0001-01-01 00:00:00.000000', false, 74.0000000000000000, 69.0000000000000000, 5.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 70.0000000000000000, 0.0000000000000000, 0.0000000000000000, 1.0000000000000000, 3, 3, 1, '2020-08-05 07:34:55.754688');
INSERT INTO public.sales_ticket_amounts (id, partner_id, scope_id, bus_date, bus_date_week, bus_date_month, bus_date_year, store_id, eticket_id, channel_id, channel_name, order_type, order_type_name, order_time, refunded, gross_amount, net_amount, discount_amount, tip, package_fee, delivery_fee, service_fee, tax_fee, other_fee, pay_amount, rounding, overflow_amount, change_amount, product_count, accessory_count, eticket_count, created) VALUES (4388636043784519680, 0, 0, '2020-08-05', '2020-08-07', '2020-08-01', '2020-01-01', 4373457990955565057, 2020072804211, 4378501562742341633, 'koubei', 'DINEIN', 'DINEIN', '2020-08-05 12:30:00.000000', false, 74.0000000000000000, 69.0000000000000000, 5.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 70.0000000000000000, 0.0000000000000000, 0.0000000000000000, 1.0000000000000000, 3, 3, 1, '2020-08-05 07:52:49.752072');
INSERT INTO public.sales_ticket_amounts (id, partner_id, scope_id, bus_date, bus_date_week, bus_date_month, bus_date_year, store_id, eticket_id, channel_id, channel_name, order_type, order_type_name, order_time, refunded, gross_amount, net_amount, discount_amount, tip, package_fee, delivery_fee, service_fee, tax_fee, other_fee, pay_amount, rounding, overflow_amount, change_amount, product_count, accessory_count, eticket_count, created) VALUES (4388640055850205184, 4183192445833445399, 0, '2020-08-05', '2020-08-07', '2020-08-01', '2020-01-01', 4373457990955565057, 20200728042111, 4378501562742341633, 'koubei', 'DINEIN', 'DINEIN', '2020-08-05 12:30:00.000000', false, 74.0000000000000000, 69.0000000000000000, 5.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 70.0000000000000000, 0.0000000000000000, 0.0000000000000000, 1.0000000000000000, 3, 3, 1, '2020-08-05 08:08:46.263721');
INSERT INTO public.sales_ticket_amounts (id, partner_id, scope_id, bus_date, bus_date_week, bus_date_month, bus_date_year, store_id, eticket_id, channel_id, channel_name, order_type, order_type_name, order_time, refunded, gross_amount, net_amount, discount_amount, tip, package_fee, delivery_fee, service_fee, tax_fee, other_fee, pay_amount, rounding, overflow_amount, change_amount, product_count, accessory_count, eticket_count, created) VALUES (4388632714543136768, 0, 0, '2020-08-05', '2020-08-07', '2020-08-01', '2020-01-01', 4373457990955565057, 202007280421, 4378501562742341633, 'koubei', 'DINEIN', 'DINEIN', '2020-08-05 12:30:00.000000', false, 74.0000000000000000, 69.0000000000000000, 5.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 70.0000000000000000, 0.0000000000000000, 0.0000000000000000, 1.0000000000000000, 3, 3, 1, '2020-08-05 07:39:35.997823');
INSERT INTO public.sales_ticket_amounts (id, partner_id, scope_id, bus_date, bus_date_week, bus_date_month, bus_date_year, store_id, eticket_id, channel_id, channel_name, order_type, order_type_name, order_time, refunded, gross_amount, net_amount, discount_amount, tip, package_fee, delivery_fee, service_fee, tax_fee, other_fee, pay_amount, rounding, overflow_amount, change_amount, product_count, accessory_count, eticket_count, created) VALUES (4388673806802124800, 4183192445833445399, 0, '2020-08-05', '2020-08-03', '2020-08-01', '2020-01-01', 4374830781730652161, 1234567, 4383963285460877313, 'INVALID', 'DINEIN', 'DINEIN', '2020-08-05 13:20:07.000000', false, 48.0000000000000000, 48.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 48.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 1, 0, 1, '2020-08-05 10:22:53.157638');
INSERT INTO public.sales_ticket_amounts (id, partner_id, scope_id, bus_date, bus_date_week, bus_date_month, bus_date_year, store_id, eticket_id, channel_id, channel_name, order_type, order_type_name, order_time, refunded, gross_amount, net_amount, discount_amount, tip, package_fee, delivery_fee, service_fee, tax_fee, other_fee, pay_amount, rounding, overflow_amount, change_amount, product_count, accessory_count, eticket_count, created) VALUES (4388645720765267968, 4183192445833445399, 0, '2020-08-05', '2020-08-03', '2020-08-01', '2020-01-01', 4373457990955565057, 202007280421111, 4378501562742341633, 'koubei', 'DINEIN', 'DINEIN', '2020-08-05 12:30:00.000000', false, 74.0000000000000000, 69.0000000000000000, 5.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 70.0000000000000000, 0.0000000000000000, 0.0000000000000000, 1.0000000000000000, 3, 3, 1, '2020-08-05 08:31:16.924693');
INSERT INTO public.sales_ticket_amounts (id, partner_id, scope_id, bus_date, bus_date_week, bus_date_month, bus_date_year, store_id, eticket_id, channel_id, channel_name, order_type, order_type_name, order_time, refunded, gross_amount, net_amount, discount_amount, tip, package_fee, delivery_fee, service_fee, tax_fee, other_fee, pay_amount, rounding, overflow_amount, change_amount, product_count, accessory_count, eticket_count, created) VALUES (4388641911280271360, 4183192445833445399, 0, '2020-08-05', '2020-08-07', '2020-08-01', '2020-01-01', 4374825247166169089, 0, 4378501476008329217, 'eleme', 'TAKEOUT', 'TAKEOUT', '2020-08-05 14:11:17.000000', false, 38.0000000000000000, 26.7000007629394530, 11.3000001907348630, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 26.7000007629394530, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 1, 0, 1, '2020-08-05 08:16:08.635261');
INSERT INTO public.sales_ticket_amounts (id, partner_id, scope_id, bus_date, bus_date_week, bus_date_month, bus_date_year, store_id, eticket_id, channel_id, channel_name, order_type, order_type_name, order_time, refunded, gross_amount, net_amount, discount_amount, tip, package_fee, delivery_fee, service_fee, tax_fee, other_fee, pay_amount, rounding, overflow_amount, change_amount, product_count, accessory_count, eticket_count, created) VALUES (4388650429982998528, 4183192445833445399, 0, '2020-08-05', '2020-08-03', '2020-08-01', '2020-01-01', 4374825441484079105, 9223372036854775807, 4383963285460877313, 'INVALID', 'DINEIN', 'DINEIN', '2020-08-05 16:49:58.000000', false, 90.0000000000000000, 90.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 90.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 2, 0, 1, '2020-08-05 08:49:59.661589');
INSERT INTO public.sales_ticket_amounts (id, partner_id, scope_id, bus_date, bus_date_week, bus_date_month, bus_date_year, store_id, eticket_id, channel_id, channel_name, order_type, order_type_name, order_time, refunded, gross_amount, net_amount, discount_amount, tip, package_fee, delivery_fee, service_fee, tax_fee, other_fee, pay_amount, rounding, overflow_amount, change_amount, product_count, accessory_count, eticket_count, created) VALUES (4389379207717748736, 4183192445833445399, 0, '2020-08-07', '2020-08-03', '2020-08-01', '2020-01-01', 4374828784109486081, 4389379206568509440, 4383963285460877313, 'INVALID', 'DINEIN', 'DINEIN', '2020-08-07 16:10:11.000000', false, 113.0000000000000000, 113.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 113.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 3, 0, 1, '2020-08-07 09:05:53.849608');
INSERT INTO public.sales_ticket_amounts (id, partner_id, scope_id, bus_date, bus_date_week, bus_date_month, bus_date_year, store_id, eticket_id, channel_id, channel_name, order_type, order_type_name, order_time, refunded, gross_amount, net_amount, discount_amount, tip, package_fee, delivery_fee, service_fee, tax_fee, other_fee, pay_amount, rounding, overflow_amount, change_amount, product_count, accessory_count, eticket_count, created) VALUES (4389379207168294912, 4183192445833445399, 0, '2020-08-07', '2020-08-03', '2020-08-01', '2020-01-01', 4374826876435169281, 4389379205146640384, 4383963285460877313, 'INVALID', 'SELFHELP', 'SELFHELP', '2020-08-07 12:18:01.000000', false, 33.0000000000000000, 0.0000000000000000, 33.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 0.0000000000000000, 1, 0, 1, '2020-08-07 09:05:53.717730');

Create the Flink tables

create table store_caches
(
    id      bigint primary key,
    geo0    bigint,
    geo1    bigint,
    geo2    bigint,
    branch0 bigint,
    branch1 bigint,
    branch2 bigint
)
WITH (
    'connector' = 'postgres-cdc',
    'hostname' = '127.0.0.1',
    'port' = '5432',
    'username' = 'postgres',
    'password' = 'postgres',
    'database-name' = 'seesaw_boh_test_report',
    'schema-name' = 'public',
    'table-name' = 'store_caches',
    'debezium.slot.name' = 'store_caches'
);

 create table sales_ticket_amounts
(
    id              bigint        primary key,
    partner_id      bigint          ,
    scope_id        bigint          ,
    bus_date        date            ,
    bus_date_week   date            ,
    bus_date_month  date            ,
    bus_date_year   date            ,
    store_id        bigint          ,
    eticket_id      bigint          ,
    channel_id      bigint          ,
    channel_name    varchar(50)     ,
    order_type      varchar(15)     ,
    order_type_name varchar(15)     ,
    order_time      timestamp       ,
    refunded        boolean         ,
    gross_amount    numeric(38, 16) ,
    net_amount      numeric(38, 16) ,
    discount_amount numeric(38, 16) ,
    tip             numeric(38, 16) ,
    package_fee     numeric(38, 16) ,
    delivery_fee    numeric(38, 16) ,
    service_fee     numeric(38, 16) ,
    tax_fee         numeric(38, 16) ,
    other_fee       numeric(38, 16) ,
    pay_amount      numeric(38, 16) ,
    rounding        numeric(38, 16) ,
    overflow_amount numeric(38, 16) ,
    change_amount   numeric(38, 16) ,
    product_count   bigint          ,
    accessory_count bigint          ,
    eticket_count   bigint          ,
    created         TIMESTAMP       
)
WITH (
    'connector' = 'postgres-cdc',
    'hostname' = '127.0.0.1',
    'port' = '5432',
    'username' = 'postgres',
    'password' = 'postgres',
    'database-name' = 'seesaw_boh_test_report',
    'schema-name' = 'public',
    'table-name' = 'sales_ticket_amounts',
    'debezium.slot.name' = 'sales_ticket_amounts'
);

create table store_sales
(
    bus_date date,
    store_id bigint,
    gross_amount numeric(38, 16),
    net_amount numeric(38, 16),
    discount_amount numeric(38, 16),
    tip numeric(38, 16),
    package_fee numeric(38, 16),
    delivery_fee numeric(38, 16),
    service_fee numeric(38, 16),
    tax_fee numeric(38, 16),
    other_fee numeric(38, 16),
    pay_amount numeric(38, 16),
    rounding numeric(38, 16),
    overflow_amount numeric(38, 16),
    change_amount numeric(38, 16),
    order_count bigint,
    product_count bigint,
    accessory_count bigint,
    gross_amount_returned numeric(38, 16),
    net_amount_returned numeric(38, 16),
    discount_amount_returned numeric(38, 16),
    tip_returned numeric(38, 16),
    package_fee_returned numeric(38, 16),
    delivery_fee_returned numeric(38, 16),
    service_fee_returned numeric(38, 16),
    tax_fee_returned numeric(38, 16),
    other_fee_returned numeric(38, 16),
    pay_amount_returned numeric(38, 16),
    rounding_returned numeric(38, 16),
    overflow_amount_returned numeric(38, 16),
    change_amount_returned numeric(38, 16),
    order_count_returned bigint,
    PRIMARY KEY(bus_date, store_id) NOT ENFORCED
)
WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:postgresql://127.0.0.1:5432/seesaw_boh_test_report',
    'username' = 'postgres',
    'password' = 'postgres',
    'table-name' = 'store_sales'
);

Submit the Flink job

INSERT INTO store_sales(
    bus_date,store_id,gross_amount,net_amount,discount_amount,tip,package_fee,delivery_fee,
    service_fee,tax_fee,other_fee,pay_amount,rounding,overflow_amount,change_amount,order_count,
    product_count,accessory_count,gross_amount_returned,net_amount_returned,discount_amount_returned,
    tip_returned,package_fee_returned,delivery_fee_returned,service_fee_returned,tax_fee_returned,
    other_fee_returned,pay_amount_returned,rounding_returned,overflow_amount_returned,change_amount_returned,
    order_count_returned
)
SELECT
    sales_ticket_amounts.bus_date AS bus_date,
    store_caches.id AS store_id,
    SUM(gross_amount) AS gross_amount,
    SUM(net_amount) AS net_amount,
    SUM(discount_amount) AS discount_amount,
    SUM(tip) AS tip,
    SUM(package_fee) AS package_fee,
    SUM(delivery_fee) AS delivery_fee,
    SUM(service_fee) AS service_fee,
    SUM(tax_fee) AS tax_fee,
    SUM(other_fee) AS other_fee,
    SUM(pay_amount) AS pay_amount,
    SUM(rounding) AS rounding,
    SUM(overflow_amount) AS overflow_amount,
    SUM(CASE WHEN refunded THEN 0 ELSE change_amount END) AS change_amount,
    SUM(CASE WHEN refunded THEN 0 ELSE eticket_count END) AS order_count,
    SUM(CASE WHEN refunded THEN 0 ELSE product_count END) AS product_count,
    SUM(CASE WHEN refunded THEN 0 ELSE accessory_count END) AS accessory_count,
    SUM(CASE WHEN refunded THEN gross_amount ELSE 0 END) AS gross_amount_returned,
    SUM(CASE WHEN refunded THEN net_amount ELSE 0 END) AS net_amount_returned,
    SUM(CASE WHEN refunded THEN discount_amount ELSE 0 END) AS discount_amount_returned,
    SUM(CASE WHEN refunded THEN tip ELSE 0 END) AS tip_returned,
    SUM(CASE WHEN refunded THEN package_fee ELSE 0 END) AS package_fee_returned,
    SUM(CASE WHEN refunded THEN delivery_fee ELSE 0 END) AS delivery_fee_returned,
    SUM(CASE WHEN refunded THEN service_fee ELSE 0 END) AS service_fee_returned,
    SUM(CASE WHEN refunded THEN tax_fee ELSE 0 END) AS tax_fee_returned,
    SUM(CASE WHEN refunded THEN other_fee ELSE 0 END) AS other_fee_returned,
    SUM(CASE WHEN refunded THEN pay_amount ELSE 0 END) AS pay_amount_returned,
    SUM(CASE WHEN refunded THEN rounding ELSE 0 END) AS rounding_returned,
    SUM(CASE WHEN refunded THEN overflow_amount ELSE 0 END) AS overflow_amount_returned,
    SUM(CASE WHEN refunded THEN change_amount ELSE 0 END) AS change_amount_returned,
    SUM(CASE WHEN refunded THEN eticket_count ELSE 0 END) AS order_count_returned
FROM
    sales_ticket_amounts
        LEFT JOIN store_caches ON sales_ticket_amounts.store_id = store_caches.id
GROUP BY
    sales_ticket_amounts.bus_date,
    store_caches.id

Error when executing the SQL, please take a look

INSERT INTO enriched_orders SELECT o.*, p.name, p.description, s.shipment_id, s.origin, s.destination, s.is_arrived FROM orders AS o LEFT JOIN products AS p ON o.product_id = p.id LEFT JOIN shipments AS s ON o.order_id = s.order_id

2020-08-11 13:35:41
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
	at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116)
	at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78)
	at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192)
	at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185)
	at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179)
	at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503)
	at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:386)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
	at akka.actor.ActorCell.invoke(ActorCell.scala:561)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
	at akka.dispatch.Mailbox.run(Mailbox.scala:225)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NoSuchMethodError: 'void sun.misc.Unsafe.monitorEnter(java.lang.Object)'
	at com.alibaba.ververica.cdc.debezium.internal.DebeziumChangeConsumer.handleBatch(DebeziumChangeConsumer.java:101)
	at io.debezium.embedded.ConvertingEngineBuilder.lambda$notifying$2(ConvertingEngineBuilder.java:81)
	at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:810)
	at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:170)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Flink CDC monitoring a MySQL table: updating the table fails with "Encountered change event for table test.student whose schema isn't known to this connector"

I tried the official Flink CDC example; the code is as follows:

import com.alibaba.ververica.cdc.connectors.mysql.MySQLSource;
import com.alibaba.ververica.cdc.debezium.StringDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.Properties;

public class MySqlBinlogSourceExample {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.setProperty("database.history.kafka.bootstrap.servers", "kafka1:9092");
        properties.setProperty("database.history.kafka.topic", "demo1");
        properties.setProperty("snapshot.new.tables", "parallel");
        properties.setProperty("key.converter", "org.apache.kafka.connect.json.JsonConverter");
        properties.setProperty("value.converter", "org.apache.kafka.connect.json.JsonConverter");
        properties.setProperty("decimal.handling.mode", "string");
        // properties.setProperty("inconsistent.schema.handling.mode", "warn"); // no error, but binlog changes are not printed
        // properties.setProperty("snapshot.mode", "schema_only_recovery");

        SourceFunction<String> sourceFunction = MySQLSource.<String>builder()
                .debeziumProperties(properties)
                .hostname("localhost")
                .port(3306)
                // .databaseList("Test")
                .tableList("Test.student") // monitor the Test.student table
                .username("root")
                .password("123")
                .deserializer(new StringDebeziumDeserializationSchema()) // converts SourceRecord to String
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env
                .addSource(sourceFunction)
                .print().setParallelism(1); // use parallelism 1 for sink to keep message ordering

        env.execute();
    }
}
The dependencies are as follows:

  • org.apache.flink : flink-clients_${scala.binary.version} : ${flink.version}
  • org.apache.flink : flink-java : ${flink.version}
  • org.apache.flink : flink-streaming-java_${scala.binary.version} : ${flink.version}
  • com.alibaba.ververica : flink-connector-mysql-cdc : 1.1.0

The Flink version is 1.11.1.
When the job first starts, the log output (screenshot omitted) shows that data is read successfully, but as soon as I modify the MySQL table, e.g. insert a row, the following exception is thrown:

Caused by: org.apache.kafka.connect.errors.ConnectException: Encountered change event for table test.student whose schema isn't known to this connector
	at io.debezium.connector.mysql.AbstractReader.wrap(AbstractReader.java:230)
	at io.debezium.connector.mysql.AbstractReader.failed(AbstractReader.java:207)
	at io.debezium.connector.mysql.BinlogReader.handleEvent(BinlogReader.java:600)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.notifyEventListeners(BinaryLogClient.java:1130)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:978)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:581)
	at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:860)
	at java.lang.Thread.run(Thread.java:748)

PS: the MySQL server version is 5.7.31.
As far as I know, Debezium depends on Kafka; do I need to configure a local Kafka Connect correctly for this to work?

Remove the Unsafe.monitorEnter usage in code to support JDK11

Currently, it will throw the following exception if using JDK11

Caused by: java.lang.NoSuchMethodError: sun.misc.Unsafe.monitorEnter(Ljava/lang/Object;)V
    at com.alibaba.ververica.cdc.debezium.internal.DebeziumChangeConsumer.handleBatch(DebeziumChangeConsumer.java:101)
    at io.debezium.embedded.ConvertingEngineBuilder.lambda$notifying$2(ConvertingEngineBuilder.java:81)
    at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:810)
    at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:170)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

The JDK version

openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)

The reason is that we use Unsafe.monitorEnter to manually hold the checkpoint lock during the database snapshot phase. If we want to remove Unsafe.monitorEnter, one approach is to use the synchronized keyword to hold the checkpoint lock in the SourceFunction.run method.
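
A minimal sketch of that approach (not the actual DebeziumSourceFunction code), assuming the checkpoint lock is obtained via SourceContext#getCheckpointLock() as in Flink's legacy SourceFunction API; the class and helper names are illustrative only:

import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Sketch only: replaces the manual Unsafe.monitorEnter/monitorExit pair with a plain
// synchronized block on the checkpoint lock, which works on both JDK 8 and JDK 11.
public class SnapshotLockSketch implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        final Object checkpointLock = ctx.getCheckpointLock();
        while (running) {
            String record = readNextRecord(); // hypothetical helper standing in for the Debezium handover
            // Emit under the checkpoint lock so emitted records and checkpointed state stay consistent.
            synchronized (checkpointLock) {
                ctx.collect(record);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    private String readNextRecord() {
        return "record"; // placeholder
    }
}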

mysql-cdc: creating the table with Flink SQL inside a program fails when querying

Creating the table with mysql-cdc via Flink SQL inside a program and then querying it throws an error. It looks as if a mysql-cdc table does not support update/delete changes? However, running the same statements in the Flink sql-client works and the data can be queried.

StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment();

EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();

StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(bsEnv, bsSettings);

String createSql = " CREATE TABLE qc_day_setting ( " +
        "   mn STRING, " +
        "   param_code STRING, " +
        "   span_value DECIMAL(12,6), " +
        "   check_time TIMESTAMP(3), " +
        "   status INT " +
        " ) WITH ( " +
        "   'connector' = 'mysql-cdc', " +
        "   'hostname' = '', " +
        "   'port' = '', " +
        "   'username' = '', " +
        "   'password' = '', " +
        "   'database-name' = '', " +
        "   'table-name' = '' " +
        " )";

bsTableEnv.executeSql(createSql);

TableResult results = bsTableEnv.executeSql("SELECT * FROM qc_day_setting");

results.print();

bsTableEnv.execute("123");

Exception in thread "main" org.apache.flink.table.api.TableException: AppendStreamTableSink doesn't support consuming update and delete changes...

Error when using flink sql in postgres-cdc mode

[ERROR] Could not execute SQL statement. Reason:
org.postgresql.util.PSQLException: ERROR: could not access file "decoderbufs": No such file or directory
What is the cause? It looks like a PostgreSQL issue; I am using the Windows build of PostgreSQL 10. Is there any workaround?
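
For reference, decoderbufs is a logical decoding plugin that has to be installed into the PostgreSQL server and is typically not shipped with the Windows builds. If the installed postgres-cdc connector version exposes the plugin choice via a 'decoding.plugin.name' option (an assumption to confirm against the connector documentation for your version), a sketch of switching to a plugin that ships with PostgreSQL 10+ could look like this; the table columns and credentials are placeholders:

-- Sketch only: use pgoutput (built into PostgreSQL 10+) instead of decoderbufs.
-- 'decoding.plugin.name' is assumed to be supported by the installed postgres-cdc version.
CREATE TABLE shipments_src (
  shipment_id INT,
  order_id INT,
  is_arrived BOOLEAN
) WITH (
  'connector' = 'postgres-cdc',
  'hostname' = 'localhost',
  'port' = '5432',
  'username' = 'postgres',
  'password' = 'postgres',
  'database-name' = 'postgres',
  'schema-name' = 'public',
  'table-name' = 'shipments',
  'decoding.plugin.name' = 'pgoutput'
);

Either way, wal_level = logical must be enabled on the PostgreSQL server for logical decoding to work at all.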

Can table-name support regex matching?

The user table in MySQL has been split by the business side into 100 shards, e.g. user_00, user_01, ..., user_99. Can mysql-cdc's table-name option support regex matching, so that a single Flink table monitors all 100 MySQL tables?

Problem with reserved-keyword column support

CREATE TABLE tag_info_binlog (
id STRING
,STATUS STRING
,PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',

Caused by: org.apache.kafka.connect.errors.DataException: STATUS is not a valid field name
at org.apache.kafka.connect.data.Struct.lookupField(Struct.java:254)
at org.apache.kafka.connect.data.Struct.get(Struct.java:74)
at com.alibaba.ververica.cdc.debezium.table.RowDataDebeziumDeserializeSchema.lambda$createRowConverter$508c5858$1(RowDataDebeziumDeserializeSchema.java:348)
at com.alibaba.ververica.cdc.debezium.table.RowDataDebeziumDeserializeSchema.lambda$wrapIntoNullableConverter$7b91dc26$1(RowDataDebeziumDeserializeSchema.java:374)
at com.alibaba.ververica.cdc.debezium.table.RowDataDebeziumDeserializeSchema.extractAfterRow(RowDataDebeziumDeserializeSchema.java:122)
at com.alibaba.ververica.cdc.debezium.table.RowDataDebeziumDeserializeSchema.deserialize(RowDataDebeziumDeserializeSchema.java:97)
at com.alibaba.ververica.cdc.debezium.internal.DebeziumChangeConsumer.handleBatch(DebeziumChangeConsumer.java:97)
at io.debezium.embedded.ConvertingEngineBuilder.lambda$notifying$2(ConvertingEngineBuilder.java:81)
at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:810)
at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:170)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Could not execute SQL statement.

The INSERT fails when executed. The error message is:
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: Could not find any factory for identifier 'mysql-cdc' that implements 'org.apache.flink.table.factories.DynamicTableSinkFactory' in the classpath.

Available factory identifiers are:

blackhole
print

Error: AbstractMethodError: org.apache.kafka.connect.json.JsonSerializer.configure

Hi, when running the "Usage for DataStream API" demo I get the error below; what could be the cause?

Caused by: java.lang.AbstractMethodError: org.apache.kafka.connect.json.JsonSerializer.configure(Ljava/util/Map;Z)V
	at org.apache.kafka.connect.json.JsonConverter.configure(JsonConverter.java:300)
	at org.apache.kafka.connect.json.JsonConverter.configure(JsonConverter.java:311)
	at io.debezium.embedded.EmbeddedEngine.<init>(EmbeddedEngine.java:582)
	at io.debezium.embedded.EmbeddedEngine.<init>(EmbeddedEngine.java:79)
	at io.debezium.embedded.EmbeddedEngine$BuilderImpl.build(EmbeddedEngine.java:300)
	at io.debezium.embedded.EmbeddedEngine$BuilderImpl.build(EmbeddedEngine.java:216)
	at io.debezium.embedded.ConvertingEngineBuilder.build(ConvertingEngineBuilder.java:139)
	at com.alibaba.ververica.cdc.debezium.DebeziumSourceFunction.run(DebeziumSourceFunction.java:299)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:201)

Error when reading data with mysql-cdc and writing it to PostgreSQL via jdbc

The error is as follows:

[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: Could not find any factory for identifier 'jdbc' that implements 'org.apache.flink.table.factories.DynamicTableSourceFactory' in the classpath.
Available factory identifiers are:
datagen
mysql-cdc
postgres-cdc

The MySQL table is created as follows:

create table test.dot(
id varchar(100),
dotno varchar(100),
orders int4,
channelname varchar(90) NULL,
etl_time timestamp NULL DEFAULT now(),
PRIMARY KEY (id)
)ENGINE=InnoDB DEFAULT 
CHARSET=utf8;
;

The PostgreSQL table is created as follows:

create table test.dot(
id varchar(100),
dotno varchar(100),
orders int4,
channelname varchar(90),
etl_time timestamp NULL DEFAULT now(),
PRIMARY KEY (id)
)
DISTRIBUTED BY (id);

The Flink tables are created as follows:


create table dot_mysql(
id STRING,
dotno STRING,
orders int,
channelname STRING,
etl_time timestamp
)WITH (
  'connector' = 'mysql-cdc',
  'hostname' = '******',
  'port' = '3307',
  'username' = '***',
  'password' = '******',
  'database-name' = 'test',
  'table-name' = 'dot'
);

create table dot(
id varchar(100),
dotno varchar(100),
orders int,
channelname varchar(90),
etl_time timestamp,
PRIMARY KEY(id) NOT ENFORCED
)
WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:postgresql://************/******',
    'driver' = 'org.postgresql.Driver',
    'username' = '*****',
    'password' = '*****',
    'table-name' = 'dot'
);

This is my first time using Flink SQL. Querying the dot table in Flink SQL throws the error above. Where should the jdbc driver be added? I put the PostgreSQL driver jar under flink_home/lib, but it made no difference. Any pointers would be much appreciated.

How can CDC synchronize DDL statements?

How can CDC synchronize DDL statements?
My requirement is to synchronize a MySQL table through Flink, and when the table structure changes (for example a new column is added), that change must be synchronized as well. Is there any way to achieve this?

A question: is this connector ready for production use? Mainly regarding Debezium snapshots: does FlinkDatabaseHistory impose any memory-size requirements on the machine?

Suppose that in production a single MySQL table is large, say 1 billion rows, and keeps growing.
FlinkDatabaseHistory uses a ConcurrentLinkedQueue records field to hold the snapshot data.
When the snapshot data gets large, is there a risk that the job cannot run, or are there special machine requirements?

public class FlinkDatabaseHistory extends AbstractDatabaseHistory {

    public static final String DATABASE_HISTORY_INSTANCE_NAME = "database.history.instance.name";

    /**
     * We will synchronize the records into Flink's state during snapshot.
     * We have to use a global variable to communicate with Flink's source function,
     * because Debezium will construct the instance of {@link DatabaseHistory} itself.
     * Maybe we can improve this in the future.
     *
     * <p>NOTE: we just use Flink's state as a durable persistent storage as a replacement of
     * {@link FileDatabaseHistory} and {@link KafkaDatabaseHistory}. It doesn't need to guarantee
     * the exactly-once semantic for the history records. The history records shouldn't be super
     * large, because we only monitor the schema changes for one single table.
     *
     * @see com.alibaba.ververica.cdc.debezium.DebeziumSourceFunction#snapshotState(FunctionSnapshotContext)
     */
    public static final Map<String, ConcurrentLinkedQueue<HistoryRecord>> ALL_RECORDS = new HashMap<>();

    private ConcurrentLinkedQueue<HistoryRecord> records;
    private String instanceName;

    ………………

}

Why is the debezium.snapshot.mode option not supported when creating a table with the MySQL CDC Connector?

Why is the debezium.snapshot.mode option not supported when creating a table with the MySQL CDC Connector?
Is there anything else that needs to be configured?

String ddl = "CREATE TABLE datagen ( \n" +
        "  id INT\n" +
        ") WITH ( \n" +
        "  'connector' = 'mysql-cdc',\n" +
        "  'hostname' = '',\n" +
        "  'port' = '3306',\n" +
        "  'username' = '',\n" +
        "  'password' = '*****',\n" +
        "  'database-name' = '****',\n" +
        "  'debezium.snapshot.locking.mode' = 'schema_only',\n" +
        "  'table-name' = 'mysql_flink_source'\n" +
        ")";
tEnv.executeSql(ddl);

The exception:
Exception in thread "main" org.apache.flink.table.api.ValidationException: Unable to create a source for reading table 'default_catalog.default_database.datagen'.

Table options are:

'connector'='mysql-cdc'
'database-name'=''
'debezium.snapshot.locking.mode'='schema_only'
'hostname'=''
'password'=''
'port'='3306'
'table-name'='mysql_flink_source'
'username'=''
at org.apache.flink.table.factories.FactoryUtil.createTableSource(FactoryUtil.java:125)
at org.apache.flink.table.planner.plan.schema.CatalogSourceTable.buildTableScan(CatalogSourceTable.scala:135)
at org.apache.flink.table.planner.plan.schema.CatalogSourceTable.toRel(CatalogSourceTable.scala:78)
at org.apache.calcite.sql2rel.SqlToRelConverter.toRel(SqlToRelConverter.java:3492)
at org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2415)
at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2102)
at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2051)
at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:661)
at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:642)
at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3345)
at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:568)
at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$rel(FlinkPlannerImpl.scala:164)
at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:151)
at org.apache.flink.table.planner.operations.SqlToOperationConverter.toQueryOperation(SqlToOperationConverter.java:774)
at org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlQuery(SqlToOperationConverter.java:746)
at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:236)
at org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlInsert(SqlToOperationConverter.java:525)
at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:202)
at org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:684)
at cn.sprucetec.realtime.driver.TestMysqlCDC.main(TestMysqlCDC.java:70)
Caused by: org.apache.flink.table.api.ValidationException: Unsupported options found for connector 'mysql-cdc'.

Unsupported options:

debezium.snapshot.locking.mode

Supported options:

connector
database-name
hostname
password
port
property-version
server-id
table-name
username
at org.apache.flink.table.factories.FactoryUtil$TableFactoryHelper.validate(FactoryUtil.java:487)
at com.alibaba.ververica.cdc.connectors.mysql.table.MySQLTableSourceFactory.createDynamicTableSource(MySQLTableSourceFactory.java:81)
at org.apache.flink.table.factories.FactoryUtil.createTableSource(FactoryUtil.java:122)
... 20 more
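
As a point of comparison, the DataStream builder shown earlier on this page does accept arbitrary Debezium settings through debeziumProperties, so one possible sketch of passing a snapshot mode that way looks like the code below. The property names and values are Debezium options to verify against its documentation, and the connection settings are placeholders.

import com.alibaba.ververica.cdc.connectors.mysql.MySQLSource;
import com.alibaba.ververica.cdc.debezium.StringDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.Properties;

public class SnapshotModeSketch {
    public static void main(String[] args) throws Exception {
        // Sketch only: pass Debezium snapshot options via debeziumProperties,
        // since this SQL connector version rejects 'debezium.*' table options.
        Properties props = new Properties();
        props.setProperty("snapshot.mode", "schema_only");      // assumed Debezium option value
        props.setProperty("snapshot.locking.mode", "none");     // assumed Debezium option value

        SourceFunction<String> source = MySQLSource.<String>builder()
                .hostname("localhost")
                .port(3306)
                .databaseList("test")
                .tableList("test.mysql_flink_source")
                .username("root")
                .password("secret")
                .debeziumProperties(props)
                .deserializer(new StringDebeziumDeserializationSchema())
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(source).print().setParallelism(1);
        env.execute();
    }
}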

table-name regex matching issue

I set table-name to user_* to monitor all tables with the user_ prefix, but the sink table stays empty.
Checking the taskmanager log shows:
2020-08-31 14:22:06,854 INFO io.debezium.connector.mysql.SnapshotReader [] - 'dashboard.user_00' is not added among known tables
2020-08-31 14:22:06,854 INFO io.debezium.connector.mysql.SnapshotReader [] - 'dashboard.user_00' is filtered out of capturing
2020-08-31 14:22:06,854 INFO io.debezium.connector.mysql.SnapshotReader [] - 'dashboard.user_01' is not added among known tables
2020-08-31 14:22:06,854 INFO io.debezium.connector.mysql.SnapshotReader [] - 'dashboard.user_01' is filtered out of capturing

After changing to 'table-name' = 'user_[0-9][0-9]', the data sinks into the table correctly.
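
The table-name option is evaluated as a Java regular expression rather than a shell-style glob, so 'user_*' means "user followed by zero or more underscores" and matches none of the shards, while 'user_[0-9][0-9]' (or 'user_.*') matches them all. A minimal sketch, with hypothetical columns and placeholder connection settings:

-- Sketch: one Flink source table capturing all user_00 ... user_99 shards via a regex table-name.
-- The column list, credentials and database name are placeholders for illustration.
CREATE TABLE user_all (
  id BIGINT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'root',
  'password' = 'password',
  'database-name' = 'dashboard',
  'table-name' = 'user_[0-9][0-9]'
);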

Real-time CDC: The binlog event does not contain expected number of columns

Reading the historical data works fine, but when new data is inserted, the following error is thrown:
Caused by: org.apache.kafka.connect.errors.ConnectException: The binlog event does not contain expected number of columns; the internal schema representation is probably out of sync with the real database schema, or the binlog contains events recorded with binlog_row_image other than FULL or the table in question is an NDB table
It says the column count does not match, but I did not alter the table structure.

kafka changelog-json: no data is written

Following the tutorial at https://github.com/ververica/flink-cdc-connectors/wiki/%E4%B8%AD%E6%96%87%E6%95%99%E7%A8%8B, I got to step 7:
--Flink SQL
CREATE TABLE kafka_gmv (
day_str STRING,
gmv DECIMAL(10, 5)
) WITH (
'connector' = 'kafka',
'topic' = 'kafka_gmv',
'scan.startup.mode' = 'earliest-offset',
'properties.bootstrap.servers' = 'localhost:9092',
'format' = 'changelog-json'
);

INSERT INTO kafka_gmv
SELECT DATE_FORMAT(order_date, 'yyyy-MM-dd') as day_str, SUM(price) as gmv
FROM orders
WHERE order_status = true
GROUP BY DATE_FORMAT(order_date, 'yyyy-MM-dd');

-- Read the changelog data from Kafka and observe the materialized result
SELECT * FROM kafka_gmv;

The topic was created in Kafka, but no messages ever arrive. However, running the following query on its own
SELECT DATE_FORMAT(order_date, 'yyyy-MM-dd') as day_str, SUM(price) as gmv
FROM orders
WHERE order_status = true
GROUP BY DATE_FORMAT(order_date, 'yyyy-MM-dd');

does produce results.

Support DATE_FORMAT(Timestamp,'yyyyMMdd')

When the field's type is DATETIME in MySQL everything works, but when the field's type is TIMESTAMP I get this:
Caused by: java.lang.IllegalArgumentException: Unable to convert to LocalDateTime from unexpected value '2018-09-27T09:42:12Z' of type java.lang.String

java.lang.LinkageError: loader constraint violation: loader

java.lang.LinkageError: loader constraint violation: loader (instance of org/apache/flink/util/ChildFirstClassLoader) previously initiated loading for a different type with name "javax/ws/rs/core/MediaType"
The above error is thrown at startup. Is it related to the jar loading order? My other Flink programs run fine; this started after I put flink-sql-connector-mysql-cdc-1.1.0.jar under lib.

In Flink SQL, registering the MySQL table reports [ERROR] Unknown or invalid SQL statement.

System environment: macOS 10.14.3
Runtime environment: the docker environment provided by https://github.com/ververica/flink-cdc-connectors/wiki/中文教程
Steps:
  1. After docker-compose up -d, ps confirms all containers are Up.
  2. Following the tutorial, added data inside the mysql and postgres containers, and connected to both from local Navicat to confirm the connections work and the data is visible.
  3. Upgraded local Flink to 1.11.1, started the Flink cluster, and entered the sql-client.

Symptom: registering the first MySQL table from the tutorial reports [ERROR] Unknown or invalid SQL statement.

Client does not support authentication protocol requested by server; consider upgrading MySQL client

After testing different MySQL versions: 5.7 runs fine, 8.x has the problem.

The test setup is below:

  1. zeppelin-0.9.0-preview2
  2. flink-1.11.1-bin-scala_2.11
FLINK_HOME/lib/flink-sql-connector-mysql-cdc-1.1.0.jar
FLINK_HOME/lib/flink-sql-connector-postgres-cdc-1.1.0.jar
version: '3'
services:
  mysql:
    image: mysql:8.0.21
    ports:
      - 3306:3306

  postgres:
    image: postgres:9.6.19
    ports:
      - 5432:5432
%flink.ssql

DROP TABLE IF EXISTS products;

CREATE TABLE products (
  id INT,
  name STRING,
  description STRING
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'root',
  'password' = 'password',
  'database-name' = 'test',
  'table-name' = 'products'
);

DROP TABLE IF EXISTS orders;

CREATE TABLE orders (
  order_id INT,
  order_date TIMESTAMP(0),
  customer_name STRING,
  price DECIMAL(10, 5),
  product_id INT,
  order_status BOOLEAN
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'root',
  'password' = 'password',
  'database-name' = 'test',
  'table-name' = 'orders'
);

DROP TABLE IF EXISTS shipments;

CREATE TABLE shipments (
  shipment_id INT,
  order_id INT,
  origin STRING,
  destination STRING,
  is_arrived BOOLEAN
) WITH (
  'connector' = 'postgres-cdc',
  'hostname' = 'localhost',
  'port' = '5432',
  'username' = 'root',
  'password' = 'password',
  'database-name' = 'test',
  'schema-name' = 'public',
  'table-name' = 'shipments'
);
%flink.ssql(type=update)

SELECT o.* , p.name, p.description
FROM orders AS o
LEFT JOIN products AS p ON o.product_id = p.id
Caused by: org.apache.kafka.connect.errors.ConnectException: Failed to authenticate to the MySQL database at localhost:3306 with user 'root'
	at io.debezium.connector.mysql.BinlogReader.doStart(BinlogReader.java:441)
	at io.debezium.connector.mysql.AbstractReader.start(AbstractReader.java:116)
	at io.debezium.connector.mysql.ChainedReader.startNextReader(ChainedReader.java:206)
	at io.debezium.connector.mysql.ChainedReader.readerCompletedPolling(ChainedReader.java:158)
	at io.debezium.connector.mysql.AbstractReader.cleanupResources(AbstractReader.java:309)
	at io.debezium.connector.mysql.AbstractReader.poll(AbstractReader.java:288)
	at io.debezium.connector.mysql.ChainedReader.poll(ChainedReader.java:146)
	at io.debezium.connector.mysql.MySqlConnectorTask.doPoll(MySqlConnectorTask.java:443)
	at io.debezium.connector.common.BaseSourceTask.poll(BaseSourceTask.java:131)
	at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:779)
	at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:170)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.github.shyiko.mysql.binlog.network.AuthenticationException: Client does not support authentication protocol requested by server; consider upgrading MySQL client
	at com.github.shyiko.mysql.binlog.BinaryLogClient.authenticate(BinaryLogClient.java:728)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:515)
	at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:860)
	... 1 more
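
MySQL 8.0 switched the default authentication plugin to caching_sha2_password, which older versions of the bundled binlog client do not handle, hence the failure only on 8.x. A commonly used workaround is to switch the CDC account back to mysql_native_password; a sketch, with the account and password as placeholders:

-- Sketch: let the CDC user authenticate with the legacy plugin the binlog client understands.
-- Replace 'root'@'%' and 'password' with the real account and secret.
ALTER USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'password';
FLUSH PRIVILEGES;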

When tables are added to one job at different times, how can the newly added table read from the beginning while the previously added table continues from its current position?

Synchronizing data with Flink via the API.

The requirement is as follows:

  1. The first run synchronizes table A:
MySQLSource.<String>builder().tableList("A")
  2. Then the Flink job is stopped and table B is added:
MySQLSource.<String>builder().tableList("A,B")
  3. Flink is restarted with run -s <checkpoint>, so that
     table A resumes from its previous state while table B is synchronized from the beginning.

Current behavior: table A resumes from where it left off, but as soon as table B's data changes, the full data of both A and B is synchronized again.

Question: could this connector put I/O pressure on the MySQL server in production? Does it support MySQL master/slave failover?

Question 1: binlog replication. With this connector (mysql-cdc), every table defined as a source opens its own connection to read the binlog. That feels a bit wasteful; once a job needs many tables, the I/O pressure on MySQL is likely to become significant.

For example, I define two tables that both connect to the same MySQL server:

-- source table: test
CREATE TABLE test (
  `id` INT,
  `name` VARCHAR(255),
  `time` TIMESTAMP(3),
  `status` INT,
  PRIMARY KEY(id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'root',
  'password' = '1',
  'database-name' = 'ai_ask',
  'table-name' = 'test'
);

-- source table: status
CREATE TABLE status (
id INT,
name VARCHAR(255),
PRIMARY KEY(id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = 'localhost',
'port' = '3306',
'username' = 'root',
'password' = '1',
'database-name' = 'ai_task',
'table-name' = 'status'
);

On the MySQL server side this produces two threads/connections doing binlog I/O, which doesn't feel very scalable. For the same MySQL instance, wouldn't a single binlog I/O thread be enough? Is there any usage recommendation to work around this potential I/O problem, or does Debezium already have an option to merge the connections?

Question 2: is MySQL server master/slave failover supported?
After a MySQL instance fails over from the master to a replica, can this CDC connector translate the binlog position via GTID?

@wuchong Any guidance would be appreciated.
