
flinkcdc's Introduction

A data-capture example based on Apache Flink and the flink-cdc-connectors project

This project aims to make full-snapshot plus incremental capture of data from databases such as MySQL and Oracle as easy as writing SQL, with no need to install separate capture tools such as Canal or OGG. Because of the Flink version currently used by DTStack (flink 1.12.7), the packaged jar cannot run on DTStack.

Supported databases

Source-side configuration and data types

Connector: mysql-cdc
  • MySQL: 5.6, 5.7, 8.0.x
  • RDS MySQL: 5.6, 5.7, 8.0.x
  • PolarDB MySQL: 5.6, 5.7, 8.0.x
  • Aurora MySQL: 5.6, 5.7, 8.0.x
  • MariaDB: 10.x
  • PolarDB X: 2.0.1
  • JDBC Driver: 8.0.21

Connector: oracle-cdc
  • Oracle: 11, 12, 19
  • Oracle Driver: 19.3.0.0

Connector: sqlserver-cdc
  • SQL Server: 2012, 2014, 2016, 2017, 2019
  • JDBC Driver: 7.2.2.jre8

Sink-side configuration and data types (currently only Kafka is supported)

Connector: kafka
  • Format: debezium-json (Debezium JSON data format)
  • Format: canal-json (Canal JSON data format)
  • Format: maxwell-json (Maxwell JSON data format)
  • Format: changelog-json (changelog JSON data format):
    {"data":{},"op":"+I"}
    {"data":{},"op":"-U"}
    {"data":{},"op":"+U"}
    {"data":{},"op":"-D"}
  • Format: ogg-json (OGG JSON data format)
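The four changelog-json op codes correspond to insert (+I), retract old update image (-U), emit new update image (+U), and delete (-D). As a minimal sketch of what these codes mean to a downstream consumer (plain Python; the record values are hypothetical, not produced by this project), a keyed consumer could fold such a stream into the current table state like this:

```python
import json

# Hypothetical changelog-json records, keyed by primary key "id".
stream = [
    '{"data":{"id":1,"name":"a"},"op":"+I"}',   # insert row 1
    '{"data":{"id":2,"name":"b"},"op":"+I"}',   # insert row 2
    '{"data":{"id":1,"name":"a"},"op":"-U"}',   # retract old image of row 1
    '{"data":{"id":1,"name":"a2"},"op":"+U"}',  # emit new image of row 1
    '{"data":{"id":2,"name":"b"},"op":"-D"}',   # delete row 2
]

state = {}
for line in stream:
    record = json.loads(line)
    row, op = record["data"], record["op"]
    if op in ("+I", "+U"):       # upsert the new row image
        state[row["id"]] = row
    elif op == "-D":             # remove the deleted row
        state.pop(row["id"], None)
    # "-U" (retraction of the old image) needs no action in a keyed upsert view

print(state)  # {1: {'id': 1, 'name': 'a2'}}
```

In a keyed view the -U record is redundant, but retract-style consumers that aggregate without a key need it to subtract the old image before adding the new one.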

Usage

1. Download the project locally, then compile and package it.

2. Edit the SQL file; sample contents:

    -- Job configuration (parallelism, fault tolerance, state backend, etc.); see:
    -- https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/
    -- https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/
    set pipeline.name = mysql-kafka;
    set table.exec.resource.default-parallelism = 1;
    
    -- Source-side configuration and data types: see the table above
    CREATE TABLE source
    (
        id   INT,
        name STRING,
        PRIMARY KEY (id) NOT ENFORCED
    ) WITH (
          'connector' = 'mysql-cdc',
          'hostname' = 'localhost',
          'port' = '3306',
          'username' = 'root',
          'password' = 'root',
          'database-name' = 'test',
          'table-name' = 'out_cdc');
    
    -- Sink-side configuration and data types: see the table above
    CREATE TABLE sink
    (
        id   INT,
        name STRING
    ) WITH (
          'connector' = 'kafka',
          'topic' = 'chuixue',
          'properties.bootstrap.servers' = 'localhost:9092',
          'format' = 'debezium-json');
    
    -- Run the query
    insert into sink
    select *
    from source;
    

3. Run on the server. Pick the jar that matches your source and sink; for example, to capture MySQL into Kafka, choose mysql-kafka-1.13.5-2.2.1.jar and run the following command:

    java -cp mysql-kafka-1.13.5-2.2.1.jar org.flinkcdc.core.Main <path-to-sql-file>
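With 'format' = 'debezium-json' on the sink, each record written to the Kafka topic carries a Debezium-style envelope with "before"/"after" row images and an op code ("c" create, "u" update, "d" delete). A minimal sketch of reading one such record on the consumer side (plain Python; the payload is a hypothetical example, not captured output):

```python
import json

# A hypothetical debezium-json record as the Kafka sink would emit it:
# an update carries both the old ("before") and new ("after") row images.
raw = '{"before":{"id":1,"name":"a"},"after":{"id":1,"name":"a2"},"op":"u"}'

record = json.loads(raw)
if record["op"] == "c":        # insert: only "after" is populated
    print("insert", record["after"])
elif record["op"] == "u":      # update: both images are populated
    print("update", record["before"], "->", record["after"])
elif record["op"] == "d":      # delete: only "before" is populated
    print("delete", record["before"])
```

To spot-check the job end to end, a consumer such as Kafka's own console consumer pointed at the configured topic ('chuixue' in the sample SQL) should show records of this shape after rows change in the source table.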
    

Roadmap

1. Support capture from more databases

