
sxfad / porter

Porter is a data synchronization middleware, mainly used to solve table-level data synchronization between homogeneous/heterogeneous databases.

Home Page: https://open.vbill.cn

Java 99.68% Shell 0.23% Batchfile 0.09%

porter's People

Contributors: himrzhang, matieli, monsterhx, murasakiseifu, wszghj, zhangkewei


porter's Issues

porter-boot operations interface

porter-boot task execution nodes should expose an HTTP endpoint for checking task status:

{
  "healthLevel": "GREEN",          // health status
  "workMode": "ZOOKEEPER",         // work mode: ZOOKEEPER or STANDALONE
  "forceAssign": true,             // whether tasks/nodes are force-started
  "workLimit": 1000,               // maximum number of sync tasks this node supports
  "consumeProcess": {
    // backlog status of each sync task
  },
  "uploadStatistic": true,         // whether statistics are uploaded
  "consumerIdle": {
    // idle time of each sync task
  },
  "dnodeSnapshot": {
    "healthLevel": "health level",
    "hostName": "host name",
    "address": "127.0.0.1",
    "heartbeat": "last heartbeat time",
    "processId": "process id",
    "healthLevelDesc": "",
    "nodeId": "node id",
    "tasks": {
      // pre-assigned tasks
    },
    "status": "WORKING"            // node work status
  },
  "workUsed": 5,                   // number of tasks currently assigned
  "healthLevelDesc": "",
  "nodeId": "node ID",
  "status": "WORKING"              // node work status
}
The endpoint should also return the list of abnormal tasks, which can feed Zabbix monitoring alerts.
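As a sketch of how a monitor might consume such a status payload (the field name follows the JSON above; the helper class itself is hypothetical, and a real monitor would fetch the body over HTTP, e.g. with java.net.http.HttpClient, and alert when the level is not GREEN):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical monitoring helper: extracts the "healthLevel" field from the
// node status JSON shown above without pulling in a full JSON library.
public class HealthCheck {
    private static final Pattern HEALTH_LEVEL =
            Pattern.compile("\"healthLevel\"\\s*:\\s*\"([^\"]+)\"");

    public static String extractHealthLevel(String json) {
        Matcher m = HEALTH_LEVEL.matcher(json);
        return m.find() ? m.group(1) : "UNKNOWN";
    }

    public static void main(String[] args) {
        String body = "{\"healthLevel\": \"GREEN\", \"workMode\": \"ZOOKEEPER\"}";
        System.out.println(extractHealthLevel(body)); // prints GREEN
    }
}
```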

ZKClusterNodeListener event handling issue in porter

package cn.vbill.middleware.porter.cluster.zookeeper;

// dispatch command
if (NODE_ORDER_PATTERN.matcher(zkEvent.getPath()).matches()) {
    NodeCommandConfig commandConfig = JSONObject.parseObject(zkEvent.getData(), NodeCommandConfig.class);
    ...
    // delete the command node
    client.delete(zkEvent.getPath());
}
After this listener handles the event and deletes the command node, the listener fires again, this time with zkEvent.getData() == null, producing:
cluster node listener -> /porter/node/1/order/fd41270e-622e-4048-b73a-4b0e1750b0c6,null,OFFLINE
java.lang.NullPointerException: null

Should this be changed to:

// dispatch command
if (NODE_ORDER_PATTERN.matcher(zkEvent.getPath()).matches() && zkEvent.isOnline()) {
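A defensive variant of the check above can be sketched in isolation (the path pattern here is inferred from the log line and is an assumption; the real fix depends on the ZkEvent API):

```java
import java.util.regex.Pattern;

// Sketch of the defensive check discussed above: ignore notifications whose
// data payload is null (e.g. the notification fired after the command node
// was already removed), so the JSON parse can never hit a null payload.
public class NodeOrderHandler {
    // Assumed shape of the command-node path, taken from the log line above.
    private static final Pattern NODE_ORDER_PATTERN =
            Pattern.compile("/porter/node/\\d+/order/.+");

    /** Returns true only when the event should be processed as a command. */
    public static boolean shouldHandle(String path, String data) {
        // null data means the node was deleted between the watch firing and
        // the read; parsing it would throw the NullPointerException seen above
        return data != null && NODE_ORDER_PATTERN.matcher(path).matches();
    }
}
```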

Intermittent column mapping failures during sync


In one sync task, the columns of one table occasionally fail to map correctly. I checked the binlog and the column order there matches the table definition, yet during sync the values sometimes land in the wrong columns. Hoping someone can explain.

How to configure MySQL-to-relational-database synchronization

The configuration format below is for the admin console under "Sync Management -> Advanced Task Configuration (formerly: Local Tasks) -> Add".
For a local task configuration file, prefix every key with "porter.task[task index, starting from 0]".

taskId=task ID
nodeId=node1,node2,node3
consumer.consumerName=CanalFetch
consumer.converter=canalRow
consumer.source.sourceType=CANAL
consumer.includes=databaseName.tableName,databaseName.tableName
consumer.source.filter=table-name filter regex
consumer.source.database=database name
consumer.source.password=password
consumer.source.address=ip:3306
consumer.source.username=username


loader.loaderName=JdbcBatch
loader.source.sourceType=JDBC
loader.source.dbType=one of: MYSQL, ORACLE
loader.source.url=jdbc:mysql://127.0.0.1:3306/databaseName?useUnicode=true&characterEncoding=utf8
loader.source.userName=username
loader.source.password=password
loader.source.maxWait=60000
loader.source.minPoolSize=10
loader.source.maxPoolSize=50
loader.source.initialPoolSize=20
loader.source.connectionErrorRetryAttempts=3
loader.insertOnUpdateError=false

mapper[0].schema=source schema,target schema (for MySQL, the database name)
mapper[0].table=source table,target table
mapper[0].column.sourceColumnName=targetColumnName

mapper[0].forceMatched=false
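For illustration only, a filled-in local-task version of the template above might look like the fragment below. All IDs, hosts, database names, and credentials here are made up; note the porter.task[0] prefix required for local task files:

```properties
porter.task[0].taskId=100
porter.task[0].nodeId=1
porter.task[0].consumer.consumerName=CanalFetch
porter.task[0].consumer.converter=canalRow
porter.task[0].consumer.source.sourceType=CANAL
porter.task[0].consumer.includes=demo_db.t_user
porter.task[0].consumer.source.database=demo_db
porter.task[0].consumer.source.address=127.0.0.1:3306
porter.task[0].consumer.source.username=canal
porter.task[0].consumer.source.password=canal

porter.task[0].loader.loaderName=JdbcBatch
porter.task[0].loader.source.sourceType=JDBC
porter.task[0].loader.source.dbType=MYSQL
porter.task[0].loader.source.url=jdbc:mysql://127.0.0.1:3306/target_db?useUnicode=true&characterEncoding=utf8
porter.task[0].loader.source.userName=porter
porter.task[0].loader.source.password=porter

porter.task[0].mapper[0].schema=demo_db,target_db
porter.task[0].mapper[0].table=t_user,t_user
```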

How to develop a custom data processing plugin

Suppose we want to sync the MySQL table T_USER to the Oracle table T_USER_2 on the target side, where the two table structures are identical, and we only want to keep user rows whose FLAG column equals 0.

With the requirement defined, we implement the EventProcessor interface to do the custom filtering:

package cn.vbill.middleware.porter.plugin;

import java.util.List;
import java.util.stream.Collectors;

public class UserFilter implements cn.vbill.middleware.porter.core.event.s.EventProcessor {
    @Override
    public void process(ETLBucket etlBucket) {
        List<ETLRow> rows = etlBucket.getRows().stream().filter(r -> {
            // step 1: only consider rows from table T_USER
            boolean tableMatch = r.getFinalTable().equalsIgnoreCase("T_USER");
            if (!tableMatch) {
                return false;
            }
            // step 2: match rows whose FLAG column value is not "0"
            boolean columnMatch = r.getColumns().stream().anyMatch(c ->
                    c.getFinalName().equalsIgnoreCase("FLAG")
                    && (null == c.getFinalValue() || !c.getFinalValue().equals("0")));
            return columnMatch;
        }).collect(Collectors.toList());
        // step 3: remove the rows that do not meet the condition
        etlBucket.getRows().removeAll(rows);
    }
}

Specify the custom data processing plugin in the task configuration:

The configuration format below is for the admin console under "Sync Management -> Advanced Task Configuration (formerly: Local Tasks) -> Add".
For a local task configuration file, prefix every key with "porter.task[task index, starting from 0]".

taskId=task ID
nodeId=node1,node2,node3
consumer.consumerName=CanalFetch
consumer.converter=canalRow
consumer.source.sourceType=CANAL
consumer.source.slaveId=0
consumer.source.address=127.0.0.1:3306
consumer.source.database=database name
consumer.source.username=username
consumer.source.password=password
consumer.source.filter=.*\.t_user
consumer.eventProcessor.className=cn.vbill.middleware.porter.plugin.UserFilter
consumer.eventProcessor.content=/path/UserFilter.class (a .class file, .jar package, or .java source)

loader.loaderName=JdbcBatch #target-side loader plugin
loader.source.sourceType=JDBC
loader.source.dbType=ORACLE
loader.source.url=jdbc:oracle:thin:@//127.0.0.1:1521/oracledb
loader.source.userName=demo
loader.source.password=demo

mapper[0].auto=false
mapper[0].table=T_USER,T_USER_2

Data is not being synchronized

Environment:
task configuration:
taskId=102
nodeId=7e0ec95b-a870-4bc3-93e7-d57759daa789
consumer.consumerName=JdbcFetch
consumer.source.clientType=JDBCConsume
consumer.source.url=jdbc:mysql://192.168.1.144:3307/ds_data?useUnicode=true&characterEncoding=utf8&useSSL=false
consumer.source.userName=root
consumer.source.password=123456
consumer.source.table.0.table=ds_data.t_test
consumer.source.table.0.incrementColumn=id
consumer.source.table.0.timestampColumn=ts
consumer.source.table.0.timestampColumnCast=unix_timestamp(ts)*1000

loader.loaderName=JdbcMultiThread
#loader.loaderName=JdbcBatch
loader.source.clientType=JDBC
loader.source.url=jdbc:mysql://192.168.1.144:3307/target_data?useUnicode=true&characterEncoding=utf8&useSSL=false
loader.source.userName=root
loader.source.password=123456

mapper[0].schema=ds_data,target_data
mapper[0].table=t_test,t_test

Logs:

2019-06-27 17:22:13.217 INFO 12834 --- [TaskWork-[taskId:102]-[consumer:ds_data.t_test]-main] c.v.m.porter.task.worker.TaskWork : pulling task status from the registry [102-ds_data.t_test]
2019-06-27 17:22:13.220 INFO 12834 --- [TaskWork-[taskId:102]-[consumer:ds_data.t_test]-main] c.v.m.porter.task.worker.TaskWork : starting StageJob [102-ds_data.t_test]
2019-06-27 17:22:13.220 INFO 12834 --- [TaskWork-[taskId:102]-[consumer:ds_data.t_test]-main] c.v.m.p.common.client.AbstractClient : starting
2019-06-27 17:22:13.284 INFO 12834 --- [TaskWork-[taskId:102]-[consumer:ds_data.t_test]-main] c.v.m.p.common.client.AbstractClient : already start!
2019-06-27 17:22:13.286 INFO 12834 --- [TaskWork-[taskId:102]-[consumer:ds_data.t_test]-main] c.v.m.p.common.client.AbstractClient : starting
2019-06-27 17:22:13.287 INFO 12834 --- [TaskWork-[taskId:102]-[consumer:ds_data.t_test]-main] c.v.m.p.common.client.AbstractClient : already start!
2019-06-27 17:22:13.288 INFO 12834 --- [TaskWork-[taskId:102]-[consumer:ds_data.t_test]-main] c.v.m.porter.task.worker.TaskWork : fetching the last sync point of task consume lane [102-ds_data.t_test]
2019-06-27 17:22:13.290 INFO 12834 --- [TaskWork-[taskId:102]-[consumer:ds_data.t_test]-main] c.v.m.porter.task.worker.TaskWork : got the last sync point of task consume lane [102-ds_data.t_test] -> , notifying SelectJob
2019-06-27 17:22:13.309 INFO 12834 --- [TaskWork-[taskId:102]-[consumer:ds_data.t_test]-main] c.v.m.porter.task.worker.TaskWork : computed the final sync point of task consume lane [102-ds_data.t_test] -> [{"incrementColumn":"ID","table":"DS_DATA.T_TEST","timestampColumn":"TS"}], notifying SelectJob
2019-06-27 17:22:13.313 INFO 12834 --- [TaskWork-[taskId:102]-[consumer:ds_data.t_test]-main] c.v.m.porter.task.worker.TaskWork : task semaphore [102-ds_data.t_test]: TRANSMIT
2019-06-27 17:22:13.446 INFO 12834 --- [suixingpay-13-JdbcConsumeClient-fetch-pool-0] com.alibaba.druid.pool.DruidDataSource : {dataSource-1} inited
2019-06-27 17:22:13.861 INFO 12834 --- [suixingpay-13-JdbcConsumeClient-fetch-pool-0] c.v.m.p.p.c.jdbc.client.JdbcClient : schema:ds_data,table:t_test,detail:{"columns":[{"name":"name","primaryKey":false,"required":false,"typeCode":12},{"name":"id","primaryKey":true,"required":true,"typeCode":4},{"name":"ts","primaryKey":false,"required":false,"typeCode":93}],"noPrimaryKey":false,"schemaName":"ds_data","tableName":"t_test"}

Can message loss be monitored?

1. Is there a good way to monitor lost update-type messages?
2. Does "querying for data records that have not changed recently" require manual work? If not, how is that query performed?
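One generic reconciliation approach, independent of porter's own APIs (a sketch; the helper class and its inputs are hypothetical): periodically load id -> last-modified timestamps from the source and target tables and flag rows whose target copy is missing or older.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of a reconciliation check for lost updates: given id -> last-modified
// millis maps read from the source and target tables (e.g. via JDBC), return
// the ids whose target row is missing or older than the source row. Running
// this on a schedule and alerting on a non-empty result is one way to detect
// lost update messages without manual checks.
public class SyncReconciler {
    public static List<Long> findStaleRows(Map<Long, Long> source, Map<Long, Long> target) {
        List<Long> stale = new ArrayList<>();
        for (Map.Entry<Long, Long> e : source.entrySet()) {
            Long targetTs = target.get(e.getKey());
            if (targetTs == null || targetTs < e.getValue()) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }
}
```

The trade-off is one extra full or incremental scan of both tables per run, so in practice the scan is usually bounded to rows modified in a recent window.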

nodejs: how to fix "Error: Cannot find module 'xxxx'"

If you hit the error below while packaging, fix it with:
cd porter-ui
npm install xxxx (the missing module name)

> Task :manager:manager-boot:yarn_install
yarn install v1.15.2
[1/4] Resolving packages...
success Already up-to-date.
Done in 0.90s.

> Task :manager:manager-boot:buildPorterUI FAILED
yarn node v1.15.2
internal/modules/cjs/loader.js:584
    throw err;
    ^

Error: Cannot find module 'ora'
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:582:15)
    at Function.Module._load (internal/modules/cjs/loader.js:508:25)
    at Module.require (internal/modules/cjs/loader.js:637:17)
    at require (internal/modules/cjs/helpers.js:22:18)
    at Object.<anonymous> (/Users/zkevin/Documents/Workspaces/suixingpay-porter/porter-ui/builder/build.js:8:13)
    at Module._compile (internal/modules/cjs/loader.js:701:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:712:10)
    at Module.load (internal/modules/cjs/loader.js:600:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:539:12)
    at Function.Module._load (internal/modules/cjs/loader.js:531:3)
error Command failed.
Exit code: 1
Command: /Users/zkevin/Documents/Workspaces/suixingpay-porter/manager/manager-boot/build/node/node-v10.15.3-darwin-x64/bin/node
Arguments: /Users/zkevin/Documents/Workspaces/suixingpay-porter/porter-ui/builder/build.js
info Visit https://yarnpkg.com/en/docs/cli/node for documentation about this command.
Directory: /Users/zkevin/Documents/Workspaces/suixingpay-porter/porter-ui
Output:

gradle build cannot find build.js during compilation

Error: Cannot find module 'C:\Users\root\git\porter\porter-ui\builder\build.js'
at Function.Module._resolveFilename (internal/modules/cjs/loader.js:582:15)
at Function.Module._load (internal/modules/cjs/loader.js:508:25)
at Function.Module.runMain (internal/modules/cjs/loader.js:754:12)
at startup (internal/bootstrap/node.js:283:19)
at bootstrapNodeJSCore (internal/bootstrap/node.js:622:3)

How to configure MySQL-to-Kafka synchronization

The configuration format below is for the admin console under "Sync Management -> Advanced Task Configuration (formerly: Local Tasks) -> Add".
For a local task configuration file, prefix every key with "porter.task[task index, starting from 0]".

taskId=task ID
nodeId=node ID
consumer.consumerName=CanalFetch
consumer.converter=canalRow
consumer.source.sourceType=CANAL
consumer.includes=databaseName.tableName,databaseName.tableName
consumer.source.filter=table-name filter regex
consumer.source.database=database name
consumer.source.password=password
consumer.source.address=ip:3306
consumer.source.username=username

loader.loaderName=KAFKA_SYNC
loader.source.sourceType=KAFKA_PRODUCE
loader.source.servers=kafka address
loader.source.topic=topic
loader.source.oggJson=true (output in OGG JSON format)
loader.source.partitionKey.databaseName.tableName=partition key column (for Oracle, the database name is the schema name)

standalone模式支持

As of version 2.0.2, porter supports cluster mode through the cn.vbill.middleware.porter.common.cluster.impl.zookeeper.ZookeeperClusterProvider plugin.
To allow more flexible deployment, standalone mode is planned, activated with the configuration parameter node.cluster.strategy=standalone.

DateFormat thread-safety issue

package cn.vbill.middleware.porter.task.alert.alerter;
The ScanDataAlerter class defines a shared DateFormat.
Won't this cause thread-safety problems?
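Yes: SimpleDateFormat is documented as not thread-safe, so sharing one instance across alerter threads can silently corrupt formatted output. Two standard remedies (a general sketch, not ScanDataAlerter's actual code) are a per-thread instance via ThreadLocal, or the immutable java.time.DateTimeFormatter:

```java
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// SimpleDateFormat keeps mutable internal state, so a shared instance must
// not be used concurrently. Both options below are safe under concurrency.
public class SafeDateFormatting {
    // Option 1: one SimpleDateFormat per thread
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));

    public static String formatLegacy(java.util.Date date) {
        return FORMAT.get().format(date);
    }

    // Option 2: DateTimeFormatter is immutable and thread-safe by design
    private static final DateTimeFormatter PATTERN =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    public static String formatModern(LocalDateTime time) {
        return time.format(PATTERN);
    }
}
```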

AlerterFactory thread pool shutdown

In the check method:
CyclicBarrier barrier = new CyclicBarrier(stats.size() + 1);
creates a barrier with stats.size() + 1 parties, but the thread pool is created with only 3 threads:
// create thread pool
int threadSize = 3;
ExecutorService service = Executors.newFixedThreadPool(threadSize);

for (DTaskStat stat : stats) {
    service.submit(new Runnable() {
        @Override
        public void run() {
            // run the check
            try {
                alerter.check(dataConsumer, dataLoader, stat, getCheckMeta(work, stat.getSchema(), stat.getTable()), work.getReceivers());
            } finally {
                try {
                    barrier.await();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    });
}

Once the alert-check tasks are submitted to the pool, the first 3 tasks each block in barrier.await(); since they never finish, the pool never runs the remaining tasks, and the check deadlocks here whenever stats.size() exceeds the pool size.

Error when starting porter-boot after compiling: how to fix?

java.lang.Error: Unresolved compilation problems:
The import org.apache.kafka cannot be resolved
The import org.apache.kafka cannot be resolved
The import org.apache.kafka cannot be resolved
The import org.apache.kafka cannot be resolved
The import org.apache.kafka cannot be resolved
The import org.apache.kafka cannot be resolved
The import org.apache.kafka cannot be resolved
The import org.apache.kafka cannot be resolved
Producer cannot be resolved to a type
PartitionInfo cannot be resolved to a type
ProducerConfig cannot be resolved to a variable
ProducerConfig cannot be resolved to a variable
ProducerConfig cannot be resolved to a variable
StringSerializer cannot be resolved to a type
ProducerConfig cannot be resolved to a variable
StringSerializer cannot be resolved to a type
ProducerConfig cannot be resolved to a variable
ProducerConfig cannot be resolved to a variable
Producer cannot be resolved to a type
KafkaProducer cannot be resolved to a type
PartitionInfo cannot be resolved to a type
Producer cannot be resolved to a type
Producer cannot be resolved to a type
Producer cannot be resolved to a type
Producer cannot be resolved to a type
Producer cannot be resolved to a type
ProducerRecord cannot be resolved to a type
ProducerRecord cannot be resolved to a type
ProducerRecord cannot be resolved to a type
ProducerRecord cannot be resolved to a type
ProducerRecord cannot be resolved to a type
ProducerRecord cannot be resolved to a type
ProducerRecord cannot be resolved to a type
ProducerRecord cannot be resolved to a type
ProducerRecord cannot be resolved to a type
Producer cannot be resolved to a type
RecordMetadata cannot be resolved to a type
ProducerRecord cannot be resolved to a type
Producer cannot be resolved to a type
RecordMetadata cannot be resolved to a type
Producer cannot be resolved to a type
Producer cannot be resolved to a type
PartitionInfo cannot be resolved to a type
PartitionInfo cannot be resolved to a type

at cn.vbill.middleware.porter.common.client.impl.KafkaProduceClient.<init>(KafkaProduceClient.java:30)
at cn.vbill.middleware.porter.common.client.AbstractClient.getClient(AbstractClient.java:165)
at cn.vbill.middleware.porter.common.cluster.impl.zookeeper.ZookeeperClusterProvider.initClient(ZookeeperClusterProvider.java:73)
at cn.vbill.middleware.porter.common.cluster.impl.AbstractClusterProvider.initialize(AbstractClusterProvider.java:237)
at cn.vbill.middleware.porter.common.cluster.impl.AbstractClusterProvider.start(AbstractClusterProvider.java:205)
at cn.vbill.middleware.porter.common.cluster.ClusterProviderProxy.initialize(ClusterProviderProxy.java:56)
at cn.vbill.middleware.porter.boot.NodeBootApplication.main(NodeBootApplication.java:112)

2019-05-26 14:32:00.216 INFO 4236 --- [Thread-4] o.s.s.concurrent.ThreadPoolTaskExecutor : Shutting down ExecutorService 'applicationTaskExecutor'
