
datax's Introduction


DataX


DataX is the open-source version of Alibaba Cloud DataWorks Data Integration, an offline data synchronization tool/platform widely used within Alibaba Group. DataX implements efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, OceanBase, SQL Server, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, DRDS, Databend, and more.

DataX Commercial Edition

Alibaba Cloud DataWorks Data Integration is the DataX team's commercial product on Alibaba Cloud. It is dedicated to fast, stable data movement between a rich set of heterogeneous data sources across complex network environments, and to data synchronization solutions for complex business scenarios. It currently serves nearly 3,000 cloud customers and synchronizes more than 3 trillion records per day. DataWorks Data Integration supports 50+ data sources for offline synchronization and covers whole-database migration, batch cloud migration, incremental synchronization, and sharded databases/tables. Real-time synchronization was added in 2020, supporting arbitrary read/write combinations across 10+ data sources, along with one-click full-plus-incremental synchronization from sources such as MySQL and Oracle into big data engines such as Alibaba Cloud MaxCompute and Hologres.

For the commercial edition, see: https://www.aliyun.com/product/bigdata/ide

Features

As a data synchronization framework, DataX abstracts the synchronization between different data sources into Reader plugins, which read data from a source, and Writer plugins, which write data to a target; in principle, the DataX framework can synchronize data between arbitrary data source types. The plugin system also works as an ecosystem: each newly added data source immediately becomes interoperable with every existing one.
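To make the Reader/Writer pairing concrete, here is a minimal job sketch in the shape DataX jobs take (the stream-to-stream smoke-test pairing; field values are illustrative):

    {
        "job": {
            "setting": { "speed": { "channel": 1 } },
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "column": [ { "type": "string", "value": "hello, DataX" } ],
                            "sliceRecordCount": 10
                        }
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": { "print": true }
                    }
                }
            ]
        }
    }

Swapping either plugin name (say, mysqlreader or hdfswriter) together with its parameter block retargets the same job at a different source or destination; the framework between them is unchanged.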

Detailed Introduction to DataX

See: DataX-Introduction

Quick Start

See: Quick Start
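Once a job JSON is in place, the typical invocation from the bin directory looks like this (the same command appears in the issue reports below):

    python datax.py ../job/job.json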

Supported Data Channels

DataX now has a fairly comprehensive plugin ecosystem: mainstream relational databases, NoSQL stores, and big data computing systems are all supported. The currently supported data sources are listed below; for per-source details, see: DataX数据源参考指南 (DataX Data Source Reference Guide)

Type: Data Sources (each source's Reader/Writer support and documentation links are in the reference guide above)

RDBMS (relational databases): MySQL, Oracle, OceanBase, SQLServer, PostgreSQL, DRDS, Kingbase, generic RDBMS (supports all relational databases)
Alibaba Cloud data warehouse storage: ODPS, ADB, ADS, OSS, OCS, Hologres, AnalyticDB For PostgreSQL
Alibaba Cloud middleware: datahub (read & write), SLS (read & write)
Graph databases: Alibaba Cloud GDB, Neo4j
NoSQL data stores: OTS, Hbase0.94, Hbase1.1, Phoenix4.x, Phoenix5.x, MongoDB, Cassandra
Data warehouse storage: StarRocks, ApacheDoris, ClickHouse, Databend, Hive, kudu, selectdb
Unstructured data storage: TxtFile, FTP, HDFS, Elasticsearch
Time-series databases: OpenTSDB, TSDB, TDengine

Alibaba Cloud DataWorks Data Integration

All of DataX's existing capabilities have been merged into Alibaba Cloud Data Integration, which is more efficient and more secure than DataX and adds advanced features that DataX lacks. Data Integration can be understood as a comprehensively upgraded commercial version of DataX that provides enterprises with stable, reliable, and secure data transfer services. Compared with DataX, its main highlights are:

Support for real-time synchronization

A greatly expanded range of offline data sources

Developing a New Plugin

See: DataX插件开发宝典 (DataX Plugin Development Guide)
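For orientation before reading the guide: each plugin ships a plugin.json descriptor that the engine uses to locate the plugin's implementation class. A hedged sketch of its usual shape (field values illustrative):

    {
        "name": "streamwriter",
        "class": "com.alibaba.datax.plugin.writer.streamwriter.StreamWriter",
        "description": "writes records to stdout, for testing",
        "developer": "alibaba"
    }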

Important Release Notes

DataX plans to iterate with monthly releases going forward, and pull requests from interested contributors are welcome; each month's changes will be described here.

Project Members

Core contributors: 言柏, 枕水, 秋奇, 青砾, 一斅, 云时

Thanks to 天烬, 光戈, 祁然, 巴真, and 静行 for their contributions to DataX.

License

This software is free to use under the Apache License.

Please report issues to us promptly. Go to: DataxIssue

Enterprise Users of Open-Source DataX

(Logo wall of enterprise users.)

We are hiring on an ongoing basis. Contact: [email protected]
[Java Developer Position]
Title: Senior Java Developer / Expert / Senior Expert
Experience: 2+ years
Education: Bachelor's degree (negotiable for sufficiently strong candidates)
Expected level: P6/P7/P8

Responsibilities:
    1. Design and develop the Alibaba Cloud big data platform (数加).
    2. Develop big data products for government and enterprise customers.
    3. Use large-scale machine learning algorithms to mine relationships in data and explore product applications of data mining in real-world scenarios.
    4. One-stop big data development platform
    5. Big data task scheduling engine
    6. Task execution engine
    7. Task monitoring and alerting
    8. Synchronization of massive heterogeneous data

Requirements:
    1. 3+ years of Java web development experience.
    2. Solid grasp of the core Java stack, including the JVM, class loading, threads, concurrency, I/O resource management, and networking.
    3. Proficient with common Java frameworks and quick to evaluate new ones; deep understanding of object orientation, design principles, encapsulation, and abstraction.
    4. Familiar with HTML/HTML5 and JavaScript; familiar with SQL.
    5. Strong execution, teamwork, and professional dedication.
    6. Deep understanding of design patterns and where to apply them is a plus.
    7. Strong analytical and hands-on problem-solving skills and a real passion for technology are preferred.
    8. Hands-on project or product experience with high concurrency, high availability, high performance, or big data processing is preferred.
    9. Experience with big data products, cloud products, or middleware solutions is preferred.

User support:

Our DingTalk group is temporarily affected by platform controls, so please raise questions here as Issues first. The DataX developers and community answer Issues regularly, and as the knowledge base grows it will also help later users.

datax's People

Contributors

asdf2014, binaryworld, caoliang-web, carvinhappy, cch1996, dependabot[bot], dingbo8128, dingxiaobo, fuyouj, hantmac, heljoyliu, hf200012, hffariel, hsbcjone, huolibo, jtchen-study, lsbnbdz, luckypicklezz, mr-kidbk, penglin358, sanchouisacat, sangshuduo, stephenkgu, trafalgarluo, wenshao, wuchase, xudaojie, yifanzheng, yuzhiping, zyyang90



datax's Issues

HdfsWriter fails with an error while writing files

According to DataX's automated analysis, the most likely cause of this task's failure is:
com.alibaba.datax.common.exception.DataXException: Code:[HdfsWriter-04], Description:[An IO exception occurred while writing your configured file.]. - java.io.IOException: Connection reset by peer

  • java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2278)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
    at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:40)
    at com.alibaba.datax.plugin.writer.hdfswriter.HdfsHelper.textFileStartWrite(HdfsHelper.java:317)
    at com.alibaba.datax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:360)
    at com.alibaba.datax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:56)
    at java.lang.Thread.run(Thread.java:745)

Unable to package from IDEA?

1. The run completes, but no package is produced.
2. Packaging from the command line fails with:
[ERROR] Unknown lifecycle phase ".test.skip=true". You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/LifecyclePhaseNotFoundException
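A hedged reading of this error: Maven saw ".test.skip=true" as a goal, which typically happens when the shell splits the -Dmaven.test.skip=true flag apart (a stray space, or PowerShell's argument parsing). Quoting the flag usually avoids it:

    mvn -U clean package assembly:assembly "-Dmaven.test.skip=true"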

Maven build fails

The pom dependency in otsstreamreader looks wrong:

    <dependency>
        <groupId>com.aliyun.openservices</groupId>
        <artifactId>tablestore-streamclient</artifactId>
        <version>1.0.0</version>
    </dependency>

Shouldn't it be:

    <dependency>
        <groupId>com.aliyun.openservices</groupId>
        <artifactId>tablestore-streamclient</artifactId>
        <version>1.0.0-SNAPSHOT</version>
    </dependency>

Presumably Alibaba has its own internal snapshot repository, but this snapshot jar can no longer be found externally.

Local build succeeds, but running the command fails

2018-03-14 09:26:24.390 [job-0] ERROR RetryUtil - Exception when calling callable, exception Msg:Code:[DBUtilErrorCode-10], Description:[Failed to connect to the database. Please check your account, password, database name, IP and port, or ask your DBA for help (mind the network environment).]. - Details: java.sql.SQLException: No suitable driver found for ["jdbc:oracle:thin:@*********"]
com.alibaba.datax.common.exception.DataXException: Code:[DBUtilErrorCode-10], Description:[Failed to connect to the database. Please check your account, password, database name, IP and port, or ask your DBA for help (mind the network environment).]. - Details: java.sql.SQLException: No suitable driver found for

But my username and password are correct.
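One hedged observation from the error text itself: the driver was handed the literal string ["jdbc:oracle:thin:@*********"], brackets and quotes included, which no JDBC driver accepts. In the rdbms readers the URL belongs inside the connection block, where jdbcUrl is a JSON array; a sketch with placeholder host/SID/table (all hypothetical values):

    "connection": [
        {
            "jdbcUrl": ["jdbc:oracle:thin:@your-host:1521:your-sid"],
            "table": ["your_table"]
        }
    ]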

Why are complex data types not supported?

Why don't the mongo and hdfs plugins support objects and arrays of objects? Is it that your company has no such scenarios, or are there pitfalls in implementing it?

Does hbasereader 1.1 support HBase 1.0?

Our production cluster runs hbase1.0-cdh5.5. When I extract only the rowkey the task succeeds, but after adding one column the output file is empty, with no error reported. Could this be a version issue?

elasticsearchwriter is unusable

Running it fails with:
Caused by: java.lang.IllegalArgumentException: Preemptive authentication set without credentials provider
at io.searchbox.client.config.HttpClientConfig$Builder.build(HttpClientConfig.java:301)
at com.alibaba.datax.plugin.writer.elasticsearchwriter.ESClient.createClient(ESClient.java:65)
at com.alibaba.datax.plugin.writer.elasticsearchwriter.ESWriter$Job.prepare(ESWriter.java:49)
at com.alibaba.datax.core.job.JobContainer.prepareJobWriter(JobContainer.java:724)
at com.alibaba.datax.core.job.JobContainer.prepare(JobContainer.java:309)
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:115)
... 3 more
The local Elasticsearch environment has no auth configured.
The relevant job config:
"writer": {
"name": "elasticsearchwriter",
"parameter": {
"endpoint": "http://localhost:9200",
"index": "xxx",
"type": "xxx",
"accessId": "",
"accessKey": "",
"cleanup": true,
"settings": {"index" :{"number_of_shards": 5, "number_of_replicas": 0}},
"discovery": false,
"batchSize": 1000,
"splitter": ",",
……
}
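A hedged note, not a confirmed fix: the trace comes from Jest's HttpClientConfig builder, which throws exactly this when preemptive authentication is enabled without credentials, and the writer appears to turn preemptive auth on whenever accessId/accessKey are present, even as empty strings. Reported workarounds are to drop the two keys entirely for auth-less clusters, or to supply real credentials, e.g.:

    "accessId": "elastic",
    "accessKey": "<real password>"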

mysqlreader OOM on big queries

1. Fix at line 423 of DBUtil.java:

    stmt.setFetchSize(fetchSize);
    // MySQL Connector/J buffers the whole result set in memory unless streaming
    // is enabled (internally equivalent to fetchSize = Integer.MIN_VALUE), so
    // large queries OOM without this:
    if (stmt instanceof com.mysql.jdbc.Statement) {
        ((com.mysql.jdbc.Statement) stmt).enableStreamingResults();
    }

  2. Exception:
    at com.mysql.jdbc.MysqlIO.nextRowFast([Lcom/mysql/jdbc/Field;IZIZZZ)Lcom/mysql/jdbc/ResultSetRow; (MysqlIO.java:2114)
    at com.mysql.jdbc.MysqlIO.nextRow([Lcom/mysql/jdbc/Field;IZIZZZLcom/mysql/jdbc/Buffer;)Lcom/mysql/jdbc/ResultSetRow; (MysqlIO.java:1921)
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(JIIZ[Lcom/mysql/jdbc/Field;)Lcom/mysql/jdbc/RowData; (MysqlIO.java:3278)
    at com.mysql.jdbc.MysqlIO.getResultSet(Lcom/mysql/jdbc/StatementImpl;JIIIZLjava/lang/String;Z[Lcom/mysql/jdbc/Field;)Lcom/mysql/jdbc/ResultSetImpl; (MysqlIO.java:462)
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(Lcom/mysql/jdbc/StatementImpl;IIIZLjava/lang/String;Lcom/mysql/jdbc/Buffer;ZJ[Lcom/mysql/jdbc/Field;)Lcom/mysql/jdbc/ResultSetImpl; (MysqlIO.java:2997)
    at com.mysql.jdbc.MysqlIO.readAllResults(Lcom/mysql/jdbc/StatementImpl;IIIZLjava/lang/String;Lcom/mysql/jdbc/Buffer;ZJ[Lcom/mysql/jdbc/Field;)Lcom/mysql/jdbc/ResultSetImpl; (MysqlIO.java:2245)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(Lcom/mysql/jdbc/StatementImpl;Ljava/lang/String;Ljava/lang/String;Lcom/mysql/jdbc/Buffer;IIIZLjava/lang/String;[Lcom/mysql/jdbc/Field;)Lcom/mysql/jdbc/ResultSetInternalMethods; (MysqlIO.java:2638)
    at com.mysql.jdbc.ConnectionImpl.execSQL(Lcom/mysql/jdbc/StatementImpl;Ljava/lang/String;ILcom/mysql/jdbc/Buffer;IIZLjava/lang/String;[Lcom/mysql/jdbc/Field;Z)Lcom/mysql/jdbc/ResultSetInternalMethods; (ConnectionImpl.java:2526)
    at com.mysql.jdbc.ConnectionImpl.execSQL(Lcom/mysql/jdbc/StatementImpl;Ljava/lang/String;ILcom/mysql/jdbc/Buffer;IIZLjava/lang/String;[Lcom/mysql/jdbc/Field;)Lcom/mysql/jdbc/ResultSetInternalMethods; (ConnectionImpl.java:2484)
    at com.mysql.jdbc.StatementImpl.executeQuery(Ljava/lang/String;)Ljava/sql/ResultSet; (StatementImpl.java:1446)
    at com.alibaba.datax.plugin.rdbms.util.DBUtil.query(Ljava/sql/Statement;Ljava/lang/String;)Ljava/sql/ResultSet; (DBUtil.java:445)
    at com.alibaba.datax.plugin.rdbms.util.DBUtil.query(Ljava/sql/Connection;Ljava/lang/String;II)Ljava/sql/ResultSet; (DBUtil.java:431)
    at com.alibaba.datax.plugin.rdbms.util.DBUtil.query(Ljava/sql/Connection;Ljava/lang/String;I)Ljava/sql/ResultSet; (DBUtil.java:409)
    at com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader$Task.startRead(Lcom/alibaba/datax/common/util/Configuration;Lcom/alibaba/datax/common/plugin/RecordSender;Lcom/alibaba/datax/common/plugin/TaskPluginCollector;I)V (CommonRdbmsReader.java:197)
    at com.alibaba.datax.plugin.reader.mysqlreader.MysqlReader$Task.startRead(Lcom/alibaba/datax/common/plugin/RecordSender;)V (MysqlReader.java:81)
    at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run()V (ReaderRunner.java:57)
    at java.lang.Thread.run()V (Thread.java:722)

Error when testing FTP file read/write: plugin [ftpreader,ftpWriter] failed to load, retrying in 1s

Has anyone run into this? All other tests pass; the failure only occurs when synchronizing FTP files.
[root@master bin]# python datax.py ../job/testjob.json
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2018-03-27 03:21:35.675 [main] WARN ConfigParser - Plugin [ftpreader,ftpWriter] failed to load, retrying in 1s... Exception:Code:[Framework-12], Description:[DataX plugin initialization error. This is usually caused by a faulty DataX installation; please contact your ops team to resolve it.]. - Plugin loading failed; the specified plugins were not loaded: [ftpWriter, ftpreader]
2018-03-27 03:21:36.704 [main] ERROR Engine -
According to DataX's automated analysis, the most likely cause of this task's failure is:
com.alibaba.datax.common.exception.DataXException: Code:[Framework-12], Description:[DataX plugin initialization error. This is usually caused by a faulty DataX installation; please contact your ops team to resolve it.]. - Plugin loading failed; the specified plugins were not loaded: [ftpWriter, ftpreader]
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26)
at com.alibaba.datax.core.util.ConfigParser.parsePluginConfig(ConfigParser.java:142)
at com.alibaba.datax.core.util.ConfigParser.parse(ConfigParser.java:63)
at com.alibaba.datax.core.Engine.entry(Engine.java:137)
at com.alibaba.datax.core.Engine.main(Engine.java:204)
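A hedged guess from the message itself: the loader resolves plugin names case-sensitively against the directories under plugin/, and the list [ftpWriter, ftpreader] shows a capital W while the shipped directory is plugin/writer/ftpwriter. Using the all-lowercase names in the job config is worth trying first (the "..." stands for each plugin's normal parameters):

    "reader": { "name": "ftpreader", ... }
    "writer": { "name": "ftpwriter", ... }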

Can the Transformer module accept business logic as a custom jar?

As the title says: in some scenarios the transformer logic needs to fetch remote data from external systems, e.g., over HTTP or Dubbo RPC. That requires external jar dependencies and fully custom fetching and transformation logic. Can this scenario be supported?

SQLServer uniqueidentifier type error — has anyone hit this?

com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting from a character string to uniqueidentifier.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216) ~[sqljdbc4-4.0.jar:na]
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515) ~[sqljdbc4-4.0.jar:na]
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:404) ~[sqljdbc4-4.0.jar:na]
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:350) ~[sqljdbc4-4.0.jar:na]
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696) ~[sqljdbc4-4.0.jar:na]
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715) ~[sqljdbc4-4.0.jar:na]
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180) ~[sqljdbc4-4.0.jar:na]
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155) ~[sqljdbc4-4.0.jar:na]
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.execute(SQLServerPreparedStatement.java:332) ~[sqljdbc4-4.0.jar:na]
at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.doOneInsert(CommonRdbmsWriter.java:382) [plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.doBatchInsert(CommonRdbmsWriter.java:362) [plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.startWriteWithConnection(CommonRdbmsWriter.java:291) [plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.startWrite(CommonRdbmsWriter.java:319) [plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.sqlserverwriter.SqlServerWriter$Task.startWrite(SqlServerWriter.java:81) [sqlserverwriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:56) [datax-core-0.0.1-SNAPSHOT.jar:na]
at java.lang.Thread.run(Unknown Source) [na:1.8.0_131]

{"exception":"将字符串转换为 uniqueidentifier 时失败。","record":[{"byteSize":36,"index":0,"rawData":"000001F3-BA23-4524-AE1B-AF1ABC33BB72","type":"STRING"},{"byteSize":8,"index":1,"rawData":1332550993733,"type":"DATE"},{"byteSize":1,"index":2,"rawData":0,"type":"LONG"},{"byteSize":3,"index":3,"rawData":"柴永祁","type":"STRING"},{"byteSize":0,"index":4,"type":"DATE"},{"byteSize":0,"index":5,"type":"STRING"},{"byteSize":1,"index":6,"rawData":"1","type":"STRING"},{"byteSize":0,"index":7,"type":"STRING"},{"byteSize":0,"index":8,"type":"STRING"},{"byteSize":0,"index":9,"type":"STRING"},{"byteSize":0,"index":10,"type":"STRING"},{"byteSize":12,"index":11,"rawData":"K00000574226","type":"STRING"},{"byteSize":0,"index":12,"type":"STRING"},{"byteSize":1,"index":13,"rawData":0,"type":"LONG"},{"byteSize":1,"index":14,"rawData":"2","type":"STRING"},{"byteSize":6,"index":15,"rawData":"HTERRO","type":"STRING"},{"byteSize":8,"index":16,"rawData":1496974830000,"type":"DATE"},{"byteSize":0,"index":17,"type":"STRING"},{"byteSize":8,"index":18,"rawData":1496974794000,"type":"DATE"},{"byteSize":1,"index":19,"rawData":0,"type":"LONG"},{"byteSize":1,"index":20,"rawData":0,"type":"LONG"},{"byteSize":1,"index":21,"rawData":0,"type":"LONG"},{"byteSize":1,"index":22,"rawData":0,"type":"LONG"},{"byteSize":1,"index":23,"rawData":0,"type":"LONG"},{"byteSize":1,"index":24,"rawData":0,"type":"LONG"},{"byteSize":1,"index":25,"rawData":0,"type":"LONG"},{"byteSize":1,"index":26,"rawData":0,"type":"LONG"},{"byteSize":1,"index":27,"rawData":0,"type":"LONG"},{"byteSize":1,"index":28,"rawData":7,"type":"LONG"},{"byteSize":25,"index":29,"rawData":"[email protected]","type":"STRING"},{"byteSize":25,"index":30,"rawData":"[email protected]","type":"STRING"},{"byteSize":1,"index":31,"rawData":0,"type":"LONG"},{"byteSize":1,"index":32,"rawData":0,"type":"LONG"},{"byteSize":1,"index":33,"rawData":0,"type":"LONG"},{"byteSize":0,"index":34,"type":"DATE"},{"byteSize":0,"index":35,"type":"STRING"},{"byteSize":1,"index":36,"rawData":0,"type":"LONG"},{"byteSize":19,"index":37,"rawData":"1970-01-01 00:00:00","type":"STRING"},{"byteSize":1,"index":38,"rawData":0,"type":"LONG"}],"type":"writer"}

Data synchronization without public IPs: both ends sit in their own subnets

The read and write ends are in different subnets, e.g., syncing data from MySQL in subnet A to Oracle in subnet B.
In that case DataX can't be used, right? Its reader and writer plugins both make local JDBC connections to the remote databases, so there is no single place to deploy DataX.
I have only skimmed the official introduction, so please correct me where I'm wrong.
Could it instead be done like this:

    data source 1 --> proxy 1 --> reader plugin 1 --> DataX --> writer plugin 2 --> proxy 2 --> data source 2
      (subnet A)                             (public network)                                     (subnet B)

Deploy a proxy at each data source to feed data into the corresponding DataX plugin; the plugins, managed by the Framework, then read and write the data sources as usual.

How can DataX pull incremental data based on a timestamp? Why does it keep missing data after running for a while?

I pull incremental data with DataX based on a timestamp column, each time using the script's start time as the lower bound of the next incremental pull. As for the missing data, the possible causes I can think of so far are the following two (see the sketch after this list):

  1. The clocks of the database node and the DataX node are not synchronized.
  2. Replication lag on the MySQL primary/replica setup is not taken into account.
    If other factors could cause this, please advise.
    Would letting the time windows overlap somewhat (so some rows are pulled more than once) solve the problem?
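A hedged sketch of the overlapping-window idea, using mysqlreader's where parameter; ${LAST_START}, ${THIS_START}, and the 10-minute overlap are hypothetical values your scheduler would substitute:

    "reader": {
        "name": "mysqlreader",
        "parameter": {
            "where": "update_time >= DATE_SUB('${LAST_START}', INTERVAL 10 MINUTE) AND update_time < '${THIS_START}'",
            ……
        }
    }

Because the overlap re-reads some rows, the write side then needs idempotent loading (an upsert/replace into the target, or deduplication downstream) so that repeated rows do not become duplicates.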

Error occurred on my first compile

[ERROR] Failed to execute goal on project otsstreamreader: Could not resolve dependencies for project com.alibaba.datax:otsstreamreader:jar:0.0.1-SNAPSHOT: Could not find artifact com.aliyun.openservices:tablestore-streamclient:jar:1.0.0-SNAPSHOT -> [Help 1]
[ERROR]

After cloning this project, I ran "mvn -U clean package assembly:assembly -Dmaven.test.skip=true" to compile it, but the error above came out.

Running the script fails — am I using it wrong?

[root@master bin]# python datax.py ../job/testjob.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2018-03-27 03:36:51.264 [main] WARN ConfigParser - Plugin [ftpreader,ftpWriter] failed to load, retrying in 1s... Exception:Code:[Framework-12], Description:[DataX plugin initialization error. This is usually caused by a faulty DataX installation; please contact your ops team to resolve it.]. - Plugin loading failed; the specified plugins were not loaded: [ftpWriter, ftpreader]
2018-03-27 03:36:52.305 [main] ERROR Engine -

According to DataX's automated analysis, the most likely cause of this task's failure is:
com.alibaba.datax.common.exception.DataXException: Code:[Framework-12], Description:[DataX plugin initialization error. This is usually caused by a faulty DataX installation; please contact your ops team to resolve it.]. - Plugin loading failed; the specified plugins were not loaded: [ftpWriter, ftpreader]
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26)
at com.alibaba.datax.core.util.ConfigParser.parsePluginConfig(ConfigParser.java:142)
at com.alibaba.datax.core.util.ConfigParser.parse(ConfigParser.java:63)
at com.alibaba.datax.core.Engine.entry(Engine.java:137)
at com.alibaba.datax.core.Engine.main(Engine.java:204)

Reading from MongoDB and writing to HDFS loses fields

MongoDB documents are ragged: when mongodbreader reads them and writes to HDFS, any field that is missing from a document is simply dropped, with no delimiter placeholder left behind, so when Impala reads the output the columns are incomplete and misaligned.

elasticsearch plugin error: org.apache.http.client.ClientProtocolException

Configuration:

{
"writer": {
      "parameter": {
        "dynamic": false,
        "indexType": "default",
        "cleanup": false,
        "accessKey": "elastic",
        "index": "ordertest",
        "settings": {
          "index": {
            "number_of_replicas": "1",
            "number_of_shards": "5"
          }
        },
        "column": [
          {
            "name": "id",
            "type": "long"
          },
          {
            "name": "order_id",
            "type": "long"
          },
          {
            "name": "merchant_id",
            "type": "long"
          },
          {
            "name": "order_name",
            "type": "text"
          },
          {
            "name": "gmt_create",
            "type": "date"
          },
          {
            "name": "gmt_modify",
            "type": "date"
          }
        ],
        "batchSize": 1000,
        "accessId": "elastic",
        "discovery": false,
        "endpoint": "es-cn-4590i74hi000awxw9.elasticsearch.aliyuncs.com",
        "splitter": ","
      },
      "plugin": "elasticsearch"
    }
}

Exception:

According to DataX's automated analysis, the most likely cause of this task's failure is:
com.alibaba.datax.common.exception.DataXException: Code:[ESWriter-03], Description:[mappings error.].  - org.apache.http.client.ClientProtocolException
	at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26)
	at com.alibaba.datax.plugin.writer.elasticsearchwriter.ESWriter$Job.prepare(ESWriter.java:76)
	at com.alibaba.datax.core.job.JobContainer.prepareJobWriter(JobContainer.java:947)
	at com.alibaba.datax.core.job.JobContainer.prepare(JobContainer.java:391)
	at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:165)
	at com.alibaba.datax.core.Engine.start(Engine.java:96)
	at com.alibaba.datax.core.Engine.entry(Engine.java:184)
	at com.alibaba.datax.core.Engine.main(Engine.java:217)
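One hedged thing to check: unlike the config in the earlier elasticsearchwriter issue, this endpoint carries no scheme or port. The writer builds plain HTTP requests from the endpoint string, and a bare hostname can surface as a ClientProtocolException, so spelling it out is a cheap first test (port 9200 assumed):

    "endpoint": "http://es-cn-4590i74hi000awxw9.elasticsearch.aliyuncs.com:9200"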

plugin.json file does not exist — please check your configuration

Problem 1:
com.alibaba.datax.common.exception.DataXException: Code:[Common-00], Describe:[There is an error in the configuration file you provided; please check your job configuration.] - Configuration error: the configuration file [/Users/Test/Desktop/DataX-master/core/target/datax/plugin/writer/streamwriter/plugin.json] you provided does not exist. Please check your configuration file.

After I created plugin.json under the "/Users/Test/Desktop/DataX-master/core/target/datax/plugin/writer/streamwriter" directory, the problem described in Problem 2 appeared.


Problem 2:
According to DataX's automated analysis, the most likely cause of this task's failure is:
com.alibaba.datax.common.exception.DataXException: Code:[Framework-12], Description:[DataX plugin initialization error. This is usually caused by a faulty DataX installation; please contact your ops team to resolve it.]. - Plugin loading failed; a duplicate plugin exists: /Users/Test/Desktop/DataX-master/core/target/datax/plugin/writer/streamwriter/plugin.json

It claims a duplicate plugin.json exists!


To be clear: the plugin folder under /Users/Test/Desktop/DataX-master/core/target/datax/, along with its subfolders and .json files, was all created by me manually.
I created these files because I hit errors when trying to start DataX, and I followed the prompts step by step to create the plugin folder and its subfolders and files.

Percentage stays at 0.00%

INFO - 2018-02-27 16:11:00.485 [job-26950] INFO StandAloneJobContainerCommunicator - Total 3589088 records, 61014496 bytes | Speed 224.40KB/s, 13516 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 283.231s | All Task WaitReaderTime 133.380s | Percentage 0.00%

Throughout the run, the Percentage value stays at 0.00% and only becomes 100% at completion. This is a bug, right?

Maven compilation fails — can anyone help identify the cause?

Downloading: http://maven.aliyun.com/nexus/content/groups/public/eigenbase/eigenbase-properties/1.1.4/eigenbase-properties-1.1.4.pom
[WARNING] The POM for eigenbase:eigenbase-properties:jar:1.1.4 is missing, no dependency information available
Downloading: http://maven.aliyun.com/nexus/content/groups/public/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom
[WARNING] The POM for org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde is missing, no dependency information available
Downloading: http://maven.aliyun.com/nexus/content/groups/public/eigenbase/eigenbase-properties/1.1.4/eigenbase-properties-1.1.4.jar
Downloading: http://maven.aliyun.com/nexus/content/groups/public/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.276 s
[INFO] Finished at: 2018-03-27T16:59:14+08:00
[INFO] Final Memory: 14M/240M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hdfsreader: Could not resolve dependencies for project com.alibaba.datax:hdfsreader:jar:0.0.1-SNAPSHOT: The following artifacts could not be resolved: org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde, eigenbase:eigenbase-properties:jar:1.1.4: Failure to find org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde in http://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of maven-net-cn has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

Process finished with exit code 1

mvn build problem

ERROR] Failed to execute goal on project odpsreader: Could not resolve dependencies for project com.alibaba.datax:odpsreader:jar:0.0.1-SNAPSHOT: Could not find artifact com.alibaba.external:bouncycastle.provider:jar:1.38-jdk15 in alimaven (http://maven.aliyun.com/nexus/content/groups/public/) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal on project odpsreader: Could not resolve dependencies for project com.alibaba.datax:odpsreader:jar:0.0.1-SNAPSHOT: Could not find artifact com.alibaba.external:bouncycastle.provider:jar:1.38-jdk15 in alimaven (http://maven.aliyun.com/nexus/content/groups/public/)
at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.getDependencies(LifecycleDependencyResolver.java:221)
at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.resolveProjectDependencies(LifecycleDependencyResolver.java:127)
at org.apache.maven.lifecycle.internal.MojoExecutor.ensureDependenciesAreResolved(MojoExecutor.java:245)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:199)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.project.DependencyResolutionException: Could not resolve dependencies for project com.alibaba.datax:odpsreader:jar:0.0.1-SNAPSHOT: Could not find artifact com.alibaba.external:bouncycastle.provider:jar:1.38-jdk15 in alimaven (http://maven.aliyun.com/nexus/content/groups/public/)
at org.apache.maven.project.DefaultProjectDependenciesResolver.resolve(DefaultProjectDependenciesResolver.java:211)
at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.getDependencies(LifecycleDependencyResolver.java:195)
... 23 more
Caused by: org.eclipse.aether.resolution.DependencyResolutionException: Could not find artifact com.alibaba.external:bouncycastle.provider:jar:1.38-jdk15 in alimaven (http://maven.aliyun.com/nexus/content/groups/public/)
at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:384)
at org.apache.maven.project.DefaultProjectDependenciesResolver.resolve(DefaultProjectDependenciesResolver.java:205)
... 24 more
Caused by: org.eclipse.aether.resolution.ArtifactResolutionException: Could not find artifact com.alibaba.external:bouncycastle.provider:jar:1.38-jdk15 in alimaven (http://maven.aliyun.com/nexus/content/groups/public/)
at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:444)
at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:246)
at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:367)
... 25 more
Caused by: org.eclipse.aether.transfer.ArtifactNotFoundException: Could not find artifact com.alibaba.external:bouncycastle.provider:jar:1.38-jdk15 in alimaven (http://maven.aliyun.com/nexus/content/groups/public/)
at org.eclipse.aether.connector.basic.ArtifactTransportListener.transferFailed(ArtifactTransportListener.java:39)
at org.eclipse.aether.connector.basic.BasicRepositoryConnector$TaskRunner.run(BasicRepositoryConnector.java:355)
at org.eclipse.aether.util.concurrency.RunnableErrorForwarder$1.run(RunnableErrorForwarder.java:67)
at org.eclipse.aether.connector.basic.BasicRepositoryConnector$DirectExecutor.execute(BasicRepositoryConnector.java:581)
at org.eclipse.aether.connector.basic.BasicRepositoryConnector.get(BasicRepositoryConnector.java:249)
at org.eclipse.aether.internal.impl.DefaultArtifactResolver.performDownloads(DefaultArtifactResolver.java:520)
at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:421)
... 27 more
