hudi_demo's Introduction

Hudi one-click lake ingestion V1.0

Changelog

2021-03-05
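Quite a few people have taken an interest in this project recently. It started as a stopgap solution I wrote a year ago.
Inside our company we have since built it out: it now supports T+1 incremental ingestion into the lake for nearly 300 tables, plus near-real-time ingestion for a small number of tables.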

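Next, I will extract the general-purpose code into this project so that the demo can actually be run.
I will also share some of the pitfalls we hit while rolling out Hudi.
You are welcome to join.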

1. Full sync

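1.1 Database -> Spark -> Hudi
- 1.1.1 Database -> Spark
- 1.1.2 Register a temporary table and register the schema
- 1.1.3 Compute the max value of the sync field (updatetime, the modification time) and create a new database
- 1.1.4 Spark -> Hudi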
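
The four steps above could be sketched in PySpark roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the function names, the temporary view name `src_snapshot`, and the `partition_field` parameter are hypothetical, and the exact set of Hudi options you need may differ with your Hudi version. `bulk_insert` is used here as one common choice for an initial load.

```python
from typing import Dict


def jdbc_read_options(url: str, table: str, user: str, password: str) -> Dict[str, str]:
    """Options for spark.read.format("jdbc") (step 1.1.1)."""
    return {"url": url, "dbtable": table, "user": user, "password": password}


def hudi_full_load_options(table: str, record_key: str, partition_field: str,
                           sync_field: str = "updatetime") -> Dict[str, str]:
    """Hudi write options for the initial load (step 1.1.4)."""
    return {
        "hoodie.table.name": table,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Later upserts use this field to pick the newest version of a row.
        "hoodie.datasource.write.precombine.field": sync_field,
        "hoodie.datasource.write.operation": "bulk_insert",
    }


def full_sync(spark, jdbc_opts: Dict[str, str], hudi_opts: Dict[str, str],
              base_path: str, sync_field: str = "updatetime"):
    """Run steps 1.1.1 to 1.1.4.

    Assumes `spark` is a SparkSession started with the JDBC driver and the
    Hudi Spark bundle on its classpath.
    """
    # 1.1.1: pull the source table into a DataFrame.
    df = spark.read.format("jdbc").options(**jdbc_opts).load()
    # 1.1.2: expose it to Spark SQL; the schema comes from the JDBC metadata.
    df.createOrReplaceTempView("src_snapshot")
    # 1.1.3: remember the largest sync-field value seen in this snapshot.
    max_sync = spark.sql(
        f"SELECT MAX({sync_field}) AS max_sync FROM src_snapshot"
    ).first()["max_sync"]
    # 1.1.4: write the snapshot out as a Hudi table.
    df.write.format("hudi").options(**hudi_opts).mode("overwrite").save(base_path)
    return max_sync
```

The returned maximum can be stored as the starting watermark, so that the incremental stage only needs to pick up rows modified after the full load.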

2. Incremental collection

2.1 Register the Kafka topic
2.2 Configure StreamSets and start it
- 2.2.1 Configure the common values
- 2.2.2 JDBC
- 2.2.3 Field Type
- 2.2.4 Kafka
2.3 Run the scheduled job kafka_to_hudi.sh (once per hour by default)
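
The contents of kafka_to_hudi.sh are not shown here, so the following is only a hedged sketch of what one hourly batch might do once the change records have been read from Kafka: decode them, keep the newest version of each key by `updatetime`, and upsert the result into Hudi. All function names, the JSON message layout, and the `id` key column are assumptions made for illustration.

```python
import json
from typing import Dict, Iterable, List


def parse_changes(messages: Iterable[str]) -> List[dict]:
    """Decode JSON change records (assumed message format)."""
    return [json.loads(m) for m in messages]


def latest_per_key(rows: Iterable[dict], key: str = "id",
                   sync_field: str = "updatetime") -> List[dict]:
    """Keep only the newest version of each key within the batch."""
    newest: Dict[object, dict] = {}
    for row in rows:
        seen = newest.get(row[key])
        if seen is None or row[sync_field] >= seen[sync_field]:
            newest[row[key]] = row
    return list(newest.values())


def hudi_upsert_options(table: str, record_key: str, partition_field: str,
                        sync_field: str = "updatetime") -> Dict[str, str]:
    """Hudi write options for an incremental upsert."""
    return {
        "hoodie.table.name": table,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": sync_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def upsert_batch(spark, rows: List[dict], hudi_opts: Dict[str, str], base_path: str) -> None:
    """Upsert one batch; assumes a SparkSession with the Hudi bundle available."""
    from pyspark.sql import Row  # imported lazily so the helpers above run without Spark

    df = spark.createDataFrame([Row(**r) for r in rows])
    df.write.format("hudi").options(**hudi_opts).mode("append").save(base_path)
```

Deduplicating before the write is optional, since an upsert with a precombine field resolves duplicate keys on its own, but it keeps each hourly batch small and makes the "newest row wins" rule explicit.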

daoxingruhai / hudi_demo