lekaijun / tapdata

This project forked from tapdata/tapdata

Tapdata Live Data Platform Project

License: Other

tapdata's Introduction

English Readme

Brief Chinese documentation

What is Tapdata?

Tapdata is a live data platform designed to connect data silos and provide fresh data to downstream operational applications and operational analytics.

Env Prepare

  1. Make sure Docker is installed on your machine before you get started.
  2. Tapdata has currently been tested only on Linux (no specific distribution is required).
  3. Clone the repository: git clone https://github.com/tapdata/tapdata.git && cd tapdata

Quick Use

This is the easiest way to experiment with Tapdata:

Run bash build/quick-use.sh to pull the Docker image and start an all-in-one container.

Quick Build

Alternatively, you can build the project from source with the following command:

  1. Run bash build/quick-dev.sh to build a Docker image from source and start an all-in-one container.

To build in Docker, install Docker and set tapdata_build_env in build/env.sh to "docker" (the default).

To build locally, install:

  1. JDK
  2. Maven

Then set tapdata_build_env in build/env.sh to "local".

Run bash build/clean.sh to clean the build target.

Quick Steps

If everything went well, you should now be in a terminal window. Follow the steps below to try Tapdata out.

Create New DataSource

# 1. mongodb
source = DataSource("mongodb", "$name").uri("$uri")

# 2. mysql
source = DataSource("mysql", "$name").host("$host").port($port).username("$username").password("$password").db("$db")

# 3. pg
source = DataSource("postgres", "$name").host("$host").port($port).username("$username").password("$password").db("$db").schema("$schema").logPluginName("wal2json")

# save() checks the whole configuration and loads the schema from the source
source.save()
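
The MongoDB data source above takes a single connection URI. As a minimal sketch, assuming the standard mongodb:// connection string format, the URI can be assembled in plain Python before it is passed to .uri(...). The helper name build_mongodb_uri and the sample host, database, and credentials are hypothetical and not part of Tapdata. Percent-encoding the credentials keeps characters such as "@" or ":" from breaking the URI.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(host, port, db, username=None, password=None):
    """Assemble a standard mongodb:// connection string.

    Hypothetical helper for illustration; it is not part of Tapdata.
    """
    auth = ""
    if username is not None and password is not None:
        # Percent-encode credentials so reserved characters stay unambiguous.
        auth = f"{quote_plus(username)}:{quote_plus(password)}@"
    return f"mongodb://{auth}{host}:{port}/{db}"


# Sample values are placeholders, not real credentials.
uri = build_mongodb_uri("localhost", 27017, "test", "app", "p@ss:word")
print(uri)  # mongodb://app:p%40ss%3Aword@localhost:27017/test
```

The resulting string would then take the place of "$uri" in the MongoDB example.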

Preview Table

  1. use $name switches the current data source context
  2. show tables displays all tables in the current data source
  3. desc $table_name displays the schema of a table

Migrate A Table

Migration jobs run in real time by default.

# 1. create a pipeline
p = Pipeline("$name")

# 2. use readFrom and writeTo to describe a migration job
p.readFrom("$source_name.$table").writeTo("$sink_name.$table")

# 3. start job
p.start()

# 4. monitor job
p.monitor()
p.logs()

# 5. stop job
p.stop()

Migrate A Table With UDF

The current version does not support record schema changes. Support is expected in a few days.

If you need to change the record schema, use MongoDB as the sink.

# 1. define a python function
def fn(record):
    record["x"] = 1
    return record

# 2. use processor between the source and the target
p.readFrom(...).processor(fn).writeTo(...)
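
Because a UDF is an ordinary Python function, it can be tried on sample data before it is attached to a pipeline. The sketch below assumes that records behave like Python dicts, as the record["x"] assignment above suggests. The function name normalize_email and the "email" field are hypothetical. It only rewrites an existing value and adds no fields, so it stays within the schema restriction noted above.

```python
def normalize_email(record):
    """Hypothetical UDF: trim and lowercase the 'email' field if present.

    Assumes the record is dict-like; no fields are added or removed.
    """
    email = record.get("email")
    if isinstance(email, str):
        record["email"] = email.strip().lower()
    return record


# Try the function on plain dicts before using it in a pipeline.
print(normalize_email({"id": 1, "email": "  Ada@Example.COM "}))
print(normalize_email({"id": 2}))  # records without the field pass through
```

A function checked this way would be passed to processor in the same position as fn in the example above.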

Migrate Multi Tables

Migration jobs run in real time by default.

# 1. create a pipeline
p = Pipeline("$name")

# 2. use readFrom and writeTo to describe a migration job; the syntax for multiple tables is slightly different
# either list the tables explicitly
source = Source("$datasource_name", ["table1", "table2", ...])
# or select tables with a regular expression
source = Source("$datasource_name", table_re="xxx.*")

# 3. use prefix/suffix to add a prefix or suffix to the target table names
p.readFrom(source).writeTo("$datasource_name", prefix="", suffix="")

# 4. start job
p.start()
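
The README does not spell out how table_re is matched or how prefix and suffix are applied. The sketch below shows one plausible reading in plain Python, under two loudly stated assumptions: the pattern must match the whole table name, and the target name is the prefix, the source table name, and the suffix joined together. The helpers select_tables and target_name and the sample table names are hypothetical and not part of Tapdata.

```python
import re


def select_tables(all_tables, table_re):
    """Return the tables whose full name matches the pattern (assumed semantics)."""
    pattern = re.compile(table_re)
    return [t for t in all_tables if pattern.fullmatch(t)]


def target_name(table, prefix="", suffix=""):
    """Build the target table name as prefix + table + suffix (assumed semantics)."""
    return f"{prefix}{table}{suffix}"


tables = ["orders", "orders_archive", "users"]
selected = select_tables(tables, "orders.*")
renamed = [target_name(t, prefix="ods_", suffix="_v1") for t in selected]
print(selected)  # ['orders', 'orders_archive']
print(renamed)   # ['ods_orders_v1', 'ods_orders_archive_v1']
```

Checking a pattern against the output of show tables in this way can help avoid migrating more tables than intended.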

Manager

  1. show datasources lists all data sources; use delete datasource $name to delete one if no job is using it
  2. show jobs lists all jobs and their stats
  3. logs job $job_name [limit=20] [t=5] [tail=True] shows the job log
  4. monitor job $job_name keeps displaying the job metrics
  5. status job $job_name shows the job status (running/stopped...)

License

Tapdata uses multiple licenses.

The license for a particular work is determined by the following rules, in order of priority:

  • License directly present in the file
  • LICENSE file in the same directory as the work
  • First LICENSE found when exploring parent directories up to the project top level directory

If none of these rules applies, the license defaults to the Server Side Public License. For PDK Connectors, the license is Apache License 2.0.
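
The directory-based rules above amount to a walk from the work's own directory up to the project top level. The following is a minimal sketch of that lookup, not a tool shipped with Tapdata. The function name find_license_file and the sample directory layout are hypothetical. It covers only the second and third rules; a license stated directly in a file would have to be read from the file itself.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_license_file(work: Path, project_root: Path) -> Optional[Path]:
    """Return the nearest LICENSE file for `work`, or None if there is none.

    Starts in the work's own directory and walks up to `project_root`.
    """
    project_root = project_root.resolve()
    directory = work.resolve().parent
    while True:
        candidate = directory / "LICENSE"
        if candidate.is_file():
            return candidate
        # Stop at the project top level (or the filesystem root as a guard).
        if directory == project_root or directory == directory.parent:
            return None
        directory = directory.parent


# Demonstration on a throwaway, hypothetical layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "connectors" / "demo").mkdir(parents=True)
    (root / "core").mkdir()
    (root / "LICENSE").write_text("top-level license")
    (root / "connectors" / "LICENSE").write_text("connector license")
    connector_file = root / "connectors" / "demo" / "Demo.java"
    core_file = root / "core" / "Main.java"
    connector_file.write_text("")
    core_file.write_text("")

    nearest_for_connector = find_license_file(connector_file, root).relative_to(root).as_posix()
    nearest_for_core = find_license_file(core_file, root).relative_to(root).as_posix()

print(nearest_for_connector)  # connectors/LICENSE
print(nearest_for_core)       # LICENSE
```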
