
gordonmurray / apache_flink_and_paimon


Trying out Apache Paimon with Apache Flink using Docker Compose

Home Page: https://gordonmurray.com/data/2023/11/05/trying-out-apache-paimon-with-flink.html

License: MIT License

s3 orc apache-flink paimon


Using Apache Flink to write to S3 with Apache Paimon

Use the docker-compose.yml file to create a MariaDB database and an Apache Flink JobManager and TaskManager to work with.

Add your AWS credentials to the docker-compose.yml file first, so that Flink is able to write to S3.

docker compose up -d

Once the containers are running, submit the job to Flink using:

docker exec -it jobmanager /opt/flink/bin/sql-client.sh embedded -f /opt/flink/job.sql

If you open your browser at http://localhost:8081, you'll see the Flink UI with your job running, saving the data from the database to S3 in the Paimon table format.

The data in S3 will be in a folder named after the database, stored as ORC (Optimized Row Columnar) files.
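The same check can be scripted against the REST API that the Flink UI is served from. This is a minimal sketch, assuming the port mapping shown above; the helper names `job_states` and `fetch_job_states` are made up for illustration and are not part of this repository.

```python
import json
from urllib.request import urlopen

# Assumption: the JobManager REST port is published on localhost:8081,
# the same address used for the Flink UI.
FLINK_URL = "http://localhost:8081"


def job_states(overview: dict) -> dict:
    """Map each job name to its state, given a /jobs/overview response body."""
    return {job["name"]: job["state"] for job in overview.get("jobs", [])}


def fetch_job_states(base_url: str = FLINK_URL) -> dict:
    """Query a running cluster; the containers must be up for this to work."""
    with urlopen(f"{base_url}/jobs/overview", timeout=10) as response:
        return job_states(json.load(response))
```

With the containers running, calling `fetch_job_states()` should return a dictionary with one entry per submitted job; a healthy streaming job is expected to report the state `RUNNING`.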

aws s3 ls my-s3-bucket/paimon/my_database.db/myproducts/

This gave the following output:

PRE bucket-0/
PRE manifest/
PRE schema/
PRE snapshot/
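As a quick sanity check, the listing can be compared against the folders seen above. The sketch below only parses text in the `aws s3 ls` format; the function names are hypothetical, and the expected folder set is taken from this one run (a table with more buckets or with partitions would look different).

```python
# Folders observed in the listing above; an assumption for this table only.
EXPECTED_DIRS = {"bucket-0", "manifest", "schema", "snapshot"}


def parse_prefixes(listing: str) -> set:
    """Collect the folder names from the 'PRE <name>/' lines of `aws s3 ls` output."""
    prefixes = set()
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "PRE":
            prefixes.add(parts[1].rstrip("/"))
    return prefixes


def missing_dirs(listing: str) -> set:
    """Return the expected folders that do not appear in the listing."""
    return EXPECTED_DIRS - parse_prefixes(listing)
```

An empty result from `missing_dirs` means every folder from the listing above is present.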

Paimon stored the schema for the products table on S3 in JSON format:

{
  "id" : 0,
  "fields" : [ {
    "id" : 0,
    "name" : "id",
    "type" : "INT NOT NULL"
  }, {
    "id" : 1,
    "name" : "name",
    "type" : "STRING"
  }, {
    "id" : 2,
    "name" : "price",
    "type" : "DECIMAL(10, 2)"
  } ],
  "highestFieldId" : 2,
  "partitionKeys" : [ ],
  "primaryKeys" : [ "id" ],
  "options" : { },
  "timeMillis" : 1696694538055
}
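Because the schema is plain JSON, it is easy to inspect with a few lines of code. The sketch below embeds the schema shown above as a string and summarises its columns; `describe_schema` and `created_at` are hypothetical helpers, and reading `timeMillis` as the schema's creation time in epoch milliseconds is an assumption based on the field name.

```python
import json
from datetime import datetime, timezone

# The schema file content shown above, as it was stored on S3.
SCHEMA_JSON = """
{
  "id" : 0,
  "fields" : [
    { "id" : 0, "name" : "id", "type" : "INT NOT NULL" },
    { "id" : 1, "name" : "name", "type" : "STRING" },
    { "id" : 2, "name" : "price", "type" : "DECIMAL(10, 2)" }
  ],
  "highestFieldId" : 2,
  "partitionKeys" : [ ],
  "primaryKeys" : [ "id" ],
  "options" : { },
  "timeMillis" : 1696694538055
}
"""


def describe_schema(schema: dict) -> list:
    """Return one 'name: type' line per column, flagging primary key columns."""
    primary_keys = set(schema.get("primaryKeys", []))
    lines = []
    for field in schema["fields"]:
        marker = " (primary key)" if field["name"] in primary_keys else ""
        lines.append(f"{field['name']}: {field['type']}{marker}")
    return lines


def created_at(schema: dict) -> datetime:
    """Interpret timeMillis as epoch milliseconds in UTC (an assumption)."""
    return datetime.fromtimestamp(schema["timeMillis"] / 1000, tz=timezone.utc)


schema = json.loads(SCHEMA_JSON)
for line in describe_schema(schema):
    print(line)
```

In practice the same functions could be pointed at a schema file downloaded from the table's schema/ folder instead of the embedded string.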
