
This project is forked from microsoft/kafka-connect-cosmosdb


Kafka Connect connector for Azure Cosmos DB

License: MIT License


Kafka Connect for Azure Cosmos DB


Introduction

This project is pre-production. Please file any issues, feature requests, or questions you may have in the Issues for this repo.

This project provides connectors for Kafka Connect to read data from and write data to Azure Cosmos DB.

Supported Data Formats

The sink and source connectors are configurable to support the following data formats:

  • JSON (Plain): JSON record structure without any attached schema.
  • JSON with Schema: JSON record structure with explicit schema information to ensure the data matches the expected format.
  • AVRO: A row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.

Since key and value settings, including the format and serialization, can be independently configured in Kafka, it is possible to work with different data formats for records' keys and values respectively.

To cater for this, converter configuration is available for both key.converter and value.converter.
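Because the two converters are independent, they can also be mixed. As an illustrative sketch (not taken from this project's docs), a worker could treat record keys as plain strings while values are schemaless JSON, using the StringConverter that ships with Kafka:

    key.converter=org.apache.kafka.connect.storage.StringConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable=false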

Converter Configuration Examples

JSON (Plain)

  • If you need to use JSON without Schema Registry for Connect data, you can use the JsonConverter supported with Kafka. The example below shows the JsonConverter key and value properties that are added to the configuration:

    key.converter=org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable=false
    value.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable=false
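With schemas.enable=false, the message written to Kafka carries only the raw JSON data, with no schema envelope. A record with hypothetical userid and name fields would simply look like:

    {
      "userid": 123,
      "name": "user's name"
    }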

JSON with Schema

  • When the properties key.converter.schemas.enable and value.converter.schemas.enable are set to true, the key or value is not treated as plain JSON, but rather as a composite JSON object containing both an internal schema and the data.

    key.converter=org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable=true
    value.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable=true
  • The resulting message to Kafka would look like the example below, with schema and payload top-level elements in the JSON:

    {
      "schema": {
        "type": "struct",
        "fields": [
          {
            "type": "int32",
            "optional": false,
            "field": "userid"
          },
          {
            "type": "string",
            "optional": false,
            "field": "name"
          }
        ],
        "optional": false,
        "name": "ksql.users"
      },
      "payload": {
        "userid": 123,
        "name": "user's name"
      }
    }

NOTE: The message written is made up of the schema + payload. Notice the size of the message, as well as the proportion of it that is made up of the payload vs. the schema. This is repeated in every message you write to Kafka. In scenarios like this, you may want to use a serialization format like JSON Schema or Avro, where the schema is stored separately and the message holds just the payload.

AVRO

  • This connector supports AVRO. To use AVRO you need to configure an AvroConverter so that Kafka Connect knows how to work with AVRO data. This connector has been tested with the AvroConverter supplied by Confluent, under the Apache 2.0 license, though another custom converter can be used in its place if you prefer.

  • Kafka deals with keys and values independently, so you need to specify the key.converter and value.converter properties as required in the worker configuration.

  • When using the AvroConverter, an additional converter property must also be added that provides the URL for the Schema Registry.

The example below shows the AvroConverter key and value properties that are added to the configuration:

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081

Choosing a conversion format

  • If you're configuring a Source connector and

    • If you want Kafka Connect to include plain JSON in the message it writes to Kafka, you'd set JSON (Plain) configuration.
    • If you want Kafka Connect to include the schema in the message it writes to Kafka, you’d set JSON with Schema configuration.
    • If you want Kafka Connect to include AVRO format in the message it writes to Kafka, you'd set AVRO configuration.
  • If you're consuming JSON data from a Kafka topic into a Sink connector, you need to understand how the JSON was serialized when it was written to the Kafka topic:

    • If it was written with the JSON serializer, then you need to set Kafka Connect to use the JSON converter (org.apache.kafka.connect.json.JsonConverter).
      • If the JSON data was written as a plain string, then you need to determine whether the data includes a nested schema/payload. If it does, then you would set JSON with Schema configuration.
      • However, if you're consuming JSON data and it doesn't have the schema/payload construct, then you must tell Kafka Connect not to look for a schema by setting schemas.enable=false, as per JSON (Plain) configuration.
    • If it was written with the AVRO serializer, then you need to set Kafka Connect to use the AVRO converter (io.confluent.connect.avro.AvroConverter), as per AVRO configuration.

Common Errors

Below are some of the common errors you can get if you misconfigure the converters in Kafka Connect. These will show up in the sinks you configure for Kafka Connect, as it's at this point that you'll be trying to deserialize the messages already stored in Kafka. Converter problems tend not to occur in sources, because it's in the source that the serialization is set.

Converter Configuration Errors
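A frequently seen example of this kind of error is thrown by the JsonConverter when it expects a schema/payload envelope but receives plain JSON (exact wording may vary across Kafka versions):

    org.apache.kafka.connect.errors.DataException: JsonConverter with schemas.enable requires
    "schema" and "payload" fields and may not contain additional fields. If you are trying to
    deserialize plain JSON data, set schemas.enable=false in your converter configuration.

The fix is to match the converter's schemas.enable setting to how the data in the topic was actually serialized.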

Configuration

Common Configuration Properties

The Sink and Source connectors share the following common configuration properties:

  • connect.cosmosdb.connection.endpoint (uri, Required): Cosmos DB endpoint URI string.
  • connect.cosmosdb.master.key (string, Required): The Cosmos DB primary key that the connector authenticates with.
  • connect.cosmosdb.databasename (string, Required): The name of the Cosmos DB database the connector reads from or writes to.
  • connect.cosmosdb.containers.topicmap (string, Required): Mapping between Kafka topics and Cosmos DB containers, formatted using CSV as shown: topic#container,topic2#container2.
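Putting the common properties together, a minimal sink connector configuration submitted to the Kafka Connect REST API might look like the sketch below. The connector class, endpoint, key, database name, and topic map values are all placeholders to adapt to your deployment; verify the exact connector class name against the Sink Connector Documentation:

    {
      "name": "cosmosdb-sink-connector",
      "config": {
        "connector.class": "com.azure.cosmos.kafka.connect.sink.CosmosDBSinkConnector",
        "tasks.max": "1",
        "topics": "orders",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
        "connect.cosmosdb.connection.endpoint": "https://<cosmos-account>.documents.azure.com:443/",
        "connect.cosmosdb.master.key": "<primary-key>",
        "connect.cosmosdb.databasename": "kafkaconnect",
        "connect.cosmosdb.containers.topicmap": "orders#orders"
      }
    }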

For Sink connector specific configuration, please refer to the Sink Connector Documentation

For Source connector specific configuration, please refer to the Source Connector Documentation

Project Setup

Please refer to the Developer Walkthrough and Project Setup for initial setup instructions.

