hortonworks / registry

Schema Registry

License: Apache License 2.0

Topics: schema-registry, kafka, kinesis, flink, spark-streaming, metadata, schemas, storm

registry's Introduction

Registry

Registry is a framework to build metadata repositories. As part of Registry, we currently have SchemaRegistry repositories.

Follow @schemaregistry on Twitter for updates on the project.

Documentation

Documentation and tutorials can be found on the Registry docs.

Getting Help

Registry users and devs can send a message to the Registry Google Group.

License

Copyright 2016-2022 Cloudera.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


registry's People

Contributors

acsaki, afritz-cloud, akatona84, apsaltis, arunmahadevan, cloudera-releng, csivaguru, gcsaba2, gergowilder, gkomlossi, guruchai, harshach, heartsavior, hmcl, joylyn, kamalcph, koccs, michaelandrepearce, nattilabalint, omkreddy, parth-brahmbhatt, pnagy-cldr, priyank5485, ptgoetz, raju-saravanan, satishd, shahsank3t, urbandan, vesense, viktorsomogyi

registry's Issues

Add Confluent Schema Registry compatible API to enable migration

Many users of Kafka with Avro use the Confluent Schema Registry. To enable an easier migration path, a compatible API is needed, since not all consumers and producers can be changed and released at the same time.

Registry could expose a compatible REST API for apps that are still producing or consuming with the Confluent serdes. Likewise, it could add an option to run with a compatible byte wire protocol, so that apps using the new registry serdes are able to talk to and understand the Confluent protocol.

e.g. schema meta info and version could be referenceable by a single unique id; this id is then sent in the byte[], with the same leading magic/protocol byte (a sketch of this framing follows).
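
A minimal sketch in Java of the framing described above, assuming the Confluent convention of a single magic byte followed by a big-endian int32 schema id (the class and method names here are hypothetical):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical helper: prefix the Avro payload with the magic byte and id.
public final class ConfluentWireFormat {
    private static final byte MAGIC_BYTE = 0x0;

    public static byte[] encode(int schemaId, byte[] avroPayload) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC_BYTE);                                      // 1 byte: protocol magic
        out.write(ByteBuffer.allocate(4).putInt(schemaId).array()); // 4 bytes: int32 schema id
        out.write(avroPayload);                                     // remainder: Avro-encoded data
        return out.toByteArray();
    }
}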

Listeners to pull in external meta store schemas

We should provide a Scheduled Listener interface that allows plugins to pull external meta-store schemas into the registry. For example, a listener could be scheduled to pull in any schemas or version changes from Confluent's schema registry (a sketch of such an interface follows).
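
A hedged sketch of what such a plugin contract could look like (all names here are hypothetical; nothing like this exists in the codebase yet):

// Hypothetical plugin contract for pulling schemas from an external store.
public interface ScheduledSchemaListener {

    // How often the registry should invoke this listener.
    long pollIntervalMillis();

    // Pull any new schemas or versions and hand them to the registry.
    void poll(ExternalSchemaSink sink);

    // Callback the registry would provide to the listener.
    interface ExternalSchemaSink {
        void addSchemaVersion(String schemaName, String schemaText);
    }
}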

KafkaAvroDeserializer should not require setting the version id.

Currently it seems that the KafkaAvroDeserializer needs to know the version id upfront.

This causes two issues:

  • apps that handle messages generically, such as GenericRecord
  • apps that have the SpecificRecord on the class path/JVM

Ideally it should behave like Confluent's: if no READER_VERSION is present, it simply uses the same id as the incoming data (the GenericRecord case).

If a SpecificRecord is found on the class path, the deserializer should use that record's schema as the reader schema; a sketch of this resolution logic follows.

Currently this blocks migration from Confluent's Schema Registry.
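
A minimal sketch of that resolution logic using the standard Avro APIs (the wrapper class and method are hypothetical):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificData;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificRecord;

public final class ReaderSchemaResolver {
    // Choose the reader schema the way the issue proposes.
    public static DatumReader<?> readerFor(Schema writerSchema, Class<?> specificClass) {
        if (specificClass != null && SpecificRecord.class.isAssignableFrom(specificClass)) {
            // SpecificRecord on the class path: use its generated schema as reader.
            Schema readerSchema = SpecificData.get().getSchema(specificClass);
            return new SpecificDatumReader<>(writerSchema, readerSchema);
        }
        // No READER_VERSION configured: read with the writer's own schema,
        // which yields GenericRecord instances.
        return new GenericDatumReader<>(writerSchema);
    }
}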

Add support for optional evolution for schemas.

This will add a feature to register schema metadata that can have either multiple versions or a single version. Currently a schema always allows multiple versions, and there are cases where users may want a schema to have only one version and to disallow adding more (sketched below).
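
A hedged sketch of how this might surface on the client, assuming an evolve flag on the SchemaMetadata builder; treating evolve(false) as "exactly one version allowed" is the proposed behaviour, not current semantics:

import com.hortonworks.registries.schemaregistry.SchemaMetadata;

// Sketch only: register metadata for a schema that must keep a single version.
SchemaMetadata singleVersion = new SchemaMetadata.Builder("truck-events")
        .type("avro")
        .schemaGroup("kafka")
        .description("schema limited to a single version")
        .evolve(false)   // proposed: disallow registering further versions
        .build();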

Support composition of avro schemas.

Currently Avro requires users to create individual schema documents, and there is no ability to include/import other schema documents. This feature will allow users to include other schemas and refer to their types, so that multiple schemas can be composed, with their respective abstractions, in meaningful ways.

#todo add examples
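
Until those land, a hedged illustration of what composition could look like: a standalone schema document, and a second document referring to its type by full name (resolving such references across documents is the proposed feature; the file names and types are made up):

address.avsc:

{ "type": "record", "name": "Address", "namespace": "com.example", "fields": [ { "name": "street", "type": "string" }, { "name": "city", "type": "string" } ] }

person.avsc:

{ "type": "record", "name": "Person", "namespace": "com.example", "fields": [ { "name": "name", "type": "string" }, { "name": "home", "type": "com.example.Address" } ] }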

Create schemaGroup API to register group names

This is intended to create an admin page on the UI side to register schema group names, instead of the free-form text used today. Once we do this we can enforce rules on schema names; for example, the Kafka group requires suffixing ":v" or ":k" to the schema names. A lot of users trip over this registration naming scheme.
Also, having a predefined schemaGroup name keeps users from making mistakes when registering a schema.

Schema Registry needs to allow users to paste a schema

Previous versions of Schema Registry allowed me to paste a schema in, which was very nice. Now it requires that a file be uploaded. This means that I have to copy the schema, paste it into vi, save, go back to the web app, upload, navigate to the file... While I can appreciate the desire to upload a file if you already have one saved off, it is much more of a pain if you don't have the schema saved in a file. I would like the option to either paste the text into the UI or upload an existing file.

Schema Registry should allow schema registry clients to handle schema identifier and version tracking on their own

The idea is that some well-behaved schema registry clients do not need/want the SR library to do the serialization and deserialization of the data for them; instead, they want the schema registry to focus primarily on the publishing/retrieval of the schemas themselves.
Today the schema registry client takes care of the serialization and deserialization of data, and when it does so, it writes out the identifier and version, then writes out the raw data.
So, for example, the resulting bytes on disk would be
<schema id><schema version><raw data>
NiFi and protocols like JMS, HTTP, etc. have facilities to support context (headers) and content (payload). So we don't need/want serializers to write that bookkeeping for us; we'll handle passing those references around for the objects ourselves (see the sketch below).
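
A hedged sketch of that client pattern, assuming the client's getLatestSchemaVersionInfo lookup (the wrapper class is hypothetical and the registry URL is a placeholder):

import java.util.Collections;
import com.hortonworks.registries.schemaregistry.SchemaVersionInfo;
import com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient;

public final class SchemaLookupOnly {
    public static String fetchSchemaText(String schemaName) throws Exception {
        SchemaRegistryClient client = new SchemaRegistryClient(
                Collections.singletonMap("schema.registry.url",
                        "http://localhost:9090/api/v1"));
        SchemaVersionInfo info = client.getLatestSchemaVersionInfo(schemaName);
        // info.getVersion() is what a well-behaved client would carry in
        // NiFi attributes or JMS/HTTP headers, leaving the payload untouched.
        return info.getSchemaText();
    }
}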

SampleApplicationTest.testApi test failure

testApis(com.hortonworks.registries.schemaregistry.examples.avro.SampleApplicationTest) Time elapsed: 0.708 sec <<< ERROR!
java.lang.RuntimeException: Jar /serdes-examples.jar could not be loaded
at com.hortonworks.registries.schemaregistry.examples.avro.SampleSchemaRegistryClientApp.runCustomSerDesApi(SampleSchemaRegistryClientApp.java:181)
at com.hortonworks.registries.schemaregistry.examples.avro.SampleApplicationTest.testApis(SampleApplicationTest.java:43)

Create an audit log of clients using schemas

We should introduce a clientId to SchemaRegistryClient, which at a configured interval would send a heartbeat to the registry server along with the schema id & version it's using and whether it is a producer or consumer. This will help us build an audit log of clients accessing schemas. This will in turn give Schema Authors an indication of which clients a potential schema change might affect (a sketch of the heartbeat payload follows).
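
A hedged sketch of what the heartbeat payload could carry (entirely hypothetical; no such fields exist in the client today):

// Hypothetical heartbeat sent at a configured interval.
public final class ClientHeartbeat {
    public String clientId;        // stable id configured on SchemaRegistryClient
    public Long schemaMetadataId;  // schema the client is using
    public Integer schemaVersion;  // version the client is using
    public String role;            // "PRODUCER" or "CONSUMER"
    public long timestampMillis;   // heartbeat time
}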

Add support for Kafka Header Registry

As you'll see, KIP-82 got adopted and submitted.

This means that as of Kafka 0.11, Kafka will have headers.

In a Kafka Record the value is simply a byte[], which delegates the handling of what schema applies, and how to decode it, to the consumer; this is exactly what solutions like schema registry provide.

The Kafka Header record introduces a String key and a byte[] value. Following this, it would be useful to support registering the Kafka header value types.

It would be great for the schema registry to support headers whose schema can be a primitive (int8, int16, int32, int64, float32, float64, boolean, bytes, string) or a more complex Avro-like schema.

The idea would be that the subject for lookup is the topic + header key; or, if all values for the same key are uniform within an organisation, the subject could simply be the header key.

Is this possible with the current schema repo APIs? That is, is the mapping agnostic, simply subject = topic + ".key" or subject = topic + ".value", such that we could make subject = topic + ".header." + headerKey? (A sketch of this convention follows.)
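
A minimal sketch of the naming convention floated above (the helper class is hypothetical; the convention is simply subject = topic + suffix):

public final class HeaderSubjects {
    static String keySubject(String topic)   { return topic + ".key"; }
    static String valueSubject(String topic) { return topic + ".value"; }
    static String headerSubject(String topic, String headerKey) {
        return topic + ".header." + headerKey;
    }
}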

Hdfs service should be mandatory for HBase and Hive

If a user creates an environment by picking HBase/Hive from an HDP cluster added via Ambari, then HDFS from the same cluster should automatically be added to the environment by the UI. The reason is that HBase has hbase.rootdir, which uses hdfs-site.xml and core-site.xml to connect to HDFS. Without HDFS present in the environment, the Storm topology fails at runtime, throwing an exception from HBaseClient that it cannot connect to HDFS.

Treat a union type containing null as having a default value of null.

Currently, union types having null as the first type should be treated as a type whose default value is null, as mentioned here.

(Note that when a default value is specified for a record field whose type is a union, the type of the default value must match the first element of the union. Thus, for unions containing "null", the "null" is usually listed first, since the default value of such unions is typically null.)

But the Avro (1.7.x and 1.8.x) implementation does not handle this scenario, and it may take a while to get this fix into Avro. It is better to address this issue in the schema registry while computing the effective or resultant schema (example below).
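
For illustration, a field written per the quoted rule, with "null" listed first so the default null matches the first union branch; the proposal is for the registry to treat such a field as default-null when computing the effective schema, even where Avro 1.7.x/1.8.x does not:

{ "type": "record", "name": "User", "fields": [ { "name": "nickname", "type": [ "null", "string" ], "default": null } ] }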

_orderByField should be handled by the in-memory storage manager as well.

_orderByField query parsing may need to be pushed up to the service instead of being kept at the JDBC storage manager level. Ideally, StorageManager should not try to parse the query params and figure out the orderByField query; instead it should expose find APIs that take a list of OrderByFields as an optional argument (sketched below).
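
A hedged sketch of the API shape being suggested (type and method names are illustrative, not the actual StorageManager interface):

import java.util.Collection;
import java.util.List;
import java.util.Map;

public interface FindableStorageManager<T> {

    // Stands in for a (fieldName, descending?) ordering pair.
    final class OrderByField {
        public final String fieldName;
        public final boolean descending;
        public OrderByField(String fieldName, boolean descending) {
            this.fieldName = fieldName;
            this.descending = descending;
        }
    }

    // The service layer parses _orderByField and passes an explicit ordering.
    Collection<T> find(String namespace, Map<String, String> queryParams);
    Collection<T> find(String namespace, Map<String, String> queryParams,
                       List<OrderByField> orderByFields);
}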

SchemaGroup and enforcing uniqueness within the group or across the SchemaRegistry

Let's say I want to register a schema named "person" with the following schema for Nifi:
{ "name": "person", "namespace": "nifi", "type": "record", "fields": [ { "name": "id", "type": "string" }, { "name": "firstName", "type": "string", "aliases": [ "first_name" ] }, { "name": "lastName", "type": "string", "aliases": [ "last_name" ] }, { "name": "email", "type": "string" }, { "name": "gender", "type": "string" }, { "name": "ipAddress", "type": "string", "aliases": [ "ip_address" ] } ] }
Should we allow users to use the same schemaName under a different group? Say I want to use schemaName "person" under schemaGroup "kafka":
{ "name": "person", "namespace": "nifi", "type": "record", "fields": [ { "name": "id", "type": "string" }, { "name": "firstName", "type": "string", "aliases": [ "first_name" ] }, { "name": "lastName", "type": "string", "aliases": [ "last_name" ] }, { "name": "email", "type": "string" }, { "name": "gender", "type": "string" } ] }
The above request comes back as success, but I don't see the new schema getting registered.
cc @satishd

Store schema ID and version in Kafka message headers instead of payload prefix

The serde currently stores schema identifiers as a payload prefix. This makes the payload incompatible with standard Avro deserializers.

Since Kafka 0.11 supports message headers, it is now possible to store schema identifiers in the headers and leave the payload unchanged.

Nice to have: the serde should be backward compatible, automatically detecting the version of Kafka it is running against and storing the identifiers either in headers or in the payload, depending on what is available. A sketch of the header-based layout follows.
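
A hedged sketch of that layout with the standard Kafka producer API (the header names are hypothetical):

import java.nio.ByteBuffer;
import org.apache.kafka.clients.producer.ProducerRecord;

public final class HeaderSchemaRef {
    // The value stays plain Avro bytes; the schema reference rides in headers.
    public static ProducerRecord<byte[], byte[]> withSchemaRef(
            String topic, byte[] plainAvroPayload, long schemaId, int version) {
        ProducerRecord<byte[], byte[]> record =
                new ProducerRecord<>(topic, null, plainAvroPayload);
        record.headers().add("schema.id",
                ByteBuffer.allocate(8).putLong(schemaId).array());
        record.headers().add("schema.version",
                ByteBuffer.allocate(4).putInt(version).array());
        return record;
    }
}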

UI layout of adding schemas

This issue is discussed in #86 here.

Have we thought about the option of having a 3/4 area for the schema text and a 1/4 area for the left column (name/type/compatibility etc.), with the description as a two-row resizable textbox?

This is mainly about giving the left column a third of the area and the schema text two thirds, with the description in a resizable text box initially two rows high. Currently the left column takes half of the space, which may not really be needed.

@harshach @shahsank3t any thoughts/opinions?

Add postgres sql scripts

We already have storage manager support for Postgres. We need to convert the MySQL scripts to Postgres.

Add support for handling the Confluent Kafka byte wire protocol

To aid migration for users currently on the Confluent platform, where producers are still using the Confluent serdes, or where there is a need to integrate with tooling that at the moment supports only the Confluent serdes (until registry adoption is more widespread).

It would be good to be able to produce a Confluent wire compatible protocol, and likewise consume it, making the serialiser configurable as to which protocol to produce, and the consumer able to simply handle either.

This will need to ensure we check the protocol versions in the byte array; Confluent currently uses a leading 0x0 byte as the protocol magic byte, followed by 4 bytes for the int32 id (see the sketch below).

This would link with the other Confluent compatibility work I've already PR'd, and enable faster adoption.
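
A minimal consumer-side sketch of that check (class and method names are hypothetical):

import java.nio.ByteBuffer;

public final class WireProtocolDetector {
    // Confluent frames start with magic byte 0x0 followed by an int32 schema id.
    public static boolean isConfluentFrame(byte[] payload) {
        return payload.length >= 5 && payload[0] == 0x0;
    }

    public static int confluentSchemaId(byte[] payload) {
        return ByteBuffer.wrap(payload, 1, 4).getInt();
    }
}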

Updating schema description in the UI does not work

To replicate:

  • edit an existing schema; the Add Version dialog is displayed
  • change the description and click OK; "Version added successfully" is displayed

Expected result:

  • schema now has new description

Actual result:

  • description has not changed, even after refreshing the page

Schema registry UI is very slow at initial load

In our environment it takes about 30 seconds for the UI to get populated with all the schemas in the registry.

Initially the screen displays "No data found" and then the schemas start to trickle in slowly.

It would be helpful to display some indication that the schemas are still being loaded. It would also be nice to speed up the loading if possible.

HA Support for schema registry.

Support high availability for a schema registry cluster: allow multiple instances of schema registry to run in the same cluster, where

  • one node in the cluster acts as the master; writes are handled only by this instance, which can also handle read requests
  • all other nodes can take read requests, and write requests are redirected to the master

Need pagination support

Currently, the schema list gets long when many schemas are added. It would be better to provide pagination support, showing 10 or 15 schemas at a time.

Need an API that can return if the schema creation was successful or not

Currently, the POST API returns the new schemaId if the schema was created successfully, or the old schemaId when a user tries to create a schema with the same name. The UI has no way to identify whether the schemaId in the response is new or old, and hence cannot show a proper notification about whether the schema was created.
Having another API, or the same API with a more explicit response, would help in showing proper notifications.
