hortonworks / registry
Schema Registry
License: Apache License 2.0
Many users of Kafka with Avro will be using the Confluent Schema Registry. To ease migration, since not all consumers and producers can be changed and released at the same time:
Registry could expose a compatible REST API for apps that still produce or consume with the Confluent serdes.
Likewise, add an option to run with a compatible byte wire protocol, so that apps using the new registry serdes are able to talk to and understand the Confluent protocol.
E.g. schema metadata and version could be referenceable by a single unique id, which is then sent in the byte[] with the same leading magic/protocol byte.
Currently, union types having null as the first type should be treated as a type whose default value is null, as mentioned here:
(Note that when a default value is specified for a record field whose type is a union, the type of the default value must match the first element of the union. Thus, for unions containing "null", the "null" is usually listed first, since the default value of such unions is typically null.)
But the Avro implementation (1.7.x and 1.8.x) does not handle this scenario, and it may take a while to get this fix into Avro. It is better to address this issue in Schema Registry while computing the effective or resultant schema.
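For reference, a minimal sketch of the schema shape in question (record and field names are illustrative): a union listing "null" first, whose default value must therefore be null:

```json
{
  "type": "record",
  "name": "Sample",
  "fields": [
    {
      "name": "middleName",
      "type": ["null", "string"],
      "default": null
    }
  ]
}
```

Per the Avro spec, the default's type must match the first branch of the union, which is why "null" is listed first here.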
Support high availability for the schema registry cluster: allow multiple instances of schema registry to run in the same cluster.
This will add a feature to register schema metadata with either multiple versions or a single version. Currently a schema always allows multiple versions, and there are cases where users may want a schema to have only one version and to disallow adding more.
This issue is raised based on the discussion that happened in a google-groups conversation.
By default the Kafka Avro serializer computes the schema name as the topic name. Users can always override this default behavior by implementing getSchemaKey in KafkaAvroSerializer.
We should introduce a clientId to SchemaRegistryClient which, at a configured interval, sends a heartbeat to the registry server along with the schema id & version it is using and whether it is a producer or consumer. This will help us build an audit log of clients accessing schemas, which in turn lets schema authors see which clients a potential schema change might affect.
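As a sketch of the idea, a heartbeat payload could look like the following (all field names here are hypothetical, not an existing API):

```json
{
  "clientId": "orders-consumer-1",
  "type": "CONSUMER",
  "schemaId": 42,
  "version": 3,
  "timestamp": 1510000000000
}
```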
I tried creating a schema with 'NONE' compatibility, but the response always defaulted to 'BACKWARD'.
The idea is that some well-behaved schema registry clients do not need/want the SR library to do the serialization and deserialization of the data for them, and instead want the schema registry to focus primarily on the publishing/retrieval of the schemas themselves.
Today the schema registry client takes care of the serialization and deserialization of data, and when it does so, it writes out the identifier and version and then writes out the raw data.
So, for example, the resulting bytes on disk would be
<>
NiFi and protocols like JMS, HTTP, etc. have facilities to support context (headers) and content (payload). So we don't need/want serializers to write that stuff for us, and we'll handle passing those references around for the objects.
The serde currently stores schema identifiers as a payload prefix. This makes the payload incompatible with standard Avro deserializers.
Since Kafka supports message headers starting with 0.11, it is now possible to store schema identifiers in the headers and leave the payload unchanged.
Nice to have: the serde should be backward compatible and automatically detect the version of Kafka it is running against, storing the identifiers either in headers or in the payload, depending on what's available.
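A minimal Java sketch of the two placements, assuming an illustrative prefix layout and header key (neither is the registry's actual wire format):

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: the exact prefix layout and header key are
// assumptions, not the registry's actual serde format.
public class SchemaIdPlacement {

    // Current approach: the schema id is prepended to the payload, so a
    // plain Avro deserializer no longer understands the bytes.
    public static byte[] prefixedPayload(long schemaId, byte[] avroBytes) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + avroBytes.length);
        buf.putLong(schemaId);
        buf.put(avroBytes);
        return buf.array();
    }

    // Proposed approach for Kafka >= 0.11: the schema id travels in a
    // record header (modeled here as a plain Map) and the payload stays
    // untouched plain Avro.
    public static Map<String, byte[]> headerWithSchemaId(long schemaId) {
        Map<String, byte[]> headers = new HashMap<>();
        headers.put("schema.id",
                ByteBuffer.allocate(Long.BYTES).putLong(schemaId).array());
        return headers;
    }
}
```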
Currently, schema lists get long when too many schemas are added. It would be better to provide pagination support, showing 10 or 15 schemas at a time.
In our environment it takes about 30 seconds for the UI to get populated with all the schemas in the registry.
Initially the screen displays "No data found" and then the schemas start to trickle in slowly.
It would be helpful to display some indication that the schemas are still being loaded. It would also be nice to speed up the loading if possible.
Aggregated information for schema metadata including
This is intended to create an admin page on the UI side to register schema group names, instead of the free-form text we have today. Once we do this we can enable rules on schema names; for example, the Kafka group requires suffixing ":v" or ":k" to the schema names. We've had a lot of users trip over this registration naming scheme.
Also, having predefined schemaGroup names keeps users from making mistakes when registering schemas.
This issue is discussed in #86 here.
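As a sketch of the kind of rule this would enable, assuming the ":k"/":v" suffix convention described above (the class and method names are hypothetical):

```java
// Hypothetical validation for the naming rule described above: schemas
// registered in the "kafka" group must end with ":k" (key schema) or
// ":v" (value schema).
public class SchemaNameRules {
    public static boolean isValidKafkaSchemaName(String name) {
        return name != null && (name.endsWith(":k") || name.endsWith(":v"));
    }
}
```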
Have we thought about the option of having 3/4 of the area for the schema text and 1/4 for the left column (name/type/compatibility etc.), with the description as a resizable two-row textbox?
This is mainly about having the left column take 1/3 of the area and the schema text 2/3, with the description in a resizable text box initially two rows tall. Currently the left column takes half of the space, which may not really be needed.
@harshach @shahsank3t any thoughts/opinions?
Currently Avro requires users to create individual schema documents, and there is no ability to include/import other schema documents. This feature will allow users to include other schemas and refer to their types. Multiple schemas can then be composed, with their respective abstractions, in meaningful ways.
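To make the idea concrete, here is a sketch with illustrative type names. Today both records must be defined in the same document (or parsed in dependency order) for Person to refer to Address by its full name; the proposed include/import support would let Address live as its own registered schema and be referenced from Person:

```json
[
  {
    "type": "record",
    "name": "Address",
    "namespace": "com.example",
    "fields": [
      { "name": "street", "type": "string" },
      { "name": "city", "type": "string" }
    ]
  },
  {
    "type": "record",
    "name": "Person",
    "namespace": "com.example",
    "fields": [
      { "name": "name", "type": "string" },
      { "name": "address", "type": "com.example.Address" }
    ]
  }
]
```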
Right now it's not possible to link directly to a schema in the UI.
It would be nice to be able to obtain a link that brings up a specific schema in the UI.
We should provide a Scheduled Listener interface to allow plugins to pull external meta store schemas into the registry. For example, a listener that can be scheduled to pull in any schemas or version changes from Confluent's schema registry.
There should be a UI page to list, as well as add, serializers & deserializers for registered schemas.
Previous versions of Schema Registry allowed me to paste a schema in, which was very nice. Now it requires that a file be uploaded. This means that I have to copy the schema, paste it into vi, save, go back to the web app, upload, and navigate to the file... While I can appreciate the desire to upload a file if you already have one saved off, it is much more of a pain if you don't have the schema already saved in a file. I would like the option to either paste the text into the UI or upload an existing file.
Currently, the POST API returns the new schemaId if the schema was created successfully, or the old schemaId when a user tries to create a schema with the same name. The UI has no way to identify whether the schemaId in the response is new or old, and hence cannot show a proper notification message about whether the schema was created.
Another API, or the same API with a proper response, will help in showing proper notifications.
To aid migration for users currently on the Confluent platform, where producers are still using the Confluent serdes,
or
where we need to integrate with tooling that supports only the Confluent serdes at the moment (until registry adoption is more widespread):
It would be good to be able to produce a Confluent wire-compatible protocol, and likewise consume it, making the serializer configurable as to which protocol to produce, and the consumer able to simply handle either.
This will need to ensure we check the protocol version in the byte array; Confluent currently uses a leading 0x0 byte for the protocol magic byte, followed by 4 bytes for the int32 id.
This would link with the other Confluent compatibility work I've already PR'd, and enable faster adoption.
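The byte layout described above can be sketched as follows (a reading-side illustration, not the actual serde code):

```java
import java.nio.ByteBuffer;

// Sketch of reading the Confluent wire format described above:
// one magic byte (0x0) followed by a 4-byte big-endian int32 schema id,
// with the Avro-encoded payload after that.
public class ConfluentWireFormat {
    public static final byte MAGIC_BYTE = 0x0;

    public static int readSchemaId(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        byte magic = buf.get();
        if (magic != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown protocol magic byte: " + magic);
        }
        // Everything remaining after this int is the Avro payload.
        return buf.getInt();
    }
}
```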
SchemaRegistryClient should be able to take a clusterUrl containing a sequence of schema registry URLs separated by commas, and it should handle failover across the target schema registry instances.
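A minimal sketch of the proposed failover behavior, with a stand-in reachability check instead of a real HTTP health probe (names are illustrative, not the actual SchemaRegistryClient API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Sketch: split the comma-separated clusterUrl and fail over to the next
// URL when the current one is unreachable.
public class RegistryUrlSelector {

    public static List<String> parseClusterUrl(String clusterUrl) {
        return Arrays.asList(clusterUrl.split("\\s*,\\s*"));
    }

    // "isReachable" stands in for a real health check against the instance.
    public static String firstAvailable(String clusterUrl, Predicate<String> isReachable) {
        for (String url : parseClusterUrl(clusterUrl)) {
            if (isReachable.test(url)) {
                return url;
            }
        }
        throw new IllegalStateException("No schema registry instance reachable: " + clusterUrl);
    }
}
```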
We already have storage manager support for Postgres. We need to convert the MySQL scripts to Postgres.
testApis(com.hortonworks.registries.schemaregistry.examples.avro.SampleApplicationTest) Time elapsed: 0.708 sec <<< ERROR!
java.lang.RuntimeException: Jar /serdes-examples.jar could not be loaded
at com.hortonworks.registries.schemaregistry.examples.avro.SampleSchemaRegistryClientApp.runCustomSerDesApi(SampleSchemaRegistryClientApp.java:181)
at com.hortonworks.registries.schemaregistry.examples.avro.SampleApplicationTest.testApis(SampleApplicationTest.java:43)
Currently, there is a search API to find schema versions containing fields with a given name. There should also be a search API to find schemas by name or description.
Currently, source jars are generated for all the profiles. This can be reduced to wherever it is really applicable, and it should be removed from the default profile, since it takes around 15 minutes to run mvn clean install.
The link to the documentation, http://registry-project.readthedocs.io/en/latest/, currently leads to no documentation.
Currently it seems that the KafkaAvroDeserializer needs to know the version id upfront.
This causes two issues:
Apps that handle messages as GenericRecord
Apps that have the SpecificRecord on the class path/JVM
Ideally it should behave similarly to Confluent's: if no READER_VERSION is present, it simply uses the same id as the incoming data (the GenericRecord problem).
If a specific record is found on the class path, it should use that record's schema.
Currently this blocks migration from Confluent's Schema Registry.
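The requested fallback can be sketched as follows (illustrative names, not the actual KafkaAvroDeserializer API):

```java
// Sketch of the fallback: when no reader version is configured, use the
// writer's schema id carried with the incoming message instead of failing.
public class ReaderVersionFallback {
    public static int resolveReaderSchemaId(Integer configuredReaderId, int writerSchemaId) {
        // GenericRecord case: no READER_VERSION configured, use the incoming id.
        return configuredReaderId != null ? configuredReaderId : writerSchemaId;
    }
}
```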
Remove mariadb driver. Update bootstrap-storage script to fetch mysql driver.
As you'll see, KIP-82 got adopted and submitted.
This means that as of Kafka 0.11, Kafka will have headers.
In a Kafka Record the value is simply a byte[], so Kafka delegates to the consumer the handling of which schema to use and how to decode it, which is exactly what solutions like schema registry provide.
The Kafka Header record introduces a String key and a byte[] value; following this, we should support registering the Kafka header value types.
It would be great to support schema registry for headers, where the schema can be a primitive (int8, int16, int32, int64, float32, float64, boolean, bytes, string) or a more complex Avro-like schema.
The idea would be that the subject for lookup could be the topic + header key, or, if all values for the same key are shared across an organisation, the subject could simply be the header key.
Is this possible to record with the current schema repo APIs? As in, is the mapping agnostic and simply subject = topic + ".key" or subject = topic + ".value", such that we could make subject = topic + ".header." + headerKey?
_orderByField query parsing may need to be pushed to the service instead of being kept at the JDBC storage manager level. Ideally, StorageManager should not try to parse the query params and figure out the orderByField query etc.; it should expose find APIs with a List of OrderByFields as an optional argument.
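A sketch of the suggested shape, with illustrative names: the service layer parses _orderByField and passes structured arguments, so StorageManager never sees raw query strings:

```java
import java.util.List;

// Illustrative sketch, not the real StorageManager interface: ordering is
// expressed as structured OrderByField values rather than a query string.
public class OrderByField {
    public final String fieldName;
    public final boolean descending;

    public OrderByField(String fieldName, boolean descending) {
        this.fieldName = fieldName;
        this.descending = descending;
    }

    // The find API the service layer would call after parsing query params.
    public interface StorageManagerSketch {
        <T> List<T> find(String namespace, List<OrderByField> orderByFields);
    }
}
```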
If a user creates an environment by picking HBase/Hive from an HDP cluster added via Ambari, then HDFS from the same cluster should be automatically added to the environment by the UI. The reason is that HBase's hbase.rootdir uses hdfs-site.xml and core-site.xml to connect to HDFS. Without HDFS being present in the environment, the Storm topology fails at runtime, throwing an exception from HBaseClient that it cannot connect to HDFS.
Let's say I want to register a schema named "person" with the following schema for Nifi:
{
  "name": "person",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "firstName", "type": "string", "aliases": [ "first_name" ] },
    { "name": "lastName", "type": "string", "aliases": [ "last_name" ] },
    { "name": "email", "type": "string" },
    { "name": "gender", "type": "string" },
    { "name": "ipAddress", "type": "string", "aliases": [ "ip_address" ] }
  ]
}
Should we allow users to use the same schemaName under a different group?
E.g. if I want to use schemaName "person" under schemaGroup "kafka":
{
  "name": "person",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "firstName", "type": "string", "aliases": [ "first_name" ] },
    { "name": "lastName", "type": "string", "aliases": [ "last_name" ] },
    { "name": "email", "type": "string" },
    { "name": "gender", "type": "string" }
  ]
}
The above request comes back as success, but I don't see the new schema getting registered.
cc @satishd