Comments (22)
Could you give us some more details related to the exceptions and logs? Also, if possible (in case it is small enough), would you send us the Avro file you're using along with the schema? Finally, would you also let us know what results you get when running df.show and matchDF.show?
Also, how is matchDF being created? Should it maybe be df.toAvro( ... )?
from abris.
felip,
I corrected the code; there is only one DataFrame, df.
-
The DataFrame, once loaded from the Avro file, shows its full contents with the show command and appears to be loaded fine. I have also tested a few select and filter statements with it to verify it works.
-
I was mistaken; Kafka does receive the message. When I run the Confluent command-line consumer I get:
./kafka-avro-console-consumer --topic exampletopic --zookeeper ip:2181 --from-beginning
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
Processed a total of 1 messages
[2018-05-17 11:42:10,044] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:107)
org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 1214396213
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:191)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:218)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:394)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:387)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:65)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndId(CachedSchemaRegistryClient.java:138)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:122)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:93)
at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:148)
at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:140)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:146)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:84)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
Prior to running the producer code there wasn't a schema registered. My understanding is that this is fine, and that it will pull the schema once you produce a message. No exceptions are thrown from the REPL.
from abris.
What if I am using Confluent Kafka (and the Schema Registry)?
from abris.
It seems that this is the cause of the exception: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 1214396213
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
Would you be able to try the examples as explained here: https://github.com/AbsaOSS/ABRiS? You'll see snippets showing how to use Spark to read from Kafka, so that you can check the results, as well as examples of how to use the library to perform the schema registration.
from abris.
I deleted the topic, recreated it, and used the command line to produce a few test messages to the same topic. I again tried the same code and got:
[2018-05-17 12:05:11,519] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:107)
org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 1214396213
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:191)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:218)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:394)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:387)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:65)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndId(CachedSchemaRegistryClient.java:138)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:122)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:93)
at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:148)
at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:140)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:146)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:84)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
from abris.
Is this exception being generated by the program or the command line?
from abris.
By the command line. I went through and verified the topic was created. I also generated a JSON file with the Avro file's schema and used that instead of schema inference. The first attempt seems to have an issue talking to Kafka, but after restarting the Kafka cluster it runs normally. When I try to use Confluent's command-line consumer tool with the library, that's the exception thrown. When I use the command-line producer, it consumes correctly. I have reread your examples. Can you point me to a simple example that loads an Avro file and sends it to a Kafka topic? I would love to use your library.
from abris.
Sure thing. Here's an example of how to write a DataFrame into Kafka: https://github.com/AbsaOSS/ABRiS/blob/master/src/main/scala/za/co/absa/abris/examples/SampleKafkaDataframeWriterApp.scala
Here you'll find an example of how to read it: https://github.com/AbsaOSS/ABRiS/blob/master/src/main/scala/za/co/absa/abris/examples/SampleKafkaAvroFilterApp.scala
Since Confluent enhances the payload with the schema version, there's a separate API for this, which you'll also find in the library.
from abris.
https://github.com/AbsaOSS/ABRiS/blob/master/src/main/scala/za/co/absa/abris/examples/SampleKafkaDataframeWriterApp.scala
Yes, this is the one I have been reading.
from abris.
import java.io.{File, PrintWriter}
// Build the Avro schema from the DataFrame's schema and register it
val avroSchema = new SparkToAvroProcessor(df.schema, "test", "test").getAvroSchema
val schemaId = SchemaManager.register(avroSchema, subject)
// Write the same schema out to a file
val pw = new PrintWriter(new File("/tmp/test.avsc"))
pw.write(avroSchema.toString)
pw.close()
// Serialize the DataFrame to Avro and write it to the Kafka topic
df.toAvro("test", "test").write.format("kafka").option("kafka.bootstrap.servers", "kafka01:9092").option("topic", "testTopic").save()
So I have it registering with the Schema Registry and writing that same schema out to a file, then using it. The same error is present.
from abris.
This is probably related to the fact that Confluent adds the schema id to the payload and the command-line consumer expects it. The library does not add this id since it is a Confluent-specific construct. Would you please try reading from Kafka using Spark, as in the example, just to be sure?
If this is the case, I can add a .toConfluentAvro( ... ) to the API.
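For reference, the Confluent framing discussed above is a single magic byte 0x00 followed by a 4-byte big-endian schema id, then the Avro-encoded payload. A minimal Python sketch of that framing (the function name and sample bytes are illustrative, not part of ABRiS):

```python
import struct

MAGIC_BYTE = 0  # Confluent wire format: 1 magic byte, then a 4-byte big-endian schema id

def to_confluent_avro(schema_id: int, avro_bytes: bytes) -> bytes:
    """Prepend the Confluent framing (magic byte + schema id) to raw Avro bytes."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_bytes

framed = to_confluent_avro(42, b"\x02payload")
print(framed[:5].hex())  # -> 000000002a (magic byte 0x00, then id 42)
```

A consumer that expects this framing (like the Confluent console consumer) will read the first five bytes as magic byte plus schema id, which is why an unframed message makes it look up a nonsense schema id.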
from abris.
OK, I will. Just to point out: when I run
curl -X GET http://localhost:8081/subjects/Kafka-value/versions/latest
I get the correct schema, but when I request just
http://localhost:8081/subjects/Kafka-value/
I get {"error_code":40403,"message":"Schema not found"}
from abris.
https://github.com/confluentinc/schema-registry
from abris.
This is a Schema Registry specific question, but the expected result is actually "schema not found". The GET API always expects you to specify the version. Check the spec for the GET method: https://docs.confluent.io/current/schema-registry/docs/api.html
Also, take a look at my previous comment, please.
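To illustrate the point above: the subject endpoints that return a schema include a version segment, so a bare /subjects/<subject>/ path is not a valid GET endpoint. A hypothetical helper (the base URL is the local registry used in this thread; the function name is illustrative):

```python
BASE_URL = "http://localhost:8081"  # assumed local Schema Registry

def schema_url(subject: str, version: str = "latest") -> str:
    """Build the GET endpoint for a subject's schema; a version segment is required."""
    return f"{BASE_URL}/subjects/{subject}/versions/{version}"

print(schema_url("Kafka-value"))
# -> http://localhost:8081/subjects/Kafka-value/versions/latest
```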
from abris.
Yes, I am able to read from the topic, so it appears to be a Schema Registry issue.
from abris.
When you say that Confluent adds a schema id to the payload, do you mean the schema id that's sent to the Schema Registry, or are you saying that the schema id is added to the Kafka message?
from abris.
The schema id is added to the Kafka message as part of the Avro payload.
You can check line 84 here and line 120 here.
from abris.
I get the same error when I use .fromConfluentAvro as in the example. I do not get this error when I use the command-line producer and .fromConfluentAvro.
from abris.
How can I verify that the id is being sent with the Kafka message?
from abris.
In the context of the library, there is a fromConfluentAvro( ... ) that you can use to read from Confluent Kafka. From an inspection perspective, you can check that method.
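One way to check from the consumer side whether a record carries the Confluent framing is to inspect the first five bytes of the raw message value. A minimal sketch (the function name and sample bytes are illustrative; the framing is magic byte 0x00 plus a 4-byte big-endian schema id):

```python
import struct

def parse_confluent_frame(message: bytes):
    """Return (schema_id, avro_payload) if the message is Confluent-framed, else None."""
    if len(message) < 5 or message[0] != 0:
        return None  # no leading magic byte 0x00 -> not Confluent-framed
    schema_id = struct.unpack(">I", message[1:5])[0]
    return schema_id, message[5:]

# A plain Avro message (no framing) fails the check:
print(parse_confluent_frame(b"\x02\x06foo"))             # -> None
# A framed message yields its schema id and payload:
print(parse_confluent_frame(b"\x00\x00\x00\x04\xd2hi"))  # -> (1234, b'hi')
```

If the extracted id matches a schema registered in the registry, the console consumer should be able to deserialize the record; an id like 1214396213 suggests the first bytes of an unframed Avro payload are being misread as an id.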
from abris.
I get the same error when it tries to retrieve the schema.
from abris.
"it" what?
Also, have you been able to run the examples just changing the parameters?
Finally, can you share the Avro record you're trying to send, along with the schema?
from abris.