Hi @lhotari
I updated the perf tool in
https://github.com/semistone/pulsar/tree/debug_ssues_22601
It includes only one commit, which modifies PerformanceProducer.java to add
a big-payload option (-bp 5 means 5 percent big payloads)
and BatcherBuilder.KEY_BASED (-kb).
My consumer command is
bin/pulsar-perf consume persistent://my-tenant/my-namespace/my-topic-1 --auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationTls --auth-params '{"tlsCertFile":"conf/superuser.cer","tlsKeyFile":"conf/superuser.key.pem"}' -n 10 -sp Latest -ss angus_test --batch-index-ack -st Key_Shared
and my producer command is
bin/pulsar-perf produce persistent://my-tenant/my-namespace/my-topic-1 -r 6000 -kb -s 2000 -bp 5 -bm 1000 -b 1 -mk random --auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationTls --auth-params '{"tlsCertFile":"conf/superuser.cer","tlsKeyFile":"conf/superuser.key.pem"}'
The error happens when all of these conditions hold:
the batch builder is KEY_BASED,
the event key is random,
and a few payloads are big (in my environment 3% reproduces the error, and 10% crashes the producer).
In my test,
the normal payload is 2K bytes and the big payload is 20K bytes.
If I remove any of the above conditions, the error either occurs less often or disappears.
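The payload mix described above can be sketched as follows. This is a minimal, self-contained illustration and not the code from the linked commit: the class `PayloadMix` and its methods are hypothetical names, and the sizes (2,000 and 20,000 bytes) are assumed from the test description.

```java
import java.util.Random;

/** Hypothetical helper: picks a normal or big payload and a random key per message. */
final class PayloadMix {
    private final byte[] normalPayload;
    private final byte[] bigPayload;
    private final int bigPercent; // e.g. 5 for "-bp 5"
    private final Random random;

    PayloadMix(int normalSize, int bigSize, int bigPercent, long seed) {
        if (bigPercent < 0 || bigPercent > 100) {
            throw new IllegalArgumentException("bigPercent must be in [0, 100]");
        }
        this.normalPayload = new byte[normalSize];
        this.bigPayload = new byte[bigSize];
        this.bigPercent = bigPercent;
        this.random = new Random(seed);
    }

    /** Returns the big payload for roughly bigPercent of calls, otherwise the normal one. */
    byte[] nextPayload() {
        return random.nextInt(100) < bigPercent ? bigPayload : normalPayload;
    }

    /** Random event key, in the spirit of the "-mk random" option. */
    String nextKey() {
        return Integer.toString(random.nextInt(Integer.MAX_VALUE));
    }
}
```

With `new PayloadMix(2000, 20000, 5, seed)`, about one message in twenty carries the 20K payload, and every message gets an unrelated key, which is the combination reported to trigger the error together with KEY_BASED batching.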
When it happens, a WARN or ERROR message such as the following appears in pulsar-broker.log:
2024-05-09T01:12:35,246+0000 [pulsar-io-3-31] WARN org.apache.pulsar.broker.service.ServerCnx - [/100.96.184.253:39710] Got exception java.lang.IllegalArgumentException: Invalid unknonwn tag type: 6
or
2024-05-09T01:12:35,260+0000 [broker-topic-workers-OrderedExecutor-15-0] ERROR org.apache.pulsar.common.protocol.Commands - [persistent://budas/budas-preprod-internal/bud_stream_input-partition-1] [angus_test] Failed to peek sticky key from the message metadata
java.lang.IllegalArgumentException: Invalid unknonwn tag type: 4
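For context, a protobuf field tag is a varint whose low three bits hold the wire type and whose remaining bits hold the field number. Wire types 0, 1, 2, and 5 carry data, 3 and 4 are the deprecated group markers, and 6 and 7 are unassigned. Assuming the number in the log message is that wire type, values such as 6 or 4 would suggest the parser was reading bytes that are not valid message metadata. The sketch below only illustrates the tag layout; `ProtoTag` and its methods are hypothetical names, not Pulsar's parser.

```java
/** Hypothetical helper that splits a protobuf tag into field number and wire type. */
final class ProtoTag {
    private ProtoTag() {
    }

    /** The field number is stored in the bits above the low three. */
    static int fieldNumber(int tag) {
        return tag >>> 3;
    }

    /** The wire type is stored in the low three bits. */
    static int wireType(int tag) {
        return tag & 0x7;
    }

    /** Human-readable name of a wire type as defined by the protobuf encoding. */
    static String describe(int wireType) {
        switch (wireType) {
            case 0: return "VARINT";
            case 1: return "FIXED64";
            case 2: return "LENGTH_DELIMITED";
            case 3: return "START_GROUP (deprecated)";
            case 4: return "END_GROUP (deprecated)";
            case 5: return "FIXED32";
            default: return "UNASSIGNED";
        }
    }
}
```

For example, the byte `0x0A` decodes to field 1 with wire type 2 (length-delimited), while `0x0E` decodes to field 1 with wire type 6, which no valid encoder produces.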
Unfortunately I can't reproduce it in Docker; I guess the Docker standalone setup differs from my Pulsar cluster.
My Pulsar cluster uses
an almost default config, but with TLS auth in broker/bookkeeper/zookeeper.
Please help check it. If you have any problem reproducing this issue in your environment,
I will try to simplify my Pulsar cluster to reproduce it.
Thanks
from pulsar.
@lhotari
We ran many tests.
The current broker setting is
maxMessageSize=5242880
and the producer settings are (small batch message count and large max bytes)
batchingMaxMessages: 500
batchingMaxBytes: 3145728
batchingMaxPublishDelayMicros: 500
The payload distribution is
98% < 3K bytes
2% between 10-20K bytes
With these settings the error appears and publish throughput is poor.
But if we change to
batchingMaxMessages: 1000
batchingMaxBytes: 3145728
batchingMaxPublishDelayMicros: 1000
and filter out all data bigger than 15K bytes,
then the error disappears.
So we decided to create
one batch publisher to publish data < 15000 bytes
and one chunk publisher to publish data >= 15000 bytes.
That worked, and performance is also better than in the previous test.
We still don't know why,
but at least we have a workaround now.
I don't know which batch producer configuration could fix this error.
If you have any suggestions, we will still try them.
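The workaround amounts to routing by payload size. Below is a minimal sketch of that idea, not our production code: `SizeRoutingPublisher` is a hypothetical name, and the two senders are plain callbacks standing in for the real producers. In a real application they would presumably wrap two Pulsar producers on the same topic, one built with batching enabled and one with chunking enabled and batching disabled.

```java
import java.util.function.Consumer;

/** Hypothetical router: small payloads go to a batching sender, large ones to a chunking sender. */
final class SizeRoutingPublisher {
    private final int thresholdBytes;
    private final Consumer<byte[]> batchSender;
    private final Consumer<byte[]> chunkSender;

    SizeRoutingPublisher(int thresholdBytes,
                         Consumer<byte[]> batchSender,
                         Consumer<byte[]> chunkSender) {
        if (thresholdBytes <= 0) {
            throw new IllegalArgumentException("thresholdBytes must be positive");
        }
        this.thresholdBytes = thresholdBytes;
        this.batchSender = batchSender;
        this.chunkSender = chunkSender;
    }

    /** Payloads below the threshold are batched; payloads at or above it are chunked. */
    void publish(byte[] payload) {
        if (payload.length < thresholdBytes) {
            batchSender.accept(payload);
        } else {
            chunkSender.accept(payload);
        }
    }
}
```

With a threshold of 15000, a 14,999-byte payload goes to the batch sender and a 15,000-byte payload goes to the chunk sender, matching the split described above. Note that two producers do not preserve ordering between small and large messages with the same key.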
We also publish from multi-threaded programs.
It seems the error is not directly related to load but to payload size,
although it may be harder to reproduce when the publish rate is low.
We also tried to reproduce it with the perf tool, but it didn't always happen.
Thanks
I tried to upgrade to bookkeeper 4.17.0
but still have the same issue :(
[pulsar@cockroach308 lib]$ ls |grep bookkeeper
org.apache.bookkeeper-bookkeeper-benchmark-4.17.0.jar
org.apache.bookkeeper-bookkeeper-common-4.17.0.jar
org.apache.bookkeeper-bookkeeper-common-allocator-4.17.0.jar
org.apache.bookkeeper-bookkeeper-perf-4.17.0.jar
org.apache.bookkeeper-bookkeeper-proto-4.17.0.jar
org.apache.bookkeeper-bookkeeper-server-4.17.0.jar
org.apache.bookkeeper-bookkeeper-slogger-api-4.17.0.jar
org.apache.bookkeeper-bookkeeper-slogger-slf4j-4.17.0.jar
org.apache.bookkeeper-bookkeeper-tools-4.17.0.jar
org.apache.bookkeeper-bookkeeper-tools-framework-4.17.0.jar
org.apache.bookkeeper-bookkeeper-tools-ledger-4.17.0.jar
org.apache.bookkeeper-circe-checksum-4.17.0.jar
org.apache.bookkeeper-cpu-affinity-4.17.0.jar
org.apache.bookkeeper.http-http-server-4.17.0.jar
org.apache.bookkeeper.http-vertx-http-server-4.17.0.jar
org.apache.bookkeeper-native-io-4.17.0.jar
org.apache.bookkeeper-statelib-4.17.0.jar
org.apache.bookkeeper.stats-bookkeeper-stats-api-4.17.0.jar
org.apache.bookkeeper.stats-codahale-metrics-provider-4.17.0.jar
org.apache.bookkeeper.stats-otel-metrics-provider-4.17.0.jar
org.apache.bookkeeper.stats-prometheus-metrics-provider-4.17.0.jar
org.apache.bookkeeper-stream-storage-cli-4.17.0.jar
org.apache.bookkeeper-stream-storage-java-client-4.17.0.jar
org.apache.bookkeeper-stream-storage-server-4.17.0.jar
org.apache.bookkeeper-stream-storage-service-api-4.17.0.jar
org.apache.bookkeeper-stream-storage-service-impl-4.17.0.jar
org.apache.bookkeeper.tests-stream-storage-tests-common-4.17.0.jar
org.apache.pulsar-pulsar-package-bookkeeper-storage-3.2.2.jar
We also tried to reproduce it with the perf tool, but it didn't always happen.
@semistone Please share a way to reproduce it. It's not a problem if it's not always consistent. Fixing this issue will be a lot easier if there's at least some way to reproduce it.
I will try to reproduce it with the perf tool.
I can almost reproduce it with the perf tool
when a very small share of payloads is > 30K bytes and the others are 3K bytes.
Then
the error happens when messageKeyGenerationMode=random;
without messageKeyGenerationMode, the error disappears.
I guess payload size has some restriction in batch mode.
Let me confirm again tomorrow to make sure I didn't make any stupid mistake during my test.
@semistone Just wondering if this could be related to apache/bookkeeper#4196?
There might also be other recent ByteBuf retain/release fixes, such as #22393.
In Bookkeeper, there's apache/bookkeeper#4289 pending release and apache/bookkeeper#4293 is pending review.
We are still comparing the differences between our producer and the perf tool.
We will report back once we have a conclusion.
I tried to upgrade to bookkeeper 4.17.0
but still have the same issue :(
@semistone Thanks for testing this.
@semistone since you have some way to reproduce this in your own tests, would you be able to test if this can be reproduced with dispatcherDispatchMessagesInSubscriptionThread=false?
Lines 435 to 436 in 80d4675
It impacts this code:
@semistone since you have some way to reproduce this in your own tests, would you be able to test if this can be reproduced with
dispatcherDispatchMessagesInSubscriptionThread=false?
I tested it, and the result is still the same.
@semistone since you have some way to reproduce this in your own tests, would you be able to test if this can be reproduced with
dispatcherDispatchMessagesInSubscriptionThread=false?
I tested it, and the result is still the same.
@semistone Thanks for testing. That tells that it's not related to switching the thread in