
Apache Pulsar - distributed pub-sub messaging system

Home Page: https://pulsar.apache.org/

License: Apache License 2.0

Java 98.28% Shell 0.48% Python 0.72% HTML 0.01% JavaScript 0.01% HCL 0.04% Go 0.37% Dockerfile 0.04% Batchfile 0.04% Lua 0.01% Groovy 0.01%
pulsar pubsub messaging streaming queuing event-streaming

pulsar's Introduction


Pulsar is a distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.

Learn more about Pulsar at https://pulsar.apache.org

Main features

  • Horizontally scalable (Millions of independent topics and millions of messages published per second)
  • Strong ordering and consistency guarantees
  • Low latency durable storage
  • Topic and queue semantics
  • Load balancer
  • Designed for being deployed as a hosted service:
    • Multi-tenant
    • Authentication
    • Authorization
    • Quotas
    • Support mixing very different workloads
    • Optional hardware isolation
  • Keeps track of consumer cursor position
  • REST API for provisioning, admin and stats
  • Geo replication
  • Transparent handling of partitioned topics
  • Transparent batching of messages

Repositories

This repository is the main repository of Apache Pulsar. Pulsar PMC also maintains other repositories for components in the Pulsar ecosystem, including connectors, adapters, and other language clients.

Helm Chart

Ecosystem

Clients

Dashboard & Management Tools

Website

CI/CD

Archived/Halted

Pulsar Runtime Java Version Recommendation

  • pulsar ver > 2.10 and master branch

    Components        Java Version
    Broker            17
    Functions / IO    17
    CLI               17
    Java Client       8 or 11 or 17

  • 2.8 <= pulsar ver <= 2.10

    Components        Java Version
    Broker            11
    Functions / IO    11
    CLI               8 or 11
    Java Client       8 or 11

  • pulsar ver < 2.8

    Components        Java Version
    All               8 or 11

Build Pulsar

Requirements

  • JDK

    Pulsar Version      JDK Version
    master and 2.11+    JDK 17
    2.8 / 2.9 / 2.10    JDK 11
    2.7 and earlier     JDK 8
  • Maven 3.6.1+

  • zip

Note:

This project includes a Maven Wrapper that can be used instead of a system-installed Maven. Use it by replacing mvn by ./mvnw on Linux and mvnw.cmd on Windows in the commands below.

On Windows, prefer CMD over PowerShell, because Maven activates the windows profile, which runs rename-netty-native-libs.cmd.

Build

Compile and install:

$ mvn install -DskipTests

Compile and install an individual module:

$ mvn -pl module-name (e.g: pulsar-broker) install -DskipTests

Minimal build (this skips most of the external connectors and tiered storage handlers):

$ mvn install -Pcore-modules,-main -DskipTests

Run Unit Tests:

$ mvn test

Run Individual Unit Test:

$ mvn -pl module-name (e.g: pulsar-client) test -Dtest=unit-test-name (e.g: ConsumerBuilderImplTest)

Run Selected Test packages:

$ mvn test -pl module-name (for example, pulsar-broker) -Dinclude=org/apache/pulsar/**/*.java

Start standalone Pulsar service:

$ bin/pulsar standalone

Check https://pulsar.apache.org for documentation and examples.

Build custom docker images

The commands used in the Apache Pulsar release process can be found in the release process documentation.

Here are some general instructions for building custom docker images:

  • Docker images must be built with Java 8 for branch-2.7 or previous branches because of ISSUE-8445.
  • Java 11 is the recommended JDK version in branch-2.8, branch-2.9 and branch-2.10.
  • Java 17 is the recommended JDK version in master.

The following command builds the docker images apachepulsar/pulsar-all:latest and apachepulsar/pulsar:latest:

mvn clean install -DskipTests
# setting DOCKER_CLI_EXPERIMENTAL=enabled is required in some environments with older docker versions
export DOCKER_CLI_EXPERIMENTAL=enabled
mvn package -Pdocker,-main -am -pl docker/pulsar-all -DskipTests

After the images are built, they can be tagged and pushed to your custom repository. Here's an example of a bash script that tags the docker images with the current version and git revision and pushes them to localhost:32000/apachepulsar.

image_repo_and_project=localhost:32000/apachepulsar
pulsar_version=$(mvn initialize help:evaluate -Dexpression=project.version -pl . -q -DforceStdout)
gitrev=$(git rev-parse HEAD | colrm 10)
tag="${pulsar_version}-${gitrev}"
echo "Using tag $tag"
docker tag apachepulsar/pulsar-all:latest ${image_repo_and_project}/pulsar-all:$tag
docker push ${image_repo_and_project}/pulsar-all:$tag
docker tag apachepulsar/pulsar:latest ${image_repo_and_project}/pulsar:$tag
docker push ${image_repo_and_project}/pulsar:$tag

Setting up your IDE

Read https://pulsar.apache.org/contribute/setup-ide for setting up IntelliJ IDEA or Eclipse for developing Pulsar.

Documentation

Note:

For how to make contributions to Pulsar documentation, see Pulsar Documentation Contribution Guide.

Contact

Mailing lists
Name              Scope                              Subscribe    Unsubscribe    Archives
[email protected]  User-related discussions           Subscribe    Unsubscribe    Archives
[email protected]  Development-related discussions    Subscribe    Unsubscribe    Archives

Slack

Pulsar slack channel at https://apache-pulsar.slack.com/

You can self-register at https://communityinviter.com/apps/apache-pulsar/apache-pulsar

Security Policy

If you find a security issue with Pulsar then please read the security policy. It is critical to avoid public disclosure.

Reporting a security vulnerability

To report a vulnerability for Pulsar, contact the Apache Security Team. When reporting a vulnerability to [email protected], you can copy your email to [email protected] to send your report to the Apache Pulsar Project Management Committee. This is a private mailing list.

https://github.com/apache/pulsar/security/policy contains more details.

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

Crypto Notice

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See The Wassenaar Arrangement for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software: Pulsar uses the SSL library from Bouncy Castle written by http://www.bouncycastle.org.


pulsar's Issues

Geo Replication Synced or Asynced

In a geo-replication scenario, is a user's request written to multiple regions before the user gets an ack, or is it written to the local region, acked, and then forwarded to the other regions asynchronously?

Reset Consumer on Global Topic

Expected behavior

It should be possible to reset a cursor on a global topic.
Subscriptions are local to a cluster, so I don't see how a cursor reset on a global topic would cause problems.

Actual behavior

com.yahoo.pulsar.client.admin.PulsarAdminException$NotAllowedException: reset cursor not supported for global topic

Partitioned Consumer and Listener Threads

I understand that Pulsar Client could and should definitely be shared among Publisher and Consumer instances to take advantage of Netty's IO threads and share them among all instances.

But there seems to be an issue with PartitionedConsumerImpl in particular, because it uses a messageListener to receive messages from its internal ConsumerImpls.

Listeners are executed in a pool of configurable size, but PartitionedConsumerImpl makes blocking calls, so this is definitely a problem. If, for some reason, a PartitionedConsumer stops consuming messages, it will block as many listener threads as it has partitions, or maybe even more.

I don't really have a solution, but we should definitely have a better approach to PartitionedConsumerImpl.

[Feature Proposal] client supporting JDK7

Client supporting JDK7

[merlimat]
That would be very tricky. The current Java API itself is built upon JDK 8 features like CompletableFuture.
Supporting JDK 7 would mean rewriting the client library, and since JDK 7 reached end of life ~2 years ago, I don't think it would be worth the effort.

As Andrew said above, WebSocket is probably a much easier and quicker way to solve that.

[DongbinNie]
Although public updates for JDK7 ended in April 2015, JDK7 is still widely used: 45% in Plumbr's report https://plumbr.eu/blog/java/java-version-and-vendor-data-analyzed-2016-edition
It would help Pulsar's adoption if the client supported JDK7. WebSocket is a solution, but developers always prefer a first-class client.

Intermittent test failure in SimpleProducerConsumerTest.testSharedConsumerAckDifferentConsumer

The test fails from time to time. It seems to expect the two consumers to get the same number of messages, but after receive() is called on consumer1, the broker pushes more messages to the consumer.

testSharedConsumerAckDifferentConsumer(com.yahoo.pulsar.client.api.SimpleProducerConsumerTest)  Time elapsed: 3.05 sec  <<< FAILURE!
java.lang.AssertionError: null
    at com.yahoo.pulsar.client.api.SimpleProducerConsumerTest.lambda$testSharedConsumerAckDifferentConsumer$28(SimpleProducerConsumerTest.java:916)
    at com.yahoo.pulsar.client.api.SimpleProducerConsumerTest.testSharedConsumerAckDifferentConsumer(SimpleProducerConsumerTest.java:912)

javascript client

Feature request

A simple node javascript client would be awesome...

Advertised broker address

Expected behavior

Connecting through the discovery to the broker in the cluster setup should redirect to the public address of the broker. Preferably there should be an option to set an advertised hostname, this should come together with an option to set a bind host for the broker (however, this might not be necessary).

Actual behavior

Broker advertises the hostname as the address for the client to connect to. On EC2, it results in the client trying to connect to something like ip-xx-xxx-xxx-xxx which is not resolvable from outside of the same region. The error I see on my local machine while trying to publish messages to Pulsar is the following:

2016-09-19 17:45:38,592 - ERROR - [main:CmdProduce@186] - java.net.UnknownHostException: ip-10-239-182-4: unknown error
com.yahoo.pulsar.client.api.PulsarClientException: java.net.UnknownHostException: ip-10-239-182-4: unknown error
    at com.yahoo.pulsar.client.impl.HttpClient$1.onThrowable(HttpClient.java:139)
    at com.ning.http.client.providers.netty.future.NettyResponseFuture.abort(NettyResponseFuture.java:238)
    at com.ning.http.client.providers.netty.request.NettyRequestSender.abort(NettyRequestSender.java:422)
    at com.ning.http.client.providers.netty.request.NettyRequestSender.sendRequestWithNewChannel(NettyRequestSender.java:290)
    at com.ning.http.client.providers.netty.request.NettyRequestSender.sendRequestWithCertainForceConnect(NettyRequestSender.java:142)
    at com.ning.http.client.providers.netty.request.NettyRequestSender.sendRequest(NettyRequestSender.java:117)
    at com.ning.http.client.providers.netty.request.NettyRequestSender.sendNextRequest(NettyRequestSender.java:493)
    at com.ning.http.client.providers.netty.handler.Protocol.exitAfterHandlingRedirect(Protocol.java:189)
    at com.ning.http.client.providers.netty.handler.HttpProtocol.handleHttpResponse(HttpProtocol.java:427)
    at com.ning.http.client.providers.netty.handler.HttpProtocol.handle(HttpProtocol.java:470)
    at com.ning.http.client.providers.netty.handler.Processor.messageReceived(Processor.java:88)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.handler.codec.http.HttpClientCodec.handleUpstream(HttpClientCodec.java:92)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: ip-10-239-182-4: unknown error
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at java.net.InetAddress.getByName(InetAddress.java:1076)
    at com.ning.http.client.NameResolver$JdkNameResolver.resolve(NameResolver.java:28)
    at com.ning.http.client.providers.netty.request.NettyRequestSender.remoteAddress(NettyRequestSender.java:358)
    at com.ning.http.client.providers.netty.request.NettyRequestSender.connect(NettyRequestSender.java:369)
    at com.ning.http.client.providers.netty.request.NettyRequestSender.sendRequestWithNewChannel(NettyRequestSender.java:283)
    ... 37 more

I could set the hostname on the machine to a public IP address / public hostname but it is an ugly workaround. I do not have any DNS other than EC2 defaults available right now.

Steps to reproduce

Launch the cluster setup in EC2 (without any DNS other than EC2 defaults) and try producing messages.

System configuration

Pulsar version: 1.14

If the suggestions sound okay, I can prepare a PR.

Reading position manipulation

Allow the reading position to be manipulated, so that it can be moved backward or forward.

[Andrew]
You can skip and rewind via the admin client.

[merlimat]
In general we've tried to stay away from that because it clashes a bit with the pub/sub model. What would be the use case for these manipulations? And what would the interactions look like?

[DongbinNie]
There are various circumstances that require replaying messages, such as rebuilding search indexes on restart.
I hadn't noticed that messages can be skipped and rewound via the admin client, com.yahoo.pulsar.client.admin.PersistentTopics.resetCursor(String, String, long). Maybe we could also reset the cursor based on message ID.

Startup error after downloading the 1.14 release

Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class io.netty.channel.epoll.IovArray
    at io.netty.channel.epoll.EpollEventLoop.<init>(EpollEventLoop.java:62)
    at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:114)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64)
    at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49)
    at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:93)
    at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:80)
    at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:61)
    at org.apache.bookkeeper.client.BookKeeper.getDefaultEventLoopGroup(BookKeeper.java:936)
    at org.apache.bookkeeper.client.BookKeeper.<init>(BookKeeper.java:243)
    at com.yahoo.pulsar.broker.BookKeeperClientFactoryImpl.create(BookKeeperClientFactoryImpl.java:79)
    at com.yahoo.pulsar.broker.ManagedLedgerClientFactory.<init>(ManagedLedgerClientFactory.java:38)
    at com.yahoo.pulsar.broker.PulsarService.start(PulsarService.java:231)
    at com.yahoo.pulsar.PulsarStandaloneStarter.start(PulsarStandaloneStarter.java:143)
    at com.yahoo.pulsar.PulsarStandaloneStarter.main(PulsarStandaloneStarter.java:175)

Cluster setup - ZooKeeper errors

Expected behavior

Bookie starts and confirms it is running.

Actual behavior

Service dies with the following error:

2016-09-17 20:55:37,605 - WARN  [GarbageCollectorThread-1-1:GarbageCollectorThread@392] - Exception in gc
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /ledgers/LAYOUT
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.bookkeeper.meta.LedgerLayout.store(LedgerLayout.java:146)
    at org.apache.bookkeeper.meta.LedgerManagerFactory.createNewLMFactory(LedgerManagerFactory.java:214)
    at org.apache.bookkeeper.meta.LedgerManagerFactory.newLedgerManagerFactory(LedgerManagerFactory.java:126)
    at org.apache.bookkeeper.bookie.GarbageCollectorThread$LedgerManagerProviderImpl.getLedgerManager(GarbageCollectorThread.java:635)
    at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:346)
    at org.apache.bookkeeper.util.SafeRunnable.run(SafeRunnable.java:31)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:745)

My local zookeeper tells me:

[zk: localhost:2181(CONNECTED) 0] ls /
[namespace, admin, loadbalance, zookeeper, ledgers, managed-ledgers]
[zk: localhost:2181(CONNECTED) 1] ls /ledgers
[available, cookies, LAYOUT]

Steps to reproduce

Setup a clustered pulsar installation as described in the README. Start the bookkeeper.

System configuration

Pulsar version: 1.14

switch active consumer but failed

Expected behavior

I shut down a topic's consumer and started another consumer on the same topic. The new consumer should have taken over, but it failed with an error.

Actual behavior

2016-09-18 10:23:33,429 - ERROR - [bookkeeper-ml-workers-33-1:SafeRunnable@33] - Unexpected throwable caught
java.lang.UnsupportedOperationException
    at java.util.concurrent.CopyOnWriteArrayList$COWIterator.set(CopyOnWriteArrayList.java:1185)
    at java.util.Collections.sort(Collections.java:234)
    at com.yahoo.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.pickAndScheduleActiveConsumer(PersistentDispatcherSingleActiveConsumer.java:75)
    at com.yahoo.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.addConsumer(PersistentDispatcherSingleActiveConsumer.java:106)
    at com.yahoo.pulsar.broker.service.persistent.PersistentSubscription.addConsumer(PersistentSubscription.java:118)
    at com.yahoo.pulsar.broker.service.persistent.PersistentTopic$2.openCursorComplete(PersistentTopic.java:334)
    at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl$6.operationComplete(ManagedLedgerImpl.java:566)
    at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$2.operationComplete(ManagedCursorImpl.java:275)
    at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$24.operationComplete(ManagedCursorImpl.java:1844)
    at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$24.operationComplete(ManagedCursorImpl.java:1834)
    at org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper.lambda$null$66(MetaStoreImplZookeeper.java:254)
    at org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper$$Lambda$60/2049457780.run(Unknown Source)
    at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:27)
    at org.apache.bookkeeper.util.SafeRunnable.run(SafeRunnable.java:31)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:745)

Steps to reproduce

  1. Start a consumer and subscribe to topic a
  2. Shut down the consumer
  3. Start another consumer and subscribe to topic a

System configuration

Pulsar version: 1.14
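
The stack trace in this report shows Collections.sort calling COWIterator.set, which the iterator of CopyOnWriteArrayList does not support. One possible workaround is to sort a mutable snapshot and write the result back. The following is only a minimal sketch of that idea: the class CowSortSketch and the method sortInPlace are hypothetical names for illustration, not Pulsar code and not necessarily the fix the project adopted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical helper, not part of Pulsar: sorts a CopyOnWriteArrayList
// without going through ListIterator.set(), which COWIterator rejects.
final class CowSortSketch {
    static <T extends Comparable<T>> void sortInPlace(CopyOnWriteArrayList<T> list) {
        // Sort a private snapshot held in an ordinary mutable ArrayList.
        List<T> snapshot = new ArrayList<>(list);
        Collections.sort(snapshot);
        // Write the sorted contents back. These are two separate writes, so
        // callers that need atomicity must hold their own lock around the call.
        list.clear();
        list.addAll(snapshot);
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<String> consumers =
                new CopyOnWriteArrayList<>(Arrays.asList("consumer-b", "consumer-a"));
        sortInPlace(consumers);
        System.out.println(consumers);
    }
}
```

The copy costs one extra allocation per sort, which is usually acceptable for a short consumer list that changes rarely.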

Intermittent test failure on BacklogQuotaManagerTest.testAheadProducerOnHoldTimeout

Seen intermittently in Travis builds. Unfortunately the exception stack trace is not being printed in full.

com.yahoo.pulsar.client.admin.internal.BrokerStatsImpl$$EnhancerByMockitoWithCGLIB$$7141231b@443bf7aa
Tests run: 361, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 825.674 sec <<< FAILURE! - in TestSuite
testAheadProducerOnHoldTimeout(com.yahoo.pulsar.broker.service.BacklogQuotaManagerTest)  Time elapsed: 8.691 sec  <<< FAILURE!
io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
    at com.yahoo.pulsar.broker.service.BacklogQuotaManagerTest.testAheadProducerOnHoldTimeout(BacklogQuotaManagerTest.java:504)

Delayed messages rather than immediate delivery

Delayed messages: it would be better to allow a different delay for each subscription on the same topic.

[merlimat]
This one is interesting. I'd like to see how feasible it is to do it in a highly scalable way. Perhaps this might be a good candidate to be implemented as a recipe (a ready-made solution built on top of the client library), rather than directly in the server.

[DongbinNie]
Agreed, it can be implemented on the client side to reduce the burden on the server.

SimpleDateFormat is not thread safe
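
The report above consists of its title only. As an illustration of the general problem, a SimpleDateFormat instance keeps mutable state and must not be shared between threads without synchronization, whereas java.time.format.DateTimeFormatter is immutable. The sketch below shows one common alternative; the class name and the timestamp pattern are assumptions for illustration and do not come from the Pulsar code base.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: a timestamp formatter that can be shared across threads.
final class ThreadSafeTimestampSketch {
    // DateTimeFormatter is immutable, so a single static instance is safe to share.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    static String format(long epochMillis) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) throws InterruptedException {
        // Two threads use the same formatter concurrently without locking.
        Runnable task = () -> System.out.println(format(System.currentTimeMillis()));
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```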

Retention 0mb and BacklogQuota 0mb

What a retention of 0mb actually does is not documented, and neither is a BacklogQuota of 0mb.

By the way, both of these should actually mean no limit, to be consistent with other settings.
If I'm not mistaken, setting them to 0mb makes brokers delete all data when they check.

In our case we want retention based on time rather than size, so we actually want unlimited retention, and probably an unlimited backlog quota too.

Intermittent test failure in ManagedLedgerBkTest.testConcurrentMarkDelete

Seen failing once on Travis build:

Tests run: 218, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.82 sec <<< FAILURE! - in TestSuite
testConcurrentMarkDelete(org.apache.bookkeeper.mledger.impl.ManagedLedgerBkTest)  Time elapsed: 42.069 sec  <<< FAILURE!
java.util.concurrent.ExecutionException: org.apache.bookkeeper.mledger.ManagedLedgerException$MetaStoreException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.bookkeeper.mledger.impl.ManagedLedgerBkTest.testConcurrentMarkDelete(ManagedLedgerBkTest.java:273)
Caused by: org.apache.bookkeeper.mledger.ManagedLedgerException$MetaStoreException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
Results :
Failed tests: 
  ManagedLedgerBkTest.testConcurrentMarkDelete:273 » Execution org.apache.bookke...

We're seeing thousands of duplicated messages when we stop a Broker

We set up a Redis instance to store the IDs of consumed messages. Each time a new message is consumed, we check whether its ID is already in Redis; if it is, the message is a duplicate.

Expected behavior

No duplicated messages should appear.

Actual behavior

We're seeing thousands of duplicated messages after a broker goes down.

Steps to reproduce

  1. Start Redis and store each consumed message ID
  2. Each time a message is consumed, search if the ID is already on Redis
  3. Stop a Broker
  4. Duplicated messages appear

I don't know if it helps, but in the logs we see that all the duplicated messages have the same ledgerId and partitionIndex.

Message id duplicated messageId -> MessageIdImpl{ledgerId=9, entryId=510587, partitionIndex=2}
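
The detection procedure described in the steps above can be sketched as follows. This is a hypothetical stand-in, not the reporter's code: an in-memory concurrent set replaces Redis, and the key layout (ledgerId:entryId:partitionIndex) is an assumption based on the fields printed in the log line.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: flags a message as a duplicate when its ID was seen before.
// An in-memory set stands in for the Redis store mentioned in the report.
final class DuplicateDetectorSketch {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Assumed key layout built from the three fields of the message ID.
    static String key(long ledgerId, long entryId, int partitionIndex) {
        return ledgerId + ":" + entryId + ":" + partitionIndex;
    }

    boolean isDuplicate(long ledgerId, long entryId, int partitionIndex) {
        // Set.add returns false when the key is already present.
        return !seen.add(key(ledgerId, entryId, partitionIndex));
    }

    public static void main(String[] args) {
        DuplicateDetectorSketch detector = new DuplicateDetectorSketch();
        System.out.println(detector.isDuplicate(9, 510587, 2)); // first delivery
        System.out.println(detector.isDuplicate(9, 510587, 2)); // redelivery
    }
}
```

Checking and recording in a single add call avoids a race between two consumer threads that receive the same ID at the same time.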

System configuration

Pulsar version: built from master

If you need any further information, please let us know.

100% CPU Usage on Bookies

We set up two new clusters with replication and saw really bad performance when reading from the backlog.
We then discovered that some bookkeepers were at 100% CPU usage but apparently doing nothing, with almost no disk or network IO.
I then downgraded one of the bookkeepers to 1.14 and it seems to be working just fine.

Any ideas of what could be going on?

System configuration

Pulsar version: 1.15.2

Improve host usage collection

Current behavior

The load balancer collects host-level usage and stats in the broker to make informed decisions on where to allocate or relocate groups of topics.

Currently this data is collected with an external tool that can be configured in conf/broker.conf:

loadBalancerHostUsageScriptPath=/path/to/my/usage/script

The script is expected to write a JSON response on stdout with this format:

{
    "bandwidthIn": {
        "limit": 10240000.0,
        "usage": 70074.0
    },
    "bandwidthOut": {
        "limit": 10240000.0,
        "usage": 273311.2
    },
    "cpu": {
        "limit": 2400.0,
        "usage": 460.79999999999995
    },
    "memory": {
        "limit": 48380,
        "usage": 30149
    }
}

At Yahoo, we are using a Python script that depends on an internal version of sar to get the per-minute averages of network traffic and CPU utilization. The script is not very portable and for that reason was not included in the open-source release.

We should find a better way to collect the needed information, possibly without resorting to external tools or scripts.

A bit more context about the values to be collected:

  • CPU:
    • Limit: 100 * {number_of_cores}
    • Usage: percent of usage across all cores (can be > 100)
  • Memory:
    • Limit: Total system memory in MB
    • Usage: System memory used (not cached) in MB
  • Bandwidth in / out:
    • Limit: Total bandwidth in kbps across all available NICs
    • Usage: Used bandwidth in kbps across all available NICs
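
To make the report format and the limit definitions above concrete, here is a minimal Java sketch that renders the four resources as JSON. The class and method names are hypothetical, the values are rounded to one decimal for brevity, and the same bandwidth limit is assumed for both directions; none of this is taken from the Pulsar code base.

```java
import java.util.Locale;

// Hypothetical sketch, not Pulsar code: models the usage report described above.
final class HostUsageSketch {
    // The CPU limit is defined above as 100 * number of cores.
    static double cpuLimit(int cores) {
        return 100.0 * cores;
    }

    // Renders one entry, for example "cpu": {"limit": 2400.0, "usage": 460.8}.
    // Values are rounded to one decimal, which is a simplification.
    static String resource(String name, double limit, double usage) {
        return String.format(Locale.ROOT,
                "\"%s\": {\"limit\": %.1f, \"usage\": %.1f}", name, limit, usage);
    }

    static String toJson(double bwLimitKbps, double bwInKbps, double bwOutKbps,
                         int cores, double cpuUsagePercent,
                         double memLimitMb, double memUsageMb) {
        return "{"
                + resource("bandwidthIn", bwLimitKbps, bwInKbps) + ", "
                + resource("bandwidthOut", bwLimitKbps, bwOutKbps) + ", "
                + resource("cpu", cpuLimit(cores), cpuUsagePercent) + ", "
                + resource("memory", memLimitMb, memUsageMb)
                + "}";
    }

    public static void main(String[] args) {
        // Sample values copied from the JSON example in this issue (24 cores).
        System.out.println(toJson(10240000.0, 70074.0, 273311.2, 24, 460.8, 48380, 30149));
    }
}
```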

Possible changes

We should be able to get most of the above information from Linux /proc files, and we could read them directly from Java instead of invoking an external process.

Memory usage

In /proc/meminfo we can find:

MemTotal:       49541256 kB
MemFree:        12117268 kB
Buffers:          575652 kB
Cached:          6412996 kB
...

To get the correct values we could use:

MemoryUsageMB = (${MemTotal} - ${MemFree} - ${Cached}) / 1024
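
As a rough illustration of reading these values directly from Java, the sketch below parses text in the /proc/meminfo layout and applies the formula above. The class name is hypothetical and the sample text is the excerpt from this issue; on a Linux host the text would be read from /proc/meminfo instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: computes memory usage in MB from /proc/meminfo-style text.
final class MemInfoSketch {
    // Parses lines such as "MemTotal:       49541256 kB" into name -> value in kB.
    static Map<String, Long> parse(String meminfo) {
        Map<String, Long> values = new HashMap<>();
        for (String line : meminfo.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2 && parts[0].endsWith(":")) {
                String name = parts[0].substring(0, parts[0].length() - 1);
                try {
                    values.put(name, Long.parseLong(parts[1]));
                } catch (NumberFormatException e) {
                    // Ignore lines whose second column is not a number.
                }
            }
        }
        return values;
    }

    // MemoryUsageMB = (MemTotal - MemFree - Cached) / 1024, as given above.
    static double memoryUsageMb(Map<String, Long> v) {
        return (v.get("MemTotal") - v.get("MemFree") - v.get("Cached")) / 1024.0;
    }

    public static void main(String[] args) {
        String sample = "MemTotal:       49541256 kB\n"
                + "MemFree:        12117268 kB\n"
                + "Buffers:          575652 kB\n"
                + "Cached:          6412996 kB\n";
        System.out.println(memoryUsageMb(parse(sample)));
    }
}
```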

CPU usage

In /proc/stat there is the list of CPU cores with the ticks for each single core, along with the aggregated data :

cpu  6036069640 1822219 3840682750 95802793498 99026636 847683 450700465 0 0
cpu0 377760160 83825 409994464 3618674672 13047501 40 66458 0 0
cpu1 343278842 79367 284039077 3791455422 5817324 9 3236744 0 0
cpu2 287578063 92192 303875037 3829373483 11407520 54 1186383 0 0
cpu3 246935730 100667 300170606 3874113862 15470374 51 487276 0 0
cpu4 220214390 107337 311660218 3894163960 13384924 70 227061 0 0
....

To convert these numbers into an average percent usage, we'd need to compare the ticks at different intervals, e.g. check this file every minute and compare it with the previous minute's values. See:
http://stackoverflow.com/questions/3017162/how-to-get-total-cpu-usage-in-linux-c
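
A minimal sketch of the tick comparison in Java. It makes two assumptions that are not in the issue: iowait is counted as idle time, and only the first eight columns (user through steal) are summed, on the understanding that the kernel already counts guest time inside user. `CpuUsage` and its methods are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not Pulsar code: CPU usage from two /proc/stat samples.
final class CpuUsage {
    // Returns {totalTicks, idleTicks} for one "cpu ..." line of /proc/stat.
    static long[] ticks(String cpuLine) {
        String[] f = cpuLine.trim().split("\\s+");
        long total = 0;
        // f[0] is the label; columns 1..8 are user, nice, system, idle,
        // iowait, irq, softirq, steal. Later columns (guest) are skipped.
        for (int i = 1; i < f.length && i <= 8; i++) {
            total += Long.parseLong(f[i]);
        }
        long idle = Long.parseLong(f[4]);
        if (f.length > 5) {
            idle += Long.parseLong(f[5]); // assumption: treat iowait as idle
        }
        return new long[] {total, idle};
    }

    // Percent of usage across all cores between two samples (can be > 100).
    static double usagePercent(String previous, String current, int cores) {
        long[] p = ticks(previous);
        long[] c = ticks(current);
        long totalDelta = c[0] - p[0];
        long idleDelta = c[1] - p[1];
        if (totalDelta <= 0) {
            return 0.0; // no ticks elapsed, or the counters were reset
        }
        return 100.0 * cores * (totalDelta - idleDelta) / totalDelta;
    }

    public static void main(String[] args) throws Exception {
        Path stat = Paths.get("/proc/stat");
        String before = Files.readAllLines(stat).get(0); // aggregated "cpu" line
        Thread.sleep(1000);
        String after = Files.readAllLines(stat).get(0);
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cpu usage: " + usagePercent(before, after, cores));
    }
}
```

Scaling by the core count matches the convention described earlier in this issue, where the limit is 100 * {number_of_cores}.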

Network usage

To get the list of NICs in the system:

ls /sys/class/net

To get the speed of each interface (sysfs reports this value in Mbit/s, so multiply by 1000 for kbps):

cat /sys/class/net/${NAME}/speed

To get the number of bytes sent through the interface:

cat /sys/class/net/${NAME}/statistics/tx_bytes

To get the number of bytes received from the interface:

cat /sys/class/net/${NAME}/statistics/rx_bytes

To convert the absolute number of bytes into per-minute averages, we
would need to compare with the previous minute's values.
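
The byte-counter comparison could look roughly like this in Java. It is a sketch under assumptions: the loopback interface `lo` is skipped, entries under `/sys/class/net` without a readable counter file are ignored, and kbps is taken to mean 1000 bits per second. `NicUsage` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not Pulsar code: bandwidth usage from sysfs byte counters.
final class NicUsage {
    // Reads a sysfs file that contains a single integer.
    static long readLong(Path file) throws IOException {
        return Long.parseLong(
                new String(Files.readAllBytes(file), StandardCharsets.US_ASCII).trim());
    }

    // Sums one counter ("rx_bytes" or "tx_bytes") across all NICs except loopback.
    static long totalBytes(String counter) throws IOException {
        long sum = 0;
        try (DirectoryStream<Path> nics = Files.newDirectoryStream(Paths.get("/sys/class/net"))) {
            for (Path nic : nics) {
                Path file = nic.resolve("statistics").resolve(counter);
                if (!"lo".equals(nic.getFileName().toString()) && Files.isReadable(file)) {
                    sum += readLong(file);
                }
            }
        }
        return sum;
    }

    // Converts the difference between two byte counts into kbps over the interval.
    static double kbps(long previousBytes, long currentBytes, double seconds) {
        return (currentBytes - previousBytes) * 8.0 / 1000.0 / seconds;
    }

    public static void main(String[] args) throws Exception {
        long rxBefore = totalBytes("rx_bytes");
        long txBefore = totalBytes("tx_bytes");
        Thread.sleep(1000);
        System.out.println("in kbps:  " + kbps(rxBefore, totalBytes("rx_bytes"), 1.0));
        System.out.println("out kbps: " + kbps(txBefore, totalBytes("tx_bytes"), 1.0));
    }
}
```

For the per-minute averages mentioned above, the same `kbps` call would be made with samples taken 60 seconds apart.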

Intermittent test failure on DiscoveryServiceTest

testBrokerDiscoveryRoundRobin(com.yahoo.pulsar.discovery.service.DiscoveryServiceTest)  Time elapsed: 0.493 sec  <<< FAILURE!
com.yahoo.pulsar.discovery.service.PulsarServerException: No active broker is available
	at com.yahoo.pulsar.discovery.service.DiscoveryServiceTest.testBrokerDiscoveryRoundRobin(DiscoveryServiceTest.java:82)
testClientServerConnectionTls(com.yahoo.pulsar.discovery.service.DiscoveryServiceTest)  Time elapsed: 1.168 sec  <<< FAILURE!
java.lang.AssertionError: expected [true] but found [false]
	at com.yahoo.pulsar.discovery.service.DiscoveryServiceTest.testClientServerConnectionTls(DiscoveryServiceTest.java:116)

Consumer not persisting Cursor

We have 4 consumers for the same topic, all supposedly up to date (0 backlog).
But when restarting brokers, and thus reloading cursor state from ZooKeeper and BookKeeper, we consistently see one consumer reset a long way backwards.
Looking at the ZooKeeper metadata, that consumer is the only one with an outdated markDeleteLedgerId and markDeleteEntryId; this happens for all 10 partitions of the consumer.
Not only does this consumer have an outdated cursor, it also seems not to be updating it as it consumes all the backlog generated when a broker is restarted and the consumer is reset.

Binary protocol documentation

I saw some WebSocket docs around, but apart from reading the proto and source, are there any plans to expose and document the binary protocol? WebSockets are nice and all, but a documented binary protocol could allow for more community client libraries.

fail to mvn compile on win7

When I run 'mvn clean compile', I get:
[ERROR] Failed to execute goal com.github.maven-nar:nar-maven-plugin:3.1.0:nar-validate (default-nar-validate) on project pulsar-checksum: Cannot deduce version number from: -> [Help 1]

Does the build depend on a local installation of the Visual C++ compiler?

Could you please provide a how-to-build guide?

thx!

Possible Message Loss

Expected behavior

Pulsar should never lose messages.

Actual behavior

It seems to me that currently, when this condition returns false, we have message loss.
The consumer receives a message and discards it; the broker considers that the message belongs to that consumer and will never redeliver it unless the consumer disconnects.
Shouldn't the consumer ask for redelivery of the message instead of discarding it?

Configuration should be immutable

Expected behavior

Configuration should be immutable. Being able to update the configuration of the Pulsar instance by using provided set methods is not safe for the program. It also isn't consistent. Some changes have an effect, some do not.

Actual behavior

The current implementation is mutable.

System configuration

Pulsar version: 1.14

I would like to suggest Typesafe Config library: https://github.com/typesafehub/config.
It's written in Java, supports simple values, as well as lists (no sets), env variable expansion and more.

There would be a number of changes required in unit tests. Some of the unit tests mutate a single configuration instance. Preferably, all tests should run on a clean environment.

I have done some preliminary work on this, but I'd need to merge master into it and make sure all tests pass.

Generally, if Typesafe Config is an acceptable option, I'd be happy to take this one.
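
To illustrate the immutability requested in this issue, independent of any particular library, here is a minimal Java sketch. The class and field names (`ImmutableSettings`, `clusterName`, `brokerServicePort`) are hypothetical and do not mirror Pulsar's actual configuration class.

```java
// Hypothetical example, not Pulsar code: a configuration object with no setters.
final class ImmutableSettings {
    private final String clusterName;
    private final int brokerServicePort;

    ImmutableSettings(String clusterName, int brokerServicePort) {
        this.clusterName = clusterName;
        this.brokerServicePort = brokerServicePort;
    }

    String clusterName() {
        return clusterName;
    }

    int brokerServicePort() {
        return brokerServicePort;
    }

    // "Changing" a value returns a new instance; the original is left untouched.
    ImmutableSettings withBrokerServicePort(int port) {
        return new ImmutableSettings(clusterName, port);
    }

    public static void main(String[] args) {
        ImmutableSettings base = new ImmutableSettings("standalone", 6650);
        ImmutableSettings perTest = base.withBrokerServicePort(6651);
        System.out.println(base.brokerServicePort() + " " + perTest.brokerServicePort());
    }
}
```

With this shape, each unit test can derive its own instance from a shared base without affecting other tests, which is the clean-environment property described above.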

Timeout while producing message

Last night we experienced a couple of timeouts (1s) producing messages to the broker. We found that at that exact time the broker triggered a ledger close. Can these two events be related?

Producer logs:

[log_time:08:49:16.219] [thread:pulsar-timer-7-1] [level:INFO ] [logger:ProducerImpl] - [persistent://fury/global/bf8dee5d854a4267a4b8ba7546d7d5d5-mediations-tasks/bf8dee5d854a4267a4b8ba7546d7d5d5-mediations-tasks-partition-5] [default-cluster1-4-3532] Message send timed out. Failing 1 messages

java.util.concurrent.CompletionException: com.yahoo.pulsar.client.api.PulsarClientException$TimeoutException: Could not send message to broker within given timeout
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at com.yahoo.pulsar.client.impl.ProducerImpl$1.sendComplete(ProducerImpl.java:152)
	at com.yahoo.pulsar.client.impl.ProducerImpl.lambda$failPendingMessages$6(ProducerImpl.java:932)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at com.yahoo.pulsar.client.impl.ProducerImpl.failPendingMessages(ProducerImpl.java:927)
	at com.yahoo.pulsar.client.impl.ProducerImpl.lambda$failPendingMessages$7(ProducerImpl.java:950)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:418)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:312)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
	at java.lang.Thread.run(Thread.java:745)
Caused by: com.yahoo.pulsar.client.api.PulsarClientException$TimeoutException: Could not send message to broker within given timeout
	at com.yahoo.pulsar.client.impl.ProducerImpl.run(ProducerImpl.java:904)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:588)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:662)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:385)
	... 2 more

Broker logs:

2017-01-25 08:49:15,142 - INFO  - [bookkeeper-ml-workers-36-1:OpAddEntry@155] - [fury/global/bf8dee5d854a4267a4b8ba7546d7d5d5-mediations-tasks/persistent/bf8dee5d854a4267a4b8ba7546d7d5d5-mediations-tasks-partition-5] Closing ledger 6999 for being full

2017-01-25 08:50:19,939 - INFO  - [main-EventThread:ManagedLedgerImpl@937] - [fury/global/bf8dee5d854a4267a4b8ba7546d7d5d5-mediations-tasks/persistent/bf8dee5d854a4267a4b8ba7546d7d5d5-mediations-tasks-partition-5] Created new ledger 7740

What's also curious is that there's a one-minute gap between the ledger close and the new ledger creation events.

Question regarding ZkBookieRackAffinityMapping

Hi. I couldn't find any information in the docs on how to tell Pulsar to write to different bookies depending on which rack/region/AZ they are in.

Searching the code I found the class ZkBookieRackAffinityMapping, which (correct me if I'm wrong) expects a JSON document with the 'topology' on a znode called /bookie, which it uses to determine which bookie to write to.

So we went ahead and created a little Java process that checks the available bookies under /ledgers/available and writes that JSON; the output is similar to the one I've attached.

My main question here is: are we doing it right? 😛
And how can we check if it's working?
Should we tell Pulsar to use this class?

Any help will be much appreciated. Thanks

bookie.txt

Bookkeeper failed to connect on manual recovery

I know this probably doesn't belong here.

We're trying to decommission a BookKeeper node which is performing badly, but when we stop the BookKeeper service and try a manual recovery:
bin/bookkeeper org.apache.bookkeeper.tools.BookKeeperTools zk1.example.com:2181 nodeToDecomission:3181 newNode:3181

The command fails saying it cannot connect to the stopped node, which of course is true, but I don't know why it's trying to connect to that given node if it's shut down.
The stack traces are:

java.net.ConnectException: syscall:getsockopt(...): /nodeToDecomission:3181
	at io.netty.channel.unix.Socket.finishConnect(...)(Unknown Source)
2016-11-17 18:16:05,084 - ERROR - [bookkeeper-io-1-4:PerChannelBookieClient$2@284] - Could not connect to bookie: [id: 0x0a698a5c, L:/clientIp:39932]/nodeToDecomission:3181, current state CONNECTING : 

Do you have any idea of what could be going on?

Another thing to note is that when we stopped BookKeeper in the first place, the brokers seemed to keep retrying their connections to that bookie for a really long time even though it was turned off.

Edit: as @estebangarcia said, the same exceptions are thrown if instead of trying manual recovery we start an Autorecovery process.

The JSON License is a risk

Pulsar depends on Json, a Java library whose license is "The JSON License".
"The JSON License" includes the following sentence:

"The Software shall be used for Good, not Evil."

Some people consider this a risk because there is no way to judge what counts as Evil.

I think Json should be replaced with an alternative library so that more people and companies can use Pulsar.

Message containing broker received timestamp

Messages should contain the timestamp at which the broker received them; it will be helpful for delayed messages and for resetting the cursor.

[merlimat]
Why should we use the time that the broker received the message for points 2 & 4? Wouldn't it be
more appropriate to use the publisher-set timestamp?

[DongbinNie]
As one topic/partition is served by one broker, the timestamps assigned by the broker will be strictly ordered. The publisher time can't be trusted, as it relies on the publisher's system clock and the network.

Failed to split namespace bundle

We had a couple of these errors last night; here are the logs from the broker.

2017-01-25 08:53:54,522 - INFO  - [pool-1-thread-1:SimpleLoadManagerImpl@1381] - Will split hot namespace bundle fury/global/cbead41f2a5849b5a5be64df0f3521d2-orders-orders/0xb333332f_0xbffffffb, topics 2, producers+consumers 610, msgRate in+out 1000.1632182840386, bandwidth in+out 126300.59782161229

2017-01-25 08:53:54,523 - INFO  - [pulsar-web-44-25:Namespaces@781] - [null] Split namespace bundle fury/global/cbead41f2a5849b5a5be64df0f3521d2-orders-orders/0xbffffffb_0xccccccc8

2017-01-25 08:53:54,523 - ERROR - [pulsar-web-44-25:PulsarWebResource@364] - [null] Failed to validate namespace bundle fury/global/cbead41f2a5849b5a5be64df0f3521d2-orders-orders/0xbffffffb_0xccccccc

java.lang.IllegalArgumentException: Cannot find bundle in the bundles list
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:93)
	at com.yahoo.pulsar.common.naming.NamespaceBundles.validateBundle(NamespaceBundles.java:101)
	at com.yahoo.pulsar.broker.web.PulsarWebResource.validateNamespaceBundleRange(PulsarWebResource.java:361)
	at com.yahoo.pulsar.broker.web.PulsarWebResource.validateNamespaceBundleOwnership(PulsarWebResource.java:372)
	at com.yahoo.pulsar.broker.admin.Namespaces.splitNamespaceBundle(Namespaces.java:793)
	at sun.reflect.GeneratedMethodAccessor118.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:143)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
	at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.server.Server.handle(Server.java:524)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:319)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
	at java.lang.Thread.run(Thread.java:745)

com.yahoo.pulsar.client.admin.PulsarAdminException$ServerSideErrorException: Some error occourred on the server
	at com.yahoo.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:160)
	at com.yahoo.pulsar.client.admin.internal.NamespacesImpl.splitNamespaceBundle(NamespacesImpl.java:349)
	at com.yahoo.pulsar.broker.loadbalance.impl.SimpleLoadManagerImpl.doNamespaceBundleSplit(SimpleLoadManagerImpl.java:1396)
	at com.yahoo.pulsar.broker.loadbalance.NamespaceBundleSplitTask.run(NamespaceBundleSplitTask.java:37)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: javax.ws.rs.InternalServerErrorException: HTTP 500 Internal Server Error
	at org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:1032)
	at org.glassfish.jersey.client.JerseyInvocation.translate(JerseyInvocation.java:819)
	at org.glassfish.jersey.client.JerseyInvocation.access$700(JerseyInvocation.java:92)
	at org.glassfish.jersey.client.JerseyInvocation$2.call(JerseyInvocation.java:701)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:228)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:444)
	at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:697)
	at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:448)
	at org.glassfish.jersey.client.JerseyInvocation$Builder.put(JerseyInvocation.java:332)
	at com.yahoo.pulsar.client.admin.internal.NamespacesImpl.splitNamespaceBundle(NamespacesImpl.java:347)
	... 9 more


2017-01-25 08:53:54,527 - ERROR - [pool-1-thread-1:SimpleLoadManagerImpl@1400] - Failed to split namespace bundle fury/global/cbead41f2a5849b5a5be64df0f3521d2-orders-orders/0xbffffffb_0xccccccc8


2017-01-25 08:53:54,527 - ERROR - [pulsar-web-44-10:PulsarWebResource@364] - [null] Failed to validate namespace bundle fury/global/cbead41f2a5849b5a5be64df0f3521d2-orders-orders/0xb333332f_0xbffffffb


2017-01-25 08:53:54,527 - INFO  - [pulsar-web-44-10:Namespaces@781] - [null] Split namespace bundle fury/global/cbead41f2a5849b5a5be64df0f3521d2-orders-orders/0xb333332f_0xbffffffb


java.lang.IllegalArgumentException: Invalid upper boundary for bundle
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:93)
	at com.yahoo.pulsar.common.naming.NamespaceBundles.validateBundle(NamespaceBundles.java:102)
	at com.yahoo.pulsar.broker.web.PulsarWebResource.validateNamespaceBundleRange(PulsarWebResource.java:361)
	at com.yahoo.pulsar.broker.web.PulsarWebResource.validateNamespaceBundleOwnership(PulsarWebResource.java:372)
	at com.yahoo.pulsar.broker.admin.Namespaces.splitNamespaceBundle(Namespaces.java:793)
	at sun.reflect.GeneratedMethodAccessor118.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:143)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
	at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.server.Server.handle(Server.java:524)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:319)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
	at java.lang.Thread.run(Thread.java:745)


2017-01-25 08:53:54,528 - ERROR - [pool-1-thread-1:SimpleLoadManagerImpl@1400] - Failed to split namespace bundle fury/global/cbead41f2a5849b5a5be64df0f3521d2-orders-orders/0xb333332f_0xbffffffb


com.yahoo.pulsar.client.admin.PulsarAdminException$ServerSideErrorException: Some error occourred on the server
	at com.yahoo.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:160)
	at com.yahoo.pulsar.client.admin.internal.NamespacesImpl.splitNamespaceBundle(NamespacesImpl.java:349)
	at com.yahoo.pulsar.broker.loadbalance.impl.SimpleLoadManagerImpl.doNamespaceBundleSplit(SimpleLoadManagerImpl.java:1396)
	at com.yahoo.pulsar.broker.loadbalance.NamespaceBundleSplitTask.run(NamespaceBundleSplitTask.java:37)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: javax.ws.rs.InternalServerErrorException: HTTP 500 Internal Server Error
	at org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:1032)
	at org.glassfish.jersey.client.JerseyInvocation.translate(JerseyInvocation.java:819)
	at org.glassfish.jersey.client.JerseyInvocation.access$700(JerseyInvocation.java:92)
	at org.glassfish.jersey.client.JerseyInvocation$2.call(JerseyInvocation.java:701)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:228)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:444)
	at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:697)
	at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:448)
	at org.glassfish.jersey.client.JerseyInvocation$Builder.put(JerseyInvocation.java:332)
	at com.yahoo.pulsar.client.admin.internal.NamespacesImpl.splitNamespaceBundle(NamespacesImpl.java:347)
	... 9 more

Retention is not enforced when consumers don't advance

Retention policies are not enforced when one or more consumers don't acknowledge messages.
The issue seems to be this, where the oldest acked position for a topic is searched and only older ledgers get discarded.
Is this correct? Should TTL be used instead of retention for enforcement?

Intermittent test failures in PersistentTopicE2ETest.testMessageReplay

com.yahoo.pulsar.client.admin.internal.BrokerStatsImpl$$EnhancerByMockitoWithCGLIB$$a5febb76@6594d3f2
Tests run: 363, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 823.349 sec <<< FAILURE! - in TestSuite
testMessageReplay(com.yahoo.pulsar.broker.service.PersistentTopicE2ETest)  Time elapsed: 0.125 sec  <<< FAILURE!
java.lang.AssertionError: arrays differ firstly at element [11]; expected value is <48> but was <56>. 
    at com.yahoo.pulsar.broker.service.PersistentTopicE2ETest.testMessageReplay(PersistentTopicE2ETest.java:1158)
Results :
Failed tests: 
  PersistentTopicE2ETest.testMessageReplay:1158 arrays differ firstly at element [11]; expected value is <48> but was <56>. 
Tests run: 363, Failures: 1, Errors: 0, Skipped: 0

Intermittent test fail on BatchMessageTest.testSimpleBatchProducerConsumer

Output from Travis build:

com.yahoo.pulsar.client.admin.internal.BrokerStatsImpl$$EnhancerByMockitoWithCGLIB$$fb883741@6b353df8
Tests run: 361, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 831.053 sec <<< FAILURE! - in TestSuite
testSimpleBatchProducerConsumer(com.yahoo.pulsar.broker.service.BatchMessageTest)  Time elapsed: 0.306 sec  <<< FAILURE!
java.lang.AssertionError: expected [0] but found [4]
    at com.yahoo.pulsar.broker.service.BatchMessageTest.testSimpleBatchProducerConsumer(BatchMessageTest.java:293)

Broker Shutdown while unloading bundles

Using the functionality in #88 we enabled broker load balancing and are testing it.
We're seeing that almost every time a broker unloads a topic it prints a bunch of stack traces like the following, which do not seem too harmful.

2016-11-16 17:34:41,750 - INFO  - [pulsar-web-44-1:Consumer@257] - Disconnecting consumer: Consumer{subscription=PersistentSubscription{topic=persistent://items/global/items/items-partition-7, name=items-104}, consumerId=1057, consumerName=ip-10-3-3-105, address=/10.3.3.105:44574}
2016-11-16 17:34:41,750 - WARN  - [pulsar-web-44-19:DestinationLookup@114] - Failed to lookup broker for topic persistent://items/global/items/items-partition-7: java.lang.IllegalStateException: Namespace bundle items/global/items/0x33333332_0x4ccccccb is being unloaded
java.util.concurrent.CompletionException: java.lang.IllegalStateException: Namespace bundle items/global/items/0x33333332_0x4ccccccb is being unloaded
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
        at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:984)
        at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
        at com.yahoo.pulsar.broker.namespace.NamespaceService.getBrokerServiceUrlAsync(NamespaceService.java:131)
        at com.yahoo.pulsar.broker.lookup.DestinationLookup.lookupDestinationAsync(DestinationLookup.java:77)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
        at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:143)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
        at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
        at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
        at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
        at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
        at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:524)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:319)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
        at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Namespace bundle items/global/items/0x33333332_0x4ccccccb is being unloaded
        at com.yahoo.pulsar.broker.namespace.NamespaceService.lambda$findBrokerServiceUrl$4(NamespaceService.java:305)
        at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656)
        at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:669)
        at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:1997)
        at com.yahoo.pulsar.broker.namespace.NamespaceService.findBrokerServiceUrl(NamespaceService.java:289)
        at com.yahoo.pulsar.broker.namespace.NamespaceService.lambda$getBrokerServiceUrlAsync$0(NamespaceService.java:131)
        at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
        ... 52 more
2016-11-16 17:34:41,752 - INFO  - [pulsar-web-44-19:Slf4jRequestLog@60] - 10.3.3.231 - - [16/Nov/2016:17:34:41 +0000] "GET //pulsar-discovery.furycloud.io/lookup/v2/destination/persistent/items/global/items/items-partition-7 HTTP/1.1" 500 392 "-" "Pulsar-Java-v1.15.2" 2
2016-11-16 17:34:41,752 - INFO  - [pulsar-web-44-1:PersistentDispatcherMultipleConsumers@108] - Removed consumer Consumer{subscription=PersistentSubscription{topic=persistent://items/global/items/items-partition-7, name=items-104}, consumerId=1057, consumerName=ip-10-3-3-105, address=/10.3.3.105:44574} with pending 1000 acks

The problem is that after a few seconds the broker always loses its connection to ZooKeeper and then crashes.

2016-11-16 17:34:09,533 - INFO  - [pool-1-thread-1:SimpleLoadManagerImpl@1345] - Running namespace bundle split with thresholds: topics 1000, sessions 1000, msgRate 1000, bandwidth 104857600, maxBundles 128
2016-11-16 17:34:09,534 - INFO  - [pool-1-thread-1:SimpleLoadManagerImpl@1381] - Will split hot namespace bundle items/global/items/0x33333332_0x4ccccccb, topics 4, producers+consumers 4, msgRate in+out 5103.013012492122, bandwidth in+out 2094753.0844518002
2016-11-16 17:34:09,535 - INFO  - [pool-1-thread-1:SimpleLoadManagerImpl@1381] - Will split hot namespace bundle items/global/items/0xccccccc8_0xe6666661, topics 3, producers+consumers 3, msgRate in+out 3823.7859370443607, bandwidth in+out 1568994.7075086082
2016-11-16 17:34:09,575 - INFO  - [pool-1-thread-1:PulsarService@563] - Admin api url: http://ip-10-3-3-225:8080
2016-11-16 17:34:09,681 - INFO  - [pulsar-web-44-8:Namespaces@781] - [null] Split namespace bundle items/global/items/0x33333332_0x4ccccccb
2016-11-16 17:34:09,702 - INFO  - [pulsar-web-44-8:OwnershipCache@224] - Trying to acquire ownership of items/global/items/0x33333332_0x3ffffffe
2016-11-16 17:34:09,702 - INFO  - [pulsar-web-44-8:OwnershipCache@224] - Trying to acquire ownership of items/global/items/0x3ffffffe_0x4ccccccb
2016-11-16 17:34:09,706 - INFO  - [main-EventThread:OwnershipCache@229] - Successfully acquired ownership of /namespace/items/global/items/0x33333332_0x3ffffffe
2016-11-16 17:34:09,706 - INFO  - [main-EventThread:OwnershipCache@229] - Successfully acquired ownership of /namespace/items/global/items/0x3ffffffe_0x4ccccccb
2016-11-16 17:34:09,709 - INFO  - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x2586d8253880021 local:/10.3.3.225:41540 remoteserver:ip-10-3-3-143.ec2.internal/10.3.3.143:2181 lastZxid:4294989977 xid:4113 sent:4113 recv:4173 queuedpkts:0 pendingresp:0 queuedevents:1] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/admin/local-policies/items/global/items
2016-11-16 17:34:12,805 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 25 seconds
2016-11-16 17:34:14,805 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 23 seconds
2016-11-16 17:34:16,806 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 21 seconds
2016-11-16 17:34:19,073 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 18 seconds
2016-11-16 17:34:21,106 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 16 seconds
2016-11-16 17:34:23,106 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 14 seconds
2016-11-16 17:34:25,106 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 12 seconds
2016-11-16 17:34:27,107 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 10 seconds
2016-11-16 17:34:29,107 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 8 seconds
2016-11-16 17:34:31,107 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 6 seconds
2016-11-16 17:34:33,108 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 4 seconds
2016-11-16 17:34:35,108 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 2 seconds
2016-11-16 17:34:37,108 - WARN  - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@160] - zoo keeper disconnected, waiting to reconnect, time remaining = 0 seconds
2016-11-16 17:34:38,101 - INFO  - [pulsar-io-39-2:ServerCnx@577] - [PersistentTopic{topic=persistent://items/global/items/items-partition-7}][pulsar.repl.us-east] Closing producer on cnx /10.3.3.140:57424
2016-11-16 17:34:38,106 - INFO  - [pulsar-io-39-2:ServerCnx@580] - [PersistentTopic{topic=persistent://items/global/items/items-partition-7}][pulsar.repl.us-east] Closed producer on cnx /10.3.3.140:57424
2016-11-16 17:34:38,225 - INFO  - [pulsar-io-39-6:ClientCnx@288] - [ip-10-3-3-140/10.3.3.140:6650] Broker notification of Closed producer: 58
2016-11-16 17:34:38,226 - INFO  - [pulsar-io-39-6:HandlerBase@109] - [persistent://items/global/items/items-partition-7] [pulsar.repl.us-west] Closed connection [id: 0x21af9022, L:/10.3.3.225:55580 - R:ip-10-3-3-140/10.3.3.140:6650] -- Will try again in 0.12 s
2016-11-16 17:34:38,347 - WARN  - [pulsar-timer-46-1:HandlerBase@112] - [persistent://items/global/items/items-partition-7] [pulsar.repl.us-west] Reconnecting after timeout
2016-11-16 17:34:38,359 - WARN  - [pulsar-io-39-3:HttpClient@140] - [http://internal-pulsar-brokers-1007547935.us-east-1.elb.amazonaws.com/lookup/v2/destination/persistent/items/global/items/items-partition-7] HTTP get request failed: Request failed.
2016-11-16 17:34:38,359 - WARN  - [pulsar-io-39-3:HandlerBase@74] - [persistent://items/global/items/items-partition-7] [pulsar.repl.us-west] Error connecting to broker: com.yahoo.pulsar.client.api.PulsarClientException: HTTP get request failed: Request failed.
2016-11-16 17:34:38,359 - WARN  - [pulsar-io-39-3:HandlerBase@92] - [persistent://items/global/items/items-partition-7] [pulsar.repl.us-west] Could not get connection to broker: com.yahoo.pulsar.client.api.PulsarClientException: HTTP get request failed: Request failed. -- Will try again in 0.235 s
2016-11-16 17:34:38,595 - INFO  - [pulsar-timer-46-1:HandlerBase@96] - [persistent://items/global/items/items-partition-7] [pulsar.repl.us-west] Reconnecting after connection was closed
2016-11-16 17:34:38,597 - WARN  - [pulsar-io-39-3:HttpClient@140] - [http://internal-pulsar-brokers-1007547935.us-east-1.elb.amazonaws.com/lookup/v2/destination/persistent/items/global/items/items-partition-7] HTTP get request failed: Request failed.
2016-11-16 17:34:38,598 - WARN  - [pulsar-io-39-3:HandlerBase@74] - [persistent://items/global/items/items-partition-7] [pulsar.repl.us-west] Error connecting to broker: com.yahoo.pulsar.client.api.PulsarClientException: HTTP get request failed: Request failed.
2016-11-16 17:34:38,598 - WARN  - [pulsar-io-39-3:HandlerBase@92] - [persistent://items/global/items/items-partition-7] [pulsar.repl.us-west] Could not get connection to broker: com.yahoo.pulsar.client.api.PulsarClientException: HTTP get request failed: Request failed. -- Will try again in 0.431 s
2016-11-16 17:34:39,030 - INFO  - [pulsar-timer-46-1:HandlerBase@96] - [persistent://items/global/items/items-partition-7] [pulsar.repl.us-west] Reconnecting after connection was closed
2016-11-16 17:34:39,032 - WARN  - [pulsar-io-39-3:HttpClient@140] - [http://internal-pulsar-brokers-1007547935.us-east-1.elb.amazonaws.com/lookup/v2/destination/persistent/items/global/items/items-partition-7] HTTP get request failed: Request failed.
2016-11-16 17:34:39,032 - WARN  - [pulsar-io-39-3:HandlerBase@74] - [persistent://items/global/items/items-partition-7] [pulsar.repl.us-west] Error connecting to broker: com.yahoo.pulsar.client.api.PulsarClientException: HTTP get request failed: Request failed.
2016-11-16 17:34:39,032 - WARN  - [pulsar-io-39-3:HandlerBase@92] - [persistent://items/global/items/items-partition-7] [pulsar.repl.us-west] Could not get connection to broker: com.yahoo.pulsar.client.api.PulsarClientException: HTTP get request failed: Request failed. -- Will try again in 0.904 s
2016-11-16 17:34:39,109 - ERROR - [pulsar-zk-session-watcher-10-1:ZooKeeperSessionWatcher@155] - timeout expired for reconnecting, invoking shutdown service
2016-11-16 17:34:39,109 - INFO  - [pulsar-zk-session-watcher-10-1:ZooKeeper@684] - Session: 0x2586d8253880021 closed
2016-11-16 17:34:39,109 - INFO  - [pulsar-zk-session-watcher-10-1:MessagingServiceShutdownHook@94] - Invoking Runtime.halt(-1)

Do you know what could be happening? We don't see any issues with the ZooKeeper servers, and the other brokers continue working properly.

EDIT: Broker load doesn't seem to affect this; a broker with 7% CPU usage just crashed with the same exceptions.

Intermittent test failure SimpleProducerConsumerTest.testUnackBlockRedeliverMessages

Seen it failing a few times with:

testUnackBlockRedeliverMessages(com.yahoo.pulsar.client.api.SimpleProducerConsumerTest)  Time elapsed: 3.166 sec  <<< FAILURE!
java.lang.AssertionError: expected [20] but found [112]
    at com.yahoo.pulsar.client.api.SimpleProducerConsumerTest.testUnackBlockRedeliverMessages(SimpleProducerConsumerTest.java:1301)

Consumers blocked with 0 availablePermits

When a consumer with ackTimeout connects, it negotiates a given number of permits with the broker (say 1000) and then receives 1000 messages. At this point the consumer has 0 availablePermits and 1000 unackedMessages. If no one pulls these messages from the consumer, the ack timeout eventually fires and the consumer asks the broker to redeliver them. The broker schedules the redelivery of those messages but never increments availablePermits for the consumer, leaving the consumer stuck with 0 availablePermits.

This happens at least with perMessageRedelivery, as a redelivery does not necessarily imply that permits have to be granted to the consumer. Unfortunately, I don't know if the Broker can really do something here.

I believe this issue is a consequence of a consumer asking for redelivery of messages that have never been pulled from the queue.

The best approach seems to be for the consumer to ask for more permits when this is the case.
What do you think?
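
The scenario and the proposed fix can be illustrated with a toy model. This is only a sketch under stated assumptions: `PermitSketch`, `BrokerView`, and `redeliverUnpulled` are hypothetical names, not Pulsar classes, and the model ignores networking and timing entirely. It shows that if the consumer hands back one permit per message it discards from its receiver queue, the broker is able to dispatch again.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of consumer flow control; hypothetical, not Pulsar's actual classes. */
final class PermitSketch {

    /** Broker-side view of a single consumer. */
    static final class BrokerView {
        int availablePermits;                               // messages the broker may still push
        final Deque<Integer> backlog = new ArrayDeque<>();  // messages waiting for dispatch

        /** Pushes backlog messages while permits remain; returns how many were sent. */
        int dispatch(Deque<Integer> receiverQueue) {
            int sent = 0;
            while (availablePermits > 0 && !backlog.isEmpty()) {
                receiverQueue.add(backlog.poll());
                availablePermits--;
                sent++;
            }
            return sent;
        }
    }

    /**
     * Models an ack timeout for messages that were never pulled by the application:
     * the consumer discards them and asks for redelivery. When grantPermits is true
     * (the fix proposed in this issue), it also returns one permit per discarded message.
     */
    static void redeliverUnpulled(BrokerView broker, Deque<Integer> receiverQueue,
                                  boolean grantPermits) {
        int discarded = receiverQueue.size();
        while (!receiverQueue.isEmpty()) {
            broker.backlog.add(receiverQueue.poll());
        }
        if (grantPermits) {
            broker.availablePermits += discarded;
        }
    }

    /** Runs the scenario; returns how many messages the broker can send after redelivery. */
    static int deliveredAfterRedelivery(int permits, boolean grantPermits) {
        BrokerView broker = new BrokerView();
        broker.availablePermits = permits;
        for (int i = 0; i < permits; i++) {
            broker.backlog.add(i);
        }
        Deque<Integer> receiverQueue = new ArrayDeque<>();
        broker.dispatch(receiverQueue);  // consumer now holds all messages, broker has 0 permits
        redeliverUnpulled(broker, receiverQueue, grantPermits);
        return broker.dispatch(receiverQueue);
    }
}
```

Without the extra permits the second dispatch sends nothing, which is the stuck state described above; with them, all redelivered messages can flow again.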

NamespacesTest.testGetNamespaces fails sometimes

I've seen it failing just once on Travis build:

testGetNamespaces(com.yahoo.pulsar.broker.admin.NamespacesTest)  Time elapsed: 0.025 sec  <<< FAILURE!
java.lang.AssertionError: should have failed
    at com.yahoo.pulsar.broker.admin.NamespacesTest.testGetNamespaces(NamespacesTest.java:235)

Binary Protocol for getting topic information

Problem
Sometimes we need topic statistics before producing or consuming messages (common use cases are listed below). Currently this information is obtained through admin calls, which use HTTP. Admin calls are expensive because each call opens a new connection and may cause a redirect if the current broker doesn’t own the topic, so we need a way to get this information without using HTTP.

Common use cases

    • When Pulsar is used as a write-ahead log, we decide whether to allow updates to a topic based on the amount of backlog the topic has.
    • If a client producing messages needs to build some kind of real-time throttling mechanism, it needs the msgRateIn information.
    • A client may need to check whether anyone has subscribed to the topic before creating a message.
    • A client may need to check whether a consumer is blocked due to unacked messages before producing messages.

Proposed solution

  1. Add new commands to the existing binary protocol for getting the persistent topic stats (details below).
  2. Once the consumer has subscribed or the publisher has been created, we know which broker owns the topic and will use the same connection to get the stats over the binary protocol.
  3. If the connection becomes stale, due to the topic being unloaded or the broker crashing, the stats API will return an error.
message CommandTopicStats {

    enum StatsType {
        TOPIC         = 0;
        SUBSCRIPTIONS = 2;
        PUBLISHERS    = 3;
    }

    message Topic {
        enum Field {
            ALL                = 0;
            MSG_RATE_IN        = 1;
            MSG_THROUGHPUT_IN  = 2;
            MSG_RATE_OUT       = 3;
            MSG_THROUGHPUT_OUT = 4;
            AVERAGE_MSG_SIZE   = 5;
            STORAGE_SIZE       = 6;
        }

        // Field numbers must be positive, so 0 is not a valid tag.
        repeated Field fields = 1;
    }

    message Subscription {
        enum Field {
            ALL                = 0;
            MSG_RATE_OUT       = 1;
            MSG_THROUGHPUT_OUT = 2;
            MSG_RATE_REDELIVER = 3;
            MSG_BACKLOG        = 4;
            UNACKED_MESSAGES   = 5;
            MSG_RATE_EXPIRED   = 6;
            CONSUMER           = 7;
        }

        message Consumer {
            // Named Field (not FIELD) so that "repeated Field" below resolves to this enum.
            enum Field {
                ALL                              = 0;
                MSG_RATE_OUT                     = 1;
                MSG_THROUGHPUT_OUT               = 2;
                MSG_RATE_REDELIVER               = 3;
                AVAILABLE_PERMITS                = 4;
                UNACKED_MESSAGES                 = 5;
                BLOCKED_CONSUMER_ON_UNACKED_MSGS = 6;
                ADDRESS                          = 7;
                CONNECTED_SINCE                  = 8;
            }

            required string consumer_name = 1;
            repeated Field fields         = 2;
        }

        required string subscription_name = 1;
        optional Consumer consumer        = 2;
        repeated Field fields             = 3;
    }

    message Publisher {
        enum Field {
            ALL               = 0;
            MSG_RATE_IN       = 1;
            MSG_THROUGHPUT_IN = 2;
            AVERAGE_MSG_SIZE  = 3;
            ADDRESS           = 4;
            CONNECTED_SINCE   = 5;
        }

        required string producer_name = 1;
        repeated Field fields         = 2;
    }

    required uint64 request_id    = 1;
    required string topic_name    = 2;
    optional StatsType stats_type = 3 [default = TOPIC];

    // Members of a oneof take no optional/required/repeated label.
    oneof stats_type_info {
        Topic topic               = 4;
        Subscription subscription = 5;
        Publisher publisher       = 6;
    }
}
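
On the client side, steps 2 and 3 of the proposal imply some bookkeeping per broker connection: each stats request carries a `request_id`, the response is matched back to it, and all outstanding requests fail if the connection becomes stale. A minimal sketch of that bookkeeping follows. `StatsRequestTracker` and its methods are hypothetical names, and the stats payload is represented as a plain `String` placeholder rather than a generated protobuf type.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-connection tracker for in-flight stats requests. */
final class StatsRequestTracker {
    private final AtomicLong nextRequestId = new AtomicLong();
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Registers a request; the caller would send CommandTopicStats with the returned id. */
    long register(CompletableFuture<String> result) {
        long id = nextRequestId.incrementAndGet();
        pending.put(id, result);
        return id;
    }

    /** Completes the matching request when the broker responds; false if the id is unknown. */
    boolean complete(long requestId, String stats) {
        CompletableFuture<String> future = pending.remove(requestId);
        return future != null && future.complete(stats);
    }

    /** Fails every outstanding request, e.g. after a topic unload or broker crash. */
    void failAll(Throwable cause) {
        for (Long id : pending.keySet()) {
            CompletableFuture<String> future = pending.remove(id);
            if (future != null) {
                future.completeExceptionally(cause);
            }
        }
    }

    /** Number of requests still waiting for a response. */
    int pendingCount() {
        return pending.size();
    }
}
```

Failing the futures on disconnect, rather than retrying transparently, matches the proposal's statement that the stats API returns an error when the connection is stale.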

Introduce a way to update broker service configuration dynamically

As discussed at #176

We need to think carefully about how to expose the whole thing in a consistent way. There are the values from the config file plus the values in ZK. There should be a clear way to understand what the actual configuration is, namely:

  • the configuration that is supposed to be applied to all brokers
  • the configuration that is actually being applied in a given broker

Another problem is how to clearly identify which values are dynamically tunable (e.g. the ZK servers string will not be dynamic).
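
One way to make the effective configuration explicit is to overlay the ZK values on the file values, but only for keys that have been declared dynamic. The sketch below assumes exactly that rule; `EffectiveConfig`, `resolve`, and the idea of a `dynamicKeys` set are hypothetical and do not describe the approach taken in the linked commit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that computes the configuration a broker actually applies. */
final class EffectiveConfig {

    /**
     * Starts from the config-file values and applies ZK overrides,
     * ignoring any override whose key is not marked as dynamic.
     */
    static Map<String, String> resolve(Map<String, String> fromFile,
                                       Map<String, String> fromZk,
                                       Set<String> dynamicKeys) {
        Map<String, String> effective = new HashMap<>(fromFile);
        for (Map.Entry<String, String> override : fromZk.entrySet()) {
            if (dynamicKeys.contains(override.getKey())) {
                effective.put(override.getKey(), override.getValue());
            }
        }
        return effective;
    }
}
```

With this shape, a broker could report both the ZK-level values (intended for all brokers) and the result of `resolve` (what it is really using), and a static setting such as the ZK servers string stays untouched even if someone writes it to ZK.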

@merlimat @saandrews: I have created a very early commit to propose a possible approach for this feature. Could you please share your thoughts or ideas on it? If it makes sense, we can move forward with this approach.

Shared mode consumer priority

In shared mode with consumer priority, messages go only to the consumers with higher priority as long as they have permits; otherwise, messages can also be pushed to the consumers with lower priority.

[merlimat]
Good idea. This should be relatively easy to implement.

[DongbinNie]
:-)
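
The priority rule requested in this thread can be sketched as a selection function over the connected consumers. This is an illustration only: `PrioritySketch`, `Consumer`, and `pick` are hypothetical names, and the convention that a lower number means higher priority is an assumption, not something stated in the issue.

```java
import java.util.List;

/** Hypothetical dispatcher helper for shared subscriptions with consumer priority. */
final class PrioritySketch {

    static final class Consumer {
        final String name;
        final int priorityLevel;   // assumption: lower value = higher priority
        int availablePermits;

        Consumer(String name, int priorityLevel, int availablePermits) {
            this.name = name;
            this.priorityLevel = priorityLevel;
            this.availablePermits = availablePermits;
        }
    }

    /** Returns the highest-priority consumer that still has permits, or null if none has. */
    static Consumer pick(List<Consumer> consumers) {
        Consumer best = null;
        for (Consumer candidate : consumers) {
            if (candidate.availablePermits <= 0) {
                continue;  // no permits: fall through to lower-priority consumers
            }
            if (best == null || candidate.priorityLevel < best.priorityLevel) {
                best = candidate;
            }
        }
        return best;
    }
}
```

A dispatcher would call `pick` for each message and decrement the chosen consumer's permits, so lower-priority consumers only receive traffic once the higher-priority ones are out of permits.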

Geo Replication

Could you please show some details about Geo-Replication?

Subscribing on dedicated position

Allow subscribing at a position other than the last committed position on the first subscription.

[merlimat]
That would be only meaningful if the retention-time has been specified, otherwise the messages might have been already deleted. Also, what should the outcome be if subscribing to a position that is not available anymore?

[DongbinNie]
It could either be an error or fall back to subscribing at the first available position.

[merlimat]
An option that is somewhat similar is to subscribe and then roll back the subscription to X hours earlier.

[DongbinNie]
Yes, it can be achieved this way.
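
The two outcomes discussed above for a position that is no longer available (fail, or fall back to the first available position) can be expressed as a small policy. In this sketch positions are modeled as a single `long` for simplicity, which is an assumption; `StartPositionSketch`, `Policy`, and `resolve` are hypothetical names.

```java
/** Hypothetical resolution of a requested start position against what is still retained. */
final class StartPositionSketch {

    enum Policy { FAIL, EARLIEST_AVAILABLE }

    /**
     * Returns the position to start from. If the requested position has already
     * been deleted, either throws or clamps to the first available position.
     */
    static long resolve(long requested, long firstAvailable, Policy policy) {
        if (requested >= firstAvailable) {
            return requested;
        }
        if (policy == Policy.FAIL) {
            throw new IllegalArgumentException(
                    "position " + requested + " is no longer available; first available is "
                            + firstAvailable);
        }
        return firstAvailable;
    }
}
```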

Intermittent test failure PulsarClientToolTest.testInitialzation

testInitialzation(com.yahoo.pulsar.client.cli.PulsarClientToolTest)  Time elapsed: 10.01 sec  <<< FAILURE!
org.testng.internal.thread.ThreadTimeoutException: Method org.testng.internal.TestNGMethod.testInitialzation() didn't finish within the time-out 10000
Results :
Failed tests: 
  PulsarClientToolTest.testInitialzation » ThreadTimeout Method org.testng.inter...

WebSocket doesn't work well when using VIP and authentication plugin using IP address

We have trouble with the WebSocket feature of Pulsar.

  • WebSocketService creates a client, and the client connects to the broker.
  • When the WebSocket server runs on the same host as the broker and a VIP is used, a VIP member tries to connect through the VIP address.
  • When a VIP member is reached from a VIP member through the VIP address, the connection appears to come from the VIP address rather than from the VIP member's own address, because of the loopback address setting.
  • When the authentication plugin relies on IP address information, authentication fails.

Dead-Letter Queues

Pulsar should support dead-letter queues to prevent consumers from getting stuck when they cannot acknowledge a message.
Reference #180
