nats-io / nats-streaming-server Goto Github PK
View Code? Open in Web Editor NEWNATS Streaming System Server
Home Page: https://nats.io
License: Apache License 2.0
NATS Streaming System Server
Home Page: https://nats.io
License: Apache License 2.0
Say that you set some store limits, for instance 1 channel max. Publishing on "foo" works fine. Publishing on "bar should report an error (too many channels). This is not happening.
The reason is that the server acks the publisher before trying to persist the message (where limits are checked).
Running more than one streaming server with the same ID in a NATS server cluster will result in undefined behavior, and should not be allowed. Add defensive code to prevent startup if another streaming server is detected with the same cluster ID.
==================
WARNING: DATA RACE
Write by goroutine 50:
github.com/nats-io/stan-server/server.(*subState).deleteFromList()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/server.go:1256 +0xd0
github.com/nats-io/stan-server/server.(*client).removeSub()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/client.go:41 +0xc8
github.com/nats-io/stan-server/server.(*clientStore).RemoveSub()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/client.go:153 +0x11a
github.com/nats-io/stan-server/server.(*StanServer).processUnSubscribeRequest()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/server.go:1351 +0x96d
github.com/nats-io/stan-server/server.(*StanServer).(github.com/nats-io/stan-server/server.processUnSubscribeRequest)-fm()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/server.go:638 +0x37
github.com/nats-io/nats.(*Conn).waitForMsgs()
/Users/ivan/dev/go/src/github.com/nats-io/nats/nats.go:1355 +0x3aa
Previous read by goroutine 90:
github.com/nats-io/stan-server/server.(*StanServer).removeAllNonDurableSubscribers()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/server.go:1284 +0xe4
github.com/nats-io/stan-server/server.(*StanServer).closeClient()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/server.go:857 +0x13c
github.com/nats-io/stan-server/server.(*StanServer).processCloseRequest()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/server.go:882 +0x52a
github.com/nats-io/stan-server/server.(*StanServer).(github.com/nats-io/stan-server/server.processCloseRequest)-fm()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/server.go:643 +0x37
github.com/nats-io/nats.(*Conn).waitForMsgs()
/Users/ivan/dev/go/src/github.com/nats-io/nats/nats.go:1355 +0x3aa
====================
Need support for Fault Tolerance and High Availability
When a server with FileStore with stored messages is restarted, all messages are marked as redelivered, regardless if they have ever been sent to any subscriber. This is not correct.
Only un-acknowledged messages previously sent to a subscriber should be marked as redelivered.
Test case:
From @mcqueary on March 18, 2016 19:14
We need a file store module.
Copied from original issue: nats-io/stan.go#40
From @ColinSullivan1 on March 15, 2016 21:18
When an application crashes, it may not call the STAN Close() function, so STAN will hold the application's clientID as if it were connected. This leaves STAN in a state where it will prevent a connection with the same ClientID, erroneously indicating it is a duplicate.
I propose we:
a) Add a disconnected alert/advisory from NATS that STAN can listen to. Upon a client disconnect, the NATS server publishes an advisory message containing the client name, perhaps in JSON format.
b) STAN uses the client ID as the NATS client name to connect.
c) STAN listens on the advisory subject and removes the client ID when notified a client has been disconnected.
Copied from original issue: nats-io/stan.go#30
If an application with a subscriber stops without calling Close()
, if new messages on the subject are received by the server, their sequence number will be persisted in the channel's subscriptions file. The server will try send those messages, and even attempt to redeliver them (until the first missed heartbeat).
For durables, this is more of an issue because on durable restart, all those messages will be considered redelivered - while one would argue that they should be considered new messages since the application never received them (I understand the limitations on how much the server can know if those messages were actually received or not, regardless of the described behavior).
Until we have a solid story about application stopping and STAN server knowing about it, should we reduce the client timeout from 5 minutes to something smaller (but not go too small like we did previously)? Any other approach that would eliminate this behavior, or is this considered ok?
This is directly analogous to TimeDeltaStart
vs. StartTime
. The user doesn't know (or care) what the current sequence number is, but wants the last, say, 1000 messages in order to prime e.g. a moving-window time series display. Right now there is no way to do that.
Ideally the user could just enter a negative offset, but that won't work because StartSequence is defined as uint64. Therefore I propose we just add DeltaOffset or OffsetDelta as another option and do the math in the server.
If you start a subscriber with start position "Last Received", then send a message on the subject, the subscriber does not receives it.
The issue is that if there is was no message ever sent on that subject, LastSequence() would be 0, and the server tries to get LastSequence() - 1, which because of the use of uint64 makes it the max uint64 value!
UPDATE: This is true for other start positions.
Suppose that you have 2 queue subscribers on "foo".
One message is sent, the first queue subscriber receiving it does not acknowledge it.
Message is redelivered to second queue subscriber which acknowledges it.
Server is restarted.
The message is going to be redelivered.
This should also incorporate updating dependencies/imports, travis and coverall configurations
From @ColinSullivan1 on March 16, 2016 22:25
Server-To-Client heart beats need to be configurable:
Something along the lines of:
We should discuss if the server sends the interval in the connect response and the clients behave accordingly, or we require the client to set this explicitly and independently. IMO sending these to the client will reduce support costs and make STAN simpler for users.
Copied from original issue: nats-io/stan.go#39
Create a durable "mydur"
Stop the application
Restart the application: OK
Stop the application
Restart the application: fails with duplicate durable.
We changed the HB and Timeout to something like 200ms and 2 seconds, respectively. By default, a NATS client reconnects after 2 seconds. So say that you stop/restart the NATS server (STAN server when NATS is embedded), then when STAN applications reconnect to NATS Server, it may be too late: the STAN server will have purged those clients because of lack of heartbeats.
FileStore assigns a subscription ID when storing a subscription. However, on recovery, the subscription ID for new subscriptions don't take into account the ids of the subscriptions recovered.
As of now, we persist the state only through subscriptions.
The problem is if an application "connects" and before creating its first subscription, the server is restarted, there would be no client state restored, which would cause other actions to fail (create subscriber, etc).
Same goes for STAN applications that would be only publishing: the server would not be able to check the client health through heartbeats since it would not know about those clients.
With some batterie powered disks, sync may not be required. Make it possible to disable through command line/options.
WARNING: DATA RACE
Read by goroutine 748:
github.com/nats-io/stan-server/server.(*StanServer).processSubscriptionRequest()
/home/travis/gopath/src/github.com/nats-io/stan-server/server/server.go:1591 +0xdb7
github.com/nats-io/stan-server/server.(*StanServer).(github.com/nats-io/stan-server/server.processSubscriptionRequest)-fm()
/home/travis/gopath/src/github.com/nats-io/stan-server/server/server.go:674 +0x37
github.com/nats-io/nats.(*Conn).waitForMsgs()
/home/travis/gopath/src/github.com/nats-io/nats/nats.go:1355 +0x3aa
Previous write by goroutine 747:
[failed to restore the stack]
Goroutine 748 (running) created at:
github.com/nats-io/nats.(*Conn).subscribe()
/home/travis/gopath/src/github.com/nats-io/nats/nats.go:1816 +0x504
github.com/nats-io/nats.(*Conn).Subscribe()
/home/travis/gopath/src/github.com/nats-io/nats/nats.go:1733 +0x7a
github.com/nats-io/stan-server/server.(*StanServer).initSubscriptions()
/home/travis/gopath/src/github.com/nats-io/stan-server/server/server.go:674 +0x692
github.com/nats-io/stan-server/server.RunServerWithOpts()
/home/travis/gopath/src/github.com/nats-io/stan-server/server/server.go:487 +0x140d
github.com/nats-io/stan-server/server.RunServer()
/home/travis/gopath/src/github.com/nats-io/stan-server/server/server.go:361 +0x148
github.com/nats-io/stan-server/test.RunServer()
/home/travis/gopath/src/github.com/nats-io/stan-server/test/test.go:11 +0x38
github.com/nats-io/go-stan.RunServer()
/home/travis/gopath/src/github.com/nats-io/go-stan/stan_test.go:30 +0x38
github.com/nats-io/go-stan.TestNoDuplicatesOnSubscriberStart()
/home/travis/gopath/src/github.com/nats-io/go-stan/stan_test.go:1728 +0x7a
testing.tRunner()
/tmp/workdir/go/src/testing/testing.go:456 +0xdc
Goroutine 747 (running) created at:
github.com/nats-io/nats.(*Conn).subscribe()
/home/travis/gopath/src/github.com/nats-io/nats/nats.go:1816 +0x504
github.com/nats-io/nats.(*Conn).Subscribe()
/home/travis/gopath/src/github.com/nats-io/nats/nats.go:1733 +0x7a
github.com/nats-io/stan-server/server.(*StanServer).initSubscriptions()
/home/travis/gopath/src/github.com/nats-io/stan-server/server/server.go:669 +0x451
github.com/nats-io/stan-server/server.RunServerWithOpts()
/home/travis/gopath/src/github.com/nats-io/stan-server/server/server.go:487 +0x140d
github.com/nats-io/stan-server/server.RunServer()
/home/travis/gopath/src/github.com/nats-io/stan-server/server/server.go:361 +0x148
github.com/nats-io/stan-server/test.RunServer()
/home/travis/gopath/src/github.com/nats-io/stan-server/test/test.go:11 +0x38
github.com/nats-io/go-stan.RunServer()
/home/travis/gopath/src/github.com/nats-io/go-stan/stan_test.go:30 +0x38
github.com/nats-io/go-stan.TestNoDuplicatesOnSubscriberStart()
/home/travis/gopath/src/github.com/nats-io/go-stan/stan_test.go:1728 +0x7a
testing.tRunner()
/tmp/workdir/go/src/testing/testing.go:456 +0xdc
====================
Getting the following from my travis build for stan-java-client. Is the master branch of stan-server out of sync with the master branch of gnatsd?
$ go get github.com/nats-io/stan-server
# github.com/nats-io/stan-server
../../../gopath/src/github.com/nats-io/stan-server/stan-server.go:93: natsOpts.BufSize undefined (type "github.com/nats-io/gnatsd/server".Options has no field or method BufSize)
The command "go get github.com/nats-io/stan-server" failed and exited with 2 during .
Your build has been stopped.
Introspection into the NATS streaming server via a monitoring interface is required. An interface similar to that of the NATS server should suffice.
Some discussion points:
/server
, /stores
, /clients
?If a durable is stopped after the server sent up to MaxInFlight
messages, when the durable is restarted, it does not receive the pending messages.
Users will want the NATS Streaming clients to connect using TLS. The embedded NATS server can receive TLS parameters through the NATS streaming server command line, but we need the NATS streaming server to use TLS with it's underlying connection to the NATS server.
To support this, TLS client side certificates need to be provided to the streaming server, which can be achieved through using command line parameters.
Say a subscriber is running. It receives messages 1 to 100. At this point, the server (with file store) is restarted. On new message, old messages are redelivered. This does not seem to be right.
Persistent, ordered and highly-available messages enable a great many client use cases (support for recovery from client down time, cluster healing etc).
There are additional use cases where being able to (minimally) filter/query historic messages and export the results to pre-defined format in non-realtime (or near real-time if possible). For example:
First thoughts are you should be able to filter by topic, date range and sequence range.
Hi guys, thank you for this awesome project.
Do you plan to release NATS Streaming Server to Docker Hub anytime soon?
Add vendoring for the gogo protobuf packages.
Remove "STAN:" from the individual log lines to instead prefix in the logger.
==================
WARNING: DATA RACE
Read by goroutine 15:
github.com/nats-io/nats-streaming-server/server.(*subState).deleteFromList()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server.go:1551 +0x7f
github.com/nats-io/nats-streaming-server/server.(*subStore).Remove()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server.go:309 +0x3a3
github.com/nats-io/nats-streaming-server/server.(*StanServer).removeAllNonDurableSubscribers()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server.go:1596 +0x270
github.com/nats-io/nats-streaming-server/server.(*StanServer).closeClient()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server.go:1140 +0x1bb
github.com/nats-io/nats-streaming-server/server.(*StanServer).processCloseRequest()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server.go:1156 +0x302
github.com/nats-io/nats-streaming-server/server.(*StanServer).(github.com/nats-io/nats-streaming-server/server.processCloseRequest)-fm()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server.go:913 +0x37
github.com/nats-io/nats.(*Conn).waitForMsgs()
/home/travis/gopath/src/github.com/nats-io/nats/nats.go:1360 +0x3aa
Previous write by goroutine 24:
runtime.slicecopy()
/home/travis/.gimme/versions/go1.6.2.linux.amd64/src/runtime/slice.go:113 +0x0
github.com/nats-io/nats-streaming-server/server.findBestQueueSub()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server.go:1251 +0x1be
github.com/nats-io/nats-streaming-server/server.(*StanServer).sendMsgToQueueGroup()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server.go:1264 +0x87
github.com/nats-io/nats-streaming-server/server.(*StanServer).performAckExpirationRedelivery()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server.go:1425 +0xd45
github.com/nats-io/nats-streaming-server/server.(*StanServer).setupAckTimer.func1()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server.go:1522 +0x36
Goroutine 15 (running) created at:
github.com/nats-io/nats.(*Conn).subscribe()
/home/travis/gopath/src/github.com/nats-io/nats/nats.go:1828 +0x515
github.com/nats-io/nats.(*Conn).Subscribe()
/home/travis/gopath/src/github.com/nats-io/nats/nats.go:1745 +0x7a
github.com/nats-io/nats-streaming-server/server.(*StanServer).initSubscriptions()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server.go:913 +0xb17
github.com/nats-io/nats-streaming-server/server.RunServerWithOpts()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server.go:575 +0x106b
github.com/nats-io/nats-streaming-server/server.testStalledRedelivery()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server_test.go:1065 +0x1713
github.com/nats-io/nats-streaming-server/server.TestStalledQueueRedelivery()
/home/travis/gopath/src/github.com/nats-io/nats-streaming-server/server/server_test.go:1082 +0x43
testing.tRunner()
/home/travis/.gimme/versions/go1.6.2.linux.amd64/src/testing/testing.go:473 +0xdc
Goroutine 24 (running) created at:
time.goFunc()
/home/travis/.gimme/versions/go1.6.2.linux.amd64/src/time/sleep.go:129 +0x6d
==================
Limits are checked per file (out of the 5) but not based on the total for this store.
Some additional tuning parameters need to be added:
A configuration files passed via the -config command line is not parsed, and should be in order to support all of the NATS server options. This will likely be required for multi-user authorization/authentication.
From @mcqueary on March 15, 2016 19:14
See here and here in server.go
.
As a user, if my SubscriptionRequest
specifies a time 100 seconds in the past, but the earliest stored message was 80 seconds in the past, I get stan: invalid start time
. IMO that's broken, conceptually.
Think of how you'd use a "since" date when qualifying Google search results (i.e. show me hits since this date). If I entered 1/1/2016, but the first result was from 1/3/2016, I'd expect to get everything starting from 1/3/2016, and not an error indicating that my "since" date was wrong. The same applies here: if my start time is before the earliest message, just start with the earliest message!
It's tempting to say "well, just use StartPosition.First instead", but I as a user actually require data since exactly (at most) 100 seconds ago, and I have no idea when "First" will start (3 days, 2 hours, or 5 seconds).
I propose that if start time (or sequence) is earlier than the first message, that we start with the first message, and likewise that if the start time (or sequence) is later than the last message, we set the start time/sequence to 1ns or 1 seqno after the last message (i.e. default to StartPosition.NewOnly
).
Copied from original issue: nats-io/stan.go#24
When a server with FileStore is restarted, and there are existing subscribers, messages in the log should be delivered automatically. Right now, new message need to be sent to trigger delivery.
If a subscriber hits the MaxInFlight limit with all un-acknowledged messages, the server attempts but does not send un-acknowledged messages. This may prevent an application that missed messages (as opposed to just delaying the acknowledgements) to receive them again.
Because of the same issue, a durable would never receive old/new messages after a server restart if the number of pending acks is >= MaxInFlight.
Have a durable running with auto-ack and receive new-only.
Send messages 1 and 2.
Restart durable.
Send messages 3 and 4.
Restart durable => it receives messages 3 and 4 as redelivered (EXPECTED: nothing)
Stop durable.
Restart server.
Start durable => it receives messages 3, 4, 1, 2, 3, 4 (EXPECTED: nothing)
Support user authentication - some basic authentication can be added today (user/pass).
The subscription store in the file store implementation overtime will contain information no longer needed. The file should be compacted automatically.
clientID should be command-line, URL, filesystem, and JSON-friendly (i.e. not containing special characters that might interfere with using or parsing a clientID in any of these)
If the store contains no messages, cs.msgs.FirstMsg()
and cs.msgs.LastMsg()
return nil
, ergo any subscription containing a time or time delta will crash the server:
https://github.com/nats-io/stan-server/blob/master/server/server.go#L1296-L1304 (edited for line number change)
This is because of tracing, which should be moved in performRedelivery
so that sub
fields are accessed under lock. At the same time, I'd like to revisit performDurableRedelivery
so that it has its own delivery and not use common code in performRedelivery
which deals with potentially expiration and queue subscribers. Although it would cause some code duplication, it will be much clearer.
==================
WARNING: DATA RACE
Read by goroutine 48:
runtime.convT2E()
/usr/local/go/src/runtime/iface.go:128 +0x0
github.com/nats-io/stan-server/server.(*StanServer).performAckExpirationRedelivery()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/server.go:825 +0x95
github.com/nats-io/stan-server/server.(*StanServer).sendMsgToSub.func1()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/server.go:975 +0x36
Previous write by goroutine 25:
github.com/nats-io/stan-server/server.(*subStore).Remove()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/server.go:233 +0x80
github.com/nats-io/stan-server/server.(*StanServer).processUnSubscribeRequest()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/server.go:1106 +0xb43
github.com/nats-io/stan-server/server.(*StanServer).(github.com/nats-io/stan-server/server.processUnSubscribeRequest)-fm()
/Users/ivan/dev/go/src/github.com/nats-io/stan-server/server/server.go:560 +0x37
github.com/nats-io/nats.(*Conn).waitForMsgs()
/Users/ivan/dev/go/src/github.com/nats-io/nats/nats.go:1355 +0x3aa
If a durable starts, then stops, then the server is restarted, the durable will no longer receive messages.
This is introduced by #58.
When a server is restarted, it gets new subjects for pub/sub/etc... which makes an already "connected" STAN application "fail" publishing new messages since it uses the old subjects, so those messages go nowhere.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.