linkedin / ambry
Distributed object store
Home Page: https://github.com/linkedin/ambry/wiki
License: Apache License 2.0
Defunct blobs are
This will need some thinking through and a write-up before implementation.
The current constructor of ByteBufferReadableStreamChannel takes in a single ByteBuffer:
public ByteBufferReadableStreamChannel(ByteBuffer buffer);
This functionality can be expanded by adding another constructor that takes in a List of ByteBuffer:
public ByteBufferReadableStreamChannel(List<ByteBuffer> buffers);
(The first constructor would call into the second constructor to avoid duplicate code.)
This would help support more use cases.
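A minimal sketch of how the delegation could look. This is a hypothetical, simplified stand-in for the real class in ambry-commons (which carries additional state for reads and callbacks); only the constructor chaining and a size computation over the list are shown.

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.List;

// Hypothetical simplified sketch of ByteBufferReadableStreamChannel; the real
// class has more state (read tracking, callbacks) than shown here.
public class ByteBufferReadableStreamChannelSketch {
  private final List<ByteBuffer> buffers;

  // Proposed constructor: accepts a list of buffers.
  public ByteBufferReadableStreamChannelSketch(List<ByteBuffer> buffers) {
    this.buffers = buffers;
  }

  // Existing constructor: delegates to the list-based one to avoid duplicate code.
  public ByteBufferReadableStreamChannelSketch(ByteBuffer buffer) {
    this(Collections.singletonList(buffer));
  }

  // Total bytes remaining across all backing buffers.
  public long getSize() {
    long size = 0;
    for (ByteBuffer buffer : buffers) {
      size += buffer.remaining();
    }
    return size;
  }
}
```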
Zopkio is a well-tested distributed testing framework and would be ideal for writing integration tests. We should slowly start moving our integration tests into it. A good first step would be to set up Zopkio and write a very simple test.
Once this works, we can opportunistically move existing test cases to that.
More information about Zopkio can be found here - https://github.com/linkedin/Zopkio
In SimpleOperationTracker, if replicas.size() is less than successTarget, an IllegalArgumentException will be thrown. This exception will eventually close the router, but the problem is not serious enough to warrant that. Reporting this issue to seek a reasonable solution.
The ResponseHandler class, which is responsible for failure detection, may be refactored to make it a bit more intuitive without changing the core implementation (which works just fine). We should do this after all the router changes are in.
Feature request:
When getting chunks of a composite blob, if the metadata chunk is successfully retrieved (not deleted or expired), then the data chunks should be successfully retrieved no matter what; if not, it is an internal server error. That is, the router should never return Blob_Deleted or Blob_Expired for a data chunk.
However, the following can occur: a GET for a data chunk may return a Blob_Expired error, or it may return a Blob_Deleted error. Although Ambry is eventually consistent, if the metadata chunk did not encounter an error, then the data chunks shouldn't either. In a way, we are talking about atomicity here (get all chunks or none at all).
Gopal and I discussed this and we think the right approach here is to include deleted and expired GetOptions when data chunks are fetched for a composite blob, because if the GetManager is at a point where it initiates requests for data chunks, that means it has already fetched the metadata chunk successfully.
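The proposed fix could be sketched as follows. The enum values and method below are illustrative stand-ins, not Ambry's actual API; the idea is simply that data-chunk fetches of a composite blob always carry an option that includes deleted and expired blobs, while the metadata chunk fetch uses the caller's option.

```java
// Hypothetical sketch of the proposed fix: data chunks of a composite blob
// are fetched with an option that includes deleted and expired blobs, so a
// readable metadata chunk never leads to Blob_Deleted/Blob_Expired on its
// data chunks. Names are illustrative, not the real router API.
public class DataChunkGetOptionSketch {
  enum GetOption { None, Include_Deleted_Blobs, Include_Expired_Blobs, Include_All }

  // Chooses the option for a GET request.
  static GetOption optionFor(boolean isDataChunkOfCompositeBlob, GetOption callerOption) {
    // Data chunk fetches always include deleted and expired blobs; the
    // metadata chunk fetch honors whatever the caller asked for.
    return isDataChunkOfCompositeBlob ? GetOption.Include_All : callerOption;
  }
}
```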
The coverage of FrontendIntegrationTest can be improved.
(more can be added)
Ideally, FrontendIntegrationTest in isolation should cover as much code as possible in ambry-rest and ambry-frontend (internal server errors probably cannot be tested since the frontend is expected to be treated like a black box).
Each of the enhancements can be converted into their own issues with more details. This one covers them at a high level.
selectorActiveConnections, for one, is never registered. Might as well look at the rest of the metrics in NetworkMetrics.
Currently, for connections over SSL, the ambry selector returns an established connection after a poll even before the handshake is completed. This is not very useful, as the caller then has to repeatedly check, outside of the selector poll, whether the handshake is complete, which is not ideal.
Instead, let the selector do the work as part of the poll. The selector should only return a connection after the connection is established and the handshake is complete. If the handshake does not complete for any reason, then it should be treated as a connection establishment failure. This way the callers of the selector can be agnostic to SSL, and can interact with the selector in a completely non-blocking way.
If someone is looking to migrate to Ambry after trying out other options, they might have files on-premise or in another object store, and will want a simple way to move those into Ambry. It would be great to have a migrator service.
Some requirements for a migrator service:
Add more unit tests for GetBlobOperation, and possibly GetManager. The coverage for these is already greater than 90%, but there are a few scenarios that we could simulate to increase confidence. Many of these are marked as todos in GetBlobOperationTest.java.
If a connection is explicitly closed using the close(connid) method, the closed connection immediately gets added to the disconnected list. This list is cleared at the start of every poll(). This means that an explicitly closed connection will not be returned as part of disconnections after a poll, which is unintuitive and incorrect.
Instead, the selector should add these disconnections to a temporary list and use it to populate the disconnected list during a poll. Basically, any additions to connected, disconnected, completedSends and completedReceives should happen as part of a poll.
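The proposed bookkeeping could look roughly like this. This is a simplified sketch, not the selector's real implementation: connection ids are modeled as strings and only the disconnect path is shown. The point is that close() buffers into a temporary list that poll() drains.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the proposed fix: explicit close() buffers the
// connection id, and poll() drains the buffer into the disconnected list,
// so callers always observe closes as part of a poll.
public class SelectorDisconnectSketch {
  private final List<String> disconnected = new ArrayList<>();
  private final List<String> pendingDisconnects = new ArrayList<>();

  public void close(String connId) {
    // Buffer instead of adding to the disconnected list directly.
    pendingDisconnects.add(connId);
  }

  public List<String> poll() {
    disconnected.clear();
    // Drain explicit closes into this poll's results.
    disconnected.addAll(pendingDisconnects);
    pendingDisconnects.clear();
    return disconnected;
  }
}
```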
Hi,
From what I've read in the wiki, Ambry zeroes out deleted blobs on the disk after a deletion occurs, but it is unclear to me what happens with the deleted blobs. The wiki mentions that compaction isn't currently supported. Does Ambry currently do anything with the deleted blob space, or is that space simply not available after a delete happens? Based on my testing, it appears that the log size continues to grow regardless of whether there is deleted space.
Thanks again,
Mike
Hi,
Does Ambry allow you to use custom blob IDs, or does the REST API let you look for a blob that has a specific metadata value?
I'm thinking about a use case where a client would want to check the server to see if a blob with a particular hash exists before POSTing a blob that is identical to one that is already stored in Ambry.
Thanks,
Mike
Selector has a sendInFlight metric in NetworkMetrics. This metric is only decremented when a normal write completes; it misses the places where exceptions happen.
close() in Transmission throws IOException, whereas neither implementation (plaintext and SSL) actually throws it. This patch is to revisit them and ensure they do throw IOExceptions instead of catching them. Callers might need to act on these exceptions, so the transmission should not suppress them.
Integration tests take longer and slow down builds, because the gradle test task also runs as part of the gradle build task.
Investigate whether the unit tests alone could be run as part of build, with the integration tests moved out to another task, and fix it if that is possible.
The API public Future<Void> processRequest(RestRequest restRequest, Callback<Void> callback); could be modified to take in an optional BlobInfo object (either as a new API or a modification to the current API) to avoid generating the BlobInfo again in case it is required (this usually happens when certifying POST requests).
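One possible shape for the extended API is sketched below. RestRequest, Callback and BlobInfo are stood in by Object so the sketch is self-contained; the overload and its default fallback are assumptions about how the change might look, not the actual frontend code.

```java
import java.util.concurrent.Future;

// Hypothetical sketch of the proposed API change. The real parameter types
// (RestRequest, BlobInfo, Callback) are replaced with Object here.
interface ProcessRequestApiSketch {
  // Existing API.
  Future<Void> processRequest(Object restRequest, Object callback);

  // Proposed overload: a caller that already has the BlobInfo passes it in,
  // so the implementation does not have to generate it again.
  default Future<Void> processRequest(Object restRequest, Object blobInfo, Object callback) {
    // Default falls back to the existing behavior, ignoring blobInfo;
    // implementations that can use blobInfo would override this.
    return processRequest(restRequest, callback);
  }
}
```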
Instead of having a sleep time of 1 or 2 ms, let the selector sleep for a longer time (shorter than request timeouts), say 300 ms like in the server. If activities in other threads (ChunkFiller or callback invoker threads) make the operation managers eligible to create more requests, then simply wake up the selector so that the RequestResponseHandler thread inside which the selector sleeps gets to work and poll the managers.
This way the thread works deterministically and busy looping is avoided without affecting operation latency.
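A sketch of the wakeup pattern, assuming the ambry Selector ultimately wraps a java.nio Selector (so a wakeup call can be plumbed through). The 300 ms figure comes from the text above; the class and method names here are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

// Sketch of the proposed pattern: the RequestResponseHandler thread blocks in
// select() for a longer interval, and other threads (ChunkFiller, callback
// invoker) call wakeup() when new work makes the operation managers pollable.
public class SelectorWakeupSketch {
  static final long POLL_TIMEOUT_MS = 300;
  private final Selector nioSelector;

  public SelectorWakeupSketch() {
    try {
      this.nioSelector = Selector.open();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Called from the RequestResponseHandler loop: blocks up to
  // POLL_TIMEOUT_MS, or returns early if wakeup() was called.
  public void pollOnce() {
    try {
      nioSelector.select(POLL_TIMEOUT_MS);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Called from other threads when new requests may be created.
  public void onWorkAvailable() {
    nioSelector.wakeup();
  }
}
```

A wakeup() issued while no select() is in progress causes the next select() to return immediately, so no signal is lost between polls.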
ReadableStreamChannel
contains two APIs that do not currently have the general utility that ReadableStreamChannel
itself is meant for.
public void setDigestAlgorithm(String digestAlgorithm)
throws NoSuchAlgorithmException;
public byte[] getDigest();
For now (based on their current utility), these APIs belong in RestRequest
and should be moved there.
The operation manager poll can throw runtime exceptions, even due to faults in components outside of its own control (for example, due to faults in caller-provided callbacks and channels). Currently, any unhandled exception can shut down the RequestResponseHandler.
A similar issue exists with exceptions during chunk filling.
We don't need to overcomplicate things, but at the very least, the operation managers could catch these exceptions and continue by removing these faulty operations from its list, so that any such errors are isolated to only the operation that caused it.
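The isolation could be sketched as below. Operation is a stand-in interface and the manager is heavily simplified; the point is only that a RuntimeException from one operation removes that operation instead of propagating out of the poll loop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of the proposed isolation: pollAll() catches RuntimeExceptions
// thrown by an individual operation (e.g. from a caller-provided callback)
// and drops only that operation, instead of letting the exception kill the
// RequestResponseHandler thread. Operation is a hypothetical stand-in.
public class OperationManagerSketch {
  interface Operation { void poll(); }

  private final List<Operation> operations = new ArrayList<>();

  public void add(Operation op) { operations.add(op); }

  // Polls every operation; returns the number of faulty operations removed.
  public int pollAll() {
    int removed = 0;
    for (Iterator<Operation> it = operations.iterator(); it.hasNext(); ) {
      Operation op = it.next();
      try {
        op.poll();
      } catch (RuntimeException e) {
        // Isolate the fault: abort just this operation.
        it.remove();
        removed++;
      }
    }
    return removed;
  }

  public int size() { return operations.size(); }
}
```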
Chunk Filler currently goes to sleep for a predetermined duration (currently 10 ms or so) if no work is done in an iteration. This can increase latencies of incoming put operations if the sleep starts just before data becomes ready to be filled.
When the sleep was removed, we saw considerable improvement in the tests.
To avoid busy looping, particularly at light put operation loads, add a wait-notify mechanism to deterministically wake up the chunk filler thread when data becomes available for filling.
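A minimal sketch of the wait-notify handoff, assuming a shared monitor between the ChunkFiller thread and the thread that writes data into the channel. A bounded wait is kept as a safety net; names are illustrative.

```java
// Sketch of the proposed wait-notify mechanism: the ChunkFiller thread waits
// on a monitor instead of sleeping a fixed 10 ms, and the producing thread
// notifies it when data becomes available, avoiding both busy looping and
// added put latency.
public class ChunkFillerSignalSketch {
  private final Object lock = new Object();
  private boolean dataAvailable = false;

  // Called by the ChunkFiller thread when an iteration did no work.
  public void awaitData(long maxWaitMs) {
    synchronized (lock) {
      if (!dataAvailable) {
        try {
          lock.wait(maxWaitMs); // bounded wait as a safety net
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
      dataAvailable = false;
    }
  }

  // Called by the thread that makes data available for filling.
  public void signalData() {
    synchronized (lock) {
      dataAvailable = true;
      lock.notify();
    }
  }
}
```

Because the flag is checked under the lock, a signal that arrives before the filler starts waiting is not lost; the filler simply skips the wait.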
The new blob formats introduced to support chunked blobs are currently disabled in code, as we wanted to verify that the whole flow works and stay open to changes until all operations were implemented. Now that all the operations are implemented and merged, and the whole end-to-end flow for PutBlob, GetBlob, GetBlobInfo and DeleteBlob for simple, composite and legacy blobs has been tested and verified for correctness even for very large blobs (tested up to 12 GB), the new format can be enabled.
This requires the following:
Change PutMessageFormatInputStream to use Blob_Format V2.
We have two timeouts for a request. One is at the NetworkClient layer, where we time out if we couldn't get a connection. The other is at the OperationManager layer, where we time out if we don't get a response after some time (in other words, the response didn't arrive in time even though the request was sent out). In the latter case, the OperationManager doesn't inform the NetworkClient about the timed-out requests, so the NetworkClient might be holding unnecessary data for those requests even though handling them is a no-op. Creating this ticket to do those cleanups.
Creating this issue to explore the possibility of not creating a copy of the data while creating chunks that need to be put (at the NonBlockingRouter).
Currently, creating put chunks involves copying data from the buffer provided by the ReadableStreamChannel (the chunks themselves are implementations of AsyncWritableChannel). Since ReadableStreamChannel implementations are not allowed to overwrite/reuse buffers until they receive callbacks from the AsyncWritableChannel, it might be possible to avoid the copy.
The main problem will be the fact that the chunk size enforced at the NonBlockingRouter will not be equal to the sizes of the individual buffers received. This means that:
P.1) Composite byte buffers will have to be created.
P.2) The modules downstream will have to understand these composite byte buffers (i.e. network modules).
P.3) There will be many boundary conditions (for e.g. a source buffer's data might be across chunk boundaries).
It will be useful to evaluate if this is possible (without introducing too much complexity) since avoiding the copy might translate into better performance.
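Boundary condition P.3 can be illustrated with zero-copy buffer views. This is exploratory, not the router's implementation: a source buffer larger than the chunk size is split into slice() views that share the backing storage, so no data is copied.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Exploratory sketch for P.3: splitting a source buffer across chunk
// boundaries without copying, using slice()/limit() views over the source.
public class ChunkSliceSketch {
  // Returns zero-copy views of 'source', each no larger than chunkSize.
  static List<ByteBuffer> sliceIntoChunks(ByteBuffer source, int chunkSize) {
    List<ByteBuffer> views = new ArrayList<>();
    while (source.hasRemaining()) {
      ByteBuffer view = source.slice(); // shares the backing storage, no copy
      view.limit(Math.min(chunkSize, view.remaining()));
      views.add(view);
      // Advance past the bytes covered by this view.
      source.position(source.position() + view.remaining());
    }
    return views;
  }
}
```

For example, a 10-byte source with a 4-byte chunk size yields three views (4, 4 and 2 bytes). Composite chunks (P.1) would then be lists of such views spanning multiple source buffers.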
Minor fixes to quick start doc
Make same changes in README.
Compaction, at a high level, can be designed with or without log segments. Without log segments, a replica consists of one big log file; with log segments, a replica consists of N log segments.
Before diving deep into any other system's compaction design, we need to decide at a high level whether we need log segments or not; without that decision we can't go ahead with the design. So, the first step is to document the pros and cons of designing compaction with and without log segments.
We also need to consider the pros and cons of having log segments for the Ambry store in general, and not just from the compaction perspective.
AC:
Publish a document that speaks about pros and cons of having log segments vs not.
Get feedback from others
Narrow down to one of them
Creating this issue to start an initial discussion on, and list, places where the ByteBufferPool can be used. Some of the tasks will require issues of their own.
P.1) Improve the current implementation of ByteBufferPool (SimpleByteBufferPool) or create a new implementation that actually pools buffers. This will require design documentation and discussions.
P.2) Use the ByteBufferPool (this is independent of P.1) in the NonBlockingRouter while filling chunks and also in BoundedByteBufferReceive. Using it in BoundedByteBufferReceive might require a bit of effort, but it has to be done for pooling to be correct and effective, since GETs are expected to be a large chunk of traffic in any use-case involving Ambry.
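For P.1, a pool that actually reuses buffers could start from something like the sketch below. This is a deliberately minimal, hypothetical design: a single fixed buffer size and a free list. A real implementation would need capacity accounting, allocation timeouts and multiple size classes, as the design discussion should cover.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch for P.1: a pool that reuses buffers of one fixed size,
// unlike SimpleByteBufferPool which only bounds total allocated memory.
public class PoolingByteBufferPoolSketch {
  private final int bufferSize;
  private final Deque<ByteBuffer> free = new ArrayDeque<>();

  public PoolingByteBufferPoolSketch(int bufferSize) {
    this.bufferSize = bufferSize;
  }

  // Hands out a pooled buffer if one is free, else allocates a new one.
  public synchronized ByteBuffer allocate() {
    ByteBuffer buffer = free.pollFirst();
    return buffer != null ? buffer : ByteBuffer.allocate(bufferSize);
  }

  // Returns a buffer to the pool for reuse.
  public synchronized void deallocate(ByteBuffer buffer) {
    buffer.clear(); // reset position/limit before reuse
    free.offerFirst(buffer);
  }
}
```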
(Edits by @vgkholla to delete tests that are no longer relevant)
The following tests were failing intermittently or hanging and were commented out in PR #294. This ticket tracks their fixes.
In BlockingChannelConnectionPoolTest:
com.github.ambry.network.BlockingChannelConnectionPoolTest > testBlockingChannelInfoForPlainText FAILED
java.lang.AssertionError at BlockingChannelConnectionPoolTest.java:242
com.github.ambry.network.BlockingChannelConnectionPoolTest > testSSLBlockingChannelConnectionPool FAILED
java.lang.Exception at BlockingChannelConnectionPoolTest.java:396
In SimpleByteBufferPoolTest:
one thread:
at java.lang.Thread.join(Thread.java:1354)
at com.github.ambry.utils.SimpleByteBufferPoolTest.testOneExpiredAnotherServed(SimpleByteBufferPoolTest.java:220)
another thread:
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
at com.github.ambry.utils.SimpleByteBufferPoolTest$BufferConsumer.run(SimpleByteBufferPoolTest.java:259)
NetworkClient does not have any metrics currently. This task involves going over the class and adding metrics wherever applicable. Some of the metrics required are:
- Time spent by a request in the NetworkClient's queue before it gets sent out to the Selector.
- Metrics within the Selector, if possible.
Additionally, note that previously NetworkSend was only used by the server to send responses. So the NetworkRequestMetrics within it are populated at the server and only include metrics from the time the request arrives to the time the response is sent (basically request queuing time, send time and total time). Also, NetworkRequestMetrics assumes that NetworkSend objects are always responses. That needs to change.
I suggest the following approach:
- Understand how NetworkRequestMetrics work. They are initialized with ServerMetrics histograms and the fields get updated along the way.
- Extend them for the NetworkClient and get rid of assumptions. This may involve renaming variables and methods or adding new variables.
- Have the NetworkClient initialize the NetworkSend metrics it creates appropriately (currently it initializes them with null).
It might also help to take a look at the BlockingChannel and BlockingChannelPool related metrics to see if there is anything relevant to pick up from there.
Acceptance Criteria:
- NetworkSend and NetworkRequestMetrics are made agnostic of whether the Send is a request or a response.
- NetworkClient, which now creates NetworkSend with a null NetworkRequestMetrics because of the above reason, initializes it appropriately.
- NetworkClient metrics are added.
Currently, RestServiceErrorCode.UnsupportedHttpMethod
maps to a 400 status code. Instead it should map to 405 and should return the methods allowed (POST, GET, DELETE and HEAD).
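The mapping change could be sketched like this. The enum and helper below are illustrative stand-ins for RestServiceErrorCode and the real response-status mapping code; the allowed methods come from the list above.

```java
// Sketch of the proposed mapping: UnsupportedHttpMethod becomes 405 with an
// Allow header listing the supported methods. Enum and method names here are
// stand-ins for the real RestServiceErrorCode/ResponseStatus classes.
public class ErrorCodeMappingSketch {
  enum RestServiceErrorCode { BadRequest, UnsupportedHttpMethod }

  static int toStatusCode(RestServiceErrorCode code) {
    switch (code) {
      case UnsupportedHttpMethod:
        return 405; // Method Not Allowed, instead of the current 400
      default:
        return 400;
    }
  }

  // A 405 response must advertise the allowed methods in an Allow header.
  static String allowHeaderValue() {
    return "POST, GET, DELETE, HEAD";
  }
}
```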
Does ambry support NFS or HDFS for storage?
Initially, SecurityService was envisioned as a service that would provide only security related functionality. But it is possible to execute more general functionality (e.g. more fine grained metrics tracking, special response headers, capacity management, etc.) as a part of the implementation of SecurityService. It would be good to consider a more appropriate name for SecurityService.
Once Router configurations are finalized, please add them to the wiki here.
The protocol (PutRequest and GetRequest) and MessageFormat deal with InputStream today. InputStream as an interface allows for blocking reads; that is the coordinator way of doing things. The Router deals with ByteBuffers, as everything is non-blocking, and today goes out of its way to create an InputStream to deal with the protocol, or to read from an InputStream into a ByteBuffer when dealing with MessageFormat.
We should get to this once the coordinator is replaced (or we will have to make the change in the coordinator as well).
EDIT: Or ReadableByteChannel may be more appropriate. Since ByteBuffer has to allow for seeks and rewinds, it might involve additional copies. ReadableByteChannel is more in line with InputStream (but non-blocking by definition).
Just noticed during some testing that the ambry selector can complete a receive on a connection id when the send is not complete. A quick look at the code indicates that the interest ops are only set to READ when the send is completed, so it is not clear how this happens. This seems like a bug and needs to be investigated and fixed.
This happened when timeout errors and such were occurring, but interestingly, there was no error noticed on the said connection (otherwise it would not have been able to complete the receive, I think).
Will try to add more details.
2016/06/27 17:08:14.590 INFO [NetworkClient] [RequestResponseHandlerThread-1] [ambry-frontend-nb] [] ***Connection checkout succeeded for lva
1-app2039.stg.linkedin.com:Port[15088:PLAINTEXT] with connectionId 0.0.0.0:-1-10.136.152.27:15088_9
2016/06/27 17:08:14.710 INFO [NetworkClient] [RequestResponseHandlerThread-1] [ambry-frontend-nb] [] ***Receive completed for connectionId 0.0.0.0:-1-10.136.152.27:15088_9 and checking in the connection back to connection tracker
...
2016/06/27 17:08:16.591 INFO [NetworkClient] [RequestResponseHandlerThread-1] [ambry-frontend-nb] [] ***Connection checkout succeeded for lva1-app2039.stg.linkedin.com:Port[15088:PLAINTEXT] with connectionId 0.0.0.0:-1-10.136.152.27:15088_9
2016/06/27 17:08:16.591 INFO [Selector] [RequestResponseHandlerThread-1] [ambry-frontend-nb] [] ***Setting NetworkSend threw an exception for connection id: 0.0.0.0:-1-10.136.152.27:15088_9
2016/06/27 17:08:16.592 ERROR [NonBlockingRouter] [RequestResponseHandlerThread-1] [ambry-frontend-nb] [] Aborting, as requestResponseHandlerThread received an unexpected error:
java.lang.IllegalStateException: Attempt to begin a networkSend operation with prior networkSend operation still in progress.
at com.github.ambry.network.Transmission.setNetworkSend(Transmission.java:65)
Hi - When I try to set "netty.server.so.backlog" to a higher number in the frontend.properties config file, I get this warning:
[2016-06-14 15:07:35,502] WARN Property netty.server.so.backlog is not valid (com.github.ambry.config.VerifiableProperties)
I believe it is because this line has a typo -- "sobacklog" instead of "so.backlog":
With support for large blobs, POST requests can be very large, and data can arrive at a rate faster than the frontend can process it.
To avoid memory pressure, data should be read from the channel only when the frontend is ready to receive more. Netty provides support for on-demand reading from the channel via the auto-read feature (docs).
This support can be built into NettyRequest. At a high level, when a POST is detected, a pre-configured number of chunks are loaded and auto-read is set to false. Channel#read() calls are triggered when callbacks arrive from the AsyncWritableChannel (because "empty" slots become available). Once the request is complete, auto-read is set to true again.
To support all of this, the constructor of NettyRequest (and its children) has to be changed to receive a Channel, and the tests have to be changed accordingly. The production code should be simple enough to change but the test code might take longer.
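The backpressure logic above can be modeled as follows. This is a simplified, Netty-free sketch of the bookkeeping only: in real code, disabling auto-read would be channel.config().setAutoRead(false) and requesting one more chunk would be channel.read(); here those calls are represented by a flag and comments.

```java
// Simplified model of the proposed backpressure: allow at most
// maxBufferedChunks in flight; disable auto-read when the budget is used up.
// The Netty calls (setAutoRead(false), Channel#read()) are modeled as a flag.
public class AutoReadBackpressureSketch {
  private final int maxBufferedChunks;
  private int buffered = 0;
  private boolean autoRead = true;

  public AutoReadBackpressureSketch(int maxBufferedChunks) {
    this.maxBufferedChunks = maxBufferedChunks;
  }

  // Called when a content chunk arrives from the channel.
  public synchronized void onChunkArrived() {
    buffered++;
    if (buffered >= maxBufferedChunks) {
      autoRead = false; // would be channel.config().setAutoRead(false)
    }
  }

  // Called when the AsyncWritableChannel callback frees a slot.
  public synchronized void onChunkConsumed() {
    buffered--;
    if (!autoRead) {
      // would be channel.read() to request exactly one more chunk
    }
  }

  public synchronized boolean isAutoRead() {
    return autoRead;
  }
}
```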
Acceptance Criteria:
Change NettyRequest to apply backpressure and read from the channel on demand, so that there is no memory pressure in case the Router needs time.
During a get, once the content is deserialized, MessageFormat returns an InputStream. In order to convert this to a ByteBuffer, the router has to do another copy today.
Instead, have MessageFormat return a ByteBufferInputStream (which is what it returns today anyway). And once #320 goes in, use ByteBufferInputStream#getByteBuffer() to get a ByteBuffer out of the stream returned from MessageFormat.
See context here: https://github.com/linkedin/ambry/pull/308/files#r63651491
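The idea behind the getByteBuffer() escape hatch can be sketched with a simplified stand-in for Ambry's ByteBufferInputStream: a stream backed by a ByteBuffer can hand that buffer back directly, so the router skips the stream-to-buffer copy.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;

// Simplified stand-in for Ambry's ByteBufferInputStream, illustrating the
// zero-copy idea behind #320: expose the backing buffer instead of making
// callers drain the stream into a fresh ByteBuffer.
public class ByteBufferInputStreamSketch extends InputStream {
  private final ByteBuffer buffer;

  public ByteBufferInputStreamSketch(ByteBuffer buffer) {
    this.buffer = buffer;
  }

  @Override
  public int read() {
    // Standard InputStream contract: next byte as 0-255, or -1 at EOF.
    return buffer.hasRemaining() ? buffer.get() & 0xFF : -1;
  }

  // The zero-copy escape hatch.
  public ByteBuffer getByteBuffer() {
    return buffer;
  }
}
```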
Hi,
Thanks for releasing this awesome code.
I just wanted to know whether it was possible (or totally unnecessary and missing the point?) to combine LinkedIn's Ambry with something like Amazon S3.
Or in other words, the use of Ambry on top of Amazon cloud services and whether this makes sense or not?
Thanks
Andrew
Should add metrics to get insight into:
PutRequest allocates a buffer of header/metadata size (entire size minus blob content size) and then re-uses the same buffer to send the actual blob content as well. This is causing some performance problems; Ming also saw a significant improvement in his local perf test results.
Fix suggested:
Ideally, we were thinking of a fix to use a buffer the same size as the socket buffer. But as of now we are making 3 copies of the data in three different places: 1) at the coordinator or router layer; 2) at the PutRequest layer, just before writing the data to the channel/network; 3) for SSL connections, one more copy in SSLTransmission to encrypt the data.
So, we could change the signature of PutRequest to take in a ByteBuffer rather than an InputStream and avoid the 2nd copy altogether. But this fix has to wait until the coordinator is phased out.
In non-chunked responses (i.e. responses in which Content-Length has been set), there is the possibility of a race condition between the front end and the client.
The client has three events of interest:
The front end (using NettyResponseChannel) has two events of interest:
- Writing the LastHttpContent to the channel.
LastHttpContent is a Netty concept to indicate end of stream, but clients detect end of stream by using the Content-Length (in case of non-chunked responses). This creates a race condition between Step 3 at the client and Step 2 at the front end. Due to this, we sometimes get a ClosedChannelException, and it is safe to ignore this particular ClosedChannelException and not print it in the logs.
The challenge would be to ensure that only this case is not printed in the logs but any other valid ClosedChannelExceptions are.
ClosedChannelStackTrace.txt
All the tests for the router that we currently have are unit tests that mock the selector and the server. This masks any issues related to integration, such as #297. It will be good to prioritize creating integration tests to give us more confidence in the router and any incremental changes we make.
Currently, much of the code in ambry-server, ambry-store and ambry-replication is being tested via integration tests in ambry-server. Unit testing coverage has to be improved in these packages so that failures are caught faster, bugs are easier to isolate and debugging is easier.
Doing this will require understanding of each of these components. A starting point would be running the unit tests in each of these packages, checking the coverage, understanding the classes that need more coverage and writing appropriate tests. Aim to cover all lines as well as cases.
The quick start in the wiki currently covers plain text interactions, but Ambry can support SSL interactions as well. A few more steps are required beyond referring to the SSL hardware layouts and setting some SSL properties. To be precise, the steps required to generate local certificates, a key store and a trust store that can be used for SSL interactions aren't documented yet. Creating this ticket for the same.
Action Items:
Document the steps required to generate local certificates, a key store and a trust store.
Modify frontend.ssl.properties and server.ssl.properties to set SSL properties accordingly.
Deploy the server and frontend and ensure they get deployed successfully without any issues.
Perform PUTs, GETs and DELETEs (via SSL).
The Introduction to Ambry blog article talks about adding authentication and security support to the REST API. Has any support been added for this already, or is it still on the todo list?
Thanks,
Mike