dream-lab / elfstore
ElfStore: Edge Local Federated Store
License: Apache License 2.0
For GetBlock, as part of FindBlock, the current implementation works as follows: a fog receiving the request checks its local data structures to see whether it holds the block and, if so, adds itself to the list of candidate fogs. It then consults the bloom filters of its neighbours; whenever a filter indicates possible containment, it makes an API call to that neighbour to confirm whether the neighbour actually holds the block, and confirmed neighbours are added to the list. The same procedure is followed for the buddies, with one difference: each buddy also contacts its own neighbours to find suitable replica fogs. This results in a very high number of network calls, which can be reduced by treating a positive bloom-filter response as an indicator of possible containment instead of confirming it over the network.
With that change, candidate identification uses only the contacted fog's local data and the bloom filters of its neighbours. If a neighbour's bloom filter indicates that it may contain the block, the neighbour is returned as a candidate fog to the client. The client then tries to read the block from the fogs on this list. If the block is not found among them, it is not present on the fog or its neighbours, so the buddies should be contacted for the data; the same bloom-filter handling can be applied to the buddies as well. A force flag can also be passed: when unset, the efficient version above is used, while a set flag falls back to the current implementation to guarantee that proper replicas are returned as part of FindBlock.
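The bloom-filter-based candidate selection described above can be sketched as follows. This is a minimal illustration using a `BitSet`-backed filter; the class, method names, and hashing scheme are placeholders, not ElfStore's actual types.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: pick candidate fogs purely from locally cached neighbour bloom
// filters, avoiding one confirmation API call per neighbour.
public class CandidateFogs {
    // Hypothetical hash: derive k positions from the blockId and a seed.
    static int hash(String key, int seed, int size) {
        return Math.floorMod((key + "#" + seed).hashCode(), size);
    }

    static void add(BitSet filter, String blockId, int k, int size) {
        for (int seed = 0; seed < k; seed++) {
            filter.set(hash(blockId, seed, size));
        }
    }

    // A filter "may contain" the block only if all k positions are set;
    // false positives are possible, false negatives are not.
    static boolean mayContain(BitSet filter, String blockId, int k, int size) {
        for (int seed = 0; seed < k; seed++) {
            if (!filter.get(hash(blockId, seed, size))) return false;
        }
        return true;
    }

    // Return neighbour fogs whose filters indicate possible containment,
    // without a network round trip to each neighbour.
    static List<String> candidates(Map<String, BitSet> neighborFilters,
                                   String blockId, int k, int size) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, BitSet> e : neighborFilters.entrySet()) {
            if (mayContain(e.getValue(), blockId, k, size)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Because bloom filters only give false positives, a client reading from this candidate list may see misses, but a block held by a neighbour is never silently excluded.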
The frequent exchange of bloom filters incurs substantial network traffic between the Fog nodes in the system. Compressed bloom filters and spectral bloom filters are one area to look into if the regular bloom filter exchange becomes an overhead.
Have a (python based?) shell to perform add edge, create stream, append file, put block (with and w/o lock), update block, get file, get block, list stream, list block, etc.
lot of consequences
Use this to update end time for last block, last seq no, etc.
Need to couple this with a session to write a series of blocks.
The current implementation finds the replica locations, which are then written to by the client. The order we chose is: write the local replica first, then the remote ones. However, if there are fog or edge failures, the write may not succeed and a retry is needed to achieve the expected reliability. To meet the desiderata for the number of replicas and the expected reliability, one alternative order is to write the remote replicas first and the local replica last.
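The proposed ordering can be sketched as below. `writeReplica` stands in for the actual client write call, which is an assumption; the point is only the remote-first ordering.

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch of the proposed write order: remote replicas first, local last,
// so a cheap local write does not mask remote fog/edge failures.
public class ReplicaWriter {
    // Returns the number of successful writes; the caller compares this
    // against the desired replica count to decide whether to retry.
    static int writeAll(List<String> remoteFogs, String localFog,
                        Predicate<String> writeReplica) {
        int ok = 0;
        for (String fog : remoteFogs) {
            if (writeReplica.test(fog)) ok++;
        }
        if (writeReplica.test(localFog)) ok++;
        return ok;
    }
}
```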
Does the client set this seq no (assuming single put client per stream)?
Does the system set this seq no (assuming concurrent client)? Difficult!
We could have the notion of a session with single client at a time?
This will support putnext and getnext ops.
Right now, there is a priority across multiple edges that fail, but for a single edge failure, all blocks have same priority.
When an edge fails, can we recover blocks which have much lower reliability than others? How do we find the current reliability of the blocks?
We wish to implement the following features in the future:
Proper exception handling is needed on both the server and client side. The server side (FogServer and EdgeServer) has some exception handling, which should be revisited, while the client side currently lacks it.
It might happen that replica identification returns a set of fogs but some of them are unable to serve the write request for various reasons (edge failures, edge storage below the disk watermark, etc.). In such a case, during the retry phase, the replica-identification API must accept a list of blacklisted fogs that will not be picked for writing again, and search only among the remaining fogs.
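A minimal sketch of such a blacklist-aware selection follows; the method name, the ranked-fog input, and the selection policy are hypothetical, not ElfStore's real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: pick replica fogs from a ranked list while skipping fogs
// blacklisted by an earlier failed write attempt.
public class ReplicaIdentification {
    static List<String> identifyReplicas(List<String> rankedFogs,
                                         Set<String> blacklist,
                                         int replicaCount) {
        List<String> chosen = new ArrayList<>();
        for (String fog : rankedFogs) {
            if (chosen.size() == replicaCount) break;
            if (!blacklist.contains(fog)) chosen.add(fog);
        }
        return chosen;
    }
}
```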
In the current implementation, an edge device is removed from total usage, in terms of read() and put(), by its managing Fog if it misses a certain number of heartbeats. However, these heartbeat misses may be due to network failures, in which case the edge device may come back once the network heals. What should the Fog's behaviour be in terms of readmitting or discarding this edge device once it returns?
Currently we assume edge reliability to be a fixed value. That is rarely the case, as the reliability of an edge device changes with usage over time. We need to accommodate changing edge reliability in a future implementation.
While running ElfStore on the D20 setup, some quadrants were observed to have negative values for allocations. This was caused by the use of Math.ceil() instead of Math.floor(). The bug has been fixed; we need to add more comprehensive test cases covering these fixes.
related to #22
Currently a Fog maintains all its state in memory. As a result, if the Fog is restarted, all of this state is lost, which may require all of its edge devices to resend their data and metadata, with the neighbours and buddies doing their part as well. Instead, provide a facility for the Fog to checkpoint its state so it can be restarted with all metadata preserved.
Helps retrieve new data by a subscriber, pub-sub
With simultaneous edge failures (two, to be precise) on a D20 setup, the blocks of neither failing edge device were recovered successfully.
One possible cause is the lack of an implementation of java.util.concurrent.RejectedExecutionHandler, the handler for tasks that cannot be executed by a ThreadPoolExecutor when no threads or queue space remain. In such a case, an implementation of this interface could spin, waiting to add the lost block to the recovery queue until it succeeds.
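The spin-wait idea can be sketched as a blocking handler against the standard java.util.concurrent API. This is an illustrative implementation, not ElfStore's actual recovery code; a real version would also need logging and shutdown policy decisions.

```java
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;

// Sketch: instead of dropping a rejected recovery task, block the
// submitting thread until the task fits back into the executor's queue.
public class BlockingRetryHandler implements RejectedExecutionHandler {
    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
        try {
            if (!executor.isShutdown()) {
                // put() blocks until queue space frees up, so the lost
                // block's recovery task is never silently discarded.
                executor.getQueue().put(task);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is back-pressure: the thread that detects the lost block stalls until the recovery pool drains, rather than losing the task.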
During replica identification for writing blocks, the current API takes edge-related information such as IP and port to make the Fog aware that the caller is an edge device. Instead, pass a boolean flag to indicate this.
Buddy creates stats for its neighbours and itself and passes intermediate stats to its buddy pool. O(mn + mb) messages instead of O(m) messages (m: number of peers, b: buddy pool size minus 1; n: neighbour count).
This will require one level of indirection to the buddy to decide the actual fog to hold the replica. It will also make the global stats more approximate.
Currently the StreamMetadata object doesn't contain the mapping of blockId to its MD5 checksum; this mapping is maintained only at the fog where the stream was registered. As a result, when a client or a fog calls getStreamMetadata(), the returned object doesn't contain the mapping. This mapping should not be returned to the client directly, but it is useful for fog devices (recovery is one scenario where it helps, when a fog device dies). It should therefore be added to the StreamMetadata object as a dynamic property.
How do we update the bloom filter and metadata index when an edge dies?
Maintain one bloom filter per edge. Construct parent fog bloom filter as OR of all of these.
Loss of an edge requires just local bloom recomputation.
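The per-edge scheme above can be sketched directly with `BitSet.or()`; the class and method names are illustrative only.

```java
import java.util.BitSet;
import java.util.Collection;

// Sketch: keep one bloom filter per edge and derive the parent fog's
// filter as the OR of the live edges' filters. Losing an edge then only
// requires re-OR-ing the remaining filters locally, not rebuilding the
// fog filter from block data.
public class FogBloom {
    static BitSet fogFilter(Collection<BitSet> edgeFilters) {
        BitSet combined = new BitSet();
        for (BitSet edge : edgeFilters) {
            combined.or(edge);  // union of per-edge filters
        }
        return combined;
    }
}
```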
Ensure that different replicas are placed on different buddies or their neighbours. [This allows the block to survive the loss of a single buddy (permanent) and its neighbours (transient).]
Related to loss and recovery of buddy, #22
This new feature allows updates to the data using FindBlock with the force flag set to true (details at #39). All replicas are fetched and their content is updated. The block should also maintain a block version number as an internal property.
how does this relate to blockchain?
When an edge dies and is removed from the fog and its index, and it later rejoins with replicas, should we include its blocks (after MD5 verification) among the available replicas? This may cause excess reliability; handle it using #13? What if it rejoins quickly while re-replication is still ongoing?
Currently the system supports searching data blocks via the read() API, which takes the blockId as the query parameter. This issue extends the search functionality to the metadata parameters of the data blocks, in addition to the blockId.
Each fog where we place a replica should return the exact reliability of the edge the replica is placed on. This will help the client calculate the exact reliability of the block.
Question: what do we do with this info? Can we "drop" the last replica blocks if required reliability is achieved?
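Assuming independent edge failures, a block survives as long as at least one replica survives, so its reliability is 1 minus the product of the individual failure probabilities. The sketch below computes this and answers the "drop the last replica" question against a target; the names are illustrative, not ElfStore's API.

```java
import java.util.Arrays;

// Sketch: exact block reliability from per-edge reliabilities, assuming
// independent failures: R = 1 - prod(1 - r_i).
public class BlockReliability {
    static double reliability(double[] edgeReliabilities) {
        double allFail = 1.0;
        for (double r : edgeReliabilities) {
            allFail *= (1.0 - r);  // probability that every replica fails
        }
        return 1.0 - allFail;
    }

    // The last replica can be dropped if the remaining replicas alone
    // still meet the required reliability target.
    static boolean canDropLast(double[] r, double target) {
        double[] rest = Arrays.copyOf(r, r.length - 1);
        return reliability(rest) >= target;
    }
}
```

For example, two replicas on edges of reliability 0.9 each give a block reliability of 0.99, so a third replica is unnecessary for a 0.95 target.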
It has been observed on both the D20 and D272 setups that, as edge devices fail incrementally, complete recovery of blocks succeeds only for the first few failures (2 in both setups); for subsequent failures, not all blocks are recovered. Ideally the system should handle failures as long as enough storage remains to replicate the lost blocks (link this issue with replica identification to ensure the storage assumption holds).
As part of verifying the correctness of the replica-identification algorithm, print the following information to enable easy verification.
The current implementation finds the replica locations, which are then written to by the client, local replica first and then the remote ones. However, if there are fog failures, the write may not succeed and a retry is needed to achieve the expected reliability.
Talk to the parent fog to get an alternative in case of fog failure?
Related to general concept of handling fog failures/buddy failures, #22
The current implementation contains a bug: the metadata maintains only a last blockId along with a mapping of blockId to its MD5 checksum. There is thus no way to consume blocks in the order in which they were added to the stream, so no sequential access pattern over the blocks is possible. To correct this, a list of blocks can be maintained. Two cases need to be looked into carefully: (1) a single writer appending to the stream, and (2) multiple concurrent writers.
For case 1, only one writer can append blocks at a time, so there is no race condition. In case 2, however, multiple writers may write to the stream simultaneously and their writes may interleave. The last blockId need not strictly increase, and this interleaving is perfectly fine, but race conditions must be handled properly: the last blockId, the list of blocks, and the blockId-to-checksum mapping should be updated atomically as a whole.
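The atomic-update requirement for case 2 can be sketched with a single lock guarding all three structures; the field and method names here are illustrative, not ElfStore's StreamMetadata API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: concurrent appenders update the last blockId, the ordered block
// list, and the blockId->checksum map as one atomic step, so readers
// never see the three structures disagree.
public class StreamMetadataSketch {
    private long lastBlockId = -1;
    private final List<Long> blockOrder = new ArrayList<>();
    private final Map<Long, String> checksums = new HashMap<>();

    public synchronized void appendBlock(long blockId, String md5) {
        lastBlockId = blockId;      // need not strictly increase
        blockOrder.add(blockId);    // preserves append order
        checksums.put(blockId, md5);
    }

    public synchronized List<Long> blocksInOrder() {
        return new ArrayList<>(blockOrder);  // defensive copy
    }

    public synchronized long lastBlockId() { return lastBlockId; }
}
```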
Past stress-test runs show that even though storage is available on the edge devices, it is not fully used, because Fog devices were unable to return heartbeat responses to their edges, to their buddies, and to the fogs for which they are a neighbour. This issue was seen when we were using TNonBlockingServer, which uses a single network I/O thread that also handles the requests themselves. Since the current implementation uses TThreadedSelectorServer, which allows separate thread pools for network I/O and for request handling, stress-test reruns are required to validate correct behaviour.
Will not change the sequence number, but the bytes will change. Optionally, the block metadata too.