dicl / fast_fair
Endurable Transient Inconsistency in Byte-Addressable Persistent B+-Tree
License: GNU General Public License v3.0
The current implementation of range scan can repeatedly read duplicated keys in a multi-threaded environment and can return a value buffer that is not sorted by key. The node-scan mechanism of FAST_FAIR proposed in the FAST'18 paper changes the scan direction only when insertion and deletion operations alternate during a scan. This mechanism correctly handles exact-match queries, but not range scans. Please refer to the example below.
"v" is scanner's pointer
Insertion 5, 6 Switch counter = 2 Switch counter = 2 Switch counter = 2 Scan buffer (10) Scan buffer (10, 6) Scan buffer (10, 6, 10) v v v (10, 20, 30, 40, 50) -----> (5, 6, 10, 20, 30, 40, 50) -----> (5, 6, 10, 20, 30, 40, 50)
In the current implementation, the switch counter merely toggles between odd and even, depending on whether insertion and deletion operations have alternated. Our proposed solution is to increase the switch counter on every insertion and deletion. With this change, a range-scanning thread can detect that a concurrent insertion or deletion is in progress in the same node, and then restart the scan after rolling back the buffer index it had advanced.
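The proposed fix can be sketched with a minimal single-node model. The names `Node`, `insert_front`, and `range_scan` below are illustrative, not the repository's API; the point is only that the counter is bumped on every modification, so a scanner can detect any concurrent change and retry:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for a FAST_FAIR leaf node: an entry array plus a
// switch counter in the header.
struct Node {
    std::atomic<uint32_t> switch_counter{0};
    std::vector<long> keys;

    void insert_front(long k) {
        switch_counter.fetch_add(1);   // bump on EVERY modification,
        keys.insert(keys.begin(), k);  // not only when insert/delete alternate
    }
};

// Scan the node into a buffer; if the counter moved while we were reading,
// roll the buffer index back to zero and rescan, so the result contains
// neither duplicated keys nor keys out of order.
std::vector<long> range_scan(const Node& n) {
    std::vector<long> buf;
    uint32_t before;
    do {
        before = n.switch_counter.load();
        buf.clear();                   // roll back the buffer index
        for (long k : n.keys) buf.push_back(k);
    } while (n.switch_counter.load() != before);
    return buf;
}
```

A monotonically increasing counter costs nothing extra on the write path; it only converts the ambiguous odd/even signal into an unambiguous "something changed" signal for readers.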
Hello, could you open-source the WORT and FPTree source code as well? If you can, thank you very much! QAQ
We found that the FAIR algorithm proposed in the FAST'18 paper has a problem making node splits both concurrent and crash-recoverable. The original FAIR algorithm can produce an incorrect tree structure when an internal node split is combined with key deletion. Please refer to our pull request ( #4 ) for details.
Hi @deukyeon,
Recently, we have been working on a test framework that identifies bugs in systems that use persistent memory. We ran our tool on the FAST_FAIR benchmark and found several bugs. Descriptions of these bugs and the corresponding fixes are listed below. We fixed them in our local repository, but you may be interested in fixing them in your repository as well. Thank you for sharing your work with the community; it helped us a lot in developing and testing our tool. Feel free to reach out with any questions about the fixes.
Missing flush in the btree constructor (btree.h:824, in the constructor of btree()):
clflush((char*)root, sizeof(page));
Missing flush in the btree constructor (btree.h:825, in the constructor of btree()):
clflush((char*)this, sizeof(btree), false, true);
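The two missing flushes above can be sketched together in a self-contained constructor. Note this is a sketch, not the repository's code: here `clflush` is a no-op stub (the real helper issues CLFLUSH/CLWB plus fences), and the `page` layout is reduced to a placeholder:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stub: the repository's clflush() flushes cache lines to
// persistent memory; a no-op keeps this sketch self-contained.
static void clflush(char* addr, size_t len) { (void)addr; (void)len; }

struct page { char data[64]; };

// Sketch of the fixed constructor: both the new root node and the btree
// object itself are flushed, so a crash immediately after construction
// cannot leave an unflushed root pointer in persistent memory.
struct btree {
    page* root;
    btree() {
        root = new page();
        clflush((char*)root, sizeof(page));   // fix 1: persist the root node
        clflush((char*)this, sizeof(btree));  // fix 2: persist the tree header
    }
};
```

Without the second flush, the `root` pointer may still sit in a CPU cache when power is lost, leaving the persisted tree unreachable after restart.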
The current insertion implementation cannot correctly handle inconsistent data left behind when a crash occurs in the middle of a node split. The key problem is that the insertion algorithm performs no recovery even when a writer thread finds that the smallest key of the next sibling node is larger than the new key. If this algorithm is used as-is, the following problem can occur after a system crash in the middle of a node split.
1. The initial tree has only one leaf node (1, 5, 9, 15), where the maximum number of entries is four.
2. A writer thread first splits the node in order to insert the new key 20.
3. (1, 5, 9, 15) is split into (1, 5) --> (9, 15).
4. **System crash** occurs before a new root is allocated and the middle key is inserted into it.

**System resume**

5. Subsequent writers insert the new keys 20, 30, and 40. Inserting 40 triggers a node split, which then allocates a new root node.
6. After these insertions, the final tree structure looks like this:

                   (20)
                  /    \
   (1, 5)-->(9, 15)-->(20, 30, 40)

7. A reader thread that searches for key 5 gets a failure, even though key 5 is present in this tree structure.
The lazy recovery mechanism, originally proposed in the FAST'18 paper, performs recovery whenever a thread finds that the smallest key in the right sibling node is larger than the requested key. The current implementation presumably omits it for performance; however, performance would likely change somewhat if the implementation were fixed to correctly apply the lazy recovery mechanism.
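The lazy-recovery idea can be sketched as follows. This is a simplified model under stated assumptions: `LeafNode`, `linked_in_parent`, and `post_to_parent` are illustrative names, not the repository's API; the point is that a thread forced to move right through a sibling pointer first completes any split the crash left unfinished:

```cpp
#include <cassert>
#include <functional>

// A crashed split leaves the new sibling reachable through the sibling
// pointer but missing from the parent node.
struct LeafNode {
    long smallest_key;
    LeafNode* sibling;       // persistent sibling pointer, set before the crash
    bool linked_in_parent;   // whether the parent already holds the split key
};

// Return the node that should hold `key`, repairing unfinished splits on
// the way: before moving right, post the orphaned sibling's smallest key
// to the parent (via the supplied hook) so the tree becomes consistent.
LeafNode* find_leaf(LeafNode* n, long key,
                    const std::function<void(LeafNode*)>& post_to_parent) {
    while (n->sibling && n->sibling->smallest_key <= key) {
        if (!n->sibling->linked_in_parent) {
            post_to_parent(n->sibling);      // complete the crashed split
            n->sibling->linked_in_parent = true;
        }
        n = n->sibling;                      // then move right as usual
    }
    return n;
}
```

In the crash scenario above, the first traversal that steps from (1, 5) to (9, 15) would insert 9 into the parent, so the orphaned left half stays reachable.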
Could you please point out where the old node is destroyed when a split or merge occurs?
Want to use your source.
Hi All,
In the PMDK environment, when I first run pool initialization and data generation, everything works. But when I shut the program down and run it again, reading data from a pool that already contains data, it crashes with "Segmentation fault (core dumped)".
Thank you very much.
Consider these lines in the btree_search() member function:
page *t;
while((t = (page *)p->linear_search(key)) == p->hdr.sibling_ptr) {
    p = t;
    if(!p) {
        break;
    }
}

if(!t || (char *)t != (char *)key) {
    printf("NOT FOUND %lu, t = %x\n", key, t);
    return false;
}
linear_search() searches for a key in the leaf node and returns the corresponding value. btree_search() then compares the key against the value found in the leaf node and, if they do not match, treats the key as not found and prints a message. This is NOT correct whenever a key and its corresponding value are not equal.
It also has a few extra bugs under some special workloads: FAST_FAIR cannot store a zero value, and it may fault when two consecutive records have the same value.
Hi all,
Thank you for sharing your work with the community. Recently, we found that the current implementation of lock-free inner-node search can reach an incorrect child pointer due to a read-write conflict.
Consider the example below, where a concurrent insert and search execute in the following order. The search operation skips the key 10 in step (1) because its left and right pointers are equal due to an in-progress shift. It may then load the key 10 again after the insert has finished and return ptr in step (2). However, p1 is what the search operation should return. The incorrect result is caused by the node's state changing between the two load operations.
Search(3):                                       (1) skip 10            (2) return ptr
                                                   (p1=p1)                  (3<10)
                                                       v                          v
Insert(5,ptr): (p1,10,p2) --> (p1,10,p2,10,p2) --> (p1,10,p1,10,p2) --> (p1,5,ptr,10,p2)
A possible solution is to maintain a "low_key" field in each inner node. Do you have a better solution to this problem?
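One alternative worth comparing against a per-node low_key is seqlock-style validation. To be clear, this is a generic technique, not the repository's code, and `InnerNode` and its members below are illustrative: a writer makes the version odd while shifting entries, and a reader only trusts a scan if it observed the same even version before and after, so it can never act on a half-shifted record array:

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct InnerNode {
    std::atomic<uint64_t> version{0};
    void* leftmost = nullptr;                    // child for keys < records[0]
    std::vector<std::pair<long, void*>> records; // sorted (key, right child)

    void insert(long key, void* child) {
        version.fetch_add(1, std::memory_order_release);  // odd: write begins
        records.push_back({key, child});
        std::sort(records.begin(), records.end());
        version.fetch_add(1, std::memory_order_release);  // even: write ends
    }

    void* find_child(long key) const {
        for (;;) {
            uint64_t v = version.load(std::memory_order_acquire);
            if (v & 1) continue;                     // writer active, retry
            void* child = leftmost;
            for (const auto& r : records)
                if (key >= r.first) child = r.second;  // last separator <= key
            if (version.load(std::memory_order_acquire) == v)
                return child;                        // snapshot was consistent
        }
    }
};
```

Compared with a low_key field, this keeps the entry layout unchanged but adds one extra atomic load per search and forces readers to retry under write contention.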
This B+-tree implementation helped us gain a better understanding of the paper "Endurable Transient Inconsistency in Byte-Addressable Persistent B+-Tree". Thank you to your lab for its contribution to NVM research. Does your lab plan to open-source implementations of your other papers, such as "WOART"? Thank you!