
License: MIT License



SMART: A High-Performance Adaptive Radix Tree for Disaggregated Memory

This is the implementation repository of our OSDI'23 paper: SMART: A High-Performance Adaptive Radix Tree for Disaggregated Memory. This artifact provides the source code of SMART and scripts to reproduce all the experimental results in our paper. SMART, a diSaggregated-meMory-friendly Adaptive Radix Tree, is the first high-performance radix tree designed for disaggregated memory.

Supported Platform

We strongly recommend running SMART on the r650 instances on CloudLab, as the code has been thoroughly tested there. We have not tested it in any other hardware environment.

Reproducing the results in the paper requires 16 r650 machines; otherwise, fewer machines (e.g., 3) will suffice. Each r650 machine has two 36-core Intel Xeon CPUs, 256 GB of DRAM, and one 100 Gbps Mellanox ConnectX-6 IB RNIC. Each RNIC is connected to a 100 Gbps Ethernet switch.

Create Cluster

You can follow these steps to create an experimental cluster with 16 nodes on CloudLab:

  1. Log into your own account.

  2. You are now in the CloudLab console. If 16 r650 machines are not available, submit a reservation in advance via Experiments|-->Reserve Nodes.

  3. Click Experiments|-->Create Experiment Profile. Upload ./script/cloudlab.profile provided in this repo. Input a file name (e.g., SMART) and click Create to generate the experiment profile for SMART.

  4. Click Instantiate to create a 16-node cluster using the profile (This takes about 7 minutes).

  5. Log into and check each node using the SSH commands provided in the List View on CloudLab. If you find that some nodes have broken shells (which occasionally happens on CloudLab), you can reload them via List View|-->Reload Selected.

Source Code (Artifacts Available)

Now you can log into all the CloudLab nodes. Use the following command to clone this GitHub repo into the home directory of every node:

git clone https://github.com/dmemsys/SMART.git
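Since every node needs its own copy, it can save time to drive the clone from a single machine over SSH. Below is a minimal sketch, not part of the artifact's scripts: it assumes your nodes' SSH targets are listed one per line in a `nodes.txt` file (the `node0..node2` hostnames are placeholders; substitute the SSH targets from CloudLab's List View).

```shell
# Dry run by default: RUN=echo only prints each command.
# Set RUN= (empty) to actually execute the ssh commands.
RUN=echo
printf 'node0\nnode1\nnode2\n' > nodes.txt   # placeholder hostnames
while read -r node; do
  $RUN ssh "$node" 'git clone https://github.com/dmemsys/SMART.git'
done < nodes.txt
```

With `RUN=echo`, the loop prints one `ssh <node> git clone ...` line per host so you can verify the targets before running them for real.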

Environment Setup

Install the necessary dependencies in order to build SMART. Note that you should run the following steps on all the nodes you created.

  1. Set bash as the default shell and enter the SMART directory.

    sudo su
    chsh -s /bin/bash
    cd SMART
  2. Install Mellanox OFED.

    # It is safe to ignore "Failed to update Firmware"
    # This takes about 8 minutes
    sh ./script/installMLNX.sh
  3. Resize disk partition.

    Since the r650 nodes leave a large disk partition unallocated by default, resize the partition using the following commands:

    # It is safe to ignore "Failed to remove partition" or "Failed to update system information"
    sh ./script/resizePartition.sh
    # This takes about 6 minutes
    reboot
    # After rebooting, log into all nodes again and execute:
    sudo su
    resize2fs /dev/sda1
  4. Enter the SMART directory again and install the required libraries and tools.

    cd SMART
    # This takes about 3 minutes
    sh ./script/installLibs.sh

YCSB Workloads

You should run the following steps on all nodes.

  1. Download YCSB source code.

    sudo su
    cd SMART/ycsb
    curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.11.0/ycsb-0.11.0.tar.gz
    tar xfvz ycsb-0.11.0.tar.gz
    mv ycsb-0.11.0 YCSB
  2. Download the email dataset for string workloads.

    gdown --id 1ZJcQOuFI7IpAG6ZBgXwhjEeKO1T7Alzp
  3. First, generate a small set of YCSB workloads for a quick start.

    # This takes about 2 minutes
    sh generate_small_workloads.sh

Getting Started (Artifacts Functional)

  • HugePages setting.

    sudo su
    echo 36864 > /proc/sys/vm/nr_hugepages
    ulimit -l unlimited
  • Return to the SMART root directory (./SMART) and execute the following commands on all nodes to compile SMART:

    mkdir build; cd build; cmake ..; make -j
  • Execute the following command on one node to initialize memcached:

    /bin/bash ../script/restartMemc.sh
  • Execute the following command on all nodes to split the workloads:

    python3 ../ycsb/split_workload.py <workload_name> <key_type> <CN_num> <client_num_per_CN>
    • workload_name: the name of the workload to test (e.g., a / b / c / d / la).
    • key_type: the type of key to test (i.e., randint / email).
    • CN_num: the number of CNs.
    • client_num_per_CN: the number of clients in each CN.

    Example:

    python3 ../ycsb/split_workload.py a randint 16 24
  • Execute the following command on all nodes to conduct a YCSB evaluation:

    ./ycsb_test <CN_num> <client_num_per_CN> <coro_num_per_client> <key_type> <workload_name>
    • coro_num_per_client: the number of coroutines in each client (2 is recommended).

    Example:

    ./ycsb_test 16 24 2 randint a
  • Results:

    • Throughput: the aggregate throughput of SMART across the whole cluster will be shown in the terminal of the first node (10 epochs by default).

    • Latency: execute the following command on one node to calculate the latency results of the whole cluster:

      python3 ../us_lat/cluster_latency.py <CN_num> <epoch_start> <epoch_num>

      Example:

      python3 ../us_lat/cluster_latency.py 16 1 10
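As a quick sanity check on the example run above (`./ycsb_test 16 24 2 randint a`), the cluster-wide client and coroutine counts multiply out as follows; the variable names here are only illustrative.

```shell
cn_num=16            # <CN_num>
clients_per_cn=24    # <client_num_per_CN>
coros_per_client=2   # <coro_num_per_client>
echo "$(( cn_num * clients_per_cn )) clients in total"                       # prints "384 clients in total"
echo "$(( cn_num * clients_per_cn * coros_per_client )) coroutines in total" # prints "768 coroutines in total"
```

These totals are why the reported throughput is aggregated on the first node rather than read from any single terminal.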
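Incidentally, the HugePages value used in the first step above is easy to sanity-check: assuming the default 2 MiB huge page size on x86-64, 36864 pages reserve about 72 GiB of each node's 256 GB of DRAM.

```shell
# 36864 huge pages x 2 MiB per page, converted to GiB
# (assumes the default 2 MiB huge page size on x86-64)
nr_hugepages=36864
page_size_mib=2
echo "$(( nr_hugepages * page_size_mib / 1024 )) GiB"   # prints "72 GiB"
```

After the write to /proc/sys/vm/nr_hugepages takes effect, `grep HugePages_Total /proc/meminfo` should report 36864.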

Reproduce All Experiment Results (Results Reproduced)

We provide code and scripts in the ./exp folder for reproducing our experiments. For more details, see ./exp/README.md.

Paper

If you use SMART in your research, please cite our paper:

@inproceedings{smart2023,
  author = {Xuchuan Luo and Pengfei Zuo and Jiacheng Shen and Jiazhen Gu and Xin Wang and Michael R. Lyu and Yangfan Zhou},
  title = {{SMART}: A High-Performance Adaptive Radix Tree for Disaggregated Memory},
  booktitle = {17th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 23)},
  year = {2023},
  isbn = {978-1-939133-34-2},
  address = {Boston, MA},
  pages = {553--571},
  url = {https://www.usenix.org/conference/osdi23/presentation/luo},
  publisher = {{USENIX} Association},
  month = jul,
}


smart's Issues

Put read delegation and write combination together

Hi, Xuchuan
I still don't understand how the read delegation and write combining mechanisms are used at the same time.

What does "using the same time window" mean in the paper? My understanding from the code is that the write combining time window is extended from the end of the leaf lock to after the leaf unlock, i.e., to include the full tree access process, just like a search. But how does this mechanism guarantee that causally related write and read operations in two threads are assigned to two different time windows? The paper states that read and write clients compete for the same local lock, but I don't think I've seen any code for that.

Finally, the code uses read/write_window and r/w_lock to keep track of the corresponding time windows, and only opens a new window when both read_window and write_window are 0 (i.e., it then sets the corresponding read/write_window). In my understanding, under a locked node's read_window and write_window, a read and a write to the same key can still occur at the same time, so there is still a case where causality is not guaranteed. Can you explain the consistency guarantees in more detail?

Failed to get my node ID

When I change the parameter NET_DEV_NAME, setting it to the IB network interface (e.g., "ib0") returns the wrong IPv4 address. When I set it to the Ethernet interface (e.g., "eno1"), the IP address is right but I get the wrong my_node_ID: if my IP address is 192.168.0.167, "compute server 166 start up" is shown in the terminal instead of compute server 0. How can I set this parameter correctly?

Settings of MN and CN

I want to change the settings of my MN and CN because my machine does not have that many CPU cores. Where are the default settings of MN and CN defined in the code?

How to set up the master node

I wonder how the master node is set up in the experiment. As you say, the master node of the r650 cluster is the one that can directly establish SSH connections to the other nodes. How can I make a node that can connect to the other nodes in CloudLab? Is it OK if I just change the master_ip parameter in the code?

Thank you!

How to reproduce Fig. 18(c)?

Hello, SMART is great work, and I am interested in it. Could you please tell me how to get Sherman's result? I have tried using "cmake -DSTATIC_MN_IP=on -DENABLE_CACHE=on -DLONG_TEST_EPOCH=off -DSHORT_TEST_EPOCH=off -DMIDDLE_TEST_EPOCH=off -DENABLE_CACHE_EVICTION=off -DON_CHIP=on .." to run Sherman, but the result is similar to SMART's. How can I get Sherman's result as shown in Fig. 18(c)?
Looking forward to your early reply. Thanks a lot!

concurrency issue

Hi, Xuchuan

I am working on a concurrency issue with the index cache while executing SMART's code. I am using a server with 18 threads, YCSB C, and an all-write workload. The error message indicates:

    ==22691==ERROR: AddressSanitizer: heap-use-after-free on address 0x60400017e291 at pc 0x000000560d06 bp 0x7fb089ef7980 sp 0x7fb089ef7978
    READ of size 8 at 0x60400017e291 thread T2
    ==22691==AddressSanitizer: while reporting a bug found another one. Ignoring.
    #0 0x560d05 in GADD(GlobalAddress const&, int) /mnt/codes/SMART/include/GlobalAddress.h:40
    #1 0x560d05 in Tree::insert(std::array<unsigned char, 8ul> const&, unsigned long, CoroContext*, int, bool, bool) /mnt/codes/SMART/src/Tree.cpp:131
    #2 0x524ca4 in thread_run(int) /mnt/codes/SMART/test/ycsb_test.cpp:182
    #3 0x7fb38ab209ff (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xd09ff)
    #4 0x7fb38b4c36b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
    #5 0x7fb389e6051c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10751c)

    0x60400017e291 is located 1 bytes inside of 40-byte region [0x60400017e290,0x60400017e2b8)
    freed by thread T13 here:
    #0 0x4ea638 in operator delete(void*, unsigned long) ../../../../gcc-7.5.0/libsanitizer/asan/asan_new_delete.cc:140
    #1 0x572bae in RadixCache::_safely_delete(CacheEntry*) /mnt/codes/SMART/src/RadixCache.cpp:357

    previously allocated by thread T17 here:
    #0 0x4e92b0 in operator new(unsigned long) ../../../../gcc-7.5.0/libsanitizer/asan/asan_new_delete.cc:80
    #1 0x57e129 in RadixCache::add_to_cache(std::array<unsigned char, 8ul> const&, InternalPage const*, GlobalAddress const&) /mnt/codes/SMART/src/RadixCache.cpp:25

May I ask if you know why this is happening? How can it be fixed?

Thank you!

Error while conducting the YCSB test

I encounter the error "Server 0 Counld't incr value and get ID: SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY, retry..." when running the YCSB test. How can I solve this problem?

How to get uniform YCSB-D workload?

Hello, in the workload_spec file, requestdistribution=latest is used to generate the YCSB-D workload. How do I generate a uniform YCSB-D? Since YCSB-D uses requestdistribution=latest, how do you distinguish whether YCSB-D is uniformly or Zipfian distributed?

Question about the concurrent control method

Hi,

I found that some parts of your code use RDMA atomic operations and READ/WRITE on the same memory region (e.g., the node_type in Header). However, this may harm the correctness of synchronization.
The InfiniBand Architecture Specification 1.4 points out in section 10.7.2.3 that it is unsafe to simultaneously use atomic and non-atomic operations on the same memory region.
I believe the reason is that "atomic" is only effective for the NIC's execution but not for the flushing through PCIe.
The recent SIGMOD'23 paper Design Guidelines for Correct, Efficient, and Scalable Synchronization using One-Sided RDMA and this Q&A https://lore.kernel.org/linux-rdma/20200512113512.GK4814@unreal/T/ confirm my point.

Thus, I want to ask: Is it a bug in your code? If it isn't, how did you further ensure the correctness of synchronization?

Wrong Node ID and IP address

When I run the YCSB test on my server, "compute server 65535 start up [0.0.0.0]" appears on the terminal, which is the wrong ID and the wrong IP. I wonder what could cause this error.
