Jylis is a distributed in-memory database for Conflict-free Replicated Data Types (CRDTs), built for speed, scalability, availability, and ease of use.
Visit the website for more information.
Home Page: https://jemc.github.io/jylis
License: Mozilla Public License 2.0
After restarting a single-node Jylis server with disk persistence, `GET`ting a large `TLOG` key (>100k items) causes Jylis resource starvation: CPU goes to 100%, Jylis can't process other queries, and the executing query doesn't seem to return. This query took about 1000ms before the restart. A 100-item limited `GET` succeeds, as does adding new items with `INS`.

Jylis at rest after starting up and ingesting the `TLOG` from disk:

Jylis after sending the `TLOG GET`:
I tried easing up the item count and it had interesting results. I did this to try to keep a batch of items "hot" in memory to see if it would speed up the next query. 10k items worked fine. Then I bumped up the count to 20k and that was fine. I worked up to 80k items this way before Jylis became excessively unresponsive.
Then I restarted Jylis and the web app and tried a 45k item batch. It took 57 seconds to return from the database, which was close to the 60 second app timeout limit. I pulled the same 45k batch again and Jylis responded in 431ms. If I wait for several minutes with the processes still running, the long request happens again, followed by shorter requests after that. I'm not sure if this is intentionally being cached in Jylis or if it's a side effect of the garbage collector, but I thought I should point out the behavior.
I'm using a `Connection` class, but other than that I'm calling the Hiredis `read` and `write` methods:

```ruby
Connection.current.write ["TLOG", "GET", "temperature"]
result = Connection.current.read
```
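For anyone curious what that `write` puts on the wire, Hiredis frames the array using standard RESP encoding. A minimal sketch (this is plain RESP2 framing, nothing Jylis-specific):

```ruby
# Encode a command array using RESP2 framing, as a Redis client does on write:
# "*<count>" followed by "$<length>" + payload for each argument.
def resp_encode(parts)
  out = "*#{parts.length}\r\n"
  parts.each { |p| out << "$#{p.bytesize}\r\n#{p}\r\n" }
  out
end

resp_encode(["TLOG", "GET", "temperature"])
# => "*3\r\n$4\r\nTLOG\r\n$3\r\nGET\r\n$11\r\ntemperature\r\n"
```

Since Jylis speaks the Redis protocol, the request side is identical whether it comes from Hiredis or `redis-cli`; only the reply size differs here.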
Using `redis-cli` to execute `TLOG GET temperature` doesn't experience the problem.
- If the app server is stopped/reloaded/times out, the query continues to run in Jylis.
- No other Jylis queries can be made while the problem query is running.
- No other clients can connect to Jylis while the problem query is running.
I realize this use of Jylis (>100k items in a `TLOG`) may be abusive to Jylis' ideal use case, pushing it beyond its intended limits. The intent of this test was to find the breaking point of Jylis. If that's the case, I have no problem trimming the `TLOG` at this point (or setting a limit until the cursor is implemented). However, if you would like to make Jylis performant in this scenario, I can certainly share the `TLOG` data so that the issue can be reproduced.
I'm a little concerned that the extreme delays started happening after a deploy (a new Jylis image may have been pulled in addition to the service restarting). Not sure if this is a regression or if I just got lucky from running Jylis for so long without a restart.
This also exposes a couple of issues I'm concerned about that are unrelated to the large data set: a single running query seems to tie up the Jylis connection, so no other queries can be made and no clients can connect while a query is running.
When using disk persistence and intentionally causing a partition split, I noticed that data written on one side of the split is not synced to disk on the nodes on the other side of the split after the partition has been repaired and replication has completed. This could cause problems if the nodes on one side of a split are restarted during the split, as they would lose the data they had replicated from the other side of the partition. I have set up an example to reproduce the issue.
Save this `docker-compose.yaml` to a directory:

```yaml
version: "3"
services:
  db1:
    image: jemc/jylis
    ports:
      - "6379:6379"
    volumes:
      - ./data1:/data
    command:
      - "--disk-dir=/data"
      - "--addr=db1:9999:db1"
  db2:
    image: jemc/jylis
    ports:
      - "6382:6379"
    volumes:
      - ./data2:/data
    command:
      - "--disk-dir=/data"
      - "--addr=db2:9999:db2"
      - "--seed-addrs=db1:9999:db1"
    links:
      - db1
      - db3
  db3:
    image: jemc/jylis
    ports:
      - "6383:6379"
    command:
      - "--addr=db3:9999:db3"
      - "--seed-addrs=db1:9999:db1"
    links:
      - db1
  cli1:
    image: redis:4.0-alpine
    restart: "no"
    command: redis-cli -h db1
    links:
      - db1
  cli2:
    image: redis:4.0-alpine
    restart: "no"
    command: redis-cli -h db2
    links:
      - db2
  cli3:
    image: redis:4.0-alpine
    restart: "no"
    command: redis-cli -h db3
    links:
      - db3
```
Create `data1` and `data2` directories to hold the disk persistence data. We'll leave `db3` as in-memory only:

```shell
$ mkdir data1
$ mkdir data2
```
Run `docker-compose up -d` to start the cluster.
Run `docker-compose run cli1` to connect a CLI to `db1`. Note: it helps to open the CLIs in separate terminals.
Run the command `MVREG SET foo 1`.
Run `docker-compose run cli2` to connect a CLI to `db2`.
Run the command `MVREG GET foo`, which should return `1) "1"`.
Comment out all of the `links` and `--seed-addrs` for `db2` in the yaml to separate the node from the cluster:
```yaml
  db2:
    image: jemc/jylis
    ports:
      - "6382:6379"
    volumes:
      - ./data2:/data
    command:
      - "--disk-dir=/data"
      - "--addr=db2:9999:db2"
      # - "--seed-addrs=db1:9999:db1"
    # links:
    #   - db1
    #   - db3
```
Shut down the cluster with `docker-compose down`.
Bring the cluster back up with `docker-compose up -d`.
On `cli1` run `MVREG GET foo`. The result is `1) "1"`.
On `cli2` run `MVREG GET foo`. The result is `(empty list or set)`, but should be `"1"`.
The data written to `db1` was not persisted to `db2`'s disk.
On `cli2` run `MVREG SET foo 2`.
Uncomment the yaml for `db2` so that the node will rejoin the cluster.
Shut down the cluster with `docker-compose down`.
Bring the cluster back up with `docker-compose up -d`.
On `cli2` run `MVREG GET foo`. This will return `1) "2" 2) "1"`.
On `cli2` run `MVREG SET foo 3`.
On `cli1` run `MVREG GET foo`. This will return `"3"`.
Current state: value `"1"` is persisted to disk on `db1` and `"3"` is on disk on `db2`. They both have a value of `"3"` in memory.
Comment out the `links` and `--seed-addrs` for `db2` again.
Shut down the cluster with `docker-compose down`.
Bring the cluster back up with `docker-compose up -d`.
On `cli1` run `MVREG GET foo`. This will return `1) "1"`.
On `cli2` run `MVREG GET foo`. This will return `1) "3"`.
This shows that the values synced in the replication process have not been persisted to disk.
Redis has a `FLUSHALL` command that deletes all keys from all internal keyspaces.
Adding a similar feature to Jylis has been requested by @solnic.
Outstanding questions:
There were some portability issues raised in #3 that still need to be investigated and resolved.
I have been using a `TLOG` to store time-series data for a small project to evaluate some technology. I am retrieving the data from Jylis and displaying it on a graph:
What I've noticed is that the query time is roughly 100ms per 10,000 records. Considering I now have close to 90,000 data points in the `TLOG`, the response time is about 900ms. This is causing a noticeable delay when loading the graph. If I only return the first 100 records, the response time is about 10ms.
Based on this, I'm wondering if implementing a cursor in Jylis for the `TLOG` data type makes sense. That way I could fetch the first 10k data points with a reasonable response time. Then if the user decides to scroll the graph I can fetch the next 10k, and so on.
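To make the cursor idea concrete, here is a rough sketch of the client-side paging loop. A plain Ruby array stands in for the server-side log, since no cursor command actually exists in Jylis yet; this only illustrates the paging arithmetic the graph code would use:

```ruby
# Simulate cursor-style paging over a TLOG: entries are newest-first,
# fetched in fixed-size pages as the user scrolls back in time.
# Returns the page, the next cursor position, and whether we're done.
def fetch_page(log, cursor, page_size)
  page = log[cursor, page_size] || []
  next_cursor = cursor + page.length
  [page, next_cursor, next_cursor >= log.length]
end

log = (1..25).to_a.reverse          # 25 readings, newest first

page1, cursor, done = fetch_page(log, 0, 10)      # readings 25..16
page2, cursor, done = fetch_page(log, cursor, 10) # readings 15..6
page3, cursor, done = fetch_page(log, cursor, 10) # readings 5..1, done
```

With a real server cursor the `log` slice would be replaced by a round trip per page, so the initial graph load would only pay for the first page instead of the full 90k items.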
An alternative I've thought about is sharding the `TLOG` on the application side, if this isn't an appropriate feature for Jylis. For example, I could create a new `TLOG` for each device for each day, and then put logic in my application to fetch from the appropriate `TLOG` based on the section of data the user is retrieving.
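For illustration, the per-device, per-day key scheme might look like the sketch below. The `temperature:<device>:<date>` naming is just my own convention, not anything Jylis prescribes:

```ruby
require "date"

# Build the per-device, per-day TLOG key for a reading's date.
def tlog_key(device_id, date)
  "temperature:#{device_id}:#{date.strftime('%Y-%m-%d')}"
end

# List the shard keys covering a date range, so the app can fetch
# only the TLOGs that the user's current graph window touches.
def shard_keys(device_id, from_date, to_date)
  (from_date..to_date).map { |d| tlog_key(device_id, d) }
end

shard_keys("dev1", Date.new(2018, 6, 1), Date.new(2018, 6, 3))
# => ["temperature:dev1:2018-06-01",
#     "temperature:dev1:2018-06-02",
#     "temperature:dev1:2018-06-03"]
```

Each shard then stays small enough that `TLOG GET` on it is cheap, at the cost of the app issuing one query per day in the visible window and merging the results.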
Thoughts?
When using disk persistence, after restarting the database and refreshing the UI before the logs had finished being read (it takes a couple minutes), I noticed the graph displayed the oldest values but was missing the most recent:
Of course, this makes sense logically because the log is read from oldest to newest. However, from a UX standpoint (or at least most that I can think of), the most recent value is the most relevant, and older values become less relevant the further back in time you go. In the graph above, the measurements for the day are not present, but data from a few days ago is. I think the same holds true for other data types like `MVREG`: the last known value is what's expected when reading the logs. Although, by the nature of a database that favors availability, it's acceptable for values to bounce around while the log is being read, in practice my concern is that it may be unexpected and the application may appear "buggy".
I'm not sure how difficult this would be to implement, so at this point I just wanted to point it out and see what you think. I don't think it's an urgent priority. I think the issue could also be avoided for now by having a cluster and performing a rolling restart so that the latest values are synced from the other nodes.
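To illustrate why the read order only affects the values seen *during* replay, here is a toy last-write-wins register (my own simplification of register semantics, ignoring `MVREG`'s concurrent-value tracking). Replaying newest-first converges to the same final state, but readers see the most relevant value immediately:

```ruby
# Toy last-write-wins register: an entry only takes effect if its
# timestamp is at least as new as what the register already holds.
LWW = Struct.new(:value, :timestamp) do
  def apply(v, ts)
    if timestamp.nil? || ts >= timestamp
      self.value = v
      self.timestamp = ts
    end
    value # what a reader would observe at this point in the replay
  end
end

entries = [["a", 1], ["b", 2], ["c", 3]] # log, oldest..newest

oldest_first = LWW.new
seen_fwd = entries.map { |v, ts| oldest_first.apply(v, ts) }
# seen_fwd == ["a", "b", "c"] — readers see stale values until the end

newest_first = LWW.new
seen_rev = entries.reverse.map { |v, ts| newest_first.apply(v, ts) }
# seen_rev == ["c", "c", "c"] — readers see the latest value right away
```

Both registers end at `"c"`, so a newest-first replay would be purely a UX win for this shape of data; whether the on-disk log format permits reading it backwards is a separate question.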
We may want to add a feature to disable (or opt into) `SYSTEM` commands that aren't read-only.
See discussion here: #13 (comment)