
jemc / jylis

74 stars · 9 watchers · 6 forks · 1.77 MB

A distributed in-memory database for Conflict-free Replicated Data Types (CRDTs). :seedling: :left_right_arrow:

Home Page: https://jemc.github.io/jylis

License: Mozilla Public License 2.0

Languages: Pony 82.74% · Ruby 14.07% · Makefile 2.05% · Dockerfile 1.14%
Topics: crdt, in-memory-database, distributed-database, pony-language

jylis's Introduction


Jylis is a distributed in-memory database for Conflict-free Replicated Data Types (CRDTs), built for speed, scalability, availability, and ease of use.

Visit the website for more information.

jylis's People

Contributors

amclain · dependabot[bot] · jemc · srenatus


jylis's Issues

TLOG experiences performance issues with large recordset

After restarting a single-node Jylis server with disk persistence, GETting a large TLOG key (>100k items) causes resource starvation: CPU goes to 100%, Jylis can't process other queries, and the executing query doesn't seem to return. The same query took about 1000ms before the restart. A GET limited to 100 items succeeds, as does adding new items with INS.

Jylis at rest after starting up and ingesting the TLOG from disk: (screenshot)

Jylis after sending the TLOG GET: (screenshot)

I tried varying the item count, with interesting results. My goal was to keep a batch of items "hot" in memory, to see whether that would speed up the next query. A 10k-item batch worked fine. Then I bumped the count up to 20k, and that was fine too. I worked up to 80k items this way before Jylis became severely unresponsive.

Then I restarted Jylis and the web app and tried a 45k-item batch. It took 57 seconds to return from the database, close to the app's 60-second timeout limit. I pulled the same 45k batch again and Jylis responded in 431ms. If I waited several minutes with the processes still running, the long request happened again, followed by shorter requests after that. I'm not sure whether this is intentional caching in Jylis or a side effect of the garbage collector, but I thought I should point out the behavior.

Additional Notes

  • I am using the Hiredis Ruby gem. I store the current connection in Connection, but other than that I'm calling the Hiredis read and write methods directly:
# Connection.current wraps the underlying Hiredis connection.
Connection.current.write ["TLOG", "GET", "temperature"]  # queue the command
result = Connection.current.read                         # block until the full reply arrives
  • Using redis-cli to execute TLOG GET temperature doesn't exhibit the problem.

  • If the app server is stopped, reloaded, or times out, the query continues to run in Jylis.

  • No other Jylis queries can be made while the problem query is running.

  • No other clients can connect to Jylis while the problem query is running.
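
For reference, a self-contained version of these calls, including the count-limited variant that still succeeds, might look like the following. This is a minimal sketch assuming a single-node server on the default port; the key name matches the one above.

require "hiredis"

conn = Hiredis::Connection.new
conn.connect("localhost", 6379)

# Unbounded GET: hangs and pegs the CPU once the TLOG exceeds ~100k items.
conn.write(["TLOG", "GET", "temperature"])
full = conn.read

# Count-limited GET: returns quickly even against the same key.
conn.write(["TLOG", "GET", "temperature", "100"])
limited = conn.read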


I realize this use of Jylis (>100k items in a TLOG) may be outside Jylis' ideal use case, pushing it beyond its intended limits; the intent of this test was to find the breaking point. If that's the case, I have no problem trimming the TLOG at this point (or setting a limit until the cursor is implemented), as sketched below. However, if you would like to make Jylis performant in this scenario, I can certainly share the TLOG data so that the issue can be reproduced.
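
Trimming from the client could look something like this, reusing conn from the sketch above. This assumes a count-based TLOG TRIM command that discards the oldest entries beyond the given count; verify the exact command name and signature against the Jylis documentation.

# Assumption: TLOG TRIM takes a key and a target count, keeping only the
# newest entries. The 10,000 cutoff here is arbitrary.
conn.write(["TLOG", "TRIM", "temperature", "10000"])
conn.read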

I'm a little concerned that the extreme delays started happening after a deploy (a new Jylis image may have been pulled in addition to the service restarting). I'm not sure whether this is a regression or whether I had simply gotten lucky by running Jylis for so long without a restart.

This also exposes a couple of issues, unrelated to the large data set, that concern me: one running query seems to tie up the Jylis connection, so no other queries can be made and no clients can connect while a query is running.

Roadmap

1.0 Milestone

  • Basic data types
  • Client access
  • Cluster membership add/remove
  • Data replication on update
  • Distributed cluster log
  • Version-independent cluster serialization
  • Disk persistence
  • Shared causal contexts
  • Anti-entropy repair on node join
  • Continuous anti-entropy repair
  • Finalize cluster setup operational controls
  • Compaction of removed cluster members' data
  • Cluster and disk protocol compatibility plan
  • Lock Pony language and library revisions for reproducible builds
  • Clear out "TODO" markers in code and documentation; replace all doc stubs

Performance

  • Profile, benchmark and optimize cluster serialization; consider less human-readable protocols.

Beyond

  • Removing entries from keyspaces
  • User-creatable keyspaces?
  • Causal transactions
  • Client subscriptions
  • Keyspace iteration and search
  • Keyspace filtering

Replicated data is not synced to disk

When using disk persistence and intentionally causing a partition split, I noticed that data written on one side of the split is not synced to disk on the nodes on the other side, even after the partition has been repaired and replication has completed. This could cause problems if the nodes on one side of a split are restarted during the split, as they would lose the data they had replicated from the other side of the partition. I have set up an example to reproduce the issue.

Steps To Reproduce

  • Save the following docker-compose.yaml to a directory:
version: "3"
services:
  db1:
    image: jemc/jylis
    ports:
      - "6379:6379"
    volumes:
      - ./data1:/data
    command:
      - "--disk-dir=/data"
      - "--addr=db1:9999:db1"

  db2:
    image: jemc/jylis
    ports:
      - "6382:6379"
    volumes:
      - ./data2:/data
    command:
      - "--disk-dir=/data"
      - "--addr=db2:9999:db2"
      - "--seed-addrs=db1:9999:db1"
    links:
      - db1
      - db3

  db3:
    image: jemc/jylis
    ports:
      - "6383:6379"
    command:
      - "--addr=db3:9999:db3"
      - "--seed-addrs=db1:9999:db1"
    links:
      - db1

  cli1:
    image: redis:4.0-alpine
    restart: "no"
    command: redis-cli -h db1
    links:
      - db1

  cli2:
    image: redis:4.0-alpine
    restart: "no"
    command: redis-cli -h db2
    links:
      - db2

  cli3:
    image: redis:4.0-alpine
    restart: "no"
    command: redis-cli -h db3
    links:
      - db3
  • Create directories data1 and data2 to hold the disk persistence data. We'll leave db3 as in-memory only.
$ mkdir data1
$ mkdir data2
  • Run docker-compose up -d to start the cluster.

  • Run docker-compose run cli1 to connect a CLI to db1. Note: It helps to open the CLIs in separate terminals.

  • Run the command MVREG SET foo 1.

  • Run docker-compose run cli2 to connect a CLI to db2.

  • Run the command MVREG GET foo, which should return 1) "1".

  • Comment out all of the links and --seed-addrs for db2 in the yaml to separate the node from the cluster.

db2:
  image: jemc/jylis
  ports:
    - "6382:6379"
  volumes:
    - ./data2:/data
  command:
    - "--disk-dir=/data"
    - "--addr=db2:9999:db2"
  #   - "--seed-addrs=db1:9999:db1"
  # links:
  #   - db1
  #   - db3
  • Shut down the cluster with docker-compose down.

  • Bring the cluster back up with docker-compose up -d.

  • On cli1 run MVREG GET foo. The result is 1) "1".

  • On cli2 run MVREG GET foo. The result is (empty list or set), but should be "1".

  • The data written to db1 was not persisted to db2's disk.

  • On cli2 run MVREG SET foo 2.

  • Uncomment the yaml for db2 so that the node will rejoin the cluster.

  • Shut down the cluster with docker-compose down.

  • Bring the cluster back up with docker-compose up -d.

  • On cli2 run MVREG GET foo. This will return 1) "2" 2) "1".

  • On cli2 run MVREG SET foo 3.

  • On cli1 run MVREG GET foo. This will return "3".

  • Current state: Value "1" is persisted to disk on db1 and "3" is on disk on db2. They both have a value of "3" in memory.

  • Comment out the links and --seed-addrs for db2 again.

  • Shut down the cluster with docker-compose down.

  • Bring the cluster back up with docker-compose up -d.

  • On cli1 run MVREG GET foo. This will return 1) "1".

  • On cli2 run MVREG GET foo. This will return 1) "3".

  • This shows that the values synced in the replication process have not been persisted to disk.

Add equivalent of FLUSHALL command

Redis has a FLUSHALL command that deletes all keys from all internal keyspaces.

Adding a similar feature to Jylis has been requested by @solnic.

Outstanding questions:

  • How should this behave with disk persistence enabled? Should it also clear the disk files? If not, then the data could get revived from disk in the future.
  • How should this behave when clustered? Should it propagate the deletions to all other instances in the cluster? If not, then I'd expect other cluster instances to eventually revive the old data.
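
For reference, this is how the command behaves from a Redis client today; whether and how Jylis would mirror it is exactly what the questions above need to settle. A hypothetical call using a connected Hiredis conn as in the earlier sketch:

# FLUSHALL is a real Redis command; its availability in Jylis is hypothetical.
conn.write(["FLUSHALL"])
reply = conn.read  # Redis replies "OK"; Jylis' disk and cluster semantics are undecided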

Feature: TLOG cursor

I have been using a TLOG to store time-series data for a small project to evaluate some technology. I am retrieving the data from Jylis and displaying it on a graph:

(screenshot of the graph)

What I've noticed is that the query time is roughly 100ms per 10,000 records. Considering I now have close to 90,000 data points in the TLOG, the response time is about 900ms. This is causing a noticeable delay when loading the graph. If I only return the first 100 records, the response time is about 10ms.

Based on this, I'm wondering whether implementing a cursor in Jylis for the TLOG data type makes sense. That way I could fetch the first 10k data points with a reasonable response time. Then if the user decides to scroll the graph, I can fetch the next 10k, and so on.

An alternative I've thought about, if a cursor isn't an appropriate feature for Jylis, is sharding the TLOG on the application side. For example, I could create a new TLOG for each device for each day, and then put logic in my application to fetch from the appropriate TLOG based on the section of data the user is viewing. A sketch follows.
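
Here is a minimal sketch of that sharding scheme, assuming one TLOG per device per day; the key format and device name are made up for illustration.

require "hiredis"
require "date"

conn = Hiredis::Connection.new
conn.connect("localhost", 6379)

# Hypothetical key scheme: one TLOG per device per day.
def tlog_key(device, date)
  "temperature:#{device}:#{date.iso8601}"
end

# Fetch one day's worth of readings; the app calls this again with earlier
# dates as the user scrolls back through the graph.
def fetch_day(conn, device, date)
  conn.write(["TLOG", "GET", tlog_key(device, date)])
  conn.read
end

today     = fetch_day(conn, "sensor-1", Date.today)
yesterday = fetch_day(conn, "sensor-1", Date.today - 1)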

Thoughts?

Read persistent data in reverse order

When using disk persistence, after restarting the database and refreshing the UI before the logs had finished being read (it takes a couple of minutes), I noticed the graph displayed the oldest values but was missing the most recent:

(screenshot of the graph)

Of course, this makes sense logically, because the log is read from oldest to newest. However, from a UX standpoint (at least in most scenarios I can think of), the most recent value is the most relevant, and older values become less relevant the further back in time you go. In the graph above, the measurements for the current day are not present, but data from a few days ago is. I think the same holds true for other data types like MVREG: the last known value is what's expected while the logs are being read. By the nature of a database that favors availability, it's acceptable for values to bounce around while the log is being read, but in practice my concern is that this may be unexpected and make the application appear "buggy".

I'm not sure how difficult this would be to implement, so at this point I just wanted to point it out and see what you think. I don't think it's an urgent priority. I think the issue could also be avoided for now by having a cluster and performing a rolling restart so that the latest values are synced from the other nodes.
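
To illustrate only the proposed replay order (Jylis' on-disk format and replay code are internal; the file name and replay hook below are made up), newest-first replay would surface the most relevant values immediately:

# Hypothetical stand-ins: "data/journal.log" and replay_entry illustrate the
# proposed newest-first order, not Jylis internals.
def replay_entry(entry)
  puts entry  # a real implementation would apply the logged operation here
end

# Reversing in memory is fine for a sketch; a real implementation would read
# the file backwards in chunks to avoid loading the whole log at once.
File.foreach("data/journal.log").to_a.reverse_each do |entry|
  replay_entry(entry)
end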
