chat-wane / lseqtree


A data structure for distributed arrays using the LSeq allocation strategy

Home Page: http://chat-wane.github.io/LSEQTree

License: MIT License

JavaScript 99.91% Shell 0.09%

lseqtree's People

jesuspatate andres-root wooandoo sergeeeek aral sfescape godofwatermelon nhaouari nybblr davidwu226

lseqtree's Issues

LSEQ + Tombstones

( Follow up of the discussion from issue #10 )

LSEQ, Logoot, and Treedoc do not require tombstones (although the latter is hybrid). They need some sort of causality tracking mechanism that tracks the special case of "insert an element and delete this element", which does not commute. They also need to know whether a particular element is absent from the data structure because it has not arrived yet or because it has been deleted (to avoid insert(e); delete(e); insert(e)).

So you have Network -> Causality Tracking -> CRDT (-> Editor )

You could use tombstones, but their space complexity is poor and they cannot be safely garbage collected. Causality tracking mechanisms (such as version vectors, interval version vectors, or version vectors with exceptions) allow the tombstones to be compressed into their structure. Their complexity depends on the number of team members actively collaborating rather than on the number of insertions in the structure.

The fundamental difference between CRDTs with and without tombstones is that the former's identifiers do not require deleted elements to be totally ordered while the latter's identifiers do.
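To make the Network -> Causality Tracking -> CRDT pipeline above a little more concrete, here is a minimal sketch of the tracking layer. It assumes every element carries a unique (site, counter) pair. The class name, the callbacks, and the id shape are all hypothetical and not part of lseqtree. For brevity it uses plain sets, which grow with the number of operations; a version vector with exceptions would store the same information more compactly.

```javascript
// Hypothetical sketch, not lseqtree's API: a layer between the network and the CRDT.
class CausalityTracker {
    constructor(deliverInsert, deliverRemove) {
        this.deliverInsert = deliverInsert; // called as deliverInsert(element, id)
        this.deliverRemove = deliverRemove; // called as deliverRemove(id)
        this.inserted = new Set();          // ids whose insert has been received
        this.removed = new Set();           // ids whose remove has been received
    }

    // Assumed id shape: { site, counter }.
    static key(id) {
        return id.site + ':' + id.counter;
    }

    onInsert(element, id) {
        const k = CausalityTracker.key(id);
        if (this.inserted.has(k)) { return; } // duplicate, e.g. insert; delete; insert
        this.inserted.add(k);
        if (this.removed.has(k)) { return; }  // the remove arrived first: drop the insert
        this.deliverInsert(element, id);
    }

    onRemove(id) {
        const k = CausalityTracker.key(id);
        if (this.removed.has(k)) { return; }  // duplicate remove
        this.removed.add(k);
        if (this.inserted.has(k)) { this.deliverRemove(id); }
    }
}
```

With this layer in front of it, the CRDT itself never sees a remove for an element it does not hold, nor a second insert of an element that was already deleted.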

Fast access to the last index used

When inserting a new element within the sequence, the function get(index) is called twice to get the identifiers of the previous and the next elements.
The main target of this data structure is the collaborative editing of documents. In this case, the previously inserted elements are likely to be reused for the next identifier allocation.

For instance, let us consider the sequence "QWER". Inserting a character "T" at the end will access the identifier of "R" and the maximal bound. Inserting again at the end will access the identifier of the previously inserted "T" and, again, the maximal bound...
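One way to picture the idea is a one-entry cache placed in front of the lookup. The sketch below is an illustration only: LastAccessCache and its methods are hypothetical names, and `lookup` stands in for whatever function resolves an index to a node or identifier.

```javascript
// Hypothetical sketch: remembers the most recently touched (index, value) pair.
class LastAccessCache {
    constructor(lookup) {
        this.lookup = lookup;   // fallback, e.g. a wrapper around get(index)
        this.index = -1;        // -1 means "nothing cached"
        this.value = undefined;
    }

    get(index) {
        if (index === this.index) { return this.value; } // hit: no traversal needed
        return this.lookup(index);
    }

    // Call after a local insert with the index and value of the new element.
    remember(index, value) {
        this.index = index;
        this.value = value;
    }

    // Call whenever indexes may have shifted (removal, remote operation, ...).
    invalidate() {
        this.index = -1;
        this.value = undefined;
    }
}
```

In the "QWER" example, remembering "T" after the first insertion would let the second insertion at the end skip one of its two lookups. The cache has to be invalidated on any operation that can shift indexes, which is the part that needs care in a real implementation.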

Run code within browsers

Make the tree structure usable within browsers using browserify.

Factorize lseq.get(i) && lseq.get(i+1) in insert

Since lseq.get(i) && lseq.get(i+1) are neighbours, it may be possible to travel within the same path as long as possible instead of reprocessing everything from start.

Does this library use Tombstones?

I realize this project is over 2 years old, but I'm implementing an LSEQ tree style CRDT for a collaborative text editor I'm building for fun. I've been doing research on CRDTs, and through that research I came across this repo.

I understand that Logoot and LSEQ CRDTs don't use Tombstones. Or at least they don't need to. I also read #11 in which you state this library doesn't use Tombstones. However, while I was playing around in my terminal, I came across what I thought were tombstones.

This is a screen grab of the tree before removing the 'A' at position 0.

This is a screen grab after removing the 'A'. Notice the null node in its place.

Isn't that a tombstone? Or is my understanding of tombstones incorrect?

If that is a tombstone, would it be possible to implement an LSEQ tree without using them?

Couldn't you remove the node directly and replace it with the leftmost child, like in a BST? I realize this might cause the structures of the LSEQ tree to differ from user to user depending on when they receive the edits from the other users in the network.

ex.

However, if the elements/characters stay in the same order, would it matter if the structure changed?

I would love to hear your insight into the matter.

add toText() method

Is there a significant performance improvement to be had by optimizing for returning the full text of the stored tree?
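For reference, a single depth-first traversal is one way such a method could be written, as opposed to calling get(i) for every index. The sketch below assumes a node shape of `{ e, children }`, where `e` is the stored element (or null for an empty node) and `children` is already sorted; that shape and the function name are assumptions, and the virtual begin and end bounds are taken to hold no element.

```javascript
// Hypothetical sketch: collect every element in document order in one pass.
function toText(root) {
    const parts = [];
    const stack = [root];
    while (stack.length > 0) {
        const node = stack.pop();
        // A node's own element comes before the elements of its children.
        if (node.e !== null && node.e !== undefined) {
            parts.push(node.e);
        }
        const children = node.children || [];
        // Push in reverse so the leftmost child is visited first.
        for (let i = children.length - 1; i >= 0; --i) {
            stack.push(children[i]);
        }
    }
    return parts.join('');
}
```

The traversal visits each node once, so its cost is linear in the size of the tree, and the explicit stack avoids deep recursion on long limbs.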

Byzantine LSEQ-Node.add problem

This may be a "won't fix", but I thought about a Byzantine problem arising from accepting arbitrary adds. Someone who wants to corrupt your local state, or simply a bug, may send you an add request in which the subcounters or children of the added node are not singly linked. The add function expects singly linked nodes, so one solution may be to add a checkLimb function. (I use the term "limb" for a chain of singly linked lseq-nodes.)
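A checkLimb function along those lines could look like the sketch below. It assumes only that a node keeps its children in a `children` array; the `maxDepth` guard and its default value are hypothetical additions, and the subcounters are not validated here.

```javascript
// Hypothetical sketch: accept a node only if it is a singly linked chain.
function checkLimb(node, maxDepth = 64) {
    let current = node;
    let depth = 0;
    while (current !== null && current !== undefined) {
        if (depth > maxDepth) { return false; }          // suspiciously deep chain
        const children = current.children === undefined ? [] : current.children;
        if (!Array.isArray(children)) { return false; }  // malformed payload
        if (children.length > 1) { return false; }       // branching: not a limb
        current = children.length === 1 ? children[0] : null;
        depth += 1;
    }
    return true;
}
```

Calling it on every remote add before handing the node to the tree would reject branching or malformed payloads early.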

Delay conversion fromNodetoId

In lseqtree.js, inserting an element first gets the previous and next elements as nodes. The two nodes are then converted to IDs, a more compact representation used in the older package lseqarray. This representation is used to generate a new ID and send it.

We want to keep the possibility to parse and send IDs. However, it would be nice to delay the conversion until the latest possible point in the code, for instance just before sending them.

Optimize json output by adding toJSON()

You can reduce the size of the JSON output significantly just by dropping the subCounter field as well as empty children fields.

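toJSON() {
    // Return a plain object: JSON.stringify calls toJSON() itself, so calling
    // JSON.stringify(this) from inside toJSON() would recurse without end.
    const out = {}
    for (const key of Object.keys(this)) {
        const value = this[key]
        if (key === 'subCounter' || (key === 'children' && value.length === 0)) continue
        out[key] = value
    }
    return out
}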

You would need to add a check for children in fromJSON to either default children to [] or skip processing children if the field is undefined.
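The check mentioned above could be as small as the sketch below. The function name is hypothetical and the node is treated as a plain object whose fields are copied as-is; restoring subCounter, if it was dropped as well, is left out.

```javascript
// Hypothetical sketch: rebuild a node tree from parsed JSON, tolerating a
// missing children field by defaulting it to an empty array.
function reviveNode(data) {
    const node = Object.assign({}, data);
    const children = Array.isArray(data.children) ? data.children : [];
    node.children = children.map((child) => reviveNode(child));
    return node;
}
```

For example, `reviveNode(JSON.parse('{"e":"a"}'))` yields a node whose children field is an empty array rather than undefined.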

Space improvement of node

LSEQ uses an exponential tree. Thus the root has k children, each of these children has k+1 children and so on. Nevertheless, it requires 1 additional bit to encode the path at each level.

Currently, each path is encoded with the integer type. However, it does not require the 64 bits of the integer. Ideally, to get closer to the real value, it should be encoded using UINT8[].

Error: Cannot find module 'lodash.merge'

The runtime error mentioned in the title is thrown when LSEQTree is installed in a third-party project via npm (Node: v8.11.1, npm: v6.0.1)

Full error:

Error: Cannot find module 'lodash.merge'
    at Function.Module._resolveFilename (module.js:547:15)
    at Function.Module._load (module.js:474:25)
    at Module.require (module.js:596:17)
    at require (internal/module.js:11:18)
    at Object.<anonymous> (/Users/aral/indie/net/meta/spikes/crdt/lseqtree-test/node_modules/lseqtree/lib/lseqtree.js:3:15)
    at Module._compile (module.js:652:30)
    at Object.Module._extensions..js (module.js:663:10)
    at Module.load (module.js:565:32)
    at tryModuleLoad (module.js:505:12)
    at Function.Module._load (module.js:497:3)

The issue appears to be with the manner in which lodash merge is being imported.

The issue doesn’t manifest when running the tests on LSEQTree itself via npm test.

Fix: require('lodash.merge') → require('lodash/merge')

Improve time complexity of getIndex

Currently linear in the number of insertions and position-dependent, i.e., inserting at the end is less efficient than inserting at the beginning.

Inefficient LSEQNode.getIndex

Currently, when the insertion is performed repeatedly at the end of the sequence, most of processing time is consumed by the getIndex function.

Todo:

The current function aims to start from the closer bound of the array in the inspected LSEQNode (beginning or the end). But it should consider that all its children have an equal number of children themselves. First, fix this dumb mistake...
Then, investigate whether some structure could give quick access to elements in the tree, i.e., some kind of logarithmically growing reverse index.

A few "scary" questions...

I'm somewhat new to CRDTs, and I was hoping you could give me some insight into how to use this library.

Is the unique identifier used for .applyInsert the same as the one inside the couple created by the remote .insert? Or should it be generated in some other way?

var lseq = require("lseqtree")

var a = new lseq(0)
var b = new lseq(1)

var ei = a.insert('A', 0)
var index = b.applyInsert(ei._e, ei._i) // <-- Correct?

Am I correct in assuming the index returned by .applyInsert is incremented by 1 because the "Begin" virtual leaf is always at index 0? The above example sets index to 1, even though intuition would have 'A' be inserted at index 0.
How do I create a human-readable text document from the tree? In the other direction, is there a more efficient way to generate a tree from an existing document than just doing "Insert" for every character?

Thanks for your patience, looking forward to any advice you can provide.

Why the random strategy choice?

As your GH is linked in the paper which describes LSEQ and I see other questions here, I come here to ask another scary question: why is the allocation strategy random and isn't just alternating?

As I understand it, the goal of the random strategy is to have a document that can handle both many additions just after an atom (boundary–) and many additions just before an atom (boundary+). Alternating won't really affect performance (n % 2 is surely quicker than pseudo-randomly generating 0 or 1, but it isn't done many times), but I wanted to know if there was a reason that I missed.

My final goal is to implement LSEQ in Go. :)
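For readers comparing the two options, the sketch below shows the kind of allocation being discussed: picking a value between two integer bounds p < q at one level, either close to p (boundary+) or close to q (boundary–), with the strategy chosen either at random or by alternating on the depth. It follows a reading of the LSEQ paper rather than lseqtree's code; every function name and parameter here is an assumption, and `rng` is any function returning a float in [0, 1).

```javascript
// Hypothetical sketch: allocate a value strictly between p and q.
function allocate(p, q, boundary, usePlus, rng = Math.random) {
    const interval = q - p - 1;                  // free slots strictly between p and q
    if (interval < 1) { return null; }           // no room: go one level deeper
    const step = Math.min(boundary, interval);
    const offset = Math.floor(rng() * step) + 1; // integer in [1, step]
    return usePlus ? p + offset : q - offset;    // boundary+ hugs p, boundary- hugs q
}

// The two ways of choosing the strategy that the question contrasts.
function randomStrategy(rng = Math.random) {
    return rng() < 0.5;
}

function alternatingStrategy(depth) {
    return depth % 2 === 0;
}
```

Both choosers return true for boundary+ and false for boundary–, so either can be passed to allocate as `usePlus`.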

Mutable subCounters

If the add function on the LSEQ-Node travels down a limb towards an element, it currently increases all the subCounters even if it ends up not adding any more elements.
I would also like to talk to you about some confusion I have about how the LSEQ CRDT works. Is there any way I could contact you?

Performance comparison between lseqarray and lseqtree

Comparing performance between lseqarray and lseqtree raised this issue: lseqarray is far better on insertions at the end of the sequence. Obviously, lseqarray is very efficient in this case because it does not require any shifting in the underlying array (and very inefficient in the opposite insertion pattern, which requires N shifts). On the other hand, lseqtree uses a tree as its underlying model: each object in the first array has sub-arrays, and so on. Therefore, the shifts are bounded by the sub-array length. This implies better performance on insertions at the beginning and at random positions (making lseqtree safer to use). However, inserting at the end is less efficient. The problem is that, from an algorithmic point of view, the two should be even. As a consequence, it should be possible to improve the efficiency of the code considerably.

Specification of functions applyInsert and applyRemove

LSEQArray and LSEQTree must have the same specification in order to be fully compatible without additional effort. Nevertheless, the current LSEQTree does not return the same values from applyInsert and applyRemove, in particular when the element already appears in the tree.
