jupyter-server / pycrdt

28.0 6.0 6.0 768 KB

CRDTs based on Yrs.

Home Page: https://jupyter-server.github.io/pycrdt

License: MIT License

Python 58.64% Rust 41.36%

crdt yjs

pycrdt's People

Contributors

Stargazers

Watchers

Forkers

patrick91 alicevik22 davidbrochart zsailer datalayer-externals jbdyn

pycrdt's Issues

Text data type does not delete as expected

Description

The Text data type does not delete characters when a slice's start index is equal to the length of the given slice.

Reproduce

from pycrdt import Doc, Text
doc = Doc()
doc["text"] = text = Text()
text += "test"
print(text)   # prints 'test'
del text[2:4] # slice length = 4 - 2 = 2 == start, alternatively `del text[2:]`
print(text)   # prints 'test' again, but should print 'te'
del text[1:4] # slice length = 4 - 1 = 3 != start, alternatively `del text[1:]`
print(text)   # prints 't' as expected
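
For comparison, the expected slice semantics can be sketched in pure Python. This is only an illustration of how a slice might be normalized into a start index and a count, following what `slice.indices` does for built-in sequences; the helper name `slice_to_start_and_count` is hypothetical and this is not pycrdt's implementation.

```python
def slice_to_start_and_count(key: slice, length: int) -> tuple[int, int]:
    """Normalize a slice into (start, count) for a sequence of `length` items."""
    start, stop, step = key.indices(length)
    if step != 1:
        raise ValueError("only contiguous slices (step 1) are supported")
    # The count is derived from stop - start and never compared to start itself.
    return start, max(0, stop - start)


# A plain str stands in for the Text contents.
text = "test"
start, count = slice_to_start_and_count(slice(2, 4), len(text))
result = text[:start] + text[start + count:]
print(result)
```

With this normalization, `del text[2:4]` and `del text[2:]` both map to the same start and count, so the case where the count happens to equal the start index needs no special handling.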

Expose functionality to manage Y.Doc updates without instantiating Y.Doc (diffUpdate, mergeUpdates, encodeStateVectorFromUpdate)

Problem

I am working on a platform that has centralized storage for the YDoc. Since the centralized server does not need to know anything about the YDoc, there is really no point in instantiating a YDoc object, and I could get away with using only the encodeStateVectorFromUpdate, diffUpdate, and mergeUpdates methods (https://docs.yjs.dev/api/document-updates#example-syncing-clients-without-loading-the-y.doc).

Currently, neither https://github.com/y-crdt/ypy nor this project exposes those methods. I think it would be rather handy if these basic buffer manipulation methods were exposed, so that users could simply call them.

Proposed Solution

Expose the update API (https://github.com/yjs/yjs?tab=readme-ov-file#update-api) in both the v1 and v2 formats, with functions that work directly on buffers: mergeUpdates, encodeStateVectorFromUpdate, diffUpdate, convertUpdateFormatV1ToV2, convertUpdateFormatV2ToV1, mergeUpdatesV2, encodeStateVectorFromUpdateV2, and diffUpdateV2.

Additional context

I have been following the "separation" between pycrdt and ypy and I do think that this issue may be more applicable to https://github.com/y-crdt/ypy than this project. Then again, ypy is somewhat unmaintained and I do wonder if this request indeed falls outside of the future of this project.

Better syntax for accessing existing shared types

Problem

Currently, one must bind an empty shared type to Ydoc keys before accessing their values:

    ydoc["cells"] = Array()
    assert ydoc["cells"].to_py() == [{"metadata": {"foo": "bar"}, "source": "1 + 2"}]
    #      ^- not equal to the empty Array() assigned to this key immediately before,
    #         but rather the value coming from another provider

There are two drawbacks to only supporting this way of accessing a shared type:

This requires 1 additional line of code per shared type for the assignment statement, and can get verbose if one is using many. However, in Yjs, ydoc.getArray(...) can be used inline.

The inline assignment operator := (available in Python 3.8+) does not work either:

/Users/dlq/micromamba/envs/rtcdev/lib/python3.11/ast.py:50: in parse
    return compile(source, filename, mode, flags,
E     File "/Volumes/workplace/pycrdt-websocket/tests/test_pycrdt_yjs.py", line 92
E       assert (ydoc["cells"] := Array()).to_py() == [{"metadata": {"foo": "bar"}, "source": "1 + 2"}]
E               ^^^^^^^^^^^^^
E   SyntaxError: cannot use assignment expressions with subscript

From @davidbrochart:

BTW you can write doc["my_array"] = my_array = Array() if you want a one-liner.

The syntax requires assigning an empty shared type in order to access an existing, non-empty shared type. This is stateful and confusing, because the value of ydoc["cells"] is not the value of ydoc["cells"] that was just set in the immediately preceding line. This generally violates how most programming languages work. I understand that this is permitted by Python, but the public API should not rely on exotic behavior exclusive to Python.
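
As a stopgap for the first drawback, the chained-assignment one-liner can be wrapped in a small helper so that the binding can be used inline without `:=`. This is a minimal sketch, not a pycrdt API: the name `bind` is hypothetical, and a plain dict stands in for the document.

```python
from typing import Any, MutableMapping, TypeVar

T = TypeVar("T")


def bind(doc: MutableMapping[str, Any], key: str, shared: T) -> T:
    """Assign `shared` to `doc[key]` and return it, so the call can be used inline."""
    doc[key] = shared
    return shared


# A plain dict stands in for the document here. With pycrdt one would pass a Doc
# and a shared type such as Array() (assumption: the Doc accepts item assignment,
# as in the snippet above).
fake_doc: dict = {}
cells = bind(fake_doc, "cells", [])
```

This only removes the extra line; it does not address the second drawback, since an empty shared type still has to be passed in.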

Proposed Solution

TBD.

Additional context

Example and original discussion sourced from jupyter-server/pycrdt-websocket#11

Allow nested transactions

Currently, nested transactions are not allowed because that would lead to multiple TransactionMut on a document:

def foo(doc):
    with doc.transaction():
        text = doc.get_text("text")
        text += ", World!"

with doc.transaction():
    text = doc.get_text("text")
    text += "Hello"
    foo(doc)  # will fail

This hurts modularity, for instance if we wanted foo to be used independently. Now foo has to check if there is already a transaction on the document:

def foo(doc, txn=None):
    if txn is None:
        with doc.transaction():
            text = doc.get_text("text")
            text += ", World!"
    else:
        text = doc.get_text("text")
        text += ", World!"

with doc.transaction() as txn:
    text = doc.get_text("text")
    text += "Hello"
    foo(doc, txn)

See an example of such a workaround in jupyter-ydoc. This is not only more complicated, but this doesn't even do what is expected: the changes in foo are "merged" into the parent transaction, which might not be desirable because we wanted them to be grouped into their own transaction.
I think that for nested transactions to work, the context manager should only create the transaction at exit, and make the changes then. This means that every change made in the context manager should be registered first.
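
The idea of registering changes first and applying them at exit can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyDoc`, `DeferredTransaction`, and `append` are hypothetical names, a Python string stands in for the shared text, and no real `TransactionMut` is involved.

```python
from typing import Callable, List


class ToyDoc:
    """Stand-in for a document: holds a string and a log of committed groups."""

    def __init__(self) -> None:
        self.text = ""
        self.commits: List[int] = []  # number of changes in each committed group
        self._stack: List["DeferredTransaction"] = []

    def transaction(self) -> "DeferredTransaction":
        return DeferredTransaction(self)

    def append(self, chunk: str) -> None:
        """Register a change with the innermost open transaction."""
        if not self._stack:
            raise RuntimeError("no open transaction")
        self._stack[-1].changes.append(lambda: self._apply(chunk))

    def _apply(self, chunk: str) -> None:
        self.text += chunk


class DeferredTransaction:
    def __init__(self, doc: ToyDoc) -> None:
        self.doc = doc
        self.changes: List[Callable[[], None]] = []

    def __enter__(self) -> "DeferredTransaction":
        self.doc._stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.doc._stack.pop()
        if exc_type is None:
            # Only at exit would a real transaction be created and the changes applied.
            for change in self.changes:
                change()
            self.doc.commits.append(len(self.changes))


def foo(doc: ToyDoc) -> None:
    with doc.transaction():
        doc.append(", World!")


doc = ToyDoc()
with doc.transaction():
    doc.append("Hello")
    foo(doc)  # no longer fails: each context commits its own group at exit
```

In this toy model the two contexts produce two separate groups, but the inner group is committed before the outer one, so changes are applied in exit order rather than in the order they were written. A real design would have to decide how to handle that.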

Python 3.12 wheels for Windows

FYI, the latest release is missing these for some reason.

Provide more context for this new library and its relation with `ypy` in the documentation

For the sake of transparency and for folks landing on the documentation page directly, it could be useful to know more about the history behind pycrdt and how it compares to ypy.

Maybe summarizing the discussion from jupyter-server/team-compass#55 (or linking to the issue) would be enough.

y-websocket provider

Coming from this issue: y-crdt/ypy#154

I can't find how to synchronize Map changes between different Python clients using WebSocket, with awareness.
Can someone give me an example?

New strategy for document validation

Problem

The current method for validating documents is to keep a copy of the document, apply changes to the copy, and, if the copy is still a valid document, apply the changes to the original document. This is expensive because documents and operations are duplicated.

Proposed Solution

If updates are stored e.g. in a YStore, a better solution could be to always apply changes to the original document, and if it fails validation, create a new document from the stored updates (and not store the last update).
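
The proposed strategy can be sketched with a toy model in which a dict stands in for the document and a list stands in for the YStore. The names `apply_update`, `rebuild`, and `apply_validated` are hypothetical and do not correspond to pycrdt or YStore APIs.

```python
from typing import Callable, Dict, List

Update = Dict[str, int]
ToyDoc = Dict[str, int]


def apply_update(doc: ToyDoc, update: Update) -> None:
    """Apply an update to the document in place."""
    doc.update(update)


def rebuild(store: List[Update]) -> ToyDoc:
    """Create a fresh document from the stored (previously validated) updates."""
    doc: ToyDoc = {}
    for update in store:
        apply_update(doc, update)
    return doc


def apply_validated(
    doc: ToyDoc,
    store: List[Update],
    update: Update,
    is_valid: Callable[[ToyDoc], bool],
) -> ToyDoc:
    """Apply `update` to the original; on failure, rebuild from the store."""
    apply_update(doc, update)
    if is_valid(doc):
        store.append(update)  # only valid updates are stored
        return doc
    return rebuild(store)  # the failing update was never stored


def non_negative(doc: ToyDoc) -> bool:
    return all(value >= 0 for value in doc.values())


store: List[Update] = []
doc: ToyDoc = {}
doc = apply_validated(doc, store, {"a": 1}, non_negative)
doc = apply_validated(doc, store, {"b": -5}, non_negative)  # rejected, doc rebuilt
```

The common, valid path touches only the original document; the cost of rebuilding is paid only when validation fails.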
