Message ordering (cabal-core, 10 comments, open)

cabal-club commented on May 27, 2024
Message ordering

from cabal-core.

Comments (10)

hackergrrl commented on May 27, 2024

extra 4mb in the case that a group gets to 1000k+ people

That figure was for 1k users @ 10k messages. So even with a modest number of users, the cabal's size will grow much faster than it would with a timestamping method that is independent of the number of participants. Since this is a db change and not just a protocol change, it will also be harder to modify in the future.

@okdistribute have you noticed this problem with clock drift happening much in cabal?

Another solution could be for a node to use cabal peers as NTP servers. A node could look at the reported system time of its peers and decide whether it wants to offset its own clock to match theirs. It wouldn't be a trusted clock, but it might be good enough for a medium-trust environment.
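A minimal sketch of the "peers as NTP servers" idea, assuming each peer reports its system time when you connect and ignoring network delay entirely. `PeerTimeReport`, `peerClockAdjustment` and the 60-second `maxAdjustMs` cap are hypothetical names and values. Taking the median of the reported offsets rather than the mean means a single peer with a wildly wrong clock can't drag yours far, which fits the medium-trust setting:

```typescript
// Hypothetical sketch: nudge our clock toward the median of peers' reported times.
// Assumes network delay is negligible compared to the drift we care about.

interface PeerTimeReport {
  peerKey: string;    // hypothetical peer identifier
  reportedMs: number; // peer's system time, ms since the epoch
  localMs: number;    // our system time when the report arrived
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Offset (ms) to add to our own clock, or 0 if no peer has reported yet.
// Clamped to +/- maxAdjustMs so peers can't move our clock arbitrarily far.
function peerClockAdjustment(reports: PeerTimeReport[], maxAdjustMs = 60_000): number {
  if (reports.length === 0) return 0;
  const offsets = reports.map((r) => r.reportedMs - r.localMs);
  const m = median(offsets);
  return Math.max(-maxAdjustMs, Math.min(maxAdjustMs, m));
}
```

With offsets of +2 s, +2.5 s and +15 min from three peers, the median picks +2.5 s and the outlier has no effect.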

Another trick, which I learned from writing realtime multiplayer games, is to figure out your peers' clock offsets and apply them locally. The idea is that when you connect to a peer, you each share your system time with the other. You can use this to figure out the offset of that peer's clock relative to yours, and apply it to the messages you receive from them. This is cool because it doesn't matter what time anyone's clock is set to: each peer has a subjective, relative view! The downside is that you'd only know offsets for peers you can directly connect to. There might be some kind of scheme where you receive transitive information about other peers' reported offsets, but I haven't thought that through, and there are always security challenges when you're trusting other peers' subjective reports.
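The per-peer offset can be estimated with the four-timestamp exchange that NTP uses (RFC 5905), which also cancels out network delay as long as it is roughly symmetric in both directions. A sketch under that assumption; the type and function names, and the fallback to a zero offset for peers you've never connected to directly, are hypothetical:

```typescript
// Four timestamps from one request/response exchange with a peer, NTP-style:
//   t0: we send the request (our clock)
//   t1: the peer receives it (peer's clock)
//   t2: the peer sends its reply (peer's clock)
//   t3: we receive the reply (our clock)
interface ClockSample { t0: number; t1: number; t2: number; t3: number; }

// Estimated offset of the peer's clock relative to ours (peer ≈ ours + offset),
// assuming the one-way delay is the same in both directions.
function estimateOffset(s: ClockSample): number {
  return ((s.t1 - s.t0) + (s.t2 - s.t3)) / 2;
}

// Round-trip network delay, excluding the peer's processing time.
function roundTripDelay(s: ClockSample): number {
  return (s.t3 - s.t0) - (s.t2 - s.t1);
}

// Offsets are only known for peers we've exchanged samples with directly.
const peerOffsets = new Map<string, number>();

function recordSample(peerKey: string, s: ClockSample): void {
  peerOffsets.set(peerKey, estimateOffset(s));
}

// Translate a timestamp written by a peer into our own clock.
// Unknown peers fall back to their raw timestamp (offset assumed to be 0).
function toLocalTime(peerKey: string, peerTimestamp: number): number {
  const offset = peerOffsets.get(peerKey) ?? 0;
  return peerTimestamp - offset;
}
```

For example, if a peer's clock runs 5 s ahead and each direction takes 100 ms (t0 = 1000, t1 = 6100, t2 = 6150, t3 = 1250), the offset comes out at exactly 5000 ms, and a message that peer stamps 9000 is placed at 4000 on your own clock.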

okdistribute commented on May 27, 2024

@noffle like I said, I haven't noticed it because I'm always online. It's more of an it's-not-a-problem-until-it-is thing: you can leave it as is and probably nothing bad will happen, until someone's messages get sorted at the beginning of time and no one sees them. But then, probably no one would notice.

To note, that's 4mb if you download the entire history of the cabal, which hopefully wouldn't have to be the case if we have an easy way to request the last X messages (which may be easier if they're sorted by a number!)
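Why a single numeric sort key helps with "the last X messages": once messages are kept sorted by one number, the most recent history is simply the tail of the index, and "everything after time t" can be found by binary search instead of walking the whole log. A minimal in-memory sketch; `Msg`, `firstAfter`, `lastN` and `since` are hypothetical names, and a real cabal index would live on disk rather than in an array:

```typescript
interface Msg { timestamp: number; text: string; }

// Index of the first message with timestamp > t, by binary search over
// a list already sorted by timestamp (ascending).
function firstAfter(sorted: Msg[], t: number): number {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid].timestamp <= t) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// The n most recent messages: just the tail of the sorted list.
function lastN(sorted: Msg[], n: number): Msg[] {
  return sorted.slice(Math.max(0, sorted.length - n));
}

// Everything newer than t, e.g. to catch up after being offline.
function since(sorted: Msg[], t: number): Msg[] {
  return sorted.slice(firstAfter(sorted, t));
}
```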

okdistribute commented on May 27, 2024

@noffle revisiting this, as some folks today noticed their clocks were a couple of seconds off from each other while chatting. XD

using cabal peers as NTP servers is an interesting hack for sure!

hackergrrl commented on May 27, 2024

@okdistribute I agree that clock drift is problematic. IRC "solves" this by being realtime and not storing messages, which we don't have the luxury of.

This would mean including a vector clock timestamp in each message sent, right? The clock will grow linearly with N, the number of participants in the cabal. I'm concerned about this. With 1000 users (over the cabal's lifetime) and 4 bytes per clock entry (a 32-bit unsigned int), every cabal message has an additional overhead of ~4kb. A cabal with 10k messages will then contain up to 4mb of vector clock data.
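To make the size argument concrete: a dense vector clock carries one counter per participant, so its serialized size is 4 × N bytes on every message. The `VectorClock` class below is an illustrative sketch, not cabal-core code. Worked through with the figures above, 1,000 × 4 bytes is 4,000 bytes per message, and 10,000 such messages come to 40,000,000 bytes, i.e. about 40 MB rather than the 4mb quoted, which makes the growth concern stronger still:

```typescript
// A dense vector clock: one 32-bit counter per participant the cabal has ever had.
class VectorClock {
  readonly counters: Uint32Array;

  constructor(participants: number) {
    this.counters = new Uint32Array(participants);
  }

  // A local event (e.g. posting a message) by participant i.
  tick(i: number): void {
    this.counters[i] += 1;
  }

  // On receiving a message: take the element-wise max with the sender's clock.
  merge(other: VectorClock): void {
    for (let i = 0; i < this.counters.length; i++) {
      this.counters[i] = Math.max(this.counters[i], other.counters[i]);
    }
  }

  // Bytes this clock adds to every message if serialized as-is.
  get sizeBytes(): number {
    return this.counters.byteLength;
  }
}

const participants = 1_000;
const messages = 10_000;
const perMessage = new VectorClock(participants).sizeBytes; // 4,000 bytes (~4 kB)
const total = perMessage * messages;                        // 40,000,000 bytes (~40 MB)
console.log(perMessage, total); // 4000 40000000
```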

Another approach, which hyperlog used (and we use in Mapeo), is to include an array of links to the message IDs that are considered to be at the current head of the message graph. This tends to converge toward one link per message. (I can expand on this if it doesn't make sense to some folx, but @okdistribute I know you're already familiar with this data structure.) The benefit is that this won't grow with the number of participants, but with the number of forks at any given time. A message in cabal can be referenced by a public key and a sequence number, which comes to ~36 bytes of average overhead per message.
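A minimal sketch of the head-links structure, assuming message references are handled in memory as `key@seq` strings. `HeadTracker`, `MsgRef` and `LinkedMsg` are hypothetical names, not hyperlog's or cabal-core's API. Each new message links to every current head, and indexing a message removes whatever it links to from the head set:

```typescript
// A reference to a message: the author's public key plus their feed sequence number.
// A 32-byte key plus a 4-byte seq is where the ~36 bytes above comes from.
interface MsgRef { key: string; seq: number; }

// A message that links to whatever its author saw as the heads of the graph.
interface LinkedMsg { ref: MsgRef; links: MsgRef[]; text: string; }

const refId = (r: MsgRef): string => `${r.key}@${r.seq}`;

class HeadTracker {
  // Messages that no indexed message links to yet.
  private heads = new Map<string, MsgRef>();
  // Everything some indexed message links to (handles out-of-order arrival).
  private linked = new Set<string>();

  // Index a message from any author.
  add(msg: LinkedMsg): void {
    for (const l of msg.links) {
      this.linked.add(refId(l));
      this.heads.delete(refId(l));
    }
    const id = refId(msg.ref);
    if (!this.linked.has(id)) this.heads.set(id, msg.ref);
  }

  // Create our next message, linking to every current head, and index it.
  compose(ref: MsgRef, text: string): LinkedMsg {
    const msg: LinkedMsg = { ref, links: this.currentHeads(), text };
    this.add(msg);
    return msg;
  }

  currentHeads(): MsgRef[] {
    return Array.from(this.heads.values());
  }
}
```

If two peers post at the same time, both messages link to the same head and the graph forks; the next message from either side links to both, and after that messages go back to a single link, which is the one-link-per-message convergence described above.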

okdistribute commented on May 27, 2024

OK thanks @noffle, that makes sense. Does that cause performance tradeoffs for indexing or displaying the most recent messages in order? Just curious, as disk space and performance usually trade off against each other.

cinnamon-bun commented on May 27, 2024

A simple approach to experiment with is using a single monotonically increasing clock for all users. When you post a message, give it the current time or the highest time you've seen in any other message (plus one), whichever is higher.

Sorting messages by this value puts them into one of the possible causal orderings.
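The rule above is essentially a Lamport clock seeded with wall-clock time. A minimal sketch, assuming an injectable `now()` function so the behavior is easy to check; `MaxSeenClock`, `StampedMsg` and `sortMessages` are hypothetical names. Ties on equal times are broken by author key so that every peer settles on the same total order:

```typescript
interface StampedMsg { author: string; time: number; text: string; }

class MaxSeenClock {
  // Highest time seen so far in any message, ours or a peer's.
  private maxSeen = 0;

  constructor(private now: () => number = Date.now) {}

  // Stamp for a new message: current time or highest seen + 1, whichever is higher.
  next(): number {
    const t = Math.max(this.now(), this.maxSeen + 1);
    this.maxSeen = t;
    return t;
  }

  // Call for every message received from a peer.
  observe(time: number): void {
    this.maxSeen = Math.max(this.maxSeen, time);
  }
}

// Sort by time, breaking ties by author so all peers agree on one order.
function sortMessages(msgs: StampedMsg[]): StampedMsg[] {
  return [...msgs].sort(
    (a, b) => a.time - b.time || (a.author < b.author ? -1 : a.author > b.author ? 1 : 0)
  );
}
```

After observing a message stamped 500 while the local clock reads 100, the next two local stamps are 501 and 502, so a reply always sorts after what it replies to even if the replier's clock is behind.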

However, a bad actor can post a very high timestamp and cause everyone trouble with integer overflow. Maybe Cabal users trust each other enough to not do that? Or maybe it's possible to ignore messages with times too far in the future.
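One way to try the "ignore messages with times too far in the future" idea: reject any timestamp that isn't a safe integer or that lies more than some window ahead of your own clock. The 10-minute window and the `acceptTimestamp` name are arbitrary assumptions for illustration. The trade-off is that a message from a peer whose clock is genuinely far ahead gets dropped (or would have to be held back until local time catches up) instead of pushing everyone's max-seen clock toward overflow:

```typescript
// Hypothetical policy: accept timestamps at most 10 minutes ahead of our clock.
const MAX_FUTURE_SKEW_MS = 10 * 60 * 1000;

// True if a message's timestamp is plausible enough to feed into the clock.
// Number.isSafeInteger rejects NaN, fractions, and values beyond 2^53 - 1.
function acceptTimestamp(
  time: number,
  nowMs: number = Date.now(),
  maxSkewMs: number = MAX_FUTURE_SKEW_MS
): boolean {
  return Number.isSafeInteger(time) && time >= 0 && time <= nowMs + maxSkewMs;
}
```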

This does not give the same richness of causal information as the backlinks approach described by @noffle. That extra info might enable a better user experience when old messages arrive?

cinnamon-bun commented on May 27, 2024

Oops, I think I reinvented Lamport clocks. 😬 I guess it's time for me to read some theory...

hackergrrl commented on May 27, 2024

cblgh commented on May 27, 2024

when i was in berlin, peg linked me a paper on bloom clocks. i haven't done any deep reading of the paper, so i can't really say anything about tradeoffs, but i imagine it solves the space-efficiency problem of pure vector clocks. here's also a really nice tutorial on bloom filters peg had lying around

paper abstract

The bloom clock is a space-efficient, probabilistic data structure designed to determine the partial order of events in highly distributed systems. The bloom clock, like the vector clock, can autonomously detect causality violations by comparing its logical timestamps. Unlike the vector clock, the space complexity of the bloom clock does not depend on the number of nodes in a system. Instead it depends on a set of chosen parameters that determine its confidence interval, i.e. false positive rate. To reduce the space complexity from which the vector clock suffers, the bloom clock uses a 'moving window' in which the partial order of events can be inferred with high confidence. If two clocks are not comparable, the bloom clock can always deduce it, i.e. false negatives are not possible. If two clocks are comparable, the bloom clock can calculate the confidence of that statement, i.e. it can compute the false positive rate between comparable pairs of clocks. By choosing an acceptable threshold for the false positive rate, the bloom clock can properly compare the order of its timestamps, with that of other nodes in a highly accurate and space efficient way.

from wikipedia:

In 2019, Lum Ramabaja developed Bloom Clocks, [7] a probabilistic data structure whose space complexity does not depend on the number of nodes in a system. If two clocks are not comparable, the bloom clock can always deduce it, i.e. false negatives are not possible. If two clocks are comparable, the bloom clock can calculate the confidence of that statement, i.e. it can compute the false positive rate between comparable pairs of clocks.
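To make the abstract's claims concrete, a toy bloom clock sketch, assuming m = 64 counters, k = 3 hash functions and a seeded 32-bit FNV-1a hash (all arbitrary choices, not taken from the paper). Like a vector clock it merges by element-wise max, but its size stays fixed at m counters no matter how many people are in the cabal. The comparison shows where false positives come from: "every cell ≤" can hold by coincidence, while a single larger cell rules out happened-before with certainty. The paper's false-positive-rate estimate is not reproduced here:

```typescript
// 32-bit FNV-1a, with the seed mixed into the offset basis to get k different hashes.
function fnv1a(input: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class BloomClock {
  readonly cells: Uint32Array;

  constructor(readonly m = 64, readonly k = 3) {
    this.cells = new Uint32Array(m);
  }

  // Record an event (e.g. a message id) by incrementing k hashed cells.
  addEvent(eventId: string): void {
    for (let i = 0; i < this.k; i++) this.cells[fnv1a(eventId, i) % this.m] += 1;
  }

  // On receiving another clock: element-wise max, as with a vector clock.
  merge(other: BloomClock): void {
    for (let i = 0; i < this.m; i++) this.cells[i] = Math.max(this.cells[i], other.cells[i]);
  }

  // True if every cell is <= the other's. "true" may be a false positive;
  // "false" is certain: this clock did not happen before the other.
  possiblyBefore(other: BloomClock): boolean {
    for (let i = 0; i < this.m; i++) if (this.cells[i] > other.cells[i]) return false;
    return true;
  }

  clone(): BloomClock {
    const c = new BloomClock(this.m, this.k);
    c.cells.set(this.cells);
    return c;
  }
}

// Neither clock precedes the other: the events are concurrent, with no false negatives.
function definitelyConcurrent(a: BloomClock, b: BloomClock): boolean {
  return !a.possiblyBefore(b) && !b.possiblyBefore(a);
}
```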

that said i've never heard of bloom filters being used productively lol

okdistribute commented on May 27, 2024

This is all useful information, and the indexing/sorting problem makes me think it might be worth the extra 4mb in the case that a group gets to 1000k+ people, to reduce sorting time on messages.

Implementing a bloom clock/filter could be cool but seems like more work
