Comments (11)

aphillips commented on May 25, 2024

I don't want to add a circular reference here: I commented on url#199 to the effect that 'code unit' requires that the character encoding (probably UTF-16 if working in terms of the DOM) be specified. If we're working in terms of UTF-8 byte strings, code point order would be better. Otherwise sort won't do anything reasonable.

from infra.

annevk commented on May 25, 2024

I think we should settle this in a similar way to how we settle sizing of strings, as discussed in #74. And this would be easy to fix once #73 is done.

annevk commented on May 25, 2024

We should also fix byte sorting for whatwg/fetch#454.

annevk commented on May 25, 2024

This should probably do something like https://en.wikipedia.org/wiki/Comparison_sort, but not sure yet. See also advice from @domenic in #74.

annevk commented on May 25, 2024

I'm thinking it should be something like this:

To sort JavaScript strings, perform a comparison sort using the following three-way comparison operation, given A and B:

  • If A is B, then return 0.
  • If A is a prefix of B, then return -1.
  • If B is a prefix of A, then return 1.
  • For each pair of code units a and b at the same index in A and B, in order:
    • If a is b, then continue.
    • If a is less than b, then return -1.
    • Return 1.

This will get a little repetitive though to do for byte sequences, JavaScript strings, and strings.
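
The steps above can be sketched in TypeScript roughly as follows. This is only an illustration, assuming "code unit" means a UTF-16 code unit as exposed by `charCodeAt`; the name `compareCodeUnits` is hypothetical and does not come from any spec text.

```typescript
// Three-way comparison of two JavaScript strings by UTF-16 code unit.
// Returns -1, 0, or 1, mirroring the proposed steps.
function compareCodeUnits(a: string, b: string): -1 | 0 | 1 {
  if (a === b) return 0;
  // A is a (proper) prefix of B.
  if (a.length < b.length && b.slice(0, a.length) === a) return -1;
  // B is a (proper) prefix of A.
  if (b.length < a.length && a.slice(0, b.length) === b) return 1;
  // Neither string is a prefix of the other, so a differing code unit
  // exists before either string ends.
  for (let i = 0; i < a.length; i++) {
    const x = a.charCodeAt(i);
    const y = b.charCodeAt(i);
    if (x === y) continue;
    return x < y ? -1 : 1;
  }
  return 0; // not reachable after the checks above
}

// "\uD83D\uDE00" is U+1F600 written as its surrogate pair.
const sorted = ["b", "ab", "a", "\uD83D\uDE00", "\uFF5E"].sort(compareCodeUnits);
// sorted is ["a", "ab", "b", "\uD83D\uDE00", "\uFF5E"]
```

Because the surrogate 0xD83D is less than 0xFF5E, the supplementary character sorts before the BMP character U+FF5E here.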

aphillips commented on May 25, 2024

For the W3C IndexedDB spec, I suggested some text that may come in handy in annotating this:

This matches the Array.prototype.sort on an Array of Strings. This ordering compares the 16-bit code units in each string, producing a highly efficient, consistent, and deterministic sort order. The resulting list will not match any particular alphabet or lexicographical order, particularly for code points represented by a surrogate pair.
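
To make the surrogate-pair caveat concrete, here is a small TypeScript sketch comparing the default `Array.prototype.sort` (code unit order) with a code point comparison. The helpers `toCodePoints` and `compareCodePoints` are hypothetical names written for this illustration only.

```typescript
// Decode a string into code points, combining well-formed surrogate pairs.
function toCodePoints(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const hi = s.charCodeAt(i);
    const lo = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
    if (hi >= 0xd800 && hi <= 0xdbff && lo >= 0xdc00 && lo <= 0xdfff) {
      out.push(0x10000 + ((hi - 0xd800) << 10) + (lo - 0xdc00));
      i++; // skip the low surrogate that was just consumed
    } else {
      out.push(hi); // BMP code point or lone surrogate
    }
  }
  return out;
}

// Three-way comparison by code point instead of by code unit.
function compareCodePoints(a: string, b: string): number {
  const as = toCodePoints(a);
  const bs = toCodePoints(b);
  const n = Math.min(as.length, bs.length);
  for (let i = 0; i < n; i++) {
    if (as[i] !== bs[i]) return as[i] < bs[i] ? -1 : 1;
  }
  return as.length - bs.length;
}

// U+FF5E (one code unit) and U+1F600 (surrogate pair 0xD83D 0xDE00).
const words = ["\uFF5E", "\uD83D\uDE00"];
const byCodeUnit = words.slice().sort();                   // U+1F600 first
const byCodePoint = words.slice().sort(compareCodePoints); // U+FF5E first
```

The two orders disagree because the leading surrogate 0xD83D is less than 0xFF5E, while the code point 0x1F600 is greater than 0xFF5E.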

domenic commented on May 25, 2024

Personally I think we should be able to just define a "less than" operation on two strings. We don't need the 0/1/-1. Mathematically at least only less than is necessary (plus an implicit "equals" concept).

I would also define "code unit comparison", and then say that a "sort" operation takes a comparison. I would not tie together the definition of sort and the specific comparison we use.

We should probably also define ascending vs. descending sort (and say ascending is the default), although we could wait until we have a definite consumer for that.
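
As a rough TypeScript sketch of that layering, assuming a standalone "less than" on strings and a sort that takes the comparison as a parameter. All names here (`LessThan`, `codeUnitLessThan`, `sortWith`) are hypothetical, and the `descending` flag is just one possible way to express the ascending/descending choice.

```typescript
// A "less than" operation; equality is implied when neither side is less.
type LessThan<T> = (a: T, b: T) => boolean;

// Code unit comparison: the first differing UTF-16 code unit decides;
// otherwise the shorter string (a prefix of the other) is less.
function codeUnitLessThan(a: string, b: string): boolean {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a.charCodeAt(i);
    const y = b.charCodeAt(i);
    if (x !== y) return x < y;
  }
  return a.length < b.length;
}

// Sort is defined independently of any particular comparison.
// Ascending is the default; descending swaps the operands.
function sortWith<T>(items: readonly T[], lessThan: LessThan<T>, descending = false): T[] {
  const lt: LessThan<T> = descending ? (a, b) => lessThan(b, a) : lessThan;
  return items.slice().sort((a, b) => (lt(a, b) ? -1 : lt(b, a) ? 1 : 0));
}

const ascending = sortWith(["b", "a", "ab"], codeUnitLessThan);        // ["a", "ab", "b"]
const descending = sortWith(["b", "a", "ab"], codeUnitLessThan, true); // ["b", "ab", "a"]
```

The three-way result that `Array.prototype.sort` expects is derived from two "less than" calls, which is why only "less than" needs to be defined.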

I like @aphillips's text particularly for pointing out the advantages and caveats.

As for the repetition, this kind of thing makes me wonder about our implicit concept of "sequence" that we are using for both strings + byte sequences. It's kind of a list even, but I'm not sure I'd want to go that far in the layering... Still, it would be nice to generalize the sorting to work on all three sequence types.
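
One way to picture that generalization in TypeScript is a single comparison over any sequence of numeric items, whether bytes, code units, or code points. This is a sketch under that assumption; `sequenceLessThan` and `codeUnits` are hypothetical names, and the UTF-8 bytes are written out by hand rather than produced by an encoder.

```typescript
// "Less than" for any sequence whose items are numbers.
function sequenceLessThan(a: ArrayLike<number>, b: ArrayLike<number>): boolean {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] < b[i];
  }
  return a.length < b.length; // a proper prefix sorts first
}

// View a JavaScript string as its sequence of UTF-16 code units.
function codeUnits(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) out.push(s.charCodeAt(i));
  return out;
}

// The same two characters as code units and as UTF-8 bytes.
const smileyUnits = codeUnits("\uD83D\uDE00");                // U+1F600
const tildeUnits = codeUnits("\uFF5E");                       // U+FF5E
const smileyBytes = new Uint8Array([0xf0, 0x9f, 0x98, 0x80]); // UTF-8 of U+1F600
const tildeBytes = new Uint8Array([0xef, 0xbd, 0x9e]);        // UTF-8 of U+FF5E

const unitsSayGrinFirst = sequenceLessThan(smileyUnits, tildeUnits); // true
const bytesSayGrinFirst = sequenceLessThan(smileyBytes, tildeBytes); // false
```

The same function serves both sequence types, and the example also shows that byte order over UTF-8 and code unit order over UTF-16 can disagree for the same pair of strings.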

annevk commented on May 25, 2024

Perhaps they can be lists with immutable size/length. Even if you just define "less than", you'd still have most of the steps I outlined above, no? Apart from the first step that is.

domenic commented on May 25, 2024

Hmmm, yeah, I guess that's true.

annevk commented on May 25, 2024

Note that if we settle #91 on mutable, treating these as list-likes seems likely. We already have places that use "append" on strings, although they sometimes get passed a code point and sometimes a string (which would argue more for "extend"/"concat" if we're being picky).

annevk commented on May 25, 2024

(CSS Typed OM uses "code point order" for some things.)
