tc39 / proposal-array-unique

ECMAScript proposal for a deduplicating method of Array

proposal-array-unique's Introduction

Array deduplication proposal

ECMAScript proposal for a deduplicating method of Array.

Proposal Stage-1 CI & CD

Motivation

Deduplication is one of the most common requirements in data processing, especially in today's large web apps.

In Lodash, the uniq family of methods is also very popular:

#   Name      Downloads
1   uniq      5,546,070
7   uniqby      447,858
28  uniqwith     15,077

However, [...new Set(array)] from ECMAScript 6 is not enough for non-primitive values, because Set compares objects by identity, so we may need an Array.prototype.uniqueBy().
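The limitation can be seen in a short sketch. The sample data mirrors the README's own example; the variable names are only illustrative.

```javascript
// Set compares objects by identity, so structurally similar objects are all kept.
const users = [
    { id: 1, uid: 10000 },
    { id: 2, uid: 10000 },
    { id: 3, uid: 10001 }
];

const viaSet = [...new Set(users)];
console.log(viaSet.length); // 3: nothing was removed, although two items share a uid

// Primitives, by contrast, deduplicate as expected.
const primitives = [...new Set([1, 2, 3, 3, 2, 1])];
console.log(primitives); // [1, 2, 3]
```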

Core features

When Array.prototype.uniqueBy() is invoked with:

  1. no parameter, it works like [...new Set(array)];

  2. one index-key parameter (Number, String or Symbol), it reads the value stored under that key on each array element, and then deduplicates the elements based on these values;

  3. one function parameter, it calls this function for each array element, and then deduplicates the elements based on the returned values.

Notice:

  • The returned value is a new array; the original array is not mutated
  • Empty items (holes) and nullish items are treated as nullish values
  • 0 and -0 are treated as the same
  • All NaNs are treated as the same
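The three call forms and the notes above can be sketched as a standalone function. This is a minimal sketch, not the proposal's polyfill: the name uniqueBySketch is hypothetical, keeping the first item per key is an assumption taken from the README's typical cases, and reading a property key with optional chaining is an assumption about how nullish items might be handled.

```javascript
// Hypothetical standalone helper; not the proposal's official polyfill.
function uniqueBySketch(array, resolver) {
    // Turn the optional argument into a function that maps an item to its key.
    let keyOf;
    if (resolver === undefined) {
        keyOf = item => item;                 // form 1: compare the items themselves
    } else if (typeof resolver === 'function') {
        keyOf = item => resolver(item);       // form 3: compare callback results
    } else {
        // Form 2. Assumption: optional chaining, so nullish items yield
        // undefined instead of throwing.
        keyOf = item => item?.[resolver];
    }

    const seen = new Set(); // Set membership uses SameValueZero (NaN equals NaN, 0 equals -0)
    const result = [];      // a new array; the input is never mutated

    // Index-based loop: holes in sparse arrays are read as undefined.
    for (let i = 0; i < array.length; i++) {
        const item = array[i];
        const key = keyOf(item);
        if (!seen.has(key)) {
            seen.add(key);
            result.push(item); // assumption: the first item seen for a key is kept
        }
    }
    return result;
}

console.log(uniqueBySketch([1, 2, 3, 3, 2, 1])); // [1, 2, 3]
console.log(uniqueBySketch([NaN, NaN, 0, -0]));  // [NaN, 0]
```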

Typical cases

[1, 2, 3, 3, 2, 1].uniqueBy();  // [1, 2, 3]

const data = [
    { id: 1, uid: 10000 },
    { id: 2, uid: 10000 },
    { id: 3, uid: 10001 }
];

data.uniqueBy('uid');
// [
//     { id: 1, uid: 10000 },
//     { id: 3, uid: 10001 }
// ]

data.uniqueBy(({ id, uid }) => `${id}-${uid}`);
// [
//     { id: 1, uid: 10000 },
//     { id: 2, uid: 10000 },
//     { id: 3, uid: 10001 }
// ]

Polyfill

A polyfill is available in the core-js library. You can find it in the ECMAScript proposals section.

A simple polyfill, written in TypeScript, is also available in the proposal repo.

Proposer

proposal-array-unique's People

Contributors

jack-works, techquery, zloirock

proposal-array-unique's Issues

The semantic of 0,-0,NaN

It seems the proposal chooses SameValueZero, which follows Set and Array.prototype.includes(). I'm OK with that; I only created this issue to make sure everyone is happy with it.
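For readers unfamiliar with SameValueZero, the following sketch contrasts it with strict equality and with Object.is, using only built-ins that exist today. The constant names are illustrative.

```javascript
// Strict equality: NaN is never equal to itself.
const strictNaN = NaN === NaN;                    // false
const indexOfNaN = [NaN].indexOf(NaN);            // -1, because indexOf uses ===

// SameValueZero (used by Set and includes): NaN equals NaN, and 0 equals -0.
const includesNaN = [NaN].includes(NaN);          // true
const nanSetSize = new Set([NaN, NaN]).size;      // 1
const zeroSetSize = new Set([0, -0]).size;        // 1

// SameValue (Object.is) differs only in telling 0 and -0 apart.
const objectIsZero = Object.is(0, -0);            // false

console.log({ strictNaN, indexOfNaN, includesNaN, nanSetSize, zeroSetSize, objectIsZero });
```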

Consolidation behavior

Should the unique operation keep the first value with a given surrogate key or the last, and at what index? Is there a reason to make it author-configurable?

const data = [
    { id: 1, name: "1a" },
    { id: 2, name: "2a" },
    { id: 1, name: "1b" },
];
data.uniqueBy(v => v.id)[0].name; // ???
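To make the question concrete, here is a sketch of three possible consolidation policies built on Map. The function names are hypothetical and none of them is claimed to be the proposal's chosen behavior; the point is only that each policy gives a different answer for the data above.

```javascript
const records = [
    { id: 1, name: "1a" },
    { id: 2, name: "2a" },
    { id: 1, name: "1b" },
];

// Policy A: keep the first item per key, at its first position.
function keepFirst(items, keyOf) {
    const byKey = new Map();
    for (const item of items) {
        const key = keyOf(item);
        if (!byKey.has(key)) byKey.set(key, item);
    }
    return [...byKey.values()];
}

// Policy B: keep the last item per key, but at the first position.
// Map.prototype.set on an existing key replaces the value without moving it.
function keepLastAtFirstIndex(items, keyOf) {
    const byKey = new Map();
    for (const item of items) byKey.set(keyOf(item), item);
    return [...byKey.values()];
}

// Policy C: keep the last item per key, at its last position.
// Deleting before setting moves the key to the end of the Map's order.
function keepLastAtLastIndex(items, keyOf) {
    const byKey = new Map();
    for (const item of items) {
        const key = keyOf(item);
        byKey.delete(key);
        byKey.set(key, item);
    }
    return [...byKey.values()];
}

const names = items => items.map(v => v.name);
console.log(names(keepFirst(records, v => v.id)));            // ["1a", "2a"]
console.log(names(keepLastAtFirstIndex(records, v => v.id))); // ["1b", "2a"]
console.log(names(keepLastAtLastIndex(records, v => v.id)));  // ["2a", "1b"]
```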

Implementation recommendations

This is more an FYI than a specific feature request. For ideal performance, implementations should consider the following:

  • Using a specialized append-only set rather than the built-in ES one, as only "has" and "add" are ever used.
  • For integer arrays, using a sparse bit set rather than a traditional set to improve both speed and memory.
  • For polyfills, using the obvious Array.from(set) rather than a parallel array.

How to deal with empty items

What should [1, , 2, , , 2, 1].unique() return?

There are at least three options:

  1. Treat empty items as undefined, so it returns [1, undefined, 2].
    This matches what [...new Set(arr)] does, as described in the README, but I suppose that's not intentional. Note that with unique(f), f would be called on every empty item with undefined (or called with no argument?). (To avoid runtime errors, if f is a key, I guess it should be treated as x => x?.[key].)
  2. Skip all empty items and keep them, so it returns [1, , 2, , , ,]
  3. Skip all empty items and drop them, so it returns [1, 2]

Personally I prefer the last option.
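Options 1 and 3 can already be reproduced with existing built-ins, as the sketch below shows. Option 2 is left out because its exact output shape depends on how the kept holes are positioned, which the issue does not pin down.

```javascript
const sparse = [1, , 2, , , 2, 1];

// Option 1: the array iterator reads holes as undefined, so spreading a Set
// keeps a single undefined entry.
const option1 = [...new Set(sparse)];
console.log(option1); // [1, undefined, 2]

// Option 3: filter() skips holes, so a callback that always returns true
// yields a dense copy. Explicit undefined values would still be kept.
const option3 = [...new Set(sparse.filter(() => true))];
console.log(option3); // [1, 2]
```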

“index” vs. function parameter: what’s the difference?

Question about https://github.com/TechQuery/array-unique-proposal#core-features:

While Array.prototype.unique() invoked with:

  1. no parameter, it'll work as [...new Set(array)];

  2. one index-key parameter (Number, String or Symbol), it'll get values from each array element with the key, and then deduplicates the origin array based on these values;

  3. one function parameter, it'll call this function for each array element, and then deduplicates the origin array based on these returned values.

How is 2 different from 3?

Can this be more general?

As others have pointed out, we already have [...new Set(array)] for primitives. So, I think that the question this proposal is addressing can be generalized to, "use something other than === to find duplicate objects in a Set".

Here are a couple other potential ways to solve this problem off the top of my head (I haven't thought them through; these are just ideas):

  • [...new Set(array, { comparator: (a, b) => a.id === b.id })] -- this would set a comparator on the Set to be used for all future additions to that specific Set.
  • Object.prototype[@@setKey] -- if present on an object, [@@setKey] should be used instead of the object pointer when checking for equality in the Set.

Callback arguments

In the README examples and in the polyfill, a callback, when one is used, takes only one argument: the value. Maybe it makes sense to be consistent with the other array methods (like .filter) and pass 3 arguments: value, index, and array?

Alternatives that you can do today

Here's a quadratic one-liner:

[...new Set(arr.map(v => v.uid))].map(uid => arr.find((v) => v.uid === uid))

Make it NlogN by sorting the input array and performing a binary search.

Better yet, make it linear-time by using another object, like this:

Object.values(Object.fromEntries(arr.map(v => [v.uid, v])))

A Map would also work.
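A sketch of the Map variant follows, with a hypothetical helper name. It also shows how it differs from the Object.fromEntries one-liner: the object-based version converts keys to strings, lets later entries overwrite earlier ones, and lists integer-like keys in ascending order, whereas this Map version keeps the first item per key in encounter order.

```javascript
// Hypothetical helper: linear-time deduplication that keeps the first item per key.
function uniqueByMap(arr, keyOf) {
    const firstByKey = new Map();
    for (const item of arr) {
        const key = keyOf(item);
        if (!firstByKey.has(key)) firstByKey.set(key, item);
    }
    return Array.from(firstByKey.values());
}

const arr = [
    { id: 1, uid: 10001 },
    { id: 2, uid: 10000 },
    { id: 3, uid: 10001 },
];

const viaMap = uniqueByMap(arr, v => v.uid).map(v => v.id);
console.log(viaMap); // [1, 2]

const viaObject = Object.values(Object.fromEntries(arr.map(v => [v.uid, v]))).map(v => v.id);
console.log(viaObject); // [2, 3]
```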

Live demo: https://jsfiddle.net/j5sb2cu4/

Clarification of the behavior

// How should keys on `null` / `undefined` be handled?
[{ a: 1 }, null, { a: null }].uniqueBy('a'); // => ???
[{}, undefined].uniqueBy('a');               // => ???
// How should it work when the resolver is none of `undefined` / callback / property key?
[Object, Object, Object].uniqueBy({ toString() { return '' + Math.random() } }); // => ???
// Which arguments should be passed to the callback: only the item or, as in the other array methods, 3 args?
[1, 2, 3].uniqueBy(function () { console.log(arguments.length) }); // => ???

The polyfill from the repo looks too raw to answer those questions.

Stage 1 Entrance Criteria

https://tc39.es/process-document/

  • Identified “champion” who will advance the addition (Jack Works)
  • Prose outlining the problem or need and the general shape of a solution
  • Illustrative examples of usage
  • High-level API
  • Discussion of key algorithms, abstractions and semantics
  • Identification of potential “cross-cutting” concerns and implementation challenges/complexity
  • A publicly available repository for the proposal that captures the above requirements
  • Polyfills / demos

Does .unique() mutate the original array?

README.md doesn't adequately convey whether Array.prototype.unique() modifies the array or whether it returns a new array.

  1. no parameter, it'll work as [...new Set(array)];
    Seems to create a new array since it implies data.unique() instead of data = data.unique().
  2. one index-key parameter (Number, String or Symbol), it'll get values from each array element with the key, and then deduplicates the origin array based on these values;
    The "deduplicates the origin array" implies mutating the original array.

For case 3, the above applies.

Typical cases section doesn't give any hints either. It would help if there was a simple data-row with returned value in comment indicating whether it was either modified or not.

Finally, the TypeScript polyfill looks like it doesn't modify the original array.

Proposal: a more widely applicable method

This method supports more complex uniqueness logic for objects.
This method returns a new array.

interface Array<T> {
    uniqueOf(comparator: (a: any, b: any) => boolean): T[];
}

Array.prototype.uniqueOf = function <T>(this: T[], comparator: (a: any, b: any) => boolean) {
    const result: T[] = [];
    let flag: boolean;
    for (const a of this) {
        flag = true;
        for (const b of result) {
            if (comparator(a, b)) {
                flag = false;
                break;
            }
        }
        if (flag) result.push(a);
    }
    return result;
};

demo

[{ n: 1, m: 1 }, { n: 1, m: 0 }, { n: 1, m: 1 }, { n: 2, m: 2 }, { n: 1, m: 0 }].uniqueOf((a, b) => a.n == b.n && a.m == b.m);
// [{ n: 1, m: 1 }, { n: 1, m: 0 }, { n: 2, m: 2 }];

[1, 1, 2, 3, 4, 4, 5].uniqueOf((a, b) => a == b);
// [1, 2, 3, 4, 5];

Alternative names

I like the name unique, but another option to keep in mind is distinct.

Pros: I can't think of anything now.
Cons: Longer to write. Synonyms of "distinct" are less related to the behavior of the method than "unique", so distinct may be more misleading than unique.

If anyone has any "pro" to add to this list, please let me know. Otherwise I'll close this issue if unique is found to be better.

Another alternative, already mentioned in README.md, is uniq, which is more concise.

Multiple keys

It's common to have multiple keys for unique, so let's also support Array<number|string|symbol> for the param:

let attrUpdates = [
  {node: nodeA, attr: 'x', value: 1},
  {node: nodeA, attr: 'y', value: 2},
  {node: nodeB, attr: 'x', value: 3},
  {node: nodeA, attr: 'x', value: 4},
]

attrUpdates.unique(['node', 'attr']);
// produce
[
  {node: nodeA, attr: 'x', value: 4},
  {node: nodeA, attr: 'y', value: 2},
  {node: nodeB, attr: 'x', value: 3},
]
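One way to get this behavior without joining keys into a string is a trie of nested Maps, sketched below. The helper name uniqueByKeys and the SLOT marker are hypothetical. To reproduce the output shown above, a later duplicate replaces the stored item at its original position; that last-wins choice is an assumption read off the example, and removing the overwrite line would keep the first item instead.

```javascript
// Private marker under which a leaf Map stores the item's index in the result.
const SLOT = Symbol('slot');

// Hypothetical helper: deduplicate by several property keys at once.
function uniqueByKeys(items, keys) {
    const root = new Map(); // one nesting level per key
    const result = [];
    for (const item of items) {
        // Walk (and extend) the trie along this item's key values.
        let node = root;
        for (const key of keys) {
            const value = item[key];
            if (!node.has(value)) node.set(value, new Map());
            node = node.get(value);
        }
        if (node.has(SLOT)) {
            result[node.get(SLOT)] = item; // combination seen before: overwrite in place
        } else {
            node.set(SLOT, result.length); // new combination: remember its position
            result.push(item);
        }
    }
    return result;
}

const nodeA = { name: 'A' };
const nodeB = { name: 'B' };
const updates = [
    { node: nodeA, attr: 'x', value: 1 },
    { node: nodeA, attr: 'y', value: 2 },
    { node: nodeB, attr: 'x', value: 3 },
    { node: nodeA, attr: 'x', value: 4 },
];

console.log(uniqueByKeys(updates, ['node', 'attr']).map(u => u.value)); // [4, 2, 3]
```

Because the Maps compare values with SameValueZero, object-valued keys such as node are matched by identity rather than by a string representation.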

Status of Array.prototype.unique Proposal

Hi everyone,

I recently came across the Array.prototype.unique proposal and noticed that it hasn't progressed to the next stage. I couldn’t find details explaining why, and I’m quite curious about what has been influencing its journey through the ECMAScript stages.

If anyone has updates or insights on the factors or challenges that have impacted this proposal, I would greatly appreciate hearing more about it. Understanding the development process behind these features is really interesting to me.

Thanks for all your work on this!

@TechQuery
@Jack-Works

Cheers
