GithubHelp home page GithubHelp logo

liljenzin / confluent Goto Github PK

View Code? Open in Web Editor NEW
9.0 2.0 2.0 51 KB

Confluent Sets and Maps

Home Page: https://www.liljenzin.se/confluent

License: MIT License

C++ 100.00%
persistent-data-structure set-operations algorithms

confluent's Introduction

Confluent Sets and Maps

Confluent sets and maps are sorted associative containers written in C++11.

Containers can be merged at optimal cost in time and memory both when one of the input containers is small and when the symmetric difference is small.

Sets of the same type share equal nodes. Maps share key nodes with sets of compaitble type and value nodes with other maps of the same type.

Inserted elements must be comparable, hashable and copy-constructible.

Source code

You find the current source code on github.

git clone https://github.com/liljenzin/confluent

Documentation

You find the current API documentation and code examples here.

The used merge algorithms are documented here.

Performance characteristics

#include "set.h"

confluent::set<int> A = { a_1, a_2, ... };
confluent::set<int> B = { b_1, b_2, ... };

confluent::set<int> C = A;  // copy construction

confluent::set<int> D = A | B;  // set union
confluent::set<int> E = A & B;  // set intersection
confluent::set<int> F = A - B;  // set difference
confluent::set<int> G = A ^ B;  // set symmetric difference

confluent::set<int> H(A.begin() + offset1, A.begin() + offset2);

A.insert(k1);
A.erase(k2);

if (A == B) then { ... };  // Tests that A and B contain the same elements.
size_t h = hash(A):  // Returns the combined hash value of all elements in A.

for (int i : A) { ... }

size_t u = std::max(A.size(), B.size());
size_t v = std::min(A.size(), B.size(), (A ^ B).size());

(1) Constructing the sets A and B from ranges is O(n) if input is sorted, otherwise O(n*log n).

(2) Constructing the set C as a copy of another set is constant in time and memory.

(3) Constructing the sets D, E, F and G by merging two sets is O(v*log(u/v)).

(4) Constructing the set H as a slice of another set is O(log(n)).

(5) Inserting or erasing individual keys is O(log n).

(6) Testing if two sets contain the same elements or accessing the combined hash value of all elements in a set is contant in time.

(7) Iterating all elements in a set is O(n) amortized time.

#include <string>
#include "map.h"

confluent::map<int, std::string> A = { a_1, a_2, ... };
confluent::map<int, std::string> B = { b_1, b_2, ... };

confluent::map<int, std::string> C = A;  // copy construction
confluent::set<int> D = A.key_set();  // extracts the set of keys in the map

// The union of two maps will create a new map where elements are unique with
// respect to keys, with precedence for elements from the lhs input map.
confluent::map<int, std::string> E = A | B;  // union

// The union or intersection of two maps will create a new map as if they were
// sets of the value_type, i.e. both keys and mapped values will be matched in
// the merge operations.
confluent::map<int, std::string> F = A & B;  // intersection
confluent::map<int, std::string> G = A - B;  // difference

// The union or intersection of a map and a set will create a new map, while
// only the keys will be matched in the merge operations.
confluent::map<int, std::string> H = A & B.key_set();
confluent::map<int, std::string> I = A - B.key_set();  // difference

confluent::set<int> J(A.begin() + offset1, A.begin() + offset2);

A.insert(v1);
A.insert_or_assign(v2);
A.erase(k1);
A.erase(k2, m2);

std::string m = A.at(k3);

if (A == B) then { ... };  // Tests that A and B contain the same elements.
size_t h = hash(A):  // Returns the combined hash value of all elements in A.

for (const std::pair<int , std::string>& value : A) { ... }

size_t u = std::max(A.size(), B.size());
size_t v = std::min(A.size(), B.size(), (A-B).size() + (B-A).size());
size_t w = std::min(A.size(), B.size(), (A.key_set() ^ B.key_set()).size());

(1) Constructing the maps A and B from ranges is O(n) if input is sorted, otherwise O(n*log n).

(2) Constructing the map C as a copy of another map is constant in time and memory.

(3) Constructing the map D containing the keys in a map is constant in time and memory.

(4) Constructing the maps E, F and G by merging two maps is O(v*log(u/v)).

(5) Constructing the maps H and I by merging a map and a set is is O(w*log(u/w)).

(6) Constructing the map J as a slice of another map is O(log(n)).

(7) Inserting or erasing individual keys or values is O(log n).

(8) Testing if two maps contain the same elements or accessing the combined hash value of all elements in a map is contant in time.

(9) Iterating all elements in a map is O(n) amortized time.

Applications

Confluent sets and maps are powerful alternatives to the standard counterparts in applications that work with many containers of the same type, in particular when containers of different size and/or containing overlapping content have to be merged.

Persistent data models

Confluent sets and maps are backed by immutable trees and keeping copies of previous versions when containers are modified adds no extra cost in time.

typedef confluent::set<int> Set;

std::vector<Set> history;
Set actual_data;

actual_data.insert(1);
history.push_back(actual_data);

actual_data.insert(2);
history.push_back(actual_data);

actual_data.insert(3);
history.push_back(actual_data);

Using containers as keys in hash tables

Confluent sets and maps can be copied, hashed and tested for equality in constant time and can therefore be used as key types in generic hash tables such as std::unordered_set and std::unordered_map.

typedef confluent::set<int> Set;

struct Hasher {
  size_t operator()(const Set& set) const { return set.hash(); }
};

std::unordered_map<Set, std::string, Hasher> ht;

Set s1 = { 1, 2, 3 };
Set s2 = { 2, 3 };

ht[s1] = "a mapped value";
ht[s2] = "another mapped value";

Three-way-merge

Confluent maps can be used to implement efficient three-way-merge of dictionaries based on simple set operations, eleminating the need for complicated code and suplementary structures often used to keep track of changes.

A three-way-merge is an operation that merges the changes in two braches created from a common ancestor, like push and pull operations in git.

Provided a common ancestor A and two branches B and C, if the branches contain no conflicts, the following function will produce a new container that applies the changes in both branches:

typedef confluent::map<Key, T> map;

map merge(map A, map B, map C) {
	return (A - (A - B) - (A - C)) | (B - A) | (C - A);
}

Let n be the total number of elements and let m be the number of changes in the branches.

The cost of the merge is O(m*log(n/m)), which is optimal for a sorted index structure.

The merge can easily be extended to also handle conflicts, which is demonstrated with complete code in the example program PhoneNumbers.cc.

confluent's People

Contributors

liljenzin avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

confluent's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.