usethesource / capsule

The Capsule Hash Trie Collections Library

License: BSD 2-Clause "Simplified" License

Java 99.80% Shell 0.20%
hashmap trie immutable immutable-collections persistent-data-structure hashset java performance

Synopsis

Capsule aims to become a full-fledged (immutable) collections library for Java 11+ that is solely built around persistent tries. The library is designed for standalone use and for being embedded in domain-specific languages. Capsule still has to undergo some incubation before it can ship as a well-rounded collection library. Nevertheless, the code is stable and performance is solid. Feel free to use it and let us know about your experiences!

Getting Started

Binary builds of Capsule are deployed in the usethesource repository. In case you use Maven for dependency management, you have to add another repository location to your pom.xml file:

<repositories>
  <repository>
    <id>usethesource</id>
    <url>https://releases.usethesource.io/maven/</url>
  </repository>
</repositories>

Furthermore, you have to declare Capsule as a dependency.

To obtain the latest release for Java 11+, insert the following snippet in your pom.xml file:

<dependency>
  <groupId>io.usethesource</groupId>
  <artifactId>capsule</artifactId>
  <version>0.7.1</version>
</dependency>

To obtain the latest available version for Java 8, insert the following snippet in your pom.xml file:

<dependency>
  <groupId>io.usethesource</groupId>
  <artifactId>capsule</artifactId>
  <version>0.6.4</version>
</dependency>

Snippets for other build tools and dependency management systems may vary slightly.

Exploring Capsule

Build the library and spawn a Java shell to interactively explore Capsule, e.g.:

$ ./gradlew clean build
$ jshell --class-path ./build/libs/capsule-*-SNAPSHOT.jar

|  Welcome to JShell
|  For an introduction type: /help intro

jshell> var set = io.usethesource.capsule.Set.Immutable.of(1, 2);
set ==> {1, 2}

Background: Efficient Immutable Data Structures on the JVM

The standard libraries of recent Java Virtual Machine languages, such as Clojure or Scala, contain scalable and well-performing immutable collection data structures that are implemented as Hash-Array Mapped Tries (HAMTs). HAMTs already feature efficient lookup, insert, and delete operations; however, due to their tree-based nature, their memory footprints and the runtime performance of iteration and equality checking lag behind array-based counterparts.

We introduce CHAMP (Compressed Hash-Array Mapped Prefix-tree), an evolutionary improvement over HAMTs. The new design increases the overall performance of immutable sets and maps. Furthermore, the resulting general-purpose design increases cache locality and features a canonical representation.
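The bitmap-compressed indexing that HAMT-style nodes rely on can be illustrated with a small sketch. It is not Capsule's code: the 5-bit partition size (32-way branching) is an assumption, and `mask`, `bitpos`, and `index` are illustrative helper names. The idea is that a node stores only the occupied slots in a dense array, and a 32-bit bitmap records which of the 32 logical slots are present.

```java
// Sketch of bitmap-compressed indexing in a HAMT-style node.
// Assumption: 5 hash bits are consumed per trie level (32-way branching).
final class BitmapIndexSketch {
    static final int BIT_PARTITION_SIZE = 5;
    static final int BIT_PARTITION_MASK = 0b11111;

    // Selects the 5-bit hash fragment used at the trie level given by shift.
    static int mask(int keyHash, int shift) {
        return (keyHash >>> shift) & BIT_PARTITION_MASK;
    }

    // Turns a fragment (0..31) into a one-hot bit position.
    static int bitpos(int mask) {
        return 1 << mask;
    }

    // Counts the set bits below bitpos; that count is the index
    // of the slot in the node's dense array.
    static int index(int bitmap, int bitpos) {
        return Integer.bitCount(bitmap & (bitpos - 1));
    }

    public static void main(String[] args) {
        // A node with logical slots 3, 17 and 30 occupied.
        int bitmap = bitpos(3) | bitpos(17) | bitpos(30);

        int keyHash = 17;
        int pos = bitpos(mask(keyHash, 0));

        System.out.println((bitmap & pos) != 0); // is the slot occupied?
        System.out.println(index(bitmap, pos));  // position in the dense array
    }
}
```

Under these assumptions, a lookup costs one bit test and one population count per level, regardless of how sparse the node is.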

capsule's People

Contributors

davylandman, dependabot[bot], jurgenvinju, msteindorfer, slawo-ch

capsule's Issues

Tutorial available?

  • Is there a tutorial available on how to use the PersistentTrieMap API?

Best practice?

  • How to use the __...methods?

  • How to best initialize the structure?

  • Is there a data structure for persistent lists?

Question on nodes array

For Map tries...

I'm trying to wrap my head around the implementation (great work and kudos to all). No, I'm not a masochist!

Re: BitmapIndexedMapNode 'nodes' array

  1. Does it grow and/or shrink?
  2. Won't grow beyond 64 in size?

TIA

Sunset Maven build descriptor

The Capsule project contains two build descriptors, one for Maven and one for Gradle. The former is out of sync and, e.g., does not cover recent features (cf. #35). In order to avoid feature disparity and the two descriptors getting out of sync, I suggest removing the Maven build descriptors in favor of Gradle.

Steps for sunsetting the Maven build descriptor:

Introduce `Iterator.seek(int next)` as a basic building block for faster composition and transitive closure

The seek method is described in https://arxiv.org/abs/1210.0481 as follows:

    seek(int seekKey): Position the iterator at a least upper bound for seekKey, i.e. the least key ≥ seekKey, or move to end if no such key exists. The sought key must be ≥ the key at the current position.

Using this building block, algorithms for trie compose ("join") and closure can make use of the orderedness of the hash keys:

  • already the partial order on elements allows for skipping certain commutative orderings, this may shave off half of an iteration of the smallest trie
  • the full order on hash codes (up to collisions) allows for skipping ahead to the relevant part of the other iterator. The sparser the match between two relations, the faster the algorithm will go.

This seek method can be used in many applications, and it abstracts from the internal details of the data structure. All it depends on is the hashCode/equals contract. Most implementations in capsule will use the structure of the trie to make sure seek is done as quickly as possible. It would be best if we implemented seek for all tries in capsule, IMHO.
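A minimal sketch of how `seek` enables skipping during an intersection ("join") of two ordered key sequences. It uses sorted `int` arrays as a stand-in for hash-ordered tries; `SeekableIterator` and `intersect` are hypothetical names and not part of Capsule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SeekSketch {
    /** Hypothetical iterator over a sorted, duplicate-free int array. */
    static final class SeekableIterator {
        private final int[] keys;
        private int pos = 0;

        SeekableIterator(int[] sortedKeys) {
            this.keys = sortedKeys;
        }

        boolean atEnd() { return pos >= keys.length; }
        int key() { return keys[pos]; }
        void next() { pos++; }

        // Moves to the least key >= seekKey, or to the end if none exists.
        // Precondition from the quoted definition: seekKey >= current key,
        // so searching only from the current position onward is sufficient.
        void seek(int seekKey) {
            int i = Arrays.binarySearch(keys, pos, keys.length, seekKey);
            pos = (i >= 0) ? i : -(i + 1);
        }
    }

    // Intersects two sorted arrays; whichever iterator is behind
    // seeks directly to the other iterator's current key.
    static List<Integer> intersect(int[] a, int[] b) {
        SeekableIterator x = new SeekableIterator(a);
        SeekableIterator y = new SeekableIterator(b);
        List<Integer> out = new ArrayList<>();
        while (!x.atEnd() && !y.atEnd()) {
            if (x.key() == y.key()) {
                out.add(x.key());
                x.next();
                y.next();
            } else if (x.key() < y.key()) {
                x.seek(y.key());
            } else {
                y.seek(x.key());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {1, 4, 7, 9, 12};
        int[] b = {2, 4, 9, 10, 13};
        System.out.println(intersect(a, b)); // prints [4, 9]
    }
}
```

The sparser the overlap between the two inputs, the more elements each `seek` skips, which matches the expectation stated in the bullets above.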

Hash Collision in the PersistentTrieSetMultimap can cause equality issues

I've had some big data structures that looked the same when printed, yet equals reported that they were different.

After quite some stepping through the code, I found the cause.

PersistentTrieSetMultimap$HashCollisionNode is missing its own equals method, one that goes through its own collisionContent collection and that of the other node to see whether they represent the same collision node. Instead, it defaults to Object.equals, which is reference equality.

We are (still?) on capsule 0.3.0, if possible I would appreciate a bug fix release :)
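A sketch of what an order-insensitive `equals` on a collision node could look like. `CollisionNodeSketch` is a hypothetical, simplified class that stores plain keys; the real node's field types and layout are not shown in this report, so this is only an illustration of the comparison, not a patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified collision node: all keys share one hash code.
final class CollisionNodeSketch<K> {
    private final int hash;
    private final List<K> collisionContent;

    CollisionNodeSketch(int hash, List<K> keys) {
        this.hash = hash;
        this.collisionContent = new ArrayList<>(keys);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CollisionNodeSketch)) {
            return false;
        }
        CollisionNodeSketch<?> that = (CollisionNodeSketch<?>) other;
        if (hash != that.hash
                || collisionContent.size() != that.collisionContent.size()) {
            return false;
        }
        // Insertion order may differ between two equal nodes, so compare
        // by membership. Assumes keys within one node are distinct, so
        // equal sizes plus containment imply equal content.
        for (K key : collisionContent) {
            if (!that.collisionContent.contains(key)) {
                return false;
            }
        }
        return true;
    }

    @Override
    public int hashCode() {
        return hash; // equal nodes share the same hash, keeping the contract
    }

    public static void main(String[] args) {
        // "Aa" and "BB" both have String hash code 2112.
        CollisionNodeSketch<String> n1 =
            new CollisionNodeSketch<>(2112, Arrays.asList("Aa", "BB"));
        CollisionNodeSketch<String> n2 =
            new CollisionNodeSketch<>(2112, Arrays.asList("BB", "Aa"));
        System.out.println(n1.equals(n2)); // prints true
    }
}
```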

This package contains no tests

If this library is to be used as a reference implementation of CHAMP, it should contain tests that allow porters to verify their implementations.
The tests will also uncover bugs in this implementation.
Please cover this package with tests we can all use.

Code cleanup

There are some other parts of the code base that static analysis has flagged for me, such as the lack of equals and hashCode on some inner classes inside AbstractSpecialisedImmutableMap. There are plenty of redundant casts and explicit type arguments that could be replaced by the diamond operator. Indentation is also all over the place, with a mix of tabs, two spaces, and four spaces.

I could open another PR and help, if contributions are accepted.

`values` fails on `AbstractSpecialisedImmutableMap` if there are duplicate values

For small maps, there are some special instances which use fields instead of a Trie.

For a map of 2 entries, if you call .values() on it, and the values are the same, you get an exception that no duplicate keys are allowed.

Looking at the code in io.usethesource.capsule.AbstractSpecialisedImmutableMap<K, V> we see there is a TODO warning for this case:

    @Override
    public Collection<V> values() {
        // TODO: will fail if two values are equals; return listOf(...)
        return AbstractSpecialisedImmutableSet.setOf(val1, val2);
    }
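A sketch of the direction the TODO hints at: returning the values as a list so that equal values are preserved. `TwoEntryMapSketch` is a hypothetical stand-in for the specialised two-entry map and uses only `java.util`; it does not use Capsule's `listOf` helper, whose signature is not shown here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

// Hypothetical stand-in for a field-based map specialised to two entries.
final class TwoEntryMapSketch<K, V> {
    private final K key1;
    private final V val1;
    private final K key2;
    private final V val2;

    TwoEntryMapSketch(K key1, V val1, K key2, V val2) {
        this.key1 = key1;
        this.val1 = val1;
        this.key2 = key2;
        this.val2 = val2;
    }

    // Values of a map form a bag, not a set: duplicates must survive.
    // An unmodifiable list keeps both values even when they are equal.
    Collection<V> values() {
        return Collections.unmodifiableList(Arrays.asList(val1, val2));
    }

    public static void main(String[] args) {
        TwoEntryMapSketch<String, Integer> map =
            new TwoEntryMapSketch<>("a", 1, "b", 1);
        System.out.println(map.values()); // prints [1, 1]
    }
}
```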

epl derivatives and bsd license

I was curious about the statement in the README that "Capsule was recently extracted from the usethesource/vallang project". Is Capsule an EPL derivative work? IANAL, but I was reading about the EPL earlier and it seems much less permissive than BSD, so I'm wondering how users should reason about this.

Lists

Do you intend to implement other types of collections for example: Lists, Linked Lists, Queues, etc ...?

Question: Why not Vectors?

With a full grok of capsule attained, kudos again, I'm wondering why not Vectors (as in Clojure vectors)?

Does the sequential nature diminish the space/performance advantage of Trie?

Curious but not critical.

Question about `dataMap` for singleton node

Is this logic for selecting the newDataMap for a new singleton node correct?

Specifically, in the case where shift is not 0, newDataMap is set to bitpos(mask(keyHash, 0)). But keyHash is the hash of the key that's being removed, right? (And the data map should reflect the key that remains.)

Or am I misunderstanding what's going on in that code?
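One possible reading, offered as an assumption rather than a statement of the library's intent: two keys can only end up in the same node at a level where shift is not 0 if their hashes agree on every lower-level fragment. In that case the level-0 fragment of the removed key equals that of the remaining key, so both would yield the same data map. The sketch below checks this property with illustrative `mask` and `bitpos` helpers that assume 5 hash bits per level, consumed from the low bits upward.

```java
// Sketch only: checks that two hashes sharing a node below the root
// have identical level-0 fragments. The helper definitions are assumptions.
final class SingletonDataMapSketch {
    static int mask(int keyHash, int shift) {
        return (keyHash >>> shift) & 0b11111;
    }

    static int bitpos(int mask) {
        return 1 << mask;
    }

    public static void main(String[] args) {
        // Both hashes share the low 5 bits (00111), so both keys descend
        // into the same child of the root; they differ at shift == 5.
        int removedHash   = 0b00001_00111;
        int remainingHash = 0b00010_00111;

        // Level 0: same bit position for both keys.
        System.out.println(
            bitpos(mask(removedHash, 0)) == bitpos(mask(remainingHash, 0)));

        // Level 1 (shift == 5): the keys are told apart here.
        System.out.println(
            bitpos(mask(removedHash, 5)) == bitpos(mask(remainingHash, 5)));
    }
}
```

Under these assumptions the first comparison is true and the second is false, which would explain why using the removed key's hash at level 0 does not change the result.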

Reintroduce CapsuleCollectors.toMap?

Hi,

I see that CapsuleCollectors.toMap is commented out in the code. Is there any reason not to make it available?

That's how I reimplemented it:

    public static <T, K, V> Collector<T, ?, io.usethesource.capsule.Map.Immutable<K, V>> toMap(
        Function<? super T, ? extends K> keyMapper, Function<? super T, ? extends V> valueMapper
    ) {
        /** extract key/value from type {@code T} and insert into map */
        final BiConsumer<io.usethesource.capsule.Map.Transient<K, V>, T> accumulator =
            (map, element) -> map.__put(keyMapper.apply(element), valueMapper.apply(element));

        return new DefaultCollector<>(
            io.usethesource.capsule.Map.Transient::of,
            accumulator,
            (left, right) -> {
                left.__putAll(right);
                return left;
            },
            io.usethesource.capsule.Map.Transient::freeze,
            CapsuleCollectors.UNORDERED
        );
    }

Another one would be the same but using a function to merge values in case one already exists in the transient map.

Thanks!
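The merge-function variant mentioned above could look roughly like the following sketch. It uses only `java.util` types as stand-ins: a `HashMap` plays the role of the transient map and `Collections.unmodifiableMap` plays the role of `freeze`. The class name `MergingCollectorSketch` is hypothetical, and nothing here is Capsule API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Stream;

final class MergingCollectorSketch {
    // Collects into a mutable map, resolving duplicate keys with
    // mergeFunction, then wraps the result in an unmodifiable view.
    static <T, K, V> Collector<T, ?, Map<K, V>> toMap(
            Function<? super T, ? extends K> keyMapper,
            Function<? super T, ? extends V> valueMapper,
            BinaryOperator<V> mergeFunction) {
        return Collector.<T, Map<K, V>, Map<K, V>>of(
            HashMap::new,
            // accumulator: insert, or merge with the existing value
            (map, element) -> map.merge(
                keyMapper.apply(element),
                valueMapper.apply(element),
                mergeFunction),
            // combiner: fold the right partial result into the left one
            (left, right) -> {
                right.forEach((k, v) -> left.merge(k, v, mergeFunction));
                return left;
            },
            // finisher: stand-in for freezing a transient map
            Collections::unmodifiableMap,
            Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        // Count words by length: lengths 1 and 2 each occur twice.
        Map<Integer, Integer> counts = Stream.of("a", "bb", "cc", "d")
            .collect(toMap(String::length, s -> 1, Integer::sum));
        System.out.println(counts.get(1)); // prints 2
        System.out.println(counts.get(2)); // prints 2
    }
}
```

Note that `Map.merge` rejects null values, so this sketch assumes the value mapper never returns null.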

Merge experimental branches of capsule to master

Instead of maintaining long-lived experimental feature branches, I'm considering merging them into the master branch, in special "experimental" packages. The API and implementations in those packages will be unstable until the code matures and moves to stable packages.

Merging the branches should simplify the evolution of old and new features, and also give users a sneak peek of upcoming or experimental features.

Ordered collections

Will capsule also provide sets/maps that maintain insertion order, or is this out of scope? And will it provide lists? Sorry for asking these questions here, didn't know where else to ask.
