dataframe

Apache 2.0

Another dataframe library for Java, inspired by Tablesaw, built on nio buffers.

To add a dependency on dataframe using Maven, use the following:

<dependency>
  <groupId>tech.bitey</groupId>
  <artifactId>dataframe</artifactId>
  <version>1.2.11</version>
</dependency>

Requires Java 17 or higher. The last version supporting Java 11 was 1.1.7.

What's different about this dataframe library?

  • It's geared towards Java backend developers who need to ship tabular data around, rather than towards data science. This is not Pandas for Java.
  • Data is stored in ByteBuffers, so the dataframes can read/write to Channels with minimal overhead (save to files, send over network).
  • Optimized for space. For example, booleans take one bit each, and DateTimes take one long (with microsecond precision).
  • Nulls are stored in a separate bitset (also backed by ByteBuffer), taking up two bits per element of the Column. No extra space is used if all values are non-null.
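
To make the storage idea concrete, here is a minimal sketch in plain JDK code of packing booleans one bit each into a ByteBuffer and moving that buffer through a Channel. It is not the library's implementation: the class BitPackedBooleans and its methods are hypothetical names invented for illustration, and the bit order within each byte is an assumption.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; not part of the dataframe library. */
final class BitPackedBooleans {

    /** Packs one boolean per bit, eight values per byte (lowest bit first). */
    static ByteBuffer pack(boolean[] values) {
        ByteBuffer buf = ByteBuffer.allocate((values.length + 7) / 8);
        for (int i = 0; i < values.length; i++) {
            if (values[i]) {
                int byteIndex = i >>> 3;
                buf.put(byteIndex, (byte) (buf.get(byteIndex) | (1 << (i & 7))));
            }
        }
        // Only absolute puts were used, so position is still 0.
        return buf;
    }

    /** Reads the bit at the given logical index. */
    static boolean get(ByteBuffer buf, int index) {
        return (buf.get(index >>> 3) & (1 << (index & 7))) != 0;
    }

    /** Writes the buffer to a file through a Channel, then reads it back. */
    static ByteBuffer roundTrip(ByteBuffer buf, Path file) throws IOException {
        try (FileChannel out = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer src = buf.duplicate(); // leave the caller's position untouched
            while (src.hasRemaining()) {
                out.write(src);
            }
        }
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer dst = ByteBuffer.allocate((int) in.size());
            while (dst.hasRemaining() && in.read(dst) >= 0) {
                // keep reading until the buffer is full or the channel is exhausted
            }
            dst.flip();
            return dst;
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] flags = {true, false, true, true, false, false, false, true, true, false};
        ByteBuffer packed = pack(flags);
        Path tmp = Files.createTempFile("flags", ".bin");
        try {
            ByteBuffer restored = roundTrip(packed, tmp);
            System.out.println(packed.capacity() + " bytes for " + flags.length + " booleans");
            System.out.println("flag 2 after round trip: " + get(restored, 2));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because the in-memory layout is already the on-disk layout, saving or sending the data is a plain buffer write with no per-value serialization step.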

Features

  • Supports the most common types: String, int, long, short, byte, boolean, double, float, Date, DateTime, and BigDecimal; as well as Time, UUID, Instant, and InputStream.
  • Column and DataFrame are immutable. Columns can be created from collections, arrays, streams, or with builders.
  • Read/write to File or Channel with minimal overhead. Supports memory-mapping from a file.
  • Read/write CSV files
  • Read from ResultSet, write with PreparedStatement
  • Backing ByteBuffers can be on-heap or off-heap, controlled by a global system property: -Dtech.bitey.allocateDirect=true or false (defaults to false).
  • Column implements List. DataFrame implements List<Row>. If the DataFrame has a key column, it can be viewed as a NavigableMap<T, Row>.
  • Basic filtering, joining, grouping
  • No additional dependencies
  • Extensive testing
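
Two of the features above, the heap/off-heap switch and a Column that implements List, can be sketched with the JDK alone. The property name tech.bitey.allocateDirect comes from the list above; everything else (the class BufferBackedInts and its methods) is a hypothetical stand-in, not the library's own Column type or allocation code.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.util.AbstractList;
import java.util.List;

/** Hypothetical illustration only; not part of the dataframe library. */
final class BufferBackedInts {

    /** Chooses heap or direct storage from the system property; absent means heap. */
    static ByteBuffer allocate(int bytes) {
        boolean direct = Boolean.getBoolean("tech.bitey.allocateDirect");
        return direct ? ByteBuffer.allocateDirect(bytes) : ByteBuffer.allocate(bytes);
    }

    /** Copies the values into a buffer and exposes them as an unmodifiable List. */
    static List<Integer> listOf(int... values) {
        IntBuffer ints = allocate(values.length * Integer.BYTES).asIntBuffer();
        ints.put(values);
        // AbstractList leaves set/add/remove unsupported, so the view is read-only.
        return new AbstractList<Integer>() {
            @Override
            public Integer get(int index) {
                return ints.get(index); // absolute read; throws if index is out of range
            }

            @Override
            public int size() {
                return ints.capacity();
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> column = listOf(3, 1, 4);
        System.out.println(column);
        System.out.println("contains 4: " + column.contains(4));
    }
}
```

Running with -Dtech.bitey.allocateDirect=true switches this sketch to a direct buffer without any change to the calling code, which is the appeal of making the choice a global property.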

Sample Use Cases

  • Great as a ResultSet cache. Have an expensive query that needs to run every time your app starts and it's slowing down your development? Cache it locally on disk in a DataFrame! Because DataFrame can be viewed as a ResultSet, you can plug it into existing code with minimal changes. Or cache it in a service and pull it over the network (reads/writes directly to Channel).
  • Great for generating Excel reports via POI. Stage the data in a DataFrame first, then write POI code against the DataFrame. This separates concerns and is easier than writing POI directly against a ResultSet.

dataframe's Issues

Left or right join on more than one column?

For performing an inner join, there's this method:

DataFrame join(DataFrame df, String[] leftColumnNames, String[] rightColumnNames);

And for left/right joins we have:

DataFrame joinManyToOne(DataFrame df, String leftColumnName);
DataFrame joinOneToMany(DataFrame df, String rightColumnName);
DataFrame joinLeftOneToMany(DataFrame df, String rightColumnName);

But those methods only allow specifying one join column. Is there a way to perform a left/right join on more than one column?
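
To make the request concrete, here is a sketch of the semantics being asked for: a left join keyed on more than one column, written against plain Java collections. It does not use or describe the dataframe API. The class MultiColumnLeftJoin, the row-as-Map representation, and the sample column names are all hypothetical, and the sketch assumes each key combination appears at most once on the right side.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the dataframe library. */
final class MultiColumnLeftJoin {

    /**
     * Left join of rows represented as maps from column name to value.
     * Every left row is kept; right-side columns are added when all key columns match.
     */
    static List<Map<String, Object>> leftJoin(List<Map<String, Object>> left,
                                              List<Map<String, Object>> right,
                                              String[] leftKeys,
                                              String[] rightKeys) {
        // Index the right side by its composite key (a List compares element-wise).
        Map<List<Object>, Map<String, Object>> index = new HashMap<>();
        for (Map<String, Object> row : right) {
            index.put(key(row, rightKeys), row);
        }

        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : left) {
            Map<String, Object> joined = new LinkedHashMap<>(row);
            Map<String, Object> match = index.get(key(row, leftKeys));
            if (match != null) {
                // On a column-name clash the left value wins.
                match.forEach(joined::putIfAbsent);
            }
            result.add(joined);
        }
        return result;
    }

    /** Builds the composite key from the named columns, in order. */
    private static List<Object> key(Map<String, Object> row, String[] columns) {
        List<Object> key = new ArrayList<>(columns.length);
        for (String column : columns) {
            key.add(row.get(column));
        }
        return key;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> sales = List.of(
                Map.<String, Object>of("year", 2023, "region", "EU", "sales", 10),
                Map.<String, Object>of("year", 2023, "region", "US", "sales", 20));
        List<Map<String, Object>> targets = List.of(
                Map.<String, Object>of("yr", 2023, "area", "EU", "target", 12));

        List<Map<String, Object>> joined = leftJoin(sales, targets,
                new String[] {"year", "region"}, new String[] {"yr", "area"});
        for (Map<String, Object> row : joined) {
            System.out.println(row.get("region") + " target=" + row.get("target"));
        }
    }
}
```

The unmatched US row is still present in the result, only without the right-side columns, which is the behaviour a left join on two columns would need.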
