GeneratedTables

About

This is an experimental package to prototype data tables using @Generated types. Currently it only functions on Julia v0.4. If we can get generated types to work in Julia v0.5, then I may port TypedTables.jl over to this new formalism.

Overview

The idea is that a Table contains the columns as its fields. We can access the columns with a standard field reference, like:

using GeneratedTables
table = Table{(:FirstName, :LastName, :DOB)}(fnames, lnames, dobs)
table.FirstName[3] # == fnames[3]

The GeneratedTypes.jl package makes it possible to define custom fields. The definition of a Table is:

@Generated immutable Table{Names, Types <: Tuple}
    # Plus sanity checks:
    #     - Names is a tuple instance of unique Symbols,
    #     - Types is a Tuple-type containing the field types.

    # One `name::Type` field declaration per column
    exprs = [:( $(Names[i])::$(Types.parameters[i]) ) for i = 1:length(Names)]
    return Expr(:block, exprs...)
end

This addresses both the overly verbose syntax of TypedTables.jl and the speed problems of DataFrames.jl (that is, those seen when using a naive, row-by-row approach).
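To make the field-generation step concrete, here is a minimal sketch of just the expression-building part, run outside of any `@Generated` definition. It assumes a recent Julia 1.x and does not use GeneratedTables or GeneratedTypes.jl; the `Names` and `Types` values are hypothetical example inputs.

```julia
# Hypothetical example inputs: a tuple of unique Symbols and a matching Tuple type.
Names = (:FirstName, :LastName, :DOB)
Types = Tuple{Vector{String}, Vector{String}, Vector{Int}}

# Build one `name::Type` expression per column, pairing each name
# with the type parameter at the same position.
exprs = [:( $(Names[i])::$(Types.parameters[i]) ) for i in 1:length(Names)]

# Wrap the field declarations in a single block expression.
fields = Expr(:block, exprs...)

println(fields)
```

Each element of `exprs` is a `::` expression whose first argument is the column name and whose second is the column's storage type, which is the shape a field declaration takes inside a type definition.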

Methods of construction

We can create a variety of table elements, including Cell, Column, Row and Table. Upon construction, only the field name is necessary (the type is inferred):

julia> using GeneratedTables

julia> cell = Cell{:a}(1)
Cell:
 ┌───┐
 │ a │
 ├───┤
 │ 1 │
 └───┘

julia> fieldnames(cell)
1-element Array{Symbol,1}:
 :a

julia> cell.a
1

Columns are like Cells in that they have only one field, but they expect a container with Vector-like capabilities as input:

julia> col = Column{:a}([1, 2, 3])
3-row Column:
    ╒═══╕
Row │ a │
    ├───┤
  1 │ 1 │
  2 │ 2 │
  3 │ 3 │
    ╘═══╛

julia> col.a
3-element Array{Int64,1}:
 1
 2
 3

julia> col[2]
2

These containers support common operations like iteration, indexing, push!, etc. We can use vcat to build Columns out of Cells (or other Columns):

julia> vcat(cell, cell)
2-row Column:
    ╒═══╕
Row │ a │
    ├───┤
  1 │ 1 │
  2 │ 1 │
    ╘═══╛

Rows contain multiple fields, indicated by a tuple of symbols, like:

julia> row = Row{(:a, :b, :c)}(1, 2.0, true)
3-element Row:
 ╓───┬────────┬───╖
 ║ a │ b      │ c ║
 ╟───┼────────┼───╢
 ║ 1 │ 2.0000 │ T ║
 ╙───┴────────┴───╜

julia> fieldnames(row)
3-element Array{Symbol,1}:
 :a
 :b
 :c

julia> row.b
2.0

They can also be constructed by an hcat of Cells:

julia> hcat(Cell{:a}(1), Cell{:b}(2.0), Cell{:c}(true))
3-element Row:
 ╓───┬────────┬───╖
 ║ a │ b      │ c ║
 ╟───┼────────┼───╢
 ║ 1 │ 2.0000 │ T ║
 ╙───┴────────┴───╜

Finally, Tables are containers with multiple rows and columns. Their fields can be whatever storage you prefer:

julia> t = Table{(:a,:b,:c)}([1,2,3], [2.0,4.0,6.0],[true,false,false])
3-row × 3-column Table:
    ╔═══╤════════╤═══╗
Row ║ a │ b      │ c ║
    ╟───┼────────┼───╢
  1 ║ 1 │ 2.0000 │ T ║
  2 ║ 2 │ 4.0000 │ F ║
  3 ║ 3 │ 6.0000 │ F ║
    ╚═══╧════════╧═══╝

julia> t.c
3-element Array{Bool,1}:
  true
 false
 false

julia> t[2]
3-element Row:
 ╓───┬────────┬───╖
 ║ a │ b      │ c ║
 ╟───┼────────┼───╢
 ║ 2 │ 4.0000 │ F ║
 ╙───┴────────┴───╜

Semantically, they follow the convention that they are a storage vector of Rows (e.g. upon indexing or iteration), although in-memory they are stored as separate columns of data. (In the future, we may also introduce a DenseTable or similar which is precisely an in-memory Vector{Row{...}}).
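The split between row-wise semantics and column-wise storage can be sketched in a few lines. The following is only an illustration under stated assumptions: it targets a recent Julia 1.x, uses a NamedTuple of vectors in place of generated fields, and the `SketchTable` type is hypothetical, not part of GeneratedTables.

```julia
# Hypothetical stand-in for a Table: columns are stored separately,
# one vector per column, inside a NamedTuple.
struct SketchTable{C<:NamedTuple}
    columns::C
end

# The number of rows is the length of any column (here, the first).
Base.length(t::SketchTable) = length(first(t.columns))

# Indexing assembles a row on demand from the i-th element of every column.
# `map` over a NamedTuple keeps the column names.
Base.getindex(t::SketchTable, i::Integer) = map(col -> col[i], t.columns)

# Iteration yields rows, so the table behaves like a vector of rows.
function Base.iterate(t::SketchTable, i::Int = 1)
    return i > length(t) ? nothing : (t[i], i + 1)
end

t = SketchTable((a = [1, 2, 3], b = [2.0, 4.0, 6.0], c = [true, false, false]))

row = t[2]               # a NamedTuple holding the second entry of each column
col = t.columns.c        # the stored Bool column itself, with no copying
```

Reading a column returns the stored vector directly, while reading a row builds a small immutable value from the stored columns.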

Future work

I'm still working hard on supporting common data table capabilities like selecting, mutating, filtering, sorting, and joining. Many of these are already quasi-supported by Julia's inbuilt functions (e.g. try filter() on a Table using a function that maps Rows to Bool).
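As an illustration of filtering with a function that maps rows to Bool, here is a minimal sketch over column-stored data. It assumes a recent Julia 1.x and uses a plain NamedTuple of vectors rather than the package's Table; `rowat` and `filterrows` are hypothetical helper names introduced only for this example.

```julia
# Column-stored data: one vector per column (a stand-in for a Table).
columns = (a = [1, 2, 3], b = [2.0, 4.0, 6.0], c = [true, false, false])

# Hypothetical helper: assemble row i from the i-th element of each column.
rowat(cols, i) = map(col -> col[i], cols)

# Hypothetical helper: keep only the rows for which `pred` returns true.
function filterrows(pred, cols)
    nrows = length(first(cols))
    keep = [pred(rowat(cols, i)) for i in 1:nrows]   # Bool mask, one entry per row
    return map(col -> col[keep], cols)               # apply the mask to every column
end

# The predicate sees a whole row and can refer to fields by name.
filtered = filterrows(row -> row.b > 3.0, columns)
```

The predicate is evaluated once per row, and the resulting mask is then applied to each column, so the result stays in column-wise storage.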

A "complete" solution would include a well-thought-out hashing and/or sorting scheme that may be leveraged by different types of join, or by tables with one (or more) keys made up of one (or more) columns.
