metis's Introduction

Metis

Tools for massively parallel and multi-variate data exploration.

Quickly build interactive visualizations powered by the speed of HeavyDB.

Data Layer

Modules for building declarative and cross-filtering data pipelines

View Layer

Modules for bootstrapping a multi-dimensional visualization framework

Thrift Layer

Modules for utilizing the MapD Core backend via the Thrift protocol

License

This project is licensed under the Apache License, Version 2.0.

metis's People

Contributors

bmatcuk, clhenrick, cmatzenbach, domoritz, jonvuri, jp-harvey, mapd-bot, mrblueblue, nytai, thomasg77, thomasoniii, tmostak, uyanga-gb, vrajpandya


metis's Issues

Explore Use of Vega-Lite as Higher-Level API

Vega-Lite provides a higher-level visualization grammar that ties together encodings and data transformations.

It would be useful to explore how that grammar maps directly to Vega encodings and the mapd-data-layer transformations.

One possible outcome of this exploration is defining a higher-level parser for Vega-Lite specifications. This parser would translate a Vega-Lite spec into a Vega spec and a data transform spec (to be used by the data layer).
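As a rough sketch (the spec shapes and field names below are illustrative assumptions, not an existing API), a Vega-Lite style encoding could be split into a data-layer transform spec plus a lower-level Vega spec:

// Hypothetical input: a Vega-Lite style spec (illustrative only)
const vlSpec = {
  data: { source: "contributions" },
  mark: "bar",
  encoding: {
    x: { field: "recipient_party", type: "nominal" },
    y: { aggregate: "average", field: "amount", type: "quantitative" }
  }
}

// A higher-level parser could emit a data transform spec for the data layer...
const dataTransformSpec = [
  {
    type: "aggregate",
    fields: ["amount"],
    ops: ["average"],
    groupby: "recipient_party"
  }
]

// ...plus a Vega spec whose marks encode the aggregated rows.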

Settle on Data Node Constructor API

Current API

const graph = createGraph()

const root = graph.data({
  source: "flights",
  name: "root"
})

const child = graph.data({
  source: "root",
  name: "child",
  transform: [
    {
      type: "filter",
      expr: "recipient_party = 'D'"
    }
  ]
})

const grandchild = graph.data({
  source: "child",
  name: "grandchild",
  transform: [
    {
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }
  ]
})

Proposed API 1

const graph = createGraph()

const root = graph.createRoot({
  source: "flights",
  name: "root"
})

const child = root.createChild({
  name: "child",
  transform: [
    {
      type: "filter",
      expr: "recipient_party = 'D'"
    }
  ]
})

const grandchild = child.createChild({
  name: "grandchild",
  transform: [
    {
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }
  ]
})

Proposed API 2

const graph = createGraph()

const root = graph.createRoot({
  source: "flights",
  name: "root"
})

const childNode = createNode({
  name: "child",
  transform: [
    {
      type: "filter",
      expr: "recipient_party = 'D'"
    }
  ]
})

const grandchildNode = createNode({
  name: "grandchild",
  transform: [
    {
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }
  ]
})

const child = root.pushChild(childNode)
const grandchild = child.pushChild(grandchildNode)

The two API proposals seem compatible as well.
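For example (assuming both proposals were implemented), the two styles could be mixed freely, since each ultimately attaches a child node to its parent:

// Sketch only: mixing the two proposed styles on the same graph.
const child = root.createChild({
  name: "child",
  transform: [{ type: "filter", expr: "recipient_party = 'D'" }]
})

const grandchild = child.pushChild(createNode({
  name: "grandchild",
  transform: [{
    type: "aggregate",
    fields: ["*", "amount"],
    ops: ["average", "average"],
    groupby: "recipient_party"
  }]
}))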

Graph State is currently represented as:

const state = {
  root: {
    source: "flights",
    name: "root"
  },
  child: {
    source: "root",
    name: "child",
    transform: [
      {
        type: "filter",
        expr: "recipient_party = 'D'"
      }
    ]
  },
  grandchild: {
    source: "child",
    name: "grandchild",
    transform: [
      {
        type: "aggregate",
        fields: ["*", "amount"],
        ops: ["average", "average"],
        groupby: "recipient_party"
      }
    ]
  }
}

Under either of the two proposals it would be represented as:

const state = {
  root: {
    source: "flights",
    name: "root",
    children: [
      child
    ]
  }
}

const childState = {
  source: root,
  name: "child",
  transform: [
    {
      type: "filter",
      expr: "recipient_party = 'D'"
    }
  ],
  children: [
    grandchild
  ]
}

const grandchildState = {
  source: child,
  name: "grandchild",
  transform: [
    {
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }
  ],
  children: []
}

Support Project Transform in Favor of "Formula" Transform

The Project transform will essentially replace the Formula transform.

It will support string expressions as well as object-type expressions.

Expressions can also be specified as an array:

{
  type: "project",
  expr: Array<string | Expression> | string | Expression,
  as?: Array<string> | string
}
{
  type: "project",
  expr: {
     type: "date_trunc",
     unit: "month",
     field: "tweet_time",
     as: "key0"
  }
}
// SELECT date_trunc(month, tweet_time) as key0
{
  type: "project"
  expr: ["conv(lon)", "conv(lat)", "lang", "followers"],
  as: ["x", "y", "size", "color"]
}
// SELECT conv(lon) as x, conv(lat) as y, lang as size, followers as color

Support Multiplicative Sampling Transform

{
  type: "sample",
  method: "multiplicative",
  size: number,
  limit: number
}
const ratio = Math.min(limit/size, 1.0)
const threshold = Math.floor(4294967296 * ratio)

`MOD(${table}.rowid * 265445761, 4294967296) < ${threshold}`
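A minimal sketch of how a SQL writer might render this transform (writeSampleTransform is a hypothetical helper, not an existing function):

// Hypothetical helper: turns the proposed sample transform into a WHERE predicate.
function writeSampleTransform(table, transform) {
  const { size, limit } = transform
  const ratio = Math.min(limit / size, 1.0)
  const threshold = Math.floor(4294967296 * ratio)
  return `MOD(${table}.rowid * 265445761, 4294967296) < ${threshold}`
}

// writeSampleTransform("flights", { type: "sample", method: "multiplicative", size: 7000000, limit: 2000000 })
// => "MOD(flights.rowid * 265445761, 4294967296) < 1227133513"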

Support Joins

SELECT ticker_subticker_map.ticker AS ticker,
       end_month_date,
       AVG(avg_amount) AS aov,
       COUNT(DISTINCT(final_transactions.resolved_mem_id)) AS num_buyers,
       COUNT(final_transactions.resolved_mem_id) AS num_purchases
FROM final_transactions
JOIN cohort_members_true AS coh
  ON coh.resolved_mem_id = final_transactions.resolved_mem_id
JOIN ticker_subticker_map
  ON ticker_subticker_map.subticker = final_transactions.ticker
  AND date_date >= COALESCE(ticker_subticker_map.acquisition_date, date_date)
JOIN (SELECT MIN(start_week_date) AS start_week_date,
             MAX(end_week_date) AS end_week_date,
             end_month_date
      FROM calendar_months
      WHERE start_week_date >= '2014-01-01'
      GROUP BY end_month_date) AS tw
  ON date_date BETWEEN tw.start_week_date AND tw.end_week_date
WHERE 1=1
  AND final_transactions.date_date >= '2014-01-01'
  AND final_transactions.transaction_base_type = 'debit'
  AND date_date > coh.birth_month
GROUP BY ticker_subticker_map.ticker, end_month_date
LIMIT 10;
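One hypothetical shape for a join transform that could express queries like the one above, following the existing expression-object style (none of these fields exist in the current API):

// Hypothetical sketch only: a join transform joining another table onto the source.
{
  type: "join",
  table: "cohort_members_true",
  as: "coh",
  on: {
    type: "=",
    left: "coh.resolved_mem_id",
    right: "final_transactions.resolved_mem_id"
  }
}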

Improve Crossfilter/ResolveFilter API

Currently "crossfiltering" behavior is implemented through the transforms Crossfilter and ResolveFilter.

The Crossfilter transform represents a set of filter transformations that should be applied to child nodes. These filters only get applied when the child nodes explicitly allowed them through the ResolveFilter transform .

For instance, a parent can have this Crossfilter transform

const xfilterDataNode = graph.data({
  source: "flights_donotmodify",
  name: "xfilter",
  transform: [
    {
      type: "crossfilter",
      signal: "xfilter",
      filter: [
        {
           type: "filter",
           id: "amount-filter",
           expr: {
              type: "between"
              field: "amount",
              left: 50,
              right: 100
           }
        },
        {
           type: "filter",
           id: "party-filter",
           expr: {
              type: "="
              left: "party",
              right: "D"
           }
        }
      ]
    }
  ]
});

And a child can resolve it like so (ignoring the party-filter)

const childDataNode = graph.data({
  source: "xfilter",
  name: "child",
  transform: [
    {
      type: "resolveFilter",
      filter: { signal: "xfilter" },
      ignore: ["party-filter"]
    }
  ]
});

Open to any other possible ideas.

data-layer: SQL parser should escape single quotes in string values

String values containing single quotes that are passed to mapd-data-layer, such as in a SQL CASE statement, currently do not have their single quotes escaped, so a value like 'Chicago O'Hare International' produces a malformed SQL query.

For example:

CASE 
WHEN origin_name IN 
('Chicago O'Hare International','William B Hartsfield-Atlanta Intl','Dallas-Fort Worth International','Los Angeles International','Phoenix Sky Harbor International') 
THEN origin_name 
ELSE 'undefined' 
END AS key1
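A minimal sketch of the expected fix, assuming the standard SQL convention of doubling single quotes (escapeQuotes is an illustrative name, not an existing function):

// Hypothetical helper: escape single quotes before embedding a value in SQL.
function escapeQuotes(value) {
  return value.replace(/'/g, "''")
}

// escapeQuotes("Chicago O'Hare International")
// => "Chicago O''Hare International"
// which yields: ... IN ('Chicago O''Hare International', ...) ...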

Implement Relation Builder API as Node Method

The general idea is to add helper methods to dataNode instances for constructing and setting transform objects.

For instance, this:

// extract and between would be expression creators
node.project("key1", extract("day", "contrib_date"))
node.filter(between("amount", [0, 100]))

would be equivalent to:

node.transform({
  type: "project",
  expr: {
    type: "extract",
    unit: "day",
    field: "contrib_date"
  },
  as: "key1"
})

node.transform({
  type: "filter",
  expr: {
    type: "between",
    field: "amount",
    left: 0,
    right: 100
  }
})
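A minimal sketch of how such helpers might be implemented on top of the existing transform method (the expression creators and internals below are assumptions, not the actual implementation):

// Hypothetical expression creators
const extract = (unit, field) => ({ type: "extract", unit, field })
const between = (field, [left, right]) => ({ type: "between", field, left, right })

// Hypothetical node helpers that simply wrap node.transform
node.project = function (as, expr) {
  return this.transform({ type: "project", expr, as })
}

node.filter = function (expr) {
  return this.transform({ type: "filter", expr })
}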

Support Subquery Wherever There Can Be an Expression

A good test case:

select count(*) from (
    select distinct user_b from twitter_edges where user_a in (
        select distinct user_b from twitter_edges where user_a in (
            select distinct user_b from twitter_edges where user_a in (
                select distinct user_b from twitter_edges where user_a = '40981798'
            )
        )
    )
)
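One hypothetical way a subquery could appear wherever an expression is allowed is as a nested data spec inside an "in" expression (the field names below are illustrative only, not an existing API):

// Hypothetical sketch only: the innermost IN clause of the query above as a
// filter whose right-hand side is itself a data/transform spec.
{
  type: "filter",
  expr: {
    type: "in",
    expr: "user_a",
    set: {
      source: "twitter_edges",
      transform: [
        { type: "project", expr: "user_b" },
        { type: "filter", expr: "user_a = '40981798'" }
      ]
    }
  }
}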

Extensible and Modular Parser and Writer

The goal of this feature is to expose the SQL writer as a module that can be extended by the user.

The user would be able to declare a new type of transform or expression by registering a "definition" of it, along with a function that parses it.

const writer = createSQLWriter()
writer.registerParser(typeDef, parser)
writer.writeSQL(DataState)

This would be the same writer module used internally by the graph instance.

const graph = createGraph()
graph.getWriter().registerParser(transformDef, parser)
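As a sketch of what a registered definition and parser might look like under this proposal (the shapes below are assumptions about what registerParser might accept):

// Hypothetical: register a custom "sample" transform with the writer used by the graph.
const sampleDef = { type: "sample" }

const parseSample = (transform, context) =>
  // the SQL fragment this transform contributes to the final query
  `MOD(${context.table}.rowid * 265445761, 4294967296) < ${transform.threshold}`

graph.getWriter().registerParser(sampleDef, parseSample)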
