metis's Introduction

Metis

Tools for massively parallel and multi-variate data exploration.

Quickly build interactive visualizations powered by the speed of HeavyDB.

Data Layer

Modules for building declarative and cross-filtering data pipelines

View Layer

Modules for bootstrapping a multi-dimensional visualization framework

Thrift Layer

Modules for utilizing the MapD Core backend via the Thrift protocol

License

This project is licensed under the Apache License, Version 2.0.

metis's People

Contributors

bmatcuk, clhenrick, cmatzenbach, domoritz, jonvuri, jp-harvey, mapd-bot, mrblueblue, nytai, thomasg77, thomasoniii, tmostak, uyanga-gb, vrajpandya


metis's Issues

Explore Use of Vega-Lite as Higher-Level API

Vega-Lite provides a higher-level visualization grammar that ties together encodings and data transformations.

It would be useful to explore how that grammar maps directly to Vega encodings and the mapd-data-layer transformations.

One possible outcome of this exploration is defining a higher-level parser for Vega-Lite specifications. This parser would translate a Vega-Lite spec into a Vega spec and a data transform spec (to be used by the data layer).
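As a rough sketch (the spec shapes and field names below are illustrative assumptions, not an existing API), a Vega-Lite style encoding could be split into a data-layer transform spec plus a lower-level Vega spec:

// Hypothetical input: a Vega-Lite style spec (illustrative only)
const vlSpec = {
  data: { source: "contributions" },
  mark: "bar",
  encoding: {
    x: { field: "recipient_party", type: "nominal" },
    y: { aggregate: "average", field: "amount", type: "quantitative" }
  }
}

// A higher-level parser could emit a data transform spec for the data layer...
const dataTransformSpec = [
  {
    type: "aggregate",
    fields: ["amount"],
    ops: ["average"],
    groupby: "recipient_party"
  }
]

// ...plus a Vega spec whose marks encode the aggregated rows.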

Settle on Data Node Constructor API

Current API

const graph = createGraph()

const root = graph.data({
  source: "flights",
  name: "root"
})

const child = graph.data({
  source: "root",
  name: "child",
  transform: [
    {
      type: "filter",
      expr: "recipient_party = 'D'"
    }
  ]
})

const grandchild = graph.data({
  source: "child",
  name: "grandchild",
  transform: [
    {
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }
  ]
})

Proposed API 1

const graph = createGraph()

const root = graph.createRoot({
  source: "flights",
  name: "root"
})

const child = root.createChild({
  name: "child",
  transform: [
    {
      type: "filter",
      expr: "recipient_party = 'D'"
    }
  ]
})

const grandchild = child.createChild({
  name: "grandchild",
  transform: [
    {
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }
  ]
})

Proposed API 2

const graph = createGraph()

const root = graph.createRoot({
  source: "flights",
  name: "root"
})

const childNode = createNode({
  name: "child",
  transform: [
    {
      type: "filter",
      expr: "recipient_party = 'D'"
    }
  ]
})

const grandchildNode = createNode({
  name: "grandchild",
  transform: [
    {
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }
  ]
})

const child = root.pushChild(childNode)
const grandchild = child.pushChild(grandchildNode)

The two API proposals seem compatible as well.
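For example (assuming both proposals were implemented), the two styles could be mixed freely, since each ultimately attaches a child node to its parent:

// Sketch only: mixing the two proposed styles on the same graph.
const child = root.createChild({
  name: "child",
  transform: [{ type: "filter", expr: "recipient_party = 'D'" }]
})

const grandchild = child.pushChild(createNode({
  name: "grandchild",
  transform: [{
    type: "aggregate",
    fields: ["*", "amount"],
    ops: ["average", "average"],
    groupby: "recipient_party"
  }]
}))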

Graph State is currently represented as:

const state = {
  root: {
    source: "flights",
    name: "root"
  },
  child: {
    source: "root",
    name: "child",
    transform: [
      {
        type: "filter",
        expr: "recipient_party = 'D'"
      }
    ]
  },
  grandchild: {
    source: "child",
    name: "grandchild",
    transform: [
      {
        type: "aggregate",
        fields: ["*", "amount"],
        ops: ["average", "average"],
        groupby: "recipient_party"
      }
    ]
  }
}

Under either of the two proposals it would be represented as:

const state = {
  root: {
    source: "flights",
    name: "root",
    children: [
      child
    ]
  }
}

const childState = {
  source: root,
  name: "child",
  transform: [
    {
      type: "filter",
      expr: "recipient_party = 'D'"
    }
  ],
  children: [
    grandchild
  ]
}

const grandchildState = {
  source: child,
  name: "grandchild",
  transform: [
    {
      type: "aggregate",
      fields: ["*", "amount"],
      ops: ["average", "average"],
      groupby: "recipient_party"
    }
  ],
  children: []
}

Support Project Transform in Favor of "Formula" Transform

The Project transform will essentially replace the Formula transform.

It will support string expressions as well as object-type expressions.

Expressions can also be specified as an array:

{
  type: "project",
  expr: Array<string | Expression> | string | Expression,
  as?: Array<string> | string
}
{
  type: "project",
  expr: {
     type: "date_trunc",
     unit: "month",
     field: "tweet_time",
     as: "key0"
  }
}
// SELECT date_trunc(month, tweet_time) as key0
{
  type: "project"
  expr: ["conv(lon)", "conv(lat)", "lang", "followers"],
  as: ["x", "y", "size", "color"]
}
// SELECT conv(lon) as x, conv(lat) as y, lang as size, followers as color

Support Multiplicative Sampling Transform

{
  type: "sample",
  method: "multiplicative",
  size: number,
  limit: number
}
const ratio = Math.min(limit/size, 1.0)
const threshold = Math.floor(4294967296 * ratio)

`MOD(${table}.rowid * 265445761, 4294967296) < ${threshold}`
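A minimal sketch of how a SQL writer might render this transform (writeSampleTransform is a hypothetical helper, not an existing function):

// Hypothetical helper: turns the proposed sample transform into a WHERE predicate.
function writeSampleTransform(table, transform) {
  const { size, limit } = transform
  const ratio = Math.min(limit / size, 1.0)
  const threshold = Math.floor(4294967296 * ratio)
  return `MOD(${table}.rowid * 265445761, 4294967296) < ${threshold}`
}

// writeSampleTransform("flights", { type: "sample", method: "multiplicative", size: 7000000, limit: 2000000 })
// => "MOD(flights.rowid * 265445761, 4294967296) < 1227133513"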

Support Joins

SELECT ticker_subticker_map.ticker AS ticker,
       end_month_date,
       AVG(avg_amount) AS aov,
       COUNT(DISTINCT(final_transactions.resolved_mem_id)) AS num_buyers,
       COUNT(final_transactions.resolved_mem_id) AS num_purchases
FROM final_transactions
JOIN cohort_members_true AS coh
  ON coh.resolved_mem_id = final_transactions.resolved_mem_id
JOIN ticker_subticker_map
  ON ticker_subticker_map.subticker = final_transactions.ticker
  AND date_date >= COALESCE(ticker_subticker_map.acquisition_date, date_date)
JOIN (SELECT MIN(start_week_date) AS start_week_date,
             MAX(end_week_date) AS end_week_date,
             end_month_date
      FROM calendar_months
      WHERE start_week_date >= '2014-01-01'
      GROUP BY end_month_date) AS tw
  ON date_date BETWEEN tw.start_week_date AND tw.end_week_date
WHERE 1=1
  AND final_transactions.date_date >= '2014-01-01'
  AND final_transactions.transaction_base_type = 'debit'
  AND date_date > coh.birth_month
GROUP BY ticker_subticker_map.ticker, end_month_date
LIMIT 10;
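One hypothetical shape for a join transform that could express queries like the one above, following the existing expression-object style (none of these fields exist in the current API):

// Hypothetical sketch only: a join transform joining another table onto the source.
{
  type: "join",
  table: "cohort_members_true",
  as: "coh",
  on: {
    type: "=",
    left: "coh.resolved_mem_id",
    right: "final_transactions.resolved_mem_id"
  }
}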

Improve Crossfilter/ResolveFilter API

Currently "crossfiltering" behavior is implemented through the transforms Crossfilter and ResolveFilter.

The Crossfilter transform represents a set of filter transformations that should be applied to child nodes. These filters only get applied when the child nodes explicitly allowed them through the ResolveFilter transform .

For instance, a parent can have this Crossfilter transform

const xfilterDataNode = graph.data({
  source: "flights_donotmodify",
  name: "xfilter",
  transform: [
    {
      type: "crossfilter",
      signal: "xfilter",
      filter: [
        {
           type: "filter",
           id: "amount-filter",
           expr: {
              type: "between"
              field: "amount",
              left: 50,
              right: 100
           }
        },
        {
           type: "filter",
           id: "party-filter",
           expr: {
              type: "="
              left: "party",
              right: "D"
           }
        }
      ]
    }
  ]
});

And a child can resolve it like so (ignoring the party-filter)

const childDataNode = graph.data({
  source: "xfilter",
  name: "child",
  transform: [
    {
      type: "resolveFilter",
      filter: { signal: "xfilter" },
      ignore: ["party-filter"]
    }
  ]
});

Open to any other possible ideas.

data-layer: SQL parser should escape single quotes in string values

String values containing single quotes that are passed to mapd-data-layer, such as in a SQL CASE statement, currently do not have their single quotes escaped, so a value like 'Chicago O'Hare International' produces a malformed SQL query.

For example:

CASE 
WHEN origin_name IN 
('Chicago O'Hare International','William B Hartsfield-Atlanta Intl','Dallas-Fort Worth International','Los Angeles International','Phoenix Sky Harbor International') 
THEN origin_name 
ELSE 'undefined' 
END AS key1
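A minimal sketch of the expected fix, assuming the standard SQL convention of doubling single quotes (escapeQuotes is an illustrative name, not an existing function):

// Hypothetical helper: escape single quotes before embedding a value in SQL.
function escapeQuotes(value) {
  return value.replace(/'/g, "''")
}

// escapeQuotes("Chicago O'Hare International")
// => "Chicago O''Hare International"
// which yields: ... IN ('Chicago O''Hare International', ...) ...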

Implement Relation Builder API as Node Method

The general idea is to add helper methods to dataNode instances for constructing and setting transform objects.

For instance, this:

// extract and between would be expression creators
node.project("key1", extract("day", "contrib_date"))
node.filter(between("amount", [0, 100]))

would be equivalent to:

node.transform({
  type: "project",
  expr: {
    type: "extract",
    unit: "day",
    field: "contrib_date"
  },
  as: "key1"
})

node.transform({
  type: "filter",
  expr: {
    type: "between",
    field: "amount",
    left: 0,
    right: 100
  }
})
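A minimal sketch of how such helpers might be implemented on top of the existing transform method (the expression creators and internals below are assumptions, not the actual implementation):

// Hypothetical expression creators
const extract = (unit, field) => ({ type: "extract", unit, field })
const between = (field, [left, right]) => ({ type: "between", field, left, right })

// Hypothetical node helpers that simply wrap node.transform
node.project = function (as, expr) {
  return this.transform({ type: "project", expr, as })
}

node.filter = function (expr) {
  return this.transform({ type: "filter", expr })
}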

Support Subquery Wherever There Can Be an Expression

A good test case:

select count(*) from (
    select distinct user_b from twitter_edges where user_a in (
        select distinct user_b from twitter_edges where user_a in (
            select distinct user_b from twitter_edges where user_a in (
                select distinct user_b from twitter_edges where user_a = '40981798'
            )
        )
    )
)
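One hypothetical way a subquery could appear wherever an expression is allowed is as a nested data spec inside an "in" expression (the field names below are illustrative only, not an existing API):

// Hypothetical sketch only: the innermost IN clause of the query above as a
// filter whose right-hand side is itself a data/transform spec.
{
  type: "filter",
  expr: {
    type: "in",
    expr: "user_a",
    set: {
      source: "twitter_edges",
      transform: [
        { type: "project", expr: "user_b" },
        { type: "filter", expr: "user_a = '40981798'" }
      ]
    }
  }
}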

Extensible and Modular Parser and Writer

The goal of this feature is to expose the SQL writer as a module that can be extended by the user.

The user would be able to declare a new type of transform or expression by registering a "definition" of it, along with a function that parses it.

const writer = createSQLWriter()
writer.registerParser(typeDef, parser)
writer.writeSQL(DataState)

This would be the same writer module used internally by the graph instance.

const graph = createGraph()
graph.getWriter().registerParser(transformDef, parser)
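As a sketch of what a registered definition and parser might look like under this proposal (the shapes below are assumptions about what registerParser might accept):

// Hypothetical: register a custom "sample" transform with the writer used by the graph.
const sampleDef = { type: "sample" }

const parseSample = (transform, context) =>
  // the SQL fragment this transform contributes to the final query
  `MOD(${context.table}.rowid * 265445761, 4294967296) < ${transform.threshold}`

graph.getWriter().registerParser(sampleDef, parseSample)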
