vegas-viz / vegas

The missing MatPlotLib for Scala + Spark

License: MIT License


Vegas

Vegas aims to be the missing MatPlotLib for the Scala and Spark world. Vegas wraps around Vega-Lite but provides syntax more familiar (and type checked) for use within Scala.

Quick start

Add the following jar as an SBT dependency:

libraryDependencies += "org.vegas-viz" %% "vegas" % "{vegas-version}"

And then use the following code to render a plot into a pop-up window (see below for more details on controlling how and where Vegas renders).

import vegas._
import vegas.render.WindowRenderer._

val plot = Vegas("Country Pop").
  withData(
    Seq(
      Map("country" -> "USA", "population" -> 314),
      Map("country" -> "UK", "population" -> 64),
      Map("country" -> "DK", "population" -> 80)
    )
  ).
  encodeX("country", Nom).
  encodeY("population", Quant).
  mark(Bar)

plot.show

[Figure: Readme Chart 1]

See further examples here

Rendering

Vegas provides several options for rendering plots. The primary focus is using Vegas within interactive notebook environments, such as Jupyter and Zeppelin. Rendering is provided via an implicit instance of ShowRender, which tells Vegas how to display the plot in a particular environment. The default instance of ShowRender uses a macro which attempts to guess your environment, but if for some reason that fails, you can specify your own instance:

// for outputting HTML, provide a function String => Unit which will receive the HTML for the plot
// and use vegas.render.ShowHTML to create an instance for it
implicit val renderer = vegas.render.ShowHTML(str => println(s"The HTML is $str"))

// to specify a function that receives the SpecBuilder instead, use vegas.render.ShowRender.using
implicit val renderer = vegas.render.ShowRender.using(sb => println(s"The SpecBuilder is $sb"))

The following examples describe some common cases; these should be handled by the default macro, but are useful to see (in case you need to construct your own instance of ShowRender):

Notebooks

Jupyter - Scala

If you're using jupyter-scala, then you can include the following in your notebook before using Vegas.

import $ivy.`org.vegas-viz::vegas:{vegas-version}`
implicit val render = vegas.render.ShowHTML(publish(_))

Jupyter - Apache Toree

And if you're using Apache Toree, then this:

%AddDeps org.vegas-viz vegas_2.11 {vegas-version} --transitive
implicit val render = vegas.render.ShowHTML(kernel.display.content("text/html", _))

Zeppelin

If you're using Apache Zeppelin:

%dep
z.load("org.vegas-viz:vegas_2.11:{vegas-version}")
implicit val render = vegas.render.ShowHTML(s => print("%html " + s))

The last line in each of the above is required to connect Vegas to the notebook's HTML renderer (so that the returned HTML is rendered instead of displayed as a string).

See the example notebook for a comprehensive list of plots here

Standalone

Vegas can also be used to produce standalone HTML, or to render plots within a built-in display app (useful if you want to display plots from a command-line app).

The construction of the plot is independent from the rendering strategy: the same plot can be rendered as HTML or in a Window simply by importing a different renderer in the scope.

Note that the rendering examples below are wrapped in separate functions to avoid ambiguous implicit conversions if they were imported in the same scope.
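As a rough illustration of why scoping the imports helps, the following plain-Scala sketch confines each implicit to its own method body. The `Renderer` trait, the `HtmlStyle` and `WindowStyle` objects, and the `show` method are all hypothetical names invented for this sketch; they do not model Vegas's actual renderer classes.

```scala
object ImplicitScopeSketch {
  // Hypothetical renderer type; Vegas's real renderers are not modelled here.
  trait Renderer { def render(spec: String): String }

  object HtmlStyle {
    implicit val htmlRenderer: Renderer = new Renderer {
      def render(spec: String): String = s"<div>$spec</div>"
    }
  }

  object WindowStyle {
    implicit val windowRenderer: Renderer = new Renderer {
      def render(spec: String): String = s"[window] $spec"
    }
  }

  // Picks up whichever Renderer is implicitly in scope at the call site.
  def show(spec: String)(implicit r: Renderer): String = r.render(spec)

  // Each import is confined to one method body, so only one implicit
  // Renderer is visible per call. Importing both objects into the same
  // scope would make the call to show ambiguous.
  def asHtml(spec: String): String = {
    import HtmlStyle._
    show(spec)
  }

  def asWindow(spec: String): String = {
    import WindowStyle._
    show(spec)
  }
}
```

The same value can then be rendered either way by calling `ImplicitScopeSketch.asHtml(...)` or `ImplicitScopeSketch.asWindow(...)`, without the two implicits ever meeting in one scope.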

A plot is defined as:

import vegas._

val plot = Vegas("Country Pop").
  withData(
    Seq(
      Map("country" -> "USA", "population" -> 314),
      Map("country" -> "UK", "population" -> 64),
      Map("country" -> "DK", "population" -> 80)
    )
  ).
  encodeX("country", Nom).
  encodeY("population", Quant).
  mark(Bar)

HTML

The following renders the plot as HTML (which prints to the console).

def renderHTML = {
  println(plot.html.pageHTML) // a complete HTML page containing the plot
  println(plot.html.frameHTML("foo")) // an iframe containing the plot
}

Window

Vegas also contains a self-contained display app for displaying plots (internally it uses JavaFX's HTML renderer). The following demonstrates this and can be used from the command line.

def renderWindow = {
  plot.window.show
}

Make sure JavaFX is installed on your system or ships with your JDK distribution.

JSON

You can print the JSON containing the Vega-Lite spec, without importing any renderer in the scope.

println(plot.toJson)

The output JSON can be copy-pasted into the Vega-Lite editor.

Spark integration

Vegas comes with an optional extension package that makes it easier to work with Spark DataFrames. First, you'll need an extra dependency and import:

libraryDependencies += "org.vegas-viz" %% "vegas-spark" % "{vegas-version}"
import vegas.sparkExt._

This adds the following new method:

withDataFrame(df: DataFrame)

Each DataFrame column is exposed as a field keyed using the column's name.

Flink integration

Vegas also comes with an optional extension package that makes it easier to work with Flink DataSets. You'll need to add the following dependency:

libraryDependencies += "org.vegas-viz" %% "vegas-flink" % "{vegas-version}"

To use:

import vegas.flink.Flink._

This adds the following method:

withData[T <: Product](ds: DataSet[T])

Similar to the RDD concept in Spark, a DataSet of case classes or tuples is expected, and reflection is used to map the case class's fields to fields within Vegas. In the case of tuples, you can encode the fields using "_1", "_2" and so on.
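To illustrate the positional naming used for tuples, here is a minimal sketch in plain Scala with no Flink or Vegas dependency. The `positionalFields` helper is hypothetical and is not how Vegas implements its mapping. It only shows that any `Product` (tuples and case classes alike) exposes its elements in order, which is what makes names such as "_1" and "_2" a natural fit for tuples.

```scala
object ProductFieldsSketch {
  // Hypothetical helper (not part of Vegas): names each element of a
  // Product by its 1-based position, mirroring the tuple accessors _1, _2, ...
  def positionalFields(p: Product): Map[String, Any] =
    p.productIterator.zipWithIndex.map { case (value, i) =>
      s"_${i + 1}" -> value
    }.toMap

  def main(args: Array[String]): Unit = {
    // A tuple row shaped like the README's country/population data.
    val row = ("USA", 314)
    println(positionalFields(row))
  }
}
```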

Plot Options

TODO

Contributing

See the contributing guide for more information on contributing bug fixes and features.

Contributors

aishfenton, asragab, chuyqa, ckchow, daniperez, dbtsai, f1yegor, izhangzhihao, jeremyrsmith, jopasserat, matteverson, metasim, nightscape, oshikiri, rgbkrk, rogermenezes, tjdett

vegas's Issues

Implement missing BasicPlot examples

We have 14/95 currently. Question:

  • Worth doing them all? I think yes.. maybe? If we look at them as integration tests, then good to have that much coverage.
  • With so many, we should split BasicPlots into several files to keep it maintainable. We could use Area, Bar, Layer, Misc, etc., based on the prefixes of the spec files.

Support Layered Specs

Vega-Lite now supports layers. Although currently poorly documented, it works pretty well so we should support it (easyish to add)

Make notebook import easier

Currently notebook import requires this somewhat ugly line:

implicit val displayer: String => Unit = (s) => println(s"%html $s")

Explore if we can hide it.

The problem is with Jupyter. You need access to its "display" variable, so you'd need to pass that into some kind of init method. But then the displayer itself is an implicit, so it needs to be in local scope. Hmmmm... To be explored.

Write notebook docs

  • Investigate CSS + nbconvert. Make sure it looks nice, and we can link between pages? (Sudeep)
  • Integrate into sbt build (Aish)
  • Write general overview on how to plot
  • Write withData tutorial (plus Spark integration)
  • Write rendering + notebook tutorial

Implement Basic Flink Support and Write Unit Tests

In addition to Spark, I'd like to be able to use Apache Flink. I have a PR sitting that POCs this possibility. It would work quite similarly to the Spark integration (i.e. pass it a Flink DataSet and a case class, or anything that extends Product with Serializable).

Spark Integration Issues

I've been trying to use case class RDDs in Jupyter, but I can't seem to load or call the right function

import vegas._
import vegas.render.HTMLRenderer._
implicit val displayer: String => Unit = { s => kernel.display.content("text/html", s) }
import vegas.spark.Spark._

Vegas("NAME OF CHART").withRDD(RDD[TESTCLASS]).<More parameters>.show

Name: Compile Error
Message: :97: error: value withRDD is not a member of vegas.DSL.SpecBuilder
Vegas("helloworld").withRDD(subsample2).show
^
StackTrace:

Your readme needs to be updated too. Your import statement is wrong: it is missing the spark part. Also, the Spark section is wrong where you say you can call withData and pass an RDD in. The code in the spark module shows it's withRDD. So which is it? I can't seem to get this to work, and I would love to. Any help would be appreciated.

Implement encode*.{ Order, Path }

Order and Path are currently missing as encode options. They're also not documented in the vega-lite site so far, so not sure of their purpose.

Let's not add them yet, but documenting here for a future release once they take more shape.

Can not build vegas-viz

Sorry if this is not the right place for this kind of question. This is the error I get when building Vegas. Is there anything I'm missing?

[error] /Users/jzhang/github/Vegas/core/src/main/scala/vegas/render/WindowRenderer.scala:15: object WebConsoleListener is not a member of package com.sun.javafx.webkit
[error] import com.sun.javafx.webkit.WebConsoleListener
[error]        ^
[error] /Users/jzhang/github/Vegas/core/src/main/scala/vegas/render/WindowRenderer.scala:37: not found: value WebConsoleListener
[error]   WebConsoleListener.setDefaultListener(new WebConsoleListener {
[error]   ^
[error] /Users/jzhang/github/Vegas/core/src/main/scala/vegas/render/WindowRenderer.scala:37: not found: type WebConsoleListener
[error]   WebConsoleListener.setDefaultListener(new WebConsoleListener {
[error]                                             ^

Update notebook examples

/docs/* contains notebook examples but they're out of date now. These need to be updated.

Maybe code gen using the fixture would be nice?

Nulls in dataset result in NPE

When loading the Movies dataset through Spark, an NPE is thrown.

Need to handle null values more gracefully. Since Vegas takes data as a Map[String,Any] we could check if DF cell is null, and then just set value to null (encoder already handles turning this into a Json Null at appropriate time).

More generally withDataFrame should set native values where possible. Currently it turns everything into a Map[String, String].
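A minimal sketch of the idea, in plain Scala with no Spark dependency: `rowToMap` is a hypothetical stand-in for the per-row conversion, taking column names and cell values directly, and the sample row is made up for illustration. It is not the actual withDataFrame code, only an example of keeping native values and passing nulls through instead of calling `toString` on every cell.

```scala
object NullSafeRowSketch {
  // Hypothetical stand-in for converting one DataFrame row: column names
  // plus cell values, where any cell may be null.
  def rowToMap(columns: Seq[String], cells: Seq[Any]): Map[String, Any] =
    columns.zip(cells).map { case (name, cell) =>
      // Keep the native value (Int, Double, String, ...) and pass null
      // through unchanged; calling cell.toString here would throw an NPE
      // whenever the cell is null.
      name -> cell
    }.toMap

  def main(args: Array[String]): Unit = {
    // Made-up sample row with a missing rating.
    val row = rowToMap(Seq("title", "rating"), Seq("Some Movie", null))
    println(row("rating") == null)
  }
}
```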

Plot not resizing in Zeppelin

Getting the following JS error. Probably because iFrame content hasn't finished rendering when I try to resize iFrame.

Uncaught TypeError: Cannot read property 'scrollHeight' of null

Scatter matrix or something comparable available?

I'm checking out Vegas at the moment for visualizing Spark mllib models. For example I want to visualize a kmeans clustering model with more than two dimensions/features.

Something like a scatter matrix (or at least a 3-dimensional scatter plot) would be perfect... are there any plans for this? Or am I missing something how to plot such mllib models properly with Vegas? The only way I found is to plot multiple 2-dimensional plots with the different feature combinations... but this is way too laborious for my purposes.

Example scatter matrix for multiple dimensions: http://www.statmethods.net/graphs/images/spmatrix1.jpg

Thank you!

Remove unsupported JavaFX API use

The following isn't in all versions of JavaFX, need to find a new way to check unit test result.

import com.sun.javafx.webkit.WebConsoleListener

One possibility, check document content to make sure canvas has been inserted. OR change render type to be SVG, so we can check content of plot more.

FYI @dbtsai

Tidy up code gen

  • Remove generated code from check-in
  • Tidy up build.sbt (I think it can be simplified now)

Spark integration broken

Two bugs:

  • withDataFrame returns type Any, rather than SpecBuilder.
  • vegas.spark package conflicts with spark implicit import in Spark 2.0. We need to rename to something else.

Vega-lite dependency broken

Currently vega-lite JS dependency points to the latest version, and the latest vega-lite update has broken Vegas ;(

Need to change how JS dependencies are loaded so that they are locked to a specific version.

Write contributor FAQ

There are a few idiosyncratic choices in the code that should be documented for people who want to submit a PR.

Implement Legend DSL

Encoding for color, size, opacity, and shape can also take a Legend parameter. This needs to be implemented.

Cannot pass array of maps to SpecBuilder.withData

I'm using vegas 0.2.3 with scala 2.10 in an Apache Toree notebook, and it appears that I can't pass an array of maps to withData. For example:

(Vegas().withData(
    Array(Map("id" -> 0, "val" -> 10.1),
          Map("id" -> 1, "val" -> 9.2))
    )
    .encodeX("id", Nominal)
    .encodeY("val", Quantitative)
    .mark(Bar).show)

gives the error:

Name: Compile Error
Message: <console>:93: error: overloaded method value withData with alternatives:
  (url: String,formatType: vegas.spec.FormatType)vegas.DSL.SpecBuilder <and>
  (values: Map[String,Any]*)vegas.DSL.SpecBuilder
 cannot be applied to (Array[scala.collection.immutable.Map[String,AnyVal]])
              (Vegas().withData(
                       ^

I tried using tupled, but I get the same overloaded method error. Obviously, for the small example I've shown here I can just get rid of the wrapping Array, but in general I want to programmatically supply data to withData. Is there a way to do this?
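One approach that may help here is Scala's varargs expansion (`: _*`), which passes an existing sequence to a method declared with a repeated parameter. The sketch below uses a hypothetical stand-in, `countRows`, whose parameter list mirrors the `(values: Map[String,Any]*)` alternative in the error message. It does not call Vegas, so whether the same expansion compiles against Vegas 0.2.3 is an assumption.

```scala
object VarargsSketch {
  // Hypothetical stand-in with the same parameter shape as the
  // (values: Map[String,Any]*) overload named in the compile error.
  def countRows(values: Map[String, Any]*): Int = values.size

  // Data built programmatically; the explicit element type avoids
  // inferring Map[String, AnyVal] as in the error message.
  val rows: Array[Map[String, Any]] = Array(
    Map("id" -> 0, "val" -> 10.1),
    Map("id" -> 1, "val" -> 9.2)
  )

  def main(args: Array[String]): Unit = {
    // `: _*` expands the sequence into the repeated parameter.
    println(countRows(rows.toSeq: _*))
  }
}
```

If the same pattern carries over, the call in the report would become `withData(rows.toSeq: _*)` in place of `withData(Array(...))`.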

BTW, I can see from the README that I can recompile spark and get a withData method on RDDs and DataFrames, and this would solve my problem, but in my spark environment I'm stuck with scala 2.10 at the moment, which I think means I can't get a version of Vegas higher than 2.11.
