
jfalkner / metrics


A simple, Scala-based framework for extracting data and exporting it to JSON and/or CSV. Convenient for making data that Web/JS visualizations use or for users relying on tools such as JMP, R, Excel, etc.

License: BSD 3-Clause Clear License

Scala 100.00%


Metrics

A simple, Scala-based framework for extracting data and exporting it to JSON and/or CSV. Convenient for making data that Web/JS visualizations use or for users relying on tools such as JMP, R, Excel, Tableau, Spotfire, etc.

Key Features

  • Data exports to CSV and JSON
    • CSV exports are "flat", meaning simple and succinct. Intended for Excel, JMP, R, and the like
    • JSON exports have the full graph of data, such as values for histogram bins
  • Errors don't cause exports to blow up, and nonsensical default values (e.g. 0 or -1 for a number) are never exported
    • CSV cells show a blank if a value couldn't be calculated
    • JSON exports omit failed values, which pushes error handling to the JS or viz tool
  • Convenience methods for writing succinct code without any data serialization boilerplate
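
The error-handling behavior in the list above can be illustrated with a minimal sketch. It assumes each value is evaluated lazily and wrapped in `scala.util.Try`; the `Metric` class, `csvRow` and `jsonKeys` here are hypothetical stand-ins, not this library's actual types.

```scala
import scala.util.Try

object SafeExportSketch {
  // Hypothetical stand-in for a metric: a name plus a lazily evaluated value.
  final class Metric(val name: String, calc: => Any) {
    // Some(value) on success, None if the calculation throws.
    def value: Option[Any] = Try(calc).toOption
  }

  // CSV: a failed value becomes an empty cell, so the row keeps its shape.
  def csvRow(metrics: Seq[Metric]): String =
    metrics.map(_.value.map(_.toString).getOrElse("")).mkString(",")

  // JSON: a failed value is left out entirely, so only these keys would be written.
  def jsonKeys(metrics: Seq[Metric]): Seq[String] =
    metrics.filter(_.value.isDefined).map(_.name)

  val metrics: Seq[Metric] = Seq(
    new Metric("Name", "Data Scientist"),
    new Metric("Broken", throw new Exception("Calculation failed!"))
  )
}
```

With these definitions, `SafeExportSketch.csvRow(SafeExportSketch.metrics)` yields `Data Scientist,` (note the trailing empty cell), and `jsonKeys` returns only `Name`.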

Usage

This is an API. It doesn't run on its own other than to provide tests and coverage.

# Compile, run the tests, and generate a coverage report
sbt clean coverage test coverageReport

... 
[info] Statement coverage.: 100.00%
[info] Branch coverage....: 100.00%
[info] Coverage reports completed
[info] All done. Coverage was [100.00%]

Exporting CSV or JSON

See TestMetrics in MetricsSpec.scala for an example that tests all of the features. Below is a brief example showing how data can be serialized to CSV.

// Make a `Metrics` instance to have it be serialized
class Example extends Metrics {
  override val namespace = "Example"
  override val version = "_"
  override lazy val values: List[Metric] = List(
    Str("Name", "Data Scientist"),
    Num("Age", "21"),
    DistCon("Data", calcContinuous(Seq(0f, 1f, 0.5f), nBins = 3, sort = true)),
    // This calculation fails on purpose; the export below still succeeds
    Num("Borken", throw new Exception("Calculation failed!"))
  )
}

// Export as CSV for JMP, R, Excel, etc. Notice the error doesn't break the export.
// Also notice that the data is flat and omits histogram bins.
import java.nio.file.Paths
CSV(Paths.get("example.csv"), new Example())
# example.csv 
Name,Age,Data: Samples,Data: Bins,Data: BinWidth,Data: Mean,Data: Median,Data: Min,Data: Max,Borken
Data Scientist,21,3,3,0.33333334,0.5,0.5,0.0,1.0,

JSON serialization is done similarly, using a self-named serializer.

// Export as JSON for fully serialized data, convenient for Web/JS or data viz tools.
JSON("example.json", new Example())
# example.json
{
  "Name": "Data Scientist",
  "Age": 21,
  "Data": {
    "Min": 0.0,
    "Mean": 0.5,
    "Max": 1.0,
    "Bins": [1, 1, 1],
    "BinWidth": 0.33333334,
    "Samples": 3,
    "Median": 0.5
  }
}
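
The `Data` values in the exports above are consistent with a plain equal-width histogram over `Seq(0f, 1f, 0.5f)` with three bins. The sketch below reproduces those numbers under that assumption; `HistogramSketch`, `Dist` and `summarize` are hypothetical names, and this is not necessarily how `calcContinuous` is implemented.

```scala
object HistogramSketch {
  // Summary of a continuous distribution; field names mirror the export keys above.
  final case class Dist(samples: Int, bins: Seq[Int], binWidth: Float,
                        mean: Float, median: Float, min: Float, max: Float)

  // Assumes a non-empty input without NaN values, and nBins > 0.
  def summarize(values: Seq[Float], nBins: Int): Dist = {
    val sorted = values.toArray
    java.util.Arrays.sort(sorted)
    val (min, max) = (sorted.head, sorted.last)
    val binWidth = (max - min) / nBins
    val counts = Array.fill(nBins)(0)
    sorted.foreach { v =>
      // Clamp so that the maximum lands in the last bin rather than one past it.
      val idx = if (binWidth == 0f) 0 else math.min(((v - min) / binWidth).toInt, nBins - 1)
      counts(idx) += 1
    }
    val n = sorted.length
    val median =
      if (n % 2 == 1) sorted(n / 2)
      else (sorted(n / 2 - 1) + sorted(n / 2)) / 2
    Dist(n, counts.toSeq, binWidth, sorted.sum / n, median, min, max)
  }
}
```

Calling `HistogramSketch.summarize(Seq(0f, 1f, 0.5f), nBins = 3)` gives bins `1, 1, 1`, a bin width that prints as `0.33333334`, and a mean and median of `0.5`, matching the example output.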

Custom CSV formatting: subsets, ordering and renaming columns

A more advanced use of this API is to expose a View: a tabular (CSV) export of any subset of values, in any column order, with optional custom column names. This is helpful since most usage of this API is exposing tabular exports for Excel, JMP, R, Tableau, Spotfire and the like.

class ExampleView extends View {
  override lazy val name = "Example View"
  override lazy val description = "An example custom tabular export for the metrics API documentation."
  // Show just three columns, "name", "mean" and "median", and force lowercase names -- for whatever reason that is preferred
  override lazy val metrics = List[Col](
    Col("name", "Example", "Name"),
    Col("mean", "Example", "Data", Some("Mean")),
    Col("median", "Example", "Data", Some("Median"))
  )
}

Export of a view is usually handled by the database. See Cache.queriesToCsv for an example.
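
To make the subset, ordering and renaming idea concrete, here is a minimal self-contained sketch of rendering such a column list to CSV. The `Col` shape, the `/`-joined lookup keys and `toCsv` are assumptions made for illustration; this is not the logic in Cache.queriesToCsv.

```scala
object ViewSketch {
  // Hypothetical column spec: output header, then the path to the source value.
  final case class Col(header: String, namespace: String, metric: String,
                       field: Option[String] = None) {
    def path: String = (Seq(namespace, metric) ++ field.toList).mkString("/")
  }

  // Render a header line and one data line, in exactly the order the columns are listed.
  // Values that are missing become blank cells.
  def toCsv(cols: Seq[Col], values: Map[String, String]): String = {
    val header = cols.map(_.header).mkString(",")
    val row = cols.map(c => values.getOrElse(c.path, "")).mkString(",")
    header + "\n" + row
  }

  // Made-up exported values, keyed by path.
  val values: Map[String, String] = Map(
    "Example/Name" -> "Data Scientist",
    "Example/Age" -> "21",
    "Example/Data/Mean" -> "0.5",
    "Example/Data/Median" -> "0.5"
  )

  val cols: Seq[Col] = Seq(
    Col("name", "Example", "Name"),
    Col("mean", "Example", "Data", Some("Mean")),
    Col("median", "Example", "Data", Some("Median"))
  )
}
```

`ViewSketch.toCsv(ViewSketch.cols, ViewSketch.values)` produces the header `name,mean,median` followed by the row `Data Scientist,0.5,0.5`; the `Age` value is present in the data but left out because no column selects it.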

Suggested Versioning Conventions

These are helpful conventions to follow for supporting common use cases. These examples assume you are using semantic versioning, but any version string can be used in a similar fashion.

Use "Code Version" and "Spec Version"

Make metrics modules whose build.sbt version is mirrored as the "Code Version" in a companion object, and that also have a "Spec Version" representing the version of the underlying file format you are parsing. Together these give a way to later sort data by whether the metrics code was updated and/or the underlying data format changed.

An example of this can be seen in MetricsWithVersion and MetricsWithVersion_1_2_3 here in the metrics-examples repo.

Use an object with apply to encapsulate version parsing logic

The main strategy is simply to capture the version-detecting logic in the apply method of an object, which serves as the entry point for parsing data. An example is in MetricsWithVersion here in the metrics-example repo.
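
A minimal sketch of this convention follows. The file layouts, the `#version=` header line and all names are made up for illustration; the real example is MetricsWithVersion.

```scala
object VersionedParserSketch {
  // Hypothetical value; in a real module it would mirror the version in build.sbt.
  val CodeVersion = "0.1.0"

  // Both versions travel with the parsed data so exports can be sorted by them later.
  final case class Parsed(codeVersion: String, specVersion: String, fields: Map[String, String])

  // Entry point: read the spec version from the first line, then dispatch on it.
  def apply(lines: Seq[String]): Parsed =
    lines.headOption.map(_.stripPrefix("#version=")) match {
      case Some(v) if v.startsWith("1.") => parse(v, "=", lines.tail) // made-up 1.x layout: key=value
      case Some(v) if v.startsWith("2.") => parse(v, ":", lines.tail) // made-up 2.x layout: key: value
      case other => throw new IllegalArgumentException(s"Unsupported spec version: $other")
    }

  // Split each body line on the first separator; lines without one are skipped.
  private def parse(specVersion: String, sep: String, body: Seq[String]): Parsed =
    Parsed(CodeVersion, specVersion, body.map(_.split(sep, 2)).collect {
      case Array(k, v) => k.trim -> v.trim
    }.toMap)
}
```

Callers only ever write `VersionedParserSketch(lines)`; adding support for a new spec version means adding one `case` rather than touching call sites.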

blank for making CSV headers

A val named blank, holding an instance of the current Metrics created from null or with no arguments, is required for making headers in CSV exports, namely for View instances. With a blank you can easily make a view that is just a copy of all values.

An example of blank is in MetricsWithVersion here in the metrics-example repo.
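
Why a data-less instance is enough for headers can be sketched as follows, assuming metric values are evaluated lazily so that only the names are touched. `Metric`, `Example` and `header` here are hypothetical and simplified, not the library's own classes.

```scala
object BlankSketch {
  // Hypothetical metric: the name is known up front, the value is computed on demand.
  final class Metric(val name: String, calc: => Any) {
    def value: Any = calc
  }

  class Example(input: Map[String, String]) {
    lazy val values: List[Metric] = List(
      new Metric("Name", input("name")),
      new Metric("Age", input("age").toInt)
    )
  }

  object Example {
    // No data behind it; only safe because building a header never evaluates a value.
    val blank = new Example(null)
  }

  // Header for a view that is simply a copy of all values, in declaration order.
  def header: String = Example.blank.values.map(_.name).mkString(",")
}
```

Here `BlankSketch.header` is `Name,Age`, even though the blank instance holds no input at all.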


metrics's Issues

Improve mean() to handle arrays of NaN?

A failure mode when using the BAM parsing code is when an unaligned BAM is specified. In this case, a bunch of NaN appear and the continuousDist calc ends up throwing an exception.

The exception likely should be avoided in favor of a smarter message that says don't run this on unaligned BAMs.

The more general question is whether the code should gracefully squash NaN and return 0 when no values exist.
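
One possible shape for a NaN-tolerant mean is sketched below. It is only an illustration of the question raised in this issue, not a change that exists in the repo; `NanSafeMeanSketch` is a hypothetical name. Returning `None` instead of 0 is an assumption chosen so that an empty result could show up as a blank CSV cell rather than a misleading number.

```scala
object NanSafeMeanSketch {
  // Mean over the non-NaN values; None when nothing usable remains.
  def mean(values: Seq[Float]): Option[Float] = {
    val usable = values.filterNot(_.isNaN)
    if (usable.isEmpty) None else Some(usable.sum / usable.size)
  }
}
```

For an input of only NaN values, such as the unaligned BAM case, this returns `None` rather than throwing.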
