A simple, Scala-based framework for extracting data and exporting it to JSON and/or CSV. Convenient for making data that Web/JS visualizations use or for users relying on tools such as JMP, R, Excel, Tableau, Spotfire, etc.
- Data exports to CSV and JSON
- CSV exports are "flat", meaning simple and succinct. Intended for Excel, JMP, R, and the like
- JSON exports have the full graph of data, such as values for histogram bins
- Errors don't break exports, and nonsensical default values (e.g. 0 or -1 for a number) are never exported
- CSV cells are left blank if a value couldn't be calculated
- JSON exports omit failed values. This pushes error handling to the JS or viz tool
- Convenience methods for writing succinct code without any data serialization boilerplate
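The error-handling behavior listed above can be illustrated with a minimal, standalone sketch. It is not this library's implementation: `SketchMetric` and `ExportSketch` are hypothetical names, and the sketch only assumes that a value is evaluated lazily and that failures are caught at export time.

```scala
import scala.util.Try

// Hypothetical stand-in for a metric; not the library's actual `Metric` type.
// The value is passed by name, so a failing calculation is only evaluated
// (and caught) when the export asks for it.
final class SketchMetric(val name: String, value: => Any) {
  // Some(value) on success, None if the calculation threw a non-fatal exception.
  def result: Option[Any] = Try(value).toOption
}

object ExportSketch {
  // CSV: one header row and one data row; failed values become empty cells.
  def toCsv(metrics: Seq[SketchMetric]): String = {
    val header = metrics.map(_.name).mkString(",")
    val row = metrics.map(_.result.map(_.toString).getOrElse("")).mkString(",")
    s"$header\n$row"
  }

  // JSON-like map: keys of failed values are omitted entirely.
  def toJsonFields(metrics: Seq[SketchMetric]): Map[String, Any] =
    metrics.flatMap(m => m.result.map(v => m.name -> v)).toMap
}
```

In this sketch a metric whose calculation throws still gets a CSV column, with an empty cell, while the JSON-like map simply has no entry for it.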
This is an API. It doesn't run on its own other than to provide tests and coverage.
# Run the tests and generate a coverage report
sbt clean coverage test coverageReport
...
[info] Statement coverage.: 100.00%
[info] Branch coverage....: 100.00%
[info] Coverage reports completed
[info] All done. Coverage was [100.00%]
See `TestMetrics` in `MetricsSpec.scala` for an example that tests all of the features. Below is a brief example showing how data can be serialized to CSV.
// Make a `Metrics` instance to have it be serialized
class Example extends Metrics {
override val namespace = "Example"
override val version = "_"
override lazy val values: List[Metric] = List(
Str("Name", "Data Scientist"),
Num("Age", "21"),
DistCon("Data", calcContinuous(Seq(0f, 1f, 0.5f), nBins = 3, sort = true)),
Num("Borken", throw new Exception("Calculation failed!"))
)
}
// Export as CSV for JMP, R, Excel, etc. Notice the error doesn't break the export.
// Also notice that the data is flat and omits histogram bins.
CSV(Paths.get("example.csv"), new Example())
# example.csv
Name,Age,Data: Samples,Data: Bins,Data: BinWidth,Data: Mean,Data: Median,Data: Min,Data: Max,Borken
Data Scientist,21,3,3,0.33333334,0.5,0.5,0.0,1.0,
JSON serialization is done similarly, using a self-named serializer.
// Export as JSON for fully serialized data, convenient for Web/JS or data viz tools.
JSON("example.json", new Example())
# example.json
{
"Name": "Data Scientist",
"Age": 21,
"Data": {
"Min": 0.0,
"Mean": 0.5,
"Max": 1.0,
"Bins": [1, 1, 1],
"BinWidth": 0.33333334,
"Samples": 3,
"Median": 0.5
}
}
A more advanced use of this API is to expose a `View`, which exports tabular data (CSV) with any arbitrary subset of values, in any column order and with optional custom naming for the columns. This is helpful since most of the usage of this API is to expose tabular exports for Excel, JMP, R, Tableau, Spotfire and the like.
class ExampleView extends View {
override lazy val name = "Example View"
override lazy val description = "An example custom tabular export for the metrics API documentation."
// show just three columns, "name", "mean" and "median", and force lowercase names -- for whatever reason that is preferred
override lazy val metrics = List[Col](
Col("name", "Example", "Name"),
Col("mean", "Example", "Data", Some("Mean")),
Col("median", "Example", "Data", Some("Median"))
)
}
Export of a view is usually handled by the database. See `Cache.queriesToCsv` for an example.
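The idea of a view as a column projection can be sketched independently of the library. Everything below is hypothetical (`ColSpec`, `Data`, and `render` are not part of this API); it only assumes that each column names a header, a namespace, a metric, and optionally a nested key such as "Mean".

```scala
object ViewSketch {
  // Hypothetical column spec: output header, source namespace, metric name,
  // and an optional nested key (e.g. "Mean" inside a distribution).
  final case class ColSpec(
    header: String,
    namespace: String,
    metric: String,
    key: Option[String] = None
  )

  // Flattened data keyed by (namespace, metric, nested key). An assumption
  // made for this sketch, not how the library stores values.
  type Data = Map[(String, String, Option[String]), String]

  // Emit only the requested columns, in the requested order, under custom headers.
  // Values missing from the data become empty cells.
  def render(cols: Seq[ColSpec], data: Data): String = {
    val header = cols.map(_.header).mkString(",")
    val row = cols.map(c => data.getOrElse((c.namespace, c.metric, c.key), "")).mkString(",")
    s"$header\n$row"
  }
}
```

Columns that are not listed, such as "Age" in the earlier example, are simply left out of the output.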
These are helpful conventions to follow for supporting common use cases. These examples assume you are using semantic versioning, but any version string can be used in a similar fashion.
Make metrics modules that have a `build.sbt` version that is mirrored as the "Code Version" in a companion object, and also have a "Spec Version" that represents the version of the underlying file format you are parsing. Together these give a way to later sort data based on whether the metrics code was updated and/or the underlying data format changed.
An example of this can be seen in `MetricsWithVersion` and `MetricsWithVersion_1_2_3` here in the metrics-examples repo.
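A minimal sketch of this convention follows. The names `VersionSketch`, `Exported`, and `semver`, and the version values, are assumptions made for illustration; the repo's own examples may be structured differently.

```scala
object VersionSketch {
  // Mirrors the `version` setting in build.sbt (value assumed for illustration).
  val CodeVersion = "1.2.3"
  // Version of the underlying file format being parsed (value assumed).
  val SpecVersion = "2.0.0"

  // One exported record tagged with both versions.
  final case class Exported(codeVersion: String, specVersion: String, source: String)

  // Parse "major.minor.patch" into numeric parts so that 1.10.0 sorts after 1.9.0.
  // Throws if the string is not three dot-separated integers.
  def semver(v: String): (Int, Int, Int) = v.split('.').map(_.toInt) match {
    case Array(major, minor, patch) => (major, minor, patch)
    case _ => throw new IllegalArgumentException(s"Not a major.minor.patch version: $v")
  }

  // Sort by spec version first, then by code version.
  def sortByVersions(rows: Seq[Exported]): Seq[Exported] =
    rows.sortBy(r => (semver(r.specVersion), semver(r.codeVersion)))
}
```

Comparing parsed numbers rather than raw strings avoids "1.10.0" sorting before "1.9.0".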
The main strategy is as simple as capturing the version-detecting logic in an `apply` method of an object, which serves as the entry point to parsing data. An example is in `MetricsWithVersion` here in the metrics-example repo.
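The dispatch strategy might look roughly like the following. The types and the "version=2" marker are invented for this sketch and are not taken from `MetricsWithVersion`; real detection logic depends on the file format being parsed.

```scala
object DispatchSketch {
  // Hypothetical parsed results for two revisions of an input format.
  sealed trait ParsedMetrics { def specVersion: String }
  final case class ParsedV1(raw: String) extends ParsedMetrics { val specVersion = "1.0.0" }
  final case class ParsedV2(raw: String) extends ParsedMetrics { val specVersion = "2.0.0" }

  object ParsedMetrics {
    // Single entry point: inspect the first line for a version marker
    // and pick the matching parser.
    def apply(content: String): ParsedMetrics = {
      val firstLine = content.split('\n').headOption.getOrElse("").trim
      firstLine match {
        case "version=2" => ParsedV2(content)
        // Content without a marker is treated as the oldest format (an assumption).
        case _ => ParsedV1(content)
      }
    }
  }
}
```

Callers only ever write `ParsedMetrics(content)`, so supporting a new format revision means adding one case here rather than changing call sites.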
Having a val named `blank` that is a `null` or otherwise no-arg created instance of the current `Metrics` is required for making headers in CSV exports, namely `View` instances. With a `blank` you can easily make a view that is just a copy of all values.
An example of `blank` is in `MetricsWithVersion` here in the metrics-example repo.
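Why a `blank` instance helps with headers can be shown with a small standalone sketch. `Sample` and its fields are hypothetical and do not extend the library's `Metrics`; the only assumption is that column names can be read off an instance that holds no data.

```scala
object BlankSketch {
  // Hypothetical metrics holder: each value pairs a column name with an optional value.
  final case class Sample(name: Option[String], age: Option[Int]) {
    def values: List[(String, Option[Any])] = List(("Name", name), ("Age", age))
  }

  object Sample {
    // No-data instance: exists only so column names can be listed
    // without parsing any input.
    val blank: Sample = Sample(None, None)
  }

  // Header row derived from the blank instance, i.e. a view that copies every value.
  def header: String = Sample.blank.values.map(_._1).mkString(",")

  // Data row for a real instance; missing values become empty cells.
  def row(s: Sample): String =
    s.values.map(_._2.map(_.toString).getOrElse("")).mkString(",")
}
```

Because the header comes from `blank`, it can be written before any input has been parsed, even when there are no rows at all.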