P6: Declarative Specification for Interactive Machine Learning and Visual Analytics
P6 is a research project developing a declarative language for specifying visual analytics processes that integrate machine learning methods with interactive visualization for data analysis and exploration. P6 uses P4 for GPU-accelerated data processing and rendering, and leverages scikit-learn and other Python libraries to support machine learning algorithms.
Demo
Demos of declarative specifications for clustering, dimension reduction, and regression:
- K-Means Clustering and PCA
- RandomForest Regressor
- Hierarchical Clustering and Multiple Views
- Brushing and Linking with Dimension Reductions
Installation
To run P6, first install both the JavaScript and Python dependencies and libraries:
npm install
pip install -r python/requirements.txt
Development and Examples
To develop P6 or try the example applications, start the server and client with:
npm start
Alternatively, start the server and client in two separate terminals:
npm run server
npm run client
The example applications can be accessed at http://localhost:8080/examples/
Usage
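// config
let app = p6()
  .data({url: 'data/babies.csv'}) // input data
  .analyze({
    // cluster the data using sklearn.cluster.KMeans and store the labels in a new variable 'clusters'
    // (follows the same module/algorithm pattern as the PCA entry; n_clusters: 3 is an illustrative value)
    clusters: {
      module: 'cluster',
      algorithm: 'KMeans',
      n_clusters: 3,
      features: ['BabyWeight', 'MotherWeight', 'MotherHeight', 'MotherWgtGain', 'MotherAge']
    },
    // analyze the data using sklearn.decomposition.PCA and store the result in a new variable 'PC'
    PC: {
      module: 'decomposition',
      algorithm: 'PCA',
      n_components: 2,
      features: ['BabyWeight', 'MotherWeight', 'MotherHeight', 'MotherWgtGain', 'MotherAge']
    }
  })

app.layout({
  container: "app", // id of the div
  viewport: [800, 400]
})
.visualize({
  chart: {
    mark: 'circle', size: 8,
    x: 'PC1', y: 'PC0', // the two principal components stored in 'PC'
    color: 'clusters', opacity: 0.5 // color each point by its cluster label
  }
})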
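The chained calls above describe what should happen rather than how to compute it. As a rough illustration of this declarative pattern (not P6's actual implementation), the hypothetical specBuilder below records each call into a plain specification object that a runtime could later execute; the method names simply mirror the P6 calls in the example.

```javascript
// Hypothetical sketch: a chainable builder that only records a specification.
// It is NOT P6's implementation; the method names mirror the P6 calls above.
function specBuilder() {
  const recorded = { data: null, analyze: {}, layout: null, visualize: {} };
  const api = {
    data(options) {
      recorded.data = { ...options };
      return api; // return the builder so calls can be chained
    },
    analyze(tasks) {
      Object.assign(recorded.analyze, tasks); // each key names a result variable
      return api;
    },
    layout(options) {
      recorded.layout = { ...options };
      return api;
    },
    visualize(views) {
      Object.assign(recorded.visualize, views);
      return api;
    },
    toJSON() {
      return JSON.parse(JSON.stringify(recorded)); // deep copy of the recorded spec
    },
  };
  return api;
}

const spec = specBuilder()
  .data({ url: 'data/babies.csv' })
  .analyze({ PC: { module: 'decomposition', algorithm: 'PCA', n_components: 2 } })
  .layout({ container: 'app', viewport: [800, 400] })
  .visualize({ chart: { mark: 'circle', x: 'PC1', y: 'PC0' } })
  .toJSON();

console.log(Object.keys(spec.analyze)); // [ 'PC' ]
```

Keeping the specification as plain data is what makes it easy to validate, serialize, or hand to a different back end before anything is computed.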
API
P6 provides a JavaScript API with a declarative language for specifying the operations in visual analytics processes, including data processing, machine learning, visualization, and interaction.
Data
data({source, selection, preprocess, transform})
- source: source of the dataset. Example: {url: './data/babies.csv'}
- select: select a data subset by rows, columns, or data types. Example: {select: {nrows: 10000, columns: ['BabyWeight', 'BabyGender']}}
  - nrows - number of rows
  - columns - which data columns to keep
  - dtype - select categorical or numerical data
- preprocess: preprocess data by dtypes.
- Example for using one-hot encoding on categorical data: {preprocess: {categorical: 'OneHot'}}
- Example for dropping null values: {preprocess: {null: 'drop'}}
- Example for filling null values by columns: {preprocess: {null: {fill: {BabyWeight: 8}}}}
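To make the select and preprocess options more concrete, here is a small sketch of their intended effect on an in-memory table. It is an illustration only: it assumes rows are plain JavaScript objects, that null or undefined marks a missing value, and that string-valued columns are the categorical ones. applyDataSpec is a hypothetical helper, the rows are made-up toy values, and P6 itself may perform these steps elsewhere (for example in its Python back end). The dtype option is not covered.

```javascript
// Hypothetical helper illustrating the effect of the `select` and `preprocess` options.
// Assumes rows are plain objects, null/undefined mark missing values,
// and string-valued columns are categorical.
function applyDataSpec(rows, { select = {}, preprocess = {} } = {}) {
  const isMissing = (v) => v === null || v === undefined;

  let out = rows.slice(0, select.nrows ?? rows.length); // keep the first nrows rows
  out = select.columns
    ? out.map((row) => Object.fromEntries(select.columns.map((c) => [c, row[c]]))) // keep listed columns
    : out.map((row) => ({ ...row })); // copy rows so the input is not mutated

  if (preprocess.null === 'drop') {
    // drop any row that has a missing value
    out = out.filter((row) => !Object.values(row).some(isMissing));
  } else if (preprocess.null && preprocess.null.fill) {
    // fill missing values column by column
    for (const row of out) {
      for (const [col, value] of Object.entries(preprocess.null.fill)) {
        if (isMissing(row[col])) row[col] = value;
      }
    }
  }

  if (preprocess.categorical === 'OneHot') {
    // replace each string-valued column with 0/1 indicator columns, one per category
    const catCols = Object.keys(out[0] ?? {}).filter((c) => out.some((r) => typeof r[c] === 'string'));
    for (const col of catCols) {
      const levels = [...new Set(out.map((r) => r[col]))].filter((v) => !isMissing(v)).sort();
      for (const row of out) {
        for (const level of levels) row[`${col}_${level}`] = row[col] === level ? 1 : 0;
        delete row[col];
      }
    }
  }
  return out;
}

// Made-up toy rows using column names from the babies example
const rows = [
  { BabyWeight: 7.5, BabyGender: 'F', MotherAge: 28 },
  { BabyWeight: null, BabyGender: 'M', MotherAge: 31 },
  { BabyWeight: 8.1, BabyGender: 'M', MotherAge: 25 },
];
const filled = applyDataSpec(rows, {
  select: { columns: ['BabyWeight', 'BabyGender'] },
  preprocess: { null: { fill: { BabyWeight: 8 } }, categorical: 'OneHot' },
});
console.log(filled[1]); // { BabyWeight: 8, BabyGender_F: 0, BabyGender_M: 1 }
```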
Machine Learning and Analytics
analyze({algorithm, features, scaling, [parameters]})
- algorithm: the algorithm or method to apply; supported categories include clustering, dimension reduction, and manifold learning
- features: data fields used as the input to the specified algorithm.
- scaling: use StandardScaler, LabelEncoder, minmax_scale, or other preprocessors for scaling the input data.
- [parameters]: use the same parameter names as the corresponding functions in the Python libraries. In the example above, n_components is passed directly to sklearn.decomposition.PCA. More parameters can be set in this way.
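The parameter pass-through described above means that keys in an analysis entry that are not P6's own options are forwarded to the Python estimator unchanged. The hypothetical describeAnalysis function below sketches that mapping by rendering roughly equivalent scikit-learn code for one entry. It assumes module names a submodule of sklearn, treats every key other than module, algorithm, features, and scaling as an estimator parameter, and uses a hypothetical pandas DataFrame named df; scaling is not rendered. It is not how P6 is implemented.

```javascript
// Hypothetical sketch: render scikit-learn code roughly equivalent to one analyze() entry.
// Assumes `module` is a submodule of sklearn; every other non-P6 key is passed through as a parameter.
const P6_KEYS = new Set(['module', 'algorithm', 'features', 'scaling']);

// Format a JavaScript value as a Python literal (strings, numbers, booleans, null, arrays).
function toPython(value) {
  if (value === null) return 'None';
  if (typeof value === 'boolean') return value ? 'True' : 'False';
  if (typeof value === 'string') return JSON.stringify(value); // double-quoted strings are valid Python
  if (Array.isArray(value)) return `[${value.map(toPython).join(', ')}]`;
  return String(value); // numbers
}

function describeAnalysis(name, entry) {
  const params = Object.entries(entry)
    .filter(([key]) => !P6_KEYS.has(key))
    .map(([key, value]) => `${key}=${toPython(value)}`)
    .join(', ');
  // `df` is a hypothetical pandas DataFrame; how results are read back
  // (e.g. transform() for PCA, labels_ for KMeans) depends on the algorithm.
  return [
    `from sklearn.${entry.module} import ${entry.algorithm}`,
    `${name} = ${entry.algorithm}(${params}).fit(df[${toPython(entry.features)}])`,
  ].join('\n');
}

console.log(describeAnalysis('PC', {
  module: 'decomposition',
  algorithm: 'PCA',
  n_components: 2,
  features: ['BabyWeight', 'MotherWeight'],
}));
// from sklearn.decomposition import PCA
// PC = PCA(n_components=2).fit(df[["BabyWeight", "MotherWeight"]])
```

Because parameters are forwarded by name, any keyword the scikit-learn estimator accepts can be set the same way, and a misspelled name would surface as an error from the Python side rather than from the specification.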
Train models for classification and regression tasks
model({module, method, trainingData, features, target, [parameters]})
- module: Python library and module containing the method for fitting the model. Example: sklearn.linear_model.
- method: the function or estimator class called to fit the model. Example: LinearRegression.
- trainingData: data for training the model
- features: input features to the model
- target: the data field for prediction
- [parameters]: hyperparameters for the model
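For the LinearRegression example, fitting the model amounts to ordinary least squares. The sketch below shows that computation in plain JavaScript by solving the normal equations (X^T X) b = X^T y with Gaussian elimination, using a column of ones for the intercept. It assumes numeric features and a well-conditioned problem; scikit-learn itself uses a more numerically robust least-squares solver. fitLinearRegression and the toy data (generated from y = 1 + 2 x1 + 3 x2) are hypothetical and only illustrate what the features and target arguments feed into.

```javascript
// Hypothetical sketch of ordinary least squares, the model behind LinearRegression.
// Solves (X^T X) b = X^T y by Gaussian elimination with partial pivoting.
// Assumes numeric features and that X^T X is not (near-)singular.
function fitLinearRegression(rows, features, target) {
  const X = rows.map((r) => [1, ...features.map((f) => r[f])]); // leading 1 for the intercept
  const y = rows.map((r) => r[target]);
  const p = features.length + 1;

  // augmented normal-equation matrix [X^T X | X^T y]
  const A = Array.from({ length: p }, (_, i) => {
    const row = new Array(p + 1).fill(0);
    for (let k = 0; k < X.length; k++) {
      for (let j = 0; j < p; j++) row[j] += X[k][i] * X[k][j];
      row[p] += X[k][i] * y[k];
    }
    return row;
  });

  // forward elimination with partial pivoting
  for (let col = 0; col < p; col++) {
    let pivot = col;
    for (let r = col + 1; r < p; r++) {
      if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) pivot = r;
    }
    [A[col], A[pivot]] = [A[pivot], A[col]];
    for (let r = col + 1; r < p; r++) {
      const factor = A[r][col] / A[col][col];
      for (let c = col; c <= p; c++) A[r][c] -= factor * A[col][c];
    }
  }

  // back substitution
  const b = new Array(p).fill(0);
  for (let i = p - 1; i >= 0; i--) {
    let sum = A[i][p];
    for (let j = i + 1; j < p; j++) sum -= A[i][j] * b[j];
    b[i] = sum / A[i][i];
  }

  const [intercept, ...coef] = b;
  return {
    intercept,
    coef, // one coefficient per feature, in the order given
    predict: (row) => intercept + features.reduce((s, f, i) => s + coef[i] * row[f], 0),
  };
}

// Toy, noise-free data generated from y = 1 + 2*x1 + 3*x2
const toy = [
  { x1: 0, x2: 0, y: 1 },
  { x1: 1, x2: 0, y: 3 },
  { x1: 0, x2: 1, y: 4 },
  { x1: 1, x2: 1, y: 6 },
  { x1: 2, x2: 1, y: 8 },
];
const model = fitLinearRegression(toy, ['x1', 'x2'], 'y');
console.log(model.intercept, model.coef); // approximately 1 and [2, 3]
```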
Visualization
Use the layout function to configure how the views for visualization are organized and arranged.
View Layout
layout({id, width, height, padding, [options]})
Visual Encoding/Mapping
visualize({transform, visualMark, [encoding]})
To visualize data or analysis results, call visualize to transform the data (optional), choose a visual mark, and specify the visual encoding that maps data fields to visual channels.
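A position encoding such as x: 'PC1', y: 'PC0' is commonly realized as a linear scale from the data extent of each field to the pixel range of the view. The hypothetical helpers below (linearScale, extent, encodePositions) sketch that mapping, assuming the 800 x 400 viewport from the usage example and flipping the y range so larger values appear higher on screen. The point values are made up, and this is an illustration rather than P6's rendering code.

```javascript
// Hypothetical sketch: map data values to pixel positions with linear scales,
// as a position encoding like x: 'PC1', y: 'PC0' would.
function linearScale([d0, d1], [r0, r1]) {
  const span = d1 - d0 || 1; // avoid division by zero for a constant field
  return (value) => r0 + ((value - d0) / span) * (r1 - r0);
}

// Compute [min, max] of one field over all rows.
function extent(rows, field) {
  let lo = Infinity;
  let hi = -Infinity;
  for (const row of rows) {
    lo = Math.min(lo, row[field]);
    hi = Math.max(hi, row[field]);
  }
  return [lo, hi];
}

// Map each row to a circle center in a viewport of [width, height] pixels.
function encodePositions(rows, { x, y }, [width, height]) {
  const sx = linearScale(extent(rows, x), [0, width]);
  const sy = linearScale(extent(rows, y), [height, 0]); // screen y grows downward
  return rows.map((row) => ({ cx: sx(row[x]), cy: sy(row[y]) }));
}

const points = encodePositions(
  [{ PC0: -2, PC1: -1 }, { PC0: 0, PC1: 0 }, { PC0: 2, PC1: 1 }],
  { x: 'PC1', y: 'PC0' },
  [800, 400],
);
console.log(points);
// the middle point lands at the center of the view: cx 400, cy 200
```

Categorical channels such as color: 'clusters' would use a lookup from category to color instead of a linear scale.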
Publication
Jianping Kelvin Li and Kwan-Liu Ma. P6: A Declarative Language for Integrating Machine Learning in Visual Analytics. IEEE Transactions on Visualization and Computer Graphics (Proc: VAST), 2020
Acknowledgement
This research was sponsored in part by the U.S. National Science Foundation through grant NSF IIS-1528203 and U.S. Department of Energy through grant DE-SC0014917.