
Aggr-Web

Description

This repository contains the Ruby components of the Aggr project. It provides functionality to translate an incoming JSON API aggregation request into a semantic entity and place it in a triple-store, where it will be read and interpreted by the Aggr-Master and finally executed by an Aggr-Worker. Available on DockerHub: caspervg/aggr

Implemented by Casper Van Gheluwe (UGent) during the summer of 2016, as part of an internship at TenForce.

Components

  • web.rb
    • HTTP request handling
  • lib/additional_escape_helpers.rb
  • aggregation_service/request_validations.rb
    • Performs validations on incoming HTTP requests to ensure that the required attributes are present, both the general ones and those specific to the requested aggregation type.
  • aggregation_service/sparql_queries.rb
    • Builds and executes a SPARQL query to insert a new aggregation request entity into the triple-store (a rough sketch of this flow follows this list).
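
As a rough illustration of how these pieces fit together, the sketch below is a minimal, hypothetical version of the translation step: it validates the attributes of an incoming aggregation request and builds a SPARQL INSERT query for it. The predicate URIs, graph, input file name and the reference to the mu-ruby-template update helper are assumptions for illustration only, not the project's actual vocabulary or code.

# Hypothetical sketch, not the project's actual code: turn a parsed JSON API
# aggregation request into a SPARQL INSERT query for the triple-store.
require 'json'
require 'securerandom'

REQUIRED_ATTRIBUTES = %w(dataset input output aggregation_type provenance
                         big_data input_class output_class).freeze

# Analogous to request_validations.rb: check that the required attributes are present.
def validate_aggregation_request(attributes)
  missing = REQUIRED_ATTRIBUTES.reject { |key| attributes.key?(key) }
  raise ArgumentError, "Missing attributes: #{missing.join(', ')}" unless missing.empty?
end

# Analogous to sparql_queries.rb: build an INSERT DATA query for the new entity.
# The graph URI and predicates below are illustrative, not the project's real vocabulary.
def aggregation_insert_query(attributes)
  uuid = SecureRandom.uuid
  <<-SPARQL
    PREFIX mu: <http://mu.semte.ch/vocabularies/core/>
    INSERT DATA {
      GRAPH <http://mu.semte.ch/application> {
        <http://example.com/aggregations/#{uuid}>
          mu:uuid "#{uuid}" ;
          <http://example.com/vocab/dataset> "#{attributes['dataset']}" ;
          <http://example.com/vocab/aggregationType> "#{attributes['aggregation_type']}" .
      }
    }
  SPARQL
end

# web.rb would parse the request body, run these helpers and execute the query
# against the triple-store (e.g. via the update helper assumed to come from the
# mu-ruby-template). The input file name here is a placeholder.
attributes = JSON.parse(File.read('request.json')).dig('data', 'attributes') || {}
validate_aggregation_request(attributes)
puts aggregation_insert_query(attributes)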

HTTP

POST /aggregations

Inserts a new Aggregation Request entity with the given properties into the triple-store.

Headers

  • Content-Type: application/vnd.api+json

Payload

{
    "data": {
        "type": "aggregations",
        "attributes": {
            "dataset": "example_dataset_identifier",
            "input": "/user/example/example_input.csv",
            "output": "/user/example/output/",
            "aggregation_type": "kmeans",
            "provenance": true,
            "big_data": false,
            "input_class": "net.caspervg.aggr.ext.TimedGeoMeasurement",
            "output_class": "net.caspervg.aggr.ext.WeightedGeoMeasurement",
            "parameters": {
                "grid_size": 0.005,
                "levels": 3,
                "centroids": 25,
                "iterations": 50,
                "metric": "EUCLIDEAN",
                "others": [
                  "/user/test/test1.csv",
                  "/user/test/test2.csv"
                ],
                "key": "weight",
                "amount": 4,
                "dynamic": {
                  "query": "special sparql query to retrieve data from the triple store",
                  "source_key": "special_key_of_the_source_for_data"
                }
            },
            "environment": {
                "spark": "local[4]spark://spark-master:7077",
                "hdfs": "hdfs://namenode:8020"
            }
        }
    }
}
  • type: MUST be aggregations
  • attributes:
    • dataset: REQUIRED, unique identifier of the dataset to create
    • input: REQUIRED, location of a CSV file (or SPARQL HTTP endpoint) to read data from (ignored for average aggregation)
    • output: REQUIRED, location of a directory to store data in (in case attributes.big_data is set to true)
    • aggregation_type: REQUIRED, type of the aggregation to execute (MUST be one of grid, time, diff, combination, average or kmeans)
    • provenance: REQUIRED, determines if provenance (parents) of the aggregated measurements should be stored
    • big_data: REQUIRED, determines if the result will be stored as CSV (if true) or in the triple-store (if false)
    • input_class: REQUIRED, package and name of the class to use to read measurements (this class MUST be in the classpath of the Aggr-Master and Aggr-Worker and MUST implement the Measurement interface)
    • output_class: REQUIRED, package and name of the class to use to write measurements (this class MUST be in the classpath of the Aggr-Master and Aggr-Worker and MUST implement the Measurement interface)
    • parameters:
      • grid_size: sensitivity of the grid (for grid aggregations only)
      • levels: number of detail levels to generate (for time aggregations only)
      • centroids: number of centroids to generate (for kmeans aggregations only)
      • iterations: maximum number of iterations of the k-Means algorithm (for kmeans aggregations only)
      • metric: metric to use to calculate distances between centroids & measurements (for kmeans aggregations only, MUST be one of EUCLIDEAN, MANHATTAN, CHEBYSHEV, CANBERRA or KARLSRUHE)
      • others if diff aggregation: array with one String element, the location of the subtrahend dataset (as CSV)
      • others if average aggregation: array of Strings, the locations of datasets to calculate average of (as CSVs)
      • key: key to extract value to subtract or average (for average or diff aggregations)
      • amount: expected number of values per combination (for average aggregations)
      • dynamic: extra dynamic properties for reading measurements, executing the aggregation or writing results. Keys supported: query, latitude_key, longitude_key, source_key, time_key
    • environment:
      • spark: REQUIRED, location of the Spark master (e.g. local[4] or spark://spark-master:7077). If empty, plain Java aggregations will be executed instead
      • hdfs: REQUIRED, location of the HDFS server. If empty, will assume the files are available locally.
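
As a usage sketch, the Ruby snippet below posts a reduced version of this payload to the service with the required JSON API content type. The host, port and attribute values are placeholders based on the example above, not a definitive client.

# Usage sketch: POST an aggregation request to the service.
# Host, port and all attribute values are placeholders.
require 'net/http'
require 'json'
require 'uri'

payload = {
  data: {
    type: 'aggregations',
    attributes: {
      dataset: 'example_dataset_identifier',
      input: '/user/example/example_input.csv',
      output: '/user/example/output/',
      aggregation_type: 'kmeans',
      provenance: true,
      big_data: false,
      input_class: 'net.caspervg.aggr.ext.TimedGeoMeasurement',
      output_class: 'net.caspervg.aggr.ext.WeightedGeoMeasurement',
      parameters: { centroids: 25, iterations: 50, metric: 'EUCLIDEAN' },
      environment: { spark: 'spark://spark-master:7077', hdfs: 'hdfs://namenode:8020' }
    }
  }
}

uri = URI('http://localhost/aggregations')
post = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/vnd.api+json')
post.body = payload.to_json

response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(post) }
puts "#{response.code}: #{response.body}"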

Development

The Aggr-Web application can be run in development mode (with automatic reloading) using the following command:

docker run -p 80:80 --name aggr-web --volume /home/casper/IdeaProjects/aggr-web:/app -e RACK_ENV=development semtech/mu-ruby-template:2.0.0-ruby2.3

Docker

A Docker configuration (Dockerfile) is provided, based on semtech/mu-ruby-template. When run, it automatically starts a Ruby/Sinatra HTTP server that accepts aggregation requests. Images are also available on DockerHub: caspervg/aggr-web.

