
strhist's Introduction

GETTING STARTED

Abstract

This is an extension to the STHoles algorithm for RDF data, based on URI prefixes.

Building project

Prerequisites

  • git
  • mvn
  • java 1.7 or above

Instructions

  • Clone the repository from Bitbucket.
  • Switch to the 'multiproject' branch.
  • Change directory into the project root, e.g. path_to_project/sthist/
  • Run 'mvn package' to build the project.
  • The jar can be found in path_to_project/sthist/evaluation/target/

Experiments & Evaluation

The experiment procedure is divided into three main executables: one prepares the training workload, one refines a histogram with it, and one evaluates the result. The order of execution is as follows:

  1. Prepare training workload
  2. Refine training workload
  3. Evaluate

PrepareTrainingWorkload.java: Queries a repository and intercepts the query feedback, writing it out as logs using Java serialization. By default the logs are written to /var/tmp/.
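To illustrate what writing feedback as logs with Java serialization could look like, here is a minimal sketch. The FeedbackRecord class, its fields, and the FeedbackLogSketch helper are hypothetical names introduced only for this example; the project's actual log classes and file layout may differ. A temporary file stands in for a log file under /var/tmp/.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class FeedbackLogSketch {

    // Hypothetical feedback record: one executed query and the cardinality observed for it.
    static class FeedbackRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        final String query;
        final long cardinality;

        FeedbackRecord(String query, long cardinality) {
            this.query = query;
            this.cardinality = cardinality;
        }
    }

    // Writes all records to a single file using standard Java serialization.
    static void write(File file, List<FeedbackRecord> records) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(new ArrayList<FeedbackRecord>(records));
        }
    }

    // Reads the records back, as a later refinement step would have to do.
    @SuppressWarnings("unchecked")
    static List<FeedbackRecord> read(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (List<FeedbackRecord>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File log = File.createTempFile("feedback", ".ser"); // stand-in for a log file
        List<FeedbackRecord> records = new ArrayList<FeedbackRecord>();
        records.add(new FeedbackRecord("SELECT * WHERE { ?s ?p ?o }", 42L));
        write(log, records);

        List<FeedbackRecord> restored = read(log);
        System.out.println(restored.size() + " record(s) restored from " + log.getName());
        log.delete();
    }
}
```

Serialized logs of this kind are only readable by code that has the same classes on its classpath, which fits a setup where the preparing and refining executables are built from the same project.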

RefineTrainingWorkload.java: Parses the logs and refines a histogram with the query feedback. The output destination is given by the user. At the end of refinement the histogram is serialized in JSON and VOID formats (the VOID format can be loaded and visualized in Eleon).

Evaluate.java: Parses the histogram and executes point queries against both the histogram and the repository, to obtain the estimated and the actual cardinality of each point query.

In our setup, the above executables are run repeatedly for each triple store using Unix scripts. We also process the final results with Unix scripts to extract statistics such as the average absolute error, the percentage of non-root evaluation, and the only-root evaluation (the final results are printed as Year, Prefix, Act, Est, AbsErr%).
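The error statistics can be sketched in a few lines. The formula for AbsErr% is not spelled out here, so the sketch below assumes the absolute difference between actual and estimated cardinality, relative to the actual cardinality and expressed in percent. The class and method names are hypothetical, and the handling of an actual cardinality of zero is likewise an assumption.

```java
class ErrorStatsSketch {

    // Assumed definition: |actual - estimate| / actual * 100.
    // The divisor is clamped to 1 so that empty actual results do not divide by zero (assumption).
    static double absErrorPercent(long actual, long estimate) {
        long diff = Math.abs(actual - estimate);
        return 100.0 * diff / Math.max(actual, 1L);
    }

    // Mean of the per-query error percentages over a set of point queries.
    static double meanAbsErrorPercent(long[] actual, long[] estimate) {
        if (actual.length == 0 || actual.length != estimate.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += absErrorPercent(actual[i], estimate[i]);
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        long[] actual = {200L, 100L};   // made-up cardinalities, for illustration only
        long[] estimate = {150L, 100L};
        System.out.println("mean AbsErr% = " + meanAbsErrorPercent(actual, estimate));
    }
}
```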

Running a demo

The code for the experiment setup can be found and tuned in the package gr.demokritos.iit.irss.semagrow.tools.

Setup Description

A histogram needs a training workload in order to be refined, so the first step is to prepare that workload. We obtain it by querying a triple store with SPARQL. If no triple store is available locally, SPARQL over HTTP can be used instead; the only change needed is the instantiation of the Sail Repository in getRepository() inside Utils.java. The workload consists of queries that we produced artificially by trimming URIs found in our triple store (to create prefixes). Other settings, such as the query to be evaluated and the number of queries, can be adjusted by editing PrepareTrainingWorkload.java.
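As an illustration of trimming URIs to create prefixes, the following sketch cuts a URI at each '/' or '#' separator, from the longest prefix to the shortest, and never cuts inside the "scheme://" part. The choice of separators, the class name, and the example URI are assumptions made for this sketch; the trimming rule used in the actual experiments may be different.

```java
import java.util.ArrayList;
import java.util.List;

class UriPrefixSketch {

    // Returns successively shorter prefixes of the URI, each ending in '/' or '#'.
    static List<String> prefixes(String uri) {
        List<String> result = new ArrayList<String>();
        int schemeEnd = uri.indexOf("://");
        int minLength = schemeEnd < 0 ? 0 : schemeEnd + 3; // do not cut inside "scheme://"
        int from = uri.length() - 2; // skip a separator that ends the URI itself
        while (from >= minLength) {
            int cut = Math.max(uri.lastIndexOf('/', from), uri.lastIndexOf('#', from));
            if (cut < minLength) {
                break; // no further separator after the scheme
            }
            result.add(uri.substring(0, cut + 1));
            from = cut - 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical URI, used only to show the shape of the output.
        for (String prefix : prefixes("http://example.org/data/2010/item42")) {
            System.out.println(prefix);
        }
    }
}
```

Each generated prefix could then be placed into a query that restricts a URI to that prefix, which yields workload queries of varying selectivity.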

Once the training workload is prepared, the histogram can be instantiated and trained. This is done by running RefineTrainingWorkload.java, which parses the data from the previous step, creates a histogram, and refines it based on that workload. At this point the histogram can be visualized using either of our serialization formats (JSON and VOID).

The histogram is evaluated as follows: we have prepared artificial point queries, which are run against both the triple store (to get the actual cardinality) and the histogram (to get the estimated cardinality).

How to run

After cloning and building the project with mvn, the executables can be run as follows (the -Dlog4j.configuration option is optional). Example for preparing the training workload:

    java -Dlog4j.configuration=file:/path_to_log4j.properties/ -cp /path_to_jar/ gr.demokritos.iit.irss.semagrow.tools.expirementfixedprefix.PrepareTrainingWorkload {various_options}

Contributors

acharal, kzama, nickozoulis, stasinos

