neuhausler / dockerized-dbpedia

This project forked from dbpedia/virtuoso-sparql-endpoint-quickstart

Creates a Docker image with Virtuoso preloaded with the latest DBpedia dataset.

Shell 89.74% Dockerfile 10.26%

Dockerized-DBpedia

Creates and runs a Virtuoso Open Source instance preloaded with a Databus Collection and the VOS DBpedia Plugin installed.

Usage

All you need to do is set a password in the .env file, change the COLLECTION_URI in docker-compose.yml, and then run the dockerized-dbpedia.sh script in the project root directory.
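As a minimal sketch of the first and last of these steps: the variable names below are the ones referenced in the docker-compose.yml shown in the Example section, while the values (password and port numbers) are placeholders chosen for illustration.

```shell
# Write the variables that docker-compose.yml interpolates into .env.
# The values are placeholders: choose your own password and free ports.
cat > .env <<'EOF'
VIRTUOSO_ADMIN_PASSWD=change-me
VIRTUOSO_HTTP_PORT=8890
VIRTUOSO_ISQL_PORT=1111
EOF

# Run the project script from the project root, if it is present.
if [ -x ./dockerized-dbpedia.sh ]; then
  ./dockerized-dbpedia.sh
fi
```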

This builds the image of the loader/installer process, which loads the data of the Databus Collection into the Virtuoso Open Source instance and installs the DBpedia Plugin. Once the image has been built, the script runs 'docker-compose up' to start three containers:

  • download: the Databus download client
  • store: the OpenLink VOS instance
  • load: the loader/installer

Configuration

Before running the script you should configure the containers in the docker-compose.yml. Details for the parameters are listed below.

OpenLink VOS Instance

You can read the full documentation of the docker image here. The image requires one environment variable, which sets the admin password of the database. The two buffer parameters are optional but strongly affect loading time:

  • DBA_PASSWORD: Your database admin password
  • VIRT_PARAMETERS_NUMBEROFBUFFERS: Defaults to 2000 which will result in a very long loading time. Increase this depending on the available memory on your machine. You can find more details in the docker image documentation.
  • VIRT_PARAMETERS_MAXDIRTYBUFFERS: Same as VIRT_PARAMETERS_NUMBEROFBUFFERS: increase it depending on the available memory on your machine.
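As a rough starting point for the two buffer variables, the sketch below scales them linearly with free memory. The factors (about 85000 buffers per GB, with the dirty-buffer limit at roughly three quarters of that) are an assumption based on the sizing comments in Virtuoso's stock virtuoso.ini, not something this project prescribes. The helper name suggest_buffers is hypothetical.

```shell
# suggest_buffers FREE_GB: print buffer settings for FREE_GB gigabytes of
# free memory. The per-GB factors are an assumed rule of thumb.
suggest_buffers() {
  gb=$1
  echo "VIRT_PARAMETERS_NUMBEROFBUFFERS=$((gb * 85000))"
  echo "VIRT_PARAMETERS_MAXDIRTYBUFFERS=$((gb * 62500))"
}

# Example: a machine with 8 GB of memory to spare for Virtuoso.
suggest_buffers 8
```

The printed lines can be copied into the environment section of the store service in docker-compose.yml.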

The DBA_PASSWORD value is only applied when a new database is created. The example docker-compose mounts a folder to the internal database directory for persistence. Note that this folder needs to be cleared before a changed password in docker-compose takes effect.

The second volume specified in the docker-compose file connects the downloads folder to a directory in the container that is accessible to the Virtuoso load script. Accessible paths are set in the internal virtuoso.ini file (DirsAllowed). As the docker-compose uses the vanilla settings of the image, the downloads folder is mounted to /usr/share/proj, which is in DirsAllowed by default.
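If you change the mount point, you can check that the new directory is still listed in DirsAllowed. The following is a small sketch of such a check. The helper name dir_allowed and the sample file are hypothetical, and the sample only imitates the relevant line of a virtuoso.ini; point the function at your real file instead.

```shell
# dir_allowed INI_FILE DIR: succeed if DIR is listed in DirsAllowed of INI_FILE.
dir_allowed() {
  grep -E '^[[:space:]]*DirsAllowed[[:space:]]*=' "$1" \
    | cut -d= -f2 \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -qxF "$2"
}

# Hypothetical excerpt standing in for the image's virtuoso.ini.
cat > sample-virtuoso.ini <<'EOF'
[Parameters]
DirsAllowed = ., ../vad, /usr/share/proj
EOF

if dir_allowed sample-virtuoso.ini /usr/share/proj; then
  echo "allowed"
else
  echo "not allowed"
fi
```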

Databus Download Client

This project uses the minimal DBpedia Databus download client. You can find the documentation here. If you haven't already, download and build the download client docker image. The required environment variables are:

  • TARGET_DIR: The target directory for the downloaded files
  • COLLECTION_URI: A collection URI on the DBpedia Databus

Loader/Installer

You can build the loader/installer image by running:

cd ./dbpedia-loader
docker build -t dbpedia-virtuoso-loader .

You can configure the container with the following environment variables:

  • STORE_DATA_DIR: The directory of the VOS instance that the downloads folder is mounted to (/usr/share/proj by default). Since the Loader will tell the VOS instance to start importing files it needs to know where the files are going to be. Additionally the VOS instance needs to be given access to that directory.
  • STORE_DBA_PASSWORD: The admin password specified in the VOS instance (DBA_PASSWORD variable)
  • DATA_DIR: The directory of this container that the downloads folder is mounted to.
  • DOMAIN: The domain of your resource identifiers
  • [OPTIONAL] DATA_DOWNLOAD_TIMEOUT: The number of seconds after which the loader process stops waiting for the download to finish.
  • [OPTIONAL] STORE_CONNECTION_TIMEOUT: The number of seconds after which the loader process stops waiting for the store to boot up.
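To illustrate what the two timeouts mean, here is a generic wait-with-timeout loop. It is a sketch of the general pattern only, not the loader's actual code; the function name wait_for and the probe command in the comment are assumptions.

```shell
# wait_for TIMEOUT_SECONDS COMMAND...: retry COMMAND once per second until it
# succeeds or TIMEOUT_SECONDS have elapsed. Returns 0 on success, 1 on timeout.
wait_for() {
  timeout=$1
  shift
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# A loader could use it like this (hypothetical probe command):
#   wait_for "$STORE_CONNECTION_TIMEOUT" nc -z store 1111
if wait_for 2 true; then
  echo "store is up"
fi
```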

Example

The default docker-compose.yml will start a VOS instance with the DBpedia Plugin installed, containing the data specified in the https://databus.dbpedia.org/dbpedia/collections/latest-core/ collection, with the DOMAIN variable set to http://dbpedia.org. DOMAIN must match the resource identifiers of the collection you load. For example, the https://databus.dbpedia.org/kurzum/collections/agro collection (mapping-based geo-data in Russian) uses Russian DBpedia identifiers, so DOMAIN would be set to "http://ru.dbpedia.org".

version: '3'
services:
  download:
    image: dbpedia/minimal-download-client:latest
    environment:
      COLLECTION_URI: https://databus.dbpedia.org/dbpedia/collections/latest-core/
      TARGET_DIR: /root/data
      GRAPH_MODE: "download-url" # sets the default graph per file; has no effect on RDF statements that already carry a context; change to "no" to disable
    volumes:
      - ./downloads:/root/data # has to point to TARGET_DIR
  store:
    image: openlink/virtuoso-opensource-7
    ports: ["${VIRTUOSO_HTTP_PORT}:8890","127.0.0.1:${VIRTUOSO_ISQL_PORT}:1111"]
    environment:
      DBA_PASSWORD: ${VIRTUOSO_ADMIN_PASSWD:?Set VIRTUOSO_ADMIN_PASSWD in .env file or pass as environment variable e.g.  VIRTUOSO_ADMIN_PASSWD= docker-compose up}
    volumes:
      - ./virtuoso-db:/opt/virtuoso-opensource/database
      - ./downloads:/usr/share/proj # has to point to STORE_DATA_DIR in 'load'
  load:
    image: dbpedia/dbpedia-virtuoso-loader:latest
    environment:
      STORE_DATA_DIR: /usr/share/proj
      STORE_DBA_PASSWORD: ${VIRTUOSO_ADMIN_PASSWD:?Set VIRTUOSO_ADMIN_PASSWD in .env file or pass as environment variable e.g.  VIRTUOSO_ADMIN_PASSWD= docker-compose up}
      STORE_ISQL_PORT: 1111 #docker takes care of routing in compose so this is independent of the actual exposed port of store container, don't touch unless you change virtuoso.ini settings
      DATA_DIR: /root/data
      DOMAIN: http://dbpedia.org 
    volumes:
      - ./downloads:/root/data # has to point to DATA_DIR

      

dockerized-dbpedia's People

Contributors

bperel, chile12, gone-phishing, holycrab13, jimkont, jj-author, mgns, neradis
