Transform library metadata using Metafacture

limetrans - Library Metadata Transformation

Configuration

Limetrans is a configuration framework for using Metafacture to transform library metadata. It is driven by a JSON configuration file whose overall structure is:

{
  "input" : {
    ...
  },
  "transformation-rules" : "...",
  "output": {
    ...
  },
  ...
}

Input

Input is generally configured like this:

"input" : {
  "queue" : {
    "path" : "a/path/to/your/input/file/",
    "pattern" : "your-marc-xml-input-file.xml",
    "sort_by" : "lastmodified",
    "order" : "desc",
    "max" : 1,
    "normalize-unicode" : false,
    "processor" : "MARC21"
  }
}

MARCXML is the default value for "processor", so the key can be omitted when processing MARCXML data.

Transformation

"transformation-rules" : "a/path/to/your/transformation/metafacture/rules/file.xml"

Output

Currently, limetrans is written for use with Elasticsearch. The output object therefore mainly contains the Elasticsearch configuration, alongside an option for writing JSON output to a file.

"output": {
  "json" : "a/path/to/your/jsonlines/output/file.jsonl",
  "elasticsearch" : {
    "cluster": "elasticsearch-01",
    "host": ["localhost:9300"],
    "index" : {
      "type" : "title",
      "name" : "choose-your-own-index-name",
      "timewindow" : "yyyyMMdd",
      "settings" : "a/path/to/your/elasticsearch/settings.json",
      "mapping" : "a/path/to/your/elasticsearch/mapping.json",
      "idKey" : "the-id-field-name-configured-in-your-metafacture-rules-file"
    },
    "update" : false,
    "delete" : false,
    "bulkAction" : "index",
    "maxbulkactions" : 100000
  },
  "pretty-printing" : false
}

"type" : "title" is a suggestion, assuming you might want to transform and store book title information.

Further configuration

"catalogid" : "choose-your-own-catalog-id",
"collection" : "choose-your-own-collection"

Examples of limetrans configuration files can be found in the project's source code.
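
Putting the fragments above together, a complete configuration file might look like the following sketch. It reuses only the placeholder values shown in the sections above. Two points are assumptions rather than documented behaviour: "catalogid" and "collection" are placed at the top level next to "input" and "output", as the abstract layout suggests, and "processor" is omitted because the input is taken to be MARCXML.

```json
{
  "input" : {
    "queue" : {
      "path" : "a/path/to/your/input/file/",
      "pattern" : "your-marc-xml-input-file.xml",
      "sort_by" : "lastmodified",
      "order" : "desc",
      "max" : 1,
      "normalize-unicode" : false
    }
  },
  "transformation-rules" : "a/path/to/your/transformation/metafacture/rules/file.xml",
  "output" : {
    "json" : "a/path/to/your/jsonlines/output/file.jsonl",
    "elasticsearch" : {
      "cluster" : "elasticsearch-01",
      "host" : ["localhost:9300"],
      "index" : {
        "type" : "title",
        "name" : "choose-your-own-index-name",
        "timewindow" : "yyyyMMdd",
        "settings" : "a/path/to/your/elasticsearch/settings.json",
        "mapping" : "a/path/to/your/elasticsearch/mapping.json",
        "idKey" : "the-id-field-name-configured-in-your-metafacture-rules-file"
      },
      "update" : false,
      "delete" : false,
      "bulkAction" : "index",
      "maxbulkactions" : 100000
    },
    "pretty-printing" : false
  },
  "catalogid" : "choose-your-own-catalog-id",
  "collection" : "choose-your-own-collection"
}
```

Replace every path and name placeholder with values for your own environment before use.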

Setup project

Get Source

$ git clone git@github.com:hbz/limetrans.git

Setup Elasticsearch

Download and install Elasticsearch

$ cd third-party
$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-2.1.1.zip
$ unzip elasticsearch-2.1.1.zip
$ cd elasticsearch-2.1.1
$ bin/elasticsearch

Check that Elasticsearch is up and responding with curl -X GET http://localhost:9200/.

Configure Elasticsearch

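The limetrans configuration currently expects an Elasticsearch cluster named elasticsearch-01 (see the "cluster" value in the output configuration above). Make sure the cluster name is set accordingly (cluster.name: elasticsearch-01) in /etc/elasticsearch/elasticsearch.yml, or in config/elasticsearch.yml inside the unpacked directory when running from the zip archive.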

Optionally, install the head plugin:

$ cd third-party/elasticsearch-2.1.1
$ bin/plugin install mobz/elasticsearch-head

Contribute

Coding conventions

Indent blocks by four spaces and wrap lines at 100 characters. For more details, refer to the Google Java Style Guide.

Bug reports

Please file bugs as issues labeled "Bug" in the project's issue tracker.
