ryan-jacobs / vufindharvest

This project forked from vufind-org/vufindharvest

VuFind Harvest Tools

License: GNU General Public License v2.0

PHP 100.00%

VuFindHarvest

Introduction

VuFindHarvest contains OAI-PMH harvesting logic. This is part of the VuFind project (https://vufind.org) but may be used as a stand-alone tool or incorporated into other software dealing with metadata harvesting.

Installation

The recommended method for incorporating this library into your project is to use Composer (http://getcomposer.org). If you wish to use this as a stand-alone tool, simply clone the repository and run composer install or php composer.phar install (depending on your Composer setup) to download dependencies.

Concept

This tool is designed to allow for a pipeline approach to OAI-PMH record processing. Its job is to harvest metadata from one or more repositories into one or more directories. It can support a one-file-per-record or a multiple-records-per-file approach. Records can be manipulated and augmented with the help of certain configuration options (primarily to copy data from the OAI-PMH header into the harvested record itself when necessary).

Each directory containing harvested records also includes a last_harvest.txt file which remembers the most recently harvested record date. This allows the tool to be re-run on subsequent occasions to perform an incremental update and retrieve new content.

Interrupted harvests can sometimes be resumed with the help of a last_state.txt file, which is left in the harvest directory after the tool terminates abnormally.
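A wrapper script may want to know which of these state files is present before re-running the tool. The following is a minimal sketch that relies only on the two file names mentioned above; the function name harvest_status and the labels it prints are illustrative assumptions, not part of VuFindHarvest.

```shell
#!/bin/sh
# Classify a harvest directory by the state files it contains.
# harvest_status is a hypothetical helper, not part of VuFindHarvest.
harvest_status() {
    dir=$1
    if [ -f "$dir/last_state.txt" ]; then
        # Left behind by an abnormal termination; a resume may be possible.
        echo "interrupted"
    elif [ -f "$dir/last_harvest.txt" ]; then
        # A harvest date was recorded; the next run can be incremental.
        echo "incremental"
    else
        # No state files: nothing has been harvested into this directory yet.
        echo "fresh"
    fi
}

# Demo against a throwaway directory.
demo=$(mktemp -d)
harvest_status "$demo"            # prints: fresh
touch "$demo/last_harvest.txt"
harvest_status "$demo"            # prints: incremental
touch "$demo/last_state.txt"
harvest_status "$demo"            # prints: interrupted
rm -rf "$demo"
```

The check only tests for the presence of the files; it does not read or interpret their contents, whose format is not described here.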

Deleted records are supported through the creation of ".delete" files containing the IDs of records that have been removed from the system.
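Downstream tooling typically needs the combined list of deleted IDs. The sketch below gathers them from every ".delete" file in a harvest directory. It assumes each file holds one ID per line, which is an assumption about the file layout rather than something stated above; list_deleted_ids is a hypothetical helper name and the IDs are placeholders.

```shell
#!/bin/sh
# Print the unique record IDs found in all *.delete files of a directory.
# Assumes one ID per line; list_deleted_ids is a hypothetical helper.
list_deleted_ids() {
    dir=$1
    for f in "$dir"/*.delete; do
        [ -f "$f" ] || continue   # the glob matched nothing
        # awk 'NF' drops blank lines and always ends each line with a newline,
        # so a file without a trailing newline cannot merge with the next one.
        awk 'NF' "$f"
    done | sort -u
}

# Demo with placeholder IDs in a throwaway directory.
demo=$(mktemp -d)
printf 'oai:example:1\noai:example:2\n' > "$demo/first.delete"
printf 'oai:example:2\noai:example:3' > "$demo/second.delete"
list_deleted_ids "$demo"
rm -rf "$demo"
```

Run as written, the demo prints the three placeholder IDs once each, since the duplicate entry is collapsed by sort -u.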

Usage

This package includes a bin/harvest_oai.php script which provides a command-line interface for OAI-PMH harvesting. All harvesting options may be provided on the command line, or a .ini file containing saved options may be loaded using the --ini switch.

Harvesting without an .ini file

For the most basic harvest, specify the --url and --metadataPrefix options, followed by a target parameter naming the directory into which records should be harvested. For additional options, run php bin/harvest_oai.php --help.

Example:

php bin/harvest_oai.php --url=http://example.com/oai_server --metadataPrefix=oai_dc my_target_dir

Harvesting with an .ini file

When specifying many complex options, or when harvesting multiple repositories at once, configuring the harvest with an .ini file is the best option, since it offers more flexibility than the command line alone. Note that any command-line options passed to the harvester during an .ini-driven harvest will override the equivalent settings in the .ini file.

For a full list of .ini options and some example configurations, see the sample file found in /etc/oai.ini.

If you specify a parameter following the option list when using an .ini file, only the section of the configuration file matching the parameter will be used, and records will be harvested to a directory with a matching name. For example:

php bin/harvest_oai.php --ini=/etc/oai.ini OJS

If you omit the parameter, all sections of the .ini file will be harvested in sequence.
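Because each section can be selected by name, a wrapper script can also walk the sections itself, for example to log or schedule each repository separately. The sketch below is a minimal illustration: ini_sections is a hypothetical helper, the section names and URLs are placeholders, and the url and metadataPrefix keys are assumed to mirror the command-line option names (check the sample file for the actual list). The loop prints the commands rather than running them.

```shell
#!/bin/sh
# List the section names of an .ini file, one per line.
# ini_sections is a hypothetical helper, not part of VuFindHarvest.
ini_sections() {
    sed -n 's/^[[:space:]]*\[\([^]]*\)].*$/\1/p' "$1"
}

# Demo: a throwaway two-section file (placeholder URLs and section names).
ini=$(mktemp)
cat > "$ini" <<'EOF'
[OJS]
url = http://example.com/ojs/oai
metadataPrefix = oai_dc

[Repo2]
url = http://example.org/oai
metadataPrefix = oai_dc
EOF

# Print (rather than run) one harvest command per section.
# Assumes section names contain no whitespace.
for section in $(ini_sections "$ini"); do
    echo "php bin/harvest_oai.php --ini=$ini $section"
done
rm -f "$ini"
```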

Architecture

If you wish to incorporate this code into another project, or extend it to support more options, here are the most important top-level classes:

Here are key dependencies used by VuFindHarvester\OaiPmh\Harvester:

Several classes make use of the traits and classes in the VuFindHarvester\ConsoleOutput namespace to help with standard status output tasks.

History

See CHANGELOG.md

Contributors

demiankatz, losullivansu, ryan-jacobs
