vivo-community / vivo-harvester

An ETL tool for transferring data from traditional systems into Semantic Systems.

License: Other

Shell 20.74% XSLT 27.97% Java 48.59% Batchfile 2.62% FreeMarker 0.08%

vivo-harvester's Introduction

VIVO Harvester
VIVO: Enabling National Networking of Scientists

Thanks for your interest in using the harvester. Below is a listing of resources for documentation and help.

Wiki:
https://wiki.duraspace.org/display/VIVO/VIVO+Harvester+User+Guide+1.0
https://wiki.duraspace.org/display/VIVO/VIVO+Harvester


Example scripts
The example-scripts directory contains example scripts for reference and demonstration. They are meant as a guide and template for your own harvest. The XML configuration files in the example-scripts/ folders document each parameter used (and some that are unused).

HELP Messages
The --help message for each tool is always up to date on which parameters exist, but sparse on explanation.
To see a help message, run <HarvesterInstallationDirectory>/bin/harvester-<toolName> --help
Example:
/usr/share/vivo/harvester/bin/harvester-databaseclone --help

Reporting bugs / researching existing bugs & workarounds
Use the VIVO JIRA bug tracker at Cornell, or the issues on GitHub:
http://issues.library.cornell.edu/browse/VIVOHARV

vivo-harvester's Issues

Is VIVO_Harvester_(vivo.sourceforge.net) tool "decommissioned"?

When the VIVO Harvester queries PubMed, it names VIVO_Harvester_(vivo.sourceforge.net) as the tool, as in:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&db=pubmed&tool=VIVO_Harvester_(vivo.sourceforge.net) ....

However, I am getting error messages, which makes me ask whether this VIVO_Harvester_(vivo.sourceforge.net) tool is still available or has been replaced by something better.
Also, the following link no longer works:
https://github.com/vivo-community/VIVO-Harvester/tree/develop
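
For context on the question above: in the NCBI E-utilities, the tool parameter is a label that the caller chooses to identify its own software, alongside an email parameter. It is not a separate service that could be switched off. The following is a minimal sketch of how such an esearch URL could be assembled with the values URL-encoded. The class name, method names, and the example email address are hypothetical and not taken from the Harvester code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Harvester code base.
class EsearchUrlSketch {
    static final String BASE =
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    // Builds an esearch request for PubMed. "tool" and "email" only
    // identify the caller to NCBI.
    static String build(String term, String tool, String email) {
        return BASE
            + "?db=pubmed"
            + "&term=" + enc(term)
            + "&tool=" + enc(tool)
            + "&email=" + enc(email);
    }

    // Percent-encodes a query value (parentheses, '@', spaces, ...).
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The email address is a placeholder assumption.
        System.out.println(build("vivo",
            "VIVO_Harvester_(vivo.sourceforge.net)", "someone@example.org"));
    }
}
```

If a request built this way still fails, the error text returned by NCBI is probably a better lead than the tool name itself.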

OAI-FETCH error 302 ?

Hello everyone. I am trying to use the oai-fetch tool to harvest a DSpace repository, and the server answers the request with status code 302. I think the verb parameter should be added to the URI that the code builds? @awoods @ieb @lawlesst
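
Every OAI-PMH request carries a verb parameter, and a 302 means the server is redirecting the client to another location. The sketch below shows what a ListRecords request URL looks like and how redirect status codes can be recognised. The class and method names are hypothetical, and the base URL is a placeholder rather than a real DSpace endpoint.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the Harvester's OAIFetch implementation.
class OaiRequestSketch {
    // Appends the mandatory "verb" argument and a metadata prefix
    // to an OAI-PMH base URL.
    static String listRecordsUrl(String baseUrl, String metadataPrefix) {
        return baseUrl
            + "?verb=ListRecords"
            + "&metadataPrefix="
            + URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8);
    }

    // True for the HTTP status codes that ask the client to retry elsewhere.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    public static void main(String[] args) {
        // Placeholder base URL; substitute the repository's OAI endpoint.
        System.out.println(listRecordsUrl("http://example.org/oai/request", "oai_dc"));
    }
}
```

One thing worth checking with a 302 is the Location header of the response: if it points to an https address or a different path, configuring that address directly may avoid the redirect. Java's HttpURLConnection does not follow redirects that switch between http and https.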

Add AutoNumber URI to CVSFetch

CVS Fetch currently requires a property from which to create unique URIs for nodes. Modify it so that it can also create URIs when no such property is available.
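
One possible shape for this request, as a rough sketch only: use the property value when there is one, and fall back to a running counter otherwise. The class name, the "n" prefix, and the example namespace are assumptions made for illustration and do not come from the Harvester code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an auto-numbering URI generator.
class AutoNumberUriSketch {
    private final String namespace;
    private final AtomicLong counter = new AtomicLong();

    AutoNumberUriSketch(String namespace) {
        this.namespace = namespace;
    }

    // Uses the property value when present; otherwise falls back to
    // the next number in sequence (n1, n2, ...).
    String uriFor(String propertyValue) {
        if (propertyValue != null && !propertyValue.isEmpty()) {
            return namespace + propertyValue;
        }
        return namespace + "n" + counter.incrementAndGet();
    }

    public static void main(String[] args) {
        AutoNumberUriSketch uris =
            new AutoNumberUriSketch("http://example.org/individual/");
        System.out.println(uris.uriFor("abc"));
        System.out.println(uris.uriFor(null));
    }
}
```

A real implementation would also have to make sure the generated numbers do not collide with URIs that already exist in the target model, which this sketch does not attempt.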

Replace all PubmedFetch by PubmedHTTPFetch?

PubmedFetch seems to fail its tests repeatedly during mvn install.
If PubmedHTTPFetch can indeed fetch just as well as PubmedFetch, how about replacing PubmedFetch with PubmedHTTPFetch and disabling the use of PubmedFetch altogether?

Difficulties running the application

I ran into some issues trying to get this software to run. Even though Maven threw some warnings about library dependencies with system scope, the build worked.
While installing the .deb package with dpkg -i (that command points to bin/ instead of build/ in update-rebuild-install.sh), I got warnings about the postinstall script trying to chgrp/chown to tomcat6 (I am on Debian with tomcat7; there is no tomcat6 user or group).

To make the harvester-* commands work, I found that I had to alter the java command to use -cp harvester.jar:dependency/* so that they would run from within /usr/share/vivo/harvester/bin.

Since I did not find detailed installation instructions, I assume it might be a good idea to add some to the readme file?
I might have missed some things, so please point them out.

PubmedFetch

Caused by: java.lang.ClassNotFoundException: org.vivoweb.harvester.fetch.nih.PubmedFetch

Dear experts,

How can I solve this issue?
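
A ClassNotFoundException means the named class was not visible on the classpath the JVM was started with. As a small diagnostic sketch (the helper class is hypothetical, not part of the Harvester), the following checks whether a class can be found when run with the same -cp setting as the failing command:

```java
// Hypothetical diagnostic helper, not part of the Harvester.
class ClasspathCheckSketch {
    // Returns true if the class can be located without initialising it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false,
                ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
            ? args[0]
            : "org.vivoweb.harvester.fetch.nih.PubmedFetch";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

If the check reports false, either the harvester jar is missing from the classpath or the installed build does not contain that class.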

MODS DateTime Issue

Originally posted to JIRA as VIVOHARV-119: http://issues.library.cornell.edu/browse/VIVOHARV-119

Per Nicholas Rejack:

I think there is another bug that is causing "incomplete date/time" to show up in VIVO. The date/time method has changed, so the new process is:

  1. Create a new core:DateTimeValue class object with an RFC 3339-style date, for example 2007-01-01T05:00:00Z.
  2. Link the original object to the core:DateTimeValue object via the core:dateTimeValue property.
  3. Link the core:DateTimeValue object to one of yearMonthDayPrecision, yearMonthDayTimePrecision, yearMonthPrecision, or yearPrecision via core:dateTimePrecision.
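
The three steps can be sketched as a small program that prints the resulting statements in N-Triples form. Several things here are assumptions rather than facts from the report: the core namespace URI, the name of the literal property (core:dateTime), the example URIs, and the helper class itself.

```java
import java.time.Instant;

// Hypothetical sketch; namespace and core:dateTime are assumptions.
class DateTimeValueSketch {
    static final String CORE = "http://vivoweb.org/ontology/core#";
    static final String RDF_TYPE =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String XSD_DATETIME =
        "http://www.w3.org/2001/XMLSchema#dateTime";

    // Emits the statements for one date: the DateTimeValue object with
    // its literal, the link from the original object, and the precision.
    static String triples(String subjectUri, String dateUri,
                          Instant when, String precision) {
        StringBuilder out = new StringBuilder();
        // Step 1: the DateTimeValue object and its RFC 3339-style literal.
        out.append(triple(dateUri, RDF_TYPE, "<" + CORE + "DateTimeValue>"));
        out.append(triple(dateUri, CORE + "dateTime",
            "\"" + when + "\"^^<" + XSD_DATETIME + ">"));
        // Step 2: link from the original object.
        out.append(triple(subjectUri, CORE + "dateTimeValue",
            "<" + dateUri + ">"));
        // Step 3: precision of the date.
        out.append(triple(dateUri, CORE + "dateTimePrecision",
            "<" + CORE + precision + ">"));
        return out.toString();
    }

    private static String triple(String s, String p, String o) {
        return "<" + s + "> <" + p + "> " + o + " .\n";
    }

    public static void main(String[] args) {
        System.out.print(triples(
            "http://example.org/individual/pub1",
            "http://example.org/individual/pub1-date",
            Instant.parse("2007-01-01T05:00:00Z"),
            "yearPrecision"));
    }
}
```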

Add harvesting script for OpenAlex

Is your feature request related to a problem? Please describe.
OpenAlex is the rising star in open research information and provides rich metadata for more than 200 million publications. It would be very useful to have a working harvester script for this data source.

Describe the solution you'd like
A reusable script, maybe based on the JSON fetch.

Describe alternatives you've considered
None, at least not for the VIVO harvester.

Additional context
API overview: https://docs.openalex.org/how-to-use-the-api/api-overview

Add harvesting script for ConfIDent conference metadata

Is your feature request related to a problem? Please describe.
Conference metadata is hard to get. It would be very useful to have a working harvester script for ConfIDent (https://www.confident-conference.org/index.php/Main_Page)

Describe the solution you'd like
A reusable script, maybe based on the JSON fetch.

Describe alternatives you've considered
None, at least not for the VIVO harvester.

Additional context
API overview: https://www.confident-conference.org/index.php/Help:FAQ#How_to_use_the_API

Update Axis2 Jar

The new version of Eutils requires Axis2 1.6.2, while VIVO uses 1.5.4 for the old Eutils file. Get the new version of the Axis2 jar for VIVO. This may only need a Maven modification; if we are caching the jar file, download the new jar and add it to our library.

VIVO Harvester (xsltranslator issue)

I am using VIVO Harvester on CentOS7 with VIVO 1.9.3. I have copied the example-oaifetch and modified it according to my configuration. When I execute "run-oaifetch.sh" it runs without any errors and it reaches the last line but as I check the terminal I see this:

[root@localhost oaifetch]# harvester-xsltranslator -X xsltranslator.config.xml
2018-05-25 07:37:15.305 INFO  [o.v.h.t.XSLTranslator] XSLTranslator: Start
2018-05-25 07:37:15.308 DEBUG [o.v.h.u.a.ArgList] running XSLTranslator
2018-05-25 07:37:15.309 DEBUG [o.v.h.u.a.ArgList] command line args: -X xsltranslator.config.xml
2018-05-25 07:37:15.313 DEBUG [o.v.h.u.a.ArgList] config file args: --output translated-records.config.xml --input raw-records.config.xml --xslFile oaifetch-mets.datamap.xsl --force --wordiness DEBUG
2018-05-25 07:37:15.402 DEBUG [o.v.h.u.r.RecordHandler] 'rhClass' - 'org.vivoweb.harvester.util.repo.TextFileRecordHandler'
2018-05-25 07:37:15.402 DEBUG [o.v.h.u.r.RecordHandler] 'fileDir' - 'data/raw-records'
2018-05-25 07:37:15.403 DEBUG [o.v.h.u.r.RecordHandler] Using class: 'org.vivoweb.harvester.util.repo.TextFileRecordHandler'
2018-05-25 07:37:15.415 DEBUG [o.v.h.u.r.RecordHandler] 'rhClass' - 'org.vivoweb.harvester.util.repo.TextFileRecordHandler'
2018-05-25 07:37:15.415 DEBUG [o.v.h.u.r.RecordHandler] 'fileDir' - 'data/translated-records'
2018-05-25 07:37:15.415 DEBUG [o.v.h.u.r.RecordHandler] Using class: 'org.vivoweb.harvester.util.repo.TextFileRecordHandler'
2018-05-25 07:37:15.416 DEBUG [o.v.h.u.r.TextFileRecordHandler] Directory 'data/translated-records' Does Not Exist, attempting to create
2018-05-25 07:37:15.418 DEBUG [o.v.h.u.r.TextFileRecordHandler] Directory 'data/translated-records/.metadata' Does Not Exist, attempting to create
2018-05-25 07:37:15.438 DEBUG [o.v.h.u.r.TextFileRecordHandler] Compiling list of records
2018-05-25 07:37:15.441 DEBUG [o.v.h.u.r.TextFileRecordHandler] List compiled
2018-05-25 07:37:15.441 INFO  [o.v.h.t.XSLTranslator] 0 records translated.
2018-05-25 07:37:15.441 INFO  [o.v.h.t.XSLTranslator] 0 records did not need translation
2018-05-25 07:37:15.442 INFO  [o.v.h.t.XSLTranslator] XSLTranslator: End

When I check the raw-data folder I see many files without an extension. Does anyone know what could be wrong?
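
The log shows XSLTranslator reading its input from data/raw-records, so a first check could be whether that directory, relative to where the script runs, actually contains record files. The sketch below only counts regular files directly inside a directory; the class is hypothetical and knows nothing about how TextFileRecordHandler indexes its records.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical diagnostic, not part of the Harvester.
class RecordCountSketch {
    // Counts regular files directly inside dir; subdirectories such as
    // .metadata are skipped. Returns 0 if dir is not a directory.
    static long countRecordFiles(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return 0;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the 'fileDir' line of the log above.
        Path dir = Paths.get(args.length > 0 ? args[0] : "data/raw-records");
        System.out.println(dir.toAbsolutePath() + ": "
            + countRecordFiles(dir) + " file(s)");
    }
}
```

A count of 0 would suggest the fetch step wrote its output somewhere other than the directory named in raw-records.config.xml.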

SLF4J and ActiveMQ are Incompatible in V 1.3

Originally posted on JIRA as VIVOHARV-120: http://issues.library.cornell.edu/browse/VIVOHARV-120

When running Harvester tools utilizing logging via slf4j, the following error occurs:

SLF4J: The requested version 1.6 by your slf4j binding is not compatible with 1.5.5, 1.5.6, 1.5.7, 1.5.8, 1.5.9, 1.5.10, 1.5.11
SLF4J: See http://www.slf4j.org/codes.html#version_mismatch for further details.
Exception in thread "main" java.lang.NoSuchMethodError: org.slf4j.helpers.MessageFormatter.arrayFormat(Ljava/lang/String;[Ljava/lang/Object;)Lorg/slf4j/helpers/FormattingTuple;
at ch.qos.logback.classic.spi.LoggingEvent.<init>(LoggingEvent.java:112)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:471)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:427)
at ch.qos.logback.classic.Logger.info(Logger.java:631)
at org.vivoweb.harvester.fetch.SOAPFetch.main(SOAPFetch.java:275)

This error was introduced when org.vivoweb.harvester.util.ImageQueueConsumer was added. This class uses ActiveMQ, which uses slf4j in a way that does not gracefully handle different versions in other classes. This is a reported bug and can be found at https://issues.apache.org/jira/browse/AMQ-3296 .

A workaround for developers exists. In pom.xml, the activemq dependency needs to be removed or commented out, and then ImageQueueConsumer.java removed or renamed to an innocuous extension like .txt. In pom.xml, as of the creation of this bug report the text that would be commented-out looks like this:

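  <dependency>
    <groupId>org.apache.activemq</groupId>
    <artifactId>activemq-all</artifactId>
    <version>5.5.0</version>
    <exclusions>
      <exclusion>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
      </exclusion>
    </exclusions>
  </dependency>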
Then rebuild Harvester via Maven and the rest of the tools work.

Obviously this workaround is unsuitable for released software and therefore the bug needs to be resolved in some way before the next Harvester release.
