vivo-community / vivo-harvester
An ETL tool for transferring data from traditional systems into Semantic Systems.
License: Other
VIVO Harvester

VIVO: Enabling National Networking of Scientists

Thanks for your interest in using the harvester. Below is a listing of resources for documentation and help.

Wiki:
https://wiki.duraspace.org/display/VIVO/VIVO+Harvester+User+Guide+1.0
https://wiki.duraspace.org/display/VIVO/VIVO+Harvester

Example scripts
The example-scripts directory contains example scripts for reference and demonstration. These are meant as a guide and template to follow for your own harvest. The XML configuration files found in our example-scripts/ folders document each parameter used (and some unused).

HELP Messages
The --help message for each tool is always up to date on what parameters exist, but sparse on explanation. To see a help message, run
<HarvesterInstallationDirectory>/bin/harvester-<toolName> --help
Example:
/usr/share/vivo/harvester/bin/harvester-databaseclone --help

Reporting bugs / researching existing bugs & workarounds
The VIVO JIRA Bug Tracker @ Cornell or GITHUB
http://issues.library.cornell.edu/browse/VIVOHARV
Modify the score configuration files for the 1.6 ontology
While using the VIVO Harvester, it is specified as the tool VIVO_Harvester_(vivo.sourceforge.net), as with:
but I am getting error messages that make me ask whether this VIVO_Harvester_(vivo.sourceforge.net) tool is still available or whether it has been replaced by something better.
Also, the following link no longer works:
https://github.com/vivo-community/VIVO-Harvester/tree/develop
Modify the score configuration files for the 1.6 ontology
When attempting to use RenameBlank node, the nodes are still returned blank.
Dear Sir,
Kindly share the installation and configuration guide.
CVS Fetch currently requires a property to create unique URIs for nodes. Modify it so that it can create URIs without a property.
PubmedFetch seems to fail the tests repeatedly during mvn install.
If it's true that PubmedHTTPFetch is able to fetch just as well as PubmedFetch, how about replacing PubmedFetch with PubmedHTTPFetch and disabling the use of PubmedFetch altogether?
I ran into some issues trying to get this software to run. Even though Maven threw some warnings about library dependencies with system scope, building worked.
While installing the .deb package with dpkg -i (that command points to bin/ instead of build/ in update-rebuild-install.sh), I got warnings about the postinstall script trying to chgrp/chown to tomcat6 (I am on Debian with tomcat7; there is no tomcat6 user or group).
Trying to make the harvester-* commands work, I found that I had to alter the java command to use -cp harvester.jar:dependency/* in order to get them running from within /usr/share/vivo/harvester/bin.
Since I did not find detailed installation instructions, I assume it might be a good idea to add some to the readme file?
I might have missed some things, so please point them out.
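The classpath workaround described in this report can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name run_harvester_class and the HARVESTER_BIN and DRY_RUN variables are hypothetical, the directory is the one named in the report, and the example class org.vivoweb.harvester.fetch.SOAPFetch is taken from a stack trace elsewhere on this page. Whether the packaged harvester-* wrappers launch the tools in exactly this way is not confirmed here.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Harvester distribution.
# HARVESTER_BIN defaults to the directory mentioned in the report.
HARVESTER_BIN="${HARVESTER_BIN:-/usr/share/vivo/harvester/bin}"

run_harvester_class() {
    main_class="$1"
    shift
    # With DRY_RUN=1, only print the command that would be executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "java -cp harvester.jar:dependency/* $main_class $*"
        return 0
    fi
    # The classpath is quoted so the shell does not expand dependency/*;
    # the java launcher expands the wildcard to every jar in that directory.
    ( cd "$HARVESTER_BIN" && java -cp 'harvester.jar:dependency/*' "$main_class" "$@" )
}

# Example (prints the command without running java):
#   DRY_RUN=1 run_harvester_class org.vivoweb.harvester.fetch.SOAPFetch --help
```

Running the command from inside the bin directory keeps the relative classpath entries valid, which matches the observation that the tools only started from within that directory.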
Please update the pubmed-to-vivo.xsl script to transform the input into the VIVO 1.6 ontology format.
Update the XSLT for the PeopleSoft example to the VIVO 1.6 format.
Caused by: java.lang.ClassNotFoundException: org.vivoweb.harvester.fetch.nih.PubmedFetch
Dear Experts,
How can this issue be solved?
Originally posted to JIRA as VIVOHARV-119: http://issues.library.cornell.edu/browse/VIVOHARV-119
Per Nicholas Rejack:
I think there is another bug that is causing "incomplete date/time" to show up in VIVO. The date/time method has changed, so the new process is:
Modify the score configuration methods for the 1.6 ontology changes
Is your feature request related to a problem? Please describe.
OpenAlex is the rising star in open research information and provides rich metadata for more than 200 million publications. It would be very useful to have a working harvester script for this data source.
Describe the solution you'd like
A reusable script, maybe based on the JSON fetch.
Describe alternatives you've considered
None, at least not for the VIVO harvester.
Additional context
API overview: https://docs.openalex.org/how-to-use-the-api/api-overview
Modify the configuration XML files for the score methods to use the data properties found in 1.6
Please update the course-to-vivo.xsl script to transform the input into the VIVO 1.6 ontology format.
Is your feature request related to a problem? Please describe.
Conference metadata is hard to get. It would be very useful to have a working harvester script for ConfIDent (https://www.confident-conference.org/index.php/Main_Page)
Describe the solution you'd like
A reusable script, maybe based on the JSON fetch.
Describe alternatives you've considered
None, at least not for the VIVO harvester.
Additional context
API overview: https://www.confident-conference.org/index.php/Help:FAQ#How_to_use_the_API
The new version of Eutils requires Axis2 1.6.2, while VIVO uses 1.5.4 for the old Eutils file. Get the new version of the Axis2 jar for VIVO. This may require only a Maven modification or, if we are caching the jar file, downloading the new jar and adding it to our library.
Create a new directory for the 1.6 examples and name the older examples with the versions they work with
I am using VIVO Harvester on CentOS7 with VIVO 1.9.3. I have copied the example-oaifetch and modified it according to my configuration. When I execute "run-oaifetch.sh" it runs without any errors and it reaches the last line but as I check the terminal I see this:
[root@localhost oaifetch]# harvester-xsltranslator -X xsltranslator.config.xml
2018-05-25 07:37:15.305 INFO [o.v.h.t.XSLTranslator] XSLTranslator: Start
2018-05-25 07:37:15.308 DEBUG [o.v.h.u.a.ArgList] running XSLTranslator
2018-05-25 07:37:15.309 DEBUG [o.v.h.u.a.ArgList] command line args: -X xsltranslator.config.xml
2018-05-25 07:37:15.313 DEBUG [o.v.h.u.a.ArgList] config file args: --output translated-records.config.xml --input raw-records.config.xml --xslFile oaifetch-mets.datamap.xsl --force --wordiness DEBUG
2018-05-25 07:37:15.402 DEBUG [o.v.h.u.r.RecordHandler] 'rhClass' - 'org.vivoweb.harvester.util.repo.TextFileRecordHandler'
2018-05-25 07:37:15.402 DEBUG [o.v.h.u.r.RecordHandler] 'fileDir' - 'data/raw-records'
2018-05-25 07:37:15.403 DEBUG [o.v.h.u.r.RecordHandler] Using class: 'org.vivoweb.harvester.util.repo.TextFileRecordHandler'
2018-05-25 07:37:15.415 DEBUG [o.v.h.u.r.RecordHandler] 'rhClass' - 'org.vivoweb.harvester.util.repo.TextFileRecordHandler'
2018-05-25 07:37:15.415 DEBUG [o.v.h.u.r.RecordHandler] 'fileDir' - 'data/translated-records'
2018-05-25 07:37:15.415 DEBUG [o.v.h.u.r.RecordHandler] Using class: 'org.vivoweb.harvester.util.repo.TextFileRecordHandler'
2018-05-25 07:37:15.416 DEBUG [o.v.h.u.r.TextFileRecordHandler] Directory 'data/translated-records' Does Not Exist, attempting to create
2018-05-25 07:37:15.418 DEBUG [o.v.h.u.r.TextFileRecordHandler] Directory 'data/translated-records/.metadata' Does Not Exist, attempting to create
2018-05-25 07:37:15.438 DEBUG [o.v.h.u.r.TextFileRecordHandler] Compiling list of records
2018-05-25 07:37:15.441 DEBUG [o.v.h.u.r.TextFileRecordHandler] List compiled
2018-05-25 07:37:15.441 INFO [o.v.h.t.XSLTranslator] 0 records translated.
2018-05-25 07:37:15.441 INFO [o.v.h.t.XSLTranslator] 0 records did not need translation
2018-05-25 07:37:15.442 INFO [o.v.h.t.XSLTranslator] XSLTranslator: End
When I check the raw-data folder I see many files without an extension. Does anyone know what could be wrong?
Originally posted on JIRA as VIVOHARV-120: http://issues.library.cornell.edu/browse/VIVOHARV-120
When running Harvester tools utilizing logging via slf4j, the following error occurs:
SLF4J: The requested version 1.6 by your slf4j binding is not compatible with 1.5.5, 1.5.6, 1.5.7, 1.5.8, 1.5.9, 1.5.10, 1.5.11
SLF4J: See http://www.slf4j.org/codes.html#version_mismatch for further details.
Exception in thread "main" java.lang.NoSuchMethodError: org.slf4j.helpers.MessageFormatter.arrayFormat(Ljava/lang/String;[Ljava/lang/Object;)Lorg/slf4j/helpers/FormattingTuple;
at ch.qos.logback.classic.spi.LoggingEvent.<init>(LoggingEvent.java:112)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:471)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:427)
at ch.qos.logback.classic.Logger.info(Logger.java:631)
at org.vivoweb.harvester.fetch.SOAPFetch.main(SOAPFetch.java:275)
This error was introduced when org.vivoweb.harvester.util.ImageQueueConsumer was added. This class uses ActiveMQ, which uses slf4j in a way that does not gracefully handle different versions in other classes. This is a reported bug and can be found at https://issues.apache.org/jira/browse/AMQ-3296 .
A workaround for developers exists. In pom.xml, the activemq dependency needs to be removed or commented out, and then ImageQueueConsumer.java removed or renamed to an innocuous extension like .txt. In pom.xml, as of the creation of this bug report the text that would be commented-out looks like this:
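    <dependency>
      <groupId>org.apache.activemq</groupId>
      <artifactId>activemq-all</artifactId>
      <version>5.5.0</version>
      <exclusions>
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-api</artifactId>
        </exclusion>
      </exclusions>
    </dependency>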
Then rebuild Harvester via Maven and the rest of the tools work.
Obviously this workaround is unsuitable for released software and therefore the bug needs to be resolved in some way before the next Harvester release.
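To see which jars actually supply the conflicting slf4j classes, a check along the following lines may help. It is a rough sketch, assuming the mismatch arises because more than one jar on the classpath provides org.slf4j classes and that the unzip command is installed. The function name find_jars_with_class is hypothetical, and the jar locations in the usage comment are illustrative.

```shell
#!/bin/sh
# Hypothetical diagnostic, not part of the Harvester distribution.
# Prints every jar from the argument list whose table of contents
# mentions the given class file path.
find_jars_with_class() {
    class_path="$1"   # e.g. org/slf4j/helpers/MessageFormatter.class
    shift
    for jar in "$@"; do
        # unzip -l lists archive entries; grep -F matches the path literally.
        if unzip -l "$jar" 2>/dev/null | grep -qF "$class_path"; then
            printf '%s\n' "$jar"
        fi
    done
    return 0
}

# Example (paths are illustrative):
#   find_jars_with_class org/slf4j/helpers/MessageFormatter.class harvester.jar dependency/*.jar
```

If more than one jar is reported, those jars are candidates for carrying different slf4j versions.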
Pull down the new eutils jar file from the NIH and integrate it into our Maven build.
Update the XSL translation script for dsr to the pubmed 1.6 format.