ajs6f / fcrepo3-rdf-extractor
A utility to extract RDF triples from Fedora Commons 3 Akubra-based persistence stores.
License: Other
Testing this out on a subset of URIs, I hit one where the tool can't find the DC datastream.
The error log says:
INFO 2018-05-30 10:27:48.123 [pool-2-thread-1] (ObjectProcessor) Operating on object URI: info:fedora/uofm:2939588
ERROR 2018-05-30 10:27:48.138 [pool-2-thread-1] (ObjectProcessor) Couldn't find datastream DC from object info:fedora/uofm:2939588! Caused by:
org.akubraproject.MissingBlobException: (Missing blob with id = 'file:0f/uofm%3A2939588%2BDC%2BDC.0')
at org.akubraproject.fs.FSBlob.openInputStream(FSBlob.java:100)
at org.akubraproject.impl.BlobWrapper.openInputStream(BlobWrapper.java:93)
at edu.si.fcrepo.ObjectProcessor.getDatastreamContent(ObjectProcessor.java:205)
at edu.si.fcrepo.ObjectProcessor.consume(ObjectProcessor.java:180)
at edu.si.fcrepo.ObjectProcessor.accept(ObjectProcessor.java:152)
at edu.si.fcrepo.Extract.lambda$null$3(Extract.java:240)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
INFO 2018-05-30 10:27:48.169 [main] (Extract) Reached 101 objects at end of objects with 0 in-queue after 1 errors.
INFO 2018-05-30 10:27:48.169 [main] (Extract) Finished extraction.
The file does not exist in the 0f directory, nor does it exist there under the name info%3Afedora%2Fuofm%3A2939588%2BDC%2BDC.0. There is a managed DC datastream that I can access via Fedora.
I'll generate a new list of random pids to test with.
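One way to check whether the DC blob is stored under some other name is to walk the datastream store and percent-decode every file name. The sketch below is a hypothetical helper, not part of fcrepo3-rdf-extractor: `find_blobs` and the store path in the usage note are assumptions, and the matching is a deliberately loose substring test.

```python
import os
from urllib.parse import unquote


def find_blobs(store_root, pid, dsid=None):
    """Yield (path, decoded_name) for files whose decoded name mentions pid.

    Hypothetical helper: substring matching only, so a pid such as
    "uofm:29" would also match "uofm:2939588".
    """
    for dirpath, _dirnames, filenames in os.walk(store_root):
        for name in filenames:
            decoded = unquote(name)
            if pid not in decoded:
                continue
            if dsid is not None and dsid not in decoded:
                continue
            yield os.path.join(dirpath, name), decoded


# The blob id from the error message, percent-decoded.
print(unquote("uofm%3A2939588%2BDC%2BDC.0"))  # uofm:2939588+DC+DC.0
```

Calling `list(find_blobs("/path/to/datastreamStore", "uofm:2939588", "DC"))` (the path is a placeholder) would list every candidate file for that object, which makes it easier to compare the name on disk with the id the extractor asked for.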
When running a build of master (a560428) using the command
java -jar fcrepo3-rdf-extractor.jar -a /home/ubuntu/akubra-llstore.xml -o /mnt/rdf/juno.nq -g '<info:ca.umanitoba.fedora#ri>' -n 2 -s 2 --logback /home/ubuntu/fcrepo-rdf-extractor-logback.xml
I got the following errors.
https://gist.github.com/whikloj/79ca322a06d61391889d3c7063b6c423
When RDF/XML includes a namespace without a prefix, a la:
<rdf:RDF xmlns:islandora="http://islandora.ca/ontology/relsint#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="info:fedora/uofm:2211108/JP2">
<width xmlns="http://islandora.ca/ontology/relsext#">4478</width>
<height xmlns="http://islandora.ca/ontology/relsext#">6762</height>
</rdf:Description>
</rdf:RDF>
the triples derived from elements like width are being dropped with a message similar to the title of this ticket.
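For reference, an unprefixed element that declares a default namespace is still namespace-qualified, and in RDF/XML the property IRI is the namespace name followed by the local name. The sketch below uses only Python's standard XML parser to show the IRIs that the sample above should yield. It is not a full RDF/XML parser and says nothing about how the extractor parses RELS-INT; `literal_triples` is a hypothetical name and only plain literal property elements are handled.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SAMPLE = """<rdf:RDF xmlns:islandora="http://islandora.ca/ontology/relsint#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="info:fedora/uofm:2211108/JP2">
<width xmlns="http://islandora.ca/ontology/relsext#">4478</width>
<height xmlns="http://islandora.ca/ontology/relsext#">6762</height>
</rdf:Description>
</rdf:RDF>"""


def literal_triples(rdf_xml):
    """Return (subject, predicate, literal) for plain literal property elements."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.iter("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS)
        for prop in desc:
            if not prop.tag.startswith("{"):
                continue  # no namespace at all: not a usable property element
            # ElementTree expands every tag to "{namespace}local".
            namespace, local = prop.tag[1:].split("}", 1)
            triples.append((subject, namespace + local, (prop.text or "").strip()))
    return triples


for triple in literal_triples(SAMPLE):
    print(triple)
```

Both elements resolve to predicates under `http://islandora.ca/ontology/relsext#`, so the missing prefix by itself does not make them ambiguous.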
A run of the rdf extractor seems to miss some triples. A full run of ours collected 121,819,261 triples, but a count of the existing triple store showed 125,262,308, a shortfall of 3,443,047.
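To narrow down which triples are missing, it may help to compare per-predicate counts rather than totals. The sketch below assumes both the extractor output and the existing triple store can be dumped as line-based N-Triples or N-Quads; the function names are hypothetical. It relies on the fact that subjects and predicates in those formats never contain whitespace, so the predicate is always the second whitespace-separated token.

```python
from collections import Counter


def predicate_counts(lines):
    """Count statements per predicate in N-Triples or N-Quads lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts


def missing_by_predicate(expected, actual):
    """Return {predicate: shortfall} for predicates with fewer statements in actual."""
    return {
        predicate: count - actual.get(predicate, 0)
        for predicate, count in expected.items()
        if count > actual.get(predicate, 0)
    }


# The gap between the two totals reported above.
print(125_262_308 - 121_819_261)  # 3443047
```

Both functions accept any iterable of lines, so an open file handle can be passed directly and a multi-gigabyte dump never has to be held in memory.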
Tried to use this but am having trouble with my Akubra storage config file.
[whikloj@juno]/opt/fcrepo3-rdf-extractor% java -jar target/fcrepo3-rdf-extractor-0.0.1-SNAPSHOT.jar -a /usr/local/fedora/server/config/spring/akubra-llstore.xml -o /local/dam/staging/jareds_triples/juno_20161121.sparql
INFO 15:50:26.012 (edu.si.fcrepo.Extract) Using 4 threads for extraction and a queue size of 1048576.
INFO 15:50:26.018 (edu.si.fcrepo.Extract) Extracting to /local/dam/staging/jareds_triples/juno_20161121.sparql...
INFO 15:50:26.018 (edu.si.fcrepo.Extract) with Akubra configuration from /usr/local/fedora/server/config/spring/akubra-llstore.xml.
INFO 15:50:26.082 (org.springframework.context.support.FileSystemXmlApplicationContext) Refreshing org.springframework.context.support.FileSystemXmlApplicationContext@97e93f1: startup date [Mon Nov 21 15:50:26 GMT-06:00 2016]; root of context hierarchy
INFO 15:50:26.111 (org.springframework.beans.factory.xml.XmlBeanDefinitionReader) Loading XML bean definitions from URL [file:/usr/local/fedora/server/config/spring/akubra-llstore.xml]
INFO 15:50:26.169 (org.springframework.beans.factory.support.DefaultListableBeanFactory) Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@4fa1c212: defining beans [org.fcrepo.server.storage.lowlevel.ILowlevelStorage,org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorage,objectStore,fsObjectStore,fsObjectStoreMapper,datastreamStore,fsDatastreamStore,fsDatastreamStoreMapper,fedoraStorageHintProvider]; root of factory hierarchy
INFO 15:50:26.170 (org.springframework.beans.factory.support.DefaultListableBeanFactory) Destroying singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@4fa1c212: defining beans [org.fcrepo.server.storage.lowlevel.ILowlevelStorage,org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorage,objectStore,fsObjectStore,fsObjectStoreMapper,datastreamStore,fsDatastreamStore,fsDatastreamStoreMapper,fedoraStorageHintProvider]; root of factory hierarchy
Exception in thread "main" org.springframework.beans.factory.CannotLoadBeanClassException: Cannot find class [org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorageModule] for bean with name 'org.fcrepo.server.storage.lowlevel.ILowlevelStorage' defined in URL [file:/usr/local/fedora/server/config/spring/akubra-llstore.xml]; nested exception is java.lang.ClassNotFoundException: org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorageModule
at org.springframework.beans.factory.support.AbstractBeanFactory.resolveBeanClass(AbstractBeanFactory.java:1261)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.predictBeanType(AbstractAutowireCapableBeanFactory.java:575)
at org.springframework.beans.factory.support.AbstractBeanFactory.isFactoryBean(AbstractBeanFactory.java:1330)
at org.springframework.beans.factory.support.AbstractBeanFactory.isFactoryBean(AbstractBeanFactory.java:896)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:566)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:895)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:425)
at org.springframework.context.support.FileSystemXmlApplicationContext.<init>(FileSystemXmlApplicationContext.java:140)
at org.springframework.context.support.FileSystemXmlApplicationContext.<init>(FileSystemXmlApplicationContext.java:84)
at edu.si.fcrepo.Extract.init(Extract.java:194)
at edu.si.fcrepo.Extract.main(Extract.java:157)
Caused by: java.lang.ClassNotFoundException: org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorageModule
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.springframework.util.ClassUtils.forName(ClassUtils.java:257)
at org.springframework.beans.factory.support.AbstractBeanDefinition.resolveBeanClass(AbstractBeanDefinition.java:408)
at org.springframework.beans.factory.support.AbstractBeanFactory.doResolveBeanClass(AbstractBeanFactory.java:1282)
at org.springframework.beans.factory.support.AbstractBeanFactory.resolveBeanClass(AbstractBeanFactory.java:1253)
... 10 more
[whikloj@juno]/opt/fcrepo3-rdf-extractor%
My akubra-llstore.xml -> https://gist.github.com/whikloj/584dea271c6e872e4b3d574676781bcc
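The exception means Spring could not load a class named in the config from the tool's classpath. A quick way to see every class the config needs, and which of them a given jar lacks, is sketched below. Both function names are hypothetical, and the jar check assumes classes sit at the top level of the archive (a flat or shaded jar), which may not match how this project is packaged.

```python
import xml.etree.ElementTree as ET
import zipfile

BEANS_NS = "http://www.springframework.org/schema/beans"


def bean_classes(spring_xml_path):
    """Return the sorted class names referenced by <bean class="..."> elements."""
    root = ET.parse(spring_xml_path).getroot()
    names = {bean.get("class") for bean in root.iter("{%s}bean" % BEANS_NS)}
    return sorted(name for name in names if name)


def missing_from_jar(class_names, jar_path):
    """Return the class names that have no matching .class entry in the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        entries = set(jar.namelist())
    return [
        name for name in class_names
        if name.replace(".", "/") + ".class" not in entries
    ]
```

For the run above, `missing_from_jar(bean_classes("/usr/local/fedora/server/config/spring/akubra-llstore.xml"), "target/fcrepo3-rdf-extractor-0.0.1-SNAPSHOT.jar")` would be expected to include `org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorageModule`, given the ClassNotFoundException in the trace.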
This is weird. I started the tool running against my production objectStore and it was kicking along all nice and easy.
Once it had filled 5 20MB log files and 20GB of triples it just stopped outputting anything.
The process is still active (in a screened session), but it is not doing anything I can see.
Could it be that my logback.xml is causing this? It is weird that I set the RollingFileAppender to <maxIndex>5</maxIndex> and that's when the process stops.
I tried deleting the log files, but that had no effect.
Thoughts?