ajs6f / fcrepo3-rdf-extractor

A utility to extract RDF triples from Fedora Commons 3 Akubra-based persistence stores.

License: Other

Java 100.00%

Contributors: ajs6f, whikloj

fcrepo3-rdf-extractor's Issues

Can't find datastream

Testing this out on a subset of URIs I have hit one where it can't find the DC datastream.

Error log says

INFO 2018-05-30 10:27:48.123 [pool-2-thread-1] (ObjectProcessor) Operating on object URI: info:fedora/uofm:2939588
ERROR 2018-05-30 10:27:48.138 [pool-2-thread-1] (ObjectProcessor) Couldn't find datastream DC from object info:fedora/uofm:2939588! Caused by:
org.akubraproject.MissingBlobException: (Missing blob with id = 'file:0f/uofm%3A2939588%2BDC%2BDC.0')
        at org.akubraproject.fs.FSBlob.openInputStream(FSBlob.java:100)
        at org.akubraproject.impl.BlobWrapper.openInputStream(BlobWrapper.java:93)
        at edu.si.fcrepo.ObjectProcessor.getDatastreamContent(ObjectProcessor.java:205)
        at edu.si.fcrepo.ObjectProcessor.consume(ObjectProcessor.java:180)
        at edu.si.fcrepo.ObjectProcessor.accept(ObjectProcessor.java:152)
        at edu.si.fcrepo.Extract.lambda$null$3(Extract.java:240)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
INFO 2018-05-30 10:27:48.169 [main] (Extract) Reached 101 objects at end of objects with 0 in-queue after 1 errors.
INFO 2018-05-30 10:27:48.169 [main] (Extract) Finished extraction.

The file does not exist in the 0f directory, nor does it exist there under the name info%3Afedora%2Fuofm%3A2939588%2BDC%2BDC.0. There is a managed DC datastream that I can access via Fedora.

I'll generate a new list of random pids to test with.
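
The blob id in the stack trace above looks like a percent-encoded datastream token stored under a two-character directory. The following is a minimal Java sketch, not the extractor's or Akubra's own code: it assumes the unencoded token is uofm:2939588+DC+DC.0 and that the datastream store root is passed in by the caller. The class BlobNameCheck and its methods are hypothetical names introduced only for illustration. It reproduces the file name from the log and checks whether that file is present on disk.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BlobNameCheck {

    // Percent-encode a datastream token (assumed form: pid+dsId+versionId).
    static String encodeToken(String token) {
        try {
            return URLEncoder.encode(token, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Check whether <storeRoot>/<dir>/<encoded token> exists on disk.
    static boolean blobExists(Path storeRoot, String dir, String token) {
        return Files.exists(storeRoot.resolve(dir).resolve(encodeToken(token)));
    }

    public static void main(String[] args) {
        // Assumption: the first argument is the datastream store root directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String token = "uofm:2939588+DC+DC.0";
        System.out.println(encodeToken(token)); // uofm%3A2939588%2BDC%2BDC.0
        System.out.println(blobExists(root, "0f", token));
    }
}
```

If the second line prints false for the directory named in the log, the blob is probably stored under a different directory or name than the one the tool computed.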

Exception Caused by: javax.xml.stream.XMLStreamException: The namespace URI "XXXXXXXXX" has not been bound to a prefix.

When RDF/XML includes a namespace without a prefix, a la:

<rdf:RDF xmlns:islandora="http://islandora.ca/ontology/relsint#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="info:fedora/uofm:2211108/JP2">
    <width xmlns="http://islandora.ca/ontology/relsext#">4478</width>
    <height xmlns="http://islandora.ca/ontology/relsext#">6762</height>
  </rdf:Description>
</rdf:RDF>

the triples derived from elements like width are being dropped with a message similar to the title of this ticket.
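
For reference, the elements in question are valid namespace-aware XML: each one declares a default namespace on itself, so it has a namespace URI but no prefix. The sketch below uses only the JDK's StAX reader to show which property URIs a parser should see for the sample above. DefaultNamespaceDemo is a hypothetical class written for this ticket, not the extractor's parsing code, and the sample is copied with single-quoted attributes for readability.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DefaultNamespaceDemo {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // The RDF/XML from this ticket.
    static final String SAMPLE =
        "<rdf:RDF xmlns:islandora='http://islandora.ca/ontology/relsint#'"
        + " xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
        + "<rdf:Description rdf:about='info:fedora/uofm:2211108/JP2'>"
        + "<width xmlns='http://islandora.ca/ontology/relsext#'>4478</width>"
        + "<height xmlns='http://islandora.ca/ontology/relsext#'>6762</height>"
        + "</rdf:Description></rdf:RDF>";

    // Return namespace URI + local name for every element outside the RDF namespace.
    static List<String> propertyUris(String rdfXml) {
        List<String> uris = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(rdfXml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && !RDF_NS.equals(reader.getNamespaceURI())) {
                    uris.add(reader.getNamespaceURI() + reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return uris;
    }

    public static void main(String[] args) {
        // Prints the two relsext property URIs (width, height).
        propertyUris(SAMPLE).forEach(System.out::println);
    }
}
```

Both width and height resolve to URIs in http://islandora.ca/ontology/relsext#, so the triples are well defined even though no prefix is bound.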

Missed triples

A run of the rdf extractor seems to miss some triples. A full run of ours collected 121,819,261 but a count of the existing triple store showed 125,262,308.
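
To put a number on the gap, here is a small Java sketch of the arithmetic using the two counts quoted above. TripleGap is a hypothetical helper, not part of the tool.

```java
public class TripleGap {

    // Number of triples present in the triple store but absent from the extraction.
    static long shortfall(long extracted, long inTripleStore) {
        return inTripleStore - extracted;
    }

    // Shortfall as a percentage of the triple store count.
    static double shortfallPercent(long extracted, long inTripleStore) {
        return 100.0 * shortfall(extracted, inTripleStore) / inTripleStore;
    }

    public static void main(String[] args) {
        long extracted = 121_819_261L;
        long inTripleStore = 125_262_308L;
        System.out.println(shortfall(extracted, inTripleStore)); // 3443047
        System.out.println(shortfallPercent(extracted, inTripleStore));
    }
}
```

That is 3,443,047 triples, a little under 3% of the existing store.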

Doesn't like my config files.

Tried to use this but am having trouble with my Akubra storage config file.

[whikloj@juno]/opt/fcrepo3-rdf-extractor% java -jar target/fcrepo3-rdf-extractor-0.0.1-SNAPSHOT.jar -a /usr/local/fedora/server/config/spring/akubra-llstore.xml -o /local/dam/staging/jareds_triples/juno_20161121.sparql
INFO 15:50:26.012 (edu.si.fcrepo.Extract) Using 4 threads for extraction and a queue size of 1048576.
INFO 15:50:26.018 (edu.si.fcrepo.Extract) Extracting to /local/dam/staging/jareds_triples/juno_20161121.sparql...
INFO 15:50:26.018 (edu.si.fcrepo.Extract) with Akubra configuration from /usr/local/fedora/server/config/spring/akubra-llstore.xml.
INFO 15:50:26.082 (org.springframework.context.support.FileSystemXmlApplicationContext) Refreshing org.springframework.context.support.FileSystemXmlApplicationContext@97e93f1: startup date [Mon Nov 21 15:50:26 GMT-06:00 2016]; root of context hierarchy
INFO 15:50:26.111 (org.springframework.beans.factory.xml.XmlBeanDefinitionReader) Loading XML bean definitions from URL [file:/usr/local/fedora/server/config/spring/akubra-llstore.xml]
INFO 15:50:26.169 (org.springframework.beans.factory.support.DefaultListableBeanFactory) Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@4fa1c212: defining beans [org.fcrepo.server.storage.lowlevel.ILowlevelStorage,org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorage,objectStore,fsObjectStore,fsObjectStoreMapper,datastreamStore,fsDatastreamStore,fsDatastreamStoreMapper,fedoraStorageHintProvider]; root of factory hierarchy
INFO 15:50:26.170 (org.springframework.beans.factory.support.DefaultListableBeanFactory) Destroying singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@4fa1c212: defining beans [org.fcrepo.server.storage.lowlevel.ILowlevelStorage,org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorage,objectStore,fsObjectStore,fsObjectStoreMapper,datastreamStore,fsDatastreamStore,fsDatastreamStoreMapper,fedoraStorageHintProvider]; root of factory hierarchy
Exception in thread "main" org.springframework.beans.factory.CannotLoadBeanClassException: Cannot find class [org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorageModule] for bean with name 'org.fcrepo.server.storage.lowlevel.ILowlevelStorage' defined in URL [file:/usr/local/fedora/server/config/spring/akubra-llstore.xml]; nested exception is java.lang.ClassNotFoundException: org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorageModule
        at org.springframework.beans.factory.support.AbstractBeanFactory.resolveBeanClass(AbstractBeanFactory.java:1261)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.predictBeanType(AbstractAutowireCapableBeanFactory.java:575)
        at org.springframework.beans.factory.support.AbstractBeanFactory.isFactoryBean(AbstractBeanFactory.java:1330)
        at org.springframework.beans.factory.support.AbstractBeanFactory.isFactoryBean(AbstractBeanFactory.java:896)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:566)
        at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:895)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:425)
        at org.springframework.context.support.FileSystemXmlApplicationContext.<init>(FileSystemXmlApplicationContext.java:140)
        at org.springframework.context.support.FileSystemXmlApplicationContext.<init>(FileSystemXmlApplicationContext.java:84)
        at edu.si.fcrepo.Extract.init(Extract.java:194)
        at edu.si.fcrepo.Extract.main(Extract.java:157)
Caused by: java.lang.ClassNotFoundException: org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorageModule
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.springframework.util.ClassUtils.forName(ClassUtils.java:257)
        at org.springframework.beans.factory.support.AbstractBeanDefinition.resolveBeanClass(AbstractBeanDefinition.java:408)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doResolveBeanClass(AbstractBeanFactory.java:1282)
        at org.springframework.beans.factory.support.AbstractBeanFactory.resolveBeanClass(AbstractBeanFactory.java:1253)
        ... 10 more
[whikloj@juno]/opt/fcrepo3-rdf-extractor% 

My akubra-llstore.xml -> https://gist.github.com/whikloj/584dea271c6e872e4b3d574676781bcc
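
The ClassNotFoundException means Spring could not load the class named in the config from the classpath the tool was started with. A quick way to confirm which class names are loadable is a sketch like the one below, compiled and run with the extractor jar on the classpath. ClasspathCheck is a hypothetical helper, not something shipped with the tool.

```java
public class ClasspathCheck {

    // Return true if the named class can be located by this class's loader.
    // The class is not initialized, so no static initializers run.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the log output above.
        String[] names = {
            "org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorageModule",
            "org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorage"
        };
        for (String name : names) {
            System.out.println(name + " -> " + isLoadable(name));
        }
    }
}
```

If only the second name is loadable, the bean definition in akubra-llstore.xml would need to reference a class that the extractor jar actually contains.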

Question: extractor stops outputting but doesn't quit.

This is weird. I started the tool running against my production objectStore and it was kicking along all nice and easy.

Once it had filled five 20 MB log files and written 20 GB of triples, it just stopped outputting anything.

The process is still active (in a screen session), but it is not doing anything I can see.

Could my logback.xml be causing this? It is odd that I set the RollingFileAppender to <maxIndex>5</maxIndex> and that is exactly when the process stops.

I tried deleting the log files, but that had no effect.

Thoughts?
