
vphill / metadata_breakers

Python script for breaking or atomizing OAI-PMH repositories into simpler text formats

License: MIT License


metadata breakers

Python scripts for "breaking" or atomizing OAI-PMH repositories into simpler text formats.

These scripts were designed to use the output from pyoaiharvester.
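To make "breaking" concrete, the idea is to turn every Dublin Core value in a harvested file into its own flat row of text. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree, not the scripts' actual code. It assumes the harvested file contains <record> elements, each with an OAI-PMH <header><identifier> and Dublin Core elements in the http://purl.org/dc/elements/1.1/ namespace; the helper names local, header_identifier and atomize are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def header_identifier(record: ET.Element) -> str:
    """Return the text of <header><identifier>, or '' if it is missing."""
    for header in record:
        if local(header.tag) == "header":
            for child in header:
                if local(child.tag) == "identifier":
                    return (child.text or "").strip()
    return ""


def atomize(root: ET.Element) -> Iterator[Tuple[str, str, str]]:
    """Yield one (identifier, DC element, value) row per Dublin Core value."""
    # <record> is matched by local name, so the OAI namespace is optional
    # (assumption: we do not rely on how the harvester wrote the file).
    for record in root.iter():
        if local(record.tag) != "record":
            continue
        ident = header_identifier(record)
        for elem in record.iter():
            if elem.tag.startswith(DC_NS) and elem.text and elem.text.strip():
                # Collapse internal whitespace so each value fits on one line.
                yield ident, local(elem.tag), " ".join(elem.text.split())
```

For example, atomize(ET.parse("acuc.dc.xml").getroot()) would yield rows of the form (identifier, "title", value), which can then be written out as tab-separated text.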

Basic Usage

Using pyoaiharvester, harvest the records you want to work with.

python3 pyoaiharvest.py -l https://texashistory.unt.edu/explore/collections/ACUC/oai/ -o acuc.dc.xml

This produces a repository XML file, acuc.dc.xml, containing the records for the ACUC collection on The Portal to Texas History.
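pyoaiharvester takes care of harvesting. For readers curious about what a harvester does, the sketch below shows the OAI-PMH ListRecords paging loop: request the first page with metadataPrefix=oai_dc, then keep requesting with the returned resumptionToken until an empty token marks the last page. It illustrates the protocol and is not pyoaiharvester's code; the function names are hypothetical, and it yields record elements instead of writing a file.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, Optional

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, resumption_token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def next_token(root: ET.Element) -> Optional[str]:
    """Return the resumptionToken of a ListRecords response, or None when done."""
    token = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None  # missing or empty token: this was the last page
    return token.text.strip()


def harvest(base_url: str) -> Iterator[ET.Element]:
    """Yield every <record> element, following resumption tokens page by page.

    Error responses (e.g. noRecordsMatch) are not handled in this sketch.
    """
    token = None
    while True:
        with urllib.request.urlopen(list_records_url(base_url, token)) as resp:
            root = ET.fromstring(resp.read())
        yield from root.iter(f"{OAI_NS}record")
        token = next_token(root)
        if token is None:
            break
```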

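Next, you can start working with the metadata breakers. Run with only a filename, dc_breaker.py reports how many records contain each Dublin Core element, followed by summary completeness scores.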

python3 dc_breaker.py ../pyoaiharvester/acuc.dc.xml


      {http://purl.org/dc/elements/1.1/}title: |=========================|    191/191 | 100.00%
    {http://purl.org/dc/elements/1.1/}creator: |=========================|    191/191 | 100.00%
{http://purl.org/dc/elements/1.1/}contributor: |                         |      3/191 |   1.57%
  {http://purl.org/dc/elements/1.1/}publisher: |=========================|    191/191 | 100.00%
       {http://purl.org/dc/elements/1.1/}date: |=========================|    191/191 | 100.00%
   {http://purl.org/dc/elements/1.1/}language: |=========================|    191/191 | 100.00%
{http://purl.org/dc/elements/1.1/}description: |=========================|    191/191 | 100.00%
    {http://purl.org/dc/elements/1.1/}subject: |=========================|    191/191 | 100.00%
   {http://purl.org/dc/elements/1.1/}coverage: |=========================|    191/191 | 100.00%
     {http://purl.org/dc/elements/1.1/}rights: |=                        |     10/191 |   5.24%
       {http://purl.org/dc/elements/1.1/}type: |=========================|    191/191 | 100.00%
     {http://purl.org/dc/elements/1.1/}format: |=========================|    191/191 | 100.00%
 {http://purl.org/dc/elements/1.1/}identifier: |=========================|    191/191 | 100.00%


        dc_completeness      73.79
collection_completeness     100.00
      wwww_completeness     100.00
   average_completeness      91.26
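The report lists, for each Dublin Core element found, how many of the 191 records contain it. The summary numbers are consistent with two simple averages: dc_completeness matches the mean per-element percentage across all 15 Dublin Core elements, with the two absent elements (source and relation) counted as 0%, i.e. (11 × 100 + 1.57 + 5.24) / 15 ≈ 73.79; and average_completeness matches the mean of the three scores above it, (73.79 + 100 + 100) / 3 ≈ 91.26. This is inferred from the numbers shown, not taken from dc_breaker.py, and the element sets behind collection_completeness and wwww_completeness are not reproduced. A Python sketch of the per-element rows and the two averages, with hypothetical function names:

```python
from typing import Dict

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def presence_row(name: str, count: int, total: int, width: int = 25) -> str:
    """One report row: a bar of '=' scaled to count/total, then count and %."""
    pct = 100.0 * count / total
    bar = "=" * int(width * count / total)  # assumption: bar length truncates
    return f"{name}: |{bar:<{width}}| {count:>6}/{total} | {pct:6.2f}%"


def dc_completeness(counts: Dict[str, int], total: int) -> float:
    """Mean, over all 15 DC elements, of the % of records using each element.

    Elements missing from `counts` contribute 0%.
    """
    pcts = [100.0 * counts.get(e, 0) / total for e in DC_ELEMENTS]
    return sum(pcts) / len(DC_ELEMENTS)


def average_completeness(*scores: float) -> float:
    """Plain mean of the individual completeness scores."""
    return sum(scores) / len(scores)
```

Plugging in the counts from the report above (191 records; contributor 3, rights 10, eleven other elements 191 each) gives values that round to the 73.79 and 91.26 shown.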

Use the -e flag to name a specific Dublin Core element; dc_breaker.py then prints only the values of that element, one per line.

python3 dc_breaker.py ../pyoaiharvester/acuc.dc.xml -e title

Catalog of Abilene Christian College, 1906-1907
The Childers Classical Institute, Abilene, Texas, Catalog 1906-1907
Announcements 1907-1908
Catalog of Abilene Christian College, 1910-1911
Fifth Annual Catalogue, Abilene Christian College, Abilene, Texas, 1910-1911
Announcement 1910-1911
Catalog of Abilene Christian College, 1912-1913
Seventh Annual Announcement, Abilene Christian College, Abilene, Texas, 1912-1913
Announcement 1912-1913
Catalog of Abilene Christian College, 1913-1914
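Listing one element amounts to collecting the text of every matching Dublin Core element in document order. A minimal Python sketch, assuming the dc namespace shown in the report above and a hypothetical element_values helper; collapsing internal whitespace so each value stays on one line is an assumption, not confirmed behavior of dc_breaker.py.

```python
import xml.etree.ElementTree as ET
from typing import Iterator

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def element_values(root: ET.Element, element: str) -> Iterator[str]:
    """Yield every value of one Dublin Core element (e.g. 'title'), in file order."""
    for elem in root.iter(DC_NS + element):
        # Collapse whitespace and newlines so each value is a single line.
        text = " ".join((elem.text or "").split())
        if text:  # skip empty elements
            yield text
```

Something like `for title in element_values(ET.parse("acuc.dc.xml").getroot(), "title"): print(title)` prints one title per line.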

Add the -i flag to prefix each line with the identifier of the record it came from; the identifier and the value are separated by a tab.

python3 dc_breaker.py ../pyoaiharvester/acuc.dc.xml -e title -i | head
info:ark/67531/metapth45902	Catalog of Abilene Christian College, 1906-1907
info:ark/67531/metapth45902	The Childers Classical Institute, Abilene, Texas, Catalog 1906-1907
info:ark/67531/metapth45902	Announcements 1907-1908
info:ark/67531/metapth45910	Catalog of Abilene Christian College, 1910-1911
info:ark/67531/metapth45910	Fifth Annual Catalogue, Abilene Christian College, Abilene, Texas, 1910-1911
info:ark/67531/metapth45910	Announcement 1910-1911
info:ark/67531/metapth45909	Catalog of Abilene Christian College, 1912-1913
info:ark/67531/metapth45909	Seventh Annual Announcement, Abilene Christian College, Abilene, Texas, 1912-1913
info:ark/67531/metapth45909	Announcement 1912-1913
info:ark/67531/metapth45908	Catalog of Abilene Christian College, 1913-1914
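Because the identifier and value are tab-separated, the output is easy to process further with other command-line tools. A Python sketch of producing lines in that shape, assuming the identifier comes from each record's OAI-PMH <header> (the actual script may take it from elsewhere); identified_values is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Iterator

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def identified_values(root: ET.Element, element: str) -> Iterator[str]:
    """Yield 'identifier<TAB>value' for every value of one DC element."""
    for record in root.iter():
        if local(record.tag) != "record":
            continue
        # Assumption: the identifier is the one in the OAI-PMH <header>.
        ident = ""
        for header in record:
            if local(header.tag) == "header":
                for child in header:
                    if local(child.tag) == "identifier":
                        ident = (child.text or "").strip()
        for elem in record.iter(DC_NS + element):
            value = " ".join((elem.text or "").split())
            if value:
                yield f"{ident}\t{value}"
```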

More examples, and a full explanation of how you might use this tool as part of metadata analysis, can be found in the article Metadata Analysis at the Command-Line.

Contributors

vphill
