is-this-digitized's Introduction

Searching for digitized books by OCLC identifier

This repository contains scripts that search HathiTrust, Google Books, and the Internet Archive for digitized books by their OCLC numbers.


Data

Formatting your OCLC numbers for searching.

Enter your OCLC identifiers into a spreadsheet in a column called 'oclc_id'. The identifiers should not have any prefixes such as "ocm", "on", or "(OCoLC)". Save the spreadsheet as a UTF-8 encoded CSV. It does not matter whether the identifiers are saved as integers or strings, because the scripts automatically convert them to strings.
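If your identifiers still carry prefixes, a small helper can clean them before you save the CSV. The following is a minimal sketch, not part of this repository: the function names `clean_oclc` and `write_oclc_csv` are hypothetical, and the prefix list assumes only the forms mentioned above plus the related "ocn" prefix.

```python
import csv
import io
import re

# Optional "(OCoLC)" marker followed by an optional ocm/ocn/on prefix.
_PREFIX = re.compile(r"^\s*(?:\(OCoLC\))?\s*(?:ocm|ocn|on)?\s*", re.IGNORECASE)


def clean_oclc(value):
    """Return the identifier as a string with any known prefix removed."""
    return _PREFIX.sub("", str(value)).strip()


def write_oclc_csv(values, handle):
    """Write the cleaned identifiers under a single 'oclc_id' header."""
    writer = csv.writer(handle)
    writer.writerow(["oclc_id"])
    for value in values:
        writer.writerow([clean_oclc(value)])


# Demonstration with an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_oclc_csv(["(OCoLC)12345", "ocm00067890", 424242], buffer)
```

To write a real file, pass a handle opened with `open("my_ids.csv", "w", newline="", encoding="utf-8")`, where the file name is a placeholder of your choice.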

When your CSV is ready, put it in the same folder location as the scripts below on your local system.

Test data

There is a folder called "test-data" in the repository with test data and results. This can help with formatting and troubleshooting the scripts on your local system.

  • test.csv: A CSV with 9 items (3 items findable by OCLC number for each website). These items were selected at random.
  • hathiTrustResults_test.csv: The results from running test.csv against searchHathiTrustByOCLC.py.
  • googleBooksResults_test.csv: The results from running test.csv against searchGoogleBooksByOCLC.py.
  • internetArchiveResults_test.csv: The results from running test.csv against searchInternetArchiveByOCLC.py.

Scripts

Requirements

searchGoogleBooksByOCLC.py

Setup: Register for a Google API key to search. Go to Google's APIs & Services Credentials page and register for an API key using a Google account. Then create a Python file in the same folder as this script called googleKey.py with the following code:

key='##########'

Be sure to add googleKey.py to your .gitignore file so that the key is not committed to the repository.
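As an illustration of how the key might be used, the sketch below builds a Google Books API request URL for one OCLC number with the `oclc:` search keyword. The function name `build_google_books_url` is hypothetical, and the actual script may build its requests differently.

```python
from urllib.parse import urlencode

# Public Google Books volumes endpoint.
GOOGLE_BOOKS_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"


def build_google_books_url(oclc_id, api_key):
    """Return a volumes-search URL restricted to one OCLC number."""
    params = {"q": f"oclc:{oclc_id}", "key": api_key}
    return f"{GOOGLE_BOOKS_ENDPOINT}?{urlencode(params)}"


# In a real run the key would come from googleKey.py:
#     from googleKey import key
# A placeholder key is used here so the example runs on its own.
example_url = build_google_books_url("12345", "##########")
```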

Search limits: The script pauses for 60 seconds after each set of 100 OCLC numbers, because Google Books limits the number of books searched per minute via the API. So, if you have 1000 OCLC identifiers to search, this script will take at least 10 minutes. I'm sure there is a better solution; I just don't know what it is. There is also a daily limit on the number of books you can search via the API, so avoid searching more than 1000 identifiers in a 24-hour period. If you exceed the limit you will get an error; try rerunning your script the next day.
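The pause-after-each-set approach can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: `search_in_batches` and `search_one` are hypothetical names, and unlike the behavior described above this version skips the pause when no identifiers remain.

```python
import time


def search_in_batches(oclc_ids, search_one, batch_size=100,
                      pause_seconds=60, sleep=time.sleep):
    """Call search_one for each identifier, pausing after every full batch.

    The sleep function is a parameter so the pause can be replaced in tests.
    """
    ids = list(oclc_ids)
    results = []
    for position, oclc_id in enumerate(ids, start=1):
        results.append(search_one(oclc_id))
        # Pause only when a batch is complete and more identifiers remain.
        if position % batch_size == 0 and position < len(ids):
            sleep(pause_seconds)
    return results
```

With 1000 identifiers and the default settings, this sketch pauses nine times, which is roughly the ten minutes mentioned above once the search time itself is included.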

searchHathiTrustByOCLC.py

This searches the oclc field in HathiTrust.
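One way to look up an OCLC number in HathiTrust is through its Bibliographic API; whether searchHathiTrustByOCLC.py uses this route is not stated here. The sketch below builds the request URL and reads item links from a parsed JSON response. The function names are hypothetical, and the `items` and `itemURL` keys are an assumption about the response shape.

```python
def build_hathitrust_url(oclc_id):
    """Return the brief-record Bibliographic API URL for one OCLC number."""
    return f"https://catalog.hathitrust.org/api/volumes/brief/oclc/{oclc_id}.json"


def item_urls(response):
    """Collect item links from a parsed response.

    Assumes the response is a dict with an 'items' list whose entries
    may carry an 'itemURL' key; entries without one are skipped.
    """
    return [item["itemURL"] for item in response.get("items", [])
            if "itemURL" in item]
```

An empty list from `item_urls` would suggest that no digitized copy was found for that identifier.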

searchInternetArchiveByOCLC.py

This searches two metadata fields in the Internet Archive for an OCLC number: external-identifier and oclc_id.
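A query covering both fields could look like the sketch below, which targets the Internet Archive advanced search endpoint. The function names are hypothetical, the field names are used as written above, and the `urn:oclc:record:` form of the external identifier is an assumption; the script itself may query the fields differently.

```python
from urllib.parse import urlencode


def build_ia_query(oclc_id):
    """Return a query string that checks both metadata fields."""
    return (f'oclc_id:"{oclc_id}" OR '
            f'external-identifier:"urn:oclc:record:{oclc_id}"')


def build_ia_search_url(oclc_id, rows=50):
    """Return an advanced-search URL that asks for item identifiers as JSON."""
    params = [
        ("q", build_ia_query(oclc_id)),
        ("fl[]", "identifier"),
        ("rows", rows),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)
```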

combineMyResults.py

This script combines CSV results generated by running the above three scripts.
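Combining the results amounts to joining rows on the shared 'oclc_id' column. The following is a minimal sketch of that idea, not the code in combineMyResults.py: `combine_results` is a hypothetical name, and it assumes each result set is a list of dictionaries that all include an 'oclc_id' key.

```python
def combine_results(*result_sets):
    """Merge rows from several result sets into one row per OCLC number.

    Identifiers are converted to strings so that 123 and "123" match.
    Later result sets overwrite earlier ones if column names collide.
    """
    combined = {}
    for rows in result_sets:
        for row in rows:
            oclc_id = str(row["oclc_id"])
            merged = combined.setdefault(oclc_id, {"oclc_id": oclc_id})
            merged.update({k: v for k, v in row.items() if k != "oclc_id"})
    return list(combined.values())
```

Rows read with `csv.DictReader` from each results file can be passed in directly, and the merged rows can be written back out with `csv.DictWriter`.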
