
theuerc / imslp


This project forked from jlumbroso/imslp



Home Page: https://pypi.org/project/imslp/

License: GNU Lesser General Public License v3.0

Python 100.00%


imslp


🎼 The clean and modern way of accessing IMSLP data and scores programmatically. 🎢

Installation

The package is available on PyPI and can be installed using your favorite package manager:

pip install imslp

Data Sources

Where possible, this project relies on robust data sources that do not require web scraping:

  • MediaWiki API. IMSLP is one of tens of thousands of websites built on top of MediaWiki, the framework created for Wikipedia.org. As such, it can be accessed through the MediaWiki API for which, fortunately, there exists a fantastic Python wrapper library called mwclient.

  • IMSLP API. For convenience, IMSLP provides some ad-hoc scripts that return a list of people and a list of works in a variety of formats, including JSON.
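As a rough illustration of the second source, the sketch below pages through the IMSLP list scripts using only the Python standard library. The URL template, the `type` values (1 for people, 2 for works), and the response layout (numbered entries plus a `metadata` entry carrying a `moreresultsavailable` flag) are assumptions based on how this endpoint is commonly queried, not something documented here. The helper names `build_list_url`, `split_page` and `iter_list` are hypothetical and are not part of the `imslp` package.

```python
import json
import urllib.request
from typing import Any, Dict, Iterator, List, Tuple

# ASSUMPTION: template of IMSLP's ad-hoc list script; verify before relying on it.
API_TEMPLATE = (
    "https://imslp.org/imslpscripts/API.ISCR.php"
    "?account=worklist/disclaimer=accepted/sort=id"
    "/type={kind}/start={start}/retformat=json"
)
PEOPLE, WORKS = 1, 2  # assumed values of the `type` parameter


def build_list_url(kind: int, start: int = 0) -> str:
    """Return the (assumed) URL of one page of the people or works list."""
    return API_TEMPLATE.format(kind=kind, start=start)


def split_page(page: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], bool]:
    """Separate the numbered records from the metadata entry of one page."""
    meta = page.get("metadata", {})
    # Records are assumed to be keyed "0", "1", ...; sort them numerically.
    keys = sorted((k for k in page if k.isdigit()), key=int)
    records = [page[k] for k in keys]
    return records, bool(meta.get("moreresultsavailable", False))


def iter_list(kind: int) -> Iterator[Dict[str, Any]]:
    """Yield every record, fetching pages until no more results are reported."""
    start = 0
    while True:
        with urllib.request.urlopen(build_list_url(kind, start)) as response:
            page = json.load(response)
        records, more = split_page(page)
        yield from records
        if not more or not records:
            break
        # The offset is assumed to advance by the number of records received.
        start += len(records)
```

Calling `list(iter_list(WORKS))` would issue real network requests; keeping the URL building and page parsing in separate pure functions makes those parts easy to check offline.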

However, the package also relies on scraping to collect additional information (such as the number of pages in a score, the number of times a score was downloaded, or user-provided ratings).

Some quirks of IMSLP

Although IMSLP runs on MediaWiki, a widely used open-source wiki platform, it has a handful of quirks, such as:

  • Composers are stored as categories, for instance Category:Scarlatti, Domenico. For each composer, there are usually three tabs: "Compositions", "Collaborations" and "Collections"; these are stored as separate categories whose names concatenate the composer and the subtype, such as Category:Scarlatti, Domenico/Collections.

  • PDF files for sheet music are stored as "images"; unfortunately, for the time being, the URL scheme (such as https:) is missing from the URLs computed for these files, so they need to be patched manually.

  • The imslpdisclaimeraccepted cookie must be set to "yes" for files to download properly (otherwise, downloading any file will result in the disclaimer page). With mwclient, this can be specified on login.

    cookies = {
        "imslp_wikiLanguageSelectorLanguage": "en",
        "imslpdisclaimeraccepted": "yes",
    }
  • Much of the metadata associated with images, such as the internal ID or the download counter, is stored separately from the MediaWiki metadata. This makes scraping the rendered HTML page necessary.
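The first three quirks can be illustrated with a few small helpers. This is a hypothetical sketch using only the standard library, not the package's actual implementation: it assumes the missing scheme shows up as a protocol-relative URL (starting with `//`), that `https` is an acceptable default, and that tab categories follow the `Composer/Subtype` naming shown above. The function names `composer_category`, `patch_scheme` and `disclaimer_request` are invented for this example.

```python
import urllib.request

# Cookie values taken from the list above.
IMSLP_COOKIES = {
    "imslp_wikiLanguageSelectorLanguage": "en",
    "imslpdisclaimeraccepted": "yes",
}


def composer_category(composer: str, subtype: str = "") -> str:
    """Build a composer category title, optionally with a tab suffix (hypothetical helper)."""
    title = f"Category:{composer}"
    return f"{title}/{subtype}" if subtype else title


def patch_scheme(url: str, scheme: str = "https") -> str:
    """Prefix a scheme to a protocol-relative URL such as "//imslp.org/..." (assumed format)."""
    if url.startswith("//"):
        return f"{scheme}:{url}"
    return url


def disclaimer_request(url: str) -> urllib.request.Request:
    """Prepare a download request that sends the disclaimer cookies."""
    cookie_header = "; ".join(f"{name}={value}" for name, value in IMSLP_COOKIES.items())
    return urllib.request.Request(patch_scheme(url), headers={"Cookie": cookie_header})
```

Passing the result to `urllib.request.urlopen` should then return the file rather than the disclaimer page, provided the cookie values above are still what IMSLP expects.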

Fortunately, all these quirks are handled by this package!

Related Projects

Here are a handful of other related projects available on GitHub to access the IMSLP data programmatically:

  • jjjake/imslp-scrape: Last commit in May 2012 (32 commits), mix of Python and shell, scraping the website for data (people, score links) with HTML parsing.

  • FrankTheCodeMonkey/IMSLP-Scraper: Last commit in June 2020 (6 commits), Python, scraping the website for data and scores, with HTML parsing and Selenium.

  • josefleventon/imslp-api: Last commit in May 2020 (17 commits), JavaScript, uses IMSLP's custom API to get the list of people and list of works programmatically through a web API query.

More recently, and in other languages:

Acknowledgements

Let's be clear that all the heavy lifting is done by mwclient, and by the volunteers who uploaded and/or scanned and/or typeset the scores on IMSLP.

License

This project is licensed under the LGPLv3 license, with the understanding that importing a Python module is similar in spirit to dynamically linking against a library.

  • You can use the library imslp in any project, for any purpose, as long as you provide some acknowledgement to this original project for use of the library.

  • If you make improvements to imslp, you are required to make those changes publicly available.

Contributors

jlumbroso, ramseyharrison, github-actions[bot]
