asilatakarandikar / eparlibscrapr

A simple web scraper to collect public domain metadata from searches on the Parliament of India's Digital Library.

License: GNU General Public License v3.0

R 100.00%

eparlibscrapR

Introduction

This is a simple web scraper built on rvest to collect and organise metadata from searches on the Parliament of India's Digital Library, which can be accessed at www[dot]eparlib[dot]nic[dot]in. It returns a data frame with the details of each search result, including its PDF link.

Functions

At present, this package offers two functions, and each takes a different search filter as input:

  1. scrape_qna: For searches filtered by type "Part 1(Questions and Answers)", which includes only questions asked in the Lok Sabha.

  2. scrape_nonqna: For searches filtered by type "Part 2(Other Than Questions and Answers)", which includes Government Bills, Private Member's Bills, Parliamentary Committee Reports, etc.

The two functions are kept separate because the results of the two search types contain different elements.
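
The general technique behind functions like these can be sketched with rvest directly. The example below is illustrative only and is not this package's implementation: the HTML snippet, the class names (results, title, date), and the relative PDF paths are all hypothetical stand-ins, since the actual markup of the Digital Library's result pages is not described here. It parses an inline page, pulls one value per result row, and assembles a data frame that includes each result's PDF link.

```r
library(rvest)

# Hypothetical markup standing in for a search results page.
# The real page structure and class names may differ entirely.
page <- minimal_html('
  <table class="results">
    <tr>
      <td class="title"><a href="/docs/example-1.pdf">Example question on rainfall</a></td>
      <td class="date">2020-01-01</td>
    </tr>
    <tr>
      <td class="title"><a href="/docs/example-2.pdf">Example question on railways</a></td>
      <td class="date">2020-01-02</td>
    </tr>
  </table>
')

# One node per search result.
rows <- html_elements(page, "table.results tr")

# html_element() applied to a node set returns exactly one match per row,
# so the columns below stay aligned with each other.
results <- data.frame(
  title    = html_text2(html_element(rows, "td.title")),
  date     = html_text2(html_element(rows, "td.date")),
  pdf_link = html_attr(html_element(rows, "td.title a"), "href"),
  stringsAsFactors = FALSE
)

print(results)
```

For a live page, minimal_html() would be replaced by read_html() on the search URL, and the selectors would have to be adapted to the elements actually present for each search type.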

Future Steps

Development is underway to create:

  1. A simpler function to work on search URLs with any or no filter.

  2. A function to scrape, read, and tidy PDFs from the search results.

Contribute

Spotted a bug? Report here. Want to contribute to code development? Open a pull request here.

Author's Declaration

  1. eparlibscrapR (hereafter, "this Package") is neither affiliated with nor endorsed by the Parliament Digital Library (hereafter, "the Entity").

  2. This Package is created for systematic collection of data available in the public domain in support of research and study.

  3. All users of this Package are hereby advised to refer to these webscraping guidelines and make ethical use of this Package.

  4. The development of this Package, or any use of this Package for any purpose, by itself or in combination with, or as the basis for other packages, or any analyses resultant thereof, does not in any way imply the official positions of, nor hold liable, the Entity.

  5. All attempts have been made to comply with the Terms and Conditions and the Copyright Policy (accessible at www[dot]eparlib[dot]nic[dot]in/help/terms-conditions[dot]jsp and www[dot]eparlib[dot]nic[dot]in/help/copyright-policy[dot]jsp respectively), and other policies of the Entity.

  6. The Author of this Package is neither responsible nor liable for any misuse or non-compliance with the said Terms and/or Policies by any third party. Persons using this package are responsible for ensuring they comply with the same.

  7. Any person interested in using this Package should do so on the understanding that any third party use of this Package for any purpose, by itself or in combination with, or as the basis for other packages, or any analyses resultant thereof, does not amount to an endorsement by, imply the positions of, or hold liable, the Author of this Package.

  8. This Package has been licensed under the GNU General Public License v3.0.

  9. Any subsequent work which uses this Package must also be made available under the same license, as per the terms of using material licensed under the GNU General Public License v3.0.

  10. The Author of this Package is not responsible for changes in availability of information or non-availability of the same on any of the above websites.

About Logo

The logo is adapted from this image sourced from Wikimedia Commons under a CC BY-SA 4.0 license. That license requires that any reuse or adaptation of the work be attributed to the original author and be made available under the same or a compatible license. The R package eparlibscrapR, which uses this image as the basis for its logo, is made available on GitHub under the GPL-3.0 license, which has been declared compatible with CC BY-SA 4.0.

Original file: https://commons.wikimedia.org/wiki/File:Indian_Parliament.svg
Attribution: Suthir, CC BY-SA 4.0, via Wikimedia Commons.
