
haakooto / uioscrape


This project forked from brorh/uioscrape


Scrapes semester pages of given semester and returns a list of all pdfs of old exams.

Python 99.14% Shell 0.86%


UiOScrape

Effortlessly extract exams, exam solutions, lectures, exercises etc. from old lecture pages of any UiO subject.
Finds files on the UiO servers that are not published on the course pages. Great for exam studying.


Installing

Windows

ahahahaha

MacOS

pray to GNU and follow Linux guide

Linux

  1. Requires Python 3.8 or later, as well as the modules:

    • numpy: $ pip install numpy
    • cryptography: $ pip install cryptography
  2. Install wdfs (see HOW TO INSTALL WDFS below).

  3. Clone the repo.


HOW TO INSTALL WDFS:

wdfs is a WebDAV filesystem that UiOScrape requires in order to work:
(tutorial kindly stolen verbatim from Khurshid Alam)

sudo apt-get install checkinstall libfuse-dev libneon27 libneon27-dev
sudo apt-get install python-fuse fuse-utils

wget http://noedler.de/projekte/wdfs/wdfs-1.4.2.tar.gz
tar xzvf wdfs-1.4.2.tar.gz
cd wdfs-1.4.2
./configure
sudo checkinstall
sudo dpkg -i wdfs_1.4.2-1_*.deb

Usage

scraper.py

Run the python script and pass any UiO subject as the first and only argument. Example:
python3.8 scraper.py FYS-MEK1110
The subject code is case-insensitive.
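
As a rough illustration of what "first and only argument, case-insensitive" means in practice, here is a minimal sketch of how the argument could be normalised. The function name `parse_subject` and the loose pattern for subject codes are assumptions made for this example, not necessarily what scraper.py does.

```python
import re

# Hypothetical, deliberately loose shape of a subject code: letters and
# hyphens, then digits, then optional trailing letters (e.g. FYS-MEK1110).
SUBJECT_PATTERN = re.compile(r"^[A-Z][A-Z-]*\d+[A-Z]*$")


def parse_subject(argv):
    """Return the upper-cased subject code from a list of CLI arguments."""
    if len(argv) != 1:
        raise SystemExit("usage: python3 scraper.py SUBJECT")
    subject = argv[0].strip().upper()
    if not SUBJECT_PATTERN.match(subject):
        raise SystemExit(f"not a recognised subject code: {argv[0]!r}")
    return subject
```

In a script this would be called as `parse_subject(sys.argv[1:])`, so that `fys-mek1110` and `FYS-MEK1110` end up as the same subject.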
You will be asked to enter your UiO username and password (see the FAQ for why). You will have to enter these credentials every time you run the scraper, unless you set up a pin-code. See credentials.py.
As of now (almost) all pdfs are downloaded into a ./downloads dir in the cloned repo. Future versions will feature more controllable downloading options and filtering methods.
Note that downloads start out slow and speed up markedly once the system has finished mounting.
NOTE: When stopping the program mid-scrape, please press Ctrl+C (or equivalent) only once, as the program needs to safely unmount after a scrape. If you press it more than once, the program will automatically clean up the next time you run it, but carelessness here may cause some (non-serious) issues.
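
To show why a single Ctrl+C matters, here is a minimal sketch of the general pattern: the first interrupt stops the download loop, and a `finally` block then performs the unmount. The names `mounted` and `scrape`, and the `mount`, `unmount` and `download` callables, are hypothetical stand-ins; the actual scraper may be organised differently.

```python
import contextlib


@contextlib.contextmanager
def mounted(mount, unmount):
    """Call mount(), then guarantee unmount() runs when the block exits."""
    mount()
    try:
        yield
    finally:
        # A second Ctrl+C arriving while this runs would interrupt the
        # unmount itself, which is why pressing it once is important.
        unmount()


def scrape(files, download, mount, unmount):
    """Download each file; return those fetched before any interruption."""
    done = []
    with mounted(mount, unmount):
        try:
            for name in files:
                download(name)
                done.append(name)
        except KeyboardInterrupt:
            # First Ctrl+C: stop downloading and fall through to the unmount.
            pass
    return done
```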

credentials.py

Running this script and following the instructions lets you use a pin-code instead of typing your UiO credentials every time: the script encrypts and stores your credentials. If you are the main user of your computer this should not be a problem. For multi-user systems, however, it is not recommended. This is similar to the git-credentials approach to seamless login (which, surprisingly, just stores your password in plaintext!).
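
One common way to turn a short pin-code into an encryption key is a password-based key derivation function. The sketch below uses only the Python standard library and is an illustration of that idea, not a description of what credentials.py actually does; `derive_key`, the salt size and the iteration count are assumptions chosen for the example.

```python
import base64
import hashlib
import os


def derive_key(pin, salt, iterations=200_000):
    """Stretch a short pin-code into a 32-byte key, URL-safe base64 encoded."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", pin.encode("utf-8"), salt, iterations, dklen=32
    )
    return base64.urlsafe_b64encode(raw)


salt = os.urandom(16)           # random salt, stored next to the encrypted data
key = derive_key("1234", salt)  # same pin-code + same salt gives the same key
```

A 32-byte URL-safe base64 key has the shape that `Fernet` in the cryptography package expects, so a key like this could be used to encrypt the stored credentials. Since a pin-code has few possible values, anyone who can read both the salt and the encrypted file can try them all, which is one reason the approach is discouraged on multi-user systems.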

To-do list:

  • Add support for file types other than .pdf
  • Add autocomplete for all subjects via command line
  • Choose which files from a scrape to download
  • Download specific files only
  • Add smart filter which detects only specific file types (e.g. exams or obligs)
  • Eat 🍕
  • Group results in a prettier manner
  • bugsbugsbugs

To-done list:

  • Extract files through mounting of the UiO WebDAV server
  • Basic password/pin system for credential storage
  • Implement hashing/UUID system to ensure no file is downloaded twice.
  • Safer unmounting system
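
The "no file is downloaded twice" item can be illustrated with a content hash. This is a minimal sketch of the general technique, assuming SHA-256 digests kept in memory; the class name `SeenFiles` is made up for the example, and the project's own hashing/UUID system may work differently.

```python
import hashlib


class SeenFiles:
    """Remember content digests so identical files are only saved once."""

    def __init__(self):
        self._digests = set()

    def is_new(self, data):
        """Return True the first time these exact bytes are seen."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._digests:
            return False
        self._digests.add(digest)
        return True
```

Hashing the content rather than the filename means the same exam uploaded under two different names, or in two different semesters, is still recognised as a duplicate.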

FAQ

Why does the program ask me to log in with my UiO account?

As a UiO student you have access to the course pages for most semesters of most subjects. Simply telling the UiO servers that you are indeed a student lets you view and download resources that do not appear publicly on the semester web pages.
The great thing about open source is that you can confirm for yourself that the scraper does nothing dubious with your credentials.

Ummm, ok, is there a "no-login" option?

Yes! While not yet fully implemented, there will soon be an option to perform classic web scraping of the HTML pages of each semester course. This lets anyone scrape for pdfs, but the method is slow, resource-heavy, has very ugly code and, most importantly, can't find the above-mentioned non-public files (which are often the juiciest ones 😉).
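
For a sense of what "classic web scraping" involves, here is a minimal sketch that pulls pdf links out of a page's HTML using only the standard library. `PdfLinkParser` and `find_pdf_links` are hypothetical names, and the planned no-login mode may be implemented quite differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of all <a href="..."> links ending in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(".pdf"):
            self.links.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    """Return the pdf links found in html, resolved against base_url."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

This only sees files that a page actually links to, which is why this kind of scraping cannot find the non-public files.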

Can I get in trouble for using this?

No. However, common sense applies: don't behave inappropriately, don't be an idiot, and everything will be fine. These files are available to all UiO students; this program just collects them quickly and neatly.

When will you eat 🍕?

The day I no longer have to google regex syntax, I'll allow myself a slice.

Status

Project is: heavily in development

Contact

Created by Bror Hjemgaard - feel free to contact me!

