Dynamic website scraper and email notifier.

License: MIT License

Python 95.31% Shell 1.69% Batchfile 1.36% PowerShell 1.64%
scraper web-scraping etl etl-process

University notices email notifier

Scraper for notices published on the website of the Faculty of Electrical Engineering in Banja Luka. The project scrapes the notices, runs them through ETL processing, and sends the result as a JSON file to the configured email address through Yahoo SMTP, using Python's smtplib library.

Introduction

I've always wanted to build a web scraper, and recently I found some free time to complete this project. Because the website is dynamic, scraping is done with Selenium in addition to the Beautiful Soup library. The project is written in such a way that it can be run on both Windows and Linux.

Note:

  • The only prerequisite is that Python 3 is installed on your machine.
  • Be cautious when changing config.ini because it's tightly coupled with the Python code.
  • The code has been tested on both Windows 10 and the latest Linux Mint distribution.
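
The README does not show the ETL step itself. As a rough illustration only, the stage between scraping and emailing might look like the sketch below. The record fields (title, date, url), the input date format, and the function names are all hypothetical and not taken from the project.

```python
import json
from datetime import datetime


def transform(raw_notices):
    """Clean scraped records: trim whitespace, normalise dates, drop duplicates."""
    seen = set()
    cleaned = []
    for raw in raw_notices:
        # Collapse runs of whitespace left over from the HTML.
        title = " ".join(raw["title"].split())
        # Assumed input format "dd.mm.yyyy."; stored as an ISO 8601 date string.
        date = datetime.strptime(raw["date"].strip(), "%d.%m.%Y.").date().isoformat()
        key = (title, date)
        if key in seen:
            continue  # skip a notice that was already collected
        seen.add(key)
        cleaned.append({"title": title, "date": date, "url": raw["url"].strip()})
    return cleaned


def to_json(notices):
    """Serialise the notices as readable JSON, keeping non-ASCII characters."""
    return json.dumps(notices, ensure_ascii=False, indent=2)


# Hypothetical scraped input containing one duplicate.
raw = [
    {"title": "  Exam   schedule ", "date": "01.02.2023.", "url": " https://example.org/1 "},
    {"title": "Exam schedule", "date": "01.02.2023.", "url": "https://example.org/1"},
]
notices = transform(raw)
print(to_json(notices))
```

Keeping the transform separate from the serialisation makes it easy to change the output format later without touching the cleaning logic.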

Initial Setup

In this section, I will go over how to set up this project on Linux. However, most of the steps also apply on Windows. First, open the command line, change to the desired directory, and clone this repository using the git clone command.

$ git clone https://github.com/AleksaMCode/university-notices-email-notifier.git

Next, position yourself inside the project directory, create a virtualenv and then install all the needed packages from the requirements.txt file.

$ cd university-notices-email-notifier
$ virtualenv -p python3 venv
$ source venv/bin/activate
(venv) $ pip install -r requirements.txt

Note:

You can find all of these commands in the init.sh file, which is located inside the resources/scripts directory.

Config file setup

Before using this project, you need to adjust a couple of parameters stored in the config.ini file. First, add the email address at which you wish to receive the notifications (user_email field). If you wish to use Yahoo SMTP, you only need to update the email and password fields with your own credentials. Detailed instructions on how to set up Yahoo SMTP with your account are given below. If you want to use another email provider, you will also need to update the provider-specific fields, namely port and smtp (the SMTP server). All of this information is stored in the SMTP section of the config file.

[SMTP]
smtp = smtp.mail.yahoo.com
email =
port = 587
password =
user_email =
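
As a rough sketch of how the [SMTP] section above could be read and checked, the snippet below uses Python's standard configparser module. The function name load_smtp_settings and the sample values are hypothetical, and the project's own loading code may differ. The sketch parses a string so that it is self-contained; for the real file you would call parser.read("config.ini") instead.

```python
import configparser

REQUIRED_FIELDS = ("smtp", "email", "port", "password", "user_email")


def load_smtp_settings(config_text):
    """Parse the [SMTP] section and fail early if a field is missing or empty."""
    # interpolation=None so that a '%' in a password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser["SMTP"]
    missing = [name for name in REQUIRED_FIELDS if not section.get(name, "").strip()]
    if missing:
        raise ValueError("Empty or missing SMTP fields: " + ", ".join(missing))
    settings = {name: section[name].strip() for name in REQUIRED_FIELDS}
    settings["port"] = section.getint("port")  # port is used as an integer
    return settings


# Placeholder values for illustration only.
example = """
[SMTP]
smtp = smtp.mail.yahoo.com
email = sender@yahoo.com
port = 587
password = app-password-here
user_email = recipient@example.com
"""
settings = load_smtp_settings(example)
print(settings["smtp"], settings["port"])
```

Validating the fields up front means an unfilled template fails with a clear message instead of an SMTP login error later.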

Yahoo SMTP

Below is a table of all the essential details you need:

| SMTP server | Port | Requires SSL | Requires TLS | Authentication | Username | Password |
| --- | --- | --- | --- | --- | --- | --- |
| smtp.mail.yahoo.com | 587 | | | | Your Yahoo email address | Your Yahoo Mail App Password, which isn't the same as your account password |

Restrictions:

  • You can send a maximum of 500 emails per day.
  • Some sources claim you can send a maximum of 100 emails per hour.

In order to use the Yahoo SMTP server, you need to create a dedicated App Password. First, go to your account settings area and click on Account Security, then click on the Generate app password link under the Other ways to sign in section. When the popup is shown, enter your app name, which can be anything. Next, click the Generate password button. You should then see the 16-character app password. Save it for later use, as Yahoo will not show it to you again.
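
Once you have the App Password, sending the notices could look roughly like the sketch below. It uses only the standard smtplib and email modules and assumes a STARTTLS connection on port 587. The function names, the subject line, and the attachment name notices.json are hypothetical and not taken from the project.

```python
import json
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, notices):
    """Create an email with the notices attached as a JSON file."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(notices)} new notice(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("New notices are attached as a JSON file.")
    payload = json.dumps(notices, ensure_ascii=False, indent=2).encode("utf-8")
    msg.add_attachment(
        payload, maintype="application", subtype="json", filename="notices.json"
    )
    return msg


def send_message(msg, host, port, username, app_password):
    """Send the message; assumes the server supports STARTTLS on the given port."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()  # upgrade the connection before logging in
        server.login(username, app_password)  # the App Password, not the account password
        server.send_message(msg)


# Only the message is built here; nothing is sent.
msg = build_message(
    "sender@yahoo.com", "recipient@example.com", [{"title": "Exam schedule"}]
)
print(msg["Subject"])
```

To actually send the message, you would call send_message(msg, "smtp.mail.yahoo.com", 587, ...) with the email and password values from config.ini.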

Scheduling scraping

Windows - Task Scheduler

The first thing you need to create is a bat file that runs the notifier.py script with python.exe. Open the directory in which you wish to create the bat file, open PowerShell there, and type the following commands:

New-Item scraper.bat
Set-Content scraper.bat "@echo off`r`n""C:\Users\Username\AppData\Local\Programs\Python\Python310\python.exe"" ""C:\Users\Username\university-notices-email-notifier\notifier.py"""

Note:
Adjust the paths in the command above:

  • Set the first path to the location of your python.exe.
  • Set the second path to the location of the notifier.py script.

In order to schedule the scraper using Windows Task Scheduler, you will need to:

  • Open the Windows Control Panel, click on Administrative Tools, and double-click on Task Scheduler.
  • Choose the option `Create Task...`.
  • Type a name for the task in the General tab (the description is optional) and then click on the Triggers tab.
  • Click on New... and, in the New Trigger window that opens, choose to start the task 'One time' starting from 12:00:00 AM.
  • In Advanced settings, tick 'Repeat task every' and enter your desired frequency.
  • From the 'for a duration of' drop-down menu, choose 'Indefinitely' and click OK.
  • Click on the Actions tab and then on the New... button. There, browse to scraper.bat, which is located inside the resources/scripts directory.
  • Click OK twice.

Linux - Cron job

First, open the crontab with the crontab -e command. Once you are in the cron editor, add the cron job entry. For example, if you want to run this scraper every 30 minutes, enter:

0,30 * * * * /usr/bin/python /home/script/university-notices-email-notifier/notifier.py

Save your changes and exit the editor. For more details on how to specify the frequency, see the crontab manual page (man 5 crontab).

Note:
Don't forget to exit Vim using :wq. :)

To-Do List

  • Replace the JSON file attachment with an HTML-formatted email.
  • Implement a year-specific command for notifications.
  • Implement a year-range command for notifications.
  • Move sensitive information, like the password, from the config file to environment variables.
  • Implement toast notifications.
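
The item about moving sensitive information to environment variables is not implemented yet. One possible approach, shown purely as a sketch, is to prefer an environment variable and fall back to the config value. The variable name NOTIFIER_SMTP_PASSWORD and the function resolve_password are hypothetical.

```python
import os


def resolve_password(config_password, env=os.environ, var_name="NOTIFIER_SMTP_PASSWORD"):
    """Prefer the environment variable; fall back to the value from config.ini."""
    value = env.get(var_name) or config_password
    if not value:
        raise RuntimeError(
            f"Set {var_name} or fill in the password field in config.ini"
        )
    return value


# The env mapping is passed explicitly here so the example does not
# depend on the real environment.
print(resolve_password("", env={"NOTIFIER_SMTP_PASSWORD": "s3cret"}))
print(resolve_password("from-config", env={}))
```

The fallback keeps existing config.ini setups working while allowing the password to be removed from the file.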
