
This project forked from kokopi-dev/hipposcraper


Parsing and scraping Holberton project pages to automate repetitive tasks.


Hipposcraper - Python Scripts for Automating Holberton Projects

The Hipposcraper automates file template creation for Holberton projects. The program takes a link to a Holberton School project, scrapes the webpage, and creates the corresponding directory and files. The Hipposcraper currently supports the following:

| System Engineering | Low-Level Programming | Higher-Level Programming |
|---|---|---|
| Bash script templates | .c templates | .py and .c templates |
| | Header file | Header file |
| | _putchar file | |
| | main.c test files | main.c/main.py test files |
| README.md | README.md | README.md |

Getting Started 🔧

IMPORTANT: Make sure your version is up to date. The current version is shown at the top of the README, and running hippoproject or hipporead displays the version you have installed.

Follow these instructions to set up the Hipposcraper on your machine.

Prerequisites

The Hipposcraper relies on the Python packages Mechanize and BeautifulSoup4. Installation of these packages requires pip. If you are on a Debian-based Linux distribution:

sudo apt-get install python-pip

Once pip has been installed, install Mechanize and BeautifulSoup4 as follows:

pip install mechanize
pip install beautifulsoup4

Note that you may need to pass the --user option to pip when installing these packages.
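To confirm that both packages are importable before running the scripts, a quick check along the following lines can help. This is a minimal sketch, not part of the Hipposcraper itself; the function name missing_packages is made up for illustration. Note that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib


def missing_packages(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "bs4" is the import name of the beautifulsoup4 package.
    missing = missing_packages(["mechanize", "bs4"])
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it with the same interpreter your aliases use, since packages installed for one Python version are not visible to another.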

Setup 🔑

Setting User Information

After cloning a local copy of the repository, enter your Holberton intranet username and password as well as your GitHub name, username, and profile link in the auth_data.json file.
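A blank field in auth_data.json is an easy mistake to make, so it can be worth validating the file before scraping. The sketch below shows one way to do that. The field names in REQUIRED_KEYS are placeholders chosen for illustration, not the keys the repository actually uses; check the auth_data.json in your clone and adjust them.

```python
import json

# Hypothetical field names; replace them with the keys found in your
# copy of auth_data.json.
REQUIRED_KEYS = (
    "intranet_username",
    "intranet_password",
    "github_name",
    "github_username",
    "github_profile_link",
)


def load_auth_data(path):
    """Load the JSON file at path; raise ValueError if a field is missing or empty."""
    with open(path) as f:
        data = json.load(f)
    empty = [key for key in REQUIRED_KEYS if not data.get(key)]
    if empty:
        raise ValueError("auth data is missing: " + ", ".join(empty))
    return data
```

Because the file holds your intranet password in plain text, avoid committing it to a public fork.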

Setting Aliases

The Hipposcraper defines two separate Python scripts - one (hippoproject.py) that creates projects, and a second (hipporead.py) that creates README.md files. To run both simultaneously, you'll need to define an alias to the script hipposcrape.sh.

First, open the script and enter the full pathname to the Hipposcraper directory where directed. Then, if you work in a Bash shell, define the following in your .bashrc:

alias hipposcrape='./ENTER_FULL_PATHNAME_TO_SCRAPER_DIRECTORY_HERE/hipposcrape.sh'

Alternatively, you can define separate aliases for each individual script. To define a project scraper alias:

alias hippoproject='./ENTER_FULL_PATHNAME_TO_SCRAPER_DIRECTORY_HERE/hippoproject.py'

And to define a README.md scraper alias:

alias hipporead='./ENTER_FULL_PATHNAME_TO_SCRAPER_DIRECTORY_HERE/hipporead.py'

NOTE: This program only works with Python 2; ensure that your aliases specify 'python2' (Mechanize is not supported by Python 3).


Usage 💻

After you have set up the proper aliases, you can run the Hipposcraper with the following command:

~$ hipposcrape project_link

Here, project_link is the URL of the Holberton School project to scrape.

Alternatively, to run only the project scraper:

~$ hippoproject project_link

Or only the README.md scraper:

~$ hipporead project_link
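Each command takes exactly one argument, the project URL. How the real scripts check that argument is not described here, so the following is only a sketch of the idea: reject a missing or malformed link before any network request is made. The names validate_project_link and main are hypothetical, and the import fallback keeps the sketch usable on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def validate_project_link(link):
    """Return link if it looks like an http(s) URL, otherwise raise ValueError."""
    parsed = urlparse(link)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("not a valid project URL: " + link)
    return link


def main(argv):
    """Return a shell exit status: 0 on success, 1 on a usage error."""
    if len(argv) != 2:
        print("Usage: {} project_link".format(argv[0]))
        return 1
    try:
        link = validate_project_link(argv[1])
    except ValueError as err:
        print("Error: {}".format(err))
        return 1
    print("Scraping " + link)
    return 0
```

A script built this way would end with sys.exit(main(sys.argv)) so that the shell sees a non-zero status when the link is rejected.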

Repository Contents 📁

  • hipposcrape.sh

    • A Bash script for running the entire Hipposcraper at once.
  • hippoproject.py

    • Python script that scrapes Holberton intranet webpage to create project directories.
  • hipporead.py

    • Python script that scrapes Holberton intranet webpage to create project README.md.
  • auth_data.json

    • Stores user Holberton intranet and GitHub profile information.
  • scrapers

    • Folder of file-creation scrapers.
      • base_parse.py: Python script for parsing project pages.
      • sys_scraper.py: Python methods for creating Bash task files for system engineering projects.
      • low_scraper.py: Python methods for creating _putchar.c, task files, and header file for low-level programming projects.
      • high_scraper.py: Python methods for creating Python task files for higher-level programming projects.
      • test_file_scraper.py: Python methods for creating test files for all project types.
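The file-creation scrapers listed above share one basic pattern: take the file names parsed from the project page and write a starter file for each. The sketch below illustrates that pattern only. It is not the code in the scrapers folder, and the function name create_task_files and its parameters are assumptions made for this example.

```python
import os


def create_task_files(project_dir, file_names, template=""):
    """Create project_dir and one file per name in file_names.

    Existing files are left untouched so that work in progress is not
    overwritten. Returns the list of paths that were created.
    """
    if not os.path.isdir(project_dir):
        os.makedirs(project_dir)
    created = []
    for name in file_names:
        path = os.path.join(project_dir, name)
        if os.path.exists(path):
            continue
        with open(path, "w") as f:
            f.write(template)
        created.append(path)
    return created
```

For a system engineering project, for example, a call might pass a list of Bash task file names and a template containing only a shebang line. Skipping existing files makes it safe to run the scraper twice on the same project.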

Example of the C scraper

[Image: demo0]

Example of the README scraper

[Image: demo1]

README Modification ✏️

To modify the scraping template for README.md creation, edit the hipporead.py file.

  • In the block commented by README TEMPLATE BELOW, you can modify .write functions to edit what is written into the file.
  • In the blocks commented by SCRAPERS and EXTRA SCRAPES, you can use the variables ending with _arr to modify the README.md contents.
    • Note that the information is stored in lists.
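To make the relationship between the .write calls and the _arr lists concrete, here is a minimal sketch of a template that walks two parallel lists and writes one README entry per task. The function name write_readme and the variables task_names_arr and task_descriptions_arr are hypothetical; the actual names in hipporead.py may differ, so treat this as a pattern to adapt rather than a copy of the script.

```python
def write_readme(path, title, task_names_arr, task_descriptions_arr):
    """Write a simple README to path from two parallel lists of task data."""
    with open(path, "w") as readme:
        # Each .write call adds one piece of the template; edit these
        # strings to change what ends up in the file.
        readme.write("# {}\n\n".format(title))
        readme.write("## Tasks\n\n")
        for name, description in zip(task_names_arr, task_descriptions_arr):
            readme.write("* **{}**: {}\n".format(name, description))
```

Since the scraped data is stored in lists, reordering, filtering, or slicing those lists before the loop is usually the simplest way to change which tasks appear.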

Author


Contributors

kokopi-dev, 234761, bdbaraban, narnat
