zachstence / project-euler-scrape

Several spreadsheets and JSON files containing data (text, HTML, files/images) extracted from each Project Euler problem using a web scraper.

Python 100.00%
web-scraping project-euler coding-challenge python

Project-Euler-Scrape

A complete web scrape of every Project Euler programming challenge, including each problem's metadata, the problem statement itself, and all associated files and images.

What information was scraped?

[Image: Example of scraped information]

I extracted as much useful information as I could find, including:

  • Problem number (purple)
  • Problem title (blue)
  • Problem information (green)
    • Publish date/time
    • Number of solvers
    • Difficulty rating
  • Problem description (orange)
    • Raw HTML from the page
    • Plain text
  • Any images in the problem description (red)
  • Any files in the problem description (yellow)

Most of the data I scraped is in the file 1_631.json. The structure of the data is:

{
  "<problem number>": {
    "number": 1,
    "url": "<Project Euler problem URL>",
    "title": "<title of problem>",
    "info": {
      "difficulty": <problem difficulty level in %>,
      "published": "<publish date/time>",
      "solved": <number of solvers>
    },
    "content": {
      "images": <list of images>,
      "html": <raw HTML text>,
      "files": <list of files>
    }
  },

  ...
}

The images and files from the problems are found in the images/ and files/ directories respectively.
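
As a rough sketch of how this structure could be consumed, the snippet below parses a small inline sample shaped like the layout above and lists each problem's number, title, and difficulty. The sample values, the example.com URL, and the helper name summarize are placeholders for illustration, and it assumes that "difficulty" and "solved" are stored as numbers.

```python
import json

# Hypothetical sample in the structure described above; every value is a placeholder.
sample = """
{
  "1": {
    "number": 1,
    "url": "https://example.com/problem=1",
    "title": "Example title",
    "info": {"difficulty": 5, "published": "example date", "solved": 100},
    "content": {"images": [], "html": "<p>Example</p>", "files": []}
  }
}
"""


def summarize(problems):
    """Return (number, title, difficulty) tuples sorted by problem number."""
    rows = []
    for problem in problems.values():
        rows.append(
            (problem["number"], problem["title"], problem["info"]["difficulty"])
        )
    return sorted(rows)


problems = json.loads(sample)
summary = summarize(problems)
for number, title, difficulty in summary:
    print(f"{number}: {title} ({difficulty}%)")
```

To run this against the real data, replace json.loads(sample) with json.load on an opened 1_631.json.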

How was information scraped?

In previous commits I used a program called ParseHub to do the scraping, as I was fairly new to the concept and hadn't thought about doing it in a programming language. I have since redone everything in Python, using requests to fetch the webpages, BeautifulSoup to parse the HTML, and regular expressions to extract the information I wanted. All of the code is in pe_scrape.py.
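
The regular-expression step might look roughly like the sketch below. It is not taken from pe_scrape.py: it assumes the problem's info text has already been pulled out of the page (for example with requests and BeautifulSoup) and that it looks like the sample string. The text format, the pattern, the sample values, and the helper name parse_info are all assumptions for illustration.

```python
import re

# Assumed shape of the info text; the real page wording may differ.
INFO_PATTERN = re.compile(
    r"Published on (?P<published>.+?); "
    r"Solved by (?P<solved>\d+); "
    r"Difficulty rating: (?P<difficulty>\d+)%"
)


def parse_info(text):
    """Extract the "info" fields from a problem's info text, or return None."""
    match = INFO_PATTERN.search(text)
    if match is None:
        return None
    return {
        "difficulty": int(match["difficulty"]),
        "published": match["published"],
        "solved": int(match["solved"]),
    }


# Placeholder sample text, not real scraped data.
sample = (
    "Published on Friday, 5th October 2001, 06:00 pm; "
    "Solved by 1000; Difficulty rating: 5%"
)
info = parse_info(sample)
print(info)
```

Returning None when the pattern does not match lets the caller decide how to handle pages whose info text has a different shape, rather than failing partway through a full scrape.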

Why?

I am in the process of making a portfolio of all the programming projects I have done. Naturally, I have solved a couple of the Project Euler problems and wanted to include their descriptions, titles, etc. on my website without entering it all manually. So I decided to have the webpages filled in dynamically with PHP from a JSON file containing all the necessary information, hence this project!

Feel free to use the data I scraped, or modify my code to suit your needs!
