
pngix_scraper's Introduction

Attention

  • This script no longer works because pngix.com is down.
  • This script is incomplete due to my lack of knowledge.
  • Some common error cases are not accounted for, so it can break sometimes.

Requirements

Apart from the standard Python 3 libraries, you will need BeautifulSoup 4 for better HTML parsing:

pip install bs4

Info

This script scrapes data from pngix.com and saves it to a file in CSV format. The file name is generated from the current date and time and looks like this: pngix_data_(weekday)_(month)_(day)_(time)_(year).csv. Note that weekday and month are names, not numbers, abbreviated to 3 letters. Like this:

pngix_data_Mon_Dec_31_23:59:59_2000.csv
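A file name in this shape can be produced with `strftime`. The sketch below is only an illustration: the function name `output_filename` is hypothetical, and the original script may build the name differently (for example from `time.ctime()`), so details such as zero-padding of single-digit days are assumptions.

```python
from datetime import datetime
from typing import Optional


def output_filename(now: Optional[datetime] = None) -> str:
    """Build a name like pngix_data_Mon_Jan_01_00:00:00_2001.csv.

    Hypothetical helper, not taken from the original script.
    %a and %b give abbreviated weekday and month names; in the default
    C locale these are 3-letter English names.
    """
    if now is None:
        now = datetime.now()
    stamp = now.strftime("%a_%b_%d_%H:%M:%S_%Y")
    return f"pngix_data_{stamp}.csv"


if __name__ == "__main__":
    print(output_filename())
```

Note that the colons in the time part are not allowed in file names on Windows, so a name in this format only works as-is on systems that permit them.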

The final file contains image source, resolution, license, size and downloads. This data is taken straight from pngix.com/viewpng/(png name).

Currently it takes about 10 days (assuming a fairly fast internet connection) to scrape all data. This is an extrapolation: I have never let the script finish, but after 24 hours about 10% was complete.

In detail, this script works in 2 steps:

Step One:

  • Get a cookie for this session
  • Get the 1st page of images sorted by top (just request www.pngix.com, as it returns top images)
  • Loop over page N (www.pngix.com/top/N) until the site returns an empty page
  • For every image found on a page, write image_page, image_source to dump.csv

Step Two:

  • For every image page in dump.csv, request that page and get data about size, resolution, license and downloads
  • Save all data in pngix_data_(date/time).csv
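
The two steps can be sketched roughly as follows. This is not the original code: the function names `crawl_top_pages` and `collect_details` are hypothetical, and fetching and HTML parsing are passed in as callables because the page structure of pngix.com is not described here. In the real script, those callables would presumably use the session cookie and BeautifulSoup.

```python
import csv
from typing import Callable, Dict, List, Tuple

Link = Tuple[str, str]  # (image_page, image_source)

# Column order of the final file, as listed in the Info section.
FIELDS = ["image_source", "resolution", "license", "size", "downloads"]


def crawl_top_pages(fetch_page: Callable[[int], str],
                    parse_links: Callable[[str], List[Link]],
                    dump_path: str = "dump.csv") -> int:
    """Step one: walk pages 1, 2, ... until a page yields no images.

    fetch_page and parse_links are assumptions standing in for the
    HTTP request and the HTML parsing of the original script.
    Returns the number of rows written to dump_path.
    """
    total = 0
    with open(dump_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        page = 1
        while True:
            links = parse_links(fetch_page(page))
            if not links:  # an empty page means the listing has ended
                break
            writer.writerows(links)
            total += len(links)
            page += 1
    return total


def collect_details(fetch_details: Callable[[str], Dict[str, str]],
                    dump_path: str,
                    out_path: str) -> int:
    """Step two: look up details for every image page in dump_path.

    fetch_details is assumed to return a dict with the keys
    resolution, license, size and downloads; any other key makes
    csv.DictWriter raise ValueError.
    Returns the number of data rows written to out_path.
    """
    count = 0
    with open(dump_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        writer.writeheader()
        for image_page, image_source in csv.reader(src):
            row = {"image_source": image_source}
            row.update(fetch_details(image_page))
            writer.writerow(row)
            count += 1
    return count
```

Writing the intermediate dump.csv first means step two can be restarted without repeating the crawl of the listing pages, which matters for a run that takes days.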

pngix_scraper's People

Contributors

xrzi
