
jamesscottbrown / getpage


This project forked from elleryq/getpage


Get a single page from the web and store it, with all its resources, in a single MIME-HTML file.

License: GNU General Public License v3.0

Python 100.00%


getpage

This command-line program fetches a single web page from a server and stores it in a single file that can be viewed offline in a web browser, looking as close as possible to the original page.

The Goal

There is a Firefox extension named “Mozilla Archive Format” by Christopher Ottley which allows us to save a web page to disk as a single file and view it offline afterwards. It supports reading and writing two file formats:

  • MAF
  • MHT

MAF stands for Mozilla Archive Format; it is a little smaller because it is ZIP-compressed, but as far as I know it is only supported by Mozilla. The other format is MHT (short for MIME-HTML); besides the Mozilla extension, it is also supported by Microsoft's Internet Explorer, and it is very close to the standard MIME format used in e-mail.
getpage tries to mimic the output of this extension while being independent of the browser, since it is a command-line tool.
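Because MHT is essentially a MIME `multipart/related` message, Python's standard `email` package is enough to produce one. The sketch below is illustrative, not getpage's actual code: the function name `build_mht` and the `resources` mapping (absolute URL to MIME type and raw bytes) are hypothetical, and it assumes that the viewing browser matches each part to the page's links through its `Content-Location` header.

```python
from __future__ import annotations

from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mht(page_url: str, html: str, resources: dict[str, tuple[str, bytes]]) -> str:
    """Return an MHT document as text (hypothetical helper, not getpage's API).

    resources maps an absolute URL to (MIME type, raw bytes).
    """
    # multipart/related with type="text/html" marks the HTML as the root part.
    root = MIMEMultipart("related", type="text/html")

    # The HTML part comes first and is tagged with the page's own URL.
    html_part = MIMEText(html, "html", "utf-8")
    html_part["Content-Location"] = page_url
    root.attach(html_part)

    for url, (mime_type, data) in resources.items():
        maintype, _, subtype = mime_type.partition("/")
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)  # binary-safe, ASCII-only transfer encoding
        part["Content-Location"] = url  # lets links in the HTML resolve to this part
        root.attach(part)

    return root.as_string()
```

Base64-encoding the binary parts keeps the whole file plain ASCII, which is what makes the format so close to an e-mail message.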

The way it works

First, getpage fetches the HTML code of the given page and parses it, looking for four kinds of elements:

  • images
  • external CSS stylesheets
  • external javascript
  • iframes
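The parsing step can be sketched with the standard-library `html.parser`, assuming stylesheets are linked via `<link rel="stylesheet" href=...>` and images, scripts and iframes via `src`. The class name `ResourceCollector` is hypothetical and getpage's own parser may work differently; this sketch ignores inline `<style>` blocks and `srcset` attributes.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect (kind, absolute URL) pairs for images, CSS, JavaScript and iframes."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.resources: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # values may be None for bare boolean attributes
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "img" and attrs.get("src"):
            self._add("image", attrs["src"])
        elif tag == "link" and "stylesheet" in rel and attrs.get("href"):
            self._add("css", attrs["href"])
        elif tag == "script" and attrs.get("src"):
            self._add("javascript", attrs["src"])
        elif tag == "iframe" and attrs.get("src"):
            self._add("iframe", attrs["src"])

    def _add(self, kind: str, url: str) -> None:
        # Relative references are resolved against the page URL.
        self.resources.append((kind, urljoin(self.base_url, url)))
```

Feeding it `<img src="a.png">` for the page `http://example.com/dir/page.html` records `("image", "http://example.com/dir/a.png")`.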

All elements found are then fetched from the server as well, and if they reference other elements themselves (a stylesheet may reference images, for example), those are fetched too. All elements are then MIME-encoded and appended to the original HTML code.
The result is stored in a single *.mht file, which can be viewed offline with Firefox, Internet Explorer, and possibly other browsers.
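The recursive fetching can be sketched as a work queue with a `seen` mapping, so that stylesheets which reference each other do not cause a loop. This assumes a regular expression is good enough to find `url(...)` and `@import "..."` references in CSS (a real CSS parser would be more robust); `css_references`, `collect` and the injected `fetch(url) -> (mime_type, bytes)` callable are hypothetical names, and in practice `fetch` could wrap `urllib.request.urlopen`.

```python
from __future__ import annotations

import re
from typing import Callable
from urllib.parse import urljoin

# url(...) with optional matching single or double quotes.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)([^'")]+)\1\s*\)""", re.IGNORECASE)
# @import "file.css" (the @import url(...) form is caught by CSS_URL_RE).
CSS_IMPORT_RE = re.compile(r"""@import\s+(['"])([^'"]+)\1""", re.IGNORECASE)


def css_references(css_text: str, css_url: str) -> list[str]:
    """Return absolute URLs referenced by a stylesheet (regex-based sketch)."""
    found = []
    for pattern in (CSS_URL_RE, CSS_IMPORT_RE):
        for match in pattern.finditer(css_text):
            ref = match.group(2).strip()
            if ref.startswith("data:"):
                continue  # inline data URIs need no fetching
            found.append(urljoin(css_url, ref))  # relative to the CSS file
    return found


def collect(start_urls, fetch: Callable[[str], tuple[str, bytes]]) -> dict[str, tuple[str, bytes]]:
    """Fetch start_urls and, recursively, everything their stylesheets reference."""
    seen: dict[str, tuple[str, bytes]] = {}
    queue = list(start_urls)
    while queue:
        url = queue.pop()
        if url in seen:
            continue  # already fetched; avoids cycles and duplicates
        mime_type, data = fetch(url)
        seen[url] = (mime_type, data)
        if mime_type == "text/css":
            queue.extend(css_references(data.decode("utf-8", "replace"), url))
    return seen
```

The returned mapping holds one entry per fetched URL with its MIME type and bytes, which is the information needed to append each element as a MIME part.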

Usage

getpage [options] url

For example:

getpage --quiet --verbose http://remline.de
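The interface shown above could be declared with `argparse` roughly as follows. Only the `url` argument and the `--quiet` and `--verbose` flags come from the example; their help texts are guesses, and since command-line arguments are still on the to-do list, the real tool's options may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse `getpage [options] url` (hypothetical sketch of the documented usage)."""
    parser = argparse.ArgumentParser(
        prog="getpage",
        description="Save a web page and its resources in a single MIME-HTML (.mht) file.",
    )
    # Help texts are assumptions; the README does not define what these flags do.
    parser.add_argument("--quiet", action="store_true", help="suppress messages (assumed)")
    parser.add_argument("--verbose", action="store_true", help="print extra details (assumed)")
    parser.add_argument("url", help="address of the page to fetch")
    return parser.parse_args(argv)
```

Passing `argv` explicitly makes the parser easy to exercise without a real command line; with `argv=None`, `argparse` reads `sys.argv`.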

To Do

  • Program messages
  • Logging
  • Command-line arguments
  • Tests
  • Documentation

