
he-man.org-archive-thing

about

saw this post and decided to throw together a script to archive threads

tl;dr: this is already being handled by the Archive Team folks, so there's no need to rush to run this.

setup

  1. clone this repo
  2. install the requirements: python3 -m pip install -r requirements.txt

i might set this up as a pip-installable package, but for now this is it

usage

given this thread as an example: https://www.he-man.org/forums/boards/showthread.php?168773-United-Kingdom-Collector-s-Delivery-Thread
to save from page 17 to the end (which at the time of writing is page 328), do this¹:

python3 archive.py --thread-id "168773-United-Kingdom-Collector-s-Delivery-Thread" --start-page 17 --end-page 328

and it should start saving pages

saving page 17...
page 17 saved at https://web.archive.org/web/20231112092515/https://www.he-man.org/forums/boards/showthread.php?168773-United-Kingdom-Collector-s-Delivery-Thread%2Fpage17
saving page 18...
page 18 saved at https://web.archive.org/web/20231112092627/https://www.he-man.org/forums/boards/showthread.php?168773-United-Kingdom-Collector-s-Delivery-Thread%2Fpage18
saving page 19...
page 19 saved at https://web.archive.org/web/20231112092653/https://www.he-man.org/forums/boards/showthread.php?168773-United-Kingdom-Collector-s-Delivery-Thread%2Fpage19
saving page 20...
page 20 saved at https://web.archive.org/web/20231112092710/https://www.he-man.org/forums/boards/showthread.php?168773-United-Kingdom-Collector-s-Delivery-Thread%2Fpage20
saving page 21...
page 21 saved at https://web.archive.org/web/20231112092733/https://www.he-man.org/forums/boards/showthread.php?168773-United-Kingdom-Collector-s-Delivery-Thread%2Fpage21
... and so on ...
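judging only from the log above, each page url seems to be the thread's base url followed by /pageN, with --start-page and --end-page both inclusive (17 through 328). here's a minimal sketch of building that list of urls. it's inferred from the log output, not taken from archive.py, and page_url/page_urls are hypothetical names. note that in the wayback urls above the slash before pageN shows up percent-encoded as %2F; this sketch builds the plain-slash form, which is an assumption about what actually gets submitted.

```python
# Base forum url, copied from the example thread link above.
BASE = "https://www.he-man.org/forums/boards/showthread.php?"


def page_url(thread_id: str, page: int) -> str:
    """Hypothetical helper: url for one page of a thread.

    Assumes the '<thread id>/page<N>' pattern seen in the log output;
    Wayback displays the '/' as '%2F'.
    """
    return f"{BASE}{thread_id}/page{page}"


def page_urls(thread_id: str, start_page: int, end_page: int) -> list[str]:
    """Hypothetical helper: urls for start_page..end_page, both inclusive."""
    return [page_url(thread_id, n) for n in range(start_page, end_page + 1)]
```

for the example thread, page_urls("168773-United-Kingdom-Collector-s-Delivery-Thread", 17, 328) gives 312 urls, the first ending in /page17 and the last in /page328.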

i've only tested this script on python 3.11, but other relatively new 3.x versions should work too

when i run this i occasionally get errors, which seem to come from captures that got back a database error (example). those database errors don't happen consistently, so you can save the page again later, but with the current version of this script you'll need to look through the log to see which pages failed and then redo them manually.
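since the script doesn't track failures itself, a minimal sketch of pulling the failed pages out of a saved log could look like the code below. it assumes the log format shown in the usage section (a "saving page N..." line when a capture starts, then "page N saved at ..." on success) and treats any page that started but never printed "saved at" as failed. failed_pages and retry_commands are hypothetical helpers, not part of archive.py.

```python
import re

# Assumed log format, based on the sample output in the usage section:
#   "saving page N..."            -> capture started
#   "page N saved at <wayback>"   -> capture succeeded
# Any other line (e.g. an error message) is ignored.
STARTED = re.compile(r"^saving page (\d+)\.\.\.$")
SAVED = re.compile(r"^page (\d+) saved at (\S+)$")


def failed_pages(log_text: str) -> list[int]:
    """Pages that were started but never reported as saved, in first-seen order."""
    started: list[int] = []
    saved: set[int] = set()
    for line in log_text.splitlines():
        line = line.strip()
        if m := STARTED.match(line):
            started.append(int(m.group(1)))
        elif m := SAVED.match(line):
            saved.add(int(m.group(1)))
    # dict.fromkeys drops duplicate page numbers while keeping their order
    return [p for p in dict.fromkeys(started) if p not in saved]


def retry_commands(thread_id: str, pages: list[int]) -> list[str]:
    """One archive.py run per failed page, using only the documented flags.

    Assumes --start-page N --end-page N saves just page N (the end page
    appears to be inclusive in the usage example).
    """
    return [
        f'python3 archive.py --thread-id "{thread_id}" --start-page {p} --end-page {p}'
        for p in pages
    ]
```

to get a log file in the first place, something like python3 archive.py ... 2>&1 | tee archive.log should capture both stdout and stderr on a unix-ish shell; then read that file and pass its contents to failed_pages.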

authenticating

you can save more pages per minute if you authenticate with your archive.org account. pass the --authenticate flag to archive.py if you're doing this, and see this section of savepagenow's docs for where to put the credentials

Footnotes

  1. your python executable might not be python3, e.g. if you're on windows it might be py -3

Contributors

  • adrianmgg