Slides and video for my talk about Internet Archiving.
Nick Sweeting (Co-Founder @ https://monadical.com) @theSquashSH / @pirate
- PyCon Colombia 2020 @ Medellín: Video coming soon...
- PyGotham 2019 @ NYC: Video
- Our Networks 2019 @ Toronto: Video
- RC Never Graduate Week @ NYC: No video available.
-
2 min: Self intro
- name, company
- founded in Colombia
- poker -> consulting, fully remote in MTL and NYC now
-
5 min: What got me into internet archiving
- grew up with unreliable internet
- censored internet
- hostile environment for journalism and content
- discovered wget
- created pocket-archive-stream
-
5 min: Equifax story
- Equifax breach announced, Equifax launches its breach-response site
- cloned with pocket-archive-stream
- rehosted and forgot about it
- notified that Equifax had mistakenly posted links to my clone
- goes viral, 2 million hits
- only the 2nd mention of wget in New York Times history
-
5 min: Intro to internet archiving tooling
- wget is powerful
- wget has many options and tunables
- here are the ones I chose for ArchiveBox
- demo
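To make the "many options and tunables" point concrete, here is a minimal sketch that assembles a wget command for archiving a single page. Every flag below is a real GNU wget option, but this particular list is an illustrative assumption, not ArchiveBox's actual configuration; `build_wget_cmd` and `WGET_ARCHIVE_FLAGS` are hypothetical names. It only builds the argv list, so it runs without wget installed.

```python
import shlex
from pathlib import Path
from typing import List, Optional

# Illustrative flag set (assumption, NOT ArchiveBox's real defaults).
WGET_ARCHIVE_FLAGS = [
    "--no-verbose",                   # keep logs short
    "--page-requisites",              # also fetch images, CSS, JS needed to render the page
    "--convert-links",                # rewrite links to point at the local copies
    "--adjust-extension",             # save HTML/CSS with .html/.css suffixes
    "--span-hosts",                   # allow requisites served from other domains (CDNs)
    "--no-parent",                    # never climb above the start directory when recursing
    "--force-directories",            # mirror the host/path layout on disk
    "--restrict-file-names=windows",  # escape characters unsafe on some filesystems
    "--timeout=60",                   # give up on a stalled request after 60 s
    "-e", "robots=off",               # ignore robots.txt (archive responsibly)
]


def build_wget_cmd(url: str, out_dir: str, warc_name: Optional[str] = None) -> List[str]:
    """Return an argv list that archives `url` into `out_dir` (hypothetical helper)."""
    cmd = ["wget", *WGET_ARCHIVE_FLAGS, f"--directory-prefix={out_dir}"]
    if warc_name:
        # wget appends .warc.gz itself, so pass the bare name.
        cmd.append(f"--warc-file={Path(out_dir) / warc_name}")
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    cmd = build_wget_cmd("https://example.com", "archive", warc_name="example")
    print(shlex.join(cmd))
    # To actually run it (requires wget on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Writing a WARC alongside the browsable HTML is a cheap way to keep both a faithful record and a human-readable copy from one fetch.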
-
5 min: Intro to internet archiving ecosystem
- Why is preserving information important? Why does humanity create libraries and museums?
- How has it been done so far?
- What types of archives end up surviving?
- What are the benefits of decentralized vs centralized archives?
-
5 min: Why is internet archiving hard
- Dynamic and interactive content
- Private and paywalled content
- Content ID and discovery, Base32 is hard
- Dealing with the huge amount of data directly vs curating a smaller amount
- Archive format longevity tradeoffs (WARC vs html / pdf)
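One way to illustrate the content ID problem is content addressing: derive a page's ID from a hash of its bytes, then encode it as text. The sketch below uses SHA-256 and RFC 4648 Base32, lowercased with padding stripped; this scheme is an assumption chosen for illustration, and `content_id` / `decode_content_id` are hypothetical names. Part of why Base32 is hard shows up right here: there are several incompatible alphabets (RFC 4648, Crockford, z-base-32), and padding and letter case must be restored exactly before decoding.

```python
import base64
import hashlib


def content_id(data: bytes) -> str:
    """Derive a stable, URL-safe ID from the bytes of an archived page.

    SHA-256 yields 32 bytes; RFC 4648 Base32 encodes 5 bits per character,
    so the 256-bit digest becomes 52 characters once '=' padding is stripped.
    """
    digest = hashlib.sha256(data).digest()
    encoded = base64.b32encode(digest).decode("ascii")
    # Lowercase, no padding: case-insensitive and safe in URLs and filenames.
    return encoded.rstrip("=").lower()


def decode_content_id(cid: str) -> bytes:
    """Recover the raw digest; case and padding must be restored first."""
    padded = cid.upper() + "=" * (-len(cid) % 8)
    return base64.b32decode(padded)


if __name__ == "__main__":
    page = b"<html><body>Hello archive</body></html>"
    cid = content_id(page)
    print(cid, len(cid))
    assert decode_content_id(cid) == hashlib.sha256(page).digest()
```

Identical bytes always get the same ID, which helps deduplication, but two captures of the same URL that differ by one byte (a timestamp, an ad) get unrelated IDs, so discovery still needs a separate URL-to-ID index.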
-
5 min: Setting up a Wikipedia clone
- Set up a Kiwix server
- Download your collections
- Create an index and rehost it
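The Kiwix steps above can be sketched in Python, assuming the kiwix-tools package is installed and the ZIM files are already downloaded. `kiwix-manage LIBRARY add ZIM` registers a ZIM file in a library XML index, and `kiwix-serve --library --port=PORT LIBRARY` serves that library over HTTP; check both against `--help` for your installed version. The helper names, file names, and port are hypothetical.

```python
import shutil
import subprocess
from typing import List


def kiwix_commands(zim_files: List[str], library: str = "library.xml",
                   port: int = 8080) -> List[List[str]]:
    """Build argv lists: one `kiwix-manage add` per ZIM, then `kiwix-serve`."""
    cmds = [["kiwix-manage", library, "add", zim] for zim in zim_files]
    cmds.append(["kiwix-serve", "--library", f"--port={port}", library])
    return cmds


def serve(zim_files: List[str], library: str = "library.xml", port: int = 8080) -> None:
    """Index the given ZIM files and start serving them (blocks until stopped)."""
    missing = [t for t in ("kiwix-manage", "kiwix-serve") if shutil.which(t) is None]
    if missing:
        raise RuntimeError(f"install kiwix-tools first; missing: {missing}")
    *index_steps, server = kiwix_commands(zim_files, library, port)
    for cmd in index_steps:
        subprocess.run(cmd, check=True)
    subprocess.run(server, check=True)


if __name__ == "__main__":
    # Hypothetical file name; use whichever ZIM collections you downloaded.
    for cmd in kiwix_commands(["wikipedia_en_example.zim"]):
        print(" ".join(cmd))
```

Keeping a library XML rather than passing ZIM files directly to kiwix-serve makes it easy to add collections later without changing the serve command.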
-
1 min: What can you do today to help save the internet?
- Joining the ArchiveTeam task force & archive.org community
- Running a local internet archive
Old outline: https://docs.sweeting.me/s/internet-archiving-talk