imclab / tinyarchive

This project forked from archiveteam/tinyarchive


Software behind tracker.tinyarchive.org - Warning: Very hacky code

License: GNU General Public License v3.0


Introduction

The tinyarchive repository is a loose collection of scripts to help with backing up URL shorteners. Most scripts are written in Python.

Concepts

Tinyarchive database

The very core of the whole thing. It consists of multiple Berkeley DB B-Tree databases that contain mappings from short URL codes to long URLs. For each shortener there is one database. For example, the database bitly.db might contain the following mappings:
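As an illustration only, the one-database-per-shortener layout can be sketched as follows. The codes and URLs used with it are invented, the function names are hypothetical, and Python's standard `dbm.dumb` module stands in for Berkeley DB (it is a simple key-value store, not a B-Tree), so this shows the shape of the data rather than the repository's actual storage code.

```python
import dbm.dumb
import os


def store_mapping(db_dir, shortener, code, long_url):
    """Record code -> long_url in the database belonging to one shortener."""
    # dbm.dumb appends its own suffixes (.dat, .dir) to this base path.
    path = os.path.join(db_dir, shortener + ".db")
    with dbm.dumb.open(path, "c") as db:
        db[code.encode("utf-8")] = long_url.encode("utf-8")


def lookup(db_dir, shortener, code):
    """Return the long URL stored for code, or None if it is unknown."""
    path = os.path.join(db_dir, shortener + ".db")
    with dbm.dumb.open(path, "c") as db:
        value = db.get(code.encode("utf-8"))
    return None if value is None else value.decode("utf-8")
```

Keeping each shortener in its own file means the same short code can exist for several shorteners without colliding.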

Tracker

The tracker is a completely separate application that hands out tasks to tinyback instances.

trim-old

When tr.im shut down, part of its database was preserved. In 2013 tr.im was relaunched by Matthew Kelly, but all the old shortlinks were lost. With a little magic, it was possible to refill the new tr.im database with links from the old tr.im database. One such magic trick is trim-old.tinyarchive.org: since tr.im had trouble with some URLs (for whatever reason), the affected shortlinks do not link directly to the target URL. Instead, they redirect to trim-old.tinyarchive.org/$UUID, which in turn redirects to the real URL.

Scripts

Database scripts

create_release.py

Creates a new release from the database. By specifying the location of a previous release, the create_release.py script can check which files have not changed and avoid recompressing them, which would waste time and possibly change their hashsum. The code_to_file.json file is used to map from a shortener name and code to a specific output file.
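The reuse check can be sketched roughly as below. This is not the logic of create_release.py itself: the function name, the `.xz` suffix, and the choice to compare against the decompressed previous file are all assumptions made for the example.

```python
import lzma
import os
import shutil


def write_release_file(name, content, previous_dir, output_dir):
    """Write content as <name>.xz, reusing the previous release's file if unchanged.

    Returns True when the old compressed file was copied verbatim.
    """
    target = os.path.join(output_dir, name + ".xz")
    if previous_dir is not None:
        old = os.path.join(previous_dir, name + ".xz")
        if os.path.exists(old):
            # Decompressing the old file is cheaper than recompressing the new data.
            with lzma.open(old, "rb") as handle:
                unchanged = handle.read() == content
            if unchanged:
                # A byte-for-byte copy keeps the file's hash identical.
                shutil.copyfile(old, target)
                return True
    with lzma.open(target, "wb") as handle:
        handle.write(content)
    return False
```

Copying the old file instead of recompressing matters because a different compressor version or setting could produce different bytes for the same input, which would change the published hash.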

create_trim-old_db.py

Creates the sqlite3 database used by the trim-old website.
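A minimal sketch of what such a database might look like, assuming a single table keyed by UUID. The table name, column names, and helper functions are hypothetical and not taken from the script.

```python
import sqlite3
import uuid


def create_db(path):
    """Open (or create) the database and make sure the redirect table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS redirects ("
        "uuid TEXT PRIMARY KEY, url TEXT NOT NULL)"
    )
    return conn


def add_url(conn, url):
    """Store url under a fresh random UUID and return that UUID."""
    key = str(uuid.uuid4())
    conn.execute("INSERT INTO redirects (uuid, url) VALUES (?, ?)", (key, url))
    conn.commit()
    return key


def resolve(conn, key):
    """Return the URL for key, or None if the UUID is unknown."""
    row = conn.execute(
        "SELECT url FROM redirects WHERE uuid = ?", (key,)
    ).fetchone()
    return row[0] if row else None
```

The website would then answer a request for trim-old.tinyarchive.org/$UUID by calling something like `resolve` and redirecting to the result.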

import.py

Imports finished tasks from the tracker into the database.

import_tnyim.py

One-off script to import CSV dumps from the URL shortener at tny.im.

release_import.py

Opposite of create_release.py: Takes a release and imports it into the database, using the code_to_file.json file to map from input file to URL shortener name.

stats.py

Outputs a JSON structure containing a mapping from URL shortener name to the number of short URLs in the database.
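The output shape can be sketched as follows, with plain dictionaries standing in for the per-shortener databases. The function name and input format are assumptions made for the example.

```python
import json


def shortener_stats(databases):
    """Return a JSON object mapping each shortener name to its entry count.

    databases maps a shortener name to a mapping of short code -> long URL.
    """
    counts = {name: len(mappings) for name, mappings in databases.items()}
    return json.dumps(counts, indent=2, sort_keys=True)
```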

Tracker scripts

cleanup.py

Calls the tracker's cleanup admin function, which removes finished tasks and resets assignments for tasks assigned over 30 minutes ago.
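The effect of the cleanup function can be sketched on an in-memory task list. The tracker's real data model is not described here, so the field names (`status`, `assigned_at`) and status values are hypothetical.

```python
from datetime import datetime, timedelta

# Assignments older than this are considered abandoned.
ASSIGNMENT_TIMEOUT = timedelta(minutes=30)


def cleanup(tasks, now):
    """Drop finished tasks and release assignments older than the timeout."""
    remaining = []
    for task in tasks:
        if task["status"] == "finished":
            continue  # finished tasks are removed
        if (
            task["status"] == "assigned"
            and now - task["assigned_at"] > ASSIGNMENT_TIMEOUT
        ):
            # Make the task available again without mutating the input.
            task = dict(task, status="available", assigned_at=None)
        remaining.append(task)
    return remaining
```

Resetting stale assignments lets another tinyback instance pick up a task whose original worker disappeared.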

fetch_finished.py

Fetches a list of finished tasks from the tracker, then for each task first downloads the payload and then tells the tracker to mark the task as deleted. For each task, a JSON file with the task metadata and a corresponding txt.gz with the payload is stored in the output directory.
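The storage step can be sketched as below. The sketch assumes the task metadata is a dictionary with an `id` field used as the file name and that the payload arrives as uncompressed bytes; both are assumptions, and the network calls to the tracker are left out.

```python
import gzip
import json
import os


def store_task(output_dir, task, payload):
    """Write <id>.json with the metadata and <id>.txt.gz with the payload."""
    base = os.path.join(output_dir, str(task["id"]))
    json_path = base + ".json"
    payload_path = base + ".txt.gz"
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(task, handle)
    with gzip.open(payload_path, "wb") as handle:
        handle.write(payload)
    return json_path, payload_path
```

Writing both files before telling the tracker to delete the task means a crash in between leaves the task on the tracker rather than losing it.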

redo.py

Takes a JSON file containing task metadata and registers a new task with the same parameters at the tracker.

task_create.py

File with some helper functions to create new tasks at the tracker.

twitter_spritzer_import.py

Untested and unfinished tool to import the unrolled URLs from the Twitter spritzer provided by swebb.

Contributors

ersi, soult
