GithubHelp home page GithubHelp logo

hhy5277 / wikiteam Goto Github PK

View Code? Open in Web Editor NEW

This project forked from wikiteam/wikiteam

0.0 2.0 0.0 52.42 MB

Tools for downloading and preserving wikis

Home Page: https://github.com/WikiTeam

License: GNU General Public License v3.0

Shell 0.22% Python 92.96% Perl 3.38% TeX 3.44%

wikiteam's Introduction

WikiTeam

We archive wikis, from Wikipedia to tiniest wikis

WikiTeam software is a set of tools for archiving wikis. They work on MediaWiki wikis, but we want to expand to other wiki engines. As of January 2017, WikiTeam has preserved more than 27,000 stand-alone wikis, several wikifarms, regular Wikipedia dumps and 34 TB of Wikimedia Commons images.

There are thousands of wikis in the Internet. Every day some of them are no longer publicly available and, due to lack of backups, lost forever. Millions of people download tons of media files (movies, music, books, etc) from the Internet, serving as a kind of distributed backup. Wikis, most of them under free licenses, disappear from time to time because nobody grabbed a copy of them. That is a shame that we would like to solve.

WikiTeam is the Archive Team (GitHub) subcommittee on wikis. It was founded and originally developed by Emilio J. Rodríguez-Posada, a Wikipedia veteran editor and amateur archivist. Many people have helped by sending suggestions, reporting bugs, writing documentation, providing help in the mailing list and making wiki backups. Thanks to all, especially to: Federico Leva, Alex Buie, Scott Boyd, Hydriz, Platonides, Ian McEwen, Mike Dupont, balr0g and PiRSquared17.

Documentation Source code Download available backups Community Follow us on Twitter

Quick guide

This is a very quick guide for the most used features of WikiTeam tools. For further information, read the tutorial and the rest of the documentation. You can also ask in the mailing list.

Requirements

Confirm you satisfy the requirements:

pip install --upgrade -r requirements.txt

or, if you don't have enough permissions for the above,

pip install --user --upgrade -r requirements.txt

Download any wiki

To download any wiki, use one of the following options:

python dumpgenerator.py http://wiki.domain.org --xml --images (complete XML histories and images)

If the script can't find itself the API and/or index.php paths, then you can provide them:

python dumpgenerator.py --api=http://wiki.domain.org/w/api.php --xml --images

python dumpgenerator.py --api=http://wiki.domain.org/w/api.php --index=http://wiki.domain.org/w/index.php --xml --images

If you only want the XML histories, just use --xml. For only the images, just --images. For only the current version of every page, --xml --curonly.

You can resume an aborted download:

python dumpgenerator.py --api=http://wiki.domain.org/w/api.php --xml --images --resume --path=/path/to/incomplete-dump

See more options:

python dumpgenerator.py --help

Download Wikimedia dumps

To download Wikimedia XML dumps (Wikipedia, Wikibooks, Wikinews, etc) you can run:

python wikipediadownloader.py (download all projects)

See more options:

python wikipediadownloader.py --help

Download Wikimedia Commons images

There is a script for this, but we have uploaded the tarballs to Internet Archive, so it's more useful to reseed their torrents than to re-generate old ones with the script.

Developers

Build Status

You can run tests easily by using the tox command. It is probably already present in your operating system, you would need version 1.6. If it is not, you can download it from pypi with: pip install tox.

Example usage:

$ tox
py27 runtests: commands[0] | nosetests --nocapture --nologcapture
Checking http://wiki.annotation.jp/api.php
Trying to parse かずさアノテーション - ソーシャル・ゲノム・アノテーション.jpg from API
Retrieving image filenames
.    Found 266 images
.
-------------------------------------------
Ran 1 test in 2.253s

OK
_________________ summary _________________
  py27: commands succeeded
  congratulations :)
$

wikiteam's People

Contributors

ab2525 avatar alexia avatar balr0g avatar danieloaks avatar elifoster avatar emijrp avatar erkan-yilmaz avatar hashar avatar hydriz avatar makoshark avatar mirkosertic avatar mrshu avatar nemobis avatar pirsquared17 avatar plexish avatar scottdb avatar seanyeh avatar southparkfan avatar timsc avatar tyisi avatar vss-devel avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.