avian2 / wikiprep
Wikipedia preprocessor and information extractor.
Home Page: http://wikiprep.sourceforge.net/
License: GNU General Public License v2.0
Wikiprep
==============

Zemanta fork

MediaWiki syntax preprocessor and information extractor

==============

NOTE: Code in this repository has been unmaintained for several years. It is
known to have problems with newer Wikipedia dumps and Perl versions.

I no longer have at hand the datasets and processing power necessary to run
Wikiprep. Hence I am not able to help with any problems.

-- Tomaz Solc

==============

Content
=======

1. Introduction
  1.1 Relation to the original Wikiprep
  1.2 Current status
2. Requirements
  2.1 Software
  2.2 Hardware
3. Usage
  3.1 Installation
  3.2 Running
  3.3 Parallel processing
  3.4 Parsing MediaWiki dumps other than English Wikipedia
4. Tools
5. License
6. Hacking

1. Introduction
===============

Wikiprep is a Perl script that parses MediaWiki data dumps in XML format and
extracts useful information from them (MediaWiki is the software that is best
known for running Wikipedia and other Wikimedia Foundation projects).

MediaWiki uses a markup language (wiki syntax) that is optimized for easy
editing by human beings. It contains a lot of quirks, special cases and odd
corners that help MediaWiki correctly display wiki pages, even when they
contain typing errors. This makes parsing this syntax with other software
highly non-trivial.

One goal of Wikiprep is to implement a parser that:

 o is compatible with MediaWiki as closely as possible, implementing as much
   functionality as is needed to achieve the other goals.
 o is as fast as possible, so that it can track the English Wikipedia
   dataset as closely as possible (MediaWiki's PHP code is slow).

The other goal is to use that parser to extract various information from the
dump that is suitable for further processing and is stored in files with
simple syntax.

1.1 Relation to the original Wikiprep

Wikiprep was initially developed by Evgeniy Gabrilovich to aid his research.
Tomaz Solc adapted his script for use in semantic information extraction from
Wikipedia as part of the Zemanta web service.
This version of Wikiprep has undergone extensive modification to be able to
extract information needed by Zemanta's engine.

1.2 Current status

Currently implemented MediaWiki functionality:

 o Templates:
   - named and positional parameters,
   - parameter defaults,
   - recursive inclusion to some degree (infinite recursion breaking is
     currently implemented in a way incompatible with MediaWiki) and
   - support for <noinclude>, <includeonly> and similar syntax.
 o Parser functions (currently #if, #ifeq, #language, #switch)
 o Magic words (currently urlencode, PAGENAME)
 o Redirects
 o Internal, external and interwiki links
 o Proper handling of <nowiki> and other pseudo-HTML tags
 o Disambiguation page recognition and special parsing
 o MediaWiki compatible date handling
 o Stub page recognition
 o Table and math syntax blocks are recognized and removed from the final
   output
 o Related article identification

2. Requirements
===============

2.1 Software

You need a recent version of Perl 5 compiled for a 64 bit architecture with
the following modules installed (names in parentheses are names of the
respective Debian packages):

  Parse::MediaWikiDump (libparse-mediawikidump-perl)
  Regexp::Common       (libregexp-common-perl)
  Inline::C            (libinline-perl)
  XML::Writer          (libxml-writer-perl)
  BerkeleyDB           (libberkeleydb-perl)
  Log::Handler         (liblog-handler-perl)

If you can't use Inline::C for some reason, run wikiprep with the "-pureperl"
flag and it will use the (roughly) equivalent pure Perl implementations.

If you want to process gzip or bzip2 compressed dumps you will need gzip and
bzip2 installed. Gzip is also required for the -compress option.

To run unit tests you will also need the xmllint utility (shipped with
libxml2, libxml2-utils).

2.2 Hardware

English Wikipedia is big and is getting bigger, so the requirements below
slowly grow over time.

As of January 2011, Wikiprep output takes approximately 22 GB of hard disk
space (with -compress, -format composite and default logging). Debug log
takes 20 GB or more.
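Taken together, the January 2011 figures above come to roughly 42 GB (22 GB
of output plus 20 GB of debug log). As a minimal sketch, free space can be
checked before starting a run. The helper name has_free_gb is hypothetical
and not part of Wikiprep, and the sketch assumes a POSIX df.

```shell
# Hypothetical helper, not part of Wikiprep: succeed if the file system
# holding directory $1 has at least $2 GB available.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -P prints one POSIX-format line per file system; -k reports sizes in
    # 1024-byte blocks, and column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}
```

For example, `has_free_gb . 42 || echo "less than 42 GB free here"` warns
when the current directory is too small for output and debug log together.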
Prescan phase requires a little over 4 GB of memory in a single Perl process
(hence the requirement for a 64 bit version of Perl).

In transform phase Wikiprep requires 100 to 200 MB per Perl process (but 4 GB
or more is recommended for decent performance due to OS caching of BDB
tables).

On a dual 6 core 2.6 GHz AMD Opteron with 32 GB of RAM and 12 parallel
processes it takes approximately 16 hours to process the English Wikipedia
dump from 15 January 2011.

3. Usage
========

3.1 Installation

Run the following commands from the top of the Wikiprep distribution:

  $ perl Makefile.PL
  $ make
  $ make test    (optional)

And as root:

  $ make install

3.2 Running

The most common command to start Wikiprep on an XML dump you downloaded from
Wikimedia:

  $ wikiprep -format composite -compress -f enwiki-20090306-pages-articles.xml.bz2

This will produce a number of files in the same directory as the dump.

Alternatively, run the following to get a list of other available options:

  $ wikiprep

To run regression tests included in the distribution, run:

  $ make test

3.3 Parallel processing

Wikiprep is capable of using multiple CPUs, however in order to do this you
must provide it with a split XML dump.

First split the XML dump using the splitwiki utility in the tools
subdirectory:

  $ bzcat enwiki-20090306-pages-articles.xml.bz2 | \
        splitwiki 4 enwiki-20090306-pages-articles.xml

Then run Wikiprep on the split dump using the -parallel option:

  $ perl wikiprep.pl -format composite -compress -parallel \
        -f enwiki-20090306-pages-articles.xml.0000

You only need to run Wikiprep once on a machine and it will automatically
split into as many parallel processes as there are parts of the dump
(identified by the .NNNN suffix).

Wikiprep processing is split into two sequential parts: "prescan", which is
done in one process, and "transform", which can be done in parallel. So you
will need to wait for prescan to finish before you will see multiple wikiprep
processes running on the system.
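Because one process is started per dump part, it can help to confirm how many
parts are present before starting a run. The following is a minimal sketch;
the helper name count_dump_parts is hypothetical and not part of Wikiprep. It
relies only on the four-digit .NNNN suffix described above.

```shell
# Hypothetical helper, not part of Wikiprep: print the number of split dump
# parts that share the base name $1 and end in a four-digit .NNNN suffix.
count_dump_parts() {
    base=$1
    count=0
    for part in "$base".[0-9][0-9][0-9][0-9]; do
        # When nothing matches, the unexpanded pattern is left in $part, so
        # only count entries that really exist.
        if [ -e "$part" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

After the splitwiki command shown earlier, running
`count_dump_parts enwiki-20090306-pages-articles.xml` in the same directory
should report the number of parts that were requested.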
With some scripting it is possible to distribute dump parts to multiple
machines and start Wikiprep separately on each machine. It's also possible to
run prescan only once on a single machine and then distribute the .db files
it creates to other machines where only the transform part is started.

3.4 Parsing MediaWiki dumps other than English Wikipedia

All language and site-specific settings are located in Perl modules under
/lib/Wikiprep/Config. The configuration used can be chosen via the -config
command line parameter (default is Enwiki.pm).

The Wikiprep distribution includes configurations for Wikipedia in various
languages. These were contributed by Wikiprep users and can be outdated or
incomplete. At the moment the only configuration that is thoroughly tested
before each release is the English Wikipedia config. Patches are always
welcome.

To add support for a new language or MediaWiki installation, copy Enwiki.pm
to a new file (the convention is to use the same name as is used in naming
Wikimedia dumps) and adjust the values within.

4. Tools
========

There are a couple of tools in the tools/ directory that aim to make your
life a bit easier. Their use should be pretty much obvious from the help
message they return if you run them without any command line arguments.

 o findtemplate.sh

   Find templates that support a named parameter. Good for searching for all
   templates that support for example the "isbn" parameter.

 o getpage.py

   Uses the Special:Export feature of Wikipedia to download a single article
   and all templates it depends on. It then constructs a file resembling a
   MediaWiki XML dump, containing only these pages.

   Good for making testing datasets or researching why a specific page failed
   to parse properly.

 o samplewiki

   Takes a complete MediaWiki XML dump, takes a random sample of pages from
   it and creates a new (smaller) dump.

 o splitwiki

   Splits a MediaWiki XML dump into N equal parts.
   For example:

   $ bzcat enwiki-20090306-pages-articles.xml.bz2 | \
         splitwiki 4 enwiki-20090306-pages-articles.xml

   Will produce 4 files of roughly equal length in the current directory:

   $ ls
   enwiki-20090306-pages-articles.xml.0000
   enwiki-20090306-pages-articles.xml.0001
   enwiki-20090306-pages-articles.xml.0002
   enwiki-20090306-pages-articles.xml.0003

 o riffle

   Takes a complete MediaWiki XML dump and some articles downloaded using the
   Special:Export feature and inserts these articles into the dump.

   Good for keeping an XML dump up-to-date without having to re-download the
   whole thing.

5. License
==========

Copyright (C) 2007 Evgeniy Gabrilovich ([email protected])
Copyright (C) 2009 Tomaz Solc ([email protected])

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 51
Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA, or see
<http://www.gnu.org/licenses/> and
<http://www.fsf.org/licensing/licenses/info/GPLv2.html>

Some of the example files are copied from English Wikipedia and are copyright
(C) by their respective authors. Text is available under the terms of the GNU
Free Documentation License.

6. Hacking
==========

The code is pretty well documented. Have a look inside first, then ask on the
mailing list.

If you need to customize Wikiprep for a specific language, copy
Wikiprep/Config/Enwiki.pm and change fields there. You can then use your new
config with the -language option.
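The copy step can be sketched as a small shell helper. The function name
new_wikiprep_config and the example name "Slwiki" are hypothetical and not
part of Wikiprep. The sketch only creates the file. Adjusting the values
inside and selecting the new configuration on the command line are left as
described in the text.

```shell
# Hypothetical helper, not part of Wikiprep: start a new configuration module
# named $2 (for example "Slwiki") in config directory $1 by copying Enwiki.pm.
new_wikiprep_config() {
    config_dir=$1
    name=$2
    target="$config_dir/$name.pm"
    # Never clobber a configuration that somebody has already edited.
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    cp "$config_dir/Enwiki.pm" "$target"
}
```

For example, `new_wikiprep_config lib/Wikiprep/Config Slwiki` run from the
top of the distribution would leave a copy in lib/Wikiprep/Config/Slwiki.pm
ready for editing.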
To run a test on just a particular test case, run for example:

  $ perl t/cases.t t/cases/infobox.xml
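To run a handful of test cases this way and see only which ones failed, a
small wrapper can loop over them. This is a minimal sketch: the function name
run_cases and the PERL environment variable are assumptions, not part of
Wikiprep, and it assumes t/cases.t exits non-zero when a check fails.

```shell
# Hypothetical helper, not part of Wikiprep: run t/cases.t once per test case
# given as an argument and print the cases whose run exited non-zero.
# The interpreter can be overridden through the PERL environment variable.
run_cases() {
    failed=0
    for case_file in "$@"; do
        if ! "${PERL:-perl}" t/cases.t "$case_file" >/dev/null 2>&1; then
            echo "FAILED: $case_file"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every case passed.
    [ "$failed" -eq 0 ]
}
```

For example, `run_cases t/cases/infobox.xml t/cases/dates.xml` from the top
of the distribution prints one line per failing case and returns non-zero if
any case failed.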
System info:
OS X 10.10.3
perl 5.22.0 installed with perlbrew
perl modules installed with cpanm:
Parse::MediaWikiDump (1.0.6)
Regexp::Common (2013031301)
Inline::C (0.76)
XML::Writer (0.625)
BerkeleyDB (0.55)
Log::Handler (0.87)
Output of make test:
PERL_DL_NONLAZY=1 "/Users/sharix/perl5/perlbrew/perls/perl-5.22.0/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/anchors.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/anchors.anchor_text: No such file or directory
# Failed test 'check t/cases/anchors.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/anchors.hgw.xml: No such file or directory
# Failed test 'check t/cases/anchors.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/anchorspace.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/anchorspace.anchor_text: No such file or directory
# Failed test 'check t/cases/anchorspace.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/apple.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/apple.gum.xml: No such file or directory
# Failed test 'check t/cases/apple.gum.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/asse.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/asse.anchor_text: No such file or directory
# Failed test 'check t/cases/asse.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/asse.gum.xml: No such file or directory
# Failed test 'check t/cases/asse.gum.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/barzilla.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/barzilla.anchor_text: No such file or directory
# Failed test 'check t/cases/barzilla.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/barzilla.gum.xml: No such file or directory
# Failed test 'check t/cases/barzilla.gum.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/carbon.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/carbon.hgw.xml: No such file or directory
# Failed test 'check t/cases/carbon.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/categories.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/categories.hgw.xml: No such file or directory
# Failed test 'check t/cases/categories.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/citebook.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/citeweb.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/citeweb.hgw.xml: No such file or directory
# Failed test 'check t/cases/citeweb.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/commons-theatre.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/commons-theatre.hgw.xml: No such file or directory
# Failed test 'check t/cases/commons-theatre.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/coord.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/coord.hgw.xml: No such file or directory
# Failed test 'check t/cases/coord.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/css.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/css.hgw.xml: No such file or directory
# Failed test 'check t/cases/css.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/css2.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/css2.hgw.xml: No such file or directory
# Failed test 'check t/cases/css2.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/dates.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/dates.anchor_text: No such file or directory
# Failed test 'check t/cases/dates.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/dates.hgw.xml: No such file or directory
# Failed test 'check t/cases/dates.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/dblredir.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/dblredir.anchor_text: No such file or directory
# Failed test 'check t/cases/dblredir.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/disambig.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/disambig.disambig: No such file or directory
# Failed test 'check t/cases/disambig.disambig'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/div.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/div.hgw.xml: No such file or directory
# Failed test 'check t/cases/div.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/enwiki-20080103-pages-articles.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/externalurls.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/externalurls.external_anchors: No such file or directory
# Failed test 'check t/cases/externalurls.external_anchors'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/externalurls.hgw.xml: No such file or directory
# Failed test 'check t/cases/externalurls.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/gallery.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/gallery.anchor_text: No such file or directory
# Failed test 'check t/cases/gallery.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/gallery.gum.xml: No such file or directory
# Failed test 'check t/cases/gallery.gum.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/geo.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/geo.hgw.xml: No such file or directory
# Failed test 'check t/cases/geo.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/headings.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/headings.hgw.xml: No such file or directory
# Failed test 'check t/cases/headings.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/hurt.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/hurt.anchor_text: No such file or directory
# Failed test 'check t/cases/hurt.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/hurt.hgw.xml: No such file or directory
# Failed test 'check t/cases/hurt.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/ifeq.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/ifeq.hgw.xml: No such file or directory
# Failed test 'check t/cases/ifeq.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/imagemap.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/imagemap.anchor_text: No such file or directory
# Failed test 'check t/cases/imagemap.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/imagemap.hgw.xml: No such file or directory
# Failed test 'check t/cases/imagemap.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/images.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/images.anchor_text: No such file or directory
# Failed test 'check t/cases/images.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/images.gum.xml: No such file or directory
# Failed test 'check t/cases/images.gum.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/infobox.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/intel.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/intel.hgw.xml: No such file or directory
# Failed test 'check t/cases/intel.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/intel.templates/1/1192206: No such file or directory
# Failed test 'check t/cases/intel.templates/1/1192206'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/interwiki-new.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/interwiki-new.gum.xml: No such file or directory
# Failed test 'check t/cases/interwiki-new.gum.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/interwiki.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/interwiki.anchor_text: No such file or directory
# Failed test 'check t/cases/interwiki.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/interwiki.hgw.xml: No such file or directory
# Failed test 'check t/cases/interwiki.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/kaon2.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/kaon2.hgw.xml: No such file or directory
# Failed test 'check t/cases/kaon2.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/languages.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/languages.hgw.xml: No such file or directory
# Failed test 'check t/cases/languages.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/mac.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/mac.gum.xml: No such file or directory
# Failed test 'check t/cases/mac.gum.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/magicwords.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/magicwords.hgw.xml: No such file or directory
# Failed test 'check t/cases/magicwords.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/mainarticle.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/mainarticle.hgw.xml: No such file or directory
# Failed test 'check t/cases/mainarticle.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/microsoft-new.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/microsoft-new.gum.xml: No such file or directory
# Failed test 'check t/cases/microsoft-new.gum.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/microsoft.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/microsoft.hgw.xml: No such file or directory
# Failed test 'check t/cases/microsoft.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/microsoft.templates/1/1192206: No such file or directory
# Failed test 'check t/cases/microsoft.templates/1/1192206'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/microsoft2.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/microsoft2.hgw.xml: No such file or directory
# Failed test 'check t/cases/microsoft2.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/microsoft2.templates/1/1192206: No such file or directory
# Failed test 'check t/cases/microsoft2.templates/1/1192206'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/missinganchors.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/nestedtables.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/nestedtables.hgw.xml: No such file or directory
# Failed test 'check t/cases/nestedtables.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/noinclude.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/noinclude.hgw.xml: No such file or directory
# Failed test 'check t/cases/noinclude.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/nonfreegamecover.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/nonfreegamecover.hgw.xml: No such file or directory
# Failed test 'check t/cases/nonfreegamecover.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/nowiki.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/nowiki.hgw.xml: No such file or directory
# Failed test 'check t/cases/nowiki.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/olympics.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/olympics.hgw.xml: No such file or directory
# Failed test 'check t/cases/olympics.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/order.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/otheruses.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/otheruses.gum.xml: No such file or directory
# Failed test 'check t/cases/otheruses.gum.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/parserfunctions.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/parserfunctions.hgw.xml: No such file or directory
# Failed test 'check t/cases/parserfunctions.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/redir.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/redir.redir.xml: No such file or directory
# Failed test 'check t/cases/redir.redir.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/related.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/related.related_links: No such file or directory
# Failed test 'check t/cases/related.related_links'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/riemann.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/riemann.hgw.xml: No such file or directory
# Failed test 'check t/cases/riemann.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/segfault.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/segfault.hgw.xml: No such file or directory
# Failed test 'check t/cases/segfault.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/stub.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/stub.gum.xml: No such file or directory
# Failed test 'check t/cases/stub.gum.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/templates.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/templates.anchor_text: No such file or directory
# Failed test 'check t/cases/templates.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/templates.hgw.xml: No such file or directory
# Failed test 'check t/cases/templates.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/tempredir.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/tempredir.anchor_text: No such file or directory
# Failed test 'check t/cases/tempredir.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/tempredir.hgw.xml: No such file or directory
# Failed test 'check t/cases/tempredir.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/tempredir.redir.xml: No such file or directory
# Failed test 'check t/cases/tempredir.redir.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/tibet.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/tibet.hgw.xml: No such file or directory
# Failed test 'check t/cases/tibet.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/unicode.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/unicode.anchor_text: No such file or directory
# Failed test 'check t/cases/unicode.anchor_text'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
diff: t/cases/unicode.gum.xml: No such file or directory
# Failed test 'check t/cases/unicode.gum.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/urlencode.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/urlencode.hgw.xml: No such file or directory
# Failed test 'check t/cases/urlencode.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/whitespace.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/whitespace.hgw.xml: No such file or directory
# Failed test 'check t/cases/whitespace.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/window.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/window.disambig: No such file or directory
# Failed test 'check t/cases/window.disambig'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
Use of the encoding pragma is deprecated at bin/wikiprep line 29.
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
syntax error at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/Disambig.pm line 9, near "->import qw/ extractWikiLinks /"
Compilation failed in require at bin/wikiprep line 50.
BEGIN failed--compilation aborted at bin/wikiprep line 50.
# Failed test 'run t/cases/xmlcomments.xml'
# at t/cases.t line 57.
# got: '255'
# expected: '0'
diff: t/cases/xmlcomments.hgw.xml: No such file or directory
# Failed test 'check t/cases/xmlcomments.hgw.xml'
# at t/cases.t line 69.
# got: '2'
# expected: '0'
# Looks like you planned 263 tests but ran 132.
# Looks like you failed 132 tests of 132 run.
t/cases.t ...........
Dubious, test returned 132 (wstat 33792, 0x8400)
Failed 263/263 subtests
t/css.t ............. ok
Use of the encoding pragma is deprecated at t/ctemplates.t line 3.
# Failed test at t/ctemplates.t line 20.
# got: '{1}'
# expected: ''
# Failed test at t/ctemplates.t line 20.
# got: '{b}'
# expected: ''
# Failed test at t/ctemplates.t line 20.
# got: '{b}'
# expected: '{'
# Failed test at t/ctemplates.t line 20.
# got: '{'
# expected: undef
# Failed test at t/ctemplates.t line 20.
# got: '{b}'
# expected: '{{'
# Failed test at t/ctemplates.t line 20.
# got: ''
# expected: undef
# Failed test at t/ctemplates.t line 20.
# got: '{b}'
# expected: '{{c'
# Failed test at t/ctemplates.t line 20.
# got: 'c'
# expected: undef
# Failed test at t/ctemplates.t line 20.
# got: '{b}'
# expected: '{{c}'
# Failed test at t/ctemplates.t line 20.
# got: 'c}'
# expected: undef
# Failed test at t/ctemplates.t line 20.
# got: '{b}'
# expected: ''
# Failed test at t/ctemplates.t line 20.
# got: ''
# expected: 'c'
# Failed test at t/ctemplates.t line 20.
# got: '{{c}}'
# expected: ''
# Failed test at t/ctemplates.t line 20.
# got: '{c}'
# expected: undef
# Failed test at t/ctemplates.t line 20.
# got: '{b}'
# expected: 'd'
# Failed test at t/ctemplates.t line 20.
# got: 'd'
# expected: 'c'
# Failed test at t/ctemplates.t line 20.
# got: '{{c}}'
# expected: 'e'
# Failed test at t/ctemplates.t line 20.
# got: '{c}'
# expected: undef
# Failed test at t/ctemplates.t line 20.
# got: 'e'
# expected: undef
# Failed test at t/ctemplates.t line 20.
# got: '{{b}}'
# expected: ''
# Failed test at t/ctemplates.t line 20.
# got: '{ {{ }} }'
# expected: ''
# Failed test at t/ctemplates.t line 20.
# got: '{ }'
# expected: ' }}'
# Failed test at t/ctemplates.t line 20.
# got: ' '
# expected: undef
# Failed test at t/ctemplates.t line 20.
Wide character in print at /Users/sharix/perl5/perlbrew/perls/perl-5.22.0/lib/5.22.0/Test/Builder.pm line 1826.
# got: '{ž}'
# expected: ''
# Looks like you planned 123 tests but ran 137.
# Looks like you failed 24 tests of 137 run.
t/ctemplates.t ......
Dubious, test returned 24 (wstat 6144, 0x1800)
Failed 24/123 subtests
t/images.t .......... ok
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
t/languages.t ....... ok
t/namespace.t ....... ok
t/nowiki.t .......... ok
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
t/parserfunction.t .. ok
t/revision.t ........ ok
Use of the encoding pragma is deprecated at /Volumes/Macintosh HD/Users/sharix/Documents/wikidump/wikiprep/blib/lib/Wikiprep/languages.pm line 7.
t/templates.t ....... ok
t/utils.t ........... ok
Test Summary Report
-------------------
t/cases.t (Wstat: 33792 Tests: 132 Failed: 132)
Failed tests: 1-132
Non-zero exit status: 132
Parse errors: Bad plan. You planned 263 tests but ran 132.
t/ctemplates.t (Wstat: 6144 Tests: 137 Failed: 38)
Failed tests: 7, 20, 26-27, 33-34, 40-41, 47-48, 55-58
66-70, 75, 80, 89-90, 99, 124-137
Non-zero exit status: 24
Parse errors: Bad plan. You planned 123 tests but ran 137.
Files=11, Tests=383, 11 wallclock secs ( 0.08 usr 0.03 sys + 9.08 cusr 1.18 csys = 10.37 CPU)
Result: FAIL
Failed 2/11 test programs. 170/383 subtests failed.
make: *** [test_dynamic] Error 255
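A note on the repeated syntax error near "->import qw/ extractWikiLinks /": the log shows perl-5.22.0, and since Perl 5.18 a bare qw/.../ list can no longer stand in for a parenthesized argument list, so a call written that way most likely needs explicit parentheses. The sketch below is only an illustration in Python of how such call sites could be rewritten. The helper name `parenthesize_qw_calls` and the module name `Some::Module` are hypothetical, it handles only the slash-delimited form quoted in the error message, and it has not been run against the actual Disambig.pm.

```python
import re

# Matches a method call whose argument list is a bare qw/.../ without
# parentheses, e.g.  Some::Module->import qw/ extractWikiLinks /
# Assumption: only the slash-delimited qw form seen in the log is handled.
BARE_QW_CALL = re.compile(r"(->\s*\w+)\s+(qw/[^/]*/)")


def parenthesize_qw_calls(perl_source: str) -> str:
    """Wrap bare qw/.../ method arguments in parentheses (hypothetical helper)."""
    return BARE_QW_CALL.sub(r"\1(\2)", perl_source)


# Hypothetical line modelled on the fragment quoted in the error message;
# the module name is a placeholder, not taken from Disambig.pm.
before = "Some::Module->import qw/ extractWikiLinks /;"
after = parenthesize_qw_calls(before)
print(after)
```

Calls that already use parentheses are left unchanged, so the rewrite can be applied more than once. This would only address the compilation failure behind the t/cases.t errors; the t/ctemplates.t failures look unrelated.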
Hi,
I am trying to process the Wikipedia dump (February 2017), and while using threads to speed up the process I got this error:
Mar 22 10:43:12 [ERROR] One of the workers died! Results will be incomplete!
I've tried to process the dump many times, and each time one of the threads dies. I tried with 8 threads and then with 6 threads, and the same error occurs. It usually happens to only 1 thread.
Also, the final percentage of processed content (of the whole Wikipedia dump) is ~86%.
Hello,
I wonder why your release doesn't include Wikipedia category hierarchies.
Would it be complicated to add them?
Thank you
Hi,
I've used wikiprep on a 2012 dump of Wikipedia (8 GB) on my Ubuntu Linux machine with 4 GB of RAM. I didn't split the dump file, and after the prescan phase the wikiprep process crashed with an "Out of memory!" error. I know that the prescan phase was completed because my log file ends with these lines:
Jul 01 08:31:52 [NOTICE] total 12300930 pages (30654051304 bytes)
Jul 01 08:32:00 [NOTICE] Loaded 5992645 titles
Jul 01 08:32:00 [NOTICE] Loaded 5471056 redirects
Jul 01 08:32:01 [NOTICE] Loaded 346913 templates
Is it possible to continue the process from the transform phase?
I also tested the -transform option, and that resulted in this error: Can't call method "filter_fetch_value" on an undefined value at /usr/local/bin/wikiprep line 483
I am using version Wikiprep-3.04.