Comments (9)

xt2357 commented on June 26, 2024

I have the same problem.

from wikiextractor.

attardi commented on June 26, 2024

I tested the extractor (version 2.39) on the Farsi dump you mention, and it works correctly on my Ubuntu machine.

The variable templatePrefix is a global variable that is assigned a value obtained from this field in the siteinfo xml element:
<namespace key="10" case="first-letter">الگو</namespace>
within function load_templates.
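
As a rough illustration of that lookup (not the extractor's own code), the sketch below pulls the name of namespace 10 out of a siteinfo fragment. The function name `template_prefix_from_siteinfo`, the sample XML, and the trailing colon appended to the prefix are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down siteinfo fragment in the shape of a dump header.
SITEINFO = """<siteinfo>
  <namespaces>
    <namespace key="0" case="first-letter" />
    <namespace key="10" case="first-letter">الگو</namespace>
  </namespaces>
</siteinfo>"""


def template_prefix_from_siteinfo(xml_text):
    """Return the localized template namespace name plus ':' (key 10), or None."""
    root = ET.fromstring(xml_text)
    for ns in root.iter("namespace"):
        if ns.get("key") == "10":
            # Assumption: the prefix is the namespace name followed by a colon.
            return (ns.text or "") + ":"
    return None


prefix = template_prefix_from_siteinfo(SITEINFO)
```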

xt2357 commented on June 26, 2024

I downloaded the source code of version 2.39 from this site, but the variable 'templatePrefix' is never defined at module scope: every occurrence of the identifier is inside a function. PyCharm also reports the warning "Global variable 'templatePrefix' is undefined at the module level".

attardi commented on June 26, 2024

It is declared as global wherever it is used.
That cannot be the cause of your problem; otherwise it would have failed within function load_templates, where it is first used.
Anyhow, you can try to add an assignment at the beginning of the file
templatePrefix = ''
and see if that fixes it.
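
A minimal sketch of the two points above, using stand-in names (`template_prefix`, `load_templates_stub`, `current_prefix`) rather than the extractor's real functions: a `global` declaration inside a function lets an assignment there create or rebind the module-level name, and a module-level default makes the name readable even before that function has run.

```python
# Module-level default, so the name exists before any function assigns it.
template_prefix = ''


def load_templates_stub(namespace_name):
    """Stand-in for load_templates: rebinds the module-level variable."""
    global template_prefix
    template_prefix = namespace_name + ':'


def current_prefix():
    """Reading a global needs no declaration; it only has to exist by call time."""
    return template_prefix


before = current_prefix()
load_templates_stub('Template')
after = current_prefix()
```

Without the module-level default, calling `current_prefix()` before `load_templates_stub()` would raise a NameError, which is the situation the IDE warning points at.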

xt2357 commented on June 26, 2024

I added the statement templatePrefix = '' at the beginning of the file, but then I hit a new exception:

INFO: Starting page extraction from zhwiki-20151002-pages-articles-multistream.xml.
INFO: Using 3 extract processes.
Process Process-1:
Traceback (most recent call last):
  File "C:\Anaconda\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Anaconda\lib\multiprocessing\process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "F:\wiki_dump\WikiExtractor.py", line 2431, in reduce_process
    output.write(ordering_buffer.pop(next_ordinal))
  File "F:\wiki_dump\WikiExtractor.py", line 2137, in write
    self.reserve(len(data))
  File "F:\wiki_dump\WikiExtractor.py", line 2132, in reserve
    if self.file.tell() + size > self.max_file_size:
ValueError: I/O operation on closed file

It seems the process terminated for some unexpected reason. (Does the message 'Process Process-1:' mean that the process terminated with a return value of -1?)
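
The ValueError on the last line of the traceback is what Python raises for any operation on a file object after it has been closed. A minimal sketch that triggers the same kind of error, using an in-memory `io.StringIO` buffer as a stand-in for the extractor's output file (an assumption for illustration only):

```python
import io

buf = io.StringIO()
buf.write(u"page text")
buf.close()

# Any further call on the closed object, such as tell(), raises ValueError.
try:
    buf.tell()
    error = None
except ValueError as exc:
    error = exc
```

So the traceback suggests that the output file was closed, or was never usable in that process, before `reserve` called `self.file.tell()`.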

attardi commented on June 26, 2024

You are using 3 processes, and process no. 1 raises the error.
You are running under Windows; the implementation of memory buffers might work differently there than on Linux.
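
On the 'Process Process-1:' line: that is the default name multiprocessing assigns to a worker, printed as a header before the traceback, not an exit status. A small sketch (the `worker` function is a placeholder, and the process is never started):

```python
import multiprocessing


def worker():
    """Placeholder target; it is not executed in this sketch."""
    pass


# Default names follow the pattern "Process-N", numbered in creation order.
proc = multiprocessing.Process(target=worker)
default_name = proc.name

# The exit status is a separate attribute and stays None until the process ends.
exit_status = proc.exitcode
```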

xt2357 commented on June 26, 2024

Thanks for your advice, I'll try it on Linux.

attardi commented on June 26, 2024

You might try using the pure-Python version of StringIO.
Change the import to this:

from StringIO import StringIO
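
The `StringIO` module only exists on Python 2. If the script may also be run under Python 3, one possible way to write the import is with a fallback to `io.StringIO`; this is a sketch, not the extractor's actual import block:

```python
try:
    from StringIO import StringIO  # Python 2: pure-Python implementation
except ImportError:
    from io import StringIO        # Python 3: the module was folded into io

# Either class offers the same basic in-memory buffer interface.
buf = StringIO()
buf.write(u"extracted text")
contents = buf.getvalue()
buf.close()
```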

xt2357 commented on June 26, 2024

Unfortunately, it doesn't work in my environment. Thank you anyway :)
