openzim / zimfarm
Farm operated by bots to grow and harvest new zim files
Home Page: https://farm.openzim.org
License: GNU General Public License v3.0
Add enable and disable schedule API
Currently, none of the resource APIs are protected.
I have added JWT token authentication in the worker-mwoffliner branch; we need to verify tokens in all resource APIs and throw exceptions if a token is missing or invalid.
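A minimal sketch of the verification step, assuming the tokens are HS256-signed JWTs like the one sent in the "token" header of the curl example further down. It uses only the standard library to show what is checked (algorithm, signature, expiry); in practice a maintained library such as PyJWT is preferable. The names verify_token, make_token and AuthError are hypothetical, and how the secret is loaded is left open.

```python
import base64
import hashlib
import hmac
import json
import time


class AuthError(Exception):
    """Raised when a request token is missing or invalid."""


def _b64url_encode(data: bytes) -> str:
    # JWT segments use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding stripped by the JWT encoding before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def make_token(payload: dict, secret: bytes) -> str:
    """Issue an HS256 token (only used here to exercise verify_token)."""
    header = {"typ": "JWT", "alg": "HS256"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part).encode("utf-8")) for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url_encode(signature)


def verify_token(token, secret: bytes, now=None) -> dict:
    """Return the token's claims, or raise AuthError if it is missing or invalid."""
    if not token:
        raise AuthError("token missing")
    parts = token.split(".")
    if len(parts) != 3:
        raise AuthError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    try:
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError as exc:  # bad base64, bad UTF-8 or bad JSON
        raise AuthError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(payload, dict):
        raise AuthError("malformed token")
    if header.get("alg") != "HS256":
        raise AuthError("unsupported algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison so timing does not leak signature bytes
    if not hmac.compare_digest(expected, signature):
        raise AuthError("invalid signature")
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise AuthError("token expired")
    return payload
```

In a Flask view, the check could be verify_token(request.headers.get("token"), SECRET) at the top of each resource handler (or in a shared decorator), with AuthError mapped to a 401 response.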
Generate the correct mwoffliner command when a flag value is an array
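A sketch of one way to build the command, assuming the desired behaviour for an array value is to repeat the flag once per element (rather than stringifying the Python list into a single argument). build_mwoffliner_command is a hypothetical helper name; whether each mwoffliner option accepts repetition should be checked against the mwoffliner documentation.

```python
def build_mwoffliner_command(config: dict) -> list:
    """Turn a task config into an argv list for mwoffliner (hypothetical helper)."""
    command = ["mwoffliner"]
    for flag, value in config.items():
        if isinstance(value, bool):
            # Boolean flags are passed bare when true and omitted when false
            if value:
                command.append(f"--{flag}")
        elif isinstance(value, (list, tuple)):
            # Array values: one "--flag=item" per element, instead of
            # "--flag=['a', 'b']" produced by formatting the list directly
            command.extend(f"--{flag}={item}" for item in value)
        else:
            command.append(f"--{flag}={value}")
    return command
```

Returning a list (rather than one string) lets the worker pass it straight to subprocess.run without any shell quoting issues.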
Currently, we use a SQLite database connection in dispatcher_backend, but I think sooner or later it needs to be 1) separated and 2) replaced with a NoSQL solution.
Every time the dispatcher_backend container is restarted (for example, because of an update), the SQLite file is removed, which means all task history is lost. If we move the database to its own container, we don't have to stop the database container.
We have a scraper for PhET simulations (https://phet.colorado.edu/). It is available here: https://hub.docker.com/r/openzim/phet/. Just calling npm start generates all the PhET ZIM files in one directory.
In the README, with examples.
Even if this will eventually run in only one place, it should not be hard-coded in the yml file but passed as an argument at (docker-compose) run time.
to prevent logging
$ curl -X POST -H "token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJkaXNwYXRjaGVyLWJhY2tlbmQiLCJleHAiOjE1MTEzNzg5MzEsImlhdCI6MTUxMTM3NzEzMSwianRpIjoiMzNmMWVhYmUtOGNjMC00NDUwLWE1MjYtYmEwMzkwY2M2N2YxIiwidXNlcm5hbWUiOiJhZG1pbiIsInNjb3BlIjp7ImFkbWluIjp0cnVlfX0.rS5r_U0wo7ro4N0_c_rKm6IUoPzzsxwiPX1mlq-Z6wc" --data "[{ "mwUrl": "https://bm.wikipedia.org/", "adminEmail": "[email protected]", "verbose": true }]" "https://farm.openzim.org/api/task/mwoffliner"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/app/routes/task.py", line 51, in enqueue_mwoffliner
    for task_config in task_configs:
TypeError: 'NoneType' object is not iterable
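The traceback shows request.get_json() returning None in enqueue_mwoffliner. The curl request uses --data without a "Content-Type: application/json" header (curl then sends application/x-www-form-urlencoded), and Flask's get_json() returns None when the mimetype is not JSON; the nested double quotes in the shell command also break the JSON body. A minimal sketch of a guard that turns this into a clear 400 error instead of a crash; parse_task_configs is a hypothetical name, and the existing check_task would still run on each item afterwards.

```python
def parse_task_configs(body):
    """Validate the parsed JSON body of POST /api/task/mwoffliner.

    `body` is whatever request.get_json(silent=True) returned: None when the
    request was not JSON. Raises ValueError with a message suitable for a
    400 response instead of failing later with a TypeError.
    """
    if body is None:
        raise ValueError("request body must be JSON (Content-Type: application/json)")
    if not isinstance(body, list):
        raise ValueError("request body must be a JSON array of task configs")
    for index, config in enumerate(body):
        if not isinstance(config, dict):
            raise ValueError(f"task config #{index} must be a JSON object")
    return body
```

In the view, this could look like: try task_configs = parse_task_configs(request.get_json(silent=True)), except ValueError as e: return Response(str(e), status=400). On the client side, the request also needs -H "Content-Type: application/json" and the JSON wrapped in single quotes.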
I think there will eventually be two types of workers: local and remote.
The way files are synced could be
Currently, we do not catch the HTTP exception raised when authentication fails, which results in a socket error.
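A sketch of catching the authentication failure explicitly, assuming the client uses urllib from the standard library (the actual client code is not shown here). fetch_token, AuthenticationFailed and the "username"/"password" header names are hypothetical; the point is that a 401/403 becomes a clear, catchable exception instead of surfacing later as a socket error.

```python
import urllib.error
import urllib.request


class AuthenticationFailed(Exception):
    """Raised when the dispatcher rejects the worker's credentials."""


def fetch_token(url, username, password, urlopen=urllib.request.urlopen):
    """POST credentials to a token endpoint and return the token text."""
    request = urllib.request.Request(
        url,
        method="POST",
        headers={"username": username, "password": password},  # assumed header names
    )
    try:
        with urlopen(request, timeout=10) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # Turn auth rejections into a dedicated exception; re-raise anything else
        if error.code in (401, 403):
            raise AuthenticationFailed(f"credentials rejected (HTTP {error.code})") from error
        raise
```

The urlopen parameter exists only so the call can be swapped out in tests.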
We should document the stable dispatcher backend APIs so that future contributors can easily refer to them.
Please refer to the API example for more.
See here for details https://github.com/openzim/zimfarm/network/dependencies
The task query API does not always return the correct result. I am not entirely sure why, but I think this is because the rpc result backend is restricted to one queue per client. In our case, we have 4 queues (4 uWSGI processes). If the uWSGI process that handles the status query is not the one that started the task, Celery cannot determine the task's status and therefore returns PENDING.
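Celery's documentation notes that rpc:// results are sent back only to the client that started the task, and that PENDING is also what an unknown task id reports, which fits the symptom above. One possible fix is a shared result backend that every uWSGI process can query. A sketch of a Celery config module, assuming Celery 4 or later (lowercase setting names; older versions used CELERY_RESULT_BACKEND), the redis Python package installed on the dispatcher, and the docker-compose service being reachable as "redis". The module name celeryconfig is hypothetical; it would be loaded with app.config_from_object("celeryconfig").

```python
# celeryconfig.py (hypothetical module name)

# Store task results in the shared Redis container instead of rpc://,
# so any uWSGI process can look up the state of any task.
result_backend = "redis://redis:6379/0"

# Report STARTED (instead of PENDING) once a worker has picked up the task.
task_track_started = True
```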
The difficulty here is that these options take the path of a file as their argument:
--articleList
--customZimFavicon
A solution is needed to transfer the file data from the API client, through the queue, to the final mwoffliner worker.
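One possible approach, assuming the files are small enough to travel inside the task message (an article list or a favicon usually is, but that is an assumption): embed the file content base64-encoded in the task options, then write it back to disk on the worker and substitute the local path before building the mwoffliner command. attach_file, materialize_files and the "filename"/"content" keys are hypothetical.

```python
import base64
import os


def attach_file(options: dict, flag: str, path: str) -> dict:
    """Client side: replace a file-path option with the file's base64 content."""
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return {**options, flag: {"filename": os.path.basename(path), "content": encoded}}


def materialize_files(options: dict, workdir: str) -> dict:
    """Worker side: write embedded files to workdir and swap in their local paths."""
    resolved = dict(options)
    for flag, value in options.items():
        if isinstance(value, dict) and "content" in value:
            # basename() blocks path traversal; the flag prefix avoids name clashes
            name = f"{flag}-{os.path.basename(value['filename'])}"
            local_path = os.path.join(workdir, name)
            with open(local_path, "wb") as handle:
                handle.write(base64.b64decode(value["content"]))
            resolved[flag] = local_path
    return resolved
```

Larger inputs would call for a different channel (for example an upload endpoint with the task carrying only a reference), since message brokers are not designed for big payloads.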
The GET /task API just returns a list of tasks in the order they were added to the database. This API should be more sophisticated; we should add features including:
At least for Redis; otherwise docker-compose might fail to start the containers because of port conflicts with the host system (for example, if the host already runs Redis).
https://www.speedguide.net/port.php can be used to identify high-ranged ports which are not associated to any application.
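Before picking a high port to map a service to (Redis defaults to 6379), a quick check on the host can confirm that nothing is already bound there. A minimal sketch; port_is_free and first_free_port are hypothetical helpers, and the result is only a snapshot (another process could take the port afterwards).

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if no socket on this host is currently bound to the TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # typically EADDRINUSE
            return False
        return True


def first_free_port(candidates, host: str = "127.0.0.1"):
    """Return the first candidate port that is free, or None if all are taken."""
    return next((port for port in candidates if port_is_free(port, host)), None)
```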
All dispatcher-related images should be removed from Docker Hub, as no one is using them or will use them.
Currently we use plain FTP to transfer files from worker to warehouse. This brings a security risk, as username and user token are transferred in plain text.
We also discussed using rsync. The problem is that it is not possible to enter the password programmatically.
Here I propose another approach, inspired by GitHub. (thanks @blajzer)
Mount ~/.ssh on the host to the same path inside the container on startup.
When a worker needs to upload a file:
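With the host's ~/.ssh mounted into the container, the upload can authenticate with a key instead of a password. A sketch that builds an scp command line, assuming a key file named id_rsa and hypothetical user/host/directory values; BatchMode=yes makes ssh fail instead of prompting when key authentication does not work. The same key also lets rsync run over ssh (rsync -e ssh) without a password prompt.

```python
import os


def build_upload_command(local_path, remote_user, remote_host, remote_dir,
                         key_path=None):
    """Build an scp command line that authenticates with an SSH key only."""
    if key_path is None:
        # Assumed key name, in the ~/.ssh directory mounted from the host
        key_path = os.path.expanduser("~/.ssh/id_rsa")
    return [
        "scp",
        "-i", key_path,           # identity (private key) file
        "-o", "BatchMode=yes",    # never prompt; fail if key auth does not work
        local_path,
        f"{remote_user}@{remote_host}:{remote_dir}",
    ]
```

The returned list can be passed to subprocess.run(command, check=True) so a failed upload raises instead of passing silently.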
We cannot assume that every worker will be able to build every possible ZIM file. One reason is that a worker might not have enough disk storage for an extra-large ZIM file (like Wikipedia in English with videos, or StackOverflow).
For this reason we need:
The dispatcher should then dispatch the jobs properly.
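A sketch of the matching step on the dispatcher side, assuming each worker declares its free disk space when it connects and each task carries a disk-size estimate. Both fields, and the Worker/Task/eligible_workers names, are hypothetical; other constraints (CPU, memory, bandwidth) could be added the same way.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    free_disk_gb: float        # declared by the worker on connect (assumed)


@dataclass
class Task:
    zim_name: str
    estimated_disk_gb: float   # estimate stored with the task (assumed)


def eligible_workers(task: Task, workers: list) -> list:
    """Workers with enough declared disk for the task, roomiest first."""
    fits = [w for w in workers if w.free_disk_gb >= task.estimated_disk_gb]
    return sorted(fits, key=lambda w: w.free_disk_gb, reverse=True)
```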
It would be nice if http://farm.openzim.org/ redirected to https://farm.openzim.org/. Your Nginx reverse proxy can do this easily; it is just a matter of configuration.
If the monitor boots up slightly earlier than RabbitMQ or the Flask server, there will be a lot of authentication failures. We should catch those exceptions, wait, and retry the connection.
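A minimal retry-with-backoff sketch (the Celery worker in the build log below does something similar for RabbitMQ: "Trying again in 2.00 seconds"). connect_with_retry is a hypothetical helper; connect stands for whatever the monitor uses to open its connection, and the exceptions its client raises on an authentication failure would be added to retry_on.

```python
import time


def connect_with_retry(connect, retry_on=(ConnectionError,), attempts=5,
                       initial_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait between attempts."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # give up and surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Capping the delay keeps the monitor responsive once the other services are up, and re-raising after the last attempt avoids waiting forever on a misconfiguration.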
Add user workflow:
Worker startup workflow:
I get a "502 Bad Gateway" for "http://localhost:8080/"
Here is the build log:
Successfully built 585bda2759cb
Creating zimfarm_redis_1
Creating zimfarm_dispatcher_frontend_1
Creating zimfarm_rabbit_1
Creating zimfarm_dispatcher_backend_1
Creating zimfarm_proxy_1
Creating zimfarm_worker_1
Attaching to zimfarm_dispatcher_frontend_1, zimfarm_redis_1, zimfarm_rabbit_1, zimfarm_dispatcher_backend_1, zimfarm_proxy_1, zimfarm_worker_1
redis_1 | WARNING: no logs are available with the 'none' log driver
rabbit_1 | WARNING: no logs are available with the 'none' log driver
dispatcher_frontend_1 | npm info it worked if it ends with ok
dispatcher_frontend_1 | npm info using [email protected]
dispatcher_frontend_1 | npm info using [email protected]
dispatcher_frontend_1 | npm info lifecycle [email protected]~prestart: [email protected]
dispatcher_frontend_1 |
dispatcher_frontend_1 | > [email protected] prestart /app
dispatcher_frontend_1 | > npm run build
dispatcher_frontend_1 |
dispatcher_frontend_1 | npm info it worked if it ends with ok
proxy_1 | WARNING: no logs are available with the 'none' log driver
dispatcher_backend_1 | * Running on http://0.0.0.0:80/ (Press CTRL+C to quit)
dispatcher_backend_1 | * Restarting with stat
dispatcher_frontend_1 | npm info using [email protected]
dispatcher_frontend_1 | npm info using [email protected]
dispatcher_frontend_1 | npm info lifecycle [email protected]~prebuild: [email protected]
dispatcher_frontend_1 | npm info lifecycle [email protected]~build: [email protected]
dispatcher_frontend_1 |
dispatcher_frontend_1 | > [email protected] build /app
dispatcher_frontend_1 | > tsc -p src/
dispatcher_frontend_1 |
worker_1 | Stopping redis-server: redis-server.
dispatcher_backend_1 | * Debugger is active!
dispatcher_backend_1 | * Debugger pin code: 958-657-090
worker_1 | Starting redis-server: redis-server.
worker_1 | /usr/local/lib/python3.5/dist-packages/celery/platforms.py:793: RuntimeWarning: You're running the worker with superuser privileges: this is
worker_1 | absolutely not recommended!
worker_1 |
worker_1 | Please specify a different user using the -u option.
worker_1 |
worker_1 | User information: uid=0 euid=0 gid=0 egid=0
worker_1 |
worker_1 | uid=uid, euid=euid, gid=gid, egid=egid,
worker_1 | [2017-06-18 09:41:50,741: ERROR/MainProcess] consumer: Cannot connect to amqp://admin:**@rabbit:5672//: [Errno 111] Connection refused.
worker_1 | Trying again in 2.00 seconds...
worker_1 |
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(43,40): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(2364,40): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(2366,46): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(2477,23): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(2478,17): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(2479,17): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3290,29): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3299,37): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3586,30): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3692,23): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3693,21): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3698,41): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3706,43): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3824,42): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3824,57): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3891,23): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3892,21): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3893,21): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3972,23): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3973,21): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(3974,21): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4003,41): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4003,56): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4012,23): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4024,23): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4025,21): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4026,21): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4038,23): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4039,21): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4040,21): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4151,25): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4151,46): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4151,48): error TS1139: Type parameter declaration expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4158,31): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4165,20): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4172,32): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4179,25): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4186,26): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4193,22): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4200,22): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4207,38): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4214,20): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4221,24): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4228,26): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4235,21): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4242,22): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4258,9): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4266,9): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4274,9): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4282,9): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4290,9): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4293,29): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4306,44): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4306,46): error TS1139: Type parameter declaration expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4307,9): error TS1136: Property assignment expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4307,14): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4307,37): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4307,68): error TS1109: Expression expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4307,77): error TS1005: ',' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4307,85): error TS1005: ';' expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4307,92): error TS1109: Expression expected.
dispatcher_frontend_1 | node_modules/@types/jquery/index.d.ts(4451,1): error TS1128: Declaration or statement expected.
dispatcher_frontend_1 |
dispatcher_frontend_1 | npm info lifecycle [email protected]~build: Failed to exec build script
dispatcher_frontend_1 | npm ERR! Linux 4.4.0-78-generic
dispatcher_frontend_1 | npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "run" "build"
dispatcher_frontend_1 | npm ERR! node v7.10.0
dispatcher_frontend_1 | npm ERR! npm v4.2.0
dispatcher_frontend_1 | npm ERR! code ELIFECYCLE
dispatcher_frontend_1 | npm ERR! errno 2
dispatcher_frontend_1 | npm ERR! [email protected] build: `tsc -p src/`
dispatcher_frontend_1 | npm ERR! Exit status 2
dispatcher_frontend_1 | npm ERR!
dispatcher_frontend_1 | npm ERR! Failed at the [email protected] build script 'tsc -p src/'.
dispatcher_frontend_1 | npm ERR! Make sure you have the latest version of node.js and npm installed.
dispatcher_frontend_1 | npm ERR! If you do, this is most likely a problem with the zimfarm package,
dispatcher_frontend_1 | npm ERR! not with npm itself.
dispatcher_frontend_1 | npm ERR! Tell the author that this fails on your system:
dispatcher_frontend_1 | npm ERR! tsc -p src/
dispatcher_frontend_1 | npm ERR! You can get information on how to open an issue for this project with:
dispatcher_frontend_1 | npm ERR! npm bugs zimfarm
dispatcher_frontend_1 | npm ERR! Or if that isn't available, you can get their info via:
dispatcher_frontend_1 | npm ERR! npm owner ls zimfarm
dispatcher_frontend_1 | npm ERR! There is likely additional logging output above.
dispatcher_frontend_1 |
dispatcher_frontend_1 | npm ERR! Please include the following file with any support request:
dispatcher_frontend_1 | npm ERR! /root/.npm/_logs/2017-06-18T09_41_51_757Z-debug.log
dispatcher_frontend_1 |
dispatcher_frontend_1 | npm info lifecycle [email protected]~prestart: Failed to exec prestart script
dispatcher_frontend_1 | npm ERR! Linux 4.4.0-78-generic
dispatcher_frontend_1 | npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "start"
dispatcher_frontend_1 | npm ERR! node v7.10.0
dispatcher_frontend_1 | npm ERR! npm v4.2.0
dispatcher_frontend_1 | npm ERR! code ELIFECYCLE
dispatcher_frontend_1 | npm ERR! errno 2
dispatcher_frontend_1 | npm ERR! [email protected] prestart: `npm run build`
dispatcher_frontend_1 | npm ERR! Exit status 2
dispatcher_frontend_1 | npm ERR!
dispatcher_frontend_1 | npm ERR! Failed at the [email protected] prestart script 'npm run build'.
dispatcher_frontend_1 | npm ERR! Make sure you have the latest version of node.js and npm installed.
dispatcher_frontend_1 | npm ERR! If you do, this is most likely a problem with the zimfarm package,
dispatcher_frontend_1 | npm ERR! not with npm itself.
dispatcher_frontend_1 | npm ERR! Tell the author that this fails on your system:
dispatcher_frontend_1 | npm ERR! npm run build
dispatcher_frontend_1 | npm ERR! You can get information on how to open an issue for this project with:
dispatcher_frontend_1 | npm ERR! npm bugs zimfarm
dispatcher_frontend_1 | npm ERR! Or if that isn't available, you can get their info via:
dispatcher_frontend_1 | npm ERR! npm owner ls zimfarm
dispatcher_frontend_1 | npm ERR! There is likely additional logging output above.
dispatcher_frontend_1 |
dispatcher_frontend_1 | npm ERR! Please include the following file with any support request:
dispatcher_frontend_1 | npm ERR! /root/.npm/_logs/2017-06-18T09_41_51_781Z-debug.log
zimfarm_dispatcher_frontend_1 exited with code 2
worker_1 | [2017-06-18 09:41:52,757: ERROR/MainProcess] consumer: Cannot connect to amqp://admin:**@rabbit:5672//: [Errno 111] Connection refused.
worker_1 | Trying again in 4.00 seconds...
worker_1 |
worker_1 | [2017-06-18 09:41:56,973: INFO/MainProcess] Connected to amqp://admin:**@rabbit:5672//
worker_1 | [2017-06-18 09:41:56,988: INFO/MainProcess] mingle: searching for neighbors
worker_1 | [2017-06-18 09:41:58,059: INFO/MainProcess] mingle: all alone
worker_1 | [2017-06-18 09:41:58,099: INFO/MainProcess] celery@f174068cccc8 ready.
Our current worker implementation runs a shell script (examples), which seems to execute mwoffliner multiple times and generate multiple zim files. As a result, every celery task can generate multiple zim files.
I think that, instead, it would be better to run mwoffliner once and generate one zim file per celery task. Reasons:
… mwoffliner in dispatcher and assemble the command programmatically on the worker.
stdout & stderr contain messages regarding only one zim file generation.
ZIM files from the warehouse will be checked (for integrity) and transferred to their final place to be available at http://download.kiwix.org. But, to do that properly, the system needs to know whether a file has been properly transferred, i.e. completed.
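One common way to check both integrity and completeness after a transfer is to compare a checksum computed on the worker with one recomputed in the warehouse. The sketch below assumes the worker reports a SHA-256 digest alongside each uploaded file; that handshake and the function names are hypothetical, not part of the current Zimfarm design.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large ZIM files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_transfer_complete(path: Path, expected_sha256: str) -> bool:
    """Hypothetical check: the file exists and matches the digest reported by the worker."""
    return path.is_file() and sha256_of(path) == expected_sha256.lower()
```

A file whose digest does not match (truncated upload, corruption in transit) is simply not moved to the download area.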
I'm learning about ZIM and related efforts, and came across this repo, but am having trouble wrapping my head around what ZIM farm is about. I realize that my difficulty is entirely because I don't have sufficient background knowledge in this particular effort (or ZIM in general, for that matter); my point in bringing this up here is that I think I may not be alone, and adding some additional explanations may help other people in the future.
The front README has the following brief explanation:
A farm operated by bots to grow and harvest new zim files. User can submit a new zim file generate task through the website and a registered worker will run the task and upload the file back to the dispatcher.
But to an outsider such as me, there is not enough context to understand this summary. For example, when it says "harvest new zim files", where are those files being harvested from? When it says "upload the file back to the dispatcher", what is the dispatcher? What is its role? (It's not even clear whether it's part of ZIM farm.) And finally, why would one want to harvest ZIM files in the first place?
A section on background or a longer introduction might clear up these questions for me and others.
In any case, thank you for your efforts on ZIM and the associated infrastructure.
items needed:
This is useless, as all the necessary Dockerfiles are already in the subdirectories (depending on their respective duties).
A minor suggestion – not urgent and not a complaint – but future contributors may benefit from the addition of a CONTRIBUTING.md file at the top level of the repo. The file could describe the process(es) for how people can contribute to the project. (Examples of contributors' guidelines are plentiful on GitHub; I'm familiar with EDGI's but there are other examples.)
Update the POST tasks/ API to be compatible with the new schedule object design.
Add tests for this API.
Task needs to be recovered / rerun in the following cases:
$ docker logs -f zimfarm_worker
[2019-02-02 14:27:24,214: INFO] Starting Zimfarm Worker...
[2019-02-02 14:27:24,621: INFO] ENV USERNAME -- kelson
[2019-02-02 14:27:24,623: INFO] ENV DISPATCHER_HOST -- farm.openzim.org
[2019-02-02 14:27:24,624: INFO] ENV RABBIT_PORT -- 5671
[2019-02-02 14:27:24,626: INFO] ENV WAREHOUSE_HOST -- warehouse.farm.openzim.org
[2019-02-02 14:27:24,627: INFO] ENV WAREHOUSE_PORT -- 1522
[2019-02-02 14:27:24,629: INFO] ENV WORKING_DIR -- /data/zimfarm/tmp
[2019-02-02 14:27:24,631: INFO] ENV NODE_NAME -- kelson_worker
[2019-02-02 14:27:24,632: INFO] ENV QUEUES -- small
[2019-02-02 14:27:26,111: ERROR] SFTP auth check failed -- please double check your username and private key.
[2019-02-02 14:46:04,366: INFO] Starting Zimfarm Worker...
[2019-02-02 14:46:04,774: INFO] ENV USERNAME -- kelson
[2019-02-02 14:46:04,775: INFO] ENV DISPATCHER_HOST -- farm.openzim.org
[2019-02-02 14:46:04,776: INFO] ENV RABBIT_PORT -- 5671
[2019-02-02 14:46:04,777: INFO] ENV WAREHOUSE_HOST -- warehouse.farm.openzim.org
[2019-02-02 14:46:04,777: INFO] ENV WAREHOUSE_PORT -- 1522
[2019-02-02 14:46:04,778: INFO] ENV WORKING_DIR -- /data/zimfarm/tmp
[2019-02-02 14:46:04,779: INFO] ENV NODE_NAME -- kelson_worker
[2019-02-02 14:46:04,780: INFO] ENV QUEUES -- small
[2019-02-02 14:46:05,334: INFO] SFTP auth check success.
/usr/local/lib/python3.6/site-packages/celery/platforms.py:796: RuntimeWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!
Please specify a different user using the --uid option.
User information: uid=0 euid=0 gid=0 egid=0
uid=uid, euid=euid, gid=gid, egid=egid,
[2019-02-02 14:46:06,434: INFO/MainProcess] Connected to amqps://kelson:**@farm.openzim.org:5671/zimfarm
[2019-02-02 14:46:06,743: INFO/MainProcess] mingle: searching for neighbors
[2019-02-02 14:46:08,360: INFO/MainProcess] mingle: all alone
[2019-02-02 14:46:08,994: INFO/MainProcess] kelson@kelson_worker ready.
[2019-02-02 14:46:09,000: INFO/MainProcess] Received task: offliner.mwoffliner[5c5129548c127b00217cb4af]
[2019-02-02 14:46:09,010: INFO/MainProcess] Received task: offliner.mwoffliner[5c5129548c127b00217cb4b2]
[2019-02-02 14:46:09,158: INFO/MainProcess] Received task: offliner.mwoffliner[5c5129558c127b00217cb4b5]
[2019-02-02 14:46:09,169: INFO/MainProcess] Received task: offliner.mwoffliner[5c5129558c127b00217cb4b8]
/usr/local/lib/python3.6/site-packages/celery/platforms.py:796: RuntimeWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!
Please specify a different user using the --uid option.
User information: uid=0 euid=0 gid=0 egid=0
uid=uid, euid=euid, gid=gid, egid=egid,
[2019-02-02 14:46:25,858: ERROR/ForkPoolWorker-2] offliner.mwoffliner[5c5129548c127b00217cb4af]: task failed
[2019-02-02 14:46:25,861: ERROR/ForkPoolWorker-2] Task offliner.mwoffliner[5c5129548c127b00217cb4af] raised unexpected: APIError(HTTPError('409 Client Error: Conflict for url: http+docker://localhost/v1.35/containers/create?name=zimfarm_redis',),)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 256, in _raise_for_status
response.raise_for_status()
File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: http+docker://localhost/v1.35/containers/create?name=zimfarm_redis
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/src/app/tasks/mwoffliner.py", line 41, in run
run_redis.execute()
File "/usr/src/app/operations/run_redis.py", line 31, in execute
self.docker.containers.run('redis', detach=True, name=self.container_name)
File "/usr/local/lib/python3.6/site-packages/docker/models/containers.py", line 785, in run
detach=detach, **kwargs)
File "/usr/local/lib/python3.6/site-packages/docker/models/containers.py", line 843, in create
resp = self.client.api.create_container(**create_kwargs)
File "/usr/local/lib/python3.6/site-packages/docker/api/container.py", line 427, in create_container
return self.create_container_from_config(config, name)
File "/usr/local/lib/python3.6/site-packages/docker/api/container.py", line 438, in create_container_from_config
return self._result(res, True)
File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 262, in _result
self._raise_for_status(response)
File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 258, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "/usr/local/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 409 Client Error: Conflict ("Conflict. The container name "/zimfarm_redis" is already in use by container "e85ce7f7fe3d000f4824879405268e63c846af7d47a325e4655f17659d22e926". You have to remove (or rename) that container to be able to reuse that name.")
[2019-02-02 14:46:25,935: INFO/MainProcess] Received task: offliner.mwoffliner[5c5129558c127b00217cb4bb]
[2019-02-02 14:46:27,667: ERROR/ForkPoolWorker-2] offliner.mwoffliner[5c5129558c127b00217cb4b5]: task failed
[2019-02-02 14:46:27,675: ERROR/ForkPoolWorker-2] Task offliner.mwoffliner[5c5129558c127b00217cb4b5] raised unexpected: APIError(HTTPError('409 Client Error: Conflict for url: http+docker://localhost/v1.35/containers/create?name=zimfarm_redis',),)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 256, in _raise_for_status
response.raise_for_status()
File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: http+docker://localhost/v1.35/containers/create?name=zimfarm_redis
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/src/app/tasks/mwoffliner.py", line 41, in run
run_redis.execute()
File "/usr/src/app/operations/run_redis.py", line 31, in execute
self.docker.containers.run('redis', detach=True, name=self.container_name)
File "/usr/local/lib/python3.6/site-packages/docker/models/containers.py", line 785, in run
detach=detach, **kwargs)
File "/usr/local/lib/python3.6/site-packages/docker/models/containers.py", line 843, in create
resp = self.client.api.create_container(**create_kwargs)
File "/usr/local/lib/python3.6/site-packages/docker/api/container.py", line 427, in create_container
return self.create_container_from_config(config, name)
File "/usr/local/lib/python3.6/site-packages/docker/api/container.py", line 438, in create_container_from_config
return self._result(res, True)
File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 262, in _result
self._raise_for_status(response)
File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 258, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "/usr/local/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 409 Client Error: Conflict ("Conflict. The container name "/zimfarm_redis" is already in use by container "e85ce7f7fe3d000f4824879405268e63c846af7d47a325e4655f17659d22e926". You have to remove (or rename) that container to be able to reuse that name.")
[2019-02-02 14:46:27,748: INFO/MainProcess] Received task: offliner.mwoffliner[5c5129558c127b00217cb4be]
/usr/local/lib/python3.6/site-packages/celery/platforms.py:796: RuntimeWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!
Please specify a different user using the --uid option.
User information: uid=0 euid=0 gid=0 egid=0
uid=uid, euid=euid, gid=gid, egid=egid,
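In the log above, several tasks received at the same moment each ask Docker to create a Redis container with the fixed name zimfarm_redis; whichever request comes second (or meets a container left over from an earlier run) fails with a 409 Conflict. One possible mitigation, sketched here as an assumption rather than the project's actual fix, is to derive the container name from the task id so concurrent tasks never collide. The helper names are hypothetical; only containers.run() is from the Docker SDK for Python.

```python
import re


def redis_container_name(task_id: str, prefix: str = "zimfarm_redis") -> str:
    """Build a per-task Redis container name, e.g. zimfarm_redis_<task id>.

    Docker container names may only contain [a-zA-Z0-9_.-], so any other
    character in the task id is replaced with an underscore.
    """
    safe_id = re.sub(r"[^a-zA-Z0-9_.-]", "_", task_id)
    return f"{prefix}_{safe_id}"


def redis_run_kwargs(task_id: str) -> dict:
    """Keyword arguments for docker-py's client.containers.run()."""
    return {"image": "redis", "detach": True, "name": redis_container_name(task_id)}
```

With the SDK this would be used as client.containers.run(**redis_run_kwargs(task_id)), where client = docker.from_env(). The mwoffliner container then has to be pointed at that per-task Redis, and the container should be removed (container.remove(force=True)) when the task ends, otherwise the same leftover problem reappears.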
Data are here https://world.openfoodfacts.org/data
As the workers are not meant to run on the same machine as the dispatcher, all generated ZIM files need to be gathered somewhere. That is why we need to be able to upload the files from a worker machine to a dedicated place.
This upload process should match the following requirements:
Different kinds of zim file generation require different processing power. Large zim files might be better generated on a more powerful machine. It would be a good idea to add support for queues.
When scheduling a task, the user can choose which queue the task goes to. Based on the file size, we might have the following queues: huge, large, medium, small, tiny, corresponding to 1000000+, 100000+, 10000+, 1000+, 100+ articles in the zim file.
When a worker starts, the user can choose to join one or more queues.
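The size-based routing proposed above can be sketched as a lookup from article count to queue name. The thresholds come straight from the proposal; what happens below 100 articles is not specified, so sending those files to "tiny" is an assumption.

```python
# Lower bounds (article counts) from the proposal, largest first.
QUEUE_THRESHOLDS = (
    ("huge", 1_000_000),
    ("large", 100_000),
    ("medium", 10_000),
    ("small", 1_000),
    ("tiny", 100),
)


def queue_for(article_count: int) -> str:
    """Return the queue name for a zim file with the given number of articles."""
    if article_count < 0:
        raise ValueError("article_count must be non-negative")
    for name, lower_bound in QUEUE_THRESHOLDS:
        if article_count >= lower_bound:
            return name
    # Not covered by the proposal: files under 100 articles go to "tiny" (assumption).
    return "tiny"
```

With Celery, the chosen name can be passed as the queue argument of apply_async(), and a worker joins one or more queues with the -Q option of celery worker (for example -Q small,tiny), which matches the QUEUES setting seen in the worker logs.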
For the purpose of fast development, I did not use try/catch in the API responses, so they are not protected against invalid input or exceptions like KeyError. To make the APIs resilient, all exceptions should be handled.
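A framework-agnostic sketch of the idea: wrap each handler so that malformed input (a missing key, a wrong type or value) becomes a 400 response and anything else becomes a logged 500, instead of an unhandled exception. The (body, status) return convention and the create_task handler are hypothetical; if the backend uses Flask, its app.errorhandler decorator achieves the same centrally.

```python
import functools
import logging
from typing import Any, Callable, Dict, Tuple

Response = Tuple[Dict[str, Any], int]

logger = logging.getLogger(__name__)


def handle_exceptions(handler: Callable[..., Response]) -> Callable[..., Response]:
    """Turn exceptions raised by an API handler into error responses."""

    @functools.wraps(handler)
    def wrapper(*args: Any, **kwargs: Any) -> Response:
        try:
            return handler(*args, **kwargs)
        except KeyError as exc:
            # A required field is missing from the request payload.
            return {"error": f"missing field: {exc.args[0]}"}, 400
        except (TypeError, ValueError) as exc:
            # A field is present but has the wrong type or value.
            return {"error": str(exc)}, 400
        except Exception:
            # Keep the traceback in the server log; do not leak internals to clients.
            logger.exception("unhandled error in %s", handler.__name__)
            return {"error": "internal server error"}, 500

    return wrapper


@handle_exceptions
def create_task(payload: Dict[str, Any]) -> Response:
    """Hypothetical handler: requires an mwUrl string in the payload."""
    mw_url = payload["mwUrl"]
    if not isinstance(mw_url, str):
        raise TypeError("mwUrl must be a string")
    return {"mwUrl": mw_url, "status": "queued"}, 201
```

Note that catching TypeError also hides genuine programming errors inside a handler as 400s; explicit validation of the payload before use is the stricter alternative.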
The reason is that otherwise the last step, renaming the .zim.tmp file to .zim, will fail. If this happens, an error should be generated in the zimfarm-worker and the job should be finished (and rescheduled in a month, if we have monthly scheduling).
From @Popolechien on November 4, 2018 9:20
I'm looking at http://library.kiwix.org/granbluefantasy_en_all_all_nopic_2018-10/ (the last release of Granblue fantasy wiki) and it is obvious that a bunch of things are broken, rendering the file unusable and a waste of data/download time for users.
It seems to be a rather recent addition to the library, so can we think of a simple confirmation/vetting process (a.k.a. quality control) before adding new zims?
Copied from original issue: openzim/mwoffliner#422
If, for some reason, an upload cannot be completed, the .tmp file will stay in the warehouse forever. This should be avoided. I would recommend simply adding a cron job that deletes all .tmp files older than two weeks.
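The proposed cron cleanup can be sketched as a small script run daily. It relies on the file modification time, so an upload that is still in progress (and therefore still being written to) is never considered stale. The function name and default age are illustrative.

```python
import time
from pathlib import Path
from typing import List, Optional

TWO_WEEKS = 14 * 24 * 60 * 60  # seconds


def purge_stale_tmp_files(directory: Path, max_age: float = TWO_WEEKS,
                          now: Optional[float] = None) -> List[Path]:
    """Delete *.tmp files under `directory` not modified for more than `max_age` seconds.

    Returns the deleted paths so the cron job can log them.
    """
    now = time.time() if now is None else now
    deleted = []
    # Materialize the listing first so files are not unlinked mid-iteration.
    for path in list(directory.rglob("*.tmp")):
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            deleted.append(path)
    return deleted
```

If no logging is needed, a plain crontab line with find <warehouse dir> -name '*.tmp' -mtime +14 -delete does the same job.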
Newer versions use foobar.zim as the temporary file to write the ZIM file. This is a regression compared to the previous version, and it no longer allows knowing whether a ZIM file is ready or not. foobar.zim should be named foobar.zim.tmp as long as its creation is not completely finished; then it should be renamed.
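The requested behaviour is the usual write-then-rename pattern. mwoffliner itself is not written in Python, so the sketch below only illustrates the idea: data go to foobar.zim.tmp, and the final name appears only once everything has been written, so any consumer can treat a file without the .tmp suffix as complete. The function name is hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable


def write_zim_atomically(final_path: Path, chunks: Iterable[bytes]) -> Path:
    """Write to <name>.zim.tmp and rename to <name>.zim only when writing has finished."""
    tmp_path = final_path.with_name(final_path.name + ".tmp")
    try:
        with tmp_path.open("wb") as handle:
            for chunk in chunks:
                handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())  # ensure the data reach the disk before the rename
        # os.replace is atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a half-written .tmp behind if generation fails.
        tmp_path.unlink(missing_ok=True)
        raise
    return final_path
```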
Regarding how workers should be designed, there are two approaches: generic and specialized. We need to decide on one to move forward.
Note: the word "task" in this issue refers to things like mwoffliner and maintenance.
The dispatcher sends the name of a script to the worker, and the worker downloads the script and executes it. Alternatively, the dispatcher directly sends the content of the script to the worker. In both situations, the worker trusts that the script it receives is legit and executes it.
Every worker is specialized, i.e., one type of worker can only handle zim file generation, and another type can only handle maintenance tasks in the dispatcher. Only parameters / settings are transferred from the dispatcher to the worker, and the worker then does the necessary work. In the case of mwoffliner, that means making sure a redis server is running, running the mwoffliner command with the parameters, and uploading the file.
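In the specialized approach, the worker has to turn the parameters received from the dispatcher into an mwoffliner command line. A minimal sketch, assuming parameters arrive as a flat dict keyed by mwoffliner option names (as in the mwUrl / adminEmail / verbose payload used with the task API). The value conventions are assumptions to check against mwoffliner --help, and array values are rejected because their expected encoding is not settled here.

```python
from typing import Any, Dict, List


def build_mwoffliner_command(flags: Dict[str, Any]) -> List[str]:
    """Turn dispatcher-provided settings into an argv list.

    Assumed conventions (hypothetical):
    - True          -> bare flag, e.g. --verbose
    - False / None  -> flag omitted
    - scalar value  -> --name=value
    """
    command = ["mwoffliner"]
    for name, value in flags.items():
        if value is True:
            command.append(f"--{name}")
        elif value is False or value is None:
            continue
        elif isinstance(value, (list, tuple, dict)):
            raise TypeError(f"--{name}: array/object values need an mwoffliner-specific encoding")
        else:
            command.append(f"--{name}={value}")
    return command
```

Passing the resulting list to subprocess.run(command, check=True) avoids shell quoting problems entirely, since no shell is involved and each argument reaches mwoffliner unchanged.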
Worker user should be explained in the doc:
I don't see that here https://hub.docker.com/r/openzim/zimfarm-worker
It might be a good idea to add a server-side event listener instead of having workers constantly send status updates to the server.
This is important for transparency and backup.