Comments (3)
Btw, this could be an elegant way to fix the fact that all dumps currently stored on S3 are still from April 2 and "corrupted".
from zimfarm.
So it's clear for everybody: the watcher didn't pick up the fixed version because we identify a release as a YYYY-MM string, given that updates are expected every quarter.
This string is computed from the Last-Modified header of the Sites.xml file and compared to the one we stored in S3.
An update within the same month cannot be detected this way.
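To illustrate why a same-month fix goes unnoticed, here is a minimal sketch of the YYYY-MM comparison described above. The function name `release_id` and both header values (including the year) are hypothetical; this is not the watcher's actual code.

```python
from email.utils import parsedate_to_datetime


def release_id(last_modified: str) -> str:
    """Reduce an HTTP Last-Modified value to a YYYY-MM release string."""
    return parsedate_to_datetime(last_modified).strftime("%Y-%m")


# Hypothetical header values: an initial dump and a fix published later the same month.
original = "Tue, 02 Apr 2024 10:00:00 GMT"
fixed = "Fri, 19 Apr 2024 08:30:00 GMT"

# Both values collapse to the same release string, so the fix looks like "no update".
same_release = release_id(original) == release_id(fixed)
```

Any two timestamps within the same calendar month compare equal here, which is exactly the blind spot.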
Here the dumps have been updated but Sites.xml has not been updated at all, so the current watcher behavior is no longer aligned with SE practices.
I suggest we:
- stop checking the version of "Sites.xml" (we do not care about it anyway)
- check the Last-Modified header of every individual archive
- store the whole Last-Modified header in S3 metadata (instead of the simplified version)
- use this whole Last-Modified header when deciding whether to download the archive again
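The per-archive check suggested above could look roughly like the sketch below. It assumes the full Last-Modified value of each archive has already been fetched and that the previously stored value has been read back from S3 metadata; `needs_download` and the sample values are hypothetical, and no S3 or HTTP calls are shown.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def needs_download(remote_last_modified: str, stored_last_modified: Optional[str]) -> bool:
    """Decide whether an archive must be fetched again.

    remote_last_modified: full Last-Modified header of the archive upstream.
    stored_last_modified: full header saved in S3 metadata, or None if never stored.
    """
    if stored_last_modified is None:
        return True  # nothing stored yet, so download
    # Compare full timestamps, not a YYYY-MM summary.
    return parsedate_to_datetime(remote_last_modified) != parsedate_to_datetime(
        stored_last_modified
    )


# Hypothetical values: the archive was republished later in the same month.
stored = "Tue, 02 Apr 2024 10:00:00 GMT"
remote = "Fri, 19 Apr 2024 08:30:00 GMT"

refetch = needs_download(remote, stored)    # timestamps differ
unchanged = needs_download(stored, stored)  # identical header
first_time = needs_download(remote, None)   # no metadata stored yet
```

Comparing parsed timestamps rather than raw strings is a design choice in this sketch; a plain string comparison of the stored header would work as well if the server's formatting is stable.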