Comments (7)
#44 is trying to solve this issue. You can have queues like small, medium and large, and a worker can then choose to listen only to the small and medium queues.
However, I do not think it is possible to do this based on file size. Also, how would you know the file size before a ZIM file is generated?
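For illustration only, here is a minimal sketch of the queue idea, assuming plain in-memory queues. The `queues` dict, the `next_task` helper and the `some_large_recipe` name are hypothetical and are not zimfarm's actual API.

```python
from collections import deque

# Hypothetical in-memory queues, one per size category named in the comment.
queues = {"small": deque(), "medium": deque(), "large": deque()}


def next_task(subscribed):
    """Return the first pending task from the queues this worker listens to, or None."""
    for name in subscribed:
        if queues[name]:
            return queues[name].popleft()
    return None


# Illustrative recipe names only.
queues["large"].append("some_large_recipe")
queues["small"].append("wikipedia_ak_all")

# A worker that only tunes in to the small and medium queues never sees the large task.
task = next_task(["small", "medium"])
```

The large task stays queued until a worker subscribed to `large` asks for work.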
from zimfarm.
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
from zimfarm.
@rgaudin This was the first request about dealing with the different kinds of requirements of the scrapers. Multiple queues were a first answer, but we now deal with other problems (priorities). In my opinion, my first proposal, if implementable, would have avoided the kind of problem we have now.
from zimfarm.
OK, I understand now how we got there. It reinforces the need for run-time knowledge on both the server and client sides for task assignment, which I described in #202.
While we can add a way to provide that task-required size in the DB, we should know how we're gonna get that information. It can (and should, as most wikis grow) be overestimated, but if we're going to compare two numbers to decide whether a task can be run, we need them both.
How do you see us estimating those sizes?
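As a rough sketch of the "compare two numbers" check, assuming the task's required size and the worker's free disk space are both known in bytes. The `can_run` function and its `safety_margin` parameter are hypothetical names used here for illustration.

```python
def can_run(required_bytes, available_bytes, safety_margin=1.2):
    """Return True if the worker has enough disk for the (overestimated) task size.

    safety_margin is an assumed multiplier that pads the estimate,
    since most wikis grow between runs.
    """
    return required_bytes * safety_margin <= available_bytes


GIB = 1024 ** 3
fits = can_run(50 * GIB, 100 * GIB)      # 60 GiB needed with margin, 100 GiB free
too_big = can_run(90 * GIB, 100 * GIB)   # 108 GiB needed with margin, 100 GiB free
```

The check only works if both values exist, which is why the open question is where the required size comes from.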
from zimfarm.
> How do you see us estimating those sizes?

It would be a heuristic, manual or automated.
from zimfarm.
Sure, but that doesn't help much. We're gonna have to start somewhere with a basic algorithm and adjust over time, while allowing manual input for those we have more info on and where it matters (the largest ones).
- we have ~1,200 schedules
- all of them have a `small`, `medium` or `large` info (very approximate, will have to be refined over time)
- we don't inspect disk usage and so don't report it during our task life cycle. Should we?
- we don't have direct access to number of articles but could get that info, if of value. Is it?
- how does format and formats combination influence the disk usage?
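One possible starting point, sketched under stated assumptions: a per-category default estimate with an optional manual override for schedules we know more about. The byte values below are placeholders, not measurements, and `estimate_required_bytes` is a hypothetical helper.

```python
GIB = 1024 ** 3

# Placeholder defaults per category; real values would need to be
# refined over time from observed disk usage.
DEFAULT_ESTIMATES = {
    "small": 1 * GIB,
    "medium": 10 * GIB,
    "large": 100 * GIB,
}


def estimate_required_bytes(category, manual_override=None):
    """Return the manual value when one is set, else the category default."""
    if manual_override is not None:
        return manual_override
    return DEFAULT_ESTIMATES[category]


default_small = estimate_required_bytes("small")
tuned_large = estimate_required_bytes("large", manual_override=250 * GIB)
```

Article counts or format combinations could later replace the static table if they turn out to predict disk usage well.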
from zimfarm.