Comments (18)
One simple idea is to create a Docker image/container that only handles the uploads (on the server side).
The file transfer technology is open to discussion, but basically it's a server that accepts file uploads and has to be able to do two things:
- Take uploads
- Check user/pass against the zimfarm API.
This could be quite easy to do with the FTP protocol, for example:
- using twisted https://twistedmatrix.com/documents/current/_downloads/ftpserver.py OR
- it might be even easier with pyftpdlib, see this example https://pyftpdlib.readthedocs.io/en/latest/tutorial.html
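The second requirement (checking user/pass against the zimfarm API) could look roughly like the sketch below. It is only an illustration: the endpoint URL, the JSON payload shape, and the "HTTP 200 means valid" rule are all assumptions, since nothing in this thread says how the API validates credentials. The function names are hypothetical too.

```python
import json
import urllib.error
import urllib.request

# Hypothetical endpoint: the thread does not specify the real zimfarm API route.
AUTH_URL = "https://zimfarm.example/api/auth/validate"


def build_auth_request(username, password, url=AUTH_URL):
    """Build a POST request carrying the credentials as JSON (assumed payload shape)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def credentials_are_valid(username, password, urlopen=urllib.request.urlopen, url=AUTH_URL):
    """Return True only if the API answers with HTTP 200 (assumed success signal).

    `urlopen` is injectable so the check can be exercised without a network.
    """
    request = build_auth_request(username, password, url)
    try:
        with urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.URLError:
        # Covers HTTP errors (401, 403, ...) and connection failures alike.
        return False
```

With pyftpdlib, a function like this would presumably be called from a custom authorizer's `validate_authentication` method; with twisted, from a credentials checker.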
from zimfarm.
The problem with FTP (same with rsync) is that the first time a worker uploads, a fingerprint verification prompt will appear.
from zimfarm.
@automactic FTP is user/pass based, nothing to do with any crypto proof or fingerprint. You just need the right user/pass.
from zimfarm.
Mmm... yes, you don't need to verify a key with FTP.
I don't know if WebDAV could be a good option.
from zimfarm.
@automactic I don't really care about the protocol as long as it works. WebDAV might work, but it's riskier than FTP and I'm not sure you will get such an easy solution as you have for FTP with twisted or pyftpdlib. In addition, there are far fewer WebDAV clients than FTP clients.
from zimfarm.
Yes, I will try to do this with pyftpdlib, it looks promising. Plain FTP does not encrypt the password, so anyone on the network path can see it. But if we use FTPS, I believe the credentials should also be encrypted.
from zimfarm.
@automactic good... I would recommend somehow using passwords with a short lifetime (maybe only a single use?). We can do that because the dispatcher distributes them and does the check.
from zimfarm.
It could also be a good idea to have the worker request a file transfer token from the dispatcher and send this token to the FTP server when uploading the file. Upon receiving the token, the FTP server verifies it with the dispatcher.
This way, the worker only needs to send its password once (to the dispatcher) during its lifetime. And users cannot log in to the FTP server with other tools (since users cannot see the token).
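A minimal sketch of the token idea, combined with the short-lifetime, single-use suggestion above. It assumes the dispatcher keeps tokens in memory. The `TokenRegistry` class, its method names, and the five-minute default lifetime are hypothetical and not taken from the zimfarm code.

```python
import secrets
import time


class TokenRegistry:
    """In-memory stand-in for the dispatcher's token bookkeeping (hypothetical)."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._tokens = {}    # token -> (username, expiry timestamp)

    def issue(self, username):
        """Called when a worker asks the dispatcher for a transfer token."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (username, self._clock() + self._ttl)
        return token

    def redeem(self, username, token):
        """Called when the FTP server asks the dispatcher to verify a token.

        pop() removes the token on the first lookup, so it is single-use
        whether or not the check succeeds.
        """
        entry = self._tokens.pop(token, None)
        if entry is None:
            return False
        owner, expires_at = entry
        return owner == username and self._clock() < expires_at
```

In this sketch the FTP server would pass the token it received in place of the password to `redeem`. A real deployment would expose `issue` and `redeem` through the dispatcher's API rather than as local calls.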
from zimfarm.
For some reason I cannot connect to the FTP server once it runs in the container, but it works fine outside the container. The error log is below:
INFO:pyftpdlib:196.52.2.24:59014-[] FTP session opened (connect)
DEBUG:pyftpdlib:196.52.2.24:59014-[] -> 220 Welcome to Zimfarm Warehouse.
DEBUG:pyftpdlib:196.52.2.24:59014-[] <- USER admin
DEBUG:pyftpdlib:196.52.2.24:59014-[] -> 331 Username ok, send password.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- PASS ******
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 230 Hi, there!
INFO:pyftpdlib:196.52.2.24:59014-[admin] USER 'admin' logged in.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- FEAT
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 211 End FEAT.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- OPTS UTF8 ON
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 501 Invalid argument.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- SYST
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 215 UNIX Type: L8
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- PWD
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 257 "/" is the current directory.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- CWD /
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 250 "/" is the current directory.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- TYPE A
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 200 Type set to: ASCII.
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] <- PASV
DEBUG:pyftpdlib:196.52.2.24:59014-[admin] -> 227 Entering passive mode (172,17,0,3,200,212).
DEBUG:pyftpdlib:[debug] call: close() (<FTPHandler(id=140131051023048, addr='196.52.2.24:59014', user='admin')>)
DEBUG:pyftpdlib:[debug] call: close() (<pyftpdlib.handlers.PassiveDTP listening 172.17.0.3:0 at 0x7f72cd171710>)
INFO:pyftpdlib:196.52.2.24:59014-[admin] FTP session closed (disconnect).
Any ideas?
from zimfarm.
@automactic I fully support #38 (comment). Regarding the transfer problem, you probably do not support FTP passive mode properly. Have a look at how passive FTP works (basically, you need to open other ports).
from zimfarm.
Here is an example fix: stilliard/docker-pure-ftpd@da4dee5
from zimfarm.
I took a further look into the issue and read more about passive FTP. It looks like after receiving PASV, the server returns 227 Entering passive mode (172,17,0,3,109,116).
The problem is that when the worker uploads files, the data transfer happens on 172.17.0.3:28020, which is internal to the Docker network. We need a way to have the FTP server return the correct address.
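To make the arithmetic explicit: a 227 reply carries six numbers. The first four are the IPv4 address, and the last two encode the port as p1 * 256 + p2. The sketch below decodes the reply quoted above and checks whether the address is private. The function names are made up for illustration.

```python
import ipaddress
import re

# Six comma-separated integers in parentheses: h1,h2,h3,h4,p1,p2
_PASV_RE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv_reply(reply):
    """Return (host, port) from a '227 Entering passive mode (...)' reply."""
    match = _PASV_RE.search(reply)
    if match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    host = f"{h1}.{h2}.{h3}.{h4}"
    port = p1 * 256 + p2  # high byte, then low byte
    return host, port


def is_private_address(host):
    """True if the client is being told to connect to a non-routable address."""
    return ipaddress.ip_address(host).is_private


host, port = parse_pasv_reply("227 Entering passive mode (172,17,0,3,109,116).")
print(host, port, is_private_address(host))  # 172.17.0.3 28020 True
```

A remote worker cannot reach 172.17.0.3, which is why the control connection succeeds but the data connection never opens.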
from zimfarm.
I need to set masquerade_address on the FTP server.
from zimfarm.
@automactic OK, so does it work ?
from zimfarm.
Yes it does, but when you run the FTP server you need to set the external IP in the environment.
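A rough sketch of reading that setting from the environment. The variable names `FTP_EXTERNAL_IP`, `FTP_PASSIVE_PORT_MIN` and `FTP_PASSIVE_PORT_MAX` and the default port range are assumptions, not the names used in the actual warehouse code. `masquerade_address` and `passive_ports` are the pyftpdlib `FTPHandler` attributes. The helper itself only uses `setattr`, so it runs without pyftpdlib installed.

```python
import os


def passive_settings_from_env(environ=os.environ):
    """Collect passive-mode settings from (hypothetically named) environment variables."""
    address = environ.get("FTP_EXTERNAL_IP")
    if not address:
        raise RuntimeError("FTP_EXTERNAL_IP must be set to the host's public address")
    low = int(environ.get("FTP_PASSIVE_PORT_MIN", "28000"))
    high = int(environ.get("FTP_PASSIVE_PORT_MAX", "28100"))
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid passive port range: {low}-{high}")
    return {
        "masquerade_address": address,           # address advertised in 227 replies
        "passive_ports": range(low, high + 1),   # ports used for data connections
    }


def apply_settings(handler_class, settings):
    """Set each setting as a class attribute, e.g. on pyftpdlib.handlers.FTPHandler."""
    for name, value in settings.items():
        setattr(handler_class, name, value)
    return handler_class
```

Restricting the passive port range matters here because those ports also have to be published from the container, for example with a range mapping such as `-p 28000-28100:28000-28100` on `docker run`.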
from zimfarm.
@ dispatcher/warehouse should be moved to /. Considering that the warehouse probably does not run on the same server, and that in any case it does not depend on the dispatcher, it should not be part of it IMO.
from zimfarm.
@automactic Could you also merge your code to master so we can close that ticket?
from zimfarm.
The plain FTP server approach discussed in this issue is already implemented.
from zimfarm.