Comments (33)
You are right, testing is a must! I have zero experience with GitHub Actions runners, but if there is a corresponding ssh ipcluster test for Linux, I'm 100% sure the same thing is achievable for Windows. Maybe the existing test could simply be extended with a Windows instance...
By the way, I'm also happy to maintain this feature since I will use it quite often in the future…
from ipyparallel.
The ssh key is extracted from the running container with docker cp: see here.
That would be great, thanks for offering! Without any ability to test it myself, it's likely to end up unmaintained unless you or someone else with access to Windows + SSH volunteers to maintain it, because I can't do it.
Do you think you can install Windows SSH on the GitHub actions runner?
We have Windows tests already (the runs-on field picks the OS), but I imagine there may be some additional installation or configuration necessary to make OpenSSH available.
So you can probably start with a PR that duplicates the ssh test run to also run on Windows, then you can try figuring out what needs to be set up. Currently, the SSH tests use docker-compose to launch the cluster, which I guess won't work for testing Windows.
I will try my best, but as mentioned before, I'm new to GitHub Actions runners. Let's see how far I get...
Meanwhile we should start the discussion on how to technically support different OSes and shells. (It might be that ipcluster currently doesn't fully work on exotic Linux shells. Furthermore, under Windows you can configure OpenSSH to use cmd.exe or powershell.exe, which have totally different syntax.) As mentioned in my first post, the differences can be addressed a) on the 'client' side via Python scripts, or b) on the 'server' side by determining the client OS/shell.
ad a)
pros:
+ unified approach, less OS-dependent code
cons:
- it might be difficult/impossible to find a unified way of correctly single- and/or double-quoting parameters for all OSes
- requires a correctly configured/installed Python from the very beginning, which could make troubleshooting difficult
ad b)
pros:
+ free of prerequisites
+ can support any OS and shell (correctly quoting parameters is possible)
cons:
- more OS-dependent code required
- storage of the OS and shell type is required; adaptation of internal data structures required
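To illustrate the quoting problem behind option a): Python's standard library already ships both major quoting conventions, and they are incompatible, which is exactly the difficulty. A minimal sketch (the comments describe the conventions, not ipyparallel code):

```python
import shlex
import subprocess

arg = "a b"

# POSIX shells (bash and friends): single-quote convention
posix = shlex.quote(arg)

# Windows: list2cmdline implements the MSVCRT double-quote/backslash
# convention understood by most Windows programs, which still differs
# from how cmd.exe metacharacters or PowerShell parsing treat a string
windows = subprocess.list2cmdline([arg])

print(posix)    # 'a b'
print(windows)  # "a b"
```

So a single "unified" quoting routine has to know which shell will parse the string, which is why option a) is harder than it looks.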
Is it necessary to have a broader discussion on this topic?
If we want to support Windows, the way to go is probably to implement as much as possible in Python (perhaps move all this to ipyparallel.cluster.ssh), such that the only command-line operations to execute are really /path/to/python.exe -m ipyparallel.cluster.ssh [operation]. That should limit the scope of string quoting etc. we need to deal with.
Sounds good to me and I will go in that direction.
Meanwhile I have been working hard to get the Windows SSH cluster to run under Docker/GitHub (similar to the Linux version, also with user ciuser). It took me some days, but I think I'm there now. I do the testing in my forked repository (if you want to have a look...).
I have to admit that Windows tries its best to NOT let you run its OpenSSH server under Docker. Luckily I was able to get around all the issues...
However, there is one thing in the Linux ssh cluster I do not understand: the ssh key pair is generated in the docker container, but I'm not able to pinpoint the location where the private key is copied from the docker container to the main runner VM (where it is then used by ipcluster_config.py - https://github.com/ipython/ipyparallel/blob/main/ci/ssh/ipcluster_config.py). Can you give me a hint?
I have a solution for Windows, but I want to stick as close as possible to the Linux solution...
Thanks very much for the hint!
As mentioned in the code, this should be moved to the corresponding Dockerfile or 'next' to the docker compose statement. I suggest the latter, since we need something similar for Windows. Docker in 'Windows mode' doesn't support 'docker cp' at all (one really annoying thing...). So I manually copy the ipyparallel files to the docker container using scp (the other option would be using git inside the docker container itself, but installing git under the currently used Windows image (no .NET framework) is not so straightforward).
My point is that we require a slightly different ssh section in test.yml for Windows and Linux anyway, so we could probably remove all docker commands from conftest.py. By the way, what's the point of ssh_dir? It returns the local ci/ssh directory but also checks whether the docker container is running. Two totally different things in one function? A bit confusing...
I suggest splitting that up into two functions, or maybe moving the actual ssh_dir code directly into ssh_key (I haven't found any other reference to ssh_dir).
Since my ssh docker test under windows is now fully working, I will proceed with the actual implementation of ipyparallel.cluster.ssh...
Feel free to reorganize it in a way that makes sense. The main goal is that the ssh tests should be skipped if the ssh context can't be set up. Originally, I wanted the fixture to call docker-compose build/up/etc., but that wasn't working for whatever reason.
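One way to express the "skip if the ssh context can't be set up" goal is a small availability probe that a fixture can call before doing anything else. The function name and the `docker compose ps` check below are illustrative, not the actual conftest.py code:

```python
import shutil
import subprocess


def ssh_context_available() -> bool:
    """Hypothetical probe: can the docker-based ssh test context run here?"""
    if shutil.which("docker") is None:
        # no docker binary on this machine at all
        return False
    try:
        # check that the compose cluster can be queried / is up
        subprocess.run(
            ["docker", "compose", "ps"],
            check=True,
            capture_output=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return False
    return True


print(ssh_context_available())
```

A pytest fixture could then call `pytest.skip(...)` when this returns False, keeping the directory-returning logic in a separate function as suggested above.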
Today I spent quite some time understanding how the parameter setting and actual function calling work in the app classes like IPCluster, etc.
Since we decided to go for the Python solution - something like
/path/to/python.exe -m ipyparallel.cluster.ssh [operation] [--param1 value1] ...
- I thought of applying the existing application/parameter concepts. But since I don't fully understand these concepts yet (including all their consequences), it's hard for me to tell whether the existing 'ipython' concepts are overkill and I should rather go for a simple 'argparse' implementation.
What do you suggest?
If we go for the IPython concept, which base class should be used for the "ssh app"? traitlets.config.application.Application, IPython.core.application.BaseIPythonApplication, ...
I think a simple cli entrypoint with argparse makes the most sense. That ought to be quicker to load. If we find that we have some server-side configuration to do, we should switch it to a traitlets app to be consistent with config loading.
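For reference, a minimal argparse entrypoint along those lines might look like this. The subcommand names and behaviour are illustrative only, not the actual ipyparallel API:

```python
import argparse
import os
import signal


def main(argv=None):
    # hypothetical cli: python -m ipyparallel.cluster.ssh <operation> ...
    parser = argparse.ArgumentParser(prog="ipyparallel.cluster.ssh")
    sub = parser.add_subparsers(dest="operation", required=True)

    p_mkdir = sub.add_parser("mkdir", help="create a directory")
    p_mkdir.add_argument("path")

    p_kill = sub.add_parser("kill", help="terminate a process")
    p_kill.add_argument("pid", type=int)

    args = parser.parse_args(argv)
    if args.operation == "mkdir":
        os.makedirs(args.path, exist_ok=True)
    elif args.operation == "kill":
        os.kill(args.pid, signal.SIGTERM)


# e.g.: main(["mkdir", "some/dir"])
```

Loading only argparse keeps the per-call startup cost low, which matters since every remote operation pays the interpreter start time.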
Thanks for the quick reply. So I will implement an argparse solution...
I'm getting closer to a fully working implementation. But before actually replacing shell code in ipyparallel/cluster/launcher.py, may I ask for your feedback regarding the proposed implementation:
I changed the name to ipyparallel/cluster/shellcmd because it allows sending commands to any shell. Hence, it's not limited to ssh, which makes it very versatile and also easy to test. I have already established a corresponding test that ensures the code works with all possible shells. I haven't tested the code on Linux yet, but I do not expect any surprises since it works under WSL in Windows.
The actual implementation consists of two classes, ShellCommandSend and ShellCommandReceive, which have matching command functions: cmd_start, cmd_running, cmd_kill, cmd_mkdir, cmd_rmdir, cmd_exists and cmd_remove. So one can use that nice Python API and there is no need to mess with subprocess calls. The sender object internally builds the actual shellcmd call (e.g. python -m ipyparallel.cluster.shellcmd mkdir test_dir) and sends it to the corresponding shell. On the 'other side', the main function of shellcmd.py receives the command and calls the corresponding member function of ShellCommandReceive, which then actually executes the command. This way, it's straightforward to add additional parameters or further commands.
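A stripped-down sketch of that round trip (abbreviated command set; the class and function names follow the description above, but the real classes talk to an actual shell instead of dispatching locally):

```python
import os
import sys


class ShellCommandReceive:
    """Far side: actually executes the commands (abbreviated set)."""

    def cmd_mkdir(self, path):
        os.makedirs(path, exist_ok=True)

    def cmd_exists(self, path):
        return os.path.exists(path)


def receive_main(argv):
    # what the __main__ of a shellcmd module would do: dispatch the
    # received cli arguments to the matching cmd_* member function
    cmd, *args = argv
    return getattr(ShellCommandReceive(), "cmd_" + cmd)(*args)


class ShellCommandSend:
    """Near side: builds the shellcmd invocation for the target shell."""

    def __init__(self, python=sys.executable):
        self.python = python

    def build_call(self, cmd, *args):
        return [self.python, "-m", "ipyparallel.cluster.shellcmd", cmd, *args]

    def cmd_mkdir(self, path):
        call = self.build_call("mkdir", path)
        # the real sender would quote `call` for the remote shell and run
        # it via ssh/cmd.exe/powershell; this sketch short-circuits locally
        return receive_main(call[3:])
```

Adding a new command then only means adding a matching `cmd_*` pair on both sides, as described above.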
During testing I discovered that calling the ipyparallel.cluster.shellcmd module takes quite some time. In the case of ssh, it is also necessary that the remote version of shellcmd.py matches the local version (not so relevant for a final distribution, but a bit annoying during development/debugging). Hence I implemented an option that sends the ShellCommandReceive code (through the stdin pipe) to the other Python instance. This is approx. 10x faster than calling the ipyparallel.cluster.shellcmd module and also removes the need for matching file versions. Currently, both options are implemented (see flag use_code_sending), but for simplicity we should eventually stick to one solution only. Although performance is probably not so important (at least for starting up a cluster), I tend to use the code-sending method. What's your opinion on that?
Final question: for establishing an SSH IPCluster, the ssh infrastructure needs to be set up on all computers involved. Furthermore, Python and the ipyparallel package are required in advance. But Python and ipyparallel wouldn't have to be prerequisites if IPCluster supported an (engine) initialisation step based on shell scripts. IPCluster could even support different scripts for different systems (ShellCommandSend knows the system on the other side). This would be a particularly nice feature for Windows, since it doesn't come with Python by default. What do you think?
I have been working hard to get the shellcmd functionality working on all platforms and shells. Relying only on the GitHub runners was way too ineffective (much slower compared to a local setup, and hard/impossible to analyze errors/problems). I now have two full development environments: one on Windows and one on Linux (Ubuntu). Although the setup took a while, I now have full debugging functionality on both systems, which was well worth it.
As mentioned before, the functionality is split into two classes, ShellCommandSend and ShellCommandReceive, with matching member functions. However, communicating from the sender to the receiver over 'normal' cli calls is really a nightmare. As long as you have only simple 'words' as parameters, things are easy, but in the case of complex parameters (which include single quotes, double quotes, whitespace, etc.) things get really nasty. Windows cmd.exe, Windows PowerShell and ssh/bash all behave slightly differently, and we need to support all shells since OpenSSH can use cmd and powershell on Windows.
Things are much simpler if we send the commands via Python, similar to e.g. the existing ssh_waitpid function. In that sense we won't need (or at least won't use) a cli interface, and we can avoid the quoting horror. This makes the code more compact and easier to understand and maintain. If you agree, I will change the code accordingly.
The pytest test_ssh now works smoothly on my Windows and Linux machines, with one exception: the SSHProxy (-> SSHProxyEngineSetLauncher) tests do not work under Windows. I tried to understand/debug the problem, but it's difficult, since I do not fully understand the communication of the objects (yet). I'm sure it's not a problem of my changes, since I was able to pass the tests (multiple times) by debugging/stepping through the code. Hence, I guess there is some sort of synchronization problem on Windows. I will have another look later, but maybe it's quicker if somebody with good knowledge of the system tries to solve this issue.
Thanks for working on this!
I don't have time to get into too much detail, but I will say that piping code to Python is a good idea, rather than importing an entrypoint on the other side. That will be a lot more reliable, I think. It does put small limitations on the far side of what code is acceptable, but since it's meant to be very simple RPC shell commands, that shouldn't be a problem. So if you send a Python script to be executed via stdin which listens for commands on lines, you should be able to send simple commands on lines, e.g. serialized as JSON for reasonable behavior without platform-sensitive command-line parsing.
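A runnable sketch of that idea, with the 'far side' being a local `python -c` for demonstration. The bootstrap trick, the JSON protocol and all field names here are made up for illustration; this is not ipyparallel's actual implementation:

```python
import json
import subprocess
import sys

# The receiver program, sent through the pipe: it answers one JSON
# command per stdin line (hypothetical protocol).
RECEIVER = r"""
import json, os, sys
for line in sys.stdin:
    cmd = json.loads(line)
    if cmd["op"] == "mkdir":
        os.makedirs(cmd["path"], exist_ok=True)
        out = {"ok": True}
    elif cmd["op"] == "exists":
        out = {"ok": True, "exists": os.path.exists(cmd["path"])}
    else:
        out = {"ok": False, "error": "unknown op"}
    sys.stdout.write(json.dumps(out) + "\n")
    sys.stdout.flush()
"""

# Tiny fixed bootstrap: read exactly len(RECEIVER) characters of code
# from stdin, exec it, and leave the rest of stdin for the commands.
bootstrap = f"import sys; exec(sys.stdin.read({len(RECEIVER)}))"

# Locally the 'far side' is just another python; over ssh it would be
# something like ["ssh", "user@host", "python", "-c", bootstrap]
proc = subprocess.Popen(
    [sys.executable, "-c", bootstrap],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
proc.stdin.write(RECEIVER)


def send(**cmd):
    proc.stdin.write(json.dumps(cmd) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())


resp = send(op="exists", path=".")
print(resp)  # -> {'ok': True, 'exists': True}
proc.stdin.close()
proc.wait()
```

Because every command and every reply is a single JSON line, no shell ever parses the arguments, so the cmd.exe/PowerShell/bash quoting differences disappear entirely.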
BTW, feel free to do your work in a draft pull request. That will make conversations about specific code easier, and there's no need to wait for things to be 'complete' before you open one.
Thanks very much for your quick reply.
Ok, then I will definitely go for the piping method. I will clean up my code a bit and send a draft pull request in the next days... :-)
I haven't received any feedback on my PR yet. Is this because of the reviewers' lack of time, or because the PR is not fully working under GitHub runners yet? Or some other reason? Please get back to me and drop me a short note...
Sorry, it's just because I've been busy. I clicked the 'approve and run' button, so it should be running on CI now. I will try to get to it when I can. Thank you so much for doing the work! I'll post a review on the PR.
Thanks, and I didn't want to stress you. Since I have no experience with PRs, I wasn't sure if I did everything correctly. By the way, the pre-commit.ci run shows some errors. How can I fix those (the output is somehow truncated)? Is there a way to run this locally? Thanks!
Yes, you can install pre-commit:
pip install pre-commit
and run
pre-commit install
in the repo (many repos use this tool). Then every time you commit, these checks will be run on the changes about to be committed. You can also run the hooks on already-committed and/or unstaged files with
pre-commit run --all-files
Thanks very much for the pre-commit explanations!
Finally, I almost got the ssh tests to work on the Windows GitHub runner - see my last SSH test run. The last 'big' obstacle was a file name issue with the engine logs (forward/backward slashes), which is why I never managed to retrieve the engine logs. Since it worked in my local environment (no idea why the different slashes work on my machine but not on the GitHub runner), I thought for a long time that the engines never got started. But this wasn't the problem...
Now I get the engine logs and can see the final problem: the engines try to register against an incorrect controller IP. At the end of the 'Set up docker-compose for ssh windows launcher' step I output the network addresses (ipconfig):
Ethernet adapter Ethernet 2: 10.1.106.1
Ethernet adapter vEthernet (nat): 172.22.160.1
Ethernet adapter vEthernet (a8fa81d95af0738): 172.27.144.1
But the engines try to register against 10.1.54.0:
2023-10-26 16:18:55.413 [IPEngine] Registering with controller at tcp://10.1.54.0:50868
2023-10-26 16:19:00.423 [IPEngine] CRITICAL | Registration timed out after 5.0 seconds
How do the engines determine the controller ip? Can this somehow be controlled?
Under linux things work correctly:
ifconfig:
r-9da2dc09472a: 172.18.0.1
docker0: 172.17.0.1
eth0: 10.1.0.49
log:
2023-10-26 15:50:50.561 [IPEngine] Registering with controller at tcp://10.1.0.49:35785
How do the engines determine the controller ip? Can this somehow be controlled?
There are two meanings of 'ip'. One is the 'bind' ip address (likely 0.0.0.0 for ssh), set with --ip or --engine-ip, and one is the 'connect' ip, used only when the bind ip is ambiguous (i.e. 0.0.0.0 or 127.0.0.1). The latter is set with 'location':
ipcontroller --ip 0.0.0.0 --location 10.1.54.0
or
ipcontroller --ip 0.0.0.0 --location 'controller-container-name'
If they should differ, bind ips can be set separately for engines and clients with --engine-ip, --client-ip, etc.
The default for the location is to use socket.gethostname(), which usually gets the container name or id. In containers, hostnames are generally preferred to ips, since connect ips are often not knowable inside the container. Hostnames may not work if the client is actually on the host system, which, unlike other containers, doesn't get DNS resolution for containers.
Thanks again for the info. By adding a corresponding entry to the container's hosts file (of course the docker add-host command does NOT support Windows...), I was able to 'place' the correct IP address. Although I tried the IP addresses of the different host network adapters, I never got the connection to work. The IPEngine always reports a 'registration timed out'...
I guess the github runner windows firewall blocks the network traffic. This is all so annoying, since there is no issue on a local machine...
I hope I can find a solution for this again
YEEESSS. Turning off the Windows firewall was the final missing link. As you can see here, the ssh tests now pass (persistence paid off eventually).
As mentioned before, the SSHProxy tests do not work yet. This is a 'real' issue, since the behaviour is identical on my local machines.
I will update my PR next week. So stay tuned!
Finally, everything is working, and the code, GitHub workflows, etc. are cleaned up and (hopefully) properly commented. So from my side the PR is ready "to go..."
I don't want to be annoying, but did you have time to look at the pull request? If you have any suggestions for improvements/changes, I'm happy to adapt the code. I'm also willing to update the documentation regarding the ssh Windows support. But I would like to finish this work as soon as possible. Thanks
Thanks for your feedback on the PR. I was busy this week, but I can hopefully work on the code again next week.
My apologies for not working on the PR for such a long time. There were so many urgent issues in the last six months that I never had a few days in a row to finish up the PR. But now I'm back!
After resyncing with the main project, I now have an issue when installing ipyparallel in the docker container for the ssh test. In my local container everything works perfectly fine, but NOT in the GitHub Windows runner. The main issue is that the editable pip installation of ipyparallel fails within the container. I have only basic knowledge of the pip installation process and JupyterLab, so I don't understand the problem. Could you point me in the right direction?
Please have a look at the end of the step 'Set up docker-compose for ssh windows launcher' (not the failed ssh test step) of this GitHub runner test. There, you will find messages like
lab/src/clusters.tsx(99,33): error TS2344: Type 'import("C:/src/ipyparallel/node_modules/@lumino/widgets/types/widget").Widget' does not satisfy the constraint 'import("C:/src/ipyparallel/node_modules/@jupyterlab/apputils/node_modules/@lumino/widgets/types/widget").Widget'. [...]
Any idea? As mentioned before, it must be related to the GitHub Windows runner (no issue in a local docker container...)
Thanks & Cheers
Just a small update: I found the docker-compose issue. Somehow the environment variable IPP_DISABLE_JS=1 is not fully propagated into the docker container. Very strange, since a simple
echo IPP_DISABLE_JS=$env:IPP_DISABLE_JS
command tells me that it's set. Nevertheless, I'm sure that I will find a workaround...
Great! No rush, and yeah it is definitely fine to set IPP_DISABLE_JS in the containers for testing this, I'm not sure why it's not propagating, but setting it explicitly ought to work.
Thanks for your reply!
Setting the environment variable using powershell commands resolved the issue. See
https://github.com/ottointhesky/ipyparallel/blob/0a0efb02be1da21c224f9e805159c13809902e4f/ci/ssh/win_Dockerfile_template#L9-L13
and
https://github.com/ottointhesky/ipyparallel/blob/0a0efb02be1da21c224f9e805159c13809902e4f/ci/ssh/win_Dockerfile_template#L23C1-L23C112
In my PR, pre-commit.ci ruff complains about a lot of files which I haven't touched. Is this a problem, or can I simply ignore it?
I have some more questions regarding your code suggestions: I have replied to about half of them already. Is it ok the way I did that, or do you expect a separate commit for each comment? Should I resolve the conversation when I incorporate your suggestions, or do you always want to do that?
Thanks & cheers
In my PR, pre-commit.ci ruff complains about a lot of files which I haven't touched. Is this a problem, or can I simply ignore it?
My guess is most of this will be resolved by merging from main.
I have some more questions regarding your code suggestions: I have replied to about half of them already. Is it ok the way I did that, or do you expect a separate commit for each comment? Should I resolve the conversation when I incorporate your suggestions, or do you always want to do that?
Resolving yourself with a link to the commit is a fine way to do it, or just a description of what you changed. Feel free to leave them open with a comment if you feel like I should have a look to decide if it's been addressed or not. It definitely doesn't need to be a dedicated commit for each comment.
Don't worry about the git history, I think we'll use squash & merge for this one.
Thanks again for your comments.
I have resolved all code comments except for one (always (re-)opening the debug log file handle). Have a look at my comment, and feel free to persist with your suggestion...
Apart from that I think my PR is ready to "go"
Just a final note: I have added test_shellcmd.py to the ssh GitHub runner test as well. Otherwise, the tests [ssh-win], [ssh-win_src], [ssh-linux] and [ssh-linux_src] are never executed.