Comments (16)
Agreed that repo isn't the best place for this. I dislike the solution that touches nbclient too. I've created jupyter-server/enterprise_gateway#803 and I think this one can be closed.
from nbclient.
You're not doing anything dumb, there were some independent development tracks for various efforts. It might be useful to ping on the of GatewayKernelManager devs into the thread. I remember having a conversation on this topic a long while ago but it's likely my knowledge on GatewayKernelManager is quite outdated.
from nbclient.
Bringing over the discussion to this thread.
From what I see, there's some pretty vast differences between the 2 kernel managers; interfaces are very different, one's async with tornado, the other's not. And then there's this weird client class that nobody uses that I can see: https://github.com/jupyter/enterprise_gateway/blob/master/enterprise_gateway/client/gateway_client.py#L76 which is sync. Haven't tried using it yet though
It's going to be difficult to do this very cleanly without rewriting a lot of things across jupyter_client, nbclient, and enterprise_gateway. I'm going to start with a proof of concept anyway; will post here after.
It seems this is also a bit related: jupyter/nbconvert#809
from nbclient.
@takluyver this seems to be something you're tackling with jupyter_kernel_mgmt Please correct me if I'm wrong
from nbclient.
So it seems that in addition to porting over the jupyter kernel mgmt style over to nbclient, a separate gateway client needs to be created since the available clients are interacting with ZMQ directly. or perhaps just a messaging class
from nbclient.
Hmm, well if we generalized the message getting a bit more it might be generalizable enough to branch between Gateway and normal Kernel managers. I saw @kevin-bates was involved over there, he's got a lot of parallel threads going but he may be able to chime in here on this more directly.
from nbclient.
Thanks for the copy @MSeal. I apologize upfront for the long-winded content.
This is the first I've looked at nbclient
(so many repos, so little time!) and I think I see the confusion here. Notebook's kernel_manager_class
traitlet is really a reference to a subclass of jupyter_client's MultiKernelManager
. The default value for kernel_manager_class
is Notebook's MappingKernelManager
from which GatewayKernelManager
is derived.
MappingKernelManager
is responsible for managing multiple KernelManager
instances in a map indexed by kernel_id
. MappingKernelManager
also has a traitlet named kernel_manager_class
- whose default value is KernelManager
.
GatewayKernelManager
replaces MappingKernelManager
when --gateway-url
is "in play". It does not manage KernelManager
instances, but instead, proxies lifecycle actions to the configured Gateway server. GatewayClient
is specific to the gateway integration and is essentially a singleton of static information that is constant during the Notebook process's execution.
Nbclient's kernel_manager_class
traitlet, on the other hand, is a reference to a subclass of jupyter_client's KernelManager
. It looks like nbclient uses the KernelManager
instance directly and does not go through any kind of MultiKernelManager
to manage the KernelManager
instances.
You're right, the two kinds of kernel_manager_class values are quite different. I always look for the kernel_id
parameter in the methods. If that parameter exists, then you're dealing with a form of MultiKernelManager
, else you're likely dealing with a KernelManager
directly.
What are you trying to achieve wrt Enterprise Gateway? You might be able to use EG's RemoteKernelManager
in place of KernelManager
to launch remote kernels, but that hasn't been done before, so I'm not certain of the outcome.
One of the unfortunate pitfalls of the current architecture is that you can only have one class of KernelManager. That limitation is what the jupyter_kernel_mgmt efforts are addressing. With "JKM", different kernel "providers" can be installed and the kernel discovery model will expose kernel specifications from each registered provider. Then, each provider implements its own kernel manager instance specific to the kind of kernel it's responsible for. The various Enterprise Gateway "process proxy" implementations have been converted to kernel providers in the Enterprise Gateway Experiements organization. The two prototyped providers are KubernetesKernelProvider
and YarnKernelProvider
from that org, both of which derive from RemoteKernelProvider
.
With kernel providers in place, applications like Notebook can leverage things like remote kernels directly - without requiring an EG instance, etc. The plan is that jupyter_server will adopt JKM as its kernel management framework.
from nbclient.
Thank you for the rich context @kevin-bates.
I'm ok with removing the traitlet and strict class inheritance present. It's only there to make the transition out of nbconvert easy. Sounds like maybe we need to support MultiKernelManager, RemoteKernelProvider, and KernelManager class types and write different code execution paths with as much shared code between them as possible. I'm going to mark this as an enhancement task, but I won't be able to get to such crafting for sometime. Probably going to help get the async changes up and down the stack merged first.
@golf-player if that description makes sense to you and you want to take a crack at a first-attempt I can provide reviews and suggestions. No pressure to do so, as it'll be a bit of work, but glad to have you contribute in significant ways if you want to try.
from nbclient.
@kevin-bates @MSeal appreciate the information here.
What are you trying to achieve wrt Enterprise Gateway?
I have a service running nbconvert on a remote kernel using a custom kernelclient. Since gateway exists and maintaining that kernel client is getting more and more expensive, we're looking to switch over. We need something to remotely execute notebooks.
WRT using RemoteKernelManager, I'm not sure that would make sense since that interacts with the processproxies directly rather than the gateway interface.
Right now, I'm just trying to redo the work previously done to get nbclient to use JKM since that's something that I assume y'all will want done eventually and half-solves the problem since blockingKernelClient should be easy to hook up to gateway (although I'm not 100% sure exactly how it muxes the various channels).
This is something I'll be working on in the near future, hopefully I can get some real amount of time allocated next week as this is an initiative my company seems to be leaning towards funding.
Appreciate the help! What's a good way to get in touch for one-off questions in case I get stuck somewhere? I see the gitter's a bit dead. Mailing list? Slack? Discuss forum?
from nbclient.
RemoteKernelManager and the gateway interface are orthogonal. The gateway interface only proxies /api/kernels and /api/kernelspecs requests to another server. If that server is running Kernel Gateway, any kernels launched are local to that server. If that server is running Enterprise Gateway, kernels launched by that server may or may not be local to that server.
RemoteKernelManager is pretty much just like KernelManager, but it leverages the process proxy, which essentially abstracts the Popen subprocess mechanism, allowing us to launch, discover and terminate the kernel using a resource manager (Hadoop YARN, Kubernetes, etc.). The "remoting" is more a function of the resource manager than the kernel manager/process-proxy. That said, I agree that using RemoteKernelManager directly will probably create more work than its worth - especially given the existence of JKM.
What kinds of systems are you targetting to host your remote kernels? Just trying to understand if Kernel Gateway may be more what you want.
I think using the EG discussion forum or opening an issue on the EG repo are probably your two best locations for EG questions. If the issue is related to the remote kernel providers, please open an issue in those repos. If specific to JKM, let's just use the JKM repo. I watch all of these locations.
The evolution to JKM is a big change. I believe it will be adopted eventually but can't predict when.
from nbclient.
I see what you're saying; or at least I hope I do: you're talking about not involving the gateway server at all and instead directly using RKM to interact with process proxies.
This could be a better approach than the one I was thinking of. I'll try this as well, although now I'm a bit confused about what you meant here: "using RemoteKernelManager directly will probably create more work than its worth - especially given the existence of JKM" why would RKM make things harder than anything else while using jupyter kernel mgmt?
We're hosting remote kernels on a couple kube clusters. Going by https://jupyter-enterprise-gateway.readthedocs.io/en/latest/getting-started.html#kernel-gateway-vs-enterprise-gateway, enterprise gateway is definitely what we'll need.
Keeping this on github seems like a good idea then.
from nbclient.
Hi.
"using RemoteKernelManager directly will probably create more work than its worth - especially given the existence of JKM" why would RKM make things harder than anything else while using jupyter kernel mgmt?
This is because RemoteKernelManager
is baked into EG and probably relies on some information that EG provides. It was not designed to be run "standalone". JKM, on the other hand, is designed to be its own isolated entity - although the remote kernel providers we do provide have some of the EG functionality, but are not necessarily dependent on EG (like I suspect RKM to be).
We're hosting remote kernels on a couple kube clusters. Going by https://jupyter-enterprise-gateway.readthedocs.io/en/latest/getting-started.html#kernel-gateway-vs-enterprise-gateway, enterprise gateway is definitely what we'll need.
I agree EG is what you want.
However, nbclient comes at things at a far lower level. I.e., it's not a web application nor is it designed to hit a web application (via REST, etc.). So it seems like JKM is probably your best shot. I would also like to point out that the KubernetesKernelProvider
requires its caller (nbclient in this case), also be running in the k8s cluster. That's one of the advantages that EG brings, in that with EG, only it needs to be running in k8s, while the Notebook server that the user hits can be outside.
from nbclient.
Appreciate the clarification. I hadn't looked too deeply into it, but it makes sense that the kuberneteskernelprovider would need to be run in the cluster. Currently the service I have running nbconvert is run outside the cluster, but that could be changed easily.
So it seems like the "correct" long term solution would be to implement jkm, but it's a massive effort. I'll see what baby steps I can find that might help.
from nbclient.
Got pulled into another project. I finally got time to work on this again. @kevin-bates your suggestion to use RemoteKernelManager directly worked like a charm. There are some things I will contribute upstream since configuring it was challenging.
Here's the code I used:
import os
import nbformat
from nbclient import NotebookClient
from enterprise_gateway.services.kernels.remotemanager import RemoteKernelManager, RemoteMappingKernelManager
from enterprise_gateway.enterprisegatewayapp import EnterpriseGatewayApp
eg = EnterpriseGatewayApp()
rmkm = RemoteMappingKernelManager(parent=eg,
log=eg.log,
connection_dir=eg.runtime_dir,)
class RKM(RemoteKernelManager):
def __init__(self, **kwargs):
kernel_id = rmkm.new_kernel_id()
kwargs.update({
"parent": rmkm,
"log": rmkm.log,
"kernel_name": "python_kubernetes",
"connection_file": os.path.join(rmkm.connection_dir, "kernel-%s.json" % kernel_id)
})
super(RKM, self).__init__(**kwargs)
self.kernel_id = kernel_id
with open("/home/ish/notebooks/Untitled.ipynb") as fp:
test_notebook = nbformat.read(fp, as_version=4)
client = NotebookClient(nb=test_notebook, kernel_manager_class=RKM)
Couple things I plan to address (and I'd appreciate your thoughts here):
- I think
RemoteKernelManager
should make a kernel_id if one is not set (still not actually sure where kernel_id gets set normally; the factory calls don't appear to pass in kernel_id) RemoteKernelManager
gets its config from parent.parent (JEG app instance). Do y'all think it makes sense to give it the same config as JEG (not too familiar with traitlets, but I assume there's a way to share config as a mixin or something)? If not, I'll need to add a parent param to theNotebookClient
constructor to receive parent and set it on the kernel_manager_class, but tbh I dislike that solution.
from nbclient.
I'm glad to hear you've had some success, but, to be honest, I don't think this is the right approach. You really want a "kernel manager" that is not so strongly tied to a specific application (EG). Had I envisioned RemoteKernelManager
being used outside EG, I'd probably have done some things differently, but between figuring out how things work in this ecosystem and the need to get something working, some corners were probably cut. I think the better solution would be to leverage the kernel provider architecture, since this one of the use cases that would address. However, that's not at a level of acceptance yet that would warrant any kind of changes and I'm not sure how the various applications are planning that adoption, if at all.
I would be happy to keep working with you, but I don't advocate making essentially one-off changes to this repository to try to force in something that wasn't designed for this use case.
Regarding kernel_id, the base framework has always treated kernel_id as something only the MultiKernelManager
needs - since it represents the index into its hashmap of kernel managers. This was water under the bridge when RMK was developed. Since nbclient is entirely single-use, there's no need for a multi-kernel manager. I believe "ownership" of kernel_id lives in the kernel provider in the JKM architecture.
If you'd like to open an issue in EG about making RMK a bit more independent, that's probably the best approach for now. There might be a way we could massage things such that no nbclient changes would be required - which I view as the primary requirement to this exercise.
from nbclient.
Glad there was a alignment in direction and clarity gained on this issue.
from nbclient.
Related Issues (20)
- 0.5.13: pytest warnings and fails HOT 3
- Question: Total notebook timeout possible? HOT 3
- Using nbclient to talk to jupyter lab running remotely HOT 10
- Renamed default branch to main
- 0.6.1: sphinx faiils because missing file HOT 2
- Background Python process after running tests
- How to reuse exsisting kernel? HOT 4
- Test failure in ipywidgets 8 HOT 2
- nbclient 0.6.6 doesn't report cell magic error properly
- Cell caching HOT 6
- just_run does not close event loop it creates HOT 2
- Project dependencies may have API risk issues HOT 2
- 0.7.1: pytest is failing because missing ` jupyter_core.utils.ensure_async` HOT 11
- jupyter_core version constraint is invalid HOT 3
- nbclient >= 0.7.1 raises ImportError when trying to open a notebook HOT 1
- AttributeError: 'KernelManager' object has no attribute 'cleanup' HOT 5
- `NotebookClient.wait_for_reply` hangs with jupyter_client 8 or later HOT 4
- Is it possible to programmatically inspect variables of a NotebookNode? HOT 5
- output of type `stream` is split over multiple cells HOT 10
- execute_cell running error HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from nbclient.