IPython Parallel: Interactive Parallel Computing in Python

Home Page: https://ipyparallel.readthedocs.io/

License: Other


ipyparallel's Introduction

Interactive Parallel Computing with IPython

IPython Parallel (ipyparallel) is a Python package and collection of CLI scripts for controlling clusters of IPython processes, built on the Jupyter protocol.

IPython Parallel provides the following commands:

  • ipcluster - start/stop/list clusters
  • ipcontroller - start a controller
  • ipengine - start an engine

Install

Install IPython Parallel:

pip install ipyparallel

This will install and enable the IPython Parallel extensions for Jupyter Notebook and (as of ipyparallel 7.0) JupyterLab 3.0.

Run

Start a cluster:

ipcluster start

Use it from Python:

import os
import ipyparallel as ipp

cluster = ipp.Cluster(n=4)
with cluster as rc:
    ar = rc[:].apply_async(os.getpid)
    pid_map = ar.get_dict()

See the docs for more info.
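A slightly longer, hedged sketch in the same vein (the mapped function and counts here are illustrative, not from the README):

import ipyparallel as ipp

cluster = ipp.Cluster(n=4)
with cluster as rc:
    view = rc.load_balanced_view()
    # block until all tasks are done and collect the results in order
    squares = view.map_sync(lambda x: x * x, range(16))
    print(squares[:4])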

ipyparallel's People

Contributors

bfroehle, blink1073, bollwyvl, carreau, chapmanb, dependabot[bot], ellert, ellisonbg, fperez, hannorein, ivanov, jabooth, jakirkham, jankatins, jdfreder, jenshnielsen, mgeplf, minrk, pre-commit-ci[bot], rgbkrk, rmcgibbo, sahil1105, samuela, stevenjohnson, sylvaincorlay, takluyver, timo, tomoboy, willingc, wking

ipyparallel's Issues

Failing RTD doc builds

The absence of a requirements.txt file is causing the builds to fail.

Recommend removing the requirements.txt setting from the Advanced Settings tab in RTD and rebuilding the latest docs.
(screenshot of the RTD Advanced Settings page, 2015-09-26)

BUG: AttributeError: 'module' object has no attribute 'use_cloudpickle'

Maybe this also applies to things like use_dill. Is there some other way we are supposed to be doing this in ipyparallel 5?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-1a2f16d27967> in <module>()
----> 1 client[:].use_cloudpickle().get()

/opt/conda/lib/python2.7/site-packages/ipyparallel/client/view.pyc in use_cloudpickle(self)
    502         This calls ipyparallel.serialize.use_cloudpickle() here and on each engine.
    503         """
--> 504         serialize.use_cloudpickle()
    505         return self.apply(serialize.use_cloudpickle)
    506 

AttributeError: 'module' object has no attribute 'use_cloudpickle'

Docker containers for ipyparallel

I wrote a pair of Dockerfiles to make it easy to deploy ipyparallel clusters with containers. The design is to run separate containers for the controller, the engines (several of them in a single container), and the clients. All cluster communications are tunnelled through SSH, so you only need to open one port on the Docker host of the controller and that's it.

I don't know if this is the right place, but I think people might find it useful:
https://github.com/0x0L/jupyter-cluster

TaskScheduler raises a TraitError when logging is forwarded to a url using 0MQ

Using the following package versions installed with conda on Windows 7 64bit

Package Version
python 2.7.10
ipykernel 4.0.3
ipyparallel 4.0.2
ipython 4.0.0
jupyter_core 4.0.4
pyzmq 14.7.0
setuptools 18.4
traitlets 4.0.0

With IPython.sys_info():

{'commit_hash': u'f534027',
 'commit_source': 'installation',
 'default_encoding': 'cp437',
 'ipython_path': 'C:\\Users\\bheklilr\\AppData\\Local\\Continuum\\Miniconda3\\envs\\quick\\lib\\site-packages\\IPython',
 'ipython_version': '4.0.0',
 'os_name': 'nt',
 'platform': 'Windows-7-6.1.7601-SP1',
 'sys_executable': 'C:\\Users\\bheklilr\\AppData\\Local\\Continuum\\Miniconda3\\envs\\quick\\python.exe',
 'sys_platform': 'win32',
 'sys_version': '2.7.10 |Continuum Analytics, Inc.| (default, Sep 15 2015, 14:26:14) [MSC v.1500 64 bit (AMD64)]'}

If I run

ipcontroller start --IPControllerApp.log_url=tcp://127.0.0.1:12345

I see the logging message

2015-10-26 15:52:22.441 [IPControllerApp] Forwarding logging to tcp://127.0.0.1:12345

and then after the pid file is created I get the error message

Process Process-1:
Traceback (most recent call last):
  File "C:\Users\bheklilr\AppData\Local\Continuum\Miniconda3\envs\quick\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Users\bheklilr\AppData\Local\Continuum\Miniconda3\envs\quick\lib\multiprocessing\process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\bheklilr\AppData\Local\Continuum\Miniconda3\envs\quick\lib\site-packages\ipyparallel\controller\scheduler.py", line 842, in launch_scheduler
    config=config)
  File "C:\Users\bheklilr\AppData\Local\Continuum\Miniconda3\envs\quick\lib\site-packages\jupyter_client\session.py", line 169, in __init__
    super(SessionFactory, self).__init__(**kwargs)
  File "C:\Users\bheklilr\AppData\Local\Continuum\Miniconda3\envs\quick\lib\site-packages\traitlets\config\configurable.py", line 74, in __init__
    super(Configurable, self).__init__(**kwargs)
  File "C:\Users\bheklilr\AppData\Local\Continuum\Miniconda3\envs\quick\lib\site-packages\traitlets\traitlets.py", line 588, in __init__
    setattr(self, key, value)
  File "C:\Users\bheklilr\AppData\Local\Continuum\Miniconda3\envs\quick\lib\site-packages\traitlets\traitlets.py", line 450, in __set__
    new_value = self._validate(obj, value)
  File "C:\Users\bheklilr\AppData\Local\Continuum\Miniconda3\envs\quick\lib\site-packages\traitlets\traitlets.py", line 471, in _validate
    value = self.validate(obj, value)
  File "C:\Users\bheklilr\AppData\Local\Continuum\Miniconda3\envs\quick\lib\site-packages\traitlets\traitlets.py", line 1045, in validate
    self.error(obj, value)
  File "C:\Users\bheklilr\AppData\Local\Continuum\Miniconda3\envs\quick\lib\site-packages\traitlets\traitlets.py", line 899, in error
    raise TraitError(e)
TraitError: The 'log' trait of a TaskScheduler instance must be a Logger, but a value of type 'NoneType' (i.e. None) was specified.

And I don't actually seem to get log messages forwarded to that URL using the code

from __future__ import print_function

import time
import threading
import subprocess
import zmq
from zmq.eventloop import ioloop, zmqstream


class Log(object):
    def __init__(self, url):
        self.url = url

        self.ctx = zmq.Context()

        self.lock = threading.Lock()
        self._messages = []

    def log_message(self, msg):
        print('Got message: {}'.format(msg))
        with self.lock:
            self._messages.append(msg)

    @property
    def messages(self):
        with self.lock:
            return self._messages[:]

    def start_listening(self):
        self.socket = self.ctx.socket(zmq.SUB)
        self.socket.bind(self.url)

        self.loop = ioloop.IOLoop().instance()
        self.stream = zmqstream.ZMQStream(self.socket, self.loop)

        self.stream.on_recv(self.log_message)

        self.loop_thread = threading.Thread(target=self.loop.start)
        self.loop_thread.daemon = True
        self.loop_thread.start()

    def stop_listening(self):
        self.loop.stop()
        self.loop_thread.join()
        self.socket.unbind(self.url)


l = Log('tcp://127.0.0.1:12345')
l.start_listening()

p = subprocess.Popen(['ipcontroller', '--IPControllerApp.log_url=tcp://127.0.0.1:12345'],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)

time.sleep(10)

p.terminate()
p.wait()

print(l.messages)

I may be using zmq incorrectly (I don't have much experience with it), but regardless it seems like there's a bug in ipyparallel when configuring IPControllerApp.log_url. I did try putting this in my profile's ipcontroller_config.py file, but I get the same behavior as when specifying it from the command line, so at least the behavior is consistent.

Example of how to use the AsyncResult wait_interactive() method

I created a question on Stack Overflow: http://stackoverflow.com/questions/33061111/monitoring-status-of-ipyparallel-completion. Basically, looking through the async.py file I see that there is a wait_interactive method, but I have not been able to figure out how to use it. I don't imagine I'll be the last to want periodic updates during large jobs, so if I can get some clarification I'll submit a file that can be added to the examples folder to help others.
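A hedged sketch of how wait_interactive can be used (the task function here is made up for illustration, and a cluster is assumed to be running):

import ipyparallel as ipp

rc = ipp.Client()
view = rc.load_balanced_view()

def slow_task(x):
    import time
    time.sleep(1)
    return x * 2

ar = view.map_async(slow_task, range(32))
# wait_interactive() blocks like wait(), but prints a periodic progress
# line (elapsed time and number of finished tasks) until everything is done.
ar.wait_interactive()
print(ar.get()[:5])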

Engine hangs if an SSH exception occurs

The ipengine will hang if the url file exists (such as when using --reuse on the controller and copying the engine config to another computer) but registration fails for whatever reason (SSH connect/login failed, controller is down, etc.).

To test:

  1. Install openssh-server (if not done already)
  2. Start a controller with --reuse arg
  3. Modify the ipcontroller-engine.json to ssh to '[email protected]' or a valid ip with no ssh server running
  4. Start an engine --> Fails but never exits

Systems tested on:

>python -c "import IPython; print(IPython.sys_info())"
{'commit_hash': u'f534027',
 'commit_source': 'installation',
 'default_encoding': 'cp437',
 'ipython_path': 'C:\\Python27\\lib\\site-packages\\IPython',
 'ipython_version': '4.0.0',
 'os_name': 'nt',
 'platform': 'Windows-8-6.2.9200',
 'sys_executable': 'C:\\Python27\\python.exe',
 'sys_platform': 'win32',
 'sys_version': '2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]'}

Example failure on windows with no ssh server running


Registering with controller at tcp://127.0.0.1:64993
Exception in callback <functools.partial object at 0x04F99E10>
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\tornado\ioloop.py", line 592, in _run_callback
    ret = callback()
  File "C:\Python27\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Python27\lib\site-packages\ipyparallel\engine\engine.py", line 297, in _start
    self.register()
  File "C:\Python27\lib\site-packages\ipyparallel\engine\engine.py", line 138, in register
    connect,maybe_tunnel = self.init_connector()
  File "C:\Python27\lib\site-packages\ipyparallel\engine\engine.py", line 103, in init_connector
    if self.tunnel_mod.try_passwordless_ssh(self.sshserver, self.sshkey, self.paramiko):
  File "C:\Python27\lib\site-packages\zmq\ssh\tunnel.py", line 78, in try_passwordless_ssh
    return f(server, keyfile)
  File "C:\Python27\lib\site-packages\zmq\ssh\tunnel.py", line 122, in _try_passwordless_paramiko
    look_for_keys=True)
  File "C:\Python27\lib\site-packages\paramiko\client.py", line 251, in connect
    retry_on_signal(lambda: sock.connect(addr))
  File "C:\Python27\lib\site-packages\paramiko\util.py", line 270, in retry_on_signal
    return function()
  File "C:\Python27\lib\site-packages\paramiko\client.py", line 251, in <lambda>
    retry_on_signal(lambda: sock.connect(addr))
  File "C:\Python27\lib\socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 10061] No connection could be made because the target machine actively refused it

Example of failure on linux (host not validated)

$ python -c "import IPython; print(IPython.sys_info())"
{'commit_hash': u'f534027',
 'commit_source': 'installation',
 'default_encoding': 'UTF-8',
 'ipython_path': '/usr/local/lib/python2.7/dist-packages/IPython',
 'ipython_version': '4.0.0',
 'os_name': 'posix',
 'platform': 'Linux-3.16.0-38-generic-x86_64-with-LinuxMint-17.2-rafaela',
 'sys_executable': '/usr/bin/python',
 'sys_platform': 'linux2',
 'sys_version': '2.7.6 (default, Jun 22 2015, 17:58:13) \n[GCC 4.8.2]'}
~ $ ipengine
2015-10-28 13:09:04.162 [IPEngineApp] Loading url_file u'/home/labuser/.ipython/profile_default/security/ipcontroller-engine.json'
2015-10-28 13:09:04.166 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53231
ERROR:tornado.application:Exception in callback <functools.partial object at 0x7f3bcb1799f0>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/minitornado/ioloop.py", line 463, in _run_callback
    callback()
  File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/minitornado/stack_context.py", line 331, in wrapped
    raise_exc_info(exc)
  File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/minitornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ipyparallel/engine/engine.py", line 297, in _start
    self.register()
  File "/usr/local/lib/python2.7/dist-packages/ipyparallel/engine/engine.py", line 138, in register
    connect,maybe_tunnel = self.init_connector()
  File "/usr/local/lib/python2.7/dist-packages/ipyparallel/engine/engine.py", line 103, in init_connector
    if self.tunnel_mod.try_passwordless_ssh(self.sshserver, self.sshkey, self.paramiko):
  File "/usr/local/lib/python2.7/dist-packages/zmq/ssh/tunnel.py", line 78, in try_passwordless_ssh
    return f(server, keyfile)
  File "/usr/local/lib/python2.7/dist-packages/zmq/ssh/tunnel.py", line 99, in _try_passwordless_openssh
    raise SSHException('The authenticity of the host can\'t be established.')
SSHException: The authenticity of the host can't be established.

In both cases above the error is expected, but the engine hangs and never exits.

To fix it, the abort timeout should be scheduled before register() is called in _start, and a check should be added in abort().

import sys
import time

def _patch_ipengine():
    """Patch: make the engine exit instead of hanging on an SSH error."""
    from ipyparallel.engine.engine import EngineFactory

    def start(self):
        loop = self.loop
        def _start():
            # schedule the abort timeout *before* registering, so a failed or
            # hung SSH connection still triggers the timeout
            self._abort_timeout = loop.add_timeout(loop.time() + self.timeout, self.abort)
            self.register()
        self.loop.add_callback(_start)

    def abort(self):
        self.log.fatal("Registration timed out after %.1f seconds" % self.timeout)
        if self.url.startswith('127.'):
            self.log.fatal("""
            If the controller and engines are not on the same machine,
            you will have to instruct the controller to listen on an external IP (in ipcontroller_config.py):
                c.HubFactory.ip='*' # for all interfaces, internal and external
                c.HubFactory.ip='192.168.1.101' # or any interface that the engines can see
            or tunnel connections via ssh.
            """)

        # If SSH fails, registrar will be None, so check before sending
        if self.registrar:
            self.session.send(self.registrar, "unregistration_request", content=dict(id=self.id))
        time.sleep(1)
        sys.exit(255)

    EngineFactory.abort = abort
    EngineFactory.start = start

Ipcontroller doesn't find ipcontroller-engine.json when '--work-dir' flag for c.SSHEngineSetLauncher.engines is set in ipcluster_config.py

My ipcluster_config.py is:

projectname='chase'
localuser='matus'
remoteuser='osadmin'
remoteips=['<SERVERIP>']
remoteworkdir='/home/'+remoteuser+'/'+projectname
c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher'
c.LocalControllerLauncher.controller_args = ["--ip='*'"]
c.SSHEngineSetLauncher.engines={}
for ri in remoteips:
    c.SSHEngineSetLauncher.engines[remoteuser+'@'+ri]=(1,
        ['--work-dir='+remoteworkdir])
print c.SSHEngineSetLauncher.engines
c.SSHLauncher.to_send = [('/home/'+localuser+'/.ipython/profile_'+projectname+'/security/ipcontroller-client.json', 
    '/home/'+remoteuser+'/.ipython/profile_'+projectname+'/security/ipcontroller-client.json'),
    ('/home/'+localuser+'/.ipython/profile_'+projectname+'/security/ipcontroller-engine.json', 
    '/home/'+remoteuser+'/.ipython/profile_'+projectname+'/security/ipcontroller-engine.json')]

With ipcluster start --profile=chase --debug I get

2016-02-12 20:38:52.691 [IPClusterStart] 2016-02-12 20:38:52.638 [IPEngineApp] Changing to working dir: /home/osadmin/chase
2016-02-12 20:38:52.691 [IPClusterStart] 2016-02-12 20:38:52.638 [IPEngineApp] WARNING | url_file u'.ipython/profile_chase/security/ipcontroller-engine.json' not found
2016-02-12 20:38:52.691 [IPClusterStart] 2016-02-12 20:38:52.638 [IPEngineApp] WARNING | Waiting up to 5.0 seconds for it to arrive.
2016-02-12 20:38:57.703 [IPClusterStart] 2016-02-12 20:38:57.649 [IPEngineApp] CRITICAL | Fatal: url file never arrived: .ipython/profile_chase/security/ipcontroller-engine.json
2016-02-12 20:38:57.721 [IPClusterStart] Connection to <SERVERIP> closed.
2016-02-12 20:38:57.721 [IPClusterStart] Process 'ssh' stopped: {'pid': 13282, 'exit_code': 1}
2016-02-12 20:39:22.409 [IPClusterStart] Engines appear to have started successfully

However, stat -c "%y %s %n" ~/.ipython/profile_chase/security/ipcontroller-engine.json says 2016-02-12 20:38:52.347508282 +0100 344 /home/osadmin/.ipython/profile_chase/security/ipcontroller-engine.json

Both client and the remote server are at version 4.1.1

When I omit the --work-dir flag in the config everything works.

Manual start on server (ipengine --profile=chase --work-dir=/home/osadmin/chase) also works.

It looks like the engine is not looking for ipcontroller-engine.json in the correct location.
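A hedged workaround sketch for ipcluster_config.py: hand the engines an absolute path to the connection file so the relative lookup is not broken by --work-dir (the --file alias for IPEngineApp.url_file is an assumption to verify against your version):

for ri in remoteips:
    c.SSHEngineSetLauncher.engines[remoteuser + '@' + ri] = (1, [
        '--work-dir=' + remoteworkdir,
        # absolute path, so the engine does not resolve it relative to work-dir
        '--file=/home/' + remoteuser + '/.ipython/profile_' + projectname +
        '/security/ipcontroller-engine.json',
    ])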

Cannot read large buffer/file from engine

I'm trying to read a file / large string buffer from an engine, but it is returning <memory at 0x036F3B70> instead of the file's data.

To reproduce:

  1. Start an ipcluster
  2. Create a client
  3. Create a file > 1MB
  4. Read the file using apply function
In [2]: from ipyparallel import Client

In [3]: c = Client()

In [4]: d = c[0]

In [5]: def read(path):
   ...:     with open(path,'rb') as f:
   ...:         return f.read(1024000)
   ...:

In [6]: r = d.apply_async(read,p)

In [7]: r.get()
Out[7]: '<memory at 0x0319E5D0>'

In [8]: def read():
   ...:     return 'a'*1024000
   ...:

In [9]: r = d.apply_async(read)

In [10]: r.get()
Out[10]: '<memory at 0x0319E580>'

This worked fine on 3.1.0.

`ipyparallel.Reference` vs `view.__getitem__`

If e0 is a blocking view on an engine which has a local variable a, and this function returns the same object id, is the takeaway that these are two ways of referring to the same object?

If so, what is the purpose of bringing up ipyparallel.Reference, especially since it doesn't explicitly state which engine the variable is coming from?

e0.apply(lambda x,y: (id(x),id(y)), ipyparallel.Reference('a'), e0['a'])
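A hedged sketch of the distinction as I understand the two APIs: e0['a'] pulls the value back to the client and re-sends it with the call, while ipyparallel.Reference('a') is only resolved in the namespace of whichever engine runs the task:

import ipyparallel as ipp

rc = ipp.Client()
e0 = rc[0]
e0.block = True

e0['a'] = [1, 2, 3]  # push a value into engine 0's namespace

# ipp.Reference('a') travels as a lightweight placeholder and is looked up on
# the engine when the task runs, so no data crosses the wire; e0['a'] fetches
# the value to the client first and then ships it back as an ordinary argument.
print(e0.apply(lambda x, y: (id(x), id(y)), ipp.Reference('a'), e0['a']))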

low-frequency error on module object with `use_dill`

Apparently there's a low-frequency issue (submitted by @jakirkham) that surfaces with code similar to this:

from ipyparallel import Client

client = Client()
client[:].use_dill().get()

with client[:].sync_imports():
    import nest
    from nest import A, B

Which produces:

AttributeError: 'module' object has no attribute '__main__'

I have not been able to reproduce the error, however @tcwalther has -- and provides a workaround.

See uqfoundation/dill#133.

To me, it smells like something ipyparallel should be handling… so, I'm punting this over to an ipyparallel ticket until I hear differently.

SSH engine_cmd has to be a list instead of a string

The current version fails to start when engine_cmd is specified as a string, the way the documentation suggests.

For example:

#------------------------------------------------------------------------------
# SSHEngineLauncher configuration
#------------------------------------------------------------------------------
c.SSHEngineSetLauncher.engines = {
    'hostname01': {
        'n': 8,
        'engine_cmd': '/home/user/bin/ipengine',
    }
}

Produces the following output:

$ ipcluster start
2015-10-01 15:30:49.911 [IPClusterStart] Starting ipcluster with [daemon=False]
2015-10-01 15:30:49.912 [IPClusterStart] Creating pid file: /home/user/.ipython/profile_default/pid/ipcluster.pid
2015-10-01 15:30:49.912 [IPClusterStart] Starting Controller with LocalControllerLauncher
2015-10-01 15:30:50.914 [IPClusterStart] Starting 8 Engines with SSH
2015-10-01 15:30:50.917 [IPClusterStart] ERROR | Engine start failed
Traceback (most recent call last):
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/ipyparallel/apps/ipclusterapp.py", line 332, in start_engines
    self.engine_launcher.start(self.n)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/ipyparallel/apps/launcher.py", line 790, in start
    el.engine_cmd = cmd
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/traitlets/traitlets.py", line 450, in __set__
    new_value = self._validate(obj, value)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/traitlets/traitlets.py", line 471, in _validate
    value = self.validate(obj, value)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/traitlets/traitlets.py", line 1624, in validate
    value = super(List, self).validate(obj, value)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/traitlets/traitlets.py", line 1542, in validate
    value = super(Container, self).validate(obj, value)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/traitlets/traitlets.py", line 1045, in validate
    self.error(obj, value)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/traitlets/traitlets.py", line 899, in error
    raise TraitError(e)
TraitError: The 'engine_cmd' trait of a SSHEngineLauncher instance must be a list, but a value of type 'str' (i.e. '/home/user/bin/ipengine') was specified.
ERROR:tornado.application:Exception in callback <functools.partial object at 0x7fb639149e10>
Traceback (most recent call last):
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/tornado/ioloop.py", line 592, in _run_callback
    ret = callback()
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/ipyparallel/apps/ipclusterapp.py", line 332, in start_engines
    self.engine_launcher.start(self.n)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/ipyparallel/apps/launcher.py", line 790, in start
    el.engine_cmd = cmd
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/traitlets/traitlets.py", line 450, in __set__
    new_value = self._validate(obj, value)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/traitlets/traitlets.py", line 471, in _validate
    value = self.validate(obj, value)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/traitlets/traitlets.py", line 1624, in validate
    value = super(List, self).validate(obj, value)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/traitlets/traitlets.py", line 1542, in validate
    value = super(Container, self).validate(obj, value)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/traitlets/traitlets.py", line 1045, in validate
    self.error(obj, value)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/traitlets/traitlets.py", line 899, in error
    raise TraitError(e)
TraitError: The 'engine_cmd' trait of a SSHEngineLauncher instance must be a list, but a value of type 'str' (i.e. '/home/user/bin/ipengine') was specified.

A solution may be either to wrap the value of cmd in a list when it is a string, or to change the documentation to indicate that engine_cmd must be a list.
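A hedged sketch of the config-side workaround implied above: wrap engine_cmd in a list so it satisfies the launcher's List trait:

c.SSHEngineSetLauncher.engines = {
    'hostname01': {
        'n': 8,
        # the SSHEngineLauncher.engine_cmd trait is a List, so wrap the path
        'engine_cmd': ['/home/user/bin/ipengine'],
    }
}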

remote restart ipengine

I think this is actually a pretty old idea, but I was wondering if there has been any movement on it.

In the same way you can restart a kernel in a notebook, it would be awesome if you could restart an ipengine. My understanding is that this would require a nontrivial rewrite of the engine, involving an entire extra monitoring process that just isn't there right now.

Does this seem like something likely to be implemented? If I took a crack at it, would that be helpful? Or is there another plan...

Unhelpful error if engines start before controller

I'm starting an IPython cluster on an HPC cluster. My script to start the cluster first sends a job that starts ipcontroller, waits for 10 seconds, and then starts each engine 0.5 seconds apart. Occasionally, the ipcontroller job will take longer than 10 seconds to start (delay unrelated to ipyparallel, it's due to our HPC cluster), meaning the engines try to connect before the controller has started. I'm passing the engines the path to where the connection file should eventually exist, but seeing as the file does not exist yet the engines fail with the following message:

2015-11-23 08:16:50.073 [IPEngineApp] ERROR | Couldn't start the Engine
Traceback (most recent call last):
  File "\\hpc\python\lib\site-packages\ipyparallel\apps\ipengineapp.py", line 347, in init_engine
    connection_info=self.connection_info,
AttributeError: 'IPEngineApp' object has no attribute 'connection_info'

This seems to be because IPEngineApp.load_connector_file never gets called in IPEngineApp.init_engine because it doesn't wait for the url file. I guess I'm not supposed to be doing this (instead I should be telling it the IPython directory and the profile name, that way it should wait for the url file to arrive), but it would be nice if it would either fail more gracefully or just wait for the url file for the time specified by IPEngineApp.wait_for_url_file in this case.
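A hedged config sketch along the lines suggested above: point the engine at the profile (so it actually waits for the file) and raise the IPEngineApp.wait_for_url_file trait mentioned in the text:

# ipengine_config.py (sketch): wait up to a minute for the connection file
c.IPEngineApp.wait_for_url_file = 60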

problem with Futures.ipynb

Hi, under Windows / Python 3.4, I get an error on "time" (cell 10). Is it my install, or is it a typo in this example?

from tornado.gen import coroutine, sleep
from tornado.ioloop import IOLoop
import sys

def sleep_task(t):
    time.sleep(t)
    return os.getpid()

@coroutine
def background():
    """A background coroutine to demonstrate that we aren't blocking"""
    while True:
        yield sleep(1)
        print('.', end=' ')
        sys.stdout.flush() # not needed after ipykernel 4.3

@coroutine
def work():
    """Submit some work and print the results when complete"""
    for t in [ 1, 2, 3, 4 ]:
        ar = rc[:].apply(sleep_task, t)
        result = yield ar # this waits
        print(result)

loop = IOLoop()
loop.add_callback(background)
loop.run_sync(work)

[Engine Exception]
---------------------------------------------------------------------------NameError                                 Traceback (most recent call last)<string> in <module>()
<ipython-input-10-b9c3715261e4> in sleep_task(t)
NameError: name 'time' is not defined

[Engine Exception]
---------------------------------------------------------------------------NameError                                 Traceback (most recent call last)<string> in <module>()
<ipython-input-10-b9c3715261e4> in sleep_task(t)
NameError: name 'time' is not defined
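The NameError comes from the engines, where time (and os) were never imported; a hedged fix is to import them inside the task function so the names exist wherever it runs:

def sleep_task(t):
    # imports live inside the function so they are available on the engines
    import os
    import time
    time.sleep(t)
    return os.getpid()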



Feedback misleading when stopping engines

It seems that the logger is using a cut/paste misleading output when closing down the engines:

$ ipcluster start -n 6
2016-03-07 16:19:20.301 [IPClusterStart] Starting ipcluster with [daemon=False]
2016-03-07 16:19:20.302 [IPClusterStart] Creating pid file: /Users/klay6683/.ipython/profile_default/pid/ipcluster.pid
2016-03-07 16:19:20.302 [IPClusterStart] Starting Controller with LocalControllerLauncher
2016-03-07 16:19:21.308 [IPClusterStart] Starting 6 Engines with LocalEngineSetLauncher
^C2016-03-07 16:19:50.707 [IPClusterStart] ERROR | IPython cluster: stopping
2016-03-07 16:19:50.707 [IPClusterStart] Stopping Engines...
2016-03-07 16:19:51.845 [IPClusterStart] Engines appear to have started successfully
2016-03-07 16:19:53.710 [IPClusterStart] Removing pid file: /Users/klay6683/.ipython/profile_default/pid/ipcluster.pid
(py35)

After the line with ERROR, which is when I stopped the cluster with Ctrl-C, it again states that the "Engines appear to have started successfully".

Version: ipcluster 4.1.2 on OSX

ipyparallel - 'CannedFunction' object is not callable

I am trying to set up a cluster using StarCluster and ipyparallel. When I try to run the following commands, I get the error below. I have never seen a Python error missing a stack trace this way.

from ipyparallel import Client
rc = Client()
ipview = rc[:]
res = ipview.apply_async(lambda x: x+3, 3)
res.get()

I get the following error:

[0:apply]: 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)<string> in <module>()
TypeError: 'CannedFunction' object is not callable

[1:apply]: 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)<string> in <module>()
TypeError: 'CannedFunction' object is not callable

Not even sure where to start debugging this.

Here is the output from pip freeze on the master node

apt-xapian-index==0.45
backports-abc==0.4
backports.ssl-match-hostname==3.5.0.1
boto==2.3.0
certifi==2016.2.28
chardet==2.0.1
Cheetah==2.4.4
cloud-init==0.7.2
configobj==4.7.2
Cython==0.17.4
decorator==4.0.9
distro-info==0.10
Django==1.6.1
docutils==0.10
drmaa==0.7.5
dumbo==0.21.36
euca2ools==2.1.1
futures==3.0.5
ipykernel==4.3.1
ipyparallel==5.0.1
ipython==4.0.0
ipython-cluster-helper==0.5.1
ipython-genutils==0.1.0
Jinja2==2.6
jupyter-client==4.2.1
jupyter-core==4.1.0
Landscape-Client==13.7.3
M2Crypto==0.21.1
Mako==0.7.3
MarkupSafe==0.15
matplotlib==1.2.1
meld3==0.6.10
mercurial==2.2.2
mpi4py==1.3.1
multiprocessing==2.6.2.1
netifaces==0.10.4
nose==1.1.2
numexpr==2.0.1
numpy==1.8.0
oauth==1.0.1
openpyxl==1.5.8
PAM==0.4.2
pandas==0.12.0
paramiko==1.7.7.1
path.py==8.1.2
pexpect==4.0.1
pickleshare==0.6
Pillow==2.3.0
prettytable==0.6.1
pssh==2.2.2
ptyprocess==0.5.1
pudb==2013.5.1
pycrypto==2.6
pycurl==7.19.0
Pygments==1.6
pyOpenSSL==0.13
pyparsing==1.5.7
pyserial==2.6
python-apt===0.8.8ubuntu6
python-dateutil==1.5
python-debian===0.1.21-nmu2ubuntu1
python-distutils-extra==2.37
pytz==2012rc0
PyYAML==3.10
pyzmq==15.2.0
requests==1.1.0
roman==1.4.0
scipy==0.13.2
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
Sphinx==1.1.3
ssh-import-id==3.14
statsmodels==0.4.2
supervisor==3.0
tables==2.4.0
tornado==4.3
traitlets==4.1.0
Twisted-Core==12.3.0
Twisted-Names==12.3.0
Twisted-Web==12.3.0
typedbytes==0.3.8
urllib3==1.5
urwid==1.1.2
virtualenv==1.11
xlrd==0.6.1
xlwt==0.7.4
zope.interface==4.0.5

Here is the pip freeze from the only slave node

apt-xapian-index==0.45
backports-abc==0.4
backports.ssl-match-hostname==3.5.0.1
boto==2.3.0
certifi==2016.2.28
chardet==2.0.1
Cheetah==2.4.4
cloud-init==0.7.2
configobj==4.7.2
Cython==0.17.4
decorator==4.0.9
distro-info==0.10
Django==1.6.1
docutils==0.10
drmaa==0.7.5
dumbo==0.21.36
euca2ools==2.1.1
futures==3.0.5
ipykernel==4.3.1
ipyparallel==5.0.1
ipython==4.0.0
ipython-cluster-helper==0.5.1
ipython-genutils==0.1.0
Jinja2==2.6
jupyter-client==4.2.1
jupyter-core==4.1.0
Landscape-Client==13.7.3
M2Crypto==0.21.1
Mako==0.7.3
MarkupSafe==0.15
matplotlib==1.2.1
meld3==0.6.10
mercurial==2.2.2
mpi4py==1.3.1
multiprocessing==2.6.2.1
netifaces==0.10.4
nose==1.1.2
numexpr==2.0.1
numpy==1.8.0
oauth==1.0.1
openpyxl==1.5.8
PAM==0.4.2
pandas==0.12.0
paramiko==1.7.7.1
path.py==8.1.2
pexpect==4.0.1
pickleshare==0.6
Pillow==2.3.0
prettytable==0.6.1
pssh==2.2.2
ptyprocess==0.5.1
pudb==2013.5.1
pycrypto==2.6
pycurl==7.19.0
Pygments==1.6
pyOpenSSL==0.13
pyparsing==1.5.7
pyserial==2.6
python-apt===0.8.8ubuntu6
python-dateutil==1.5
python-debian===0.1.21-nmu2ubuntu1
python-distutils-extra==2.37
pytz==2012rc0
PyYAML==3.10
pyzmq==15.2.0
requests==1.1.0
roman==1.4.0
scipy==0.13.2
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
Sphinx==1.1.3
ssh-import-id==3.14
statsmodels==0.4.2
supervisor==3.0
tables==2.4.0
tornado==4.3
traitlets==4.1.0
Twisted-Core==12.3.0
Twisted-Names==12.3.0
Twisted-Web==12.3.0
typedbytes==0.3.8
urllib3==1.5
urwid==1.1.2
virtualenv==1.11
xlrd==0.6.1
xlwt==0.7.4
zope.interface==4.0.5

Feature proposal: Launch ipcluster from python

For simple parallelization requirements on a single host it is often an extra burden (especially when designing APIs) to require the user to launch ipcluster. If instead I could also launch these from within my Python library it could be a very nice replacement for multiprocessing which supports this mode. Specifically, if my library is using multiprocessing the user doesn't have to care about anything and get single-machine parallelization for free. The problem is that multiprocessing sucks and I'd much rather use ipyparallel for this as well.

I suppose there might be an unsupported way to do this already by importing ipcluster somehow.

Anyway, I think that would be useful for certain problems.
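For what it's worth, newer ipyparallel releases (7.x) expose exactly this through the Cluster API; a hedged sketch of what it looks like there:

import ipyparallel as ipp

# launches a controller and 4 local engines directly, no ipcluster CLI needed
cluster = ipp.Cluster(n=4)
rc = cluster.start_and_connect_sync()
print(rc[:].apply_sync(lambda: "hello from an engine"))
cluster.stop_cluster_sync()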

Yielding results instead of returning

Hello,

We are trying to make it possible to yield results from ipyparallel. This would make it possible to 'stream' the results from a notebook, instead of returning one result.

The python code would look like this:

for i in range(10):
   do_yield(i)

Some of our notebooks run for a long time and have very big results. Streaming the results would make that easier.

Would the current architecture of ipyparallel/jupyter allow this?

Marco
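A single task still returns only once, but results can already be "streamed" per task by iterating an AsyncMapResult; a hedged sketch:

import ipyparallel as ipp

rc = ipp.Client()
view = rc.load_balanced_view()

# ordered=False yields each result as soon as its task finishes,
# which gives a streaming effect across tasks (not within one task)
ar = view.map_async(lambda i: i * i, range(10), ordered=False)
for result in ar:
    print(result)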

Tasks DB not cleaned up after `ipcluster stop`

Discovered after the database swelled to 2/3 of a TB after many runs. It would be nice if this were cleaned up on cluster shutdown. If that is the intended behavior, then it appears not to be working.

Segmentation fault from ipcontroller

I'm getting a Segmentation fault when launching ipcontroller:

ipcontroller --ip=192.168.10.1
2016-03-04 17:41:14.897 [IPControllerApp] Hub listening on tcp://192.168.10.1:43318 for registration.
2016-03-04 17:41:14.900 [IPControllerApp] Hub using DB backend: 'DictDB'
2016-03-04 17:41:15.153 [IPControllerApp] hub::created hub
2016-03-04 17:41:15.158 [IPControllerApp] writing connection info to /home/ubuntu/.ipython/profile_default/security/ipcontroller-client.json
2016-03-04 17:41:15.159 [IPControllerApp] writing connection info to /home/ubuntu/.ipython/profile_default/security/ipcontroller-engine.json
2016-03-04 17:41:15.161 [IPControllerApp] task::using Python leastload Task scheduler
2016-03-04 17:41:15.162 [IPControllerApp] Heartmonitor started
2016-03-04 17:41:15.171 [IPControllerApp] Creating pid file: /home/ubuntu/.ipython/profile_default/pid/ipcontroller.pid
2016-03-04 17:41:15.180 [scheduler] 
Segmentation fault (core dumped)

Here is the output from /var/log/syslog:

Mar  4 17:50:03 clustercontroller2 kernel: [176816.900730] ipcontroller[5775]: segfault at 90b517b990 ip 00007f90b861044d sp 00007ffc18343ee0 error 6 in socket.so[7f90b8603000+17000]
Mar  4 17:50:05 clustercontroller2 kernel: [176819.884944] ipcontroller[5759]: segfault at 90a71135a8 ip 00007f90b861044d sp 00007ffc18344490 error 6 in socket.so[7f90b8603000+17000]

OS: Ubuntu 14.04
Python 2.7.11 :: Anaconda 2.4.1 (64-bit)
ipyparallel: 5.0.1

Question: possible to refresh asyncResult for new client without holding msg_ids?

I'm using a client that connects to the hub via ssh. That ssh connection sometimes times out.
Right now, to reconnect, I start a new client, hold a list of all jobs' msg_ids, and rebuild the asyncresults by querying the hub database by msg_id.

Is it possible to do the same thing without writing code to keep track of all the msg_ids and refresh each async result? This is not that hard when kept in a single list, but it's a pain when nested lists of jobs are needed, or jobs need restarting (and thus msg_ids change).

Is there a way to reconnect an existing client?
Or a way to tell an async result to associate itself to a new client?

Thanks.
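A hedged sketch of one way to do this with the existing client API: ask the Hub for its message history and rebuild results from that, rather than tracking msg_ids by hand:

import ipyparallel as ipp

rc = ipp.Client()            # fresh client after the SSH connection dropped
msg_ids = rc.hub_history()   # all msg_ids the Hub still knows about
ar = rc.get_result(msg_ids)  # rebuild an AsyncResult for them
ar.wait()
results = ar.get()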

Strategies for profiling functions in ipyparallel

I have a function that is going slow and I would like to figure out what is holding it up. I have profiled the underlying non-parallel function and it goes about as fast as it can on data of a size it can handle. However, it really needs to work on blocks of data in parallel to finish in a reasonable amount of time and to handle data of any significant size. So, I really need to profile it while it is running in parallel with ipyparallel. Are there any suggested ways to go about this? Is it possible to use some existing tool like line_profiler? If not, what other tools might I look at? Does ipyparallel have any tricks for doing this sort of thing?
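One generic approach (not an ipyparallel feature): run the function under cProfile on each engine and return the stats text alongside the result; a hedged sketch, with a hypothetical target function process_block and DirectView dview:

import cProfile
import io
import pstats

def run_profiled(f, *args, **kwargs):
    """Run f(*args, **kwargs) under cProfile; return (result, stats_text)."""
    prof = cProfile.Profile()
    prof.enable()
    result = f(*args, **kwargs)
    prof.disable()
    buf = io.StringIO()
    pstats.Stats(prof, stream=buf).sort_stats('cumulative').print_stats(10)
    return result, buf.getvalue()

# usage sketch; passing the target function as an argument relies on
# ipyparallel's usual function canning:
# result, stats_text = dview.apply_sync(run_profiled, process_block, block)
# print(stats_text)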

Stop cluster command under Python 3 for ssh

Hi, I have been trying to get ipyparallel to work and I get the following exception:

[E 2015-10-23 11:18:50.408 john ioloop:612] Exception in callback functools.partial(<...null_wrapper at 0x7f6e3c0ba730>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.4/site-packages/tornado/ioloop.py", line 592, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.4/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.4/site-packages/ipyparallel/apps/launcher.py", line 297, in <lambda>
    self.killer = self.loop.add_timeout(self.loop.time() + delay, lambda : self.signal(SIGKILL))
  File "/opt/conda/lib/python3.4/site-packages/ipyparallel/apps/launcher.py", line 638, in signal
    self.process.stdin.write('~.')
TypeError: 'str' does not support the buffer interface

I got this when clicking "Stop" on my running cluster.
I am running an SSH cluster (controller and engines over SSH).
Having looked at Stack Overflow, I think the problem is that the stdin write syntax is different in Python 3 than in Python 2, and this is within my running kernel.
Can we change the line in ipyparallel/apps/launcher.py from:

self.process.stdin.write('~.')

to

self.process.stdin.write(bytearray('~.', 'ascii'))

PBS template issue, no such file or directory

I encountered something very similar to issue #37.
I followed the guide at http://ipyparallel.readthedocs.org/en/stable/process.html#using-ipcluster-in-pbs-mode to set up an IPython cluster in PBS mode.

However, I could not start it and received the following error message

user@pc:~> ipcluster start --profile=pbs -n 1
2015-12-17 19:59:46.950 [IPClusterStart] Starting ipcluster with [daemon=False]
2015-12-17 19:59:46.951 [IPClusterStart] Creating pid file: /home/user/.ipython/profile_pbs/pid/ipcluster.pid
2015-12-17 19:59:46.952 [IPClusterStart] Starting Controller with PBS
2015-12-17 19:59:46.965 [IPClusterStart] ERROR | Controller start failed
Traceback (most recent call last):
  File "/myapps/anaconda/lib/python2.7/site-packages/ipyparallel/apps/ipclusterapp.py", line 503, in start_controller
    self.controller_launcher.start()
  File "/myapps/anaconda/lib/python2.7/site-packages/ipyparallel/apps/launcher.py", line 1203, in start
    return super(PBSControllerLauncher, self).start(1)
  File "/myapps/anaconda/lib/python2.7/site-packages/ipyparallel/apps/launcher.py", line 1151, in start
    output = check_output(self.args, env=os.environ)
  File "/myapps/anaconda/lib/python2.7/subprocess.py", line 566, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/myapps/anaconda/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/myapps/anaconda/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
ERROR:tornado.application:Exception in callback <functools.partial object at 0x7fffeeafc788>
Traceback (most recent call last):
  File "/myapps/anaconda/lib/python2.7/site-packages/tornado/ioloop.py", line 600, in _run_callback
    ret = callback()
  File "/myapps/anaconda/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/myapps/anaconda/lib/python2.7/site-packages/ipyparallel/apps/ipclusterapp.py", line 548, in start
    self.start_controller()
  File "/myapps/anaconda/lib/python2.7/site-packages/ipyparallel/apps/ipclusterapp.py", line 503, in start_controller
    self.controller_launcher.start()
  File "/myapps/anaconda/lib/python2.7/site-packages/ipyparallel/apps/launcher.py", line 1203, in start
    return super(PBSControllerLauncher, self).start(1)
  File "/myapps/anaconda/lib/python2.7/site-packages/ipyparallel/apps/launcher.py", line 1151, in start
    output = check_output(self.args, env=os.environ)
  File "/myapps/anaconda/lib/python2.7/subprocess.py", line 566, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/myapps/anaconda/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/myapps/anaconda/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

I'm sure that it reads my PBS template file correctly (when I troubleshot with a fake path, it reported a "no such file" error pointing at that wrong path).

Here is my PBS template file for both ipengine and ipcontroller

ipengine template
#PBS -V
#PBS -l walltime=8:00:00
#PBS -l nodes={n}:ppn=8
#PBS -j oe
#PBS -N ipcluster
#PBS -q myqueuename

cd $PBS_O_WORKDIR
mpiexec -n {n} ipengine --profile-dir={profile_dir}

ipcontroller template
#PBS -V
#PBS -l walltime=8:00:00
#PBS -l nodes={n}:ppn=8
#PBS -j oe
#PBS -N ipcluster
#PBS -q myqueuename

cd $PBS_O_WORKDIR
ipcontroller --profile-dir={profile_dir}

Ipython version 4.0.1
Ipyparallel version 4.1.0
Python 2.7.11

SSH Engine dict configuration not fully supported

Current version fails to start when configuring SSH Engines using dicts.

For example:

#------------------------------------------------------------------------------
# SSHEngineLauncher configuration
#------------------------------------------------------------------------------
c.SSHEngineSetLauncher.engines = {
    'delfoslab02': {
        'n': 8,
        'engine_cmd': '/home/user/bin/ipengine',
    }
}

Produces the following output:

$ ipcluster start
2015-10-01 15:16:19.313 [IPClusterStart] Starting ipcluster with [daemon=False]
2015-10-01 15:16:19.314 [IPClusterStart] Creating pid file: /home/user/.ipython/profile_default/pid/ipcluster.pid
2015-10-01 15:16:19.314 [IPClusterStart] Starting Controller with LocalControllerLauncher
ERROR:tornado.application:Exception in callback <functools.partial object at 0x7f866ebc8e10>
Traceback (most recent call last):
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/tornado/ioloop.py", line 592, in _run_callback
    ret = callback()
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/ipyparallel/apps/ipclusterapp.py", line 329, in start_engines
    n = getattr(self.engine_launcher, 'engine_count', self.n)
  File "/home/user/.virtualenvs/ipyparallel/local/lib/python2.7/site-packages/ipyparallel/apps/launcher.py", line 745, in engine_count
    count += n
TypeError: unsupported operand type(s) for +=: 'int' and 'dict'

Apparently, it attempts to add the whole dict into the engine count variable instead of its 'n' value.

Not able to install IPython Clusters tab in Jupyter Notebook

I followed the README, but the Clusters tab is not showing up, and when I launch the notebook I get:

[W 16:30:03.356 NotebookApp] Collisions detected in jupyter_notebook_config.py and jupyter_notebook_config.json config files. jupyter_notebook_config.json has higher priority: { "NotebookApp": { "server_extensions": "<traitlets.config.loader.LazyConfigValue object at 0x7f3dab33beb8> ignored, using ['nbextensions']" }

I am on Linux Mint 17.2 with Python 3 (conda) and have already installed nbextension. Any idea?

Logging messages from engines

I am looking for a good way to log messages from the running engines. I noticed iopubwatcher.py and have used something similar, but I was looking for something more robust. Currently, I start the watcher and then launch my code in a separate process and it is starting to get a little messy.

ipcluster disappeared from 4.0.1

Hello,

I pip-upgraded ipyparallel to 4.0.1 using Python 3 on Ubuntu 14.04, and the ipcluster command was not there anymore.

To be sure, I pip-installed ipyparallel 4.0.0 back, and the ipcluster command came back.

Can you fix this? It makes my system unusable...

Thanks a lot and best regards.

Seeing `get_default_value` is deprecated when running `ipcontroller`

Here is the log I am seeing in stderr for ipcontroller.

2015-10-21 15:39:23.944 [IPControllerApp] Using existing profile dir: u'/root/.ipython/profile_sge'
/opt/conda/lib/python2.7/site-packages/ipyparallel/controller/hub.py:261: UserWarning: get_default_value is deprecated: use the .default_value attribute
  scheme = TaskScheduler.scheme_name.get_default_value()
/opt/conda/lib/python2.7/site-packages/ipyparallel/apps/ipcontrollerapp.py:406: UserWarning: get_default_value is deprecated: use the .default_value attribute
  scheme = TaskScheduler.scheme_name.get_default_value()
2015-10-21 15:39:24.225 [scheduler] Scheduler started [leastload]

It seems there are a couple of places where this is in use (hub.py and ipcontrollerapp.py, as shown in the log above), both calling:

scheme = TaskScheduler.scheme_name.get_default_value()

Related to #35

command-line flag for --ip=*

The simplest way I've found to connect to a multi-node HPC setup has been to use qsub to connect to multiple compute nodes, and then to use ipcluster start with --Engines=MPI to talk to all the nodes, like below:

## from Login node connect interactively (or not) to 3 nodes with 8 cpus each
qsub -I -q {queue_name} -l nodes=3:ppn=8

## ... wait for connections to establish

## Now on the compute node run ipcluster with MPI setup
ipcluster start --n 24 --Engines=MPI --daemon

## run ipyparallel script which shows I'm connected to multiple hostnames
...

However, following the documentation I found I also had to create a new IPython profile and manually edit the ipcluster_config.py file setting c.HubFactory.ip = '*' to get the client to view all nodes.

Since that is the only variable that needs to be edited by hand in order to set up an MPI connection across multiple nodes on an HPC, it seems it would be really handy if it could just be set with a flag from ipcluster, like below:

ipcluster start --n 24 --Engines=MPI --ip=* --daemon

As a separate question: I also tried using the PBS setup through ipcluster, but I'm not sure I understand the intended workflow for that mode. It seems that once the templates are set up and submitted, ipcluster will be connected to multiple nodes/engines, but the user will still be left hanging out on the login node, which doesn't seem ideal. Should a user generally log in to a compute node, then submit ipcontroller to another compute node, and ipengines to multiple other nodes? That setup seems like it would always require more requests, and thus likely more wait time. Perhaps a more detailed example of the intended use of PBS mode would make it clearer. Thanks!
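For reference, a hedged sketch of the one-line edit described above, in the profile's ipcontroller_config.py (or the ipcluster_config.py mentioned in the text):

# listen on all interfaces so engines on other compute nodes can register
c.HubFactory.ip = '*'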

Windows ssh tunnel processes do not exit on Client close

On windows, when using ssh, the Client should hold a reference to the tunnels created and correctly close/end the processes when the client closes. Currently they hang around forever in the case of a crash/unclean shutdown.

Here:

tunnel.tunnel_connection(self._query_socket, cfg['registration'], sshserver, **ssh_kwargs)

and

connect_socket(self._mux_socket, cfg['mux'])
and the following similar connections.

Clusters tab

I'm confused as to how to enable the clusters tab in the latest development version

Is the syntax

c.NotebookApp.jupyter_server_extensions.append('ipyparallel.nbextension')

as per https://github.com/ipython/ipyparallel/blob/master/README.md or

c.NotebookApp.server_extensions.append('ipyparallel.nbextension')

as suggested in #11?

Neither of the options works for me. I just get the link to the repository. I also can't find any hint in the debug messages as to what is going on. Is this a bug or am I missing something?

$ipython notebook --debug
[D 17:26:44.196 NotebookApp] Config changed:
[D 17:26:44.196 NotebookApp] {'NotebookApp': {'log_level': 10}}
...
[D 17:26:44.199 NotebookApp] Loaded config file: /Users/rein/.jupyter/jupyter_notebook_config.py
[D 17:26:44.200 NotebookApp] Config changed:
[D 17:26:44.200 NotebookApp] {'NotebookApp': {'open_browser': False, 'server_extensions': ['ipyparallel.nbextension'], 'log_level': 10, 'ip': '*', 'certfile': u'/Users/rein/git/rebound/mycert.pem', 'password': u'sha1:****:*****', 'port': 9999}}
[D 17:26:44.200 NotebookApp] Attempting to load config file jupyter_notebook_config.py in path /Users/rein/ipython
[D 17:26:44.200 NotebookApp] Attempting to load config file jupyter_notebook_config.json in path /Users/rein/ipython
[D 17:26:44.226 NotebookApp] Found kernel python2 in /Users/rein/Library/Jupyter/kernels
[I 17:26:44.244 NotebookApp] Serving notebooks from local directory: /Users/rein/ipython
[I 17:26:44.244 NotebookApp] 0 active kernels 
[I 17:26:44.244 NotebookApp] The IPython Notebook is running at: https://[all ip addresses on your system]:9999/
[I 17:26:44.244 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

PBS job templates not picked up?

I'm using ipyparallel 4.0.2 and friends (with Python 3.4) installed in a virtualenv with pip. I'm trying to get a set of engines+controller running on a system with the Torque batch system, i.e. PBS. I've followed the documentation at https://ipyparallel.readthedocs.org/en/stable/process.html#using-ipcluster-in-pbs-mode to get the configuration set up.

When I launch with "ipcluster start --profile=1engine-per-node -n 2" my job templates are not being used (the actual job name is different from what I specify in the templates with -N and the walltime is incorrect).

The steps I've done:

  1. ipython profile create --parallel --profile=1engine-per-node
  2. Edit in ~/.ipython/profile_1engine-per-node
    a) ipcluster_config.py:

c.IPClusterEngines.engine_launcher_class = 'PBS'
c.IPClusterStart.controller_launcher_class = 'PBS'
c.IPClusterEngines.n = 4
c.PBSControllerLauncher.batch_file_name = 'controller.template'
c.PBSEngineSetLauncher.batch_file_name = 'engine.template'

b) ipcontroller_config.py:

c.HubFactory.ip = '*'

  3. Added templates:
    a) .ipython/profile_1engine-per-node/controller.template

#PBS -N ipyparallel-controller
#PBS -j oe
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=4

cd $PBS_O_WORKDIR

source $HOME/pyenv/3.4/bin/activate

ipcontroller --profile-dir={profile_dir}

b) .ipython/profile_1engine-per-node/engine.template

#PBS -N ipyparallel-engine
#PBS -j oe
#PBS -l walltime=01:00:00
#PBS -l nodes={n}:ppn=1

cd $PBS_O_WORKDIR

source $HOME/pyenv/3.4/bin/activate

module load openmpi/gnu/1.6.5

which mpiexec -n {n} ipengine --profile-dir={profile_dir}

Btw, are the profiles that get generated with "profile create" written based on inspecting Python classes or something? I generated a new dummy parallel profile so I could compare what I had changed in my 1engine-per-node profile, but a diff shows a vastly different order of the config items, making direct comparison hard.

memory overflow while parallelizing

Hi all,
I am parallelizing a linear regression on an Azure cluster (16 cores, 110 GB RAM) and I am constantly running into memory overflow problems (which should not be the case; the data is of size 1500 x 500, thus rather small).
I am running my code via a jupyter notebook on python 2.7.
Is this a known issue? I tried to search for similar problems but could not find any.
Thank you in advance.

import ipyparallel as ipp
from functools import partial

def regression(index, model, res, type_run):
    import numpy as np
    from sklearn import linear_model
    from numpy import exp, log
    # do a linear regression with certain prameters
    pass


def loop_regressions(model, res, type_run):
    """
    Function that runs over the different cv loops
    :param model:
    :param res: data input
    :param type_run:
    :return:
    """
    print('################# start loop %s #################' % model)
    max_index = 25
    c = ipp.Client()
    print c.ids

    c[:].push(dict(regression=regression))

    if True:
        results = c[:].map_sync(partial(regression, model=model, res=res, type_run=type_run), range(max_index))

    return(results)
a = loop_regression("LinearRegression", res, "benchmark")


---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-8-ddf8efec8c4e> in <module>()
     27     #print('################# end loop CVs %s ###################' % model)
     28     return(results)
---> 29 a =loop_regression("LinearRegression", res, "benchmark")
     30 print a

<ipython-input-8-ddf8efec8c4e> in loop_cv_approaches(model, res, burnin, type_run, cv_selector)
     19     if True:
     20        
---> 21         results = c[:].map_sync(partial(regression, model=model, res=res,  type_run=type_run), range(max_index))
     22        
     23 

/usr/local/lib/python2.7/dist-packages/ipyparallel/client/view.pyc in map_sync(self, f, *sequences, **kwargs)
    351             raise TypeError("map_sync doesn't take a `block` keyword argument.")
    352         kwargs['block'] = True
--> 353         return self.map(f,*sequences,**kwargs)
    354 
    355     def imap(self, f, *sequences, **kwargs):

<decorator-gen-142> in map(self, f, *sequences, **kwargs)

/usr/local/lib/python2.7/dist-packages/ipyparallel/client/view.pyc in sync_results(f, self, *args, **kwargs)
     52     self._in_sync_results = True
     53     try:
---> 54         ret = f(self, *args, **kwargs)
     55     finally:
     56         self._in_sync_results = False

/usr/local/lib/python2.7/dist-packages/ipyparallel/client/view.pyc in map(self, f, *sequences, **kwargs)
    616         assert len(sequences) > 0, "must have some sequences to map onto!"
    617         pf = ParallelFunction(self, f, block=block, **kwargs)
--> 618         return pf.map(*sequences)
    619 
    620     @sync_results

/usr/local/lib/python2.7/dist-packages/ipyparallel/client/remotefunction.pyc in map(self, *sequences)
    266         self._mapping = True
    267         try:
--> 268             ret = self(*sequences)
    269         finally:
    270             self._mapping = False

<decorator-gen-129> in __call__(self, *sequences)

/usr/local/lib/python2.7/dist-packages/ipyparallel/client/remotefunction.pyc in sync_view_results(f, self, *args, **kwargs)
     73     view = self.view
     74     if view._in_sync_results:
---> 75         return f(self, *args, **kwargs)
     76     view._in_sync_results = True
     77     try:

/usr/local/lib/python2.7/dist-packages/ipyparallel/client/remotefunction.pyc in __call__(self, *sequences)
    238             view = self.view if balanced else client[t]
    239             with view.temp_flags(block=False, **self.flags):
--> 240                 ar = view.apply(f, *args)
    241 
    242             msg_ids.extend(ar.msg_ids)

/usr/local/lib/python2.7/dist-packages/ipyparallel/client/view.pyc in apply(self, f, *args, **kwargs)
    218         ``f(*args, **kwargs)``.
    219         """
--> 220         return self._really_apply(f, args, kwargs)
    221 
    222     def apply_async(self, f, *args, **kwargs):

<decorator-gen-141> in _really_apply(self, f, args, kwargs, targets, block, track)

/usr/local/lib/python2.7/dist-packages/ipyparallel/client/view.pyc in sync_results(f, self, *args, **kwargs)
     52     self._in_sync_results = True
     53     try:
---> 54         ret = f(self, *args, **kwargs)
     55     finally:
     56         self._in_sync_results = False

<decorator-gen-140> in _really_apply(self, f, args, kwargs, targets, block, track)

/usr/local/lib/python2.7/dist-packages/ipyparallel/client/view.pyc in save_ids(f, self, *args, **kwargs)
     37     n_previous = len(self.client.history)
     38     try:
---> 39         ret = f(self, *args, **kwargs)
     40     finally:
     41         nmsgs = len(self.client.history) - n_previous

/usr/local/lib/python2.7/dist-packages/ipyparallel/client/view.pyc in _really_apply(self, f, args, kwargs, targets, block, track)
    557         for ident in _idents:
    558             msg = self.client.send_apply_request(self._socket, f, args, kwargs, track=track,
--> 559                                     ident=ident)
    560             if track:
    561                 trackers.append(msg['tracker'])

/usr/local/lib/python2.7/dist-packages/ipyparallel/client/client.pyc in send_apply_request(self, socket, f, args, kwargs, metadata, track, ident)
   1261         bufs = serialize.pack_apply_message(f, args, kwargs,
   1262             buffer_threshold=self.session.buffer_threshold,
-> 1263             item_threshold=self.session.item_threshold,
   1264         )
   1265 

/usr/local/lib/python2.7/dist-packages/ipykernel/serialize.pyc in pack_apply_message(f, args, kwargs, buffer_threshold, item_threshold)
    142 
    143     arg_bufs = list(chain.from_iterable(
--> 144         serialize_object(arg, buffer_threshold, item_threshold) for arg in args))
    145 
    146     kw_keys = sorted(kwargs.keys())

/usr/local/lib/python2.7/dist-packages/ipykernel/serialize.pyc in <genexpr>((arg,))
    142 
    143     arg_bufs = list(chain.from_iterable(
--> 144         serialize_object(arg, buffer_threshold, item_threshold) for arg in args))
    145 
    146     kw_keys = sorted(kwargs.keys())

/usr/local/lib/python2.7/dist-packages/ipykernel/serialize.pyc in serialize_object(obj, buffer_threshold, item_threshold)
     88         buffers.extend(_extract_buffers(cobj, buffer_threshold))
     89 
---> 90     buffers.insert(0, pickle.dumps(cobj, PICKLE_PROTOCOL))
     91     return buffers
     92 

MemoryError: 
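One workaround that is sometimes suggested for this kind of MemoryError is to push the large array to the engines once instead of capturing it in the partial, so it is not re-serialized for every chunk of the map. A minimal sketch, reusing the names from the report above and assuming (as the original snippet already does with push) that pushed names are resolvable in the engines' namespaces; the per-index body is a placeholder:

import ipyparallel as ipp
from functools import partial

# `res` is assumed to already exist on the client, as in the report above.

def regression(index, model, type_run):
    # `res` is deliberately not an argument: it is looked up in the engine's
    # namespace (where it was pushed once below) instead of being serialized
    # again for every task.
    row = res[index % len(res)]   # placeholder for the real per-index work
    return float(row.mean())

c = ipp.Client()
dview = c[:]
# Transfer the function and the large array to each engine exactly once.
dview.push(dict(regression=regression, res=res))

results = dview.map_sync(
    partial(regression, model="LinearRegression", type_run="benchmark"),
    range(25),
)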

AsyncResult._wait_for_output is too heavy on the CPU

When I launch several independent jobs and wait() for them in a PBS cluster, each of them consumes around 60% of the CPU just for waiting. This makes it really inconvenient to launch several jobs from a shared main node that many people are using. I notice that the sleep timeout is just 0.01 seconds, which seems very short to me (especially if I expect the job to take hours).

The nicest solution would be a sleep interval that itself increases over time (up to some maximum value).
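A minimal client-side sketch of the kind of backoff being proposed; wait_with_backoff is a hypothetical helper, not part of ipyparallel, and only relies on the public AsyncResult.ready() and get() methods:

import time

def wait_with_backoff(async_result, initial=0.01, factor=1.5, maximum=5.0):
    # Poll with a sleep interval that grows geometrically up to `maximum`,
    # instead of sleeping a fixed 0.01 s between checks.
    interval = initial
    while not async_result.ready():
        time.sleep(interval)
        interval = min(interval * factor, maximum)
    return async_result.get()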

jupyterhub not seeing running ipclusters

I have JupyterHub 0.2.0 and ipyparallel 4.0.2 installed and working, and can successfully start an ipcluster using the "ipython clusters" tab of the web interface. I've confirmed that the engines are running by creating a basic notebook that prints out the IDs of the running engines:

from ipyparallel import Client
c = Client(profile='default')
c.ids

The notebook shows the IDs correctly. The problem is, when I stop the server from the control panel, start it back up again, and click over to the "ipython clusters" tab, it says there are none running. However, the engines are still running in my process tree when I do ps aux|grep ipyparallel. When I re-run the notebook, it still sees the engines and prints out the IDs.

Is this a known issue? I plan on using this in a cluster-computing environment with a scheduling system, so users will need to see whether they have ipclusters running. As of now, users will be checking the tab, possibly seeing erroneous information, and potentially starting up a whole bunch of different clusters. This would create a bit of a management nightmare. Any thoughts?

Suspecting memory leak

Hello,

I'm using ipyparallel 4.1.0, and I'm under the impression that the engines are suffering from memory leaks.

Here's an example code:

#!/usr/bin/env python3

import ipyparallel

client = ipyparallel.Client()
client.direct_view().use_dill()
balancer = client.load_balanced_view()
balancer.block = True

def myMap(i):
    from numpy.random import rand as nprand
    return nprand(100000000)

def myReduce(mappedData):
    return [x.mean() for x in mappedData]

results = myReduce(balancer.map(myMap, range(1, 10)))

The listings below show the ps output (pid, vsize, rss, size, %mem, args) before and after the execution. You can see that the memory usage has increased.

Please note that these listings were produced from the shell while the script above was not running.

**************
* Before run *
**************
24180 517892 36724 322056  0.1 /usr/bin/python3 -m ipyparallel.controller --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24193 517636 29696 321800  0.0 /usr/bin/python3 -m ipyparallel.controller --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24194 517636 29696 321800  0.0 /usr/bin/python3 -m ipyparallel.controller --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24197 517636 29700 321800  0.0 /usr/bin/python3 -m ipyparallel.controller --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24198 517892 30452 322056  0.0 /usr/bin/python3 -m ipyparallel.controller --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24205 371392 28132 258928  0.0 /usr/bin/python3 -m ipyparallel.engine --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24206 371364 28128 258900  0.0 /usr/bin/python3 -m ipyparallel.engine --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24209 371364 28132 258900  0.0 /usr/bin/python3 -m ipyparallel.engine --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24219 371360 28128 258896  0.0 /usr/bin/python3 -m ipyparallel.engine --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20

*************
* After run *
*************
24180 517892 36860 322056  0.1 /usr/bin/python3 -m ipyparallel.controller --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24193 517636 29696 321800  0.0 /usr/bin/python3 -m ipyparallel.controller --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24194 517636 29696 321800  0.0 /usr/bin/python3 -m ipyparallel.controller --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24197 517636 29700 321800  0.0 /usr/bin/python3 -m ipyparallel.controller --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24198 657292 31224 461456  0.0 /usr/bin/python3 -m ipyparallel.controller --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24205 867612 38604 667376  0.1 /usr/bin/python3 -m ipyparallel.engine --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24206 867612 38420 667376  0.1 /usr/bin/python3 -m ipyparallel.engine --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24209 867608 38436 667372  0.1 /usr/bin/python3 -m ipyparallel.engine --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20
24219 867612 38600 667376  0.1 /usr/bin/python3 -m ipyparallel.engine --profile-dir /home/foo/bar/.ipython/profile_default --cluster-id  --log-to-file --log-level=20

I have the feeling that, on a single thread, the memory is freed up at the start of the first iteration.

It is a problem for me because the cluster is in fact always running, and the mapped tasks usually consume a lot of memory (up to several GB). Since I have 4 threads, the memory of 4 iterations is kept unless I restart the cluster.

Thanks a lot for your help.

Best regards.
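As a possible interim workaround (not a fix for the suspected leak itself), one can try clearing cached results on the client and resetting the engine namespaces between runs. A minimal sketch; whether this actually releases the memory in this scenario is an assumption:

import ipyparallel

client = ipyparallel.Client()
dview = client.direct_view()

# ... run the map from the example above ...

# Drop client-side caches of past results.
client.purge_results('all')      # ask the hub/client to forget stored results
client.results.clear()           # clear the local results dict as well

# Clear the engines' user namespaces so large temporaries can be collected.
dview.execute('%reset -f', block=True)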

Error during data passing in cluster. TypeError: Read-only buffer does not support the buffer interface

I've updated ipython (and ipython cluster).

When I submit a job to the cluster and any of the arguments is large, it fails:

def test_client(x):
    return x

from ipyparallel import Client
import numpy
x = Client(profile='ssh-ipy').load_balanced_view()
x.map_sync(test_client, [numpy.ones(100)])

>> [array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])]

x.map_sync(test_client, [numpy.ones(10000)])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-1dc792795519> in <module>()
      3 x = Client(profile='ssh-ipy').load_balanced_view()
      4 print x.map_sync(test_client, [numpy.ones(100)])
----> 5 print x.map_sync(test_client, [numpy.ones(10000)])

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/view.pyc in map_sync(self, f, *sequences, **kwargs)
    351             raise TypeError("map_sync doesn't take a `block` keyword argument.")
    352         kwargs['block'] = True
--> 353         return self.map(f,*sequences,**kwargs)
    354 
    355     def imap(self, f, *sequences, **kwargs):

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/view.pyc in map(self, f, *sequences, **kwargs)

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/view.pyc in sync_results(f, self, *args, **kwargs)
     52     self._in_sync_results = True
     53     try:
---> 54         ret = f(self, *args, **kwargs)
     55     finally:
     56         self._in_sync_results = False

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/view.pyc in map(self, f, *sequences, **kwargs)

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/view.pyc in save_ids(f, self, *args, **kwargs)
     37     n_previous = len(self.client.history)
     38     try:
---> 39         ret = f(self, *args, **kwargs)
     40     finally:
     41         nmsgs = len(self.client.history) - n_previous

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/view.pyc in map(self, f, *sequences, **kwargs)
   1117 
   1118         pf = ParallelFunction(self, f, block=block, chunksize=chunksize, ordered=ordered)
-> 1119         return pf.map(*sequences)
   1120 
   1121 __all__ = ['LoadBalancedView', 'DirectView']

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/remotefunction.pyc in map(self, *sequences)
    266         self._mapping = True
    267         try:
--> 268             ret = self(*sequences)
    269         finally:
    270             self._mapping = False

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/remotefunction.pyc in __call__(self, *sequences)

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/remotefunction.pyc in sync_view_results(f, self, *args, **kwargs)
     73     view = self.view
     74     if view._in_sync_results:
---> 75         return f(self, *args, **kwargs)
     76     view._in_sync_results = True
     77     try:

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/remotefunction.pyc in __call__(self, *sequences)
    238             view = self.view if balanced else client[t]
    239             with view.temp_flags(block=False, **self.flags):
--> 240                 ar = view.apply(f, *args)
    241 
    242             msg_ids.extend(ar.msg_ids)

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/view.pyc in apply(self, f, *args, **kwargs)
    218         ``f(*args, **kwargs)``.
    219         """
--> 220         return self._really_apply(f, args, kwargs)
    221 
    222     def apply_async(self, f, *args, **kwargs):

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/view.pyc in _really_apply(self, f, args, kwargs, block, track, after, follow, timeout, targets, retries)

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/view.pyc in sync_results(f, self, *args, **kwargs)
     49     """sync relevant results from self.client to our results attribute."""
     50     if self._in_sync_results:
---> 51         return f(self, *args, **kwargs)
     52     self._in_sync_results = True
     53     try:

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/view.pyc in _really_apply(self, f, args, kwargs, block, track, after, follow, timeout, targets, retries)

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/view.pyc in save_ids(f, self, *args, **kwargs)
     37     n_previous = len(self.client.history)
     38     try:
---> 39         ret = f(self, *args, **kwargs)
     40     finally:
     41         nmsgs = len(self.client.history) - n_previous

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/view.pyc in _really_apply(self, f, args, kwargs, block, track, after, follow, timeout, targets, retries)
   1044 
   1045         msg = self.client.send_apply_request(self._socket, f, args, kwargs, track=track,
-> 1046                                 metadata=metadata)
   1047         tracker = None if track is False else msg['tracker']
   1048 

/moosefs/ipython_env/local/lib/python2.7/site-packages/ipyparallel/client/client.pyc in send_apply_request(self, socket, f, args, kwargs, metadata, track, ident)
   1243 
   1244         msg = self.session.send(socket, "apply_request", buffers=bufs, ident=ident,
-> 1245                             metadata=metadata, track=track)
   1246 
   1247         msg_id = msg['header']['msg_id']

/moosefs/ipython_env/local/lib/python2.7/site-packages/jupyter_client/session.pyc in send(self, stream, msg_or_type, content, parent, ident, buffers, track, header, metadata)
    670         if buffers and track and not copy:
    671             # only really track when we are doing zero-copy buffers
--> 672             tracker = stream.send_multipart(to_send, copy=False, track=True)
    673         else:
    674             # use dummy tracker, which will be done immediately

/moosefs/ipython_env/local/lib/python2.7/site-packages/zmq/sugar/socket.pyc in send_multipart(self, msg_parts, flags, copy, track)
    324                 raise TypeError(
    325                     "Frame %i (%s) does not support the buffer interface." % (
--> 326                     i, rmsg,
    327                 ))
    328         for msg in msg_parts[:-1]:

TypeError: Frame 10 (<read-only buffer for 0x1ea31c0,...) does not support the buffer interface.

Configuration:

ipykernel==4.1.1
ipyparallel==4.0.2 
ipython==4.0.0
ipython-genutils==0.1.0
jupyter-client==4.1.1
jupyter-core==4.0.6
numpy==1.10.1
pyzmq==14.7.0

The same result with the latest ipyparallel.

Missing dependencies

It appears this package depends on a few things that aren't explicitly stated; the requirements and version constraints are unclear, as they are not listed in setup.py.

If these are not hard requirements, maybe they should be imported locally or conditionally, so as to avoid issues when importing things from ipyparallel. Admittedly, not importing those modules seems to avoid the issue they present.
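A generic sketch of the conditional-import pattern being suggested; optional_dep is a hypothetical placeholder for whichever soft dependency is meant, not a real module name:

try:
    import optional_dep
except ImportError:
    optional_dep = None

def feature_that_needs_it():
    # Fail only when the optional feature is actually used, not at import time.
    if optional_dep is None:
        raise ImportError("this feature requires the optional 'optional_dep' package")
    return optional_dep.do_something()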

ipyparallel.nbextension doesn't work in Jupyterhub

I've added c.NotebookApp.server_extensions.append('ipyparallel.nbextension') to jupyterhub_config.py, but the cluster tab doesn't show up.

Tried latest master version of Jupyterhub and ipyparallel.

EDIT:

I've noticed that it does work when I add the line in /home/bnijholt/.jupyter/jupyter_notebook_config.py, but the tab shows up without any options when I add it in /etc/ipython/ipython_config.py.

EDIT 2:

I fixed it by making a config in /etc/jupyter/jupyter_notebook_config.py.

There will be others with similar problems that have defined kernels and engines as in http://stackoverflow.com/questions/29773954/change-ipython-3-for-python-3-kernel-to-python2-for-the-cluster-too/29798906#29798906
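For reference, a minimal sketch of the configuration that ended up working, placed in the file mentioned in EDIT 2; only the single appended line is taken from the report above, the surrounding contents of the file are an assumption:

# /etc/jupyter/jupyter_notebook_config.py
c = get_config()
c.NotebookApp.server_extensions.append('ipyparallel.nbextension')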

Deprecation warnings and assertion failures with ipcontroller

Hi,

I've been using ipcontroller and ipengine commands separately to start up an IPython cluster. However, upon updating IPython to 4.0, I'm now getting some errors and warnings printed in the console:

C:\Users\sam.lishak>ipcontroller --cluster-id=cluster
c:\python27\lib\site-packages\ipyparallel\controller\hub.py:261: UserWarning: get_default_value is deprecated: use the .default_value attribute
  scheme = TaskScheduler.scheme_name.get_default_value()
c:\python27\lib\site-packages\ipyparallel\apps\ipcontrollerapp.py:406: UserWarning: get_default_value is deprecated: use the .default_value attribute
  scheme = TaskScheduler.scheme_name.get_default_value()
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\python27\lib\multiprocessing\forking.py", line 380, in main
    prepare(preparation_data)
  File "c:\python27\lib\multiprocessing\forking.py", line 488, in prepare
    assert main_name not in sys.modules, main_name
AssertionError: __main__
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\python27\lib\multiprocessing\forking.py", line 380, in main
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\python27\lib\multiprocessing\forking.py", line 380, in main
    prepare(preparation_data)
  File "c:\python27\lib\multiprocessing\forking.py", line 488, in prepare
    prepare(preparation_data)
  File "c:\python27\lib\multiprocessing\forking.py", line 488, in prepare
    assert main_name not in sys.modules, main_name
AssertionError: __main__
    assert main_name not in sys.modules, main_name
Traceback (most recent call last):
AssertionError  File "<string>", line 1, in <module>
: __main__
  File "c:\python27\lib\multiprocessing\forking.py", line 380, in main
    prepare(preparation_data)
  File "c:\python27\lib\multiprocessing\forking.py", line 488, in prepare
    assert main_name not in sys.modules, main_name
AssertionError: __main__

I don't get the AssertionErrors when I also add --usethreads to the command. I don't get any errors or warnings when using ipcluster.

I'm using Windows 7 with Python 2.7.9 (32-bit) and IPython 4.0/ipyparallel 4.0.2.

Cheers,

Sam

IPython Cluster (SGE) Registration Timeouts

Moved from here ( ipython/ipython#8569 ).

I am trying to debug a situation where I am running into sporadic registration timeouts on the engine. In this Gist, I have included relevant config files and sample output ( https://gist.github.com/jakirkham/b0452178331db511dd0d ). All other config files were simply the result of running ipython profile create --parallel --profile=sge.
To provide more information, this is on a CentOS 6.6 VM on a single machine; as such, there is no need to worry about accessibility between the jobs. The queue has 7 jobs in it in this case and has been configured to limit the number of running jobs based on the number of accessible cores. However, I have run into the same problem with fewer than 7 running jobs as well. All of the jobs are able to start and run successfully. I don't believe this to be a resource issue, as I have run heavy-duty machine learning algorithms in the VM repeatedly without error.

As it is sporadic, I am wondering if timing differences between engines communicating with the controller could be causing the problem. For example, all the engines slam the controller at the same time, leaving the controller unable to respond, and this keeps happening until the timeout limit is reached. Unfortunately, I have had trouble finding more information about parameters that could introduce delays between engine queries, or anything similar, to test this hypothesis.

Any pointers would be appreciated.
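One parameter that may be worth experimenting with here is the engine's registration timeout, which controls how long an engine waits for the controller to answer before giving up. Raising it is only a sketch of a mitigation, not a confirmed fix for the sporadic timeouts:

# ~/.ipython/profile_sge/ipengine_config.py
c = get_config()

# Wait longer for the controller to respond to registration requests
# before giving up (the default is only a few seconds).
c.EngineFactory.timeout = 30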

concurrent.futures compatibility (IPEP 19)

@aarchiba opened ipython/ipython#8893

Since concurrent.futures is standard in python >=3.4 and backported to python 2.7, it is a good way to write portable parallel code. Algorithms that support parallelism can take a pool argument and work with whatever form of parallelism the user chooses - except not IPython parallelism, right now. It would be valuable to add an Executor/Future compatibility layer.
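For illustration, a small sketch of the portable style this would enable: the algorithm accepts any object implementing the concurrent.futures Executor interface, so an IPython Parallel executor could be passed in once such a compatibility layer exists. ProcessPoolExecutor is used here only as a stand-in pool:

from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

def parallel_sum_of_squares(values, pool):
    # Works with any Executor-compatible pool supplied by the caller.
    futures = [pool.submit(square, v) for v in values]
    return sum(f.result() for f in futures)

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(parallel_sum_of_squares(range(10), pool))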

ipcluster

Hi,

I am trying to get ipcluster to work on my university's cluster (which uses Torque as its batch system). The issue I am facing is that, since the startup of the ipengines is done in an array job, they do not necessarily start at the same time (which means I have a lot of idle time waiting for all the engines to come up). Has anybody got any ideas about how to mitigate this problem? It would be ideal if the job waited in the queue until all resources are available and then started the engines and the controller, rather than starting only a few engines and then waiting for more resources...

Also, no matter what I change in my controller and engine templates, I get a walltime of 2 hours; I am currently using qalter to change the walltime after submission, which is a pain...
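For the walltime part, a custom batch template in the profile's ipcluster_config.py is where a longer walltime would normally be set. A sketch, assuming the Torque launchers are used; {n}, {profile_dir} and {cluster_id} are standard template placeholders, while the resource lines and the engine start-up loop are assumptions to adapt to the site:

# ~/.ipython/profile_default/ipcluster_config.py
c = get_config()

c.IPClusterEngines.engine_launcher_class = 'Torque'

c.TorqueEngineSetLauncher.batch_template = """#!/bin/bash
#PBS -N ipengine
#PBS -l walltime=24:00:00
#PBS -l nodes=1:ppn={n}
cd $PBS_O_WORKDIR
for i in $(seq 1 {n}); do
    ipengine --profile-dir={profile_dir} --cluster-id={cluster_id} &
done
wait
"""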
