
xun's Introduction

Xùn

https://en.wikipedia.org/wiki/Bagua


equinor

Xun is a distributed and functional Python framework for cluster compute. Rather than focusing on batching jobs, xun is about defining values declaratively.

Tutorial

Quick Start

Standalone example xun project file for computing Fibonacci numbers:

import xun


@xun.function()
def fibonacci_number(n):
    return f_n_1 + f_n_2
    with ...:
        f_n_1 = (
            0 if n == 0 else
            1 if n == 1 else
            fibonacci_number(n - 1)
        )
        f_n_2 = fibonacci_number(n - 2) if n > 1 else 0


@xun.function()
def fibonacci_sequence(n):
    return sequence
    with ...:
        sequence = [fibonacci_number(i) for i in range(n)]


def main():
    """
    Compute and print the first 10 fibonacci numbers
    """
    blueprint = fibonacci_sequence.blueprint(10)
    sequence = blueprint.run(
        driver=xun.functions.driver.Sequential(),
        store=xun.functions.store.Memory(),
    )
    for num in sequence:
        print(num)


if __name__ == '__main__':
    main()

To see a visualization of the call graph:

xun graph examples/fibonacci.py "fibonacci_sequence(10)"

A closer look

Let's break down the code of fibonacci_number in the example above into four parts.

@xun.function()

The decorator @xun.function() compiles this function into a xun function. Xun functions are functions that are meant to be executed in parallel, possibly on remote workers.

def fibonacci_number(n):

The function definition is just a normal python function definition.

    return f_n_1 + f_n_2

The body of the function is regular Python. As expected, it has access to the function arguments, but it also has access to the variables defined in the special xun definitions statement.

    with ...:
        f_n_1 = (
            0 if n == 0 else
            1 if n == 1 else
            fibonacci_number(n - 1)
        )
        f_n_2 = fibonacci_number(n - 2) if n > 1 else 0

Statements of the form with ...: are referred to as xun definitions. They introduce new syntax and rules that are covered in the next section. Note, for example, that the recursive calls to fibonacci_number are memoized in the context store and can therefore, after scheduling, be run in parallel.

In fact, xun works by first figuring out all the calls that will happen, building a call graph, and scheduling the calls such that any previous call that we may depend on is executed before we evaluate the current call. This requires the call graph to be a directed acyclic graph (DAG).
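The idea of building a call graph first and then executing it in dependency order can be sketched in plain Python. This is a minimal illustration using the standard library's graphlib module, not xun's actual scheduler; the helper names fibonacci_dependencies, build_call_graph and run are hypothetical.

```python
from graphlib import TopologicalSorter


def fibonacci_dependencies(n):
    """Calls that fibonacci_number(n) depends on, mirroring the example above."""
    return [n - 1, n - 2] if n > 1 else []


def build_call_graph(n):
    """Map every reachable call to the set of calls it depends on."""
    graph = {}
    pending = [n]
    while pending:
        call = pending.pop()
        if call in graph:
            continue  # already discovered, each call appears once
        graph[call] = set(fibonacci_dependencies(call))
        pending.extend(graph[call])
    return graph


def run(n):
    """Evaluate the calls in an order where dependencies always come first."""
    graph = build_call_graph(n)
    results = {}
    # static_order() yields a call only after all of its dependencies.
    # It raises graphlib.CycleError if the graph is not a DAG.
    for call in TopologicalSorter(graph).static_order():
        if call < 2:
            results[call] = call
        else:
            results[call] = sum(results[d] for d in fibonacci_dependencies(call))
    return results[n]


tenth = run(10)
```

Because every call appears in the graph exactly once, each Fibonacci number is computed a single time, which is the effect that memoization in a store provides.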

Xun Definitions

@xun.function()
def do_some_work(some_values):
    result = expensive_computation(data)
    with ...:
        data = dependency(fixed_values)
        fixed_values = [fix(v) for v in some_values]

In the above example, the function takes an iterable some_values as its argument, polishes the values in it, and calls another function that it depends on. Note that the order of the statements inside the xun definitions statement does not matter. The syntax of xun definitions statements is similar to where clauses in Haskell and has rules that differ from standard Python. In general, the following rules apply to xun definitions statements:

  • Order of statements is arbitrary
  • Xun functions can only be called from xun definition statements (with ...:)
  • Only assignments and free expressions are allowed
  • There can only be one xun definitions statement per xun function
  • Values cannot be modified
  • If a function modifies a value passed to it, the changes will not be reflected for the value in the definitions. That is, arguments to calls are passed by value.
  • Any code in xun definitions statements will be executed during scheduling, so the heavy lifting should be done in the function body, and not inside the xun definitions statements

Xun definition statements allow xun to figure out the order of calls needed to execute a xun program.
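One way to see why the order of statements can be arbitrary is to recover an evaluation order from the names each assignment uses. The following is a minimal sketch with Python's ast module and graphlib, assuming only simple single-target assignments; it is not how xun itself parses definitions, and evaluation_order is a hypothetical helper.

```python
import ast
from graphlib import TopologicalSorter

# The definitions from the example above, in their original (reversed) order.
DEFINITIONS = """
data = dependency(fixed_values)
fixed_values = [fix(v) for v in some_values]
"""


def evaluation_order(source):
    """Return assigned names ordered so each comes after the names it uses."""
    uses = {}
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign) or len(stmt.targets) != 1:
            raise ValueError("this sketch only handles simple assignments")
        target = stmt.targets[0]
        if not isinstance(target, ast.Name):
            raise ValueError("this sketch only handles plain name targets")
        # Collect every name that appears in the right-hand side.
        uses[target.id] = {
            node.id
            for node in ast.walk(stmt.value)
            if isinstance(node, ast.Name)
        }
    # Keep only dependencies on other definitions, then sort topologically.
    targets = set(uses)
    graph = {name: names & targets for name, names in uses.items()}
    return list(TopologicalSorter(graph).static_order())


order = evaluation_order(DEFINITIONS)
```

Here fixed_values is ordered before data even though it is written second, because data refers to it.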

Stores

As calls to xun functions are executed and finished, the results are saved in the store of the context. Stores are classes that satisfy the requirements of collections.abc.MutableMapping, are pickleable, and whose state is shared between all instances. Users can define their own stores by defining a new class that extends xun.functions.store.Store.
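As an illustration of the mapping requirement and of state shared between instances, here is a minimal sketch built directly on collections.abc.MutableMapping. It deliberately does not extend xun.functions.store.Store, whose exact interface is not described here; SharedDictStore is a hypothetical name, and its state is only shared within a single process.

```python
from collections.abc import MutableMapping


class SharedDictStore(MutableMapping):
    """Hypothetical store: every instance reads and writes the same dict."""

    # Class-level state, shared by all instances in this process.
    _shared = {}

    def __getitem__(self, key):
        return self._shared[key]

    def __setitem__(self, key, value):
        self._shared[key] = value

    def __delitem__(self, key):
        del self._shared[key]

    def __iter__(self):
        return iter(self._shared)

    def __len__(self):
        return len(self._shared)


writer = SharedDictStore()
reader = SharedDictStore()
writer["fibonacci_number(3)"] = 2
seen_by_reader = reader["fibonacci_number(3)"]
```

Since the instances carry no state of their own, any instance gives access to the same results. A store meant for remote workers would need a backend that is reachable from every worker, such as a file system or a database.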

Drivers

Drivers are the classes that have the responsibility of executing programs. This includes scheduling the calls of the call graph and managing any concurrency.
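To make the two responsibilities concrete, the sketch below schedules a small dependency graph and runs independent calls concurrently on a thread pool. It is an assumption-laden illustration using graphlib and concurrent.futures, not the interface of xun's driver classes; run_graph and compute are hypothetical names.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_graph(graph, compute, max_workers=4):
    """Run compute(call, results) for every call, dependencies first."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    running = {}  # maps each in-flight future to its call
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every call whose dependencies have all finished.
            for call in sorter.get_ready():
                running[pool.submit(compute, call, results)] = call
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                call = running.pop(future)
                results[call] = future.result()
                sorter.done(call)  # unlocks calls that depend on this one
    return results


# Each call maps to the set of calls it depends on.
graph = {"c": {"a", "b"}, "a": set(), "b": set()}
leaf_values = {"a": 1, "b": 2}


def compute(call, results):
    if call in leaf_values:
        return leaf_values[call]
    return sum(results[dep] for dep in graph[call])


results = run_graph(graph, compute)
```

In this example a and b have no dependencies and may run at the same time, while c is only submitted once both have finished.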

The @xun.make_shared decorator

from math import radians
import numpy as np
import xun


def not_installed():
    pass


@xun.make_shared
def not_installed_but_shared():
    pass


@xun.function()
def xun_function():
    not_installed()            # Not OK
    not_installed_but_shared() # OK
    radians(180)               # OK because math is part of the standard library
    np.array([1, 2, 3])        # OK because the function is defined in an installed module

Because xun functions are pickled, any function they reference must either be installed on the system or be represented differently. xun comes with a decorator, @xun.make_shared, that can make many functions serializable.
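The underlying constraint comes from how Python's pickle module handles functions: they are pickled by reference to an importable name rather than by value. The short sketch below shows that standard-library behavior only. It does not show how @xun.make_shared works internally, and can_pickle and make_local_function are hypothetical helpers.

```python
import math
import pickle


def make_local_function():
    """Return a function that is not reachable by an importable name."""
    def local_function():
        return 42
    return local_function


def can_pickle(obj):
    """Report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# math.radians lives in an installed module, so a reference to it is enough.
installed_ok = can_pickle(math.radians)
# A locally defined function cannot be looked up by name on the other side.
local_ok = can_pickle(make_local_function())
```

A function defined in a script that the workers cannot import is in a similar position to the local function here, which is why such functions need a different representation.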

Function Scope and Best Practices

  • Only reference global values whose modification would not change the outcome of the function. Global scope is not considered when identifying results, so changes to it can give undesired results.
  • A good use for global scope is to specify configuration values, such as cluster addresses or file system paths.
    data_dir = '/path/to/data'
    
    @xun.function()
    def load_data():
        return load(data_dir)
  • If you want a variable change to impact your results, define it as a xun function. For example, to configure a simulation with a fixed seed, define the value as a xun function rather than as a variable in global scope:
    @xun.function()
    def simulation_seed():
        return 196883
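The first point can be illustrated with a plain memoizing decorator. This is a simplified sketch in which results are identified by function name and arguments only; the cached decorator and the results dictionary are hypothetical and stand in for xun's own result identification.

```python
results = {}  # hypothetical result cache keyed by (function name, arguments)
scale = 2     # global value read by the function below


def cached(func):
    """Memoize func, identifying results by name and arguments only."""
    def wrapper(*args):
        key = (func.__name__, args)
        if key not in results:
            results[key] = func(*args)
        return results[key]
    return wrapper


@cached
def scaled(x):
    return x * scale


first = scaled(10)   # computed while scale == 2
scale = 3
second = scaled(10)  # same key, so the stored result is returned
```

Both calls return 20, because the change to scale is invisible to the key. Passing the value as an argument, or producing it from a function whose result is itself tracked, makes the change part of the result's identity.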

Yielding Auxiliary Results

Functions can yield auxiliary results that are accessible through interfaces. This is useful if a function returns large but separable results. Results are yielded to interfaces declared with the decorator @<xun.Function>.interface. This lets the original function write results that are accessible through the interface as if the interfaces were xun functions. Yields from a xun function are declared in the function body as yield statements of the form yield <call-to-interface> is <expr>.

Each interface must specify which function is responsible for producing its result.

In this example the function f returns what is passed to it, but in addition yields results to interfaces even and odd. Calling even and odd interfaces will return the n-th even and odd integer respectively.

import xun


@xun.function()
def f(n):
    yield even(n) is n * 2
    yield odd(n) is n * 2 + 1
    return n


@f.interface
def even(n):
    yield from f(n)


@f.interface
def odd(n):
    yield from f(n)
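Conceptually, one function run produces several separately addressable results. The following plain-Python sketch mimics that behavior with a dictionary as the store. It does not use xun's yield syntax and is not its implementation; store, produce and lookup are hypothetical names.

```python
store = {}  # hypothetical store keyed by (name, argument)


def produce(n):
    """Stand-in for f(n): writes the main result and both auxiliary results."""
    store[("even", n)] = n * 2
    store[("odd", n)] = n * 2 + 1
    store[("f", n)] = n
    return n


def lookup(name, n):
    """Stand-in for calling f, even or odd: run the producer only if needed."""
    key = (name, n)
    if key not in store:
        produce(n)
    return store[key]


third_even = lookup("even", 3)
third_odd = lookup("odd", 3)  # already in the store, produce is not run again
```

The point of the pattern is that a consumer of even(3) only has to load that one value, not everything that f(3) returned.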

xun's People

Contributors

arnerek, ivankolesarequinor, jensgm, laurasmanns, tsundvoll, vero-so


xun's Issues

Flag previous calls as invalid

This could possibly be done by scrambling the result hashes

    def invalidate(self, call, func=None):
        if not self.completed(call, func):
            return

        namespace = self.store / 'results' / call

        hash = func.hash if func is not None else namespace['latest']

        sha256 = hashlib.sha256(bytes.fromhex(hash))
        sha256.update(secrets.token_bytes(32))
        distorted = sha256.hexdigest()

        namespace[distorted] = namespace.pop(hash)
        if namespace['latest'] == hash:
            namespace['latest'] = distorted
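The same idea can be tried out in isolation with a plain dictionary standing in for the namespace. This is a sketch under that assumption; the distort helper and the dictionary layout are illustrative and not part of xun.

```python
import hashlib
import secrets


def distort(hex_hash):
    """Derive a new, unpredictable hash from an existing hex digest."""
    sha256 = hashlib.sha256(bytes.fromhex(hex_hash))
    sha256.update(secrets.token_bytes(32))  # random salt scrambles the digest
    return sha256.hexdigest()


def invalidate(namespace, hex_hash=None):
    """Move a stored result to a scrambled key so it is no longer found."""
    if hex_hash is None:
        hex_hash = namespace["latest"]
    distorted = distort(hex_hash)
    namespace[distorted] = namespace.pop(hex_hash)
    if namespace["latest"] == hex_hash:
        namespace["latest"] = distorted
    return distorted


original = hashlib.sha256(b"fibonacci_number(3)").hexdigest()
namespace = {original: 2, "latest": original}
scrambled = invalidate(namespace)
```

After the call, the value is still present under the scrambled key, but a lookup by the original hash no longer finds it.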

Implement resources

Some tasks require limited concurrency. This can be achieved through a resource system.

@xun.function(resources=[xun.GlobalResource('zephyre', 1, default_available=2),
                         xun.WorkerResource('GPU', 1)])
def zephyre():
    pass

or

@xun.GlobalResource('zephyre', 1, default_available=2)
@xun.GlobalResource('another', 2, default_available=5)
@xun.WorkerResource('GPU', 1)
@xun.function()
def zephyre():
    pass

Steps

  • Change xun.function constructor to use resources
  • Update decorator to accept resource argument or create two new decorators
  • Enforcing resources
    1. Dask worker resources
    2. Xun global resource (Invent new algorithm, Kahn derivative (Dual queue?))
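For a single process, limiting concurrency through a resource can be sketched with a semaphore. The example assumes threads as workers and a hypothetical Resource class; it does not reflect the proposed xun.GlobalResource or xun.WorkerResource interfaces, which are not implemented.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Resource:
    """Hypothetical resource with a fixed number of available units."""

    def __init__(self, name, available):
        self.name = name
        self._semaphore = threading.BoundedSemaphore(available)

    def __enter__(self):
        self._semaphore.acquire()  # blocks until a unit is free
        return self

    def __exit__(self, exc_type, exc, traceback):
        self._semaphore.release()
        return False


gpu = Resource("GPU", available=2)
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def task(i):
    with gpu:  # at most two tasks hold the resource at once
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.01)  # stand-in for real work
        with stats_lock:
            stats["active"] -= 1
    return i


with ThreadPoolExecutor(max_workers=8) as pool:
    finished = list(pool.map(task, range(8)))
```

Although eight workers are available, the recorded peak never exceeds the two units of the resource. A global resource shared across machines would need the scheduler itself to account for availability, which is what the second enforcement step refers to.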

Structured unpacking

Allow structured unpacking of constants. At the moment, the following is not allowed

@xun.function()
def f():
    return a + b
    with ...:
        a, b = some_xun_function()

Graph-based store interface

Here's a test I wrote for testing invalidation of data in such a system:

@given(invalidations)
def test_call_invalidation(invalidation):
    graph_store, root, invalidated_node = invalidation
    pre = graph_store.graph[root]
    graph_store.invalidate(invalidated_node)
    post = graph_store.graph[root]

    DiGM = nx.algorithms.isomorphism.DiGraphMatcher(pre, post)
    DiGM.is_isomorphic() # Populates DiGM.mapping if isomorphic
    mapping = DiGM.mapping # Isomorphism

    invalid_nodes = nx.algorithms.dag.descendants(pre, invalidated_node)
    invalid_nodes.add(invalidated_node)
    valid_nodes = pre.nodes - invalid_nodes

    valid_pre_graph = pre.subgraph(valid_nodes)
    valid_post_graph = post.subgraph(mapping[n] for n in valid_nodes)
    invalidated_pre_graph = pre.subgraph(invalid_nodes)
    invalidated_post_graph = post.subgraph(mapping[n] for n in invalid_nodes)

    assert invalidated_node in pre and invalidated_node not in post
    assert nx.is_isomorphic(pre, post)
    assert not nx.is_isomorphic(pre, post, node_match=lambda a, b: a == b)
    assert nx.is_isomorphic(
        invalidated_pre_graph,
        invalidated_post_graph,
        node_match=lambda a, b: a != b
    )
    assert nx.is_isomorphic(
        valid_pre_graph,
        valid_post_graph,
        node_match=lambda a, b: a == b
    )
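The property being tested is that invalidating a node also invalidates everything downstream of it, while the rest of the graph is untouched. A dependency-free sketch of that split, assuming the graph is stored as a mapping from each node to the nodes that depend on it (the names descendants and split_nodes are hypothetical):

```python
def descendants(graph, node):
    """Return every node reachable from node by following edges."""
    seen = set()
    stack = list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, ()))
    return seen


def split_nodes(graph, invalidated_node):
    """Partition the graph's nodes into (valid, invalid) sets."""
    invalid = descendants(graph, invalidated_node) | {invalidated_node}
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    return nodes - invalid, invalid


# Edges point from a call to the calls that consume its result.
graph = {
    "download_text": ["wordcloud"],
    "download_image": ["resize_image"],
    "resize_image": ["wordcloud"],
}
valid, invalid = split_nodes(graph, "download_image")
```

Invalidating download_image also invalidates resize_image and wordcloud, while download_text remains valid.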

Don't load results from dependencies that are not used by the function body

For example, download_image is a dependency, but its result is not used in the function body and does not need to be loaded when running.

Source:

@xun.function()
def wordcloud(topic, max_resolution=512):
    return image, text

    with ...:
        text = download_text(topic)
        raw_image = download_image(topic)
        image = resize_image(raw_image, max_resolution)

Graph_builder:

def wordcloud(topic, max_resolution=512):
    from xun.functions import CallNode as _xun_CallNode
    from xun.functions import CopyError as _xun_CopyError
    from xun.functions import TargetNameOnlyNode as _xun_TargetNameOnlyNode
    from xun.functions import FutureValueNode as _xun_FutureValueNode
    import networkx as _xun_nx
    _xun_graph = _xun_nx.DiGraph()

    def _xun_register_sentinel(fname, external_names, targets, *args, **kwargs
        ):
        dependencies = list(filter(lambda a: a in _xun_graph, map(
            _xun_TargetNameOnlyNode, external_names)))
        outputs = [_xun_TargetNameOnlyNode(name) for name in targets]
        call = _xun_CallNode(fname, *args, **kwargs)
        _xun_graph.add_node(call)
        _xun_graph.add_edges_from((dep, call) for dep in dependencies)
        _xun_graph.add_edges_from((call, tar) for tar in outputs)
        return _xun_FutureValueNode(call)
    from copy import deepcopy
    raw_image = _xun_register_sentinel('download_image', ['topic',
        'download_image'], ['raw_image'], topic)
    image = _xun_register_sentinel('resize_image', ['raw_image',
        'resize_image', 'max_resolution'], ['image'], raw_image, max_resolution
        )
    text = _xun_register_sentinel('download_text', ['topic',
        'download_text'], ['text'], topic)
    return _xun_graph

Callable:

def wordcloud(topic, max_resolution=512):
    from xun.functions import CallNode as _xun_CallNode
    from copy import deepcopy
    raw_image = _xun_store[_xun_CallNode('download_image', topic)]
    image = _xun_store[_xun_CallNode('resize_image', raw_image, max_resolution)
        ]
    text = _xun_store[_xun_CallNode('download_text', topic)]
    return image, text
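Which definitions the body actually needs can be found by inspecting the names it reads. The sketch below does this with Python's ast module for the wordcloud example. It is one possible approach and not xun's transformation code; names_to_load is a hypothetical helper.

```python
import ast

# The function body from the source above, without the definitions statement.
BODY = """
def wordcloud(topic, max_resolution=512):
    return image, text
"""

# Names assigned inside the xun definitions statement.
DEFINED = {"text", "raw_image", "image"}


def names_to_load(body_source, defined):
    """Return the defined names that the function body actually reads."""
    tree = ast.parse(body_source)
    read = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return read & defined


needed = names_to_load(BODY, DEFINED)
unused = DEFINED - needed
```

For this body only image and text need to be loaded. raw_image is still required for scheduling, because resize_image depends on it, but its value never has to be fetched when wordcloud itself runs.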

Symbolic results are not loaded when used in expressions such as dicts later

Look at tp_val and hs_val in the generated code for this expression:

@xun.function()
def sima_parameters(start, end, seed=1):
    # import pudb; pudb.set_trace()
    return json.dumps(params)
    with ...:
        params = {
            'stask': '/work53/xun/sima/model/HYS-HS4-SGRE-Mann_v1_sima401.stask',
            'input': {
                'windDirectory': '/work53/xun/sima/wind-fields/200m_mann_classC/',
                'CtrlPath': '/work53/xun/sima/controllers/EQN-HYS/',
                'Umean': u_mean(start, end),
                'WindDirMET': wind_dir_met(start, end),
                'Current_Dir_11_72': current_dir(start, end, '11.72'),
                'Current_Dir_31_72': current_dir(start, end, '31.72'),
                'Current_Dir_3_72': current_dir(start, end, '3.72'),
                'Current_Dir_67_72': current_dir(start, end, '67.72'),
                'Current_Speed_11_72': current_speed(start, end, '11.72'),
                'Current_Speed_31_72': current_speed(start, end, '31.72'),
                'Current_Speed_3_72': current_speed(start, end, '3.72'),
                'Current_Speed_67_72': current_speed(start, end, '67.72'),
                'Hs': hs_val,
                'WaveDirMET': wave_dir(start, end),
                'WindSDev': wind_sdev(start, end),
                'Tp': tp_val,
                'GammaWindSea': gamma_jonswap(hs_val, tp_val),
                'GammaWindSea': gamma_jonswap(hs(start, end), tp(start, end)),
                'SpreadingWindSea': wave_spreading(start, end),
                'inProduction': 1,
                'SimTime': 3600,
                'seedWaves': seed,
            }
        }
        hs_val = hs(start, end)
        tp_val = tp(start, end)

Generated code:

def sima_parameters(start, end, seed=1):
    def _xun_load_constants():
        from copy import deepcopy
        from xun.functions import CallNode as _xun_CallNode
        from xun.functions.store import StoreAccessor as _xun_StoreAccessor
        _xun_store_accessor = _xun_StoreAccessor(_xun_store)
        tp_val = _xun_CallNode('tp', start, end)
        hs_val = _xun_CallNode('hs', start, end)
        params = {
            'stask': '/work53/xun/sima/model/HYS-HS4-SGRE-Mann_v1_sima401.stask',
            'input': {
                'windDirectory': '/work53/xun/sima/wind-fields/200m_mann_classC/',
                'CtrlPath': '/work53/xun/sima/controllers/EQN-HYS/',
                'Umean': _xun_CallNode('u_mean', start, end),
                'WindDirMET': _xun_CallNode('wind_dir_met', start, end),
                'Current_Dir_11_72': _xun_CallNode('current_dir', start, end, '11.72'),
                'Current_Dir_31_72': _xun_CallNode('current_dir', start, end, '31.72'),
                'Current_Dir_3_72': _xun_CallNode('current_dir', start, end, '3.72'),
                'Current_Dir_67_72': _xun_CallNode('current_dir', start, end, '67.72'),
                'Current_Speed_11_72': _xun_CallNode('current_speed', start, end, '11.72'),
                'Current_Speed_31_72': _xun_CallNode('current_speed', start, end, '31.72'),
                'Current_Speed_3_72': _xun_CallNode('current_speed', start, end, '3.72'),
                'Current_Speed_67_72': _xun_CallNode('current_speed', start, end, '67.72'),
                'Hs': hs_val,
                'WaveDirMET': _xun_CallNode('wave_dir', start, end),
                'WindSDev': _xun_CallNode('wind_sdev', start, end),
                'Tp': tp_val,
                'GammaWindSea': _xun_CallNode('gamma_jonswap', hs_val, tp_val),
                'GammaWindSea': _xun_CallNode('gamma_jonswap', _xun_CallNode('hs', start, end), _xun_CallNode('tp', start, end)),
                'SpreadingWindSea': _xun_CallNode('wave_spreading', start, end),
                'inProduction': 1,
                'SimTime': 3600,
                'seedWaves': seed
            }
        }
        return {
            'stask': '/work53/xun/sima/model/HYS-HS4-SGRE-Mann_v1_sima401.stask',
            'input': {
                'windDirectory': '/work53/xun/sima/wind-fields/200m_mann_classC/',
                'CtrlPath': '/work53/xun/sima/controllers/EQN-HYS/',
                'Umean': _xun_store_accessor.load_result(_xun_CallNode('u_mean', start, end), hash=b'\xb6\xe9\x13:\xa0\xeai\xf9\x12`\xc4\xfc$X\n\xfc\x8e\x1f\xb1Zfw\x07\nk(\xf2/\xf2\xc4\xb5\xdd'),
                'WindDirMET': _xun_store_accessor.load_result(_xun_CallNode('wind_dir_met', start, end), hash=b'\x1b\xb2\x9dj\xa2\x87\x03"u\xf3e^\xa8.\xb7\x95\xc7`\xbe\xce\x81\xc9\n\x92cjW\'s\x86\xbcY'),
                'Current_Dir_11_72': _xun_store_accessor.load_result(_xun_CallNode('current_dir', start, end, '11.72'), hash=b'\x1d\x80\xab\x8e4\x1b\xe4\xcc\nd\x9f\x0c=\xccB\xd5\x8ev\xa9\xe3{\x84\xe5\xc6\xf2o\x82\xa3\xf9\xe3\xea\xcc'),
                'Current_Dir_31_72': _xun_store_accessor.load_result(_xun_CallNode('current_dir', start, end, '31.72'), hash=b'\x1d\x80\xab\x8e4\x1b\xe4\xcc\nd\x9f\x0c=\xccB\xd5\x8ev\xa9\xe3{\x84\xe5\xc6\xf2o\x82\xa3\xf9\xe3\xea\xcc'),
                'Current_Dir_3_72': _xun_store_accessor.load_result(_xun_CallNode('current_dir', start, end, '3.72'), hash=b'\x1d\x80\xab\x8e4\x1b\xe4\xcc\nd\x9f\x0c=\xccB\xd5\x8ev\xa9\xe3{\x84\xe5\xc6\xf2o\x82\xa3\xf9\xe3\xea\xcc'),
                'Current_Dir_67_72': _xun_store_accessor.load_result(_xun_CallNode('current_dir', start, end, '67.72'), hash=b'\x1d\x80\xab\x8e4\x1b\xe4\xcc\nd\x9f\x0c=\xccB\xd5\x8ev\xa9\xe3{\x84\xe5\xc6\xf2o\x82\xa3\xf9\xe3\xea\xcc'),
                'Current_Speed_11_72': _xun_store_accessor.load_result(_xun_CallNode('current_speed', start, end, '11.72'), hash=b'\xcd\xbfoI\xb1\xc1l\xfc%\x89\x95I\xb7\xc2\x10)\xdc|\xd7\xdb(F\x03\x01\xbf\x0e|\xf0\xbd/\x90\x92'),
                'Current_Speed_31_72': _xun_store_accessor.load_result(_xun_CallNode('current_speed', start, end, '31.72'), hash=b'\xcd\xbfoI\xb1\xc1l\xfc%\x89\x95I\xb7\xc2\x10)\xdc|\xd7\xdb(F\x03\x01\xbf\x0e|\xf0\xbd/\x90\x92'),
                'Current_Speed_3_72': _xun_store_accessor.load_result(_xun_CallNode('current_speed', start, end, '3.72'), hash=b'\xcd\xbfoI\xb1\xc1l\xfc%\x89\x95I\xb7\xc2\x10)\xdc|\xd7\xdb(F\x03\x01\xbf\x0e|\xf0\xbd/\x90\x92'),
                'Current_Speed_67_72': _xun_store_accessor.load_result(_xun_CallNode('current_speed', start, end, '67.72'), hash=b'\xcd\xbfoI\xb1\xc1l\xfc%\x89\x95I\xb7\xc2\x10)\xdc|\xd7\xdb(F\x03\x01\xbf\x0e|\xf0\xbd/\x90\x92'),
                'Hs': hs_val, # This has to be loaded
                'WaveDirMET': _xun_store_accessor.load_result(_xun_CallNode('wave_dir', start, end), hash=b'\xf9\x1a\x8bM.\xd2\xe3V\xa6\x9a\x93\xc1\xa1k\xa6\xaaOa\x83r\xf8\n\xccU\xaa\x94\xf9\xfa\xf6\x19\xe6\xc2'),
                'WindSDev': _xun_store_accessor.load_result(_xun_CallNode('wind_sdev', start, end), hash=b'\xb1\x9a\xbd\x95h\xb3-\x8a\xb0\xb7\xea\xd4^|&\xed\xa3\x1eC\r\xab\t\x93\xd7\xfd\xc7d\xb1\xd4\xdd\x06\xb0'),
                'Tp': tp_val, # This too
                'GammaWindSea': _xun_store_accessor.load_result(_xun_CallNode('gamma_jonswap', hs_val, tp_val), hash=b"\xd0T\xe82B.\x86\xa9r*\x1d4F\xe2N\xbc\x06*\x85U\xacgm\x01C\xb7\xbf'\xe87u\xd1"),
                'GammaWindSea': _xun_store_accessor.load_result(_xun_CallNode('gamma_jonswap', _xun_CallNode('hs', start, end), _xun_CallNode('tp', start, end)), hash=b"\xd0T\xe82B.\x86\xa9r*\x1d4F\xe2N\xbc\x06*\x85U\xacgm\x01C\xb7\xbf'\xe87u\xd1"),
                'SpreadingWindSea': _xun_store_accessor.load_result(_xun_CallNode('wave_spreading', start, end), hash=b'\xba\xb57W/\x8ee\x8f\xc0,\xf3)\x8c\xef\x19R\n/\xb85\x9e\x98\x8d\x8at\xa5\xf9\xd7\x19\x11\xb2\x1c'),
                'inProduction': 1,
                'SimTime': 3600,
                'seedWaves': seed
            }
        },

    params, = _xun_load_constants()
    return json.dumps(params)

Nested xun calls

In test_functions, there is a failing test named test_nested_calls.

@pytest.mark.skip(reason="Nested calls not supported yet")
def test_nested_calls():
    @xun.function()
    def f():
        return 'a'

    @xun.function()
    def g(v):
        return v + 'b'

    @xun.function()
    def h():
        with ...:
            r = g(f())
        return r

    result = h.blueprint().run(
        driver=xun.functions.driver.Sequential(),
        store=xun.functions.store.Memory(),
    )

    assert result == 'ab'

Unpack from subscripted function

The following test currently fails. However, it is correct Python syntax, so it should also be correct Xun syntax.

@xun.function()
def f():
    return 'a', ('b', 'c')

@xun.function()
def h():
    with ...:
        b, c = f()[1]
    return b, c

result = h.blueprint().run(
    driver=xun.functions.driver.Sequential(),
    store=xun.functions.store.Memory(),
)

assert result == ('b', 'c')

Store definition

Currently there is a StoreMeta metaclass that adds requirements to Store implementations. This should be redone: the Redis store that comes with the celery driver does not implement this interface.

sample_sin_blueprint step size is incorrect

The code below can be found in xun/tests/helpers.py, but it defines the step size incorrectly. Change the definition to a correct one.

def sample_sin_blueprint(offset=42, sample_count=10, step_size=36):
    @xun.function()
    def mksample(i, step_size):
        return i / step_size

    @xun.function()
    def deg_to_rad(deg):
        return radians(deg)

    @xun.function()
    def sample_sin(offset, sample_count, step_size):
        return [sin(s) + offset for s in radians]
        with ...:
            samples = [mksample(i, step_size) for i in range(sample_count)]
            radians = [deg_to_rad(s) for s in samples]

    blueprint = sample_sin.blueprint(offset, sample_count, step_size)
    expected = [
        sin(radians(i / step_size)) + offset for i in range(sample_count)
    ]

    return blueprint, expected
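The issue does not say what the correct definition is. One plausible reading, stated here as an assumption, is that step_size is meant to be the spacing in degrees between consecutive samples, so that the sample index should be multiplied by it rather than divided. The small comparison below uses the helper's default arguments; both function names are hypothetical.

```python
def current_samples(sample_count=10, step_size=36):
    """Sample positions as computed by mksample in the helper above."""
    return [i / step_size for i in range(sample_count)]


def evenly_spaced_samples(sample_count=10, step_size=36):
    """Assumed intent: samples spaced step_size degrees apart."""
    return [i * step_size for i in range(sample_count)]


largest_current = max(current_samples())
largest_spaced = max(evenly_spaced_samples())
```

With the defaults, the current definition only covers angles from 0 to 0.25 degrees, whereas multiplying covers 0 to 324 degrees, which is ten evenly spaced points on one period of the sine.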

Accept function without target in with constants statement

import xun

@xun.function()
def returns_complex(size):
    return {i: i * 2 for i in range(size)}

@xun.function()
def accepts_complex(complex):
    print(complex)

@xun.function()
def workflow():
    with ...:
        accepts_complex(c)
        c = returns_complex(4)

This currently fails, because accepts_complex(c) does not have a target in workflow(). However, having a target should not be necessary in this case.
