xorbitsai / xoscar

Python actor framework for heterogeneous computing.

Home Page: https://xoscar.dev

License: Apache License 2.0

Languages: Python 72.29%, Cython 9.62%, CMake 0.52%, C++ 17.57%
Topics: actor-model, asynchronous, distributed-computing

xoscar's People

Contributors

aresnow1, chengjieli28, codingl2k1, marcelhoh, qianduoduo0904, qinxuye, uranusseven, yibinliu666

xoscar's Issues

BUG: import xoscar.collective failed on Mac

Describe the bug

After installing xoscar with pip install xoscar, importing xoscar.collective on macOS fails with:

ImportError: dlopen(/Users/liuyibin/miniconda3/lib/python3.10/site-packages/xoscar/collective/xoscar_pygloo.cpython-310-darwin.so, 0x0002): symbol not found in flat namespace '_uv_async_init'
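A small diagnostic sketch can help when reporting or triaging this kind of failure. It assumes the missing _uv_async_init symbol belongs to libuv (the uv_ prefix is libuv's naming convention, and Gloo can use libuv as a transport), so besides capturing the ImportError text it asks the system loader whether any libuv shared library can be found. The helper names are hypothetical; only importlib.import_module and ctypes.util.find_library from the standard library are used.

```python
import ctypes.util
import importlib
from typing import Optional


def import_error(module_name: str) -> Optional[str]:
    """Return the ImportError message for module_name, or None if it imports cleanly."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        return str(exc)
    return None


def find_libuv() -> Optional[str]:
    """Ask the platform's library search for a libuv shared library (None if absent)."""
    return ctypes.util.find_library("uv")


def report(module_name: str = "xoscar.collective") -> str:
    """Summarize whether module_name imports, adding libuv info for _uv_* symbol errors."""
    err = import_error(module_name)
    if err is None:
        return f"{module_name}: imported successfully"
    lines = [f"{module_name}: import failed: {err}"]
    if "_uv_" in err:
        # Assumption: a missing _uv_* symbol means libuv was not linked or not found.
        lines.append(f"libuv found by loader: {find_libuv()}")
    return "\n".join(lines)
```

Running print(report()) on the affected Mac would show the full loader message together with whether a libuv library is visible at all.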

FEAT: Command lines to start Xoscar

Is your feature request related to a problem? Please describe

Xoscar needs a way to be started from the command line.

Describe the solution you'd like

We can inherit the options from Xorbits.
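A rough sketch of what such an entry point could look like, using argparse and the xo.create_actor_pool(address=..., n_process=...) call that appears elsewhere in these issues. The program name xoscar-pool and the flags --host, --port and --n-process are placeholders, not the Xorbits options the issue proposes to inherit, and keeping the pool open with async with is an assumption about the pool object.

```python
import argparse
import asyncio
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Placeholder flags; a real CLI would mirror the options inherited from Xorbits.
    parser = argparse.ArgumentParser(prog="xoscar-pool", description="Start an Xoscar actor pool.")
    parser.add_argument("--host", default="127.0.0.1", help="host to bind the pool to")
    parser.add_argument("--port", type=int, default=9999, help="port to bind the pool to")
    parser.add_argument("--n-process", type=int, default=1, help="number of sub processes")
    return parser


def pool_address(args: argparse.Namespace) -> str:
    return f"{args.host}:{args.port}"


async def serve(args: argparse.Namespace) -> None:
    import xoscar as xo  # imported lazily so argument parsing works without xoscar installed

    pool = await xo.create_actor_pool(address=pool_address(args), n_process=args.n_process)
    async with pool:
        # Keep the pool running until the process is interrupted (e.g. Ctrl+C).
        await asyncio.Event().wait()


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    try:
        asyncio.run(serve(args))
    except KeyboardInterrupt:
        pass
```

A console-script entry point would then call main(); for example, main(["--port", "9999", "--n-process", "2"]) would request a pool with two sub processes on 127.0.0.1:9999.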

FEAT: add and remove subpool

Is your feature request related to a problem? Please describe

CUDA_VISIBLE_DEVICES is commonly used to manage CUDA devices. Currently, this environment variable can be assigned when an actor pool is set up, which fixes the CUDA device allocation for the lifetime of the pool.

However, there are situations where dynamic CUDA device allocation is beneficial. For instance, when running inference on models of different sizes, the number of CUDA devices required may differ. To use CUDA devices efficiently, it is better to determine CUDA_VISIBLE_DEVICES at runtime based on the model size.

Describe the solution you'd like

Support adding and removing subpools.
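The runtime decision described above can be sketched as a small bookkeeping helper: given the indices of currently free GPUs, the model size and the memory per device, it picks enough devices and builds the CUDA_VISIBLE_DEVICES value for a new sub pool, then returns the devices when that sub pool is removed. The helper names, the "model splits evenly across equal devices" memory model and the first-free allocation policy are illustrative assumptions; the sub pool add/remove API itself is what this issue requests, so it is not shown.

```python
from typing import Dict, List


def devices_needed(model_bytes: int, bytes_per_device: int) -> int:
    """Smallest number of equally sized devices whose combined memory fits the model."""
    if model_bytes <= 0 or bytes_per_device <= 0:
        raise ValueError("sizes must be positive")
    return -(-model_bytes // bytes_per_device)  # integer ceiling division


def allocate(free: List[int], model_bytes: int, bytes_per_device: int) -> Dict[str, str]:
    """Take devices from `free` and return the environment for a new sub pool."""
    n = devices_needed(model_bytes, bytes_per_device)
    if n > len(free):
        raise RuntimeError(f"need {n} devices but only {len(free)} are free")
    chosen = [free.pop(0) for _ in range(n)]
    return {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in chosen)}


def release(free: List[int], env: Dict[str, str]) -> None:
    """Give a removed sub pool's devices back to `free`."""
    free.extend(int(i) for i in env["CUDA_VISIBLE_DEVICES"].split(","))
    free.sort()
```

The returned mapping is what would be handed to a newly added sub pool as its environment; when that sub pool is removed, release() makes its devices available to the next model.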

[BUG] State is changing in Stateless actor

Describe the bug

I've just started playing with the library, so please correct me if I'm wrong, but state should not change in a stateless actor. Ideally, the following code should not behave this way, or it should at least emit a warning at runtime.

To Reproduce

import asyncio

import nest_asyncio  # only needed when an event loop is already running (e.g. in Jupyter)
import xoscar as xo

nest_asyncio.apply()


class Counter(xo.StatelessActor):  # <-- intentional, to check whether there's at least a runtime warning!
    count = 0

    def inc(self):
        # The first += creates an instance attribute that shadows the class-level default.
        self.count += 1
        print(self.count)


async def main():
    address = "localhost:9999"
    await xo.create_actor_pool(address=address, n_process=1)
    actor = await xo.create_actor(
        Counter,
        address=address,
        uid="1",
    )
    # Send ten inc() calls concurrently and wait for all of them.
    tasks = [actor.inc() for _ in range(10)]
    await asyncio.gather(*tasks)
    await xo.destroy_actor(actor)


asyncio.run(main())

Output:

1
2
3
4
5
6
7
8
9
10

  1. Python version: 3.10
  2. Xoscar version: 0.1 (latest)
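One way to read this behaviour, offered as an assumption rather than something confirmed here: the traceback in the test_copy_to_file_objects issue below shows _BaseActor.__on_receive__ entering async with self._lock, which suggests ordinary actors serialize their messages with a lock, and "stateless" may only mean that this lock is skipped so messages can run concurrently, not that attribute writes are blocked. The following plain-asyncio sketch (no xoscar involved; Counter is a stand-in class) shows why such a lock matters once a method awaits between reading and writing state.

```python
import asyncio
from typing import Optional


class Counter:
    """Plain-Python stand-in for an actor with mutable state (not the xoscar API)."""

    def __init__(self, lock: Optional[asyncio.Lock] = None):
        self.count = 0
        self._lock = lock

    async def _read_modify_write(self) -> None:
        current = self.count
        await asyncio.sleep(0)  # yield to the event loop between the read and the write
        self.count = current + 1

    async def inc(self) -> None:
        if self._lock is None:
            await self._read_modify_write()  # concurrent calls may interleave
        else:
            async with self._lock:  # one call at a time, like a lock-protected actor
                await self._read_modify_write()


async def run(locked: bool, n: int = 10) -> int:
    # Create the lock inside the running loop (required on older Python versions).
    counter = Counter(asyncio.Lock() if locked else None)
    await asyncio.gather(*(counter.inc() for _ in range(n)))
    return counter.count
```

With the lock every increment lands (asyncio.run(run(locked=True)) returns 10); without it all ten calls read 0 before any of them writes, so the final count is 1. In the reproduction above, inc never awaits, so even concurrent dispatch would increment one call at a time, which is consistent with the output shown.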

[BUG] python 3.12 pkgutil has no attribute 'ImpImporter'

Describe the bug

tuptools\__init__.py", line 16, in <module>
    import setuptools.version
  File "C:\Users\AppData\Local\Temp\pip-build-env-5c2u67or\overlay\Lib\site-packages\setuptools\version.py", line 1, in <module>
    import pkg_resources
  File "C:\Users\AppData\Local\Temp\pip-build-env-5c2u67or\overlay\Lib\site-packages\pkg_resources\__init__.py", line 2191, in <module>
    register_finder(pkgutil.ImpImporter, find_on_path)
                    ^^^^^^^^^^^^^^^^^^^
AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

To Reproduce

  1. Python version: 3.12
  2. Xoscar version: 0.1.3 (xoscar-0.1.3.tar.gz)

FEAT: support collective communication

Is your feature request related to a problem? Please describe

Collective communication is widely used in deep learning workloads, so we could add support for it.
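To make the request concrete, here is a single-process simulation of ring allreduce, one common algorithm in collective libraries such as Gloo (which the xoscar_pygloo module in the first issue suggests Xoscar builds on). Each rank's buffer is split into n chunks; a reduce-scatter phase of n - 1 steps leaves every rank owning one fully summed chunk, and an all-gather phase of n - 1 more steps circulates those chunks so every rank ends with the elementwise sum. Plain lists stand in for tensors and the "network" is just list indexing; this is an illustration, not Xoscar's API.

```python
from typing import List, Tuple


def chunk_bounds(length: int, n: int) -> List[Tuple[int, int]]:
    """Split range(length) into n contiguous (start, end) chunks of near-equal size."""
    base, extra = divmod(length, n)
    bounds, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def ring_allreduce(data: List[List[float]]) -> List[List[float]]:
    """Simulate a sum-allreduce over len(data) ranks arranged in a ring."""
    n = len(data)
    bufs = [list(d) for d in data]  # each rank works on its own copy
    if n == 1:
        return bufs
    bounds = chunk_bounds(len(bufs[0]), n)

    # Reduce-scatter: at step k, rank r sends chunk (r - k) % n to rank r + 1, which adds it
    # to its own copy. Afterwards rank r owns the full sum of chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [((r - step) % n, bufs[r][slice(*bounds[(r - step) % n])]) for r in range(n)]
        for r in range(n):
            c, payload = sends[(r - 1) % n]  # message from the left neighbour
            start, _ = bounds[c]
            for i, value in enumerate(payload):
                bufs[r][start + i] += value

    # All-gather: at step k, rank r forwards chunk (r + 1 - k) % n; receivers overwrite.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, bufs[r][slice(*bounds[(r + 1 - step) % n])]) for r in range(n)]
        for r in range(n):
            c, payload = sends[(r - 1) % n]
            start, end = bounds[c]
            bufs[r][start:end] = payload
    return bufs
```

Each rank sends and receives 2(n - 1) chunks of roughly 1/n of the buffer, which is why the ring pattern spreads bandwidth evenly regardless of the number of ranks.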

TST: test_copy_to_file_objects sometimes failed in CI

Describe the bug

test_copy_to_file_objects sometimes fails in CI.

To Reproduce

__________________________ test_copy_to_file_objects ___________________________

    @pytest.mark.asyncio
    async def test_copy_to_file_objects():
        start_method = (
            os.environ.get("POOL_START_METHOD", "forkserver")
            if sys.platform != "win32"
            else None
        )
        pool = await create_actor_pool(
            "127.0.0.1",
            pool_cls=MainActorPool,
            n_process=2,
            subprocess_start_method=start_method,
        )
    
        d = tempfile.mkdtemp()
        async with pool:
            ctx = get_context()
    
            # actor on main pool
            actor_ref1 = await ctx.create_actor(
                FileobjTransferActor,
                uid="test-1",
                address=pool.external_address,
                allocate_strategy=ProcessIndex(1),
            )
            actor_ref2 = await ctx.create_actor(
                FileobjTransferActor,
                uid="test-2",
                address=pool.external_address,
                allocate_strategy=ProcessIndex(2),
            )
            sizes = [10 * 1024**2, 3 * 1024**2, 0.5 * 1024**2, 0.25 * 1024**2]
            names = []
            for _ in range(2 * len(sizes)):
                _, p = tempfile.mkstemp(dir=d)
                names.append(p)
    
>           await actor_ref1.copy_data(actor_ref2, names[::2], names[1::2], sizes=sizes)

xoscar/backends/test/tests/test_transfer.py:293: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
xoscar/backends/context.py:227: in send
    return self._process_result_message(result)
xoscar/backends/context.py:102: in _process_result_message
    raise message.as_instanceof_cause()
xoscar/backends/pool.py:657: in send
    result = await self._run_coro(message.message_id, coro)
xoscar/backends/pool.py:368: in _run_coro
    return await coro
xoscar/api.py:306: in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
xoscar/core.pyx:527: in __on_receive__
    raise ex
xoscar/core.pyx:497: in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
xoscar/core.pyx:498: in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
xoscar/core.pyx:503: in xoscar.core._BaseActor.__on_receive__
    result = await result
xoscar/backends/test/tests/test_transfer.py:239: in copy_data
    fobj.write(np.random.bytes(size))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: [address=127.0.0.1:45079, pid=6288] 'float' object cannot be interpreted as an integer
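The last frame points at the cause: in the test, 0.5 * 1024**2 and 0.25 * 1024**2 evaluate to the floats 524288.0 and 262144.0, and np.random.bytes rejects a float length. A minimal sketch of the fix keeps every size an integer via floor division. To stay dependency-free it uses os.urandom as a stand-in for np.random.bytes (it raises the same TypeError for a float size); write_random and fill_in_memory are hypothetical helpers, not part of the test suite.

```python
import io
import os
from typing import BinaryIO, List

MiB = 1024**2

# 0.5 * 1024**2 evaluates to 524288.0 (a float); floor division keeps every size an int.
SIZES: List[int] = [10 * MiB, 3 * MiB, MiB // 2, MiB // 4]


def write_random(fobj: BinaryIO, size: int) -> int:
    """Write `size` random bytes to fobj; os.urandom stands in for np.random.bytes."""
    # Like np.random.bytes in the traceback, os.urandom raises TypeError for a float size.
    return fobj.write(os.urandom(size))


def fill_in_memory(sizes: List[int]) -> List[int]:
    """Fill one in-memory file per size and return the number of bytes written to each."""
    written = []
    for size in sizes:
        with io.BytesIO() as buf:
            written.append(write_random(buf, size))
    return written
```

In the test itself, the equivalent change would be writing the last two entries of sizes as 512 * 1024 and 256 * 1024 (or wrapping them in int()).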

[BUG] recover_sub_pool may raise RuntimeError: dictionary changed size during iteration

Describe the bug

Traceback (most recent call last):
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/xoscar/backends/pool.py", line 1402, in monitor_sub_pools
    await self.recover_sub_pool(address)
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/xoscar/backends/indigen/pool.py", line 329, in recover_sub_pool
    for _, message in self._allocated_actors[address].values():
RuntimeError: dictionary changed size during iteration

To Reproduce

This error is not easy to reproduce.

  1. Python version: 3.11.4
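A likely mechanism, offered as an assumption since the loop body of recover_sub_pool is not shown: if the loop awaits while iterating over self._allocated_actors[address].values(), another coroutine can add or remove an actor for that address in the meantime, and the next step of the iteration raises exactly this RuntimeError. Iterating over a snapshot such as list(...values()) avoids it. The sketch reproduces the pattern with plain asyncio; recreate, recover, demo and the dictionary layout are hypothetical simplifications (the real values are tuples).

```python
import asyncio
from typing import Dict


async def recreate(actor_id: str) -> None:
    """Hypothetical stand-in for re-creating one actor on a recovered sub pool."""
    await asyncio.sleep(0)


async def recover(allocated: Dict[str, str], snapshot: bool) -> int:
    # A live view raises RuntimeError if the dict changes size while we are suspended.
    items = list(allocated.values()) if snapshot else allocated.values()
    recovered = 0
    for actor_id in items:
        await recreate(actor_id)
        recovered += 1
    return recovered


async def demo(snapshot: bool) -> int:
    allocated = {f"actor-{i}": f"actor-{i}" for i in range(3)}

    async def concurrent_create() -> None:
        # Another coroutine registers a new actor while recovery is awaiting.
        await asyncio.sleep(0)
        allocated["actor-new"] = "actor-new"

    results = await asyncio.gather(recover(allocated, snapshot), concurrent_create())
    return results[0]
```

With snapshot=True the three actors that existed when recovery started are processed and the newly added one is left for later; with snapshot=False the run fails with "dictionary changed size during iteration".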
