I took an initial shot at this in our sandbox environment using the following strategy:
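```python
from typing import Any, Dict

import lightning as pl
import torch
import torch_xla.core.xla_model as xm
from ray._private.usage.usage_lib import TagKey, record_extra_usage_tag


class RayXLADDPStrategy(pl.pytorch.strategies.DDPStrategy):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Ray-internal usage telemetry; TagKey is a protobuf enum wrapper.
        record_extra_usage_tag(TagKey.TRAIN_LIGHTNING_RAYDDPSTRATEGY, "1")

    @property
    def root_device(self) -> torch.device:
        # Place each worker's model on its local XLA device.
        return xm.xla_device()

    @property
    def distributed_sampler_kwargs(self) -> Dict[str, Any]:
        # Shard the dataset across all workers by global rank.
        return dict(num_replicas=self.world_size, rank=self.global_rank)
```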
Running it produced the following stack trace:
TypeError: Could not serialize the put value <function train_func at 0x7f7016477250>:
===================================================================
Checking Serializability of <function train_func at 0x7f7016477250>
===================================================================
!!! FAIL serialization: cannot pickle 'google.protobuf.pyext._message.EnumDescriptor' object
Detected 6 global variables. Checking serializability...
Serializing 'FashionMNIST' <class 'torchvision.datasets.mnist.FashionMNIST'>...
Serializing 'transforms' <module 'torchvision.transforms' from '/home/ec2-user/miniconda3/envs/raytest/lib/python3.10/site-packages/torchvision/transforms/__init__.py'>...
Serializing 'DataLoader' <class 'torch.utils.data.dataloader.DataLoader'>...
Serializing 'MNISTClassifier' <class '__main__.MNISTClassifier'>...
Serializing 'RayXLADDPStrategy' <class '__main__.RayXLADDPStrategy'>...
!!! FAIL serialization: cannot pickle 'google.protobuf.pyext._message.EnumDescriptor' object
Serializing '__getstate__' <function Strategy.__getstate__ at 0x7f6f8798d630>...
Serializing '__init__' <function RayXLADDPStrategy.__init__ at 0x7f7016476e60>...
!!! FAIL serialization: cannot pickle 'google.protobuf.pyext._message.EnumDescriptor' object
Detected 2 global variables. Checking serializability...
Serializing 'record_extra_usage_tag' <function record_extra_usage_tag at 0x7f7002d4e7a0>...
Serializing 'TagKey' <google.protobuf.internal.enum_type_wrapper.EnumTypeWrapper object at 0x7f7002b78220>...
!!! FAIL serialization: cannot pickle 'google.protobuf.pyext._message.EnumDescriptor' object
Detected 1 nonlocal variables. Checking serializability...
Serializing '__class__' <class '__main__.RayXLADDPStrategy'>...
!!! FAIL serialization: cannot pickle 'google.protobuf.pyext._message.EnumDescriptor' object
Serializing '_abc_impl' <_abc._abc_data object at 0x7f7016488700>...
!!! FAIL serialization: cannot pickle '_abc._abc_data' object
WARNING: Did not find non-serializable object in <_abc._abc_data object at 0x7f7016488700>. This may be an oversight.
===================================================================
Variable:
FailTuple(__class__ [obj=<class '__main__.RayXLADDPStrategy'>, parent=<function RayXLADDPStrategy.__init__ at 0x7f7016476e60>])
FailTuple(TagKey [obj=<google.protobuf.internal.enum_type_wrapper.EnumTypeWrapper object at 0x7f7002b78220>, parent=<function RayXLADDPStrategy.__init__ at 0x7f7016476e60>])
was found to be non-serializable. There may be multiple other undetected variables that were non-serializable.
Consider either removing the instantiation/imports of these variables or moving the instantiation into the scope of the function/class.
===================================================================
Check https://docs.ray.io/en/master/ray-core/objects/serialization.html#troubleshooting for more information.
If you have any suggestions on how to improve this error message, please reach out to the Ray developers on github.com/ray-project/ray/issues/
===================================================================
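The trace points at `TagKey`, a protobuf enum wrapper that `RayXLADDPStrategy.__init__` references as a module-level global. Since the class is defined in `__main__`, Ray serializes it by value with cloudpickle, so every global its methods reference must be pickled as well, and the protobuf descriptor cannot be. Below is a stdlib-only sketch of the check the trace reports, built on `inspect.getclosurevars`, run against two stand-in classes: one that keeps the usage-tag call and one that drops it. `TagKey`, `record_extra_usage_tag`, `Base`, and `captured_globals` here are hypothetical placeholders, not Ray's or Lightning's objects, and the workaround is untested against the real strategy.

```python
import inspect


def captured_globals(cls):
    """Return {method name: set of module-level global names it references}.

    Roughly mirrors the "Detected N global variables" step in the trace: these
    are the globals a by-value pickler such as cloudpickle has to serialize.
    """
    return {
        name: set(inspect.getclosurevars(attr).globals)
        for name, attr in vars(cls).items()
        if inspect.isfunction(attr)
    }


# Hypothetical stand-ins so the sketch runs without Ray, Lightning, or torch_xla.
TagKey = object()  # placeholder for Ray's protobuf-backed TagKey enum wrapper


def record_extra_usage_tag(key, value):
    """Placeholder for Ray's internal telemetry helper (does nothing here)."""


class Base:
    """Placeholder for Lightning's DDPStrategy."""

    def __init__(self, *args, **kwargs):
        pass


class StrategyWithTelemetry(Base):
    # Same shape as RayXLADDPStrategy.__init__ in the comment above.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        record_extra_usage_tag(TagKey, "1")


class StrategyWithoutTelemetry(Base):
    # Workaround sketch: skip the Ray-internal usage tag entirely.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)


print(captured_globals(StrategyWithTelemetry))
print(captured_globals(StrategyWithoutTelemetry))
```

With these stand-ins, `captured_globals(StrategyWithTelemetry)` lists `TagKey` and `record_extra_usage_tag` under `__init__`, while `captured_globals(StrategyWithoutTelemetry)` lists none. Two alternatives, equally unverified: keep the tag but move the `TagKey` / `record_extra_usage_tag` import inside `__init__`, as the error message suggests, or define the strategy in a module that is importable on every Ray worker, so cloudpickle can reference the class by name instead of copying it.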
I'm now far enough outside my expertise that I'm not going to be very helpful.
from ray.