Comments (7)
@duckontheweb when I try to run the neuron-rtd container, I get the below error:
`ubuntu@ip-192-168-19-125:~$ sudo docker run --device=/dev/neuron0 --cap-add IPC_LOCK -v /tmp/neuron_rtd_sock/:/sock -it neuron-rtd
nrtd[1]: [NRTD:nrtd_main] nrtd build using:1.1.1402.0
nrtd[1]: [NRTD:nrtd_main] nrtd build using:1.1.1402.0
sh: lspci: command not found
nrtd[1]: [TDRV:tdrv_init_mla_phase1] Could not open the device index:0
nrtd[1]: [TDRV:tdrv_init_mla_phase1] Could not open the device index:0
nrtd[1]: [TDRV:tdrv_destroy_one_mla] close device failed
nrtd[1]: [TDRV:tdrv_destroy_one_mla] close device failed
nrtd[1]: [TDRV:tdrv_destroy] TDRV not initialized
nrtd[1]: [TDRV:tdrv_destroy] TDRV not initialized
nrtd[1]: [NRTD:InitTongas] Failed to initialize devices, error:1
nrtd[1]: [NRTD:InitTongas] Failed to initialize devices, error:1
nrtd[1]: [NRTD:nrtd_main] Failed to initialize devices: , attempt: 1
nrtd[1]: [NRTD:nrtd_main] Failed to initialize devices: , attempt: 1`
Did you face this error? If so, can you please point me in the right direction?
from aws-neuron-sdk.
duckontheweb -- we will have a look and get back to you.
from aws-neuron-sdk.
We have updated the docs to add the `unix:` prefix to the socket environment variable:
NEURON_RTD_ADDRESS=unix:/sock/neuron.sock
Thanks for letting us know.
from aws-neuron-sdk.
Thanks, that worked!
I did still have to run `chmod o+x /tmp/neuron_rtd_sock` in order to run the neuron-rtd container. Is that expected (I didn't see anything about it in the docs) or should I be running my container differently?
from aws-neuron-sdk.
You are correct. We missed the chmod operation in the example. Pull request #91 will correct our tutorial to reflect this. Thank you!
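For anyone following along, here is a minimal sketch of the directory preparation discussed above. The helper name `prepare_sock_dir` and the `SOCK_DIR` variable are made up for illustration; the path, the `chmod o+x` step, the `docker run` command and the `NEURON_RTD_ADDRESS` value are copied from this thread, not from the official tutorial.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Neuron SDK): create the host directory
# that gets bind-mounted to /sock and make it traversable by "other" users.
prepare_sock_dir() {
    dir="$1"
    mkdir -p "$dir" && chmod o+x "$dir"
}

# Default path taken from the thread; export SOCK_DIR beforehand to override.
SOCK_DIR="${SOCK_DIR:-/tmp/neuron_rtd_sock}"
prepare_sock_dir "$SOCK_DIR"

# The container is then started with the command quoted earlier in the thread:
#   sudo docker run --device=/dev/neuron0 --cap-add IPC_LOCK \
#       -v /tmp/neuron_rtd_sock/:/sock -it neuron-rtd
# and clients point at the socket with:
#   NEURON_RTD_ADDRESS=unix:/sock/neuron.sock
```

The `docker run` line is left as a comment on purpose, since it needs an inf1 instance with `/dev/neuron0` present.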
from aws-neuron-sdk.
@duckontheweb when I try to run the neuron-rtd container, I get the below error:
`ubuntu@ip-192-168-19-125:~$ sudo docker run --device=/dev/neuron0 --cap-add IPC_LOCK -v /tmp/neuron_rtd_sock/:/sock -it neuron-rtd
nrtd[1]: [NRTD:nrtd_main] nrtd build using:1.1.1402.0
nrtd[1]: [NRTD:nrtd_main] nrtd build using:1.1.1402.0
sh: lspci: command not found
nrtd[1]: [TDRV:tdrv_init_mla_phase1] Could not open the device index:0
nrtd[1]: [TDRV:tdrv_init_mla_phase1] Could not open the device index:0
nrtd[1]: [TDRV:tdrv_destroy_one_mla] close device failed
nrtd[1]: [TDRV:tdrv_destroy_one_mla] close device failed
nrtd[1]: [TDRV:tdrv_destroy] TDRV not initialized
nrtd[1]: [TDRV:tdrv_destroy] TDRV not initialized
nrtd[1]: [NRTD:InitTongas] Failed to initialize devices, error:1
nrtd[1]: [NRTD:InitTongas] Failed to initialize devices, error:1
nrtd[1]: [NRTD:nrtd_main] Failed to initialize devices: , attempt: 1
nrtd[1]: [NRTD:nrtd_main] Failed to initialize devices: , attempt: 1`
Did you face this error? If so, can you please point me in the right direction?
To be honest, it's been so long that I don't recall. We were experimenting with the SDK at my previous employer but ended up not using it, so I'm not sure I'll be of much help. Sorry!
from aws-neuron-sdk.
If you're seeing this error, make sure you stop the neuron runtime running on your instance outside of the container!
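A rough way to check for that before starting the container is sketched below. It assumes the host runtime shows up as a process named `neuron-rtd` and can be stopped as a service of the same name; both are assumptions on my part, so adjust `RTD_NAME` to whatever your install uses. The function name `host_runtime_running` is just illustrative.

```shell
#!/bin/sh
# Assumption: the host-side runtime runs as a process/service called
# "neuron-rtd". Override RTD_NAME if your installation names it differently.
RTD_NAME="${RTD_NAME:-neuron-rtd}"

# Returns 0 if a process with exactly this name is running on the host.
host_runtime_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

if host_runtime_running "$RTD_NAME"; then
    echo "$RTD_NAME is running on the host; stop it before starting the container, e.g.:"
    echo "  sudo service $RTD_NAME stop"
else
    echo "no host $RTD_NAME process found"
fi
```

The script only reports and suggests a command rather than stopping anything itself, so it is safe to run on a machine where the runtime is in use.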
from aws-neuron-sdk.