aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services

Home Page: https://aws.amazon.com/machine-learning/neuron/

License: Other

Languages: Python 62.05%, Shell 0.73%, Jupyter Notebook 33.29%, Makefile 0.05%, C++ 1.17%, Dockerfile 0.01%, HTML 0.03%, CSS 0.05%, C 2.61%

aws-neuron-sdk's People

Contributors

aws-awaran, aws-diamant, aws-donc, aws-donkrets, aws-maens, aws-mesharma, aws-mo, aws-rhsoln, aws-rxgupta, aws-sadaf, aws-stdun, aws-taylor, aws-wangwil, awsgh, awshaichen, awshrishi, awsjoshir, awsrjh, awsyalvin, eshalakhotia, hahtk, hannanjgaws, jeffhataws, jluntamazon, micwade-aws, mrnikwaws, musunita, natemail-aws, rgrandhiamzn, spring01

aws-neuron-sdk's Issues

Unable to use neuron sdk to compile GPT2 model

I tried the neuron+pytorch tutorial for gpt2: https://github.com/aws/aws-neuron-sdk/blob/master/docs/pytorch-neuron/tutorial-compile-infer.md
Additionally, I installed "transformers" from pip and downloaded the GPT-2 medium model.

I tried the following script:

from transformers.tokenization_gpt2 import GPT2Tokenizer
from transformers.modeling_gpt2 import GPT2LMHeadModel
import torch
import torch_neuron


# loading gpt2 medium model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2', pad_token='<|endoftext|>')
model = GPT2LMHeadModel.from_pretrained('data/GPT2-345M/')
model.eval()

# generating example input
tokens = [tokenizer.encode(t) for t in ['I like to drink coke']]
tensors = torch.LongTensor(tokens)

# using neuron sdk to compile the model
model_neuron = torch.neuron.trace(model, example_inputs=[tensors])

This is the error I'm getting:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/prod/neuron/lib/python3.5/site-packages/torch_neuron/decorators.py", line 150, in trace
    transform_torch_graph_to_tensorflow( func, example_inputs, args, kwargs )
  File "/home/prod/neuron/lib/python3.5/site-packages/torch_neuron/decorators.py", line 294, in transform_torch_graph_to_tensorflow
    tensor_outputs = _resolve_func(node)(op, *tensor_inputs)
TypeError: arange() takes from 2 to 6 positional arguments but 8 were given

Versions of my packages:

Ubuntu 16.04.3 LTS
Python 3.5.2
torch-neuron 1.0.763.0
transformers 2.5.1
numpy 1.17.2

Any ideas? Should the Neuron SDK work with GPT-2 models? Any help is welcome.

Neuron-RTD fails to activate using inf1.24xlarge instance

Despite having success running 'neuron-rtd' on inf1.2xlarge and inf1.6xlarge configurations, I consistently receive the same error when attempting to use this service on the larger inf1.24xlarge instance. The particular error indicates that it may be related to having > 10 devices or perhaps even an issue with the particular PCI BDF ID.

The configuration has been replicated on a newly instantiated inf1.24xlarge instance running Ubuntu 18.04 and DLAMI v26.0. Once launched, the environment was activated with 'source activate aws_neuron_tensorflow_p36'.

This is the command which reveals the error: sudo systemctl start neuron-rtd

The error is logged and revealed using: journalctl -xe


-- Unit neuron-rtd.service has begun starting up.
Jan 08 21:53:01 ip-10-0-0-31 neuron-rtd[4154]: [NRTD:ParseArguments] Using all the BDFs in the infa_map.json!
Jan 08 21:53:01 ip-10-0-0-31 nrtd[4154]: [NRTD:nrtd_main] nrtd build using:1.0.4109.0
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [hal] request seq: 2, cmd: 1 timed out
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [TDRV:reset_mla] Resetting 0000:00:10.0
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [TDRV:reset_mla] Resetting 0000:00:11.0
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [TDRV:reset_mla] Resetting 0000:00:12.0
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [TDRV:reset_mla] Resetting 0000:00:13.0
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [TDRV:reset_mla] Resetting 0000:00:14.0
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [TDRV:reset_mla] Resetting 0000:00:15.0
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [TDRV:reset_mla] Resetting 0000:00:16.0
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [TDRV:reset_mla] Resetting 0000:00:17.0
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [TDRV:reset_mla] Resetting 0000:00:18.0
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [TDRV:reset_mla] Resetting 0000:00:19.0
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [TDRV:tdrv_init_mla_phase1] BDF 0000:00:1a.0 is not a MLA Device
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [TDRV:tdrv_destory] TDRV not initialized
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [NRTD:InitTongas] kmgr_init_mla() failed 2
Jan 08 21:53:19 ip-10-0-0-31 nrtd[4154]: [NRTD:nrtd_main] Initializing MLA failed: 0000:00:10.0 0000:00:11.0 0000:00:12.0 0000:00:13.0 0000:00:14.0 0000:00:15.0 0000:00:16.0 0000:00:17.0 0000:00:18.0 0000:00:19.0 0000:00:1a.0 0000:00:1b.0 0000:00:1c.0 0000:00:1d.0 0000:00
Jan 08 21:53:19 ip-10-0-0-31 neuron-rtd[4154]: [TDRV:tdrv_destory] TDRV not initialized
Jan 08 21:53:19 ip-10-0-0-31 systemd[1]: neuron-rtd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 08 21:53:19 ip-10-0-0-31 systemd[1]: neuron-rtd.service: Failed with result 'exit-code'.
Jan 08 21:53:19 ip-10-0-0-31 systemd[1]: Failed to start Neuron Runtime Daemon.
-- Subject: Unit neuron-rtd.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit neuron-rtd.service has failed.
--
-- The result is RESULT.
Jan 08 21:53:19 ip-10-0-0-31 sudo[3183]: pam_unix(sudo:session): session closed for user root
Jan 08 21:53:20 ip-10-0-0-31 systemd[1]: neuron-rtd.service: Service hold-off time over, scheduling restart.
Jan 08 21:53:20 ip-10-0-0-31 systemd[1]: neuron-rtd.service: Scheduled restart job, restart counter is at 22.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Automatic restarting of the unit neuron-rtd.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Jan 08 21:53:20 ip-10-0-0-31 systemd[1]: Stopped Neuron Runtime Daemon.
-- Subject: Unit neuron-rtd.service has finished shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit neuron-rtd.service has finished shutting down.

The error appears to be triggered when it reaches the 11th device (0000:00:1a.0). I have ruled out 'out of memory' errors. No documentation about what constitutes an 'MLA Device' could be found, although my working hunch is that it stands for Machine Learning Accelerator.

Help in resolving this so I can fully utilize this instance type is much appreciated.
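
For anyone comparing notes, lspci is a quick way to cross-check which PCI devices the instance actually exposes against the BDFs in the log. The vendor filter below assumes the Inferentia devices appear under Amazon/Annapurna Labs' vendor ID 1d0f, which is an assumption on my part rather than anything documented by the SDK:

# Dump every PCI function with numeric vendor:device IDs so each BDF
# from the neuron-rtd log can be matched against real hardware.
lspci -nn

# Restrict the listing to Amazon/Annapurna devices
# (vendor ID 1d0f is assumed, not documented).
lspci -d 1d0f: -nn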

Can't profile with different resolution

I am attempting to profile ResNet50 with a resolution of 640x480. I am able to generate trace files with 224x224, but I am unable to do so for 640x480. The include_top option is False for both 224x224 and 640x480 resolution networks, and compilation seems to complete OK for both. This issue is possibly due to the resolution limitations described here and here. I have attached my log and a tar file of my code.

resnet50_files.tar.gz
lack_of_trace.txt
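
For reference, a minimal sketch of the compile step being attempted, assuming the Keras ResNet50 with include_top=False and a fixed 640x480 input (the attached tar contains the actual code, which may differ):

import tensorflow as tf
import tensorflow.neuron as tfn

tf.keras.backend.set_learning_phase(0)
# include_top=False as described above; input_shape is (height, width, channels).
model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False,
                                       input_shape=(480, 640, 3))
sess = tf.keras.backend.get_session()

tf.saved_model.simple_save(sess, './resnet50_640x480',
                           inputs={'input': model.inputs[0]},
                           outputs={'output': model.outputs[0]})
tfn.saved_model.compile('./resnet50_640x480', './resnet50_640x480_neuron')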

Failed to fuse subgraph

Hi,
I am trying to convert a pre-trained TensorFlow model to run on AWS Inferentia. The Neuron compiler successfully converts the model, but gives warnings about failing to fuse subgraphs. The number of operations placed on the Neuron runtime is 0, so inference on an Inf1 instance is much slower than it should be. Do you have any tips on how to solve this?

Running this compile script on an Ubuntu DLAMI Version 26 in the aws_neuron_tensorflow_p36 environment:

import os
import time
import shutil
import tensorflow as tf
import tensorflow.neuron as tfn
import tensorflow.compat.v1.keras as keras
import openl3

WORKSPACE = './ws_openl3'
os.makedirs(WORKSPACE, exist_ok=True)
model_dir = os.path.join(WORKSPACE, 'openl3')
compiled_model_dir = os.path.join(WORKSPACE, 'openl3_neuron')
shutil.rmtree(model_dir, ignore_errors=True)
shutil.rmtree(compiled_model_dir, ignore_errors=True)

keras.backend.set_learning_phase(0)
model = openl3.models.load_audio_embedding_model(input_repr="mel256", content_type="music", embedding_size=512)

tf.saved_model.simple_save(
    session = keras.backend.get_session(),
    export_dir = model_dir,
    inputs = {'input': model.inputs[0]},
    outputs = {'output': model.outputs[0]})

tfn.saved_model.compile(model_dir, compiled_model_dir)    
shutil.make_archive('./openl3_neuron', 'zip', WORKSPACE, 'openl3_neuron')

produces the following output:

INFO:tensorflow:Restoring parameters from ./ws_openl3/openl3/variables/variables
INFO:tensorflow:Froze 51 variables.
INFO:tensorflow:Converted 51 variables to const ops.
2020-02-20 17:46:35.530084: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 4 ops of 3 different types in the graph that are not compiled by neuron-cc: Unpack, NoOp, Placeholder, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
INFO:tensorflow:fusing subgraph neuron_op_5c0465282a95dec5 with neuron-cc
WARNING:tensorflow:Failed to fuse subgraph neuron_op_5c0465282a95dec5 with '/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/bin/neuron-cc compile /tmp/tmpiq09lm81/neuron_op_5c0465282a95dec5/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpiq09lm81/neuron_op_5c0465282a95dec5/graph_def.neff --io-config "{\"inputs\": {\"melspectrogram_1/transpose_20/_0:0\": [[1, 1, 199, 1025], \"float32\"], \"melspectrogram_1/unstack0/_1:0\": [[], \"int32\"]}, \"outputs\": [\"flatten_1/Reshape:0\"]}"'
INFO:tensorflow:fusing subgraph neuron_op_879ef434f1d5fcf0 with neuron-cc
WARNING:tensorflow:Failed to fuse subgraph neuron_op_879ef434f1d5fcf0 with '/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/bin/neuron-cc compile /tmp/tmpiq09lm81/neuron_op_879ef434f1d5fcf0/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpiq09lm81/neuron_op_879ef434f1d5fcf0/graph_def.neff --io-config "{\"inputs\": {\"input_10/_2:0\": [[1, 1, 48000], \"float32\"]}, \"outputs\": [\"melspectrogram_1/transpose_2:0\", \"melspectrogram_1/Shape:0\"]}"'
INFO:tensorflow:Number of operations in TensorFlow session: 1193
INFO:tensorflow:Number of operations after tf.neuron optimizations: 151
INFO:tensorflow:Number of operations placed on Neuron runtime: 0
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: ./ws_openl3/openl3_neuron/saved_model.pb
INFO:tensorflow:Successfully converted ./ws_openl3/openl3 to ./ws_openl3/openl3_neuron

Do not see any inference time improvement on object detection model.

I compared the run time of an uncompiled and a compiled TensorFlow object detection model. Unlike the example ResNet model (which showed about a 6x improvement over CPU run time), there is no improvement. I think the following compile message probably explains why: "Number of operations placed on Neuron runtime: 0". The question is why no ops are placed on Neuron. Because of that, the whole graph ran on the CPU, hence no difference.

WARNING:tensorflow:subgraph neuron_op_3830212827f60cb5, tensor pred_sbbox/range0/_0:0: invalid shape (?,)
WARNING:tensorflow:Not fusing subgraph neuron_op_3830212827f60cb5: --io-config error
WARNING:tensorflow:subgraph neuron_op_6e11ec1372c1bd87, tensor upsample1/ResizeNearestNeighbor0/_6:0: invalid shape (?, ?, ?, 128)
WARNING:tensorflow:Not fusing subgraph neuron_op_6e11ec1372c1bd87: --io-config error
WARNING:tensorflow:subgraph neuron_op_a64bf54893aa3612, tensor upsample0/ResizeNearestNeighbor0/_9:0: invalid shape (?, ?, ?, 256)
WARNING:tensorflow:Not fusing subgraph neuron_op_a64bf54893aa3612: --io-config error
WARNING:tensorflow:subgraph neuron_op_6282ae96789270ea, tensor pred_mbbox/range0/_11:0: invalid shape (?,)
WARNING:tensorflow:Not fusing subgraph neuron_op_6282ae96789270ea: --io-config error
WARNING:tensorflow:subgraph neuron_op_7a2c5493ed9a8729, tensor pred_lbbox/range0/_17:0: invalid shape (?,)
WARNING:tensorflow:Not fusing subgraph neuron_op_7a2c5493ed9a8729: --io-config error
INFO:tensorflow:fusing subgraph neuron_op_6ec953b285e9ba28 with neuron-cc
WARNING:tensorflow:Failed to fuse subgraph neuron_op_6ec953b285e9ba28 with '/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/bin/neuron-cc compile /tmp/tmpqybqolf_/neuron_op_6ec953b285e9ba28/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpqybqolf_/neuron_op_6ec953b285e9ba28/graph_def.neff --io-config "{\"inputs\": {\"input/input_data0/_8:0\": [[1, 416, 416, 3], \"float32\"]}, \"outputs\": [\"darknet/residual10/add:0\", \"darknet/residual18/add:0\", \"conv_lbbox/BiasAdd:0\", \"conv57/LeakyRelu:0\", \"upsample0/ResizeNearestNeighbor/size:0\", \"pred_lbbox/strided_slice:0\", \"pred_lbbox/strided_slice_1:0\"]}"'
INFO:tensorflow:Number of operations in TensorFlow session: 3290
INFO:tensorflow:Number of operations after tf.neuron optimizations: 914
INFO:tensorflow:Number of operations placed on Neuron runtime: 0
INFO:tensorflow:Successfully converted 

For comparison, the compile output for the ResNet50 model is as follows.

INFO:tensorflow:fusing subgraph neuron_op_d6f098c01c780733 with neuron-cc
INFO:tensorflow:Number of operations in TensorFlow session: 4638
INFO:tensorflow:Number of operations after tf.neuron optimizations: 556
INFO:tensorflow:Number of operations placed on Neuron runtime: 554
INFO:tensorflow:Successfully converted
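
The invalid shape warnings such as (?, ?, ?, 128) point to fully dynamic tensor shapes, which the --io-config step cannot describe. A hedged workaround sketch: re-import the frozen graph with a fixed-shape placeholder before compiling. The tensor name input/input_data:0 and the 1x416x416x3 shape are inferred from the log above; the file path and output tensor name are hypothetical:

import tensorflow as tf
import tensorflow.neuron as tfn

# Load the frozen object-detection graph (path is hypothetical).
graph_def = tf.GraphDef()
with open('frozen_detection_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Session(graph=tf.Graph()) as sess:
    # Bind the input to a static shape so downstream ops such as range and
    # ResizeNearestNeighbor see concrete dimensions at compile time.
    fixed_input = tf.placeholder(tf.float32, [1, 416, 416, 3], name='fixed_input')
    tf.import_graph_def(graph_def, input_map={'input/input_data:0': fixed_input}, name='')
    # Re-export with the static-shape input, then compile as in the tutorials.
    output = sess.graph.get_tensor_by_name('pred_sbbox/concat:0')  # hypothetical name
    tf.saved_model.simple_save(sess, './detector_fixed',
                               inputs={'input': fixed_input},
                               outputs={'output': output})

tfn.saved_model.compile('./detector_fixed', './detector_fixed_neuron')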

TF Feature Columns - embedding_column not supported?

We have a TensorFlow Estimator SavedModel. When it is compiled and run on the serving infrastructure, we get the following error:

2020-04-24 22:51:49.219277: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:809 : Failed precondition: Table not initialized.

We believe this might be due to operations associated with https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/feature_column/embedding_column or other feature columns. Are all TF feature columns supported with Inferentia/Neuron?
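
One thing worth checking: "Table not initialized" usually means the lookup tables behind the feature columns were never initialized in the serving session. A minimal sketch of forcing the initialization when loading the SavedModel manually (the export path is hypothetical; whether the Neuron serving path runs the init op for compiled graphs is exactly the open question here):

import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    # loader.load runs the SavedModel's main_op / legacy_init_op, which is
    # what normally initializes the feature-column hash tables.
    tf.saved_model.loader.load(sess, ['serve'], 'estimator_export/1')  # hypothetical path
    # Run the table initializer explicitly in case it was not wired into the init op.
    sess.run(tf.tables_initializer())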

tf.neuron.saved_model.compile fails with batch_size > 1

Following the tutorial here, compilation works with batch_size=1. When I change the script below to batch_size=2, compilation fails.

import tensorflow as tf


tf.keras.backend.set_learning_phase(0)
tf.keras.backend.set_image_data_format('channels_last')
model = tf.keras.applications.ResNet50(weights='imagenet')
sess = tf.keras.backend.get_session()
inputs = {'input': model.inputs[0]}
outputs = {'output': model.outputs[0]}

# save the model using tf.saved_model.simple_save
modeldir = "./resnet50/1"
tf.saved_model.simple_save(sess, modeldir, inputs, outputs)

# compile the model for Inferentia
neuron_modeldir = "./resnet50_inf2/1"
tf.neuron.saved_model.compile(modeldir, neuron_modeldir, batch_size=2)

Output from compilation...

$ python compile.py
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2020-01-08 20:10:40.144936: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2020-01-08 20:10:40.150407: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000005000 Hz
2020-01-08 20:10:40.151658: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x563e0cf6be80 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-08 20:10:40.151678: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From compile.py:7: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.

WARNING:tensorflow:From compile.py:14: simple_save (from tensorflow.python.saved_model.simple_save) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.simple_save.
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/neuron/python/saved_model.py:136: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
INFO:tensorflow:fusing subgraph neuron_op_d6f098c01c780733 with neuron-cc
WARNING:tensorflow:Failed to fuse subgraph neuron_op_d6f098c01c780733 with '/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/bin/neuron-cc compile /tmp/tmpjj5xgykv/neuron_op_d6f098c01c780733/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpjj5xgykv/neuron_op_d6f098c01c780733/graph_def.neff --io-config "{\"inputs\": {\"input_10/_0:0\": [[2, 224, 224, 3], \"float32\"]}, \"outputs\": [\"probs/Softmax:0\"]}"'
INFO:tensorflow:Number of operations in TensorFlow session: 4638
INFO:tensorflow:Number of operations after tf.neuron optimizations: 555
INFO:tensorflow:Number of operations placed on Neuron runtime: 0
INFO:tensorflow:Successfully converted ./resnet50/1 to ./resnet50_inf2/1

The following are the current latest versions of the Neuron packages, running on a c5.9xlarge with DLAMI v26 (Ubuntu):

(aws_neuron_tensorflow_p36) ubuntu@ip-172-31-0-4:~$ pip list | grep neuron
neuron-cc                          1.0.5939.0+5849551057
tensorboard-neuron                 1.15.0.1.0.315.0
tensorflow-neuron                  1.15.0.1.0.803.0
You are using pip version 10.0.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

(aws_neuron_tensorflow_p36) ubuntu@ip-172-31-0-4:~$ apt list | grep neuron

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

aws-neuron-runtime/unknown,now 1.0.4751.0 amd64 [installed]
aws-neuron-runtime-base/unknown,now 1.0.4587.0 amd64 [installed]
aws-neuron-tools/unknown,now 1.0.4587.0 amd64 [installed]
tensorflow-model-server-neuron/unknown,now 1.15.0.1.0.803.0 all [installed]

I updated following the release notes for the DLAMI:

#!/bin/bash

sudo apt-get update
sudo apt-get -y install aws-neuron-runtime-base
sudo apt-get -y install aws-neuron-runtime
sudo apt-get -y install aws-neuron-tools
sudo apt-get -y install tensorflow-model-server-neuron

source activate aws_neuron_tensorflow_p36
conda install numpy=1.17.2 --yes --quiet
conda update tensorflow-neuron

Is this the most up-to-date API for setting batch_size? https://github.com/aws/aws-neuron-sdk/blob/master/docs/tensorflow-neuron/api-compilation-python-api.md

bert_demo documentation

The current documentation for bert_demo instructs users to "Use the Neuron compatible BERT-Large implementation and public BERT-Large weights to generate a saved model using steps outlined in public BERT documentation here." (link: google-research/bert#146)

The public documentation referenced is (currently) an open GitHub issue discussion on the BERT repository -- not public documentation. No full working example is currently available at that link. If there is a known/vetted solution, can you please add it to the bert_demo so that users can replicate the demonstration?

Thank you

Compiling TF large model with Neuron fails on protobuf size limit

I'm trying to follow the instructions in this guide:
https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-tf-neuron.html and compile a SavedModel with Neuron.
When the model is large (>2GB), compilation fails with ValueError: Message tensorflow.GraphDef exceeds maximum protobuf size of 2GB.
I understand this is a hard limit on model size from TensorFlow, but is there any workaround on the Neuron end? Is there any solution for running a large model on AWS Inferentia?
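
As a quick diagnostic, the serialized GraphDef size can be measured directly, since the 2 GB ceiling comes from protobuf rather than Neuron; a minimal sketch (the frozen-graph path is hypothetical, and parsing will itself fail once a file is already past the limit):

import tensorflow as tf

graph_def = tf.GraphDef()
with open('frozen_model.pb', 'rb') as f:  # hypothetical path
    graph_def.ParseFromString(f.read())

# Protobuf messages cannot exceed 2 GB once serialized.
print('GraphDef size: %.2f GiB' % (graph_def.ByteSize() / 2.0**30))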

Thanks.

Problems compiling existing PyTorch model

I have been attempting to compile an existing, pre-trained PyTorch model using neuron-cc on a c5n.4xlarge instance. I'm loading the model from an existing checkpoint and then attempting to compile it in Python 3.6 using torch.neuron.trace according to the docs here.
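
For context, this is roughly the flow being attempted (the model class and checkpoint path are placeholders; the 1x3x256x256 example shape comes from the compiler command line in the log below):

import torch
import torch_neuron  # importing this registers the torch.neuron namespace

model = MyModel()  # hypothetical model class
model.load_state_dict(torch.load('checkpoint.pth', map_location='cpu'))
model.eval()

# Trace/compile against a fixed-size example input, per the torch-neuron docs.
example = torch.rand(1, 3, 256, 256)
model_neuron = torch.neuron.trace(model, example_inputs=[example])
model_neuron.save('model_neuron.pt')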

The compilation is failing with the error log below. Any suggestions on how to troubleshoot this?

Thanks in advance!

Error Output

...
ERROR (Spiller): Same live vertices kept previously have not been freed for long!
list_sch: /opt/amazon/neuroncc/starfish/fast_sch/mem_alloc/spill/MemorySpillDuringSchedule.cpp:352: void MemorySpillDuringSchedule::CheckSpillerWork(std::set<long unsigned int>&, MemoryBase&): Assertion `0' failed.
Aborted (core dumped)
02/27/2020 10:47:58 PM ERROR [neuron-cc]: ***************************************************************
02/27/2020 10:47:58 PM ERROR [neuron-cc]:  An Internal Compiler Error has occurred
02/27/2020 10:47:58 PM ERROR [neuron-cc]: ***************************************************************
02/27/2020 10:47:58 PM ERROR [neuron-cc]:
02/27/2020 10:47:58 PM ERROR [neuron-cc]: Please contact Customer Support and provide the following details.
02/27/2020 10:47:58 PM ERROR [neuron-cc]:
02/27/2020 10:47:58 PM ERROR [neuron-cc]: Error message:  Non-zero exit status (134) for command: /home/ubuntu/test_venv/lib/python3.6/site-packages/neuroncc/starfish/bin/list_sch --hhir hh-tr-external-move.json --verbose 0 --sb_size 75 --arith_intensity_target 2300 --sb_watermark_low 0.250000 --sb_watermark_high 0.750000 --sb_size_tol 1 --alloc simple1 --alloc_opt --depth_diff 0.100000 --verbose_start_cycle 0 --tt_dist --mm_meet_cnt 1 --load_speed_factor 0.300000 --schir sch_tmp.json --spill_depth_limit 5 --threshold_consecutive_num_spills_same_keep_vertices 10 --true_dep --mm_order
02/27/2020 10:47:58 PM ERROR [neuron-cc]:
02/27/2020 10:47:58 PM ERROR [neuron-cc]: Error class:    CompilerInternalError
02/27/2020 10:47:58 PM ERROR [neuron-cc]: Error location: job.Scheduler.4
02/27/2020 10:47:58 PM ERROR [neuron-cc]: Command line:   /home/ubuntu/test_venv/bin/neuron-cc compile /tmp/tmp62j2gg8w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp62j2gg8w/graph_def.neff --io-config '{"inputs": {"input.1:0": [[1, 3, 256, 256], "float32"]}, "outputs": ["BiasAdd:0"]}'
02/27/2020 10:47:58 PM ERROR [neuron-cc]:
02/27/2020 10:47:58 PM ERROR [neuron-cc]: Internal details:
02/27/2020 10:47:58 PM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 207, in neuroncc.driver.Job.runSingleInputFn
02/27/2020 10:47:58 PM ERROR [neuron-cc]:   File "neuroncc/driver/jobs/Scheduler.py", line 59, in neuroncc.driver.jobs.Scheduler.Scheduler.runSingleInput
02/27/2020 10:47:58 PM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 145, in neuroncc.driver.Job.Job.shellCommand
02/27/2020 10:47:58 PM ERROR [neuron-cc]:
02/27/2020 10:47:58 PM ERROR [neuron-cc]: Version information:
02/27/2020 10:47:59 PM ERROR [neuron-cc]:   Neuron Compiler version 1.0.6801.0+6001944336
02/27/2020 10:47:59 PM ERROR [neuron-cc]:
02/27/2020 10:47:59 PM ERROR [neuron-cc]:   HWM version 1.0.839.0-6001300654
02/27/2020 10:47:59 PM ERROR [neuron-cc]:   NEFF version 0.6
02/27/2020 10:47:59 PM ERROR [neuron-cc]:   TVM version 1.0.1619.0+6001909371
02/27/2020 10:47:59 PM ERROR [neuron-cc]:   NumPy version 1.17.2
02/27/2020 10:47:59 PM ERROR [neuron-cc]:   MXNet not available
02/27/2020 10:47:59 PM ERROR [neuron-cc]:   TF version 1.15.0
02/27/2020 10:47:59 PM ERROR [neuron-cc]:   ONNX not available
02/27/2020 10:47:59 PM ERROR [neuron-cc]:
02/27/2020 10:47:59 PM ERROR [neuron-cc]: Artifacts stored in: /home/ubuntu

Getting a WARN about optimize having no effects on pytorch

Following the PyTorch tutorial (https://github.com/aws/aws-neuron-sdk/blob/master/docs/pytorch-neuron/tutorial-compile-infer.md), the following WARNING is shown when running python trace_resnet50.py:

/home/prod/neuron/lib/python3.5/site-packages/torch/jit/__init__.py:847: UserWarning: `optimize` is deprecated and has no effect. Use `with torch.jit.optimized_execution() instead
  warnings.warn("`optimize` is deprecated and has no effect. Use `with torch.jit.optimized_execution() instead")

Is this warning safe to ignore, or does it need to be fixed?

Can't run inference on a batch larger than 1 (resnet-50 tutorial)

I successfully reproduced the Getting started with torch-neuron tutorial. However, when I tried to run the inference step with a batch size greater than 1, I got the following error:

Traceback (most recent call last):
  File "pytorch_infer_resnet50.py", line 50, in <module>
    results = model_neuron( image )
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
RuntimeError: nrt infer_wait failed in nrtd
The above operation failed in interpreter, with the following stack trace:
at code/__torch__/torch_neuron/decorators.py:7:9
op_version_set = 1
class NeuronModule(Module):
  __parameters__ = []
  training : bool
  def forward(self: __torch__.torch_neuron.decorators.NeuronModule,
    argument_1: Tensor) -> Tensor:
    _0 = ops.neuron.forward_1([argument_1], CONSTANTS.c0, CONSTANTS.c1, CONSTANTS.c2)
         ~~~~~~~~~~~~~~~~~~~~ <--- HERE
    return _0
Compiled from code /home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py(245): forward
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py(525): _slow_forward
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py(539): __call__
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/__init__.py(997): trace_module
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/__init__.py(858): trace
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py(259): trace
pytorch_trace_resnet50.py(14): <module>

The only change made to the code was:

#image, _ = eval_dataset[0]
#image = torch.tensor(image.numpy()[np.newaxis, ...])

batch_size = 2 # works when batch_size = 1
image = torch.tensor(np.array([eval_dataset[0][0].numpy()] * batch_size))

I used the aws_neuron_pytorch_p36 Conda environment.

pip show torch_neuron
Name: torch-neuron
Version: 1.0.763.0
Summary: UNKNOWN
Home-page: UNKNOWN
Author: AWS
Author-email: UNKNOWN
License: Proprietary
Location: /home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages
Requires: torch-neuron-base
Required-by:

Loading ONNX neuron-compiled models

I've followed the instructions provided in this comment #59 (comment) and I've generated my model.neff binary.

My question is, how do I actually load it on the Inf chip and do predictions? I suppose I have to use the onnxruntime module, but I don't see anything in the documentation describing this process.

Compilation error: aten::size not supported

(test_venv) root@ip-10-0-108-116:~/midas# python3 run.py
Traceback (most recent call last):
  File "run.py", line 75, in <module>
    run(INPUT_PATH, OUTPUT_PATH, MODEL_PATH)
  File "run.py", line 27, in run
    model_neuron = torch.neuron.trace(model, example_inputs=[image])
  File "/root/test_venv/lib/python3.5/site-packages/torch_neuron/decorators.py", line 165, in trace
    tensor_outputs = _resolve_func(node)(op, *tensor_inputs)
  File "/root/test_venv/lib/python3.5/site-packages/torch_neuron/decorators.py", line 826, in _resolve_func
    assert hasattr(module, func_name), "Neuron compile failed. Operator {}::{} is not supported".format(mod_name,func_name)
AssertionError: Neuron compile failed. Operator aten::size is not supported

Default MaxPoolingOp only supports NHWC on device type CPU

Getting the above error. Is there a way to work around it? A partial stack trace follows.

tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU
         [[node pool0/MaxPool (defined at /home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
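
This error typically means a graph built with the channels_first (NCHW) data format is falling back to TensorFlow's CPU kernels, whose MaxPool implementation only supports NHWC. A hedged sketch of one workaround, assuming a Keras-built model (the ResNet50 here is only illustrative): force channels_last before constructing the model, then recompile.

import tensorflow as tf

# Build the model in NHWC so any ops left on the CPU use a supported layout.
tf.keras.backend.set_image_data_format('channels_last')
model = tf.keras.applications.ResNet50(weights='imagenet')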

Cannot compile trace_resnet50.py in the example.

When I try to compile I get this error:

RuntimeError:
builtin cannot be used as a value:
at /home/ubuntu/test_venv/lib/python3.6/site-packages/torchvision/models/detection/_utils.py:14:56
    def zeros_like(tensor, dtype):
        # type: (Tensor, int) -> Tensor
        return torch.zeros_like(tensor, dtype=dtype, layout=tensor.layout,
               ~~~~~~~~~~~~~ <--- HERE
                                device=tensor.device, pin_memory=tensor.is_pinned())
'zeros_like' is being compiled since it was called from 'torch.torchvision.models.detection._utils.BalancedPositiveNegativeSampler.__call__'
at /home/ubuntu/test_venv/lib/python3.6/site-packages/torchvision/models/detection/_utils.py:72:12

        # randomly select positive and negative examples
        perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
        perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]

        pos_idx_per_image = positive[perm1]
        neg_idx_per_image = negative[perm2]

        # create binary mask from indices
        pos_idx_per_image_mask = zeros_like(
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~...  <--- HERE
            matched_idxs_per_image, dtype=torch.uint8
        )
        neg_idx_per_image_mask = zeros_like(
            matched_idxs_per_image, dtype=torch.uint8
        )

        pos_idx_per_image_mask[pos_idx_per_image] = torch.tensor(1, dtype=torch.uint8)
        neg_idx_per_image_mask[neg_idx_per_image] = torch.tensor(1, dtype=torch.uint8)

neuron-cc: `--verbose` instead of `--log-level`?

Problem

The reference guide states that the --log-level option can adjust the log level, whose default is INFO. However, neuron-cc does not understand the option (neuron-cc compile: error: unrecognized arguments: --log-level=INFO).

Feedback

The correct option is --verbose=INFO and the default is ERROR, not INFO.
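
Modeled on the compiler invocations that appear elsewhere on this page, a corrected command line would therefore look like the following (paths and io-config are illustrative):

neuron-cc compile graph_def.pb --framework TENSORFLOW --output graph_def.neff \
    --io-config '{"inputs": {"input:0": [[1, 224, 224, 3], "float32"]}, "outputs": ["output:0"]}' \
    --verbose=INFO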

Version Info

Neuron Compiler version 1.0.7878.0+6004821168

HWM version 1.0.897.0-6002947737
NEFF version 0.6
TVM version 1.0.1826.0+6004211132
NumPy version 1.17.2
MXNet not available
TF version 1.15.0
ONNX version 1.6.0

No resource specifications for neuron-device-plugin ds

The current daemonset for k8s-neuron-device-plugin.yaml doesn't have any resource requests/limits. I take it that setting requests/limits is a good strategy to prevent the container from hogging too many resources.

I'm thinking of something resembling this:

resources:
  requests:
    cpu: 200m
    memory: 500Mi
  limits:
    memory: 500Mi

What I'm asking is: do the resource requirements change depending on the number of Inferentia chips available? Or depending on other unknown factors?

NEURON_PROFILE does not appear to work when running Reinvent lab3

I've been using the information in this doc to enable profiling and view profiling data via TensorBoard:

https://github.com/aws/aws-neuron-sdk/blob/master/docs/neuron-tools/getting-started-tensorboard-neuron.md

This has been working well for me so far, when running my own inference code.

However, when I set the NEURON_PROFILE environment variable and then run the inference load test from the Re-Invent lab3 ( https://github.com/awshlabs/reinvent19Inf1Lab/blob/master/3.%20benchmark%20run.md ), there is no profile data generated.

I do see a .pb file and a .neff file that were generated in the directory I specified via NEURON_PROFILE, but there is no trace data, and when I try to start up tensorboard_neuron, it says:

WARNING: no profile data found in ./neuron_profile

It would be really useful to be able to look at the trace in tensorboard in order to understand how the load test is utilizing all of the neuron cores.
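
For reference, this is the sequence I am using, following the tensorboard-neuron guide (the directory names are arbitrary):

# Profile artifacts are written wherever NEURON_PROFILE points.
mkdir -p ./neuron_profile
export NEURON_PROFILE=./neuron_profile

# Run the lab's inference load test here, then launch TensorBoard-Neuron.
tensorboard_neuron --logdir ./logs --run_neuron_profile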

undefined symbol on `c5.4xlarge`

I was following the BERT example on a c5.4xlarge instance when I came across the following error:

ImportError: /home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/neuron/_whitelist_partition_swig.so: undefined symbol: _ZN10tensorflow10FileSystem20RecursivelyCreateDirERKSs

when I ran bert_model.py. Here's an MWE without the BERT example:

(aws_neuron_tensorflow_p36) ubuntu@ip-172-31-95-164:~/aws-neuron-sdk/src/examples/tensorflow/bert_demo$ python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.neuron import fuse
Traceback (most recent call last):
  File "", line 1, in 
  File "/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/_api/v1/neuron/__init__.py", line 10, in 
    from tensorflow._api.v1.neuron import graph_util
  File "/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/_api/v1/neuron/__init__.py", line 10, in 
    from tensorflow._api.v1.neuron import graph_util
  File "/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/_api/v1/neuron/graph_util/__init__.py", line 10, in 
    from tensorflow.python.neuron.python.graph_util import inference_graph_from_session
  File "/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/neuron/python/graph_util.py", line 42, in 
    from tensorflow.python.neuron.whitelist_partition_swig import WhitelistPartition
  File "/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/neuron/whitelist_partition_swig.py", line 28, in 
    _whitelist_partition_swig = swig_import_helper()
  File "/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/neuron/whitelist_partition_swig.py", line 24, in swig_import_helper
    _mod = imp.load_module('_whitelist_partition_swig', fp, pathname, description)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: /home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/neuron/_whitelist_partition_swig.so: undefined symbol: _ZN10tensorflow10FileSystem20RecursivelyCreateDirERKSs

The same missing-symbol error is also present for from tensorflow import neuron as tfn. I also found that this works without a problem on inf1.6xlarge.

Problem compiling a BASNet network

RuntimeError: Only tensors or tuples of tensors can be output from traced functions (getOutput at /opt/workspace/KaenaPyTorchBase/build/private/pytorch/torch/csrc/jit/tracer.cpp:209)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7ff41318ba07 in /root/test_venv/lib/python3.5/site-packages/torch/lib/libc10.so)
frame #1: torch::jit::tracer::TracingState::getOutput(c10::IValue const&) + 0x362 (0x7ff3fb9a8572 in /root/test_venv/lib/python3.5/site-packages/torch/lib/libtorch.so)
frame #2: torch::jit::tracer::exit(std::vector<c10::IValue, std::allocator<c10::IValue> > const&) + 0x3c (0x7ff3fb9a876c in /root/test_venv/lib/python3.5/site-packages/torch/lib/libtorch.so)
frame #3: + 0x5e853c (0x7ff3fe52653c in /root/test_venv/lib/python3.5/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x5fd260 (0x7ff3fe53b260 in /root/test_venv/lib/python3.5/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x374625 (0x7ff3fe2b2625 in /root/test_venv/lib/python3.5/site-packages/torch/lib/libtorch_python.so)

frame #8: python3() [0x539a13]
frame #11: python3() [0x4e3537]
frame #14: python3() [0x539f5f]
frame #17: python3() [0x539a13]
frame #19: python3() [0x6292c2]
frame #24: __libc_start_main + 0xf0 (0x7ff4174ea830 in /lib/x86_64-linux-gnu/libc.so.6)
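
torch.jit.trace, which torch.neuron.trace builds on, can only return tensors or tuples of tensors, and BASNet's forward returns a list of side-output maps. A hedged workaround sketch is to wrap the model so its outputs are converted to a tuple before tracing (the input shape is an assumption):

import torch
import torch_neuron

class TupleOutput(torch.nn.Module):
    # Wrap a model whose forward returns a list so tracing sees a tuple.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        return tuple(self.model(x))

wrapped = TupleOutput(basnet_model.eval())  # basnet_model: the loaded network
example = torch.rand(1, 3, 256, 256)        # shape is an assumption
traced = torch.neuron.trace(wrapped, example_inputs=[example])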

BERT Server Demo - Fails to create kernel due to invalid NEFF

I am attempting to replicate the BERT Demo as described here: Link. The system used was inf1.2xlarge (although inf1.24xlarge previously failed with the same error).

After downloading the BERT model, training it (per Appendix 1), and compiling the Saved Model for Neuron, the next step is to launch the BERT Server (bert_server.py). Up to this point, none of the scripts raised any errors. However, it should be noted that while the uncompiled Saved Model (i.e. bert_classifier_saved_model) contains data files in the variables folder, the compiled Saved Model (bert-saved-model-neuron, output from bert_model.py) contains only the saved_model.pb file with an empty variables folder.

When the bert_server.py command is launched using python bert_server.py --dir bert-saved-model-neuron --parallel 4, the following output results and inferencing does not appear to work from the client end:

WARNING:tensorflow:From bert_server.py:35: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

2020-01-23 00:28:46.542790: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2020-01-23 00:28:46.579727: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000015000 Hz
2020-01-23 00:28:46.579834: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560b91b98190 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-23 00:28:46.579856: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/contrib/predictor/saved_model_predictor.py:153: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
2020-01-23 00:28:52.779828: E tensorflow/core/framework/op_segment.cc:54] Create kernel failed: Invalid argument: neff is invalid
2020-01-23 00:28:52.779890: E tensorflow/core/common_runtime/executor.cc:642] Executor failed to create kernel. Invalid argument: neff is invalid
[[{{node bert/NeuronOp}}]]
2020-01-23 00:28:53.737255: E tensorflow/core/framework/op_segment.cc:54] Create kernel failed: Invalid argument: neff is invalid
2020-01-23 00:28:53.737303: E tensorflow/core/common_runtime/executor.cc:642] Executor failed to create kernel. Invalid argument: neff is invalid
[[{{node bert/NeuronOp}}]]
2020-01-23 00:28:54.736447: E tensorflow/core/framework/op_segment.cc:54] Create kernel failed: Invalid argument: neff is invalid
2020-01-23 00:28:54.736498: E tensorflow/core/common_runtime/executor.cc:642] Executor failed to create kernel. Invalid argument: neff is invalid
[[{{node bert/NeuronOp}}]]
2020-01-23 00:28:55.740208: E tensorflow/core/framework/op_segment.cc:54] Create kernel failed: Invalid argument: neff is invalid
2020-01-23 00:28:55.740256: E tensorflow/core/common_runtime/executor.cc:642] Executor failed to create kernel. Invalid argument: neff is invalid

[[{{node bert/NeuronOp}}]]
E0123 00:28:55.793235256 1950 socket_utils_common_posix.cc:197] check for SO_REUSEPORT: {"created":"@1579739335.793224173","description":"SO_REUSEPORT unavailable on compiling system","file":"src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":165}
current throughput 0
input processor is waiting
input processor is waiting
input processor is waiting
input processor is waiting
current throughput 0
current throughput 0
current throughput 0
current throughput 0

The error sequence repeats once for each NeuronCore specified.

Since compiling the model did not produce a NEFF file, I presume one is generated by bert_server.py on the fly. However, there is currently insufficient documentation to resolve this error, and I do not see it raised elsewhere in this repository. Help in resolving this error would be much appreciated.

Error when predicting on TensorFlow Serving

Hi,

The last code example in this blog post fails: https://itnext.io/a-first-look-at-aws-inferentia-b9672e8f8b8f

Below is the error output:

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/PIL/Image.py", line 2584, in open
    fp.seek(0)
AttributeError: 'tuple' object has no attribute 'seek'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tfserving_rsnet50.py", line 16, in <module>
    img = image.load_img(img_file, target_size=(224, 224))
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/keras_preprocessing/image/utils.py", line 110, in load_img
    img = pil_image.open(path)
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/PIL/Image.py", line 2586, in open
    fp = io.BytesIO(fp.read())
AttributeError: 'tuple' object has no attribute 'read'
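
Both tracebacks say that img_file is a tuple rather than a file path, so the failure happens in the client script before any request reaches TensorFlow Serving. A minimal sketch of the expected call, with a hypothetical local image file:

from tensorflow.keras.preprocessing import image

# load_img expects a path (or file object), not a tuple.
img = image.load_img('kitten.jpg', target_size=(224, 224))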

Thanks

Unable to use socket in containerized PyTorch app

I have been following the docs for Docker environment setup for Neuron and Run containerized neuron application to set up a containerized app using the Inferentia chip.

I am able to get the neuron-rtd container running and using a socket in /tmp/neuron_rtd_sock as described, but I had to add the following modification to that folder in order for the container to be able to use the socket: chmod o+x /tmp/neuron_rtd_sock.

I tried using the trace_resnet50.py script described here to test whether a container could get access to the chip. I used the following Dockerfile and run command:

Dockerfile (built as pytorch-inf1 image)

FROM python:3.6

RUN pip install -U pip && \
    pip install torch-neuron "neuron-cc[tensorflow]" --extra-index-url https://pip.repos.neuron.amazonaws.com && \
    pip install  pillow==6.2.2 && \
    pip install torchvision==0.4.2 --no-deps

COPY trace_resnet50.py /src/trace_resnet50.py

CMD ["python", "/src/trace_resnet50.py"]

Docker run command

$ docker run -it --rm --env NEURON_RTD_ADDRESS=/sock/neuron.sock -v /tmp/neuron_rtd_sock/:/sock  pytorch-inf1
Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.cache/torch/checkpoints/resnet50-19c8e357.pth
100.0%
/usr/local/lib/python3.6/site-packages/torch/jit/__init__.py:847: UserWarning: `optimize` is deprecated and has no effect. Use `with torch.jit.optimized_execution() instead
  warnings.warn("`optimize` is deprecated and has no effect. Use `with torch.jit.optimized_execution() instead")
INFO:Neuron:compiling module ResNet with neuron-cc
As
[E neuron_runtime.cpp:82] grpc server /sock/neuron.sock is unavailable. Is neuron-rtd running?
[E neuron_op_impl.cpp:52] Warning: Neuron runtime cannot be initialized; falling back to CPU execution
[E neuron_op_impl.cpp:53] Warning: Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape
[E neuron_runtime.cpp:82] grpc server /sock/neuron.sock is unavailable. Is neuron-rtd running?
[E neuron_op_impl.cpp:52] Warning: Neuron runtime cannot be initialized; falling back to CPU execution
[E neuron_op_impl.cpp:53] Warning: Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape
[E neuron_runtime.cpp:82] grpc server /sock/neuron.sock is unavailable. Is neuron-rtd running?
[E neuron_op_impl.cpp:52] Warning: Neuron runtime cannot be initialized; falling back to CPU execution
[E neuron_op_impl.cpp:53] Warning: Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape
[E neuron_runtime.cpp:82] grpc server /sock/neuron.sock is unavailable. Is neuron-rtd running?
[E neuron_op_impl.cpp:52] Warning: Neuron runtime cannot be initialized; falling back to CPU execution
[E neuron_op_impl.cpp:53] Warning: Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape
[E neuron_runtime.cpp:82] grpc server /sock/neuron.sock is unavailable. Is neuron-rtd running?
[E neuron_op_impl.cpp:52] Warning: Neuron runtime cannot be initialized; falling back to CPU execution
[E neuron_op_impl.cpp:53] Warning: Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape

Here are the permissions on the socket directory:

$ ls -l /tmp
total 4
...
drwxr-xrwx 2 root     root      25 Mar  3 16:31 neuron_rtd_sock
...
$ ls -l /tmp/neuron_rtd_sock
total 0
srw-rw-rw- 1 root root 0 Mar  3 16:31 neuron.sock

Is there a step I'm missing to allow the app container to access that socket? I tried running the app with the same elevated privileges as the neuron-rtd container, but got the same results.

Thanks!

Input size limitations and undocumented compiler args

It is mentioned in the docs that there are memory limitations preventing compilation with larger input sizes. Do you plan to improve that? Currently it renders the framework unusable for non-classification tasks.
How can we find out what the input size limit is for a specific model?
I'm trying to compile a relatively small model (about 2e6 parameters), and it looks like compilation fails with input sizes larger than 272x272 (batch size == 1).
Are there additional compiler args I can set? I saw lots of undocumented compiler args used in the re:Invent lab; it would be great if you could document them as well.

Is it possible to compile ONNX models?

There are mentions of this capability in some docs, plus a list of supported ops, but there's no example of how to do it in practice.
I tried compiling a simple pretrained ResNet model from https://github.com/onnx/models/ and it failed with:

01/08/2020 12:51:20 PM ERROR [neuron-cc]: ***************************************************************
01/08/2020 12:51:20 PM ERROR [neuron-cc]:  An Internal Compiler Error has occurred
01/08/2020 12:51:20 PM ERROR [neuron-cc]: ***************************************************************
01/08/2020 12:51:20 PM ERROR [neuron-cc]: 
01/08/2020 12:51:20 PM ERROR [neuron-cc]: Please contact Customer Support and provide the following details.
01/08/2020 12:51:20 PM ERROR [neuron-cc]: 
01/08/2020 12:51:20 PM ERROR [neuron-cc]: Error message:  A process in the process pool was terminated abruptly while the future was running or pending.
01/08/2020 12:51:20 PM ERROR [neuron-cc]: 
01/08/2020 12:51:20 PM ERROR [neuron-cc]: Error location: pipeline.compile.0
01/08/2020 12:51:20 PM ERROR [neuron-cc]: Command line:   /home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/bin/neuron-cc compile --framework ONNX /home/ubuntu/resnet18v1.onnx --output /home/ubuntu/onnx_test/output.neff
01/08/2020 12:51:20 PM ERROR [neuron-cc]: 
01/08/2020 12:51:20 PM ERROR [neuron-cc]: Internal details:
01/08/2020 12:51:20 PM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 207, in neuroncc.driver.Job.runSingleInputFn
01/08/2020 12:51:20 PM ERROR [neuron-cc]:   File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
01/08/2020 12:51:20 PM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 247, in neuroncc.driver.Job.SingleInputJob.run
01/08/2020 12:51:20 PM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 252, in neuroncc.driver.Job.SingleInputJob.run
01/08/2020 12:51:20 PM ERROR [neuron-cc]:   File "/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/concurrent/futures/_base.py", line 432, in result
01/08/2020 12:51:20 PM ERROR [neuron-cc]:     return self.__get_result()
01/08/2020 12:51:20 PM ERROR [neuron-cc]:   File "/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
01/08/2020 12:51:20 PM ERROR [neuron-cc]:     raise self._exception
01/08/2020 12:51:20 PM ERROR [neuron-cc]: 
01/08/2020 12:51:20 PM ERROR [neuron-cc]: Version information:
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   Neuron Compiler version 1.0.5939.0+5849551057
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   HWM version 1.0.720.0-5848815573
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   NEFF version 0.6
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   TVM version 1.0.1416.0+5849176296
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   NumPy version 1.17.4
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   MXNet not available
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   TF version 1.15.0
01/08/2020 12:51:21 PM ERROR [neuron-cc]: 
01/08/2020 12:51:21 PM ERROR [neuron-cc]: Artifacts stored in: /home/ubuntu/neuroncc-ft4i1tln

tensorboard_neuron seems to not take --run_neuron_profile

Hi folks,

I'm attempting to run this tutorial:
https://github.com/aws/aws-neuron-sdk/blob/master/docs/neuron-tools/getting-started-tensorboard-neuron.md

After installing tensorflow-neuron, tensorboard-neuron, and aws-neuron-tools as requested in the tutorial, I then tried to bring up TensorBoard-Neuron. When I execute the following command:

> tensorboard_neuron --run_neuron_profile

I receive the following error:

tensorboard: error: unrecognized arguments: --run_neuron_profile

This seems odd, since tensorboard_neuron should support this kind of command according to the tutorial. It's true that I'm not currently passing a logdir argument to tensorboard_neuron, but that is because my configuration doesn't have any TensorFlow logs. Is there something I'm missing, or is this a genuine bug?

Setup:
Ubuntu 18.04 DLAMI on an inf1.6xlarge instance

Version:

> tensorboard_neuron --version_tb
1.15.0
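
For completeness, the invocation I would expect to work from the tutorial, adding an (even empty) logdir since plain TensorBoard refuses to start without one; whether that also clears the unrecognized-argument error is unverified:

mkdir -p ./logs
tensorboard_neuron --logdir ./logs --run_neuron_profile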

Missing neuron-device-plugin Dockerfile

It appears that there's no provided Dockerfile for the 790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-device-plugin:latest image.

In production, I don't want to resort to using the latest version every time, because that can break the underlying software product.

Is there a Dockerfile for this image that I could use?
