microsoft / tensorflow-directml-plugin

DirectML PluggableDevice plugin for TensorFlow 2

License: Apache License 2.0

Python 33.06% PowerShell 0.66% Shell 0.03% C++ 65.70% C 0.06% CMake 0.49%

tensorflow-directml-plugin's Introduction

⚠️ Development of TensorFlow-DirectML-Plugin has been paused until further notice. To take advantage of the latest DirectML features and performance improvements for inference scenarios, we recommend taking a look at ONNX Runtime. ⚠️

TensorFlow-DirectML-Plugin

TensorFlow is an end-to-end open source platform for machine learning. This repository is an implementation of TensorFlow's Pluggable Device API that leverages DirectML to provide cross-vendor hardware acceleration on Windows 10 and the Windows Subsystem for Linux (WSL). TensorFlow with DirectML enables training and inference of complex machine learning models on a wide range of DirectX 12-compatible hardware.

Questions, Issues, and Feedback

You can also contact us directly at [email protected].

Getting Started

TensorFlow DirectML Plugin is in early development and is not supported for production yet. For production scenarios, use TensorFlow 1.15 with DirectML instead.

TensorFlow DirectML Plugin only works with the tensorflow-cpu>=2.12 package, not tensorflow or tensorflow-gpu. To install the package, run the following command:

pip install tensorflow-directml-plugin

If tensorflow-cpu hasn't already been installed, version 2.10.0 will automatically be installed.
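As a rough way to confirm the package layout described above, the following sketch uses only the standard library to check which of the relevant distributions are installed. The helper names (`installed_versions`, `check_environment`) are hypothetical and not part of the plugin; the package names come from the instructions above.

```python
from importlib import metadata

# Package names taken from the installation notes above.
REQUIRED = ("tensorflow-cpu", "tensorflow-directml-plugin")
CONFLICTING = ("tensorflow", "tensorflow-gpu")


def installed_versions(names):
    """Return {name: version or None} for each distribution name."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def check_environment(versions):
    """Return a list of problems; an empty list means the layout looks right."""
    problems = []
    for name in REQUIRED:
        if versions.get(name) is None:
            problems.append(f"{name} is not installed")
    for name in CONFLICTING:
        if versions.get(name) is not None:
            problems.append(f"{name} {versions[name]} is installed and may conflict")
    return problems


if __name__ == "__main__":
    found = installed_versions(REQUIRED + CONFLICTING)
    for line in check_environment(found) or ["no problems found"]:
        print(line)
```

The check only inspects installed distribution metadata; it does not import TensorFlow, so it cannot tell whether the plugin actually loads.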

The following resources provide additional background on DirectML and TensorFlow:

System Requirements

Windows 10

  • Windows 10 Version 1709, 64-bit (Build 16299 or higher)
  • Python x86-64 3.8, 3.9, 3.10 or 3.11
  • One of the following supported GPUs:
    • AMD Radeon R5/R7/R9 2xx series or newer
    • Intel HD Graphics 5xx or newer
    • NVIDIA GeForce GTX 9xx series GPU or newer
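The Python and Windows build requirements above can be checked from a script. This is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes `platform.version()` returns a string of the form `10.0.<build>` on Windows. It does not check the GPU requirement.

```python
import platform
import struct
import sys

SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10), (3, 11)}  # versions listed above
MIN_WINDOWS_BUILD = 16299  # Windows 10 Version 1709


def python_is_supported(version_info, pointer_bits):
    """True for a listed Python minor version running as a 64-bit process."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHON and pointer_bits == 64


def windows_build(version_string):
    """Extract the build number from a string such as '10.0.19044'."""
    parts = version_string.split(".")
    try:
        return int(parts[2])
    except (IndexError, ValueError):
        return None


def windows_is_supported(version_string):
    """True when the build number meets the documented minimum."""
    build = windows_build(version_string)
    return build is not None and build >= MIN_WINDOWS_BUILD


if __name__ == "__main__":
    bits = struct.calcsize("P") * 8  # pointer size of the running interpreter
    print("Python OK:", python_is_supported(sys.version_info, bits))
    if platform.system() == "Windows":
        print("Windows build OK:", windows_is_supported(platform.version()))
```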

Windows Subsystem for Linux

  • Windows 10 Insider Preview, 64-bit (Build 20150 or higher)
  • Python x86-64 3.8, 3.9, 3.10 or 3.11
  • One of the following supported GPUs:

Contribute

If you would like to contribute to tensorflow-directml-plugin, please see our contribution guidelines and read the Microsoft Open Source Code of Conduct. We use GitHub issues for tracking requests and bugs. Please do not report security vulnerabilities through public GitHub issues. See SECURITY.md for more details.

See BUILD.md for instructions on how to produce private builds of tensorflow-directml-plugin.

Known Issues

  • If you are using the plugin on WSL with an NVIDIA RTX 2060 or 2070 GPU, versions of WSL prior to 0.60.0 will encounter a segmentation fault upon process exit in certain preview builds of Windows 11. If you encounter this issue, please upgrade to the latest version of WSL (>= 0.60.0).

License

This project is licensed under MIT License.

Some files and code snippets originate from the TensorFlow repository and are licensed under the Apache License 2.0.

The tensorflow-directml-plugin Python wheel binary package includes a redistributable version of the DirectML library, which is downloaded automatically as a part of the build. The use of the redistributable DirectML library is governed by a separate license that is found as part of the package (found in tensorflow-plugins/directml/DirectML_LICENSE.txt when extracted).

Data Collection Notice

The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described in the repository. There are also some features in the software that may enable you and Microsoft to collect data from users of your applications. If you use these features, you must comply with applicable law, including providing appropriate notices to users of your applications together with a copy of Microsoft's privacy statement. Our privacy statement is located at https://go.microsoft.com/fwlink/?LinkID=824704. You can learn more about data collection and use in the help documentation and our privacy statement. Your use of the software operates as your consent to these practices.

Disabling Telemetry

The official builds of tensorflow-directml-plugin (hosted on PyPI) and the nightly builds uploaded by GitHub Actions have data collection enabled. This telemetry is enabled when building with --config=dml_telemetry (i.e. the --telemetry switch in build.py), but it is disabled by default for local builds.

Trademarks Notice

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.

tensorflow-directml-plugin's People

Contributors

dependabot[bot], fdwr, jstoecker, maggie1059, patricevignola, rjdyk, ryanlai2

tensorflow-directml-plugin's Issues

Support XLA

XLA (jit_compile) does not seem to be supported, as the following demo shows:

import tensorflow as tf

physical_devices = tf.config.experimental.list_physical_devices(device_type='GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'],
              jit_compile=True)
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test,  y_test, verbose=2)

The error is:

W tensorflow/core/framework/op_kernel.cc:1780] OP_REQUIRES failed at xla_ops.cc:296 : UNIMPLEMENTED: Could not find compiler for platform DML: NOT_FOUND: could not find registered compiler for platform DML -- was support for that platform linked in?

Thanks in advance.
Best wishes.

InvalidArgumentError: Graph execution error

Hello,

I am trying to run a model and getting this error, where TensorFlow is trying to use the CUDA-based CudnnRNN operation, which is not available because I'm running TensorFlow with DirectML, not CUDA. I have an NVIDIA GeForce RTX 3070 that I am trying to use as the GPU. Has anyone come across this issue before who can assist?

Here is the error:
WARNING:tensorflow:AutoGraph could not transform <function Model.make_train_function..train_function at 0x00000296030349D8> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output.
Cause: 'arguments' object has no attribute 'posonlyargs'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function Model.make_train_function..train_function at 0x00000296030349D8> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output.
Cause: 'arguments' object has no attribute 'posonlyargs'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert

InvalidArgumentError: Graph execution error:

No OpKernel was registered to support Op 'CudnnRNN' used by {{node CudnnRNN}} with these attrs: [seed=0, dropout=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=0, is_training=true]
Registered devices: [CPU, GPU]
Registered kernels:

 [[CudnnRNN]]
 [[sequential_3/lstm_4/PartitionedCall]] [Op:__inference_train_function_15221]
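One possible explanation, offered as an assumption rather than a confirmed diagnosis: the Keras documentation states that the LSTM layer selects a cuDNN implementation when a GPU is visible and the layer arguments meet a fixed set of criteria, and the plugin's device is registered as GPU (see "Registered devices: [CPU, GPU]" above). The sketch below mirrors the documented argument criteria in plain Python; the function name is hypothetical, and two further documented runtime conditions (right-padded masks and eager execution in the outermost context) are not modeled.

```python
def meets_cudnn_criteria(activation="tanh",
                         recurrent_activation="sigmoid",
                         recurrent_dropout=0.0,
                         unroll=False,
                         use_bias=True):
    """Mirror the argument criteria the Keras docs list for the cuDNN LSTM path.

    The defaults are the Keras LSTM defaults, so an LSTM created with only
    `units` (and options such as return_sequences) satisfies all of them.
    """
    return (
        activation == "tanh"
        and recurrent_activation == "sigmoid"
        and recurrent_dropout == 0
        and not unroll
        and use_bias
    )


# LSTM(64, return_sequences=True) uses the defaults for every argument checked here.
print(meets_cudnn_criteria())             # True
print(meets_cudnn_criteria(unroll=True))  # False
```

The Keras RNN guide also notes that wrapping a cell, as in `tf.keras.layers.RNN(tf.keras.layers.LSTMCell(64))`, does not use cuDNN. Whether either change avoids the error on the DirectML plugin is not confirmed in this thread.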

Training model causes kernel dead and restarting

Hi, I am trying to train an image classification model using tensorflow-directml. I am using tensorflow-cpu 2.9 with the DirectML plugin. During training the kernel always dies, and every time I rerun the notebook up to the training step it dies again.

Here are the logs from Jupyter Notebook:
2022-10-30 07:38:40.547593: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-10-30 07:38:40.547783: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 40804 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: )
2022-10-30 07:38:44.853552: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-10-30 07:38:46.768811: F tensorflow/c/logging.cc:43] Check failed: it != allocations_by_id_.end()
[I 07:39:03.462 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports

The log reports: Check failed: it != allocations_by_id_.end(). I have searched for it for days without finding a resolution.

One thing I tried, which had no effect, was reducing the training batch size.

=======================
My system:
AMD Ryzen 7 5700G 8 Cores 16 Threads
RX 6700 XT GDDR6 12 GB
DDR4 RAM 64 GB
GPU Driver update latest
Miniconda with Python=3.9
Windows 11 update latest

NUMA support and other warnings

WSL2
GPU: Radeon RX590

Hi, I followed the step-by-step installation guide and the plugin seems to be working fine with my GPU. However, I'm getting a few warnings such as:

2022-07-04 23:28:50.054002: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-04 23:28:50.055064: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (Radeon RX 590 Series)
2022-07-04 23:28:50.302803: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-04 23:28:50.302872: W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
2022-07-04 23:28:50.302929: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6958 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)

Is NUMA available in WSL2 with TF2?
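When reading logs like the ones above, it helps to separate informational lines from warnings. TensorFlow's C++ log lines carry a one-letter severity after the timestamp (I, W, E or F for info, warning, error and fatal). The sketch below parses that layout with a regular expression; the pattern is inferred from the lines quoted on this page and the function name is hypothetical.

```python
import re

# Layout inferred from the log lines quoted above:
# "<date> <time>: <severity> <source file>:<line>] <message>"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<severity>[IWEF]) (?P<source>\S+):(?P<line>\d+)\] (?P<message>.*)$"
)
SEVERITY_NAMES = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def parse_log_line(text):
    """Return a dict of fields, or None for lines that do not match the layout."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = SEVERITY_NAMES[fields["severity"]]
    fields["line"] = int(fields["line"])
    return fields


numa = parse_log_line(
    "2022-07-04 23:28:50.302803: I tensorflow/core/common_runtime/"
    "pluggable_device/pluggable_device_factory.cc:305] Could not identify "
    "NUMA node of platform GPU ID 0, defaulting to 0."
)
print(numa["severity"])  # INFO
```

By this reading, the NUMA message in the log above is logged at INFO severity, and only the allow_growth line is a warning.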

No OpKernel was registered to support Op 'CudnnRNN' used by {{node CudnnRNN}}

I’m using tensorflow-cpu 2.10 with tensorflow-directml-plugin, and I have not installed the CUDA Toolkit or cuDNN. (Should I install them? If so, which versions?) I keep getting this error:

InvalidArgumentError: Graph execution error: No OpKernel was registered to support Op 'CudnnRNN' used by {{node CudnnRNN}} with these attrs: [seed=0, dropout=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=0, is_training=true] Registered devices: [CPU, GPU] Registered kernels:

My specs:
Windows 11
RTX 4080

Here is my code:

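# Imports added for completeness. `tokenizer`, `train_texts` and
# `train_labels` are assumed to be defined earlier; they are not shown here.
import numpy as np
from tensorflow.keras.layers import Attention, Bidirectional, Dense, Embedding, LSTM
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Set the maximum vocabulary size
vocab_size = 10000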

# Convert the training text to sequences
train_sequences = tokenizer.texts_to_sequences(train_texts)

# Pad the sequences to have the same length
max_sequence_length = max(len(seq) for seq in train_sequences)
train_data = pad_sequences(train_sequences, maxlen=max_sequence_length)

# Create the main model
main_model = Sequential()

# Add the layers to the main model
main_model.add(Embedding(input_dim=vocab_size, output_dim=100, input_length=max_sequence_length))
main_model.add(Bidirectional(LSTM(64, return_sequences=True)))

# Create the Attention layer
attention = Attention()

# Apply the Attention layer to the main model's output
attention_output = attention([main_model.layers[-1].output, main_model.layers[-1].output])

# Create the new model that takes the Attention layer output as input
output = Dense(1, activation='sigmoid')(attention_output)
model = Model(inputs=main_model.input, outputs=output)

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Reshape the predicted labels
train_labels = np.expand_dims(train_labels, axis=-1)

# Train the Bi-LSTM classifier on the whole training dataset
model.fit(train_data, train_labels, epochs=1, batch_size=32, verbose=1)

ZerosLike error for Conv2D and ConvLSTM2D layers.

Node: 'Adam/gradients/zeros_like_6'
2 root error(s) found.
  (0) INVALID_ARGUMENT:  ZerosLike with the variant data type is not yet supported for pluggable devices in this version of TensorFlow.
	 [[{{node Adam/gradients/zeros_like_6}}]]
	 [[gradient_tape/SeismicFourier/conv_lstm2d_1/TensorArrayUnstack/Shape/_168]]
  (1) INVALID_ARGUMENT:  ZerosLike with the variant data type is not yet supported for pluggable devices in this version of TensorFlow.
	 [[{{node Adam/gradients/zeros_like_6}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_8553]

I am receiving these errors when using
plugin 0.0.1.dev220621
tensorflow-cpu 2.9.0

tensorflow-cpu 2.12.0 encounter issue.

The version of tensorflow-directml-plugin is 0.4.0.dev230202.
After I reinstalled tensorflow-cpu 2.12.0, I encountered the error below:
tfdml_plugin.dll not found

use AMD gpu with tensorflow 2.10 but no output with a simple CNN script

my environment:
windows 10 64bit
python 3.9 64bit
tensorflow 2.10
tensorflow-directml-plugin 0.0.1.dev220621
AMD Radeon RX 640
miniConda CP39-4.12

I built the environment with the software and hardware above, wrote a simple CNN script, and tried to run it on the GPU. It found the GPU and loaded successfully, but it printed some warnings and showed no result.
This is the script I ran:
import tensorflow as tf
import tensorflow.keras as keras
from keras import layers, optimizers, datasets, Sequential
import os

os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
tf.random.set_seed(2345)

conv_layers = [
layers.Conv2D(64, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
layers.Conv2D(64, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),

layers.Conv2D(128, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
layers.Conv2D(128, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),

layers.Conv2D(256, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
layers.Conv2D(256, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),

layers.Conv2D(512, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
layers.Conv2D(512, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),

layers.Conv2D(512, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
layers.Conv2D(512, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),    

]

def main():
    conv_net = Sequential(conv_layers)
    conv_net.build(input_shape=[None, 32, 32, 3])
    x = tf.random.normal([4, 32, 32, 3])
    out = conv_net(x)
    print(out.shape)

if __name__ == '__main__':
    main()

These are the outputs:
python .\Vgg13.py
2022-09-29 15:51:26.582547: I tensorflow/c/logging.cc:34] Successfully opened dynamic library D:\Miniconda3\envs\tfdml_plugin\lib\site-packages\tensorflow-plugins/directml/directml.0de2b4431c6572ee74152a7ee0cd3fb1534e4a95.dll
2022-09-29 15:51:26.583060: I tensorflow/c/logging.cc:34] Successfully opened dynamic library dxgi.dll
2022-09-29 15:51:26.585193: I tensorflow/c/logging.cc:34] Successfully opened dynamic library d3d12.dll
2022-09-29 15:51:26.711875: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
2022-09-29 15:51:27.050898: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-29 15:51:27.051802: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (AMD Radeon RX 640)
2022-09-29 15:51:27.119403: I tensorflow/c/logging.cc:34] Successfully opened dynamic library Kernel32.dll
2022-09-29 15:51:27.120322: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-09-29 15:51:27.120383: W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
2022-09-29 15:51:27.120442: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3164 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: )

After I uninstall tensorflow-directml-plugin and run the script on the CPU, it shows the correct result:
python .\Vgg13.py
(4, 1, 1, 512)

Is this a bug in the plugin? How can I make it work correctly? Please help!
Thank you

how to choose dml device when use tensorflow-directml-plugin

Environment

software version
win 10 pro 10.0.19044
cpu amd 4650G
amd gpu driver 22.9.2 22.9.2
python 3.8.10
tensorflow-cpu 2.10.0
tensorflow-directml-plugin 0.1.1.dev221004

Description

>>> import tensorflow as tf
>>> devices = tf.config.experimental.list_physical_devices()
>>> for device in devices:
...     print(device)
...
PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')

Running the Keras MNIST demo, it seems to still run on the CPU.
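Logs quoted elsewhere on this page show the plugin creating its device as /device:GPU:0 (name: DML), so when the plugin has loaded one would expect a device of type GPU in the listing; in the output above only the CPU appears. The sketch below groups the reported devices by type to make that easy to see. The helper names are hypothetical; `tf.config.list_physical_devices` is the standard TensorFlow 2 call and is imported lazily so the grouping helper has no dependency of its own.

```python
from collections import defaultdict


def group_by_type(devices):
    """Group device names by their device_type attribute."""
    grouped = defaultdict(list)
    for device in devices:
        grouped[device.device_type].append(device.name)
    return dict(grouped)


def tensorflow_devices():
    """Return the devices TensorFlow reports, grouped by type."""
    import tensorflow as tf  # lazy import: only needed when this is called

    return group_by_type(tf.config.list_physical_devices())
```

Calling `tensorflow_devices()` in the environment above would be expected to return only a "CPU" entry. If a "GPU" entry is present, operations can be pinned to it with `with tf.device("/GPU:0"):`.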

Expectation

I tried tf1-directml. It works on the GPU on both Windows 10 and WSL2.

software version
wsl2(ubuntu) 20.04LTS
wsl2(kernel) 5.10.60.1
python 3.7.15
tensorflow-directml 1.15.8
>>> devices = tf.config.experimental.list_physical_devices()
>>> for device in devices:
...     print(device)    # the device name is exactly TF's GPU
...
PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')
PhysicalDevice(name='/physical_device:DML:0', device_type='DML')

Run same code above, it works on gpu.

By the way, it's really exciting to use DirectML with an integrated GPU. Even on this small network, there is a significant improvement.

CPU avg Epoch  15s 18ms/step
GPU avg Epoch  9s 186us/sample

DML doesn't support exponential_avg_factor != 1 at the moment

I'm running various pieces of code to compare CUDA and DirectML, to determine whether using AMD GPUs for deployment will be possible in the future.
I know that this plugin is still under development. After running code that works on tensorflow-cpu and on TensorFlow with CUDA, many of the models work flawlessly, but some do not.

For example, I get the following error when trying code that works on CUDA:

Traceback (most recent call last):
  File "d:\DCASE\Code\SELD\trainv2.py", line 380, in <module>
    main(get_param())
  File "d:\DCASE\Code\SELD\trainv2.py", line 338, in main
    train_iterloop(model, trainset, epoch, optimizer)
  File "d:\DCASE\Code\SELD\trainv2.py", line 84, in iterloop
    preds, sloss, dloss = step(model, x, y, optimizer)
  File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node 'model/batch_normalization/FusedBatchNormV3' defined at (most recent call last):
    File "d:\DCASE\Code\SELD\trainv2.py", line 380, in <module>
      main(get_param())
    File "d:\DCASE\Code\SELD\trainv2.py", line 338, in main
      train_iterloop(model, trainset, epoch, optimizer)
    File "d:\DCASE\Code\SELD\trainv2.py", line 84, in iterloop
      preds, sloss, dloss = step(model, x, y, optimizer)
    File "d:\DCASE\Code\SELD\trainv2.py", line 35, in trainstep
      y_p = model(x, training=True)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\engine\training.py", line 557, in __call__
      return super().__call__(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\engine\base_layer.py", line 1097, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\utils\traceback_utils.py", line 96, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\engine\functional.py", line 510, in call
      return self._run_internal_graph(inputs, training=training, mask=mask)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\engine\functional.py", line 667, in _run_internal_graph
      outputs = node.layer(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\engine\base_layer.py", line 1097, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\utils\traceback_utils.py", line 96, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\layers\normalization\batch_normalization.py", line 850, in call        
      outputs = self._fused_batch_norm(inputs, training=training)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\layers\normalization\batch_normalization.py", line 660, in _fused_batch_norm
      output, mean, variance = control_flow_util.smart_cond(
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\utils\control_flow_util.py", line 108, in smart_cond
      return tf.__internal__.smart_cond.smart_cond(
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\layers\normalization\batch_normalization.py", line 634, in _fused_batch_norm_training
      return tf.compat.v1.nn.fused_batch_norm(
Node: 'model/batch_normalization/FusedBatchNormV3'
Detected at node 'model/batch_normalization/FusedBatchNormV3' defined at (most recent call last):
    File "d:\DCASE\Code\SELD\trainv2.py", line 380, in <module>
      main(get_param())
    File "d:\DCASE\Code\SELD\trainv2.py", line 338, in main
      train_iterloop(model, trainset, epoch, optimizer)
    File "d:\DCASE\Code\SELD\trainv2.py", line 84, in iterloop
      preds, sloss, dloss = step(model, x, y, optimizer)
    File "d:\DCASE\Code\SELD\trainv2.py", line 35, in trainstep
      y_p = model(x, training=True)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\engine\training.py", line 557, in __call__
      return super().__call__(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\engine\base_layer.py", line 1097, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\utils\traceback_utils.py", line 96, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\engine\functional.py", line 510, in call
      return self._run_internal_graph(inputs, training=training, mask=mask)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\engine\functional.py", line 667, in _run_internal_graph
      outputs = node.layer(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\engine\base_layer.py", line 1097, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\utils\traceback_utils.py", line 96, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\layers\normalization\batch_normalization.py", line 850, in call        
      outputs = self._fused_batch_norm(inputs, training=training)
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\layers\normalization\batch_normalization.py", line 660, in _fused_batch_norm
      output, mean, variance = control_flow_util.smart_cond(
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\utils\control_flow_util.py", line 108, in smart_cond
      return tf.__internal__.smart_cond.smart_cond(
    File "C:\Users\okabi\anaconda3\envs\seld\lib\site-packages\keras\layers\normalization\batch_normalization.py", line 634, in _fused_batch_norm_training
      return tf.compat.v1.nn.fused_batch_norm(
Node: 'model/batch_normalization/FusedBatchNormV3'
2 root error(s) found.
  (0) INVALID_ARGUMENT:  DML doesn't support exponential_avg_factor != 1 at the moment
         [[{{node model/batch_normalization/FusedBatchNormV3}}]]
         [[model/multi_head_attention__2/einsum_2/Einsum/ReadVariableOp/_74]]
  (1) INVALID_ARGUMENT:  DML doesn't support exponential_avg_factor != 1 at the moment
         [[{{node model/batch_normalization/FusedBatchNormV3}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_trainstep_43850]
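For context on the message above: `exponential_avg_factor` is a parameter of `tf.compat.v1.nn.fused_batch_norm` that controls how the running mean and variance are blended with the current batch statistics, and a value of 1.0 means the current batch average is returned. The sketch below illustrates that blend in plain Python. It assumes the usual moving-average form and that Keras passes `1 - momentum` on this fused path; both are assumptions here, not something the traceback confirms.

```python
def update_running_mean(running_mean, batch_mean, exponential_avg_factor):
    """Blend the stored running mean with the current batch mean.

    With a factor of 1.0 the result is just the batch mean, which is the
    only case the error message says DML handles.
    """
    return ((1.0 - exponential_avg_factor) * running_mean
            + exponential_avg_factor * batch_mean)


# Factor 1.0: the running value is replaced by the batch value.
print(update_running_mean(5.0, 2.0, 1.0))  # 2.0

# Assumed Keras default momentum of 0.99 gives a factor of about 0.01,
# which is the "!= 1" case the error message rejects.
factor = 1.0 - 0.99
print(update_running_mean(5.0, 2.0, factor))
```

`tf.keras.layers.BatchNormalization` accepts a `fused` argument; passing `fused=False` selects the non-fused implementation. Whether that avoids this error on the DirectML plugin has not been verified here.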

Is there a timeline for when you expect the plugin to be stable or production ready?
Also, is there a place where we can read the changelog between releases?
Thank you for your time and consideration.

Performance issues on Nvidia GPUs with mixed precision and accuracy issues

Discussed in #315

Originally posted by aliencaocao October 19, 2022
I have done a simple benchmark of ResNetRS50 on an RTX 3080Ti, comparing DirectML plugin 0.1.1.dev221004 and CUDA 11.8 + CUDNN 8.6.0, and found that DML is very slow compared to CUDA, and uses only about 50% of GPU while training, while CUDA constantly uses 100%. Both tests were conducted with mixed precision off and batch size of 64.

Training 10 epochs on DML took 416 seconds, while on CUDA took only 164 seconds. Both on TF 2.10 (CPU for DML) and Python 3.9.13.
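To put the two timings above on a common scale, here is the arithmetic as a short sketch. The numbers are the ones reported in the post; the function name is hypothetical.

```python
def time_ratio(slower_seconds, faster_seconds):
    """How many times longer the slower run took."""
    return slower_seconds / faster_seconds


dml_seconds, cuda_seconds, epochs = 416, 164, 10

print(f"DML took {time_ratio(dml_seconds, cuda_seconds):.2f}x as long as CUDA")
print(f"Per epoch: DML {dml_seconds / epochs:.1f} s, CUDA {cuda_seconds / epochs:.1f} s")
```

That works out to roughly 2.5 times the training time on DML, or about 41.6 s versus 16.4 s per epoch.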

This brings the big performance question - is DML in any case optimized for Nvidia GPUs, especially its Tensor Cores and TensorFloat32 datatypes? And what could cause it to not use 100% of my GPU? I have tried to increase batch size but it will just OOM so 64 is definitely a large enough BS to fully use the GPU (as shown by 100% usage on CUDA).

Or perhaps is this something that will be optimized in the future, but just not yet?

UPDATE TL;DR: The performance issues mentioned above have been partially resolved in 0.2.0, but the fix introduced a model accuracy loss that has yet to be resolved. See #315 (reply in thread).
This makes the plugin not worth switching to on Nvidia Ampere GPUs (and potentially other Nvidia GPUs).
Mixed precision now runs, but with poor performance (on 0.1.1 it was unable to run).

TensorFlow 2.11.0 support?

Hello,
I installed this plugin after realizing that TensorFlow no longer supports Nvidia natively.
During installation I noticed that it automatically installed tensorflow-cpu==2.10.0, which is the last release to support GPU. I'm wondering what the point is when 2.10 alone already runs CUDA natively and in a production-ready manner.
So I upgraded to 2.11.0, but when I run import tensorflow as tf I get the following exception:

>>> import tensorflow as tf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\__init__.py", line 440, in <module>
    _ll.load_library(_plugin_dir)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\framework\load_library.py", line 151, in load_library
    py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow-plugins\tfdml_plugin.dll not found

Is there a way to run the latest TF with GPU on Windows natively?
I must have missed something. It looks strange that such a famous framework does not run on a Windows-Nvidia combo (supposedly the most popular one).
I have a latest-generation Nvidia card with the latest CUDA 12 SDK toolkit.
Thanks
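The traceback above shows TensorFlow looking for tfdml_plugin.dll in a tensorflow-plugins directory under site-packages. A quick way to see what is actually present there is sketched below, using only the standard library. The helper names are hypothetical, and the directory name is taken from the traceback; this only lists files and does not explain why a file might be missing.

```python
import sysconfig
from pathlib import Path

PLUGIN_DIR_NAME = "tensorflow-plugins"  # directory name shown in the traceback


def list_plugin_files(site_packages):
    """Return the sorted file names found in <site_packages>/tensorflow-plugins."""
    plugin_dir = Path(site_packages) / PLUGIN_DIR_NAME
    if not plugin_dir.is_dir():
        return []
    return sorted(p.name for p in plugin_dir.iterdir() if p.is_file())


def default_site_packages():
    """Site-packages directory of the running interpreter."""
    return Path(sysconfig.get_paths()["purelib"])


if __name__ == "__main__":
    files = list_plugin_files(default_site_packages())
    print(files or "no plugin files found")
```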

`PluggableGraphOptimizer failed: NOT_FOUND: Op type not registered '_CopyFromGpuToHost'`

When I run a large model, I get this error. How can I catch and fix it?

2022-12-24 01:32:58.490091: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-24 01:33:00.202302: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] PluggableGraphOptimizer failed: NOT_FOUND: Op type not registered '_CopyFromGpuToHost' in binary running on DESKTOP-NEIZVES. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-12-24 01:33:00.248437: E tensorflow/core/grappler/optimizers/tfg_optimizer_hook.cc:134] tfg_optimizer{tfg-consolidate-attrs,tfg-toposort,tfg-shape-inference{graph-version=0},tfg-prepare-attrs-export} failed: INVALID_ARGUMENT: Unable to find OpDef for _CopyFromHostToGpu
	when importing GraphDef to MLIR module in GrapplerHook
2022-12-24 01:33:00.588281: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] PluggableGraphOptimizer failed: NOT_FOUND: Op type not registered '_CopyFromGpuToHost' in binary running on DESKTOP-NEIZVES. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2022-12-24 01:33:00.611022: E tensorflow/core/grappler/optimizers/tfg_optimizer_hook.cc:134] tfg_optimizer{tfg-consolidate-attrs,tfg-functional-to-region,tfg.func(tfg-cf-sink),tfg-region-to-functional{force-control-capture=true},tfg-lift-legacy-call,symbol-privatize{},symbol-dce,tfg-prepare-attrs-export} failed: INVALID_ARGUMENT: Unable to find OpDef for _CopyFromGpuToHost
	when importing GraphDef to MLIR module in GrapplerHook
2022-12-24 01:33:00.616783: E tensorflow/core/grappler/optimizers/tfg_optimizer_hook.cc:134] tfg_optimizer{tfg-consolidate-attrs,tfg-functional-to-region,tfg.func(tfg-cf-sink),tfg-region-to-functional{force-control-capture=true},tfg-lift-legacy-call,symbol-privatize{},symbol-dce,tfg-prepare-attrs-export} failed: INVALID_ARGUMENT: Unable to find OpDef for _CopyFromGpuToHost
	when importing GraphDef to MLIR module in GrapplerHook
2022-12-24 01:33:00.645922: W tensorflow/core/common_runtime/process_function_library_runtime.cc:941] Ignoring multi-device function optimization failure: NOT_FOUND: Op type not registered '_CopyFromGpuToHost' in binary running on DESKTOP-NEIZVES. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.

Process finished with exit code -1073740791 (0xC0000409)
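
The exit code on the last line indicates that the process itself terminated, which is different from a Python exception, so a `try`/`except` around the model call would probably not catch it. One option, sketched here under the assumption that the model can be driven from a separate script (the name `run_model.py` is hypothetical), is to run it in a child process and inspect the return code. The helper also shows how the signed value -1073740791 maps to the unsigned code 0xC0000409 printed next to it.

```python
import subprocess
import sys
from typing import List


def to_ntstatus(returncode: int) -> str:
    """Format a signed 32-bit process exit code as unsigned hex."""
    return f"0x{returncode & 0xFFFFFFFF:08X}"


def run_isolated(argv: List[str]) -> int:
    """Run a command in a child process and report a non-zero exit.

    A crash in the child ends only the child; the caller receives the
    exit code and can decide whether to retry or fall back.
    """
    completed = subprocess.run(argv)
    if completed.returncode != 0:
        code = completed.returncode
        print(f"child exited with {code} ({to_ntstatus(code)})")
    return completed.returncode


if __name__ == "__main__":
    # run_model.py is a hypothetical script that loads and runs the model.
    run_isolated([sys.executable, "run_model.py"])
```

In the Windows NTSTATUS table, 0xC0000409 is `STATUS_STACK_BUFFER_OVERRUN`, which is also raised by fail-fast aborts in native code. That suggests the crash originates below the Python layer, although the log alone does not say where.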

TensorFlow graph runs successfully only occasionally

System Information

Host System
--------------------------------------------------------------------------------
Windows 10 Version  : Windows 10 Pro 64-bit (10.0, Build 19044) (19041.vb_release.191206-1406)
Processor           : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (12 CPUs), ~3.2GHz
Memory              : 40960MB RAM
DirectX Version     : DirectX 12

Python Environment
--------------------------------------------------------------------------------
Python Version      : 3.6.4
TensorFlow-DirectML : 1.15.8

DirectX Device
--------------------------------------------------------------------------------
Description         : NVIDIA GeForce GTX 1060 6GB
Manufacturer        : NVIDIA
Chip Type           : NVIDIA GeForce GTX 1060 6GB
Dedicated Memory    : 6052 MB
Driver Version      : 30.0.15.1252
Driver Model        : WDDM 2.7
Driver Date         : 2022/4/15 8:00:00
Feature Levels      : 12_1,12_0,11_1,11_0,10_1,10_0,9_3,9_2,9_1

DirectX Device
--------------------------------------------------------------------------------
Description         : NVIDIA GeForce GTX 1060 6GB
Manufacturer        : NVIDIA
Chip Type           : NVIDIA GeForce GTX 1060 6GB
Dedicated Memory    : 6052 MB
Driver Version      : 30.0.15.1252
Driver Model        : WDDM 2.7
Driver Date         : 2022/4/15 8:00:00
Feature Levels      : 12_1,12_0,11_1,11_0,10_1,10_0,9_3,9_2,9_1

DirectX Device
--------------------------------------------------------------------------------
Description         : Intel(R) UHD Graphics 630
Manufacturer        : Intel Corporation
Chip Type           : Intel(R) UHD Graphics Family
Dedicated Memory    : 128 MB
Driver Version      : 31.0.101.2111
Driver Model        : WDDM 2.7
Driver Date         : 2022/7/19 8:00:00
Feature Levels      : 12_1,12_0,11_1,11_0,10_1,10_0,9_3,9_2,9_1

DirectX Device
--------------------------------------------------------------------------------
Description         : Citrix Indirect Display Adapter
Manufacturer        : Citrix Systems Inc.
Chip Type           : Unknown
Dedicated Memory    : 6052 MB
Driver Version      : 12.40.44.247
Driver Model        : WDDM 1.3
Driver Date         : 2019/1/23 8:00:00
Feature Levels      : 12_1,12_0,11_1,11_0,10_1,10_0,9_3,9_2,9_1

Repro Details

Describe the current behavior
sess.run crashes when running a .pb on the GPU, but when I run it several times, it sometimes succeeds.

The same behavior exists in tensorflow-directml.

Describe the expected behavior

Code to reproduce the issue
As it's a private model, I can send it by email if needed.

Other info / logs

2023-02-03 13:33:35.105325: I tensorflow/c/logging.cc:34] Successfully opened dynamic library C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\tensorflow-plugins/directml/directml.d6f03b303ac3c4f2eeb8ca631688c9757b361310.dll
2023-02-03 13:33:35.106352: I tensorflow/c/logging.cc:34] Successfully opened dynamic library dxgi.dll
2023-02-03 13:33:35.110910: I tensorflow/c/logging.cc:34] Successfully opened dynamic library d3d12.dll
2023-02-03 13:33:35.461006: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 2 compatible adapters.
2.10.0
2023-02-03 13:33:36.739230: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-03 13:33:36.740492: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (NVIDIA GeForce GTX 1060 6GB)
2023-02-03 13:33:36.829776: I tensorflow/c/logging.cc:34] Successfully opened dynamic library Kernel32.dll
2023-02-03 13:33:36.831403: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 1 (Intel(R) UHD Graphics 630)
2023-02-03 13:33:36.889132: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:36.889660: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:36.890203: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:36.891332: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:37.130931: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2023-02-03 13:33:37.164959: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2023-02-03 13:33:38.487126: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.487561: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.487942: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.488219: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.490238: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.490617: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.490880: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.491229: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.493957: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.494478: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.494742: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.495150: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.497206: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.497757: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.498318: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.498706: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.501004: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.501394: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.501768: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.502233: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.504185: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.504628: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.505047: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.505510: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.507393: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.507844: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.508257: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.508800: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.510718: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.511081: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.511438: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.511923: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.514687: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.515197: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.515553: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.516035: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.518598: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.519044: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.519409: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.519739: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.522000: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.522482: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.522929: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.523365: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.525109: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.525492: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.525881: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.526401: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.528559: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.528984: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.529458: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.529867: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.532397: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.533052: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.533369: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.533712: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.535548: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.535929: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.536295: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.536767: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.539228: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.539627: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.539979: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.540340: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.584979: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.585448: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.585880: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.586405: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.589343: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.589650: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.589958: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.590416: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.592988: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.593275: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.593581: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.593890: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.595399: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.595658: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.595967: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.596303: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.598482: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.598768: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.599002: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.599291: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.600753: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.600974: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.601254: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.601519: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.603684: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.604021: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.604472: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.604804: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.606572: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.606896: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.607321: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.607648: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.609412: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.609680: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.609949: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.610172: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.611583: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.611807: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.612029: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.612280: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.613676: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.613964: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.614229: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.614517: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.616758: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.617107: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.617362: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.617649: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.619395: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.619811: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.620125: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.620467: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.622894: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.623464: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.623874: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.624233: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.626526: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.626924: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.627350: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.627791: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.629656: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.630069: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.630469: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.630885: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.634844: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.635306: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.635876: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.636496: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.639079: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.639493: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.639903: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.640521: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.642538: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.642908: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.643283: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.643869: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.645823: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.646147: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.646397: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.646797: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.689839: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.690321: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.690749: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.691330: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.693926: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.694248: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.694555: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.694986: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.696831: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.697108: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.697325: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.697617: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.700221: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.700612: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.700987: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.701512: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.703635: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.704020: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.704317: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.704676: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.706823: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.707327: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.707684: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.708064: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.710154: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.710477: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.710932: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.711337: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.714326: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.715061: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.715406: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.715877: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id: )
2023-02-03 13:33:38.719845: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.720214: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 1, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:33:38.720483: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23418 MB memory) -> physical
PluggableDevice (device: 0, name: DML, pci bus id: )
2023-02-03 13:33:38.720798: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 18451 MB memory) -> physical
PluggableDevice (device: 1, name: DML, pci bus id:

Multiple OpKernel registrations error

Hello all!
I've run into a bit of a problem, and the only explanation I can think of is a potential issue with this plugin.
Keep in mind that I know very little about how to identify what is causing this or where it comes from. I'm still learning TensorFlow/Keras, so having to debug a system error this early on isn't helping.

ISSUE: When I ran a program I was working on, it failed with an error about multiple OpKernel registrations. When I ran basic code taken from TensorFlow's site (https://www.tensorflow.org/guide/gpu), I received the same error. Code & full output below:
`# %%
tf.debugging.set_log_device_placement(True)

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2022-07-05 21:07:23.825481: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-05 21:07:23.828630: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (NVIDIA GeForce RTX 3070)
2022-07-05 21:07:24.453938: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-05 21:07:24.453984: W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
2022-07-05 21:07:24.454008: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6838 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: )

InvalidArgumentError Traceback (most recent call last)
Input In [2], in <cell line: 6>()
4 a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
5 b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
----> 6 c = tf.matmul(a, b)
8 print(c)

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/framework/ops.py:7164, in raise_from_not_ok_status(e, name)
7162 def raise_from_not_ok_status(e, name):
7163 e.message += (" name: " + name if name is not None else "")
-> 7164 raise core._status_to_exception(e) from None

InvalidArgumentError: Multiple OpKernel registrations match NodeDef at the same priority '{{node MatMul}}': 'op: "MatMul" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_FLOAT } } }' and 'op: "MatMul" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_FLOAT } } }' [Op:MatMul]
`

NOTES: I'm using an Ubuntu WSL instance via VS Code (the window is attached as a remote container) and its embedded Jupyter notebook to run this. If I use a Docker image or a remote container built through VS Code, everything works as expected. This error only appears within a WSL container.
What's strange is that if I DON'T use VS Code, my GPU isn't detected at all when I just run basic ipython from a terminal.
So far the GPU has only been discoverable from a Python 3.9 environment in VS Code's Jupyter notebook. No other Python environment seems to detect my GPU, regardless of which terminal I use.

I'm not sure what to do or how to approach this. I'm so unfamiliar with this that I can't tell whether it's an issue with the DirectML plugin or with something else, so I apologize if it isn't a directml-plugin error. I decided to post the issue here since this project is pretty new, at least compared with the programs I'm used to dealing with. However, knowing how little I actually know, I could easily have missed something somewhere that causes this.
In the meantime, should I just stick to running regular containers? If so, how am I supposed to have the container detect my GPU, or rather how do I set up an environment that's reflective of my PC's capabilities?
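
One thing that may be worth ruling out (an assumption, not a confirmed diagnosis of the error above): the README states that the plugin only works with `tensorflow-cpu`, not `tensorflow` or `tensorflow-gpu`, so an environment with one of those installed next to the plugin is a candidate for kernels being registered twice. The sketch below lists such packages using only the standard library; the helper names `find_conflicts` and `installed_distributions` are hypothetical.

```python
from importlib import metadata

# Distribution names taken from the README's install notes.
PLUGIN = "tensorflow-directml-plugin"
UNSUPPORTED = {"tensorflow", "tensorflow-gpu"}


def normalize(name):
    """Lower-case a distribution name and treat '_' and '.' like '-'."""
    return name.lower().replace("_", "-").replace(".", "-")


def find_conflicts(installed_names):
    """Return the unsupported TensorFlow packages found next to the plugin."""
    names = {normalize(n) for n in installed_names}
    if PLUGIN not in names:
        return set()
    return names & UNSUPPORTED


def installed_distributions():
    """Collect the names of all distributions visible to this interpreter."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken metadata
            names.append(name)
    return names


if __name__ == "__main__":
    conflicts = find_conflicts(installed_distributions())
    print("Conflicting packages:", sorted(conflicts) or "none found")
```

Run it with the same interpreter the notebook kernel uses, since each Python environment has its own set of installed packages. An empty result does not prove the environment is clean; it only rules out this one scenario.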

support for python 3.11, support for tf 2.13, is this project abandoned?

I came across a link to this repository on the official TF website. It encourages using this plugin for GPU training with TF versions > 2.10.

I've checked the requirements: my Windows version and my Python version are both listed as supported. However, there is a "1" annotation in README.md here which isn't explained anywhere else.

When trying to install this plugin with pip, I get:

ERROR: Could not find a version that satisfies the requirement tensorflow-directml-plugin (from versions: none)
ERROR: No matching distribution found for tensorflow-directml-plugin

It seems to me that a build for Python 3.11 was either never created or never published on PyPI.
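
pip typically prints `from versions: none` when it finds no file for the package that is compatible with the running interpreter and platform. A minimal sketch for collecting the properties that decide this, so they can be compared with the README's system requirements; `describe_interpreter` and `matches_readme` are hypothetical helpers, and the version set is copied from the README rather than queried from PyPI.

```python
import platform
import struct
import sys

# Python versions listed in the README's system requirements.
README_PYTHON_VERSIONS = {(3, 8), (3, 9), (3, 10), (3, 11)}


def describe_interpreter():
    """Gather the interpreter properties that pip matches wheels against."""
    return {
        "python": (sys.version_info.major, sys.version_info.minor),
        "bits": struct.calcsize("P") * 8,  # pointer size: 32 or 64
        "system": platform.system(),       # e.g. "Windows" or "Linux"
        "machine": platform.machine(),     # e.g. "AMD64" or "x86_64"
    }


def matches_readme(info):
    """Check the README's stated Python version and 64-bit requirements."""
    return info["python"] in README_PYTHON_VERSIONS and info["bits"] == 64


if __name__ == "__main__":
    info = describe_interpreter()
    print(info)
    print("Matches README requirements:", matches_readme(info))
```

A `True` result only means the interpreter matches what the README lists; it does not show that a wheel for that combination was actually published.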

Also, new versions of TF have been released (2.13). Some questions about support have been asked here, but no official answers were given.

Is this repository actively maintained, or should something else be used instead?

Maybe mentioning the only contributor, @maggie1059 would help a bit?

RNNs only work with layer-wrapped cells on AMD GPU

When running the example RNN notebook from tensorflow, I got the following error:

NotFoundError: Exception encountered when calling layer "lstm_1" (type LSTM).

Could not find device for node: {{node CudnnRNN}} = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="lstm", seed=0, seed2=0]
All kernels registered for op CudnnRNN:
  <no registered kernels>
 [Op:CudnnRNN]

Call arguments received by layer "lstm_1" (type LSTM):
  • inputs=tf.Tensor(shape=(20, 10, 50), dtype=float32)
  • mask=None
  • training=None
  • initial_state=None

After looking around, I found that Microsoft's installation tutorial mentions this in a note in step 5:

! Note
If your training scripts hardcode the device string to something other than "GPU", that might throw errors.

The notebook example says the built-in layers like keras.layers.LSTM(*args) use CuDNN kernels by default, so I used wrapped-cell layers instead, keras.layers.RNN(keras.layers.LSTMCell(*args), *args), which worked just fine and used my GPU.

It looks like the built-in layers are looking for a compatible CuDNN GPU. Since I have an AMD GPU, it fails. When I use the wrapped-cell layers, it doesn't make that assumption and the tensorflow-directml-plugin is able to work as intended.
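
The wrapped-cell workaround described above can be sketched as follows. The layer sizes and the number of classes are illustrative assumptions, not values from the notebook (only the 50 input features come from the tensor shape in the error), and TensorFlow is imported inside the function so the module itself loads without it.

```python
def build_wrapped_lstm_model(units=64, features=50, classes=10):
    """Build a small model that uses RNN(LSTMCell) instead of the built-in LSTM."""
    # Lazy import: needs tensorflow-cpu (plus the plugin for GPU execution).
    from tensorflow import keras

    return keras.Sequential([
        keras.Input(shape=(None, features)),  # (timesteps, features)
        # Wrapping the cell in a generic RNN layer avoids the cuDNN path,
        # so the CudnnRNN op from the error above is not requested.
        keras.layers.RNN(keras.layers.LSTMCell(units)),
        keras.layers.Dense(classes),
    ])
```

Calling `build_wrapped_lstm_model().summary()` in an environment with TensorFlow installed prints the resulting layers. The trade-off is that the layer is written out explicitly instead of relying on the built-in `LSTM` defaults.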

I was able to get the built-in layer to use the generic GPU kernel by making it fail the requirements for CuDNN (i.e. I set activation='sigmoid' instead of tanh) and it was able to use my GPU using the directml-plugin, but that seems like a strange workaround since the point of built-in layers is to be simple and I might like the default configuration.
I wasn't able to find an option to make the built-in layers use a generic GPU kernel, aside from making them fail the criteria. If there is a better way feel free to let me know.
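
For reference, the Keras documentation for `LSTM` lists the conditions under which the layer selects the cuDNN implementation. A minimal sketch that encodes the argument-level conditions as a plain function, so a configuration can be checked without running it; `uses_cudnn_path` is a hypothetical helper, and it deliberately ignores the runtime conditions (masked inputs must be strictly right-padded, and eager execution must be enabled in the outermost context).

```python
def uses_cudnn_path(activation="tanh", recurrent_activation="sigmoid",
                    recurrent_dropout=0.0, unroll=False, use_bias=True):
    """Return True if these LSTM arguments meet the documented cuDNN criteria."""
    return bool(
        activation == "tanh"
        and recurrent_activation == "sigmoid"
        and recurrent_dropout == 0
        and not unroll
        and use_bias
    )


if __name__ == "__main__":
    print(uses_cudnn_path())                      # default configuration
    print(uses_cudnn_path(activation="sigmoid"))  # the workaround from this post
```

Changing any one of these arguments away from its default is what makes the built-in layer fall back to the generic kernel, which is why setting activation='sigmoid' had that effect.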

Do you know if there are plans to update the built-in layers or add a workaround, or is the plan to use wrapped-cell layers as the main workaround?

`Transformers` : Error while fitting TFBertForSequenceClassification model

Hi,

First of all, thanks a lot for making DirectML compatible with TensorFlow 2.x!
Since the Transformers library needs TensorFlow >= 2.3, I had hoped my TFBertForSequenceClassification model would work with my AMD GPU (RX 6800).

Unfortunately, this is not the case. (Important detail: it takes a very long time, but it does work on the CPU with the standard TensorFlow library.)

My env:
windows 11 PRO 64bit : 21H2
python 3.8.13
tensorflow-cpu 2.9.1
tensorflow-directml-plugin 0.0.1.dev220621
CPU : Ryzen 5600X,
GPU : Rx 6800

The GPU is correctly recognized:

tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

My code :

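# Imports the snippet relies on (model_save_path and log_dir are not shown in this post)
import tensorflow as tf
from transformers import TFBertForSequenceClassification

bert_model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

callbacks = [tf.keras.callbacks.ModelCheckpoint(filepath=model_save_path,
                                                save_weights_only=True,
                                                monitor='val_loss',
                                                mode='min',
                                                save_best_only=True),
             tf.keras.callbacks.TensorBoard(log_dir=log_dir)]

# summary() prints the table itself and returns None, so it is not wrapped in print()
bert_model.summary()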

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = [tf.keras.metrics.SparseCategoricalAccuracy('accuracy')]
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5,epsilon=1e-08)

bert_model.compile(loss=loss, optimizer=optimizer, metrics=metric)

#OUTPUT : 
Model: "tf_bert_for_sequence_classification_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 bert (TFBertMainLayer)      multiple                  109482240 
                                                                 
 dropout_151 (Dropout)       multiple                  0         
                                                                 
 classifier (Dense)          multiple                  1538      
                                                                 
=================================================================
Total params: 109,483,778
Trainable params: 109,483,778
Non-trainable params: 0
_________________________________________________________________

The code above works well, but the problem appears when it reaches the call to fit:

history=bert_model.fit([X_train, Mask_Train], 
                       y_train,
                       batch_size=32,
                       epochs=EPOCHS,
                       validation_data=([X_test, Mask_test], y_test),
                       callbacks=callbacks)

The GPU VRAM begins to receive data (I'm monitoring it via the Radeon Adrenalin software), and then suddenly an error message (see below) appears!

The Error Message :

Epoch 1/3
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
File <timed exec>:2, in <module>

File ~\anaconda3\envs\P7_Bert_TF29_PYT38\lib\site-packages\keras\utils\traceback_utils.py:67,
 in filter_traceback.<locals>.error_handler(*args, **kwargs)
     65 except Exception as e:  # pylint: disable=broad-except
     66   filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67   raise e.with_traceback(filtered_tb) from None
     68 finally:
     69   del filtered_tb

File ~\anaconda3\envs\P7_Bert_TF29_PYT38\lib\site-packages\tensorflow\python\eager\execute.py:54, 
 in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     52 try:
     53   ctx.ensure_initialized()
---> 54   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     55                                       inputs, attrs, num_outputs)
     56 except core._NotOkStatusException as e:
     57   if name is not None:

InvalidArgumentError: Cannot assign a device for operation tf_bert_for_sequence_classification_3/bert/embeddings/Gather: 
Could not satisfy explicit device specification '' 
because the node {{colocation_node tf_bert_for_sequence_classification_3/bert/embeddings/Gather}} was colocated 
with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. 
All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' 
assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' 
resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]

StridedSlice: CPU 
Unique: GPU CPU 
Shape: GPU CPU 
_Arg: GPU CPU 
ResourceGather: GPU CPU 
Const: GPU CPU 
UnsortedSegmentSum: CPU 
Mul: GPU CPU 
ReadVariableOp: GPU CPU 
AssignVariableOp: GPU CPU 
ResourceScatterAdd: GPU CPU 
Sqrt: GPU CPU 
AddV2: GPU CPU 
RealDiv: GPU CPU 
AssignSubVariableOp: GPU CPU 
NoOp: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  tf_bert_for_sequence_classification_3_bert_embeddings_gather_resource (_Arg)  
       framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_adam_update_readvariableop_resource (_Arg)  
       framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_adam_update_readvariableop_2_resource (_Arg)  
       framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  tf_bert_for_sequence_classification_3/bert/embeddings/Gather (ResourceGather) 
  Adam/Adam/update/Unique (Unique) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/Shape (Shape) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/strided_slice/stack (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/strided_slice/stack_1 (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/strided_slice/stack_2 (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/strided_slice (StridedSlice) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/UnsortedSegmentSum (UnsortedSegmentSum) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/mul (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/ReadVariableOp (ReadVariableOp) 
  Adam/Adam/update/mul_1 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/AssignVariableOp (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/ResourceScatterAdd (ResourceScatterAdd) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/ReadVariableOp_1 (ReadVariableOp) 
  Adam/Adam/update/mul_2 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/mul_3 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/ReadVariableOp_2 (ReadVariableOp) 
  Adam/Adam/update/mul_4 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/AssignVariableOp_1 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/ResourceScatterAdd_1 (ResourceScatterAdd) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/ReadVariableOp_3 (ReadVariableOp) 
  Adam/Adam/update/Sqrt (Sqrt) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/mul_5 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/add (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/truediv (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/AssignSubVariableOp (AssignSubVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/group_deps/NoOp (NoOp) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/group_deps/NoOp_1 (NoOp) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/group_deps (NoOp) /job:localhost/replica:0/task:0/device:GPU:0

      [[{{node tf_bert_for_sequence_classification_3/bert/embeddings/Gather}}]] 
      [Op:__inference_train_function_57566]

Thanks in advance for any help!
Have a good day.
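
The "Colocation Debug Info" part of the error above lists, for each op type in the group, the device types that have a registered kernel. Ops that list only CPU (here StridedSlice and UnsortedSegmentSum) appear to be what stops the group from being placed on GPU:0. A minimal sketch for pulling those ops out of a pasted log, assuming the log keeps the `OpType: DEVICES` layout shown above; the helper name `cpu_only_ops` is hypothetical and not part of TensorFlow.

```python
import re
from typing import List

# Matches lines such as "UnsortedSegmentSum: CPU" or "Mul: GPU CPU".
_OP_LINE = re.compile(r"^\s*(\w+):\s*((?:(?:CPU|GPU)\s*)+)$")


def cpu_only_ops(debug_info: str) -> List[str]:
    """Return the op types whose only listed device type is CPU."""
    ops = []
    for line in debug_info.splitlines():
        match = _OP_LINE.match(line)
        if match and match.group(2).split() == ["CPU"]:
            ops.append(match.group(1))
    return ops


# A few lines copied from the error message above.
sample = """\
StridedSlice: CPU
Unique: GPU CPU
UnsortedSegmentSum: CPU
Mul: GPU CPU
"""
print(cpu_only_ops(sample))
```

The same helper can be pointed at other colocation errors to see which op lacks a GPU kernel.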

tensorflow version 2.12

Following PR #353, the current TF dependency is hardcoded to version tensorflow-cpu==2.12.0rc0.
Please update it, if possible, to the final 2.12.0 version, which was recently released.

Unable to use directml on NVIDIA GPU. `UnimplementedError: Graph execution error.`

Hello,

I'm currently trying to run TensorFlow on a Windows computer using the tensorflow-directml-plugin as discussed in this guide.

My computer is equipped with an NVIDIA Quadro K1200 GPU, which supports DirectX12. You can check its capabilities here.

I'm using the NVIDIA Graphics Driver version 528.89.

The code I'm working with is located in this notebook. When I run the fit() method, I encounter an error message:
UnimplementedError: Graph execution error.
This issue is visible in the output of cell 20 in the notebook.

I am accessing this computer via Remote Desktop Protocol (RDP), and it seems that DxDIAG doesn't recognize my NVIDIA GPU in its list of Display Adapters. Instead, it displays "Microsoft Remote Display Adapter." However, the Device Manager correctly lists the NVIDIA GPU as an active device.

Here are my questions:

  1. Could the remote connection be causing this error?
  2. The machine also has an Intel i7-6700 CPU with built-in graphics. In this context, when tf.config.list_physical_devices('GPU') outputs GPU:0, is it referring to the built-in Intel graphics or the NVIDIA GPU? If it's the built-in Intel graphics, could this be the cause of the error?
  3. If none of the above factors are causing the error, do you have any ideas about what might be causing it?

I appreciate your assistance and guidance in resolving this issue.

Thank you.

Enumerating Intel GPU

I have 2 GPUs in my system, an AMD Radeon RX 560 and an integrated Intel UHD Graphics 770. The current tensorflow-directml-plugin is not enumerating the Intel GPU when I run the following:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

I see this line in the output:

2022-08-25 12:56:05.637891: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (Radeon RX 560 Series)

But there is no line for the Intel GPU. I believe the old TF 1 directml plugin would enumerate the Intel GPU. Any ideas as to what is causing this issue?

How can I disable this plugin?

Since this plugin is unstable, I would like to be able to switch from GPU to CPU. Is this possible, and if so, how?
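
One option that might work here is TensorFlow's own device-visibility API rather than anything specific to the plugin. A minimal sketch, assuming the DirectML device is exposed as a regular `GPU` device (as the logs in other reports on this page suggest) and that `tf.config.set_visible_devices` is honored for pluggable devices; the helper name `use_cpu_only` is hypothetical.

```python
def use_cpu_only(tf):
    """Hide all GPU-type devices so that ops are placed on the CPU.

    `tf` is the imported tensorflow module. Call this right after the import,
    before any tensor or model is created: TensorFlow raises RuntimeError if
    its devices have already been initialized.
    """
    tf.config.set_visible_devices([], "GPU")
    # Report the device types that remain visible.
    return [d.device_type for d in tf.config.get_visible_devices()]


# Typical use (requires tensorflow to be installed):
#   import tensorflow as tf
#   print(use_cpu_only(tf))
```

For a single block of code, wrapping it in `with tf.device('/CPU:0'):` is a narrower alternative. Removing the package with `pip uninstall tensorflow-directml-plugin` should also leave plain tensorflow-cpu behind.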

All kernels registered for op CudnnRNN: <no registered kernels>

Hello,
I just built a model like this in TensorFlow:

model = Sequential()
model.add(Embedding(input_dim=vocab_size,
                    output_dim=vector_size,
                    weights=[embedding_matrix],
                    input_length=max_seq_len))
model.add(Dropout(0.6))
model.add(LSTM(max_seq_len, return_sequences=True))
model.add(LSTM(27))
model.add(Dense(27, activation='softmax'))

When I call .fit() on the model, I get this error:

Could not find device for node: {{node CudnnRNN}} = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="lstm", seed=0, seed2=0]
All kernels registered for op CudnnRNN:

[Op:CudnnRNN]

Call arguments received by layer "lstm_6" " f"(type LSTM):
• inputs=tf.Tensor(shape=(32, 100, 600), dtype=float32)
• mask=None
• training=True
• initial_state=None

This code works fine in another virtual environment that uses tensorflow-gpu instead of DirectML.
The DirectML environment itself works fine on another problem, image classification (the time per batch dropped from 22 min to 5 min, and I can see that the GPU is fully loaded), so the installation and the plugin work.
But with this model, the DirectML environment gives me the error above.
I have: CUDA version: 12.0 (nvidia-smi)
CUDNN version: 11.8 (nvcc --version)
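
Both LSTM layers in the model above use the Keras defaults. The Keras documentation lists the conditions under which an LSTM layer is eligible for the cuDNN kernel: `activation == tanh`, `recurrent_activation == sigmoid`, `recurrent_dropout == 0`, `unroll` is False and `use_bias` is True. A minimal sketch of that check in plain Python; the helper name `meets_cudnn_requirements` is hypothetical, it only mirrors the documented conditions on the layer arguments (it ignores the masking and eager-execution conditions), and it is not Keras code.

```python
def meets_cudnn_requirements(activation="tanh",
                             recurrent_activation="sigmoid",
                             recurrent_dropout=0.0,
                             unroll=False,
                             use_bias=True) -> bool:
    """Mirror the documented layer-argument conditions for the cuDNN LSTM path.

    Defaults match the defaults of tf.keras.layers.LSTM.
    """
    return (activation == "tanh"
            and recurrent_activation == "sigmoid"
            and recurrent_dropout == 0
            and not unroll
            and use_bias)


# LSTM(27) with default arguments satisfies every condition.
print(meets_cudnn_requirements())
# Changing any single argument, e.g. unroll=True, makes the check fail.
print(meets_cudnn_requirements(unroll=True))
```

With the defaults used by `LSTM(27)` the check passes, which would explain why Keras emits a CudnnRNN op, for which the error shows no registered kernel. Building the layer so that one condition fails (for instance `recurrent_dropout` above 0 or `unroll=True`) should make Keras fall back to its generic implementation; whether that runs well on DirectML is untested here.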

encounter a problem

InvalidArgumentError: Graph execution error:

2 root error(s) found.
(0) INVALID_ARGUMENT: No OpKernel was registered to support Op 'CudnnRNN' used by {{node CudnnRNN}} with these attrs: [seed=0, dropout=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=0, is_training=true]
Registered devices: [CPU, GPU]
Registered kernels:

 [[CudnnRNN]]
 [[model/bidirectional/forward_lstm_3/PartitionedCall]]
 [[binary_crossentropy/logistic_loss/_12]]

(1) INVALID_ARGUMENT: No OpKernel was registered to support Op 'CudnnRNN' used by {{node CudnnRNN}} with these attrs: [seed=0, dropout=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=0, is_training=true]
Registered devices: [CPU, GPU]
Registered kernels:

 [[CudnnRNN]]
 [[model/bidirectional/forward_lstm_3/PartitionedCall]]

0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_53639]

`tf.keras.layers.experimental.preprocessing.Random*` fail in 0.0.1.dev220621

First of all, thanks for providing such a great plugin! I have been wanting to use DirectML acceleration with TensorFlow2 for some time now, and this plugin does it right.

I just noticed the existence of this plugin and immediately trained a simple image classification model with MobileNetV2 in the following environment.

  • OS: Windows10 Home(21H2)
  • CPU: Ryzen5 5600X(6C12T)
  • MB: msi mpg B550 gaming plus
  • RAM: 3200MHz 16GBx2
  • GPU: RX6900XT
  • Python: 3.9.7
  • TensorFlow: 2.9.1

I installed the tensorflow-directml-plugin on a completely new Python virtual environment.

Then I got an error when running tf.keras.layers.experimental.preprocessing.Random*.

Maybe this issue is known to you, but I am reporting it. (I also understand that this plugin is still in preview)

Also, when I commented out the lines containing tf.keras.layers.experimental.preprocessing.Random*, it ran with no problem, and training became 10x faster than when trained on CPU only!

Full traceback is here:

Traceback (most recent call last):
  File "C:\Users\Owner\dogandcat.py", line 85, in <module>
    history = model.fit(train_dataset,
  File "C:\Users\Owner\pyenvs\dml2\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\Owner\pyenvs\dml2\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation model/sequential/random_rotation/stateful_uniform/RngReadAndSkip: Could not satisfy explicit device specification '' because the node {{colocation_node model/sequential/random_rotation/stateful_uniform/RngReadAndSkip}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
_Arg: GPU CPU
RngReadAndSkip: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  model_sequential_random_rotation_stateful_uniform_rngreadandskip_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  model/sequential/random_rotation/stateful_uniform/RngReadAndSkip (RngReadAndSkip)

         [[{{node model/sequential/random_rotation/stateful_uniform/RngReadAndSkip}}]] [Op:__inference_train_function_13152]
FYI, the training code that caused the problem looks like this:
# Forked from https://github.com/tensorflow/docs-l10n/blob/master/site/ja/tutorials/images/transfer_learning.ipynb
import os
import tensorflow as tf

from tensorflow.keras.preprocessing import image_dataset_from_directory

_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

BATCH_SIZE = 32
IMG_SIZE = (160, 160)

DEVICE = '/GPU:0'

with tf.device(DEVICE):
  train_dataset = image_dataset_from_directory(train_dir,
                                              shuffle=True,
                                              batch_size=BATCH_SIZE,
                                              image_size=IMG_SIZE)

  validation_dataset = image_dataset_from_directory(validation_dir,
                                                    shuffle=True,
                                                    batch_size=BATCH_SIZE,
                                                    image_size=IMG_SIZE)

  val_batches = tf.data.experimental.cardinality(validation_dataset)
  test_dataset = validation_dataset.take(val_batches // 5)
  validation_dataset = validation_dataset.skip(val_batches // 5)

  AUTOTUNE = tf.data.AUTOTUNE

  train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
  validation_dataset = validation_dataset.prefetch(buffer_size=AUTOTUNE)
  test_dataset = test_dataset.prefetch(buffer_size=AUTOTUNE)

  data_augmentation = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.RandomFlip('horizontal'),
    tf.keras.layers.experimental.preprocessing.RandomRotation(0.2),
  ])

  preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input

  # Create the base model from the pre-trained model MobileNet V2
  IMG_SHAPE = IMG_SIZE + (3,)
  base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                                include_top=False,
                                                weights='imagenet')

  image_batch, label_batch = next(iter(train_dataset))
  feature_batch = base_model(image_batch)
  print(feature_batch.shape)

  base_model.trainable = True

  # Let's take a look at the base model architecture
  base_model.summary()

  global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
  feature_batch_average = global_average_layer(feature_batch)
  print(feature_batch_average.shape)

  prediction_layer = tf.keras.layers.Dense(1)
  prediction_batch = prediction_layer(feature_batch_average)
  print(prediction_batch.shape)

  inputs = tf.keras.Input(shape=(160, 160, 3))
  x = data_augmentation(inputs)
  x = preprocess_input(x)
  x = base_model(x, training=False)
  x = global_average_layer(x)
  x = tf.keras.layers.Dropout(0.2)(x)
  outputs = prediction_layer(x)
  model = tf.keras.Model(inputs, outputs)

  base_learning_rate = 0.0001
  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=['accuracy'])

  model.summary()

  history = model.fit(train_dataset,
                      epochs=10,
                      validation_data=validation_dataset)

platform is already registered with name: "DML"

Environment

hardware
CPU AMD Ryzen 9 5900HX with Radeon Graphics
GPU AMD Radeon RX 6800M
software version
win 10 21H2 19044.2130
amd driver 22.5.1
python 3.9.13
tensorflow-cpu 2.10.0
tensorflow-directml-plugin 0.2.0.dev221020

Problem

After I updated tensorflow-directml-plugin from version 0.1.1.dev221004 to version 0.2.0.dev221020, I can no longer import tensorflow due to the following error:
2022-10-24 10:50:24.173023: F tensorflow/c/experimental/stream_executor/stream_executor.cc:808] Non-OK-status: stream_executor::MultiPlatformManager::RegisterPlatform( std::move(cplatform)) status: INTERNAL: platform is already registered with name: "DML"

Full output:

(tfdml) C:\Users\Username>python
Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
2022-10-24 10:50:22.906122: I tensorflow/c/logging.cc:34] Successfully opened dynamic library C:\ProgramData\Anaconda3\envs\tfdml\lib\site-packages\tensorflow-plugins/directml/directml.d6f03b303ac3c4f2eeb8ca631688c9757b361310.dll
2022-10-24 10:50:22.906980: I tensorflow/c/logging.cc:34] Successfully opened dynamic library dxgi.dll
2022-10-24 10:50:22.910657: I tensorflow/c/logging.cc:34] Successfully opened dynamic library d3d12.dll
2022-10-24 10:50:24.163258: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 2 compatible adapters.
2022-10-24 10:50:24.173023: F tensorflow/c/experimental/stream_executor/stream_executor.cc:808] Non-OK-status: stream_executor::MultiPlatformManager::RegisterPlatform( std::move(cplatform)) status: INTERNAL: platform is already registered with name: "DML"

UPD: Solved after reinstallation of tensorflow-cpu and tensorflow-directml-plugin

The DirectML device has encountered an unrecoverable error (DXGI_ERROR_DEVICE_REMOVED)

System:

  • python: 3.10
  • tensorflow-cpu: 2.10
  • OS: Windows 10 Pro
  • CPU: Intel(R) Core(TM) i7-7700
  • Graphics card: Radeon 580 8GB
  • Driver: Adrenalin Edition 22.11.2
  • Installation method: pip

Log:

2022-12-19 02:47:33.700540: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-19 02:47:33.701303: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (Radeon RX 580 Series)
2022-12-19 02:47:33.791806: I tensorflow/c/logging.cc:34] Successfully opened dynamic library Kernel32.dll
2022-12-19 02:47:33.793608: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-19 02:47:33.793865: W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
2022-12-19 02:47:33.794200: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14004 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-12-19 02:47:45.714334: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-19 02:47:46.246620: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-19 02:47:46.247027: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14004 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-12-19 02:47:46.252054: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-19 02:47:46.252296: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14004 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-12-19 02:47:46.255226: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-19 02:47:46.255469: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14004 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-12-19 02:47:46.260202: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-19 02:47:46.260444: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14004 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-12-19 02:47:46.263423: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-19 02:47:46.263817: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14004 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-12-19 02:47:46.405264: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-19 02:47:46.405505: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14004 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-12-19 02:48:32.256535: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
 6/25 [======>.......................] - ETA: 56s2022-12-19 02:49:06.291523: E tensorflow/c/logging.cc:40] The DirectML device has encountered an unrecoverable error (DXGI_ERROR_DEVICE_REMOVED). This is most often caused by a timeout occurring on the GPU. Please visit https://aka.ms/tfdmltimeout for more information and troubleshooting steps.
2022-12-19 02:49:06.291925: F tensorflow/c/logging.cc:43] HRESULT failed with 0x887a0005: readback_heap->Map(0, nullptr, &readback_heap_data)

I use this model

This error occurs every other run. Setting the following does not help:

import os
os.environ["TF_DIRECTML_MAX_ALLOC_SIZE"] = "536870912"

Training VGG13 net with RX6600 is slow

My environment:
Windows 11 64-bit
Python 3.9 64-bit
tensorflow 2.10
tensorflow-directml-plugin 0.2.0.dev221020
AMD Radeon RX 6600, Nvidia RTX 1060
Conda 22.9.0

I'm training a VGG13 net in a Miniconda environment. I have two configurations:
1. Nvidia RTX 1060 + tensorflow-gpu
2. RX 6600 (more powerful than the RTX 1060) + tensorflow-cpu + tensorflow-directml-plugin
The first configuration is very fast, about 6 s per training period. The second configuration is much slower, about 30 s per training period.
Is the second configuration slower because it uses tensorflow-cpu rather than tensorflow-gpu?
Is there any way to improve the training speed with the second configuration? Or when will tensorflow-directml-plugin support tensorflow-gpu?

Thanks

HRESULT failed with 0x887a0001: dml_device_->GetDeviceRemovedReason()

Envs:
TensorFlow 2.12
tensorflow_directml_plugin-0.5.0-cp39-cp39-win_amd64.whl
Python 3.9

Error:
F tensorflow/c/logging.cc:43] HRESULT failed with 0x887a0001: dml_device_->GetDeviceRemovedReason(), after which Python restarts.

I built the newest tensorflow_directml_plugin 0.5.0, but when I run the MNIST example in TensorFlow, this error occurs:
F tensorflow/c/logging.cc:43] HRESULT failed with 0x887a0001: dml_device_->GetDeviceRemovedReason()
GPU memory and shared memory usage also grow substantially.
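
The HRESULT values quoted in the reports on this page can be split into their standard fields to see which DXGI error they name. A minimal sketch, assuming the usual HRESULT layout (failure bit, facility, 16-bit code); the helper name `decode_hresult` is hypothetical, and the lookup table is a small hand-picked subset of the DXGI error codes from the Windows SDK headers.

```python
# Assumption: a hand-picked subset of DXGI error codes, not a complete list.
DXGI_ERRORS = {
    0x887A0001: "DXGI_ERROR_INVALID_CALL",
    0x887A0005: "DXGI_ERROR_DEVICE_REMOVED",
    0x887A0006: "DXGI_ERROR_DEVICE_HUNG",
    0x887A0007: "DXGI_ERROR_DEVICE_RESET",
}


def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into failure flag, facility and code."""
    value &= 0xFFFFFFFF
    return {
        "failed": bool(value >> 31),         # top bit set means failure
        "facility": (value >> 16) & 0x1FFF,  # same mask as HRESULT_FACILITY
        "code": value & 0xFFFF,
        "name": DXGI_ERRORS.get(value, "unknown"),
    }


for hr in (0x887A0005, 0x887A0001):
    print(hex(hr), decode_hresult(hr))
```

Read this way, 0x887a0001 is DXGI_ERROR_INVALID_CALL rather than DXGI_ERROR_DEVICE_REMOVED (0x887a0005), so the timeout guidance linked from the device-removed message may not apply to this report.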

AI-Benchmark doesn't run correctly (unlike TensorFlow 1.15 DirectML, which works)

Hi,
I'm trying to run ai-benchmark on a system with:
RX Vega, latest driver
Windows 11 Insider version

Note that the old TensorFlow 1.15 DirectML works.

Installed in a clean Python 3.10.2 env with:
pip install tensorflow-cpu==2.9.1
pip install tensorflow-directml-plugin
pip install ai-benchmark

The main failure seems to be:
tensorflow.python.framework.errors_impl.InvalidArgumentError: AssignRefVariable is not yet supported for pluggable devices in this version of TensorFlow.

log:

Python 3.10.2 (tags/v3.10.2:a58ebcc, Jan 17 2022, 14:12:15) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2022-06-23 20:53:42.774835: I tensorflow/c/logging.cc:34] Successfully opened dynamic library c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow-plugins/directml/directml.0de2b4431c6572ee74152a7ee0cd3fb1534e4a95.dll
2022-06-23 20:53:42.775767: I tensorflow/c/logging.cc:34] Successfully opened dynamic library dxgi.dll
2022-06-23 20:53:42.780394: I tensorflow/c/logging.cc:34] Successfully opened dynamic library d3d12.dll
2022-06-23 20:53:43.066846: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
>>>
>>> tf.config.experimental.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>> print(tf.__version__)
2.9.1
>>> from ai_benchmark import AIBenchmark
>>> results = AIBenchmark().run()

>>   AI-Benchmark-v.0.1.2
>>   Let the AI Games begin..

2022-06-23 20:55:17.167099: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-23 20:55:17.167680: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (Radeon RX Vega)
2022-06-23 20:55:17.287625: I tensorflow/c/logging.cc:34] Successfully opened dynamic library Kernel32.dll
2022-06-23 20:55:17.289964: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-06-23 20:55:17.290192: W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
2022-06-23 20:55:17.290639: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/device:GPU:0 with 14910 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-06-23 20:55:17.292723: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-06-23 20:55:17.292817: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/device:GPU:0 with 14910 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-06-23 20:55:17.293559: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-06-23 20:55:17.293667: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/device:GPU:0 with 14910 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
*  TF Version: 2.9.1
*  Platform: Windows-10-10.0.22489-SP0
*  CPU: N/A
*  CPU RAM: 32 GB
*  GPU/0: DML
*  GPU RAM: 14.6 GB
*  CUDA Version: N/A
*  CUDA Build: N/A

The benchmark is running...
The tests might take up to 20 minutes
Please don't interrupt the script

1/19. MobileNet-V2

2022-06-23 20:55:24.203219: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-06-23 20:55:24.203491: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14910 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-06-23 20:55:24.537069: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-06-23 20:55:24.561797: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
Traceback (most recent call last):
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\client\session.py", line 1377, in _do_call
    return fn(*args)
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InvalidArgumentError: AssignRefVariable is not yet supported for pluggable devices in this version of TensorFlow.
         [[{{node MobilenetV2/expanded_conv_1/project/BatchNorm/beta/Assign}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\ai_benchmark\__init__.py", line 63, in run
    return run_tests(training=True, inference=True, micro=False, verbose=self.verbose,
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\ai_benchmark\utils.py", line 560, in run_tests
    tf.compat.v1.global_variables_initializer().run()
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\framework\ops.py", line 2731, in run
    _run_using_default_session(self, feed_dict, self.graph, session)
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\framework\ops.py", line 5782, in _run_using_default_session
    session.run(operation, feed_dict)
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\client\session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\client\session.py", line 1190, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\client\session.py", line 1396, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node 'MobilenetV2/expanded_conv_1/project/BatchNorm/beta/Assign' defined at (most recent call last):
    File "<stdin>", line 1, in <module>
    File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\ai_benchmark\__init__.py", line 63, in run
      return run_tests(training=True, inference=True, micro=False, verbose=self.verbose,
    File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\ai_benchmark\utils.py", line 557, in run_tests
      input_, output_, train_vars_ = getModelSrc(test, testInfo, sess)
    File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\ai_benchmark\utils.py", line 238, in getModelSrc
      tf.compat.v1.train.import_meta_graph(test.model_src, clear_devices=True)
Node: 'MobilenetV2/expanded_conv_1/project/BatchNorm/beta/Assign'
AssignRefVariable is not yet supported for pluggable devices in this version of TensorFlow.
         [[{{node MobilenetV2/expanded_conv_1/project/BatchNorm/beta/Assign}}]]

Original stack trace for 'MobilenetV2/expanded_conv_1/project/BatchNorm/beta/Assign':
  File "<stdin>", line 1, in <module>
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\ai_benchmark\__init__.py", line 63, in run
    return run_tests(training=True, inference=True, micro=False, verbose=self.verbose,
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\ai_benchmark\utils.py", line 557, in run_tests
    input_, output_, train_vars_ = getModelSrc(test, testInfo, sess)
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\ai_benchmark\utils.py", line 238, in getModelSrc
    tf.compat.v1.train.import_meta_graph(test.model_src, clear_devices=True)
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\training\saver.py", line 1582, in import_meta_graph
    return _import_meta_graph_with_return_elements(meta_graph_or_file,
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\training\saver.py", line 1603, in _import_meta_graph_with_return_elements
    meta_graph.import_scoped_meta_graph_with_return_elements(
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\framework\meta_graph.py", line 804, in import_scoped_meta_graph_with_return_elements
    imported_return_elements = importer.import_graph_def(
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\util\deprecation.py", line 561, in new_func
    return func(*args, **kwargs)
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\framework\importer.py", line 403, in import_graph_def
    return _import_graph_def_internal(
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\framework\importer.py", line 516, in _import_graph_def_internal
    _ProcessNewOps(graph)
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\framework\importer.py", line 247, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\framework\ops.py", line 3904, in _add_new_tf_operations
    new_ops = [
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\framework\ops.py", line 3905, in <listcomp>
    self._create_op_from_tf_operation(c_op, compute_device=compute_devices)
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\framework\ops.py", line 3787, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "c:\users\rtfss\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\framework\ops.py", line 2133, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)

ValueError: Received incompatible tensor with shape (1, 1, 256, 128)

I'm getting an error when running the original VOC2012 example for training an object detector.

I partially re-trained the example to generate a checkpoint for a quick test, and then ran train_voc.py to generate the checkpoint yolov3_train_1.tf.

If I use the original yolov3.tf (produced by running setup.py) with detect.py, it works fine, but I don't know what is wrong with the new checkpoint that I trained (yolov3_train_1.tf).

With the original checkpoint (yolov3.tf) I get this log:

I0923 21:50:23.414862 16292 server.py:122] listener closed
I0923 21:50:23.414862 16292 server.py:270] server has terminated
2023-09-23 21:50:26.293868: I tensorflow/c/logging.cc:34] Successfully opened dynamic library C:\Users\leand\.conda\envs\yolo_env\lib\site-packages\tensorflow-plugins/directml/directml.d6f03b303ac3c4f2eeb8ca631688c9757b361310.dll
2023-09-23 21:50:26.294425: I tensorflow/c/logging.cc:34] Successfully opened dynamic library dxgi.dll
2023-09-23 21:50:26.296333: I tensorflow/c/logging.cc:34] Successfully opened dynamic library d3d12.dll
2023-09-23 21:50:26.382625: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
2023-09-23 21:50:27.042213: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-23 21:50:27.042797: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (Intel(R) Iris(R) Xe Graphics)
2023-09-23 21:50:27.089313: I tensorflow/c/logging.cc:34] Successfully opened dynamic library Kernel32.dll
2023-09-23 21:50:27.090007: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-23 21:50:27.090312: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8805 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
I0923 21:50:30.521184 15116 detect.py:42] weights loaded
I0923 21:50:30.521184 15116 detect.py:45] classes loaded
2023-09-23 21:50:30.813031: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-23 21:50:30.813582: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8805 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2023-09-23 21:50:30.819849: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-23 21:50:30.820059: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8805 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2023-09-23 21:50:30.844042: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-23 21:50:30.844378: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8805 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2023-09-23 21:50:30.855851: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-23 21:50:30.856020: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8805 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2023-09-23 21:50:30.861914: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-23 21:50:30.862240: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8805 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2023-09-23 21:50:30.886910: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-23 21:50:30.887077: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8805 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2023-09-23 21:50:30.890870: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-23 21:50:30.891261: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8805 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
...
2023-09-23 21:50:30.932467: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-23 21:50:30.932625: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8805 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2023-09-23 21:50:30.953731: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-23 21:50:30.953925: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8805 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
I0923 21:50:30.953699 15116 detect.py:62] time: 0.348203182220459
I0923 21:50:30.953699 15116 detect.py:64] detections:
I0923 21:50:30.975726 15116 detect.py:66] 	car, 0.9748033285140991, [0.24280313 0.2940823  0.37936452 0.39064947]
I0923 21:50:30.975726 15116 detect.py:66] 	bird, 0.9257763028144836, [0.42066154 0.05204896 0.5695095  0.14767769]
I0923 21:50:30.975726 15116 detect.py:66] 	bus, 0.8974265456199646, [0.02114373 0.2955287  0.18957907 0.4136036 ]
I0923 21:50:30.975726 15116 detect.py:66] 	pottedplant, 0.8646465539932251, [0.01896213 0.7604627  0.18410942 0.9198695 ]
I0923 21:50:30.975726 15116 detect.py:66] 	motorbike, 0.7518333196640015, [0.60587585 0.5450276  0.73023164 0.6556344 ]
I0923 21:50:30.991388 15116 detect.py:66] 	dog, 0.7247338891029358, [0.20027779 0.52045083 0.3819927  0.8443599 ]
I0923 21:50:30.991388 15116 detect.py:66] 	person, 0.6809608340263367, [0.8091381  0.51088405 0.9747042  0.6712277 ]
I0923 21:50:30.991388 15116 detect.py:66] 	motorbike, 0.6792984008789062, [0.708943   0.53600967 0.7793795  0.6508231 ]
I0923 21:50:30.991388 15116 detect.py:66] 	tvmonitor, 0.6079525351524353, [0.8371077  0.77706623 0.9707338  0.90270257]
I0923 21:50:31.006989 15116 detect.py:66] 	bicycle, 0.5827189683914185, [0.22785279 0.03375612 0.3761612  0.17757493]
I0923 21:50:31.006989 15116 detect.py:66] 	dog, 0.5704740881919861, [0.38914445 0.5364528  0.51281893 0.6639898 ]
I0923 21:50:31.107016 15116 detect.py:73] output saved to: ./output.jpg

With the trained checkpoint (yolov3_train_1.tf) I get:

2023-09-23 21:54:25.902047: I tensorflow/c/logging.cc:34] Successfully opened dynamic library C:\Users\leand\.conda\envs\yolo_env\lib\site-packages\tensorflow-plugins/directml/directml.d6f03b303ac3c4f2eeb8ca631688c9757b361310.dll
2023-09-23 21:54:25.902577: I tensorflow/c/logging.cc:34] Successfully opened dynamic library dxgi.dll
2023-09-23 21:54:25.904550: I tensorflow/c/logging.cc:34] Successfully opened dynamic library d3d12.dll
2023-09-23 21:54:26.003354: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
2023-09-23 21:54:26.661228: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-23 21:54:26.662240: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (Intel(R) Iris(R) Xe Graphics)
2023-09-23 21:54:26.703601: I tensorflow/c/logging.cc:34] Successfully opened dynamic library Kernel32.dll
2023-09-23 21:54:26.704314: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-23 21:54:26.704649: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8805 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.normalization.batch_normalization.BatchNormalization object at 0x0000027AF6F1A710> and <keras.layers.activation.leaky_relu.LeakyReLU object at 0x0000027AF6F1AEF0>).
W0923 21:54:29.208242  2608 restore.py:84] Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.normalization.batch_normalization.BatchNormalization object at 0x0000027AF6F1A710> and <keras.layers.activation.leaky_relu.LeakyReLU object at 0x0000027AF6F1AEF0>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.convolutional.conv2d.Conv2D object at 0x0000027AF708B430> and <keras.layers.reshaping.zero_padding2d.ZeroPadding2D object at 0x0000027AF701D990>).
W0923 21:54:29.208242  2608 restore.py:84] Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.convolutional.conv2d.Conv2D object at 0x0000027AF708B430> and <keras.layers.reshaping.zero_padding2d.ZeroPadding2D object at 0x0000027AF701D990>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.normalization.batch_normalization.BatchNormalization object at 0x0000027AF708BA60> and <keras.layers.convolutional.conv2d.Conv2D object at 0x0000027AF708B430>).
W0923 21:54:29.208242  2608 restore.py:84] Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.normalization.batch_normalization.BatchNormalization object at 0x0000027AF708BA60> and <keras.layers.convolutional.conv2d.Conv2D object at 0x0000027AF708B430>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.convolutional.conv2d.Conv2D object at 0x0000027AF70CEF20> and <keras.layers.activation.leaky_relu.LeakyReLU object at 0x0000027AF70AC580>).
W0923 21:54:29.223865  2608 restore.py:84] Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.convolutional.conv2d.Conv2D object at 0x0000027AF70CEF20> and <keras.layers.activation.leaky_relu.LeakyReLU object at 0x0000027AF70AC580>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.convolutional.conv2d.Conv2D object at 0x0000027AF70E9FF0> and <keras.layers.activation.leaky_relu.LeakyReLU object at 0x0000027AF70EB340>).
...
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.normalization.batch_normalization.BatchNormalization object at 0x0000027A9A9730D0> and <keras.engine.input_layer.InputLayer object at 0x0000027A9A94D810>).
W0923 21:54:29.386797  2608 restore.py:84] Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.normalization.batch_normalization.BatchNormalization object at 0x0000027A9A9730D0> and <keras.engine.input_layer.InputLayer object at 0x0000027A9A94D810>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.convolutional.conv2d.Conv2D object at 0x0000027A9A94CD30> and <keras.layers.merging.concatenate.Concatenate object at 0x0000027A9A94E110>).
W0923 21:54:29.386797  2608 restore.py:84] Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.convolutional.conv2d.Conv2D object at 0x0000027A9A94CD30> and <keras.layers.merging.concatenate.Concatenate object at 0x0000027A9A94E110>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.normalization.batch_normalization.BatchNormalization object at 0x0000027A9A94CD60> and <keras.layers.convolutional.conv2d.Conv2D object at 0x0000027A9A94CD30>).
W0923 21:54:29.386797  2608 restore.py:84] Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.normalization.batch_normalization.BatchNormalization object at 0x0000027A9A94CD60> and <keras.layers.convolutional.conv2d.Conv2D object at 0x0000027A9A94CD30>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.convolutional.conv2d.Conv2D object at 0x0000027A9A92F130> and <keras.layers.normalization.batch_normalization.BatchNormalization object at 0x0000027A9A94CD60>).
W0923 21:54:29.386797  2608 restore.py:84] Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.convolutional.conv2d.Conv2D object at 0x0000027A9A92F130> and <keras.layers.normalization.batch_normalization.BatchNormalization object at 0x0000027A9A94CD60>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.normalization.batch_normalization.BatchNormalization object at 0x0000027AFD7C9510> and <keras.layers.activation.leaky_relu.LeakyReLU object at 0x0000027A9A94C670>).
W0923 21:54:29.386797  2608 restore.py:84] Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.layers.normalization.batch_normalization.BatchNormalization object at 0x0000027AFD7C9510> and <keras.layers.activation.leaky_relu.LeakyReLU object at 0x0000027A9A94C670>).

Traceback (most recent call last):
  File "E:\sandbox\DirectML-master\TensorFlow\TF2\yolov3-tf2\detect.py", line 78, in <module>
    app.run(main)
  File "C:\Users\leand\AppData\Roaming\Python\Python310\site-packages\absl\app.py", line 308, in run
    _run_main(main, args)
  File "C:\Users\leand\AppData\Roaming\Python\Python310\site-packages\absl\app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "E:\sandbox\DirectML-master\TensorFlow\TF2\yolov3-tf2\detect.py", line 41, in main
    yolo.load_weights(FLAGS.weights).expect_partial()
  File "C:\Users\leand\AppData\Roaming\Python\Python310\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\leand\AppData\Roaming\Python\Python310\site-packages\tensorflow\python\training\saving\saveable_object_util.py", line 135, in restore
    raise ValueError(
ValueError: Received incompatible tensor with shape (1, 1, 256, 128) when attempting to restore variable with shape (1, 1, 128, 64) and name layer_with_weights-0/layer_with_weights-10/kernel/.ATTRIBUTES/VARIABLE_VALUE.
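
One way to narrow down a mismatch like this is to compare the variable shapes stored in the two checkpoints. The sketch below is only an illustration: the helper name `diff_checkpoint_shapes` and the sample data are hypothetical, and it assumes the `(name, shape)` pairs come from `tf.train.list_variables`, which returns such a list for a checkpoint prefix.

```python
def diff_checkpoint_shapes(reference, candidate):
    """Return (name, ref_shape, cand_shape) for variables whose shapes differ.

    Both arguments are lists of (name, shape) pairs, the format returned by
    tf.train.list_variables(checkpoint_prefix).
    """
    ref = {name: tuple(shape) for name, shape in reference}
    mismatches = []
    for name, shape in candidate:
        shape = tuple(shape)
        # Only report variables present in both checkpoints with different shapes.
        if name in ref and ref[name] != shape:
            mismatches.append((name, ref[name], shape))
    return mismatches


# Hypothetical sample data mirroring the shapes in the error message above.
KERNEL = "layer_with_weights-0/layer_with_weights-10/kernel/.ATTRIBUTES/VARIABLE_VALUE"
original = [(KERNEL, [1, 1, 128, 64])]
retrained = [(KERNEL, [1, 1, 256, 128])]

for name, expected, found in diff_checkpoint_shapes(original, retrained):
    print(f"{name}: expected {expected}, found {found}")
```

With TensorFlow installed, the two input lists would come from calling `tf.train.list_variables` on each checkpoint prefix. A variable that differs in shape under the same name can indicate that the model constructed at load time is not the same as the one that was saved.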

Using the plugin with Tensorflow Recommenders.

Hello, I'm trying to use DirectML with the TFRS library.
The model works with tensorflow-cpu and with CUDA without issues; however, when I try this DirectML plugin on an AMD GPU, I get the following error:

2022-09-23 17:08:11.821401: I tensorflow/c/logging.cc:34] Successfully opened dynamic library C:\Users\7gab\anaconda3\envs\tfRecommenders\lib\site-packages\tensorflow-plugins/directml/directml.0de2b4431c6572ee74152a7ee0cd3fb1534e4a95.dll
2022-09-23 17:08:11.822066: I tensorflow/c/logging.cc:34] Successfully opened dynamic library dxgi.dll 
2022-09-23 17:08:11.823965: I tensorflow/c/logging.cc:34] Successfully opened dynamic library d3d12.dll
2022-09-23 17:08:11.970227: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
2022-09-23 17:08:21.035934: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-23 17:08:21.036666: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (AMD Radeon RX 6600)
2022-09-23 17:08:21.100537: I tensorflow/c/logging.cc:34] Successfully opened dynamic library Kernel32.dll
2022-09-23 17:08:21.103005: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-09-23 17:08:21.103449: W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
2022-09-23 17:08:21.103645: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6935 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
uniqye items 130
2022-09-23 17:08:31.666967: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-09-23 17:08:31.703858: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-09-23 17:08:32.110711: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-09-23 17:08:32.146635: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-09-23 17:08:32.526733: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-09-23 17:08:32.554012: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
Epoch 1/200
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor. Received: inputs={'temp': <tf.Tensor 'IteratorGetNext:4' shape=(None,) dtype=int32>, 'humidity': <tf.Tensor 'IteratorGetNext:2' shape=(None,) dtype=int32>, 'timeStamp': <tf.Tensor 'IteratorGetNext:5' shape=(None,) dtype=float32>, 'customerId': <tf.Tensor 'IteratorGetNext:0' shape=(None,) dtype=string>, 'hourCategory': <tf.Tensor 'IteratorGetNext:1' shape=(None,) dtype=string>}. Consider rewriting this model with the Functional API.
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor. Received: inputs={'temp': <tf.Tensor 'IteratorGetNext:4' shape=(None,) dtype=int32>, 'humidity': <tf.Tensor 'IteratorGetNext:2' shape=(None,) dtype=int32>, 'timeStamp': <tf.Tensor 'IteratorGetNext:5' shape=(None,) dtype=float32>, 'customerId': <tf.Tensor 'IteratorGetNext:0' shape=(None,) dtype=string>, 'hourCategory': <tf.Tensor 'IteratorGetNext:1' shape=(None,) dtype=string>}. Consider rewriting this model with the Functional API.
Traceback (most recent call last):
  File "f:/tensorflow_recommender/ContextRetrievalTrain.py", line 238, in <module>
    trainContextRetrieval()
  File "f:/tensorflow_recommender/ContextRetrievalTrain.py", line 212, in trainContextRetrieval
    model.fit(cached_train, epochs=200)
  File "C:\Users\7gab\anaconda3\envs\tfRecommenders\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\7gab\anaconda3\envs\tfRecommenders\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation sequential_6/user_model/sequential_1/embedding_1/embedding_lookup: Could not satisfy explicit device specification '' 
because the node {{colocation_node sequential_6/user_model/sequential_1/embedding_1/embedding_lookup}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
StridedSlice: CPU
Unique: GPU CPU
Shape: GPU CPU
_Arg: GPU CPU
ResourceGather: GPU CPU
Identity: GPU CPU
Const: GPU CPU
UnsortedSegmentSum: CPU
ResourceSparseApplyAdagradV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_6_user_model_sequential_1_embedding_1_embedding_lookup_5809 (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adagrad_adagrad_update_resourcesparseapplyadagradv2_accum (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential_6/user_model/sequential_1/embedding_1/embedding_lookup (ResourceGather)
  sequential_6/user_model/sequential_1/embedding_1/embedding_lookup/Identity (Identity)
  Adagrad/Adagrad/update/Unique (Unique) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/Shape (Shape) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice/stack (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice/stack_1 (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice/stack_2 (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice (StridedSlice) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/UnsortedSegmentSum (UnsortedSegmentSum) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/ResourceSparseApplyAdagradV2 (ResourceSparseApplyAdagradV2) /job:localhost/replica:0/task:0/device:GPU:0

         [[{{node sequential_6/user_model/sequential_1/embedding_1/embedding_lookup}}]] [Op:__inference_train_function_6193]

while trying to train the retrieval model.

I know this plugin is not yet stable, but I would like to know whether this will work or be supported in the future, or whether I should rely on tensorflow-cpu or an NVIDIA GPU instead.
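One way to read the colocation report above: the op types listed with only CPU appear to be the ones that cannot follow the rest of the group onto GPU:0, which would explain the empty possible_devices_ list. The following is a minimal sketch, assuming the "OpType: devices" lines of the report have been captured as a string. The helper name cpu_only_ops is hypothetical and is not part of TensorFlow or the plugin.

```python
import re
from typing import List


def cpu_only_ops(report: str) -> List[str]:
    """Return op types whose supported-device list in the report is exactly CPU."""
    ops = []
    for line in report.splitlines():
        # Lines of interest look like "StridedSlice: CPU" or "Unique: GPU CPU".
        match = re.fullmatch(r"\s*(\w+):\s*((?:GPU|CPU)(?:\s+(?:GPU|CPU))*)\s*", line)
        if match and match.group(2).split() == ["CPU"]:
            ops.append(match.group(1))
    return ops


# Device lines copied from the colocation debug info in the error message.
report = """\
StridedSlice: CPU
Unique: GPU CPU
Shape: GPU CPU
_Arg: GPU CPU
ResourceGather: GPU CPU
Identity: GPU CPU
Const: GPU CPU
UnsortedSegmentSum: CPU
ResourceSparseApplyAdagradV2: CPU
"""

print(cpu_only_ops(report))
```

For this report, the sketch singles out StridedSlice, UnsortedSegmentSum, and ResourceSparseApplyAdagradV2, which are the ops to look at when deciding where the embedding and its optimizer update should be placed.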

No float16 support for _FusedConv2D

When I use tensorflow-directml-plugin with a model that works fine with tensorflow-gpu, I get this error:

2 root error(s) found.
  (0) NOT_FOUND:  No registered '_FusedConv2D' OpKernel for 'GPU' devices compatible with node {{node model/conv2d/Relu}}
         (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_HALF, _XlaHasReferenceVars=false, data_format="NCHW", dilations=[1, 1, 1, 1], epsilon=0, explicit_paddings=[], fused_ops=["BiasAdd", "Relu"], leakyrelu_alpha=0.2, num_args=1, padding="SAME", strides=[1, 1, 1, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"
        .  Registered:  device='GPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_BFLOAT16]

         [[StatefulPartitionedCall/StatefulPartitionedCall_1/model/conv2d/Relu]]

It looks like fused 2D convolutions are registered for float32 on the GPU and for float32, float64, and bfloat16 on the CPU, but not for float16 on either device. Unfortunately, using bfloat16 is not an option in my case because the computers where inference is going to run will have GPUs that support float16 but not yet bfloat16.

Would it be possible to add support for float16 too?

I'm using tensorflow-directml-plugin version 0.3.0.dev221212, and tensorflow-cpu version 2.10.0 on a Windows 10 machine, no WSL.
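The "Registered:" list in the error above can be read mechanically to see which data types each device has a kernel for. This is a minimal sketch, assuming the message text has been captured as a string. The function name registered_dtypes is hypothetical and only parses text; it does not query TensorFlow.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def registered_dtypes(message: str) -> Dict[str, Set[str]]:
    """Map each device named in a 'Registered:' list to its dtype names."""
    kernels: Dict[str, Set[str]] = defaultdict(set)
    # Entries look like: device='GPU'; T in [DT_FLOAT]
    for device, dtypes in re.findall(r"device='(\w+)'; T in \[([^\]]*)\]", message):
        kernels[device].update(d.strip() for d in dtypes.split(",") if d.strip())
    return dict(kernels)


# Kernel registrations copied from the error message.
message = """\
.  Registered:  device='GPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_BFLOAT16]
"""

kernels = registered_dtypes(message)
for device, dtypes in sorted(kernels.items()):
    print(device, sorted(dtypes))
print("GPU kernel for DT_HALF registered:", "DT_HALF" in kernels.get("GPU", set()))
```

The requested attribute in the error is T=DT_HALF, so the last line shows directly why the lookup fails on the GPU device.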

Adding a new device

Hi,
I am interested in finding out what I need to do to add a new accelerator under DirectML.
I noticed that it currently supports Direct3D (DirectX 12) devices, which means GPUs only.
However, this accelerator isn't related to display and doesn't support the DirectX 12 framework.
What is the roadmap regarding adding support to non-display (non-GPU) hardware for DirectML?
Is there any way to integrate my accelerator into DirectML at the moment?

Thanks,
Roy

Support for mixed precision

I have an RX 6900 XT, which runs Stable Diffusion inference in 33 s. My 16 GB RTX A4000 does the same in 6.7 s. Without support or emulation for tensor cores, DirectML is not a serious alternative to either ROCm or CUDA. AMD inference times are 6 times slower than the equivalent Nvidia card running CUDA. Even ROCm has massive gains on Radeon cards without any actual matrix cores.

Is there any chance the plugin will get real mixed-precision support? What are your plans for performance going forward?

Thanks in advance for taking the time to address these concerns.

Status on libtensorflow (C API) support on Windows

I would like to follow up on this discussion about allowing the plugin to work with libtensorflow on Windows:
https://discuss.tensorflow.org/t/is-it-possible-to-build-tf-with-pluggable-devices-plugin-help-request-pluggable-device/10985/18

It has been close to a year since this was discussed. According to the linked comment, the issue preventing use of the plugin should have been resolved by TF 2.12. The latest TF release is 2.14.

I would also like to know whether there are any usable binaries that we could test on Windows.

Errors when using the tensorflow or tensorflow-gpu packages

What is the recommended development environment?

  • My environment
    • Windows 10 64-bit
    • Python 3.8.10 64-bit
    • tensorflow 2.9.1
    • tensorflow-directml-plugin 0.0.1.dev220621
    • Intel Core i5-1135G7, Intel Iris Xe Graphics

An error occurs when running a simple example.

  • test code
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#---------------------------------------------------------------------------------------------------
"""..."""

import tensorflow as tf

tf.debugging.set_log_device_placement(True)
# tf.enable_eager_execution()

from tensorflow import keras
from tensorflow.keras import layers

#---------------------------------------------------------------------------------------------------
def ex() -> None:
    """..."""
    inputs = keras.Input(shape=(3, 32, 32))
    x = layers.Conv2D(32, (3, 3), activation="relu")(inputs)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(10, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.summary()
#---------------------------------------------------------------------------------------------------
def main() -> None:
    """..."""
    ex()
#---------------------------------------------------------------------------------------------------
if __name__ == "__main__":
    main()
  • error message
(tf2dml_38) d:\ai\ai_test>python e.py
2022-06-23 14:46:49.848519: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-06-23 14:46:49.848783: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-06-23 14:46:51.668324: I tensorflow/c/logging.cc:34] Successfully opened dynamic library D:\devtool\pyvenv\tf2dml_38\lib\site-packages\tensorflow-plugins/directml/directml.0de2b4431c6572ee74152a7ee0cd3fb1534e4a95.dll
2022-06-23 14:46:51.669120: I tensorflow/c/logging.cc:34] Successfully opened dynamic library dxgi.dll
2022-06-23 14:46:51.672098: I tensorflow/c/logging.cc:34] Successfully opened dynamic library d3d12.dll
2022-06-23 14:46:51.733125: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
2022-06-23 14:46:52.042786: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-23 14:46:52.043678: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (Intel(R) Iris(R) Xe Graphics)
2022-06-23 14:46:52.070285: I tensorflow/c/logging.cc:34] Successfully opened dynamic library Kernel32.dll
2022-06-23 14:46:52.071152: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-06-23 14:46:52.071254: W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
2022-06-23 14:46:52.071518: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6951 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-06-23 14:46:52.074304: I tensorflow/core/common_runtime/eager/execute.cc:1323] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-23 14:46:52.076556: I tensorflow/core/common_runtime/eager/execute.cc:1323] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2022-06-23 14:46:52.076744: I tensorflow/core/common_runtime/eager/execute.cc:1323] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Traceback (most recent call last):
  File "e.py", line 31, in <module>
    main()
  File "e.py", line 28, in main
    ex()
  File "e.py", line 18, in ex
    x = layers.Conv2D(32, (3, 3), activation="relu")(inputs)
  File "D:\devtool\pyvenv\tf2dml_38\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "D:\devtool\pyvenv\tf2dml_38\lib\site-packages\keras\backend.py", line 1920, in random_uniform
    return tf.random.uniform(
tensorflow.python.framework.errors_impl.InvalidArgumentError: Multiple OpKernel registrations match NodeDef at the same priority '{{node RandomUniform}}': 'op: "RandomUniform" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } constraint { name: "dtype" allowed_values { list { type: DT_FLOAT } } } host_memory_arg: "shape"' and 'op: "RandomUniform" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } constraint { name: "dtype" allowed_values { list { type: DT_FLOAT } } } host_memory_arg: "shape"' [Op:RandomUniform]
  • error message with tensorflow 2.8.0
(tf2dml_38) d:\ai\ai_test>python e.py
2022-06-23 14:50:16.448936: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-06-23 14:50:16.449160: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "e.py", line 6, in <module>
    import tensorflow as tf
  File "D:\devtool\pyvenv\tf2dml_38\lib\site-packages\tensorflow\__init__.py", line 443, in <module>
    _ll.load_library(_plugin_dir)
  File "D:\devtool\pyvenv\tf2dml_38\lib\site-packages\tensorflow\python\framework\load_library.py", line 151, in load_library
    py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: D:\devtool\pyvenv\tf2dml_38\lib\site-packages\tensorflow-plugins\tfdml_plugin.dll not found
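The environment above lists the tensorflow package, whereas the project's README states that the plugin only works with tensorflow-cpu, not tensorflow or tensorflow-gpu. The following is a minimal sketch of how an environment could be checked for that mismatch using only the standard library. The helper names installed_version and diagnose are hypothetical, and the check is an assumption about what is worth verifying first, not an official diagnostic.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Distribution names taken from the README's statement that the plugin works
# with tensorflow-cpu, not tensorflow or tensorflow-gpu.
CONFLICTING = ("tensorflow", "tensorflow-gpu")
REQUIRED = "tensorflow-cpu"


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def diagnose(versions: Dict[str, Optional[str]]) -> List[str]:
    """Turn a name -> version mapping into human-readable findings."""
    findings = []
    for name in CONFLICTING:
        if versions.get(name):
            findings.append(
                f"{name} {versions[name]} is installed; the README asks for {REQUIRED} instead"
            )
    if not versions.get(REQUIRED):
        findings.append(f"{REQUIRED} is not installed")
    return findings


# Inspect the current environment and report anything suspicious.
names = CONFLICTING + (REQUIRED,)
current = {name: installed_version(name) for name in names}
for finding in diagnose(current) or ["no conflicting TensorFlow packages found"]:
    print(finding)
```

Run inside the affected virtual environment, this would flag an installed tensorflow package such as the 2.9.1 and 2.8.0 versions reported above. It does not prove that the package mismatch is the cause of either error.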

High usage of CPU viewed from Task Manager

Hello, I am new to DirectML and am trying to use it for the Automatic Speech Recognition example from Keras (https://keras.io/examples/audio/transformer_asr/). Task Manager shows that CPU usage is high while GPU usage is pretty low.

My Specs:
Windows 11 22H2 OS build 22621.1848
AMD Ryzen 5 7600X
Radeon RX6700XT
tensorflow-directml-plugin 0.3.0.dev221212
tensorflow-cpu 2.10.0
Keras 2.10.0
16GB RAM (is this needed?)

Is this an error, or is it intended? The code is basically the same as in the link above. I only tweaked the part that handles file names and extensions to account for how Windows uses backslashes in paths, but the network is the same.

tfdml_plugin.dll not found

tensorflow.python.framework.errors_impl.NotFoundError: C:\ProgramData\Miniconda3\envs\tfdml_plugin\lib\site-packages\tensorflow-plugins\tfdml_plugin.dll not found

After installing the plugin, running any code, even just import tensorflow as tf, fails with an error saying that tfdml_plugin.dll was not found, although the file is present at that path.
What can I do?
