apple / tensorflow_macos Goto Github PK

View Code? Open in Web Editor NEW

3.6K 3.6K 308.0 18 KB

TensorFlow for macOS 11.0+ accelerated using Apple's ML Compute framework.

License: Other

Shell 100.00%

tensorflow_macos's Introduction

You can now leverage Apple’s tensorflow-metal PluggableDevice in TensorFlow v2.5 for accelerated training on Mac GPUs directly with Metal. Learn more here.

Mac-optimized TensorFlow and TensorFlow Addons

INTRODUCTION

This pre-release delivers hardware-accelerated TensorFlow and TensorFlow Addons for macOS 11.0+. Native hardware acceleration is supported on M1 Macs and Intel-based Macs through Apple’s ML Compute framework.

CURRENT RELEASE

0.1-alpha3

SUPPORTED VERSIONS

TensorFlow r2.4rc0
TensorFlow Addons 0.11.2

REQUIREMENTS

macOS 11.0+
Python 3.8 (required to be downloaded from Xcode Command Line Tools for M1 Macs).

INSTALLATION

An archive containing Python packages and an installation script can be downloaded from the releases.

To quickly try this out, copy and paste the following into Terminal:
```
% /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/apple/tensorflow_macos/master/scripts/download_and_install.sh)"
```
This will verify your system, ask you for confirmation, then create a virtual environment with TensorFlow for macOS installed.

Alternatively, download the archive file from the releases. The archive contains an installation script, accelerated versions of TensorFlow, TensorFlow Addons, and needed dependencies.

% curl -fLO https://github.com/apple/tensorflow_macos/releases/download/v0.1alpha2/tensorflow_macos-${VERSION}.tar.gz
% tar xvzf tensorflow_macos-${VERSION}.tar
% cd tensorflow_macos
% ./install_venv.sh --prompt

Installation on Conda

This pre-release version supports installation and testing using the Python from Xcode Command Line Tools. See #153 for more information on installation in a Conda environment.

Notes

For M1 Macs, the following packages are currently unavailable:

SciPy and dependent packages
Server/Client TensorBoard packages

When installing pip packages in a virtual environment, you may need to specify --target as follows:

% pip install --upgrade -t "${VIRTUAL_ENV}/lib/python3.8/site-packages/" PACKAGE_NAME

ISSUES AND FEEDBACK

Please submit feature requests or report issues via GitHub Issues.

ADDITIONAL INFORMATION

Device Selection (Optional)

It is not necessary to make any changes to your existing TensorFlow scripts to use ML Compute as a backend for TensorFlow and TensorFlow Addons.

There is an optional mlcompute.set_mlc_device(device_name='any') API for ML Compute device selection. The default value for device_name is 'any', which means ML Compute will select the best available device on your system, including multiple GPUs on multi-GPU configurations. Other available options are 'cpu' and 'gpu'. Please note that in eager mode, ML Compute will use the CPU. For example, to choose the CPU device, you may do the following:

# Import mlcompute module to use the optional set_mlc_device API for device selection with ML Compute.
from tensorflow.python.compiler.mlcompute import mlcompute

# Select CPU device.
mlcompute.set_mlc_device(device_name='cpu') # Available options are 'cpu', 'gpu', and 'any'.

Unsupported TensorFlow Features

The following TensorFlow features are currently not supported in this fork:

tf.vectorized_map
Higher-order gradients
Jacobian-vector products (aka. forwardprop)

Logs and Debugging

Graph mode

Logging provides more information about what happens when a TensorFlow model is optimized by ML Compute. Turn logging on by setting the environment variable TF_MLC_LOGGING=1 when executing the model script. The following is the list of information that is logged in graph mode:

Device used by ML Compute.
Original TensorFlow graph without ML Compute.
TensorFlow graph after TensorFlow operations have been replaced with ML Compute.
- Look for MLCSubgraphOp nodes in this graph. Each of these nodes replaces a TensorFlow subgraph from the original graph, encapsulating all the operations in the subgraph. This, for example, can be used to determine which operations are being optimized by ML Compute.
Number of subgraphs using ML Compute and how many operations are included in each of these subgraphs.
- Having larger subgraphs that encapsulate big portions of the original graph usually results in better performance from ML Compute. Note that for training, there will usually be at least two MLCSubgraphOp nodes (representing forward and backward/gradient subgraphs).
TensorFlow subgraphs that correspond to each of the ML Compute graphs.

Eager mode

Unlike graph mode, logging in eager mode is controlled by TF_CPP_MIN_VLOG_LEVEL. The following is the list of information that is logged in eager mode:

The buffer pointer and shape of input/output tensor.
The key for associating the tensor’s buffer to built the MLCTraining or MLCInference graph. This key is used to retrieve the graph and run a backward pass or an optimizer update.
The weight tensor format.
Caching statistics, such as insertions and deletions.

Tips for debugging

Larger models being trained on the GPU may use more memory than is available, resulting in paging. If this happens, try decreasing the batch size or the number of layers.
TensorFlow is multi-threaded, which means that different TensorFlow operations, such as MLCSubgraphOp, can execute concurrently. As a result, there may be overlapping logging information. To avoid this during the debugging process, set TensorFlow to execute operators sequentially by setting the number of threads to 1 (see tf.config.threading.set_inter_op_parallelism_threads).
In eager mode, you may disable the conversion of any operation to ML Compute by using TF_DISABLE_MLC_EAGER=“;Op1;Op2;...”. The gradient op may also need to be disabled by modifying the file $PYTHONHOME/site-packages/tensorflow/python/ops/_grad.py (this avoids TensorFlow recompilation).
To initialize allocated memory with a specific value, use TF_MLC_ALLOCATOR_INIT_VALUE=<init-value>.
To disable ML Compute acceleration (e.g. for debugging or results verification), set the environment variable TF_DISABLE_MLC=1.

tensorflow_macos's People

Contributors

Stargazers

Watchers

Forkers

jeanjerome devbox10 geog0521 jackliaoall-tensorflow-related ronnyh marshallho ssghost firmanhadi themorgantown stjordanis pengxieasurion hongyunnchen aiot5g stephonye nguyenducnhaty dim25 tjdev7 am9x hhy5277 lesliesibbs mattmacleod sector84 parkourcx architectureofthings tien-le zeta1999 afilimonov jeffgan99 joshjong cookingcodewithme antoalli siyuliu0329 hwayguo richipreso hpi-clinical-ocr roycezjq kpskylonely liamyy jiaqi-knight yyolk phoitack suvarnak impressostudios monikavila emailandxu maximli seaosrobotics sahwar arvel00 leetesla guillevent fpli-mbr kukupigs zobeirraisi mwidjaja1 dennyglee jayaudaykmar26589 hustonio magikmaker stuti-madaan adbmd fundou badshahmukherjeesas iamgreatwk lengocgiang roberteboone 5917549999 jesselt jasonghent stephanboner ylff ponewor alexbaylis41 c-schel dharai artshevchenko d2works md1993 mitchell2090 mtamburro theriley106 daftano seongl avr248 danieljtait lzf-peter aphougat pan-yangxu andy-96 pooyadavoodi sun1638650145 idea-lover aruis yux keeboo deval520 techietom sam-mc1 chinitaberrio shantinaser

tensorflow_macos's Issues

incompatible cpu-subtype error

I ran the installer script and activated the virtualenv as mentioned at the end of the installation.

At the python interpreter I get the following error. Is there some other package to install to get this working on an M1 Mac?

>>> import tensorflow as tf
Traceback (most recent call last):
  File "/Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: dlopen(/Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so, 6): no suitable image found.  Did find:
	/Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: incompatible cpu-subtype: 0x00000000 in /Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
	/Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: incompatible cpu-subtype: 0x00000000 in /Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 39, in <module>
    from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
  File "/Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 83, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: dlopen(/Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so, 6): no suitable image found.  Did find:
	/Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: incompatible cpu-subtype: 0x00000000 in /Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
	/Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: incompatible cpu-subtype: 0x00000000 in /Users/macuser/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

Tensorflow cannot be installed in Mac M1 because of error ERROR: numpy-1.18.5-cp38-cp38-macosx_11_0_arm64.whl is not a supported wheel on this platform.

I am trying to install tensor flow in the new MacBook Pro M1 but it gives the error
ERROR: numpy-1.18.5-cp38-cp38-macosx_11_0_arm64.whl is not a supported wheel on this platform.

Tensorflow Keras running extremely slow on GPU in M1 chip

I used the code attached here for the benchmark:

When specifying to use GPU with the following code, the performance is extremely slow (about 7 minutes per epoch):

from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')
tf.config.run_functions_eagerly(False)

Once use cpu without specifying the above code, the performance is 31 seconds per epoch.

Why the GPU performance is much lower compared to use CPU

The kernel appears to have died. It will restart automatically.

After modifying the name of the installation package, I successfully created a virtual environment with the official script and installed tensorflow, but when I use jupyter to run the code, it still fails to run。The kernel appears to have died when i training

Any benchmark available?

Hi Team,

Awesome job!!!! ❤️

Is it possible to show some graphs for a benchmark with and without the plugin?

M1 wrong architecture after install script

Im getting an error in my terminal when trying to run.

heres my trace:

/Users/tomjefferis/tensorflow_macos_venv/bin/python /Users/tomjefferis/PycharmProjects/pythonProject/test.py
Traceback (most recent call last):
File "/Users/tomjefferis/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in
from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: dlopen(/Users/tomjefferis/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so, 6): no suitable image found. Did find:
/Users/tomjefferis/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: mach-o, but wrong architecture
/Users/tomjefferis/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: mach-o, but wrong architecture

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/Users/tomjefferis/PycharmProjects/pythonProject/test.py", line 2, in
from tensorflow.python.compiler.mlcompute import mlcompute
File "/Users/tomjefferis/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/init.py", line 41, in
from tensorflow.python.tools import module_util as _module_util
File "/Users/tomjefferis/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/init.py", line 39, in
from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
File "/Users/tomjefferis/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 83, in
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/Users/tomjefferis/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in
from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: dlopen(/Users/tomjefferis/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so, 6): no suitable image found. Did find:
/Users/tomjefferis/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: mach-o, but wrong architecture
/Users/tomjefferis/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: mach-o, but wrong architecture

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

Process finished with exit code 1

ERROR: numpy-1.18.5-cp38-cp38-macosx_11_0_x86_64.whl is not a supported wheel on this platform.

I was trying to install tensorflow macos by running the script according to the instructions:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/apple/tensorflow_macos/master/scripts/download_and_install.sh)"

However, I got an error as shown below:

Installing and upgrading base packages.
Requirement already satisfied: pip in ./tensorflow_macos_venv/lib/python3.8/site-packages (20.3)
Requirement already satisfied: wheel in ./tensorflow_macos_venv/lib/python3.8/site-packages (0.35.1)
Requirement already satisfied: setuptools in ./tensorflow_macos_venv/lib/python3.8/site-packages (50.3.2)
Requirement already satisfied: cached-property in ./tensorflow_macos_venv/lib/python3.8/site-packages (1.5.2)
Requirement already satisfied: six in ./tensorflow_macos_venv/lib/python3.8/site-packages (1.15.0)

Installing bundled binary dependencies.
ERROR: numpy-1.18.5-cp38-cp38-macosx_11_0_x86_64.whl is not a supported wheel on this platform.

Any idea how to fix it?

Support for Swift for TensorFlow

I'm ecstatic that GPU-accelerated machine learning is now possible with TensorFlow on Mac. It largely renders my own SwiftML project unnecessary, but there's still an unanswered question: does this TensorFlow fork support Swift for TensorFlow?

As far as I'm aware, Swift for TensorFlow depends on the shared core framework that underpins all of the various TensorFlow distributions, and it therefore supports whatever hardware that that core framework supports. Could this fork that Apple created be used as a drop-in replacement for the standard core framework to enable GPU acceleration for Swift for TensorFlow on Mac? If not, then could it be used in some other manner to achieve the same result?

Terminating app due to uncaught exception 'NSInvalidArgumentException'

Hi,

I'm trying to use tensorflow for mac to train my model using the tensorforce library.
However, it crashes at model initialization with this stacktrace:

2020-12-03 19:06:42.765 Python[26957:1076199] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[__NSArrayM insertObject:atIndex:]: object cannot be nil'
*** First throw call stack:
(
        0   CoreFoundation                      0x0000000185689320 __exceptionPreprocess + 240
        1   libobjc.A.dylib                     0x00000001853b7c04 objc_exception_throw + 60
        2   CoreFoundation                      0x0000000185750064 -[__NSCFString characterAtIndex:].cold.1 + 0
        3   CoreFoundation                      0x000000018574d5b4 -[__NSArrayM insertObject:atIndex:].cold.2 + 0
        4   CoreFoundation                      0x00000001855a8274 -[__NSArrayM insertObject:atIndex:] + 1020
        5   _pywrap_tensorflow_internal.so      0x0000000106f552e0 _ZN10tensorflow9mlcompute5eager15MLCGraphBuilder3AddINSt3__16__bindIMNS1_10MLCEagerOpIfEEKFNS_6StatusEPNS_15OpKernelContextEPKNS_6TensorEPU8__strongP9MLCTensorEJPNS1_14MLCEagerMatMulIfEERSA_RKNS4_12placeholders4__phILi1EEERKNSP_ILi2EEEEEEEES8_P8MLCLayerNS4_6vectorISD_NS4_9allocatorISD_EEEESD_T_ + 340
        6   _pywrap_tensorflow_internal.so      0x0000000106f54754 _ZN10tensorflow9mlcompute5eager15MLCGraphBuilder22RunGraphWithSingleNodeINSt3__16__bindIMNS1_14MLCEagerMatMulIfEEKFP22MLCFullyConnectedLayerPNS_15OpKernelContextERS2_EJPS7_RSB_RKNS4_12placeholders4__phILi1EEEEEEPKNS_6TensorENS4_6vectorIPvNS4_9allocatorISR_EEEENS5_IMNS1_10MLCEagerOpIfEEKFNS_6StatusESB_SP_PU8__strongP9MLCTensorEJSF_SG_SL_RKNSI_ILi2EEEEEEEESX_jRT_NSQ_ISP_NSS_ISP_EEEESP_NSQ_IPSN_NSS_IS1C_EEEET0_T1_P9MLCDeviceSB_T2_NS4_12basic_stringIcNS4_11char_traitsIcEENSS_IcEEEE + 1664
        7   _pywrap_tensorflow_internal.so      0x0000000106f53894 _ZN10tensorflow9mlcompute5eager15MLCGraphBuilder22RunGraphWithSingleNodeINSt3__16__bindIMNS1_14MLCEagerMatMulIfEEKFP22MLCFullyConnectedLayerPNS_15OpKernelContextERS2_EJPS7_RSB_RKNS4_12placeholders4__phILi1EEEEEEPKNS_6TensorENS4_6vectorIPvNS4_9allocatorISR_EEEENS5_IMNS1_10MLCEagerOpIfEEKFNS_6StatusESB_SP_PU8__strongP9MLCTensorEJSF_SG_SL_RKNSI_ILi2EEEEEEEESX_jRT_SP_SP_PSN_T0_T1_P9MLCDeviceSB_T2_NS4_12basic_stringIcNS4_11char_traitsIcEENSS_IcEEEE + 256
        8   _pywrap_tensorflow_internal.so      0x0000000106f51d9c _ZN10tensorflow9mlcompute5eager14MLCEagerMatMulIfE7ComputeEPNS_15OpKernelContextE + 1620
        9   libtensorflow_framework.2.dylib     0x00000001194d75ac _ZN10tensorflow12_GLOBAL__N_113ExecutorStateINS_21SimplePropagatorStateEE7ProcessENS2_10TaggedNodeEx + 2772
        10  libtensorflow_framework.2.dylib     0x000000011954a4b4 _ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi + 552
        11  libtensorflow_framework.2.dylib     0x000000011954a18c _ZZN10tensorflow6thread16EigenEnvironment12CreateThreadENSt3__18functionIFvvEEEENKUlvE_clEv + 80
        12  libtensorflow_framework.2.dylib     0x000000011953ba74 _ZN10tensorflow12_GLOBAL__N_17PThread8ThreadFnEPv + 104
        13  libsystem_pthread.dylib             0x000000018551106c _pthread_start + 320
        14  libsystem_pthread.dylib             0x000000018550bda0 thread_start + 8
)
libc++abi.dylib: terminating with uncaught exception of type NSException

It crashes in tensorflow/python/eager/def_function.py line 724.
Unfortunately the example is way too complex to put in a github ticket.

Do you haave any hint at what can actually be null ?

Thanks!

Slow GPU performance on Radeon Pro 460

I have tested this fork on a simple MNIST example: https://github.com/tensorflow/datasets/blob/master/docs/keras_example.ipynb
By default the model uses CPU and takes 3ms per step.
If I change to GPU using mlcompute.set_mlc_device(device_name=‘gpu’) each step takes around 12ms.

I am running MacOS Big Sur Version 11.0.1 on the Macbook Pro 2016 15-inch with a Radeon Pro 460 4 GB.
Is it expected that the GPU will run much slower than the CPU?

Almost 4X slower than an i7 chip

I developed an "Embedding + MLP" neural network and read zipped TFRecords using tf.data.TFRecordDataset, the time consumed per step is about 75ms compared to my i7 chip MBP which only takes 22ms per step, it's almost 4X slower.

Exception during training on mac intel (see trace)

Here's my trace:

2020-11-20 11:35:11.019757: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-20 11:35:11.669 Python[62946:1516017] *** Terminating app due to uncaught exception 'NSRangeException', reason: '*** -[__NSArrayM objectAtIndexedSubscript:]: index 2 beyond bounds [0 .. 1]'
*** First throw call stack:
(
0 CoreFoundation 0x00007fff204856af __exceptionPreprocess + 242
1 libobjc.A.dylib 0x00007fff201bd3c9 objc_exception_throw + 48
2 CoreFoundation 0x00007fff20539a9a -[__NSCFString characterAtIndex:].cold.1 + 0
3 CoreFoundation 0x00007fff203f8e31 -[__NSArrayM objectAtIndexedSubscript:] + 169
4 MLCompute 0x00007fff2a03aa9b -[MLCTrainingGraph executeGradientFromLayerIndex:batchSize:] + 397
5 MLCompute 0x00007fff2a03f547 -[MLCTrainingGraph executeGradientWithBatchSize:options:outputsData:completionHandler:] + 1000
6 _pywrap_tensorflow_internal.so 0x000000012112f325 _ZN10tensorflow9mlcompute3ops13MLCSubgraphOp23ExecuteMLCTrainingGraphEPNS_15OpKernelContextEPNS1_27MLCSubgraphExecutionContextEj + 533
7 _pywrap_tensorflow_internal.so 0x000000012112e591 _ZN10tensorflow9mlcompute3ops13MLCSubgraphOp20ProcessMLCSubgraphOpEPNS_15OpKernelContextEPNS1_27MLCSubgraphExecutionContextE + 1205
8 _pywrap_tensorflow_internal.so 0x000000012113163a _ZN10tensorflow9mlcompute3ops13MLCSubgraphOp7ComputeEPNS_15OpKernelContextE + 1208
9 libtensorflow_framework.2.dylib 0x0000000135f6821d _ZN10tensorflow12_GLOBAL__N_113ExecutorStateINS_21SimplePropagatorStateEE7ProcessENS2_10TaggedNodeEx + 3765
10 libtensorflow_framework.2.dylib 0x0000000135fe5f3b _ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi + 605
11 libtensorflow_framework.2.dylib 0x0000000135fe5be2 _ZZN10tensorflow6thread16EigenEnvironment12CreateThreadENSt3__18functionIFvvEEEENKUlvE_clEv + 66
12 libtensorflow_framework.2.dylib 0x0000000135fd6eb1 _ZN10tensorflow12_GLOBAL__N_17PThread8ThreadFnEPv + 97
13 libsystem_pthread.dylib 0x00007fff20313950 _pthread_start + 224
14 libsystem_pthread.dylib 0x00007fff2030f47b thread_start + 15
)
libc++abi.dylib: terminating with uncaught exception of type NSException
/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown

Reproducing the ResNet50 benchmark

Running TensorFlow official model garden ResNet50-v1.5 gives the following error on an M1 Mac mini:

$ PYTHONPATH=$(PWD) python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --use_synthetic_data
WARNING:tensorflow:Some requested devices in `tf.distribute.Strategy` are not visible to TensorFlow: /job:localhost/replica:0/task:0/device:GPU:0
W1119 18:36:12.865904 4373019968 cross_device_ops.py:1316] Some requested devices in `tf.distribute.Strategy` are not visible to TensorFlow: /job:localhost/replica:0/task:0/device:GPU:0
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I1119 18:36:12.866842 4373019968 mirrored_strategy.py:350] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I1119 18:36:12.867031 4373019968 resnet_ctl_imagenet_main.py:138] Training 1 epochs, each epoch has 40036 steps, total steps: 40036; Eval 1563 steps
I1119 18:36:13.539694 4373019968 controller.py:365] restoring or initializing model...
restoring or initializing model...
I1119 18:36:13.539777 4373019968 controller.py:371] initialized model.
initialized model.
I1119 18:36:13.540033 4373019968 controller.py:214] train | step:      0 | training until step 40036...
train | step:      0 | training until step 40036...
2020-11-19 18:36:13.597005: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-19 18:36:13.597137: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
WARNING:tensorflow:From /Users/byronyi/Develop/models/official/staging/training/grad_utils.py:73: Hints.__new__ (from tensorflow.python.distribute.collective_util) is deprecated and will be removed in a future version.
Instructions for updating:
use distribute.experimental.CommunicationOptions instead
W1119 18:36:14.379299 6121844736 deprecation.py:333] From /Users/byronyi/Develop/models/official/staging/training/grad_utils.py:73: Hints.__new__ (from tensorflow.python.distribute.collective_util) is deprecated and will be removed in a future version.
Instructions for updating:
use distribute.experimental.CommunicationOptions instead
2020-11-19 18:36:20.475765: I tensorflow/compiler/tf2mlcompute/kernels/mlc_subgraph_op.cc:545] Compute: Failed in processing TF graph while/body/_1/gradient_tape/while/truediv/MLCSubgraphOp_0_58 with error: Internal: PerformGradientPassNodeRoutine: Failed to find forward-pass output for node: while/body/_1/while/conv1/kernel/Regularizer/mul (error will be reported 5 times unless TF_MLC_LOGGING=1).
2020-11-19 18:36:20.819 python3[11839:1137514] *** Terminating app due to uncaught exception 'NSRangeException', reason: '*** -[__NSArrayM objectAtIndexedSubscript:]: index 3 beyond bounds [0 .. 2]'
*** First throw call stack:
(
	0   CoreFoundation                      0x000000018af45320 __exceptionPreprocess + 240
	1   libobjc.A.dylib                     0x000000018ac73c04 objc_exception_throw + 60
	2   CoreFoundation                      0x000000018b00c064 -[__NSCFString characterAtIndex:].cold.1 + 0
	3   CoreFoundation                      0x000000018aeb3a1c -[__NSCFString hasSuffix:] + 0
	4   MLCompute                           0x0000000193fab250 -[MLCTrainingGraph resultGradientTensorToUseByExecuteGradientForLayer:sourceIndex:incrementIntermediateIndex:] + 540
	5   MLCompute                           0x0000000193fac598 -[MLCTrainingGraph allocateGradientTensorsForLayersInGraph:] + 660
	6   MLCompute                           0x0000000193fad004 -[MLCTrainingGraph compileAndAllocateGradientTensorsForGraph:] + 140
	7   MLCompute                           0x0000000193fb5708 -[MLCTrainingGraph executeGradientWithBatchSize:options:outputsData:completionHandler:] + 960
	8   _pywrap_tensorflow_internal.so      0x0000000131df6f90 _ZN10tensorflow9mlcompute3ops13MLCSubgraphOp23ExecuteMLCTrainingGraphEPNS_15OpKernelContextEPNS1_27MLCSubgraphExecutionContextEj + 436
	9   _pywrap_tensorflow_internal.so      0x0000000131df6360 _ZN10tensorflow9mlcompute3ops13MLCSubgraphOp20ProcessMLCSubgraphOpEPNS_15OpKernelContextEPNS1_27MLCSubgraphExecutionContextE + 1028
	10  _pywrap_tensorflow_internal.so      0x0000000131df8b4c _ZN10tensorflow9mlcompute3ops13MLCSubgraphOp7ComputeEPNS_15OpKernelContextE + 1084
	11  libtensorflow_framework.2.dylib     0x0000000105b19384 _ZN10tensorflow12_GLOBAL__N_113ExecutorStateINS_15PropagatorStateEE7ProcessENS2_10TaggedNodeEx + 2868
	12  libtensorflow_framework.2.dylib     0x0000000105b1a650 _ZNSt3__110__function6__funcIZN10tensorflow12_GLOBAL__N_113ExecutorStateINS2_15PropagatorStateEE7RunTaskINS_6__bindIMS6_FvNS5_10TaggedNodeExEJPS6_RKS9_RxEEEEEvOT_EUlvE_NS_9allocatorISJ_EEFvvEEclEv + 80
	13  libtensorflow_framework.2.dylib     0x0000000105b924b4 _ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi + 552
	14  libtensorflow_framework.2.dylib     0x0000000105b9218c _ZZN10tensorflow6thread16EigenEnvironment12CreateThreadENSt3__18functionIFvvEEEENKUlvE_clEv + 80
	15  libtensorflow_framework.2.dylib     0x0000000105b83a74 _ZN10tensorflow12_GLOBAL__N_17PThread8ThreadFnEPv + 104
	16  libsystem_pthread.dylib             0x000000018adcd06c _pthread_start + 320
	17  libsystem_pthread.dylib             0x000000018adc7da0 thread_start + 8
)
libc++abi.dylib: terminating with uncaught exception of type NSException
Fatal Python error: Aborted

Thread 0x0000000104a6fd40 (most recent call first):
  File "/Users/byronyi/.virtualenv/tf/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59 in quick_execute
  File "/Users/byronyi/.virtualenv/tf/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 555 in call
  File "/Users/byronyi/.virtualenv/tf/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1918 in _call_flat
  File "/Users/byronyi/.virtualenv/tf/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2942 in __call__
  File "/Users/byronyi/.virtualenv/tf/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 888 in _call
  File "/Users/byronyi/.virtualenv/tf/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 828 in __call__
  File "/Users/byronyi/Develop/models/orbit/standard_runner.py", line 141 in train
  File "/Users/byronyi/Develop/models/orbit/controller.py", line 413 in _train_n_steps
  File "/Users/byronyi/Develop/models/orbit/controller.py", line 218 in train
  File "/Users/byronyi/Develop/models/orbit/controller.py", line 306 in train_and_evaluate
  File "official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py", line 176 in run
  File "official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py", line 190 in main
  File "/Users/byronyi/.virtualenv/tf/lib/python3.8/site-packages/absl/app.py", line 251 in _run_main
  File "/Users/byronyi/.virtualenv/tf/lib/python3.8/site-packages/absl/app.py", line 303 in run
  File "official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py", line 197 in <module>
Abort trap: 6

Model training on cpu (Intel) throws seg fault

First tests using this fork, running model training against Cifar10 dataset for benchmark.
But during the first epoch I encounter:

Total params: 309,290
Trainable params: 308,394
Non-trainable params: 896
_________________________________________________________________
Epoch 1/100
5000/5000 [==============================] - ETA: 0s - loss: 2.3416 - accuracy: 0.3065zsh: segmentation fault  python cifar10_cnn.py
multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown

Explicitly setting to run on the GPU however works. But much slower (Intel integrated graphics)

from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')
tensorflow.config.run_functions_eagerly(False)

Python 3.8.6 from Mac Ports if it makes any difference.

Preliminary benchmark on LeNet CNN trained on MNIST

I run some preliminary test with a simple LeNet CNN Model trained on MNIST dataset.

I tested on both CPU and GPU on a Mac Mini M1 and Intel based MacBook Pro (i7 - 6 core - Radeon 5300M).

I forced TF to disable eager execution and tried with different batch parameters trying to optimize GPU memory usage and also trying to measure/understand also memory marshaling cost. More or less I got the same distribution on the different tests. Probably a more complex model may provide better understanding on these tradeoff.

I've also compared results of same model trained with Colab GPU and with CoreML. CoreML is trained as all other test on all layers and I've tested it on M1 and iPhone 11. CoreML as expected as very similar performance to this accelerator based on MLCompute as they both use the same Metal/BNNS api.

I'll test tomorrow on a well know GPU and update results.

For my understanding these results are completely inline with my expectations but I would love to see other point of view.

IDE problem

I have already successfully installed and it worked out in terminal.But when I use pycharm and choose the virtualenv to be the intepreter,the tensorflow still can't be imported for the architecture reason.Is the problem of pycharm or something need to be setted? Any other IDE to choose?

Using GPU sometimes crashes Jupyter Notebook kernel

I tried running a few very simple Keras networks using the GPU and, sometimes, after computations on the GPU were completed, the ipython kernel would just die, without giving any explanation.

This applies on both when I trained the net and when I tried to predict on it.

I am using a 2019 Macbook Pro with the 555x.
I will try to give more details if I can, but this is all I have got as of now.

The installation seems to expect Python 3.8 and fails when 3.9 is installed

See below:

~ % /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/apple/tensorflow_macos/master/scripts/download_and_install.sh)"

Installation script for pre-release tensorflow_macos 0.1alpha0. Please visit https://github.com/apple/tensorflow_macos
for instructions and license information.

This script will download tensorflow_macos 0.1alpha0 and needed binary dependencies, then install them into a new
or existing Python 3.8 virtual enviornoment.
Continue [y/N]? y

Downloading installer.
/var/folders/_j/j0707trd5p5d1skxsr5hkl600000gn/T/tmp.0RRbRquc ~
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 659 100 659 0 0 1387 0 --:--:-- --:--:-- --:--:-- 1384
100 316M 100 316M 0 0 1517k 0 0:03:33 0:03:33 --:--:-- 695k
Extracting installer.
Path to new or existing virtual environment [default: /Users/xxxxxxxxxxxxxxx/tensorflow_macos_venv/]:
##############################################################

ERROR: Error retrieving python version, or python executable /Library/Frameworks/Python.framework/Versions/3.9/bin/python3 not version 3.8. Please specify a Python 3.8 executable with the --python option.

Error running installation script with default options. Please fix the above errors and proceed by running

/var/folders/_j/j0707trd5p5d1skxsr5hkl600000gn/T/tmp.0RRbRquc/tensorflow_macos/install_venv.sh --prompt

Is this repo taking advantage of the Neural Engine at all?

It seems it's not clear to anyone what the Neural Engine is for exactly and if this repo makes use of it.

Two questions please:

Is the Neural Engine meant only for predictions or can it be used for training too?
Is this library making use of it at all? it seems it's only using the GPU.

Reproducing benchmarks for ResNet, MobileNet, etc using sample code

It would be extremely useful and instructive to anyone interested in this repository to have code that can reproduce some benchmark performance of accelerated Tensorflow on M1 Macs and older Intel Macs. Please include example scripts illustrating how to reproduce the benchmarks shown in the blog post.

Seg Fault on CPU Training Run

Trying to run a simple training job based on this:

import tensorflow as tf

from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))

model.summary()

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10, 
                    validation_data=(test_images, test_labels))

Output:

2020-11-23 16:11:25.588885: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 1024)              0         
_________________________________________________________________
dense (Dense)                (None, 64)                65600     
_________________________________________________________________
dense_1 (Dense)              (None, 10)                650       
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
2020-11-23 16:11:26.691232: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/10
1562/1563 [============================>.] - ETA: 0s - loss: 1.7913 - accuracy: 0.3383[1]    54903 segmentation fault  python train.py

Running macOS 11.0.1 on MacBook Pro, 15 inch, 2019.
2.3 GHz 8-Core Intel Core i9
16 GB 2400 MHz DDR4
Radeon Pro 560X 4 GB

Python version: 3.8.3

requirements.txt

Sometimes makes it to the second epoch, and pretty quick too.

Strangely, removing validation_data=(test_images, test_labels) from the fit call runs successfully:

2020-11-23 16:22:10.595269: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 1024)              0         
_________________________________________________________________
dense (Dense)                (None, 64)                65600     
_________________________________________________________________
dense_1 (Dense)              (None, 10)                650       
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
2020-11-23 16:22:11.791004: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/10
1563/1563 [==============================] - 20s 12ms/step - loss: 1.7672 - accuracy: 0.3439
Epoch 2/10
1563/1563 [==============================] - 22s 14ms/step - loss: 1.1976 - accuracy: 0.5777
Epoch 3/10
1563/1563 [==============================] - 24s 16ms/step - loss: 1.0230 - accuracy: 0.6417
Epoch 4/10
1563/1563 [==============================] - 24s 15ms/step - loss: 0.9139 - accuracy: 0.6806
Epoch 5/10
1563/1563 [==============================] - 24s 15ms/step - loss: 0.8408 - accuracy: 0.7033
Epoch 6/10
1563/1563 [==============================] - 24s 15ms/step - loss: 0.7783 - accuracy: 0.7243
Epoch 7/10
1563/1563 [==============================] - 23s 15ms/step - loss: 0.7413 - accuracy: 0.7393
Epoch 8/10
1563/1563 [==============================] - 24s 15ms/step - loss: 0.7012 - accuracy: 0.7536
Epoch 9/10
1563/1563 [==============================] - 24s 15ms/step - loss: 0.6537 - accuracy: 0.7675
Epoch 10/10
1563/1563 [==============================] - 23s 15ms/step - loss: 0.6149 - accuracy: 0.7834

pandas and other packages for M1

I know this is not strictly related to this repo but I wonder if anyone was able to install/compile pandas and other packages on M1 ?

I was able of course to install these on Intel platform within the virtual env created by this repo but on M1 it's failing the compilation, at least for different packages that I tried.

can't resolve `import tensorflow` in PyCharm

I have configured tensorflow_macos, and the test is successful locally, but I cannot import tensorflow when I use PyCharm;
I refer to related articles: https://youtrack.jetbrains.com/issue/PY-29580?_ga=2.84740368.88237636.1606896657-1728153029.1606494024 but it doesn't solve my problem. When I output'LD_LIBRARY_PATH' on the console, I didn't get any feedback.

ERROR: Failed building wheel for wrapt

Building wheels for collected packages: wrapt
Building wheel for wrapt (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /Users/kahingleung/tensorflow_macos_venv/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/47/db7p_9vx019c0hsxzt6vjxl00000gn/T/pip-install-5q7ejel1/wrapt/setup.py'"'"'; file='"'"'/private/var/folders/47/db7p_9vx019c0hsxzt6vjxl00000gn/T/pip-install-5q7ejel1/wrapt/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/47/db7p_9vx019c0hsxzt6vjxl00000gn/T/pip-wheel-w0brsxw3
cwd: /private/var/folders/47/db7p_9vx019c0hsxzt6vjxl00000gn/T/pip-install-5q7ejel1/wrapt/
Complete output (56 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.8
creating build/lib.macosx-10.9-x86_64-3.8/wrapt
copying src/wrapt/importer.py -> build/lib.macosx-10.9-x86_64-3.8/wrapt
copying src/wrapt/init.py -> build/lib.macosx-10.9-x86_64-3.8/wrapt
copying src/wrapt/wrappers.py -> build/lib.macosx-10.9-x86_64-3.8/wrapt
copying src/wrapt/decorators.py -> build/lib.macosx-10.9-x86_64-3.8/wrapt
running build_ext
building 'wrapt._wrappers' extension
creating build/temp.macosx-10.9-x86_64-3.8
creating build/temp.macosx-10.9-x86_64-3.8/src
creating build/temp.macosx-10.9-x86_64-3.8/src/wrapt
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch x86_64 -g -I/Users/kahingleung/tensorflow_macos_venv/include -I/Library/Frameworks/Python.framework/Versions/3.8/include/python3.8 -c src/wrapt/_wrappers.c -o build/temp.macosx-10.9-x86_64-3.8/src/wrapt/_wrappers.o
gcc -bundle -undefined dynamic_lookup -arch x86_64 -g build/temp.macosx-10.9-x86_64-3.8/src/wrapt/_wrappers.o -o build/lib.macosx-10.9-x86_64-3.8/wrapt/_wrappers.cpython-38-darwin.so
installing to build/bdist.macosx-10.9-x86_64/wheel
running install
running install_lib
creating build/bdist.macosx-10.9-x86_64
creating build/bdist.macosx-10.9-x86_64/wheel
creating build/bdist.macosx-10.9-x86_64/wheel/wrapt
copying build/lib.macosx-10.9-x86_64-3.8/wrapt/_wrappers.cpython-38-darwin.so -> build/bdist.macosx-10.9-x86_64/wheel/wrapt
copying build/lib.macosx-10.9-x86_64-3.8/wrapt/importer.py -> build/bdist.macosx-10.9-x86_64/wheel/wrapt
copying build/lib.macosx-10.9-x86_64-3.8/wrapt/init.py -> build/bdist.macosx-10.9-x86_64/wheel/wrapt
copying build/lib.macosx-10.9-x86_64-3.8/wrapt/wrappers.py -> build/bdist.macosx-10.9-x86_64/wheel/wrapt
copying build/lib.macosx-10.9-x86_64-3.8/wrapt/decorators.py -> build/bdist.macosx-10.9-x86_64/wheel/wrapt
running install_egg_info
running egg_info
creating src/wrapt.egg-info
writing src/wrapt.egg-info/PKG-INFO
writing dependency_links to src/wrapt.egg-info/dependency_links.txt
writing top-level names to src/wrapt.egg-info/top_level.txt
writing manifest file 'src/wrapt.egg-info/SOURCES.txt'
reading manifest file 'src/wrapt.egg-info/SOURCES.txt'
writing manifest file 'src/wrapt.egg-info/SOURCES.txt'
Copying src/wrapt.egg-info to build/bdist.macosx-10.9-x86_64/wheel/wrapt-1.12.1-py3.8.egg-info
running install_scripts
Traceback (most recent call last):
File "", line 1, in
File "/private/var/folders/47/db7p_9vx019c0hsxzt6vjxl00000gn/T/pip-install-5q7ejel1/wrapt/setup.py", line 102, in
run_setup(with_extensions=True)
File "/private/var/folders/47/db7p_9vx019c0hsxzt6vjxl00000gn/T/pip-install-5q7ejel1/wrapt/setup.py", line 72, in run_setup
setup(**setup_kwargs_tmp)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/Users/kahingleung/tensorflow_macos_venv/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 328, in run
impl_tag, abi_tag, plat_tag = self.get_tag()
File "/Users/kahingleung/tensorflow_macos_venv/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 278, in get_tag
assert tag in supported_tags, "would build wheel with unsupported tag {}".format(tag)
AssertionError: would build wheel with unsupported tag ('cp38', 'cp38', 'macosx_11_0_x86_64')

ERROR: Failed building wheel for wrapt

Any more steps necessary beyond the README instructions?

Hi all,
First off, congratulations to the developers of this long-awaited interface between tensorflow and the Mac's GPU!
Second, I followed all instructions to the letter (including default location for venv), and still the code is not engaging the GPU. Is there any additional steps that must be taken from the README instructions to start putting the graphics card to use? Please see below the result I obtained. Thanks so much in advance. I'm using an early 2020 16' MBP with AMD Radeon Pro 5500M.

import tensorflow as tf
tf.config.list_physical_devices('GPU')
Out[1]: []

tf.test.gpu_device_name()
Out[2]: ''

Request for TF1.15 support

Besides me, I think there are quite a lot of engineers and researchers who prefer to use TF1.X which is more elegant. It's easier to define a custom NN with Tensor, Variable, Operation, and Graph. And with this manner, I'm very clear about the formula behind the codes. However,

The TF2.X hides too many things, I have to go through the source code to make sure my codes will work as I expected.
The TF2.X doesn't support all the features supported by TF1.X, e.g. the layers.Embedding doesn't support sp_weights, it's impossible to export a model which has sparse tensors in the inputs
Using TF2.X I almost need to write 1.5X codes than TF1.X

Tensorflow 2.4.0-rc0 not compatible with coremltools

Very excited for this new package. Base package installed using python 3.8.6 executable in ~/.pyenv/versions/3.8.6/bin/python

I successfully tested the package with the tensorflow tutorial found here:
https://www.tensorflow.org/tutorials/keras/classification

Innstalling coremltools throws several warnings but does install.

Following a coremltools quickstart conversion (link below) leads to failure
https://coremltools.readme.io/docs/introductory-quickstart

Setup:
macOS 11.0.1 BigSur
python 3.8.6 which was installed with:

env PYTHON_CONFIGURE_OPS="--enable-framework" CFLAGS="-I$(brew --prefix openssl)/include -I$(brew --prefix readline)/include -I$(xcrun --show-sdk-path)/usr/include" LDFLAGS="-L$(brew --prefix openssl)/lib -L$(brew --prefix readline)/lib -L$(xcrun --show-sdk-path)/usr/lib" pyenv install 3.8.6

Pytorch on MacOS with M1 chip ?

As the title suggests, I really appreciate that Tensorflow now has MacOs/M1 support, but is there any plan for a similar Pytorch release as well ?

Thank you.

Wrong Architecture Error on M1

When trying to run a script which imports Tensorflow on an M1 Macbook Air I get:

Traceback (most recent call last):
  File "/Users/Matthew/Documents/mac-benchmark/deeplearn_tf.py", line 7, in <module>
    import tensorflow as tf
  File "/Users/Matthew/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/Users/Matthew/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 39, in <module>
    from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
  File "/Users/Matthew/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 83, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/Users/Matthew/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: dlopen(/Users/Matthew/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so, 6): no suitable image found.  Did find:
	/Users/Matthew/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: mach-o, but wrong architecture
	/Users/Matthew/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: mach-o, but wrong architecture


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

My sw_vers say:
ProductName: macOS
ProductVersion: 11.0
BuildVersion: 20A2411

Kernel crashes anytime !

I installed tensorflow in my new Mac mini M1.
I followed instructions to make new environnement...
But with Jupyter when I try "import tensorflow", I have a kernel crash systematically !
Please help me.

Thanks.

Benchmark: CNN proposal

The following code implements the original @ylecun LeCun's CNN architecture., with Dropout comment out due to an issue.

import tensorflow.compat.v2 as tf
import tensorflow_datasets as tfds

tf.enable_v2_behavior()

from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()

from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')


(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

batch_size = 128

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(batch_size)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)


ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(batch_size)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)


model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(32, kernel_size=(3, 3),
                 activation='relu'),
  tf.keras.layers.Conv2D(64, kernel_size=(3, 3),
                 activation='relu'),
  tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
#   tf.keras.layers.Dropout(0.25),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
#   tf.keras.layers.Dropout(0.5),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy'],
)

model.fit(
    ds_train,
    epochs=12,
    validation_data=ds_test,
)

packages required to run:

pip install tensorflow_datasets

Incorrect release steps to install.

After downloading and unpacking the archive, run /bin/bash ./tensorflow_macos/install_venv.sh --help to see options for creating a new virtual environment with these packages installed or for installing them into an existing environment.

What it should say is

After downloading and unpacking the archive, run /bin/bash ./tensorflow_macos/scripts/install_venv.sh --help to see options for creating a new virtual environment with these packages installed or for installing them into an existing environment.

training on GPU is eating up all memory until even the ssd is full

Hello,
I am using this code to train a vgg like conv net on MNIST:

import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input, Flatten, Activation, Conv2D, MaxPool2D
from tensorflow.keras.layers.experimental.preprocessing import Resizing
from tensorflow.keras.preprocessing.image import ImageDataGenerator

from tensorflow.python.compiler.mlcompute import mlcompute

mlcompute.set_mlc_device(device_name='gpu')

seed = 4242

num_classes = 10
input_shape = (28, 28, 1)

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

train_data = ImageDataGenerator(featurewise_std_normalization=True, rotation_range=30, rescale=1.0 / 255)
test_data = ImageDataGenerator(featurewise_std_normalization=True, rescale=1.0 / 255)

x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

batch_size = 256
train_data.fit(x_train, seed=seed)
test_data.fit(x_test, seed=seed)

train_generator = train_data.flow(x_train, y_train, batch_size=batch_size, seed=seed)
test_generator = test_data.flow(x_test, y_test, batch_size=2 * batch_size, seed=seed)

epochs = 10
kernel_size = (3, 3)
stride = (1, 1)
padding = 'same'
pooling = (2, 2)
relu = Activation('relu')


def compile_network():
    model = Sequential([
        Input(shape=input_shape),
        Resizing(32, 32),
        Conv2D(filters=64, kernel_size=kernel_size, strides=stride, padding=padding),
        relu,
        MaxPool2D(pool_size=pooling, strides=2),
        Conv2D(filters=128, kernel_size=kernel_size, strides=stride, padding=padding),
        relu,
        MaxPool2D(pool_size=pooling, strides=2),
        Conv2D(filters=256, kernel_size=kernel_size, strides=stride, padding=padding),
        relu,
        Conv2D(filters=256, kernel_size=kernel_size, strides=stride, padding=padding),
        relu,
        MaxPool2D(pool_size=pooling, strides=2),
        Conv2D(filters=512, kernel_size=kernel_size, strides=stride, padding=padding),
        relu,
        Conv2D(filters=512, kernel_size=kernel_size, strides=stride, padding=padding),
        relu,
        MaxPool2D(pool_size=pooling, strides=2),
        Conv2D(filters=512, kernel_size=kernel_size, strides=stride, padding=padding),
        relu,
        Conv2D(filters=512, kernel_size=kernel_size, strides=stride, padding=padding),
        relu,
        MaxPool2D(pool_size=pooling, strides=2),
        Flatten(),
        Dense(units=10, activation='softmax')
    ])
    model.summary()
    model.compile(loss="categorical_crossentropy",
                  optimizer=keras.optimizers.SGD(learning_rate=1e-2, momentum=0.5, clipnorm=5.0),
                  metrics=["accuracy"])
    return model


model = compile_network()

history = model.fit(train_generator, epochs=epochs, validation_data=test_generator)

I am currently on macOS 11.0.1. The GPU was a Radeon Vega 64.

The memory used by the python process increases to 85GB which is what i have left on my drive, then the process crashes.

Kind Regards

Which machines benefit from using this fork?

Will MacBook Pros with AMD GPUs see a performance improvement?

No detecting GPU's when requesting data on current availability

This isn't really a major problem (i.e. it doesn't impact the core purpose of this software) but it could be a bit of an issue if someones trying to debug in the future. When you run

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('any')))

Using the supplied build, the number returned is 0. Not sure if this on purpose or if I am missing something, just thought it could be something to note.

FP16 support

Feature request: FP16 support.

A variety of Mac GPUs support FP16 but ML Compute does not include support [1]. Reduced memory usage and in many cases increased compute throughput would be welcome.

https://developer.apple.com/documentation/mlcompute/mlcdatatype

Certain kind of layers breaks: `No MLCTrainingGraph has been found.`

Adding tf.keras.layers.Dropout to model results the following error:

tensorflow.python.framework.errors_impl.AbortedError: Compute: Operation received an exception: Compute: No MLCTrainingGraph has been found.
         [[{{node gradients/dropout_grad/MLCDropoutGrad}}]]

Expected behavior:
Running .fit on model success

Example model configuration:

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(32, kernel_size=(3, 3),
                 activation='relu'),
  tf.keras.layers.Conv2D(64, kernel_size=(3, 3),
                 activation='relu'),
  tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
#   tf.keras.layers.Dropout(0.25),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
#   tf.keras.layers.Dropout(0.5),
  tf.keras.layers.Dense(10, activation='softmax')
])

CPU Underutilization

I am testing an auto encoder, built with Keras, over TensorFlow. Testing is being done on a MacBook Pro 2018 13". I have tested one epoch, on the following three setups:

tensorflow-macos, utilising the integrated GPU (Intel Iris Plus 655)
tensorflow-macos, utilising the CPU
TensorFlow v2.3.1, utilising the CPU

I get the following run-times:

682s, 1s/step, 567 steps
1177s, 2s/step, 567 steps
732s, 1s/step, 567 steps

Based on tensorFlow-macos, utilising the integrated GPU indeed is much faster than utilising the CPU. However, when using the master TensorFlow version, the results are much closer (almost on par). Upon analysis, when running on the CPU from the master package, all 8 threads (from the 4 cores) are showing peak utilisation around 90% each.

On the other hand, when utilising the CPU via tensorflow-macos, only 4 threads are showing utilisation, which goes to explain the 2x speedup from 1177s to 732s when reverting to the master branch TensorFlow. Looks like a threading bottleneck on tensorflow-macos.

import tensorflow error

I download and extract the release,create the new virtualenv and install to it.It occurs error when I import tensorflow in the virtualenv,which is:

from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: dlopen(/Users/fzzfbyx/pythonvirtual/tff/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so, 6): no suitable image found. Did find:
/Users/fzzfbyx/pythonvirtual/tff/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: mach-o, but wrong architecture
/Users/fzzfbyx/pythonvirtual/tff/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: mach-o, but wrong architecture
Failed to load the native TensorFlow runtime.

And I check my python
$ file $(which python)
/Users/fzzfbyx/pythonvirtual/tff/bin/python: Mach-O 64-bit executable x86_64

is that a problem?

Transfer Learning with EfficientNetB7 exited with Error

Error:

warnings.warn('`Model.fit_generator` is deprecated and '
2020-11-23 14:56:55.114107: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/10
Error: command buffer exited with error status.
	The Metal Performance Shaders operations encoded on it may not have completed.
	Error: 
	(null)
	Internal Error (IOAF code -536870211)
	<GFX10_MtlCmdBuffer: 0x7fd02a008200>
    label = <none> 
    device = <GFX10_MtlDevice: 0x7fd0222f9000>
        name = AMD Radeon Pro 5500M 
    commandQueue = <GFXAAMD_MtlCmdQueue: 0x7fd02f9e7b20>
        label = <none> 
        device = <GFX10_MtlDevice: 0x7fd0222f9000>
            name = AMD Radeon Pro 5500M 
    retainedReferences = 1
/AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MetalPerformanceShaders-124.0.30/MPSNeuralNetwork/Filters/MPSCNNKernel.mm:1335: failed assertion `[MPSCNNBatchNormalizationStatistics encodeBatchToCommandBuffer:sourceImages:inStates:destinationImages:] Error: the source image texture is uninitialized.
	This typically means that nothing has written to it yet, and its contents are undefined.
<MPSImage: 0x7fd030974190> ""
	device: 0x7fd02f429200 "AMD Radeon Pro 5500M"
	width: 150
	height: 150
	featureChannelsPerImage: 288
	numberOfImages: 1
	MTLPixelFormat: MTLPixelFormatRGBA32Float
	feature channel format: MPSImageFeatureChannelFormatFloat32
	parent:  0x0
	texture: 0x0

'

Code used:

#import cv2
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow.python.compiler.mlcompute import mlcompute

mlcompute.set_mlc_device(device_name='gpu')

# img = cv2.imread('cassava-leaf-disease-classification/train_images/3988625744.jpg')
# img.shape

df = pd.read_csv('cassava-leaf-disease-classification/train.csv')
labels_dict = {
  0: "Cassava Bacterial Blight (CBB)",
  1: "Cassava Brown Streak Disease (CBSD)",
  2: "Cassava Green Mottle (CGM",
    3: "Cassava Mosaic Disease (CMD)",
    4:"Healthy"
}

df.head()
df['label'] = df['label'].apply(lambda x: labels_dict[x])

datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    zca_whitening=False,
    zca_epsilon=1e-06,
    rotation_range=0,
    width_shift_range=0.0,
    height_shift_range=0.0,
    brightness_range=None,
    shear_range=0.0,
    zoom_range=0.0,
    channel_shift_range=0.0,
    fill_mode="nearest",
    cval=0.0,
    horizontal_flip=False,
    vertical_flip=False,
    rescale=None,
    preprocessing_function=None,
    data_format=None,
    validation_split=0.0,
    dtype=None,
)

IMG_SIZE = 600
NUM_CLASSES = 5

train_generator = datagen.flow_from_dataframe(dataframe= df,
                                              directory='cassava-leaf-disease-classification/train_images/',
                                              x_col = 'image_id',
                                              y_col = 'label',
                                              subset = 'training',
                                              batch_size=32,
                                              seed=42,
                                              shuffle=True,
                                              class_mode = 'categorical',
                                              target_size=(IMG_SIZE,IMG_SIZE))

img_augmentation = Sequential(
    [
        preprocessing.RandomRotation(factor=0.15),
        preprocessing.RandomTranslation(height_factor=0.1, width_factor=0.1),
        preprocessing.RandomFlip(),
        preprocessing.RandomContrast(factor=0.1),
    ],
    name="img_augmentation",
)


from tensorflow.keras.applications import EfficientNetB7

inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
x = img_augmentation(inputs)
model = EfficientNetB7(include_top=False, input_tensor=x, weights="imagenet")

model.trainable = False

x = layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
x = layers.BatchNormalization()(x)

top_dropout_rate = 0.2
x = layers.Dropout(top_dropout_rate, name="top_dropout")(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax", name="pred")(x)

model = tf.keras.Model(inputs, outputs, name="EfficientNet")
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)
model.compile(
    optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"]
)


# model.layers[-6].trainable= True
# model.layers[-7].trainable= True
# model.layers[-10].trainable= True
# model.layers[-11].trainable= True

model.summary()

model.fit_generator(generator=train_generator,
                    steps_per_epoch=512,
                    epochs=10)
                    
model.save('saved_model_v1.h5')

Shape mismatch in input gradient - up_sampling2d/MLCUpsample2D.

The code I'm trying to run:

from decoder.zplanedecoder import zPlaneDecoder
from PropagationLayer.PropagationLayer import PropagationLayer`
from CustomLoss.CustomLoss import CustomLoss
import tensorflow as tf
from tensorflow.keras import Model, Input, layers, optimizers, utils
tf.compat.v1.disable_eager_execution()

decoder = zPlaneDecoder()

training_data = decoder.get_training_data()
val_data = decoder.get_validation_data()

# 1 level down U-Net
train_in = Input((decoder.M, decoder.M, decoder.nz), name='Input', batch_size=16)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(train_in)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D()(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)

real_branch = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
real_branch = layers.Conv2D(1, (3, 3), activation='relu', padding='same', name="real")(real_branch)
imag_branch = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
imag_branch = layers.Conv2D(1, (3, 3), activation='relu', padding='same', name="imag")(imag_branch)

prop_layer = PropagationLayer(decoder.nz, decoder.M, decoder.wl, decoder.l0, (decoder.xw, decoder.yw), decoder.lp, decoder.dz)([real_branch, imag_branch])

model = Model(train_in, prop_layer)
model.compile(optimizer=optimizers.Adam(), loss=CustomLoss(), metrics=['acc'])
model.fit(training_data, training_data, epochs=3, batch_size=16, validation_data=(val_data, val_data))

Produces an error that does not occur when I run the code in pyenv with Tensorflow 2.3.1.

Input data is (256, 256, 3), and if I print out the shape of x before and after the upsampling layer, output is what I expect:
(16, 128, 128, 256)
(16, 256, 256, 256)

I'm running on a 2018 MacBook Pro 15" Radeon Pro 555X.

Traceback (most recent call last):
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 755, in set_shape
pywrap_tf_session.TF_GraphSetTensorShape_wrapper(
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 2 in both shapes must be equal, but are 256 and 128. Shapes are [16,128,256,256] and [16,128,128,256].

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/ops/gradients_util.py", line 712, in _GradientsHelper
in_grad.set_shape(t_in.get_shape())
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 762, in set_shape
raise ValueError(str(e))
ValueError: Dimension 2 in both shapes must be equal, but are 256 and 128. Shapes are [16,128,256,256] and [16,128,128,256].

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/andreas/Google Drev/Speciale/Private/ML/Beam scanner/BeamScanner.py", line 34, in
model.fit(training_data, training_data, epochs=3, batch_size=16, validation_data=(val_data, val_data))
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_v1.py", line 789, in fit
return func.fit(
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_arrays_v1.py", line 647, in fit
return fit_loop(
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_arrays_v1.py", line 185, in model_iteration
f = _make_execution_function(model, mode)
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_arrays_v1.py", line 555, in _make_execution_function
return model._make_execution_function(mode)
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_v1.py", line 2078, in _make_execution_function
self._make_train_function()
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_v1.py", line 2009, in _make_train_function
updates = self.optimizer.get_updates(
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 727, in get_updates
grads = self.get_gradients(loss, params)
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 716, in get_gradients
grads = gradients.gradients(loss, params)
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/ops/gradients_impl.py", line 169, in gradients
return gradients_util._GradientsHelper(
File "/Users/andreas/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/ops/gradients_util.py", line 714, in _GradientsHelper
raise ValueError(
ValueError: Incompatible shapes between op input and calculated input gradient. Forward operation: up_sampling2d/MLCUpsample2D. Input index: 0. Original input shape: (16, 128, 128, 256). Calculated input gradient shape: (16, 128, 256, 256)

Error: no MLCTrainingGraph has been found

I was able to install everything just fine. I'm basically running the code in the file attached.

metal-py.txt

When I get to the line containing code:

model.fit(x = [x_train_cat, x_train_cont], y = [y_train], batch_size = 50, epochs = 50, validation_split = 0.3)

I get the following error message:

AbortedError: Compute: Operation received an exception: Compute: No MLCTrainingGraph has been found.
	 [[{{node gradients/dropout_grad/MLCDropoutGrad}}]]

Any thoughts on what I may have done wrong? On a side note, when I use PlaidML, the code works with no errors.

Specs I'm using:

2019 16" MBP
8-core i9
dGPU -> AMD Radeon Pro 5500M 4GB
eGPU -> AMD Radeon Pro RX580 8GB
OS: macOS Big Sur 11.0.1

Not Using Second GPU

I have created a simple training script using the MNIST data and the getting started from TensorFlow. It uses approximately 30% gpu with 800% CPU when running. I ran the script several times at the same time to see if it would use a second GPU and it will take my first GPU to 100% but not switch to my second GPU on future processes.

Was this supposed load balance GPU over multiple GPUs...

How to install if I use conda environments?

I do not have virtualenvs installed but I do use conda environments. Is there a way to install in a conda environment? Thanks!

Running much slower than standard TensorFlow 2.0

I just compared the performance of the TensorFlow_macos package with the standard TensorFlow 2.0 python package. I used a very simple tensor testing program, running on my Mac 2020 (x86), with the 2 package respectively. I found that the tensorflow_macos is much slower. I'm not sure whether the Mac-optimized package is only good for the new M1 ARM Mac
or it is the problem of my test program. Here is the result and my code:

standard TensorFlow: 10.87s
tensorflow_macos: 79.41s

Code:

from datetime import datetime
import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.mlcompute import mlcompute


mlcompute.set_mlc_device(device_name="any") 
print("start" , datetime.now())
X_raw = np.array([2013, 2014, 2015, 2016, 2017, 2018], dtype=np.float32)
y_raw = np.array([12000, 14000, 15000, 16500, 17500, 19000], dtype=np.float32)

X = (X_raw - X_raw.min()) / (X_raw.max() - X_raw.min())
y = (y_raw - y_raw.min()) / (y_raw.max() - y_raw.min())

X = tf.constant(X)
y = tf.constant(y)

a = tf.Variable(initial_value=0.)
b = tf.Variable(initial_value=0.)
variables = [a, b]

num_epoch = 10000
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
for e in range(num_epoch):

    with tf.GradientTape() as tape:
        y_pred = a * X + b
        loss = 0.5 * tf.reduce_sum(tf.square(y_pred - y))

    grads = tape.gradient(loss, variables)

    optimizer.apply_gradients(grads_and_vars=zip(grads, variables))

print(a, b)
print("end" , datetime.now())

Does this framework take advantage of the neural engine in M1 chip?

As far as I understood this framework uses only CPU and GPU inside the M1 chip. Is it right? Is there any way that neural engine can be used for ML operations?

Numpy error at the end of installation

Hi,
I just followed the steps here and in this video: https://www.youtube.com/watch?v=6W8pjnW65Q8&feature=youtu.be .
I'm still unable to install tensorflow on my m1 mac. Please see error message below:

Installing bundled binary dependencies.
ERROR: numpy-1.18.5-cp38-cp38-macosx_11_0_arm64.whl is not a supported wheel on this platform.

I've been trying to install tf for about 2 weeks now. I tried creating virtual environments through conda, venv and anaconda navigator. I tried putting together the various instructions given by contributors here, but still no success.
Please can someone help?

import grpc raise error

I get an error in grpcio.

(venv) % python
Python 3.8.2 (default, Oct  2 2020, 10:45:41)
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import grpc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/btm/tmp/python/venv/lib/python3.8/site-packages/grpc/__init__.py", line 23, in <module>
    from grpc._cython import cygrpc as _cygrpc
ImportError: dlopen(/Users/btm/tmp/python/venv/lib/python3.8/site-packages/grpc/_cython/cygrpc.cpython-38-darwin.so, 2): Symbol not found: _v3_pkey_usage_period
  Referenced from: /Users/btm/tmp/python/venv/lib/python3.8/site-packages/grpc/_cython/cygrpc.cpython-38-darwin.so
  Expected in: flat namespace
 in /Users/btm/tmp/python/venv/lib/python3.8/site-packages/grpc/_cython/cygrpc.cpython-38-darwin.so

I install grpcio with this script with virtualenv.

% pip freeze|grep grpc
grpcio @ file:///var/folders/fk/97f489c14ld7p2bg389mtbfr0000gn/T/tmp.7Wau228R/tensorflow_macos/arm64/grpcio-1.33.2-cp38-cp38-macosx_11_0_arm64.whl

Model cat fit with only active GPU

Hello!

I tried to use tensorflow_macos-0.1a0-cp38 with MBP16 2019 Radion Pro 5500M, and judging by Activity Monitor GPU History, it looks like this tensorflow revision can use only active GPU. Because when I checked in preferences "Automatic graphics switching" flag (Intel graphics activated here), and when I started model fit, there was no Radeon Pro activity, but Internal GPU activity was high. And opposite.

oneAPI DNN Usage

I am not sure if this is the appropriate forum to post this question, but I am curious with the oneDNN message that is printed even when GPU is selected. Further, oneAPI is not officially supported on macOS. Is this message correct?

This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2 AVX AVX2 FMA

Also, I'm not certain if a specific GPU can be selected or if I have to disable dynamic GPU switch to force dGPU usage.