Comments (11)
Thanks @atw1020, indeed reducing the batch size in my benchmark allows epochs to complete on the CPU.
It's an interesting behaviour. I don't expect to be able to use large batch sizes on a laptop with an integrated GPU, but since the CPU and GPU share so much memory, it's surprising that TF with CoreML is so limited on the CPU while the GPU, with that same memory, can handle larger batch sizes.
For reference, the original benchmark used a batch size of 32, which worked only on the GPU. Taking it down to 16 works on the CPU (20 is still too high and crashes again).
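A minimal sketch of the workaround above: step the batch size down by halving until an epoch completes. The helper name and the usage below are illustrative, not from the benchmark itself.

```python
def batch_size_ladder(start, floor=1):
    """Yield halving batch sizes (e.g. 32, 16, 8, ...) down to `floor`,
    mirroring the 32 -> 16 reduction that let the CPU run complete."""
    b = start
    while b >= floor:
        yield b
        b //= 2

# Usage sketch: try each size until model.fit() no longer crashes, e.g.
# for b in batch_size_ladder(32):
#     model.fit(x, y, batch_size=b, epochs=1)  # hypothetical model/data
```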
from tensorflow_macos.
@tux-o-matic Thank you for reporting this issue. Could you, please, point us to or attach an example you are running? This way, we can reproduce this issue locally and investigate.
Hi @anna-tikhonova, I use this Python code. It just needs the TF fork and NumPy:
python cifar10_cnn.py
In my case, on a MacBook Air with an Intel chip, the backend seems to choose the CPU by default and then throws the error.
However, if I specify:
import tensorflow
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')
tensorflow.config.run_functions_eagerly(False)
then the model gets trained; I can see in Activity Monitor that the Python threads offload work to the GPU. But on this integrated Intel GPU the performance is worse than on the CPU, and even PlaidML as a backend for TF could do better on the GPU.
If you need another example, running this code (from #35 ) also defaults to the CPU and segfaults.
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
model.summary()
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
Machine specs:
macOS 11.0.1 on a MacBook Pro (15-inch, 2019).
2.3 GHz 8-Core Intel Core i9
16 GB 2400 MHz DDR4
Radeon Pro 560X 4 GB
@tux-o-matic @hughack I apologize for the late reply. I just tried both of the scripts you provided and I'm not able to reproduce the issue. It's possible that it was resolved in a macOS update. Could you please try again on an up-to-date macOS and let me know if you can still reproduce it?
Hi @pooyadavoodi.
On an up-to-date Big Sur, with Python 3.8.7 and the latest release of this project, I still hit the same error.
I managed to reproduce the segfault from @hughack's script using v0.1alpha0, and that issue is resolved in the latest release, v0.1alpha2.
@tux-o-matic Could you share the Big Sur version you are using? Also, are you using the Python that comes with the OS? If not, how did you install it?
I'm testing on Big Sur 11.0.1. Python 3.8.7 comes from MacPorts. Earlier tests were on an older point release of Python 3.8, also from MacPorts.
This appears to be the same as #127
I posted over there that this issue seems to be tied to batch size: the segmentation fault occurs with sufficiently large batches, and "sufficiently large" appears to depend on the neural network itself. However, every network I have tried so far experiences this segfault once the batch size exceeds a certain amount, so you can probably trigger or avoid this issue by increasing or decreasing your batch size.
I am still experiencing this on the February alpha build, using a Conda environment as described on this page (some of the pip commands need to be updated to match the new file names). I hope this helps you replicate the issue.
Also, using @tux-o-matic's workaround I was able to stop my network from segfaulting, but it caused a memory leak instead (?!?). It appeared to run faster on the GPU than on the CPU (until I ran out of memory, that is).
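Since the failure here is a hard segfault, any probe for the "certain amount" has to run each trial in a child process so the crash can't take the probe down with it. A sketch under that assumption (the script name `train_one_epoch.py` and the search bounds are hypothetical, and the search assumes failures are monotone in batch size):

```python
import subprocess
import sys

def fits(batch):
    """Run one training epoch in a child process; a segfault there
    just shows up as a nonzero return code here."""
    r = subprocess.run([sys.executable, "train_one_epoch.py", str(batch)])
    return r.returncode == 0

def largest_safe_batch(works, lo=1, hi=64):
    """Binary-search the largest batch size b in [lo, hi] with works(b) True.
    Assumes works(lo) is True and that once a size fails, all larger fail."""
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if works(mid):
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best

# Usage sketch: largest_safe_batch(fits, 1, 64)
```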
I'm seeing non-segfault issues on 0.1-alpha3, but I'm still getting errors that are solved by using a smaller batch size. I'm going to keep investigating and hopefully get some new code to reproduce the issue I'm seeing.
I've been trying to replicate this issue on 0.1-alpha3 and haven't been able to, so I'm becoming pretty confident that it was fixed in that patch. There seem to be other bugs related to batch size, but this one has been addressed. Please update this thread if you are still experiencing the issue.