
CPU Underutilization · apple/tensorflow_macos · OPEN · 5 comments

apple commented on May 26, 2024
CPU Underutilization


Comments (5)

anna-tikhonova commented on May 26, 2024

Thank you very much for reporting this issue. Could you please point us to the code you are running or provide a reproducible test case?


andreademarco86 commented on May 26, 2024

Hi, sure. The code is attached.

There's a lot of code related to pre-processing my specific data in the AutoClean class. Otherwise, the main elements are in:

  • build_autoencoder()
  • train()
  • CustomCallback (callback-related operations at the end of each epoch)

Also, the setup of tensorflow_macos is in the top part of the file, most of which I have been commenting in/out depending on whether I'm running on master TensorFlow or tensorflow_macos.
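
Roughly, that toggle looks like the sketch below (the try/except guard is just an illustration of the idea, not the exact code in the attachment):

# Sketch of the setup toggle: guard the tensorflow_macos-specific import so
# the same script also runs on master TensorFlow (illustrative only).
try:
    from tensorflow.python.compiler.mlcompute import mlcompute
    mlcompute.set_mlc_device(device_name='cpu')  # 'cpu', 'gpu', or 'any'
except ImportError:
    pass  # master TensorFlow: ML Compute device selection not available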

Let me know if I can help any further.

autoclean.py.zip


andreademarco86 commented on May 26, 2024

Hi @anna-tikhonova ,

I have modified my sample - this doesn't do much except demonstrate the problem. Hopefully it will be easier for you to test.

import sys
sys.path.append('../../')

import numpy as np
import warnings
from astropy.io.fits.verify import VerifyWarning
warnings.simplefilter('ignore', category=VerifyWarning)

import logging
import tensorflow as tf
from tensorflow.keras.optimizers import Adadelta
import tensorflow.keras.backend as K
from tensorflow.keras.layers import LeakyReLU, Input, Conv2D, UpSampling2D, MaxPooling2D
from tensorflow.keras.models import Model

# setup tensorflow_macos
# Import mlcompute module to use the optional set_mlc_device API for device selection with ML Compute.
EXECUTION_MODE = 'cpu'
from tensorflow.python.compiler.mlcompute import mlcompute
# Select the device according to EXECUTION_MODE.
mlcompute.set_mlc_device(device_name=EXECUTION_MODE)  # Available options are 'cpu', 'gpu', and 'any'.
# Turn off eager execution for GPU mode.

if EXECUTION_MODE == 'gpu':
    tf.config.run_functions_eagerly(False)
logging.basicConfig(filename='output.log', level=logging.DEBUG)


def recall_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def precision_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1_m(y_true, y_pred):
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))


class AutoClean:
    def __init__(self, pretrained_model=None):
        self.CNN_SIDE = 128
        self.img_rows = self.CNN_SIDE
        self.img_cols = self.CNN_SIDE
        self.channels = 1
        self.batch_size = 8
        self.val_batch_size = 8
        self.img_shape = (self.img_rows, self.img_cols, self.channels)
        self.dataset_size = 5000
        self.validation_size = 256

        self.optimizer = Adadelta()

        if pretrained_model:
            self.aen = pretrained_model
        else:
            self.aen = self.build_autoencoder()

        # self.aen = to_multi_gpu(self.aen, n_gpus=N_GPUS)
        self.aen.compile(optimizer=self.optimizer,
                         loss='binary_crossentropy',
                         metrics=['accuracy', f1_m, precision_m, recall_m])

    def load_image_batch(self, n_samples):
        """
        Yields dummy image and mask batches (stand-in for loading random batches of .npy files from a directory).
        :param n_samples: number of samples per batch
        :return: generator of (image_batch, mask_batch) arrays
        """
        while True:
            X_train_image = np.ones((n_samples, self.CNN_SIDE, self.CNN_SIDE, 1))
            X_train_mask = np.zeros((n_samples, self.CNN_SIDE, self.CNN_SIDE, 1))
            yield X_train_image, X_train_mask

    def load_validation_batch(self, n_samples):
        """
        Yields a dummy validation batch of images and masks (stand-in for loading the full validation set of .npy files from a directory).
        :param n_samples: number of validation samples per batch
        :return: generator of (image_batch, mask_batch) arrays
        """
        while True:
            X_vld_image = np.ones((n_samples, self.CNN_SIDE, self.CNN_SIDE, 1))
            X_vld_mask = np.ones((n_samples, self.CNN_SIDE, self.CNN_SIDE, 1))
            yield X_vld_image, X_vld_mask

    def build_autoencoder(self):
        depth = 128
        input_img = Input(shape=self.img_shape)

        # ENCODER
        x = Conv2D(int(depth), kernel_size=3, strides=1, padding='same')(input_img)
        x = LeakyReLU()(x)
        x = MaxPooling2D((2, 2), padding='same')(x)

        x = Conv2D(int(depth / 2), kernel_size=3, strides=1, padding='same')(x)
        x = LeakyReLU()(x)
        # x = MaxPooling2D((2, 2), padding='same')(x)

        x = Conv2D(int(depth / 4), kernel_size=3, strides=1, padding='same')(x)
        x = LeakyReLU()(x)
        # x = MaxPooling2D((2, 2), padding='same')(x)

        # x = Conv2D(int(depth/4), kernel_size=3, strides=1, padding='same')(x)
        # x = LeakyReLU()(x)
        # # x = MaxPooling2D((2, 2), padding='same')(x)

        # # DECODER
        # x = Conv2D(int(depth/4), kernel_size=3, strides=1, padding='same')(x) # (encoded)
        # x = LeakyReLU()(x)
        # # x = UpSampling2D((2, 2))(x)

        x = Conv2D(int(depth / 4), kernel_size=3, strides=1, padding='same')(x)
        x = LeakyReLU()(x)
        # x = UpSampling2D(size=(2, 2))(x)

        x = Conv2D(int(depth / 2), kernel_size=3, strides=1, padding='same')(x)
        x = LeakyReLU()(x)
        # x = UpSampling2D(size=(2, 2))(x)

        x = Conv2D(int(depth), kernel_size=3, strides=1, padding='same')(x)
        x = UpSampling2D(size=(2, 2))(x)
        x = LeakyReLU()(x)

        decoded = Conv2D(1, kernel_size=3, strides=1, activation='sigmoid', padding='same')(x)

        autoencoder = Model(input_img, decoded)
        # autoencoder = CustomModel(input_img, decoded)
        autoencoder.summary()
        return autoencoder

    def train(self, n_epochs=100, batch_size=8):
        self.batch_size = batch_size
        self.val_batch_size = batch_size

        self.aen.fit(
            self.load_image_batch(n_samples=self.batch_size),
            epochs=n_epochs,
            steps_per_epoch=int(self.dataset_size / self.batch_size),
            validation_data=self.load_validation_batch(n_samples=self.val_batch_size),
            validation_steps=int(self.validation_size / self.val_batch_size)
        )

if __name__ == '__main__':
    aen = AutoClean(pretrained_model=None)
    aen.train(n_epochs=4, batch_size=20)

Here is the CPU history showing the problem:
[Screenshot: CPU history]

On standard TensorFlow all cores are in use, and training time is roughly halved as a result. Hope this helps :)
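
In case it helps narrow this down, one thing that could be checked (just a guess on my part; I don't know whether the ML Compute path respects these settings) is whether the op-level thread pools are being capped. TensorFlow's threading settings can be inspected and overridden before any ops run:

import os
import tensorflow as tf

# Diagnostic sketch: inspect TensorFlow's thread-pool sizes (0 means
# "let TensorFlow decide") before any ops have run.
print(tf.config.threading.get_intra_op_parallelism_threads())
print(tf.config.threading.get_inter_op_parallelism_threads())

# Force the intra-op pool to the full logical core count and re-check CPU
# utilization; whether this affects the ML Compute path is unverified.
tf.config.threading.set_intra_op_parallelism_threads(os.cpu_count())
tf.config.threading.set_inter_op_parallelism_threads(2)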


danbricedatascience commented on May 26, 2024

I have the same issue, even on a much simpler case.

Hardware: MacBook Air M1, 8 GB / 512 GB

Very simple convnet on MNIST.

import numpy as np
import time

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical

# Select CPU or GPU via ML Compute.
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='cpu')

mnist = tf.keras.datasets.mnist

(train_images,train_labels),(test_images,test_labels) = mnist.load_data()

train_images=train_images.reshape((60000,28,28,1))
train_images=train_images.astype('float32')/255

test_images=test_images.reshape((10000,28,28,1))
test_images=test_images.astype('float32')/255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

model = tf.keras.models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation = 'relu',input_shape=(28,28,1)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3),activation = 'relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3),activation = 'relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64,activation='relu'))
model.add(layers.Dense(10,activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print("Start Learning with tensorflow.keras")

start = time.time()

history = model.fit(train_images,train_labels,epochs=5,batch_size=128)

print("Ran in {} seconds".format(time.time() - start))

test_loss, test_acc = model.evaluate(test_images,test_labels)

print('test_acc:',test_acc)

I get underutilization of the CPU (see the right side of the screenshot below).

This is the same for MLP and LSTM models.

[Screenshot: CPU history, 2020-12-05]
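
For reference, a sketch of the MLP variant mentioned above (just the obvious analogue of the convnet script, not code from the original run):

# Sketch of an MLP variant on the same data (assumes the MNIST arrays
# prepared above; illustrative only).
mlp = tf.keras.models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(256, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
mlp.compile(optimizer='rmsprop',
            loss='categorical_crossentropy',
            metrics=['accuracy'])
mlp.fit(train_images, train_labels, epochs=5, batch_size=128)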


andreademarco86 commented on May 26, 2024

@danbricedatascience

It could be that, in the case of the M1, TensorFlow is only using the 4 high-performance cores (and not the 4 efficiency cores), which would be cores 4/5/6/8(?), whilst cores 1/2/3/7 are left out - but I can't be sure of this, so perhaps wait for a reply from somebody else.

In my case, I'm on an Intel Mac, and my code made full use of all cores on standard TensorFlow, but only about 50% of the cores on this macOS fork.
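
One way to check this guess (a sketch only; psutil is an extra dependency, not something used in this thread) is to sample per-core utilization while training runs:

# Sketch: sample per-core CPU utilization while training runs in another
# process, to see which logical cores are actually loaded.
# Assumes the optional psutil package is installed.
import psutil

for _ in range(10):
    per_core = psutil.cpu_percent(interval=1.0, percpu=True)
    print(per_core)  # one percentage per logical core, in core order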

