Not Using Second GPU (tensorflow_macos issue, open)

apple commented on May 27, 2024
Not Using Second GPU

from tensorflow_macos.

Comments (15)

BatmanDZ commented on May 27, 2024

Same here. I tried running code on a Mac Pro with two AMD Radeon Pro Vega II Duo cards (4 GPUs). With the MNIST and CIFAR datasets it uses only one GPU, and only at 30-50% utilization.

dmmajithia commented on May 27, 2024

@BatmanDZ can you try increasing the batch size? That increases GPU utilization for me.

BatmanDZ commented on May 27, 2024

dmmajithia commented on May 27, 2024

Can you post your model definition?
Is it similar to the CNN posted in #25 ?

BatmanDZ commented on May 27, 2024

BatmanDZ commented on May 27, 2024

With mode 'GPU' and batch size 1000:
- GPU usage: 16%
- CPU usage: 839%
[Screenshot 2020-12-08 at 09:38:50]
[Screenshot 2020-12-08 at 09:39:21]
[Screenshot 2020-12-08 at 09:39:50]

BatmanDZ commented on May 27, 2024

Train on 60 steps, validate on 10 steps
Epoch 1/12
60/60 [==============================] - ETA: 0s - batch: 29.5000 - size: 1.0000 - loss: 0.5020 - accuracy: 0.8679
/Users/dz/opt/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically.
warnings.warn('Model.state_updates will be removed in a future version. '
60/60 [==============================] - 27s 375ms/step - batch: 29.5000 - size: 1.0000 - loss: 0.5020 - accuracy: 0.8679 - val_loss: 0.1656 - val_accuracy: 0.9499
Epoch 2/12
60/60 [==============================] - 27s 382ms/step - batch: 29.5000 - size: 1.0000 - loss: 0.1169 - accuracy: 0.9657 - val_loss: 0.0736 - val_accuracy: 0.9777
Epoch 3/12
60/60 [==============================] - 27s 390ms/step - batch: 29.5000 - size: 1.0000 - loss: 0.0658 - accuracy: 0.9808 - val_loss: 0.0519 - val_accuracy: 0.9838
Epoch 4/12
60/60 [==============================] - 26s 377ms/step - batch: 29.5000 - size: 1.0000 - loss: 0.0490 - accuracy: 0.9854 - val_loss: 0.0480 - val_accuracy: 0.9840
Epoch 5/12
60/60 [==============================] - 27s 384ms/step - batch: 29.5000 - size: 1.0000 - loss: 0.0393 - accuracy: 0.9884 - val_loss: 0.0414 - val_accuracy: 0.9865
Epoch 6/12
60/60 [==============================] - 27s 391ms/step - batch: 29.5000 - size: 1.0000 - loss: 0.0314 - accuracy: 0.9904 - val_loss: 0.0421 - val_accuracy: 0.9866
Epoch 7/12
60/60 [==============================] - 27s 386ms/step - batch: 29.5000 - size: 1.0000 - loss: 0.0273 - accuracy: 0.9920 - val_loss: 0.0413 - val_accuracy: 0.9870
Epoch 8/12
60/60 [==============================] - 27s 389ms/step - batch: 29.5000 - size: 1.0000 - loss: 0.0236 - accuracy: 0.9930 - val_loss: 0.0388 - val_accuracy: 0.9877
Epoch 9/12
60/60 [==============================] - 27s 386ms/step - batch: 29.5000 - size: 1.0000 - loss: 0.0190 - accuracy: 0.9944 - val_loss: 0.0354 - val_accuracy: 0.9887
Epoch 10/12
60/60 [==============================] - 27s 389ms/step - batch: 29.5000 - size: 1.0000 - loss: 0.0151 - accuracy: 0.9955 - val_loss: 0.0370 - val_accuracy: 0.9884
Epoch 11/12
60/60 [==============================] - 27s 388ms/step - batch: 29.5000 - size: 1.0000 - loss: 0.0126 - accuracy: 0.9966 - val_loss: 0.0343 - val_accuracy: 0.9889
Epoch 12/12
60/60 [==============================] - 27s 389ms/step - batch: 29.5000 - size: 1.0000 - loss: 0.0119 - accuracy: 0.9967 - val_loss: 0.0424 - val_accuracy: 0.9878
Out[3]:
<tensorflow.python.keras.callbacks.History at 0x7fe190e83fa0>
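
When comparing runs at different batch sizes, samples per second can be easier to read than seconds per epoch. A minimal sketch, assuming the 27s in the log above is the wall-clock time for one epoch of 60 training steps at batch size 1000 (it also includes the 10 validation steps, so the real training throughput is somewhat higher). The function name is illustrative and not part of TensorFlow:

```python
def samples_per_second(steps_per_epoch, batch_size, epoch_seconds):
    """Approximate training throughput, assuming every step processes a full batch."""
    return steps_per_epoch * batch_size / epoch_seconds


# Figures taken from the log above: 60 steps, batch size 1000, about 27s per epoch.
throughput = samples_per_second(60, 1000, 27)
print(f"{throughput:.0f} samples/s")  # prints "2222 samples/s"
```

Running the same calculation for each batch size puts the runs on one scale, independent of how many steps an epoch contains.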

BatmanDZ commented on May 27, 2024

With batch size 5000 it was worse: about 12% GPU usage and 36s per step.

dmmajithia commented on May 27, 2024

It might be the case that the model is small enough to not gain from GPU usage.
Can you try adding 2x more convolutional layers?

BatmanDZ commented on May 27, 2024

Tried. Usage increased to 50% and time per epoch decreased to 24s

anna-tikhonova commented on May 27, 2024

@anagrath Thank you for reporting this issue. Could you please send us more information about your config? Also, are you setting device type to 'any'? What batch size are you using?

anagrath commented on May 27, 2024

I am new to TensorFlow. It has been 20 years since I last played with neural nets, and I am sure the terminology has changed, so a few days ago I got a book that I am now working through.

What I did was take https://www.tensorflow.org/tutorials/quickstart/beginner and turn it into a file. Then I imported mlcompute and set the device to 'any'. I have also played around with the any/cpu/gpu settings.

After I filed the issue, since other people seem to be hitting the same problem and there was fast uptake, I assumed it would get resolved at some point. So I began modifying the code to see if I could start getting information on my own problem (marketing data that I downloaded from one of our providers). Here is the sorely misguided code I am currently using:

import tensorflow as tf
import csv
import numpy as np
import random
import sys
from tensorflow.python.compiler.mlcompute import mlcompute

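# Usage: <script> <csv file> <epochs> <linear units> <swish units> <relu units> <cpu|gpu|any>
method = sys.argv[6]

# Select the ML Compute device: 'cpu', 'gpu', or 'any'.
mlcompute.set_mlc_device(device_name=method)

fname = sys.argv[1]
epochs = int(sys.argv[2])
linear = int(sys.argv[3])  # units in the linear-activation layer
swish = int(sys.argv[4])   # units in the swish layer
relu = int(sys.argv[5])    # units in the relu layer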

OUTPUT_SIZE = 50
BUCKET_SIZE = 2

def splitTrainTest(data):
  random.shuffle(data)
  trainLen = int(len(data) * 0.7)
  return (data[:trainLen], data[trainLen:])
  
def splitInputOutput(data):
  inputs = []
  outputs = []
  for row in data:
    outval = round(row[-1] / BUCKET_SIZE)
    if outval < OUTPUT_SIZE:
      inputs.append(row[0:-1])
      outputs.append(outval)
  # np.array instead of np.matrix, which NumPy no longer recommends
  return (np.array(inputs, dtype=np.float32), np.array(outputs))


data = []
first_row = True
header = []
with open(fname) as csvfile:
  reader = csv.reader(csvfile)
  for row in reader:
    if first_row:
      first_row = False
      header = row
    else:
      floatRow = [float(e) for e in row]
      data.append(floatRow)

(train, test) = splitTrainTest(data)
(x_train, y_train) = splitInputOutput(train)
(x_test,  y_test)  = splitInputOutput(test)

model = tf.keras.models.Sequential([ 
  tf.keras.layers.Dense(linear, activation='linear'),
  tf.keras.layers.Dense(swish, activation='swish'),
  tf.keras.layers.Dense(relu, activation='relu'),

  tf.keras.layers.Dense(OUTPUT_SIZE)
])



predictions = model(x_train[:1]).numpy()


loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)


loss_fn(y_train[:1], predictions).numpy()

model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
              
model.fit(x_train, y_train, epochs=epochs)

print("Test Set:")
model.evaluate(x_test,  y_test, verbose=2)

As you can see, I started moving different settings out to the command line, including cpu/gpu/any. I am also now running three layers. Since a single run was not using all the resources, I figured I could run the same experiment with different settings in parallel to see which works better on my data. However, even when I fill up GPU 1, the work does not spill over to GPU 2 (the same thing I saw when I ran the MNIST tutorial setup multiple times). It would be useful to be able to specify which GPU on the set_mlc_device line, so that one configuration could run on one GPU while another runs on the second.
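
The positional sys.argv handling in the code above could also be written with argparse, which names each setting and rejects an unknown device string. This is only a sketch: the flag names and default values are made up for illustration, and the three device names are the ones mentioned in this thread ('cpu', 'gpu', 'any').

```python
import argparse


def parse_args(argv=None):
    """Parse training settings; flag names and defaults here are illustrative."""
    parser = argparse.ArgumentParser(description="Train a small dense network on a CSV file.")
    parser.add_argument("fname", help="CSV file with a header row; the last column is the target")
    parser.add_argument("--epochs", type=int, default=300)
    parser.add_argument("--linear", type=int, default=512, help="units in the linear layer")
    parser.add_argument("--swish", type=int, default=1024, help="units in the swish layer")
    parser.add_argument("--relu", type=int, default=512, help="units in the relu layer")
    parser.add_argument("--device", choices=["cpu", "gpu", "any"], default="any")
    return parser.parse_args(argv)


# Parse an explicit argument list so the example runs without a real command line.
args = parse_args(["data.csv", "--epochs", "300", "--device", "gpu"])
print(args.fname, args.epochs, args.device)
# args.device would then be passed on, e.g. mlcompute.set_mlc_device(device_name=args.device)
```

Calling parse_args() with no argument reads the real command line instead of the hard-coded list.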

In the end, I have very little clue about what I am supposed to be doing :) so I do not know whether I am doing it right or whether the suggestion above even makes sense. Maybe in another week, when I finish the TensorFlow book, I will have a better idea. The early examples in the book use scikit, and the TensorFlow examples come much later. It may take me a minute to catch up to you guys.

anagrath commented on May 27, 2024

I should also say that if you have an example that you need me to run, I am happy to try to run it to help get the issue resolved.

anna-tikhonova commented on May 27, 2024

@anagrath Thank you for posting the code. Could you tell me which command line arguments you are running this code with?

anagrath commented on May 27, 2024

I have been running different numbers, sometimes simultaneously (in different tabs). I was using powers of 2 and started to get diminishing returns on accuracy around 4096. Again, I am running multiple instances with different numbers to use more resources and get answers faster. Here are some example runs:

300 512 1024 512 cpu
300 1024 2048 1024 any
300 2048 4096 2048 gpu
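
Launching those runs from one script instead of separate tabs might look like the sketch below. It assumes the training code above is saved as train.py and the data as data.csv (both names are hypothetical). It only starts ordinary processes and does not select a particular GPU, which is the missing piece this issue is about.

```python
import subprocess
import sys

# (epochs, linear, swish, relu, device) tuples matching the example runs above.
RUNS = [
    (300, 512, 1024, 512, "cpu"),
    (300, 1024, 2048, 1024, "any"),
    (300, 2048, 4096, 2048, "gpu"),
]


def build_command(script, fname, run):
    """Build the argument list in the order the script reads sys.argv[1] to sys.argv[6]."""
    return [sys.executable, script, fname, *(str(value) for value in run)]


def launch_all(script, fname, runs):
    """Start every run concurrently, then wait for all of them and return exit codes."""
    procs = [subprocess.Popen(build_command(script, fname, run)) for run in runs]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Print the commands only; call launch_all("train.py", "data.csv", RUNS) to start them.
    for run in RUNS:
        print(" ".join(build_command("train.py", "data.csv", run)))
```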
