Wayne Cheng

Distributing Machine Learning Jobs Across Multiple GPUs

Updated: Feb 16

Do you have multiple machine learning jobs that you would like to distribute evenly among your GPU resources?

The following explains the methods that I find effective for distributing machine learning jobs across multiple GPUs. I create my neural network models in Jupyter Notebook, running Keras version 2.3.1 and TensorFlow version 2.0.0.

Designate GPU for the Jupyter Notebook Session

By default, TensorFlow (the backend that Keras runs on here) claims memory on every GPU in the machine once a Jupyter session starts using it. By isolating the session to a specific GPU, I have more control over how I distribute my machine learning jobs.

At the start of the session, before Keras or TensorFlow is imported, the GPU is designated with the following code:

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "<gpu id>"

Replace <gpu id> with the ID of the GPU that the session should run on. For example, to run the session on GPU 1, I would use the following:

os.environ["CUDA_VISIBLE_DEVICES"] = "1"
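
The two environment variables can be wrapped in a small helper, sketched below. The function name `select_gpu` is hypothetical (it is not part of Keras or TensorFlow), and the warning rests on the assumption that the variables should be set before TensorFlow is imported, since they are read when CUDA initializes.

```python
import os
import sys
import warnings


def select_gpu(gpu_id):
    """Restrict this process to one GPU (hypothetical helper, not a library API)."""
    # The variables are read when CUDA initializes, so setting them after
    # TensorFlow has started using the GPU has no effect.
    if "tensorflow" in sys.modules:
        warnings.warn("TensorFlow is already imported; the GPU setting may be ignored.")
    # Number GPUs by PCI bus ID so the IDs match the order shown by nvidia-smi.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return os.environ["CUDA_VISIBLE_DEVICES"]


select_gpu(1)  # run this notebook session on GPU 1
```

Once only one GPU is visible, TensorFlow sees it as device 0 inside the session, regardless of its ID on the machine.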

The GPU ID can be found by running the following command in the terminal:

nvidia-smi

The ID is an integer, starting at 0, shown in the leftmost column:

+--------------------------------
| NVIDIA-SMI 435.21 Driver
|-------------------------------+
| GPU  Name        Persistence-M|
| Fan  Temp  Perf  Pwr:Usage/Cap|
|===============================+
|   0  GeForce RTX 2080    Off  |
| 42%   59C    P2   156W / 245W |
+-------------------------------+
|   1  GeForce RTX 2080    Off  |
|  0%   57C    P2    87W / 245W |
+-------------------------------+
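
If the IDs are needed inside a script rather than read off the table, `nvidia-smi -L` prints one line per GPU. The following is a minimal sketch for parsing that listing. The function names `parse_gpu_list` and `list_gpus` are hypothetical, and the sample text uses placeholder UUIDs rather than real output.

```python
import re
import subprocess


def parse_gpu_list(text):
    """Parse `nvidia-smi -L` output into a {gpu_id: name} dictionary."""
    gpus = {}
    for line in text.splitlines():
        # Each line looks like: "GPU 0: GeForce RTX 2080 (UUID: GPU-...)"
        match = re.match(r"GPU (\d+): (.+?) \(UUID:", line)
        if match:
            gpus[int(match.group(1))] = match.group(2)
    return gpus


def list_gpus():
    """Run `nvidia-smi -L` and parse the result (needs an NVIDIA driver)."""
    output = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    ).stdout
    return parse_gpu_list(output)


# Sample text in the format printed by `nvidia-smi -L` (UUIDs are placeholders).
sample = (
    "GPU 0: GeForce RTX 2080 (UUID: GPU-xxxxxxxx)\n"
    "GPU 1: GeForce RTX 2080 (UUID: GPU-yyyyyyyy)\n"
)
print(parse_gpu_list(sample))
```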

Enable Dynamic Memory Allocation

By default, TensorFlow reserves nearly all of the memory on a GPU as soon as the model starts, regardless of how much the model actually needs. This effectively limits each GPU to one job. With dynamic memory allocation enabled, TensorFlow allocates memory only as it is needed, so more than one job can run on the same GPU.

Dynamic memory allocation is enabled with the following code:

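from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

# Grow GPU memory usage on demand instead of reserving it all up front.
config = ConfigProto()
config.gpu_options.allow_growth = True
# Creating the session applies the configuration to this notebook.
session = InteractiveSession(config=config)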
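
TensorFlow 2.x also exposes this setting without a compat.v1 session. The following is a sketch of that alternative, not the method used above. The function name `enable_memory_growth` is hypothetical, and the `tf.config.experimental` calls have to run before the GPUs are first used.

```python
def enable_memory_growth():
    """Turn on memory growth for every visible GPU; return how many were set."""
    import tensorflow as tf  # imported here so defining the function stays cheap

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # This must be called before the GPU is initialized, otherwise
        # TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Calling `enable_memory_growth()` at the top of the notebook, after the GPU has been designated, applies the setting to the GPUs that are visible to the session.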

Determine the Resources Needed for the Jobs

The resources that a job needs can be determined by running the job, and then using the following command in the terminal:

nvidia-smi

While the job is running, the GPU memory and utilization can be monitored. For example:

|-----+----------------------+----------------------+
| GPU | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan | Memory-Usage         | GPU-Util  Compute M. |
|=====+======================+======================|
|   0 | 00000000:01:00.0 Off |                  N/A |
| 44% |    1241MiB / 7981MiB |      67%     Default |
+-----+----------------------+----------------------+
|   1 | 00000000:04:00.0 Off |                  N/A |
|  0% |     314MiB / 7982MiB |      36%     Default |
+-----+----------------------+----------------------+
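
The same numbers can be read programmatically with the `--query-gpu` option of `nvidia-smi`, which prints them as CSV. The following is a minimal sketch. The function names and dictionary keys are hypothetical, and the sample text reuses the values from the table above.

```python
import subprocess

QUERY = "index,memory.used,memory.total,utilization.gpu"


def parse_gpu_stats(text):
    """Parse CSV rows of 'index, used MiB, total MiB, util %' into a dict."""
    stats = {}
    for line in text.strip().splitlines():
        index, mem_used, mem_total, util = (int(f.strip()) for f in line.split(","))
        stats[index] = {
            "mem_used_mib": mem_used,
            "mem_total_mib": mem_total,
            "util_pct": util,
        }
    return stats


def read_gpu_stats():
    """Query nvidia-smi once and parse the result (needs an NVIDIA driver)."""
    output = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(output)


# Values taken from the example table above.
sample = "0, 1241, 7981, 67\n1, 314, 7982, 36\n"
stats = parse_gpu_stats(sample)
print(stats[0]["mem_used_mib"], stats[1]["util_pct"])
```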

In this example, a job on GPU 0 is using 1241MiB of memory and 67% utilization. A second, similar job would fit in memory on GPU 0 (about 2482MiB in total), but the combined demand of roughly 134% exceeds what the GPU can deliver. The GPU will still run both jobs, but each will run more slowly.

A job on GPU 1 is using 314MiB of memory and 36% utilization. A second, similar job will fit onto GPU 1 (for a total of about 628MiB of memory and 72% utilization), and the GPU will be able to run both jobs at full speed.
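
The arithmetic behind these two estimates can be sketched as follows. The function name `estimate_copies` is hypothetical, and the assumption that memory and utilization scale linearly with the number of similar jobs is a rough rule of thumb, not a guarantee.

```python
def estimate_copies(mem_used_mib, mem_total_mib, util_pct, copies):
    """Project memory and utilization if `copies` similar jobs share one GPU."""
    # Assumption: each extra copy needs the same memory and utilization.
    total_mem = mem_used_mib * copies
    total_util = util_pct * copies
    return {
        "memory_mib": total_mem,
        "utilization_pct": total_util,
        "fits_in_memory": total_mem <= mem_total_mib,
        "full_speed": total_util <= 100,
    }


gpu0 = estimate_copies(1241, 7981, 67, copies=2)
gpu1 = estimate_copies(314, 7982, 36, copies=2)
print(gpu0)
print(gpu1)
```

With these numbers, both GPUs have enough memory for two jobs, but only GPU 1 stays under 100% projected utilization.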

Thank you for reading. I hope you find this guide helpful for distributing machine learning jobs across multiple GPUs.

Questions or comments? You can reach me at info@audoir.com

Wayne Cheng is an A.I., machine learning, and deep learning developer at Audoir, LLC. His research involves the use of artificial neural networks to create music. Prior to starting Audoir, LLC, he worked as an engineer in various Silicon Valley startups. He has an M.S.E.E. degree from UC Davis, and a Music Technology degree from Foothill College.

Copyright © 2020 Audoir, LLC

All rights reserved