Topic VI
This video lecture covers running TensorFlow on a multi-GPU system. The presenter assumes prior knowledge of TensorFlow and demonstrates how to use all available GPU resources, starting with a script on a server with four GPUs and then explaining how to extend it to a cluster with GPUs distributed across multiple nodes. The example is a TensorFlow training script for the MNIST dataset. The lecture describes how to use tf.distribute.MirroredStrategy to target specific GPU devices and how to scale the batch size with the number of devices in use. For cluster runs, it briefly covers writing a Slurm script, specifying the number of GPUs needed, and submitting the job.
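The core idea described above can be sketched as follows. This is a minimal illustration, not the presenter's actual script: the model architecture and per-device batch size are arbitrary choices, and random arrays stand in for MNIST so the snippet runs without a download (swap in tf.keras.datasets.mnist.load_data() for the real data).

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model across all visible GPUs.
# Specific devices can be requested explicitly, e.g.
#   tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
# With no GPUs present it falls back to a single CPU replica.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Scale the global batch size with the number of replicas so that
# each device still processes a per-device batch of 64.
per_device_batch = 64
global_batch = per_device_batch * strategy.num_replicas_in_sync

# Random stand-in for MNIST (28x28 grayscale images, 10 classes).
x = np.random.rand(512, 28, 28).astype("float32")
y = np.random.randint(0, 10, size=(512,))

# Model and optimizer must be created inside the strategy scope so
# their variables are mirrored across devices.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

model.fit(x, y, batch_size=global_batch, epochs=1, verbose=2)
```

For GPUs spread across multiple nodes, the single-node MirroredStrategy would be replaced by tf.distribute.MultiWorkerMirroredStrategy, with one worker process launched per node.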
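A Slurm submission script of the kind mentioned above might look like the sketch below. The job name, time limit, script filename, and the module/environment line are placeholder assumptions; the GPU request uses the standard --gres=gpu:N syntax.

```shell
#!/bin/bash
#SBATCH --job-name=tf-mnist     # placeholder job name
#SBATCH --nodes=1               # single node with multiple GPUs
#SBATCH --gres=gpu:4            # request four GPUs on that node
#SBATCH --time=00:30:00         # placeholder wall-time limit

# Load TensorFlow; the module/environment name is site-specific.
module load tensorflow          # or: source activate tf-env

# Hypothetical name for the training script described in the lecture.
python mnist_mirrored.py
```

The job would then be submitted with `sbatch` (e.g. `sbatch submit.sh`), and Slurm makes the requested GPUs visible to the script, which MirroredStrategy then picks up automatically.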