Using TensorFlow on HEP Linux systems

TensorFlow is provided on CentOS 7 systems, both for normal CPU processing and with GPU acceleration.

Configuring Local TensorFlow Installs

CPU-only TensorFlow can be used on any CentOS 7 system.

The latest supported version can be configured by running the following (note that the configured versions of GCC, Python and TensorFlow can change):
  • source /user/software/tensorflow/setup-tensorflow-latest.sh
This version will also support GPU acceleration if it is run on a system with a suitable CUDA device. Note that systems with old CUDA cards may run slower on the GPU than on CPU alone.

You can install TensorFlow in your own virtualenv, which lets you install a different version from the default along with any other modules. Set up the versions of GCC, Python etc. you wish to use, then create the virtualenv:
  • virtualenv my-tensorflow
  • source my-tensorflow/bin/activate
  • pip install tensorflow
TensorFlow V2 onwards has support for both CPU and GPU in the same package.

TensorFlow V1 (V1.15 is the last supported version) can be installed in the same way, except that the CPU and GPU packages are separate, e.g.
  • pip install tensorflow==1.15
  • pip install tensorflow-gpu==1.15
If you wish to use GPU acceleration you will need to set up the correct version of cuDNN (which version depends on the version of TensorFlow installed), e.g.
  • source /user/software/cuda/cudnn-7.6-cuda10.1-x86_64/setup.sh
This must be set up in any new shell before running TensorFlow.
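
One way to confirm that the CUDA and cuDNN setup has taken effect is to ask TensorFlow which GPUs it can see. The following is a minimal sketch, assuming TensorFlow 2.x (some earlier releases expose the same function under tf.config.experimental instead). The helper name visible_gpus is hypothetical and not part of the local install.

```python
import importlib.util


def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf  # imported lazily so the check above can fail gracefully

    # Assumes the TensorFlow 2.x device-listing API
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not importable in this environment")
elif gpus:
    print("TensorFlow can use:", ", ".join(gpus))
else:
    print("No GPU visible, TensorFlow will run on CPU only")
```

An empty list on a GPU machine usually suggests that the cuDNN setup script has not been sourced in the current shell, or that its version does not match the installed TensorFlow.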

Running TensorFlow from Containers

TensorFlow builds in Docker containers are provided by the TensorFlow developers. These can be run on our local systems using Singularity, e.g. to run the standard CPU-only TensorFlow:
  • singularity run docker://tensorflow/tensorflow
This will take a few minutes to convert the first time it is run, but will start in seconds subsequently.

GPU support can be enabled by running with the gpu tag and enabling NVIDIA support, e.g.
  • singularity run --nv docker://tensorflow/tensorflow:latest-gpu
Note that this requires running on a system with a supported NVIDIA card. Older cards may well be slower than running directly on CPU.

Support for Jupyter is also included with the jupyter tags, e.g. latest-jupyter or latest-gpu-jupyter. Running one of these prints a URL for the Jupyter webpage, which can be accessed from any HEP system. To access it from a browser offsite you will need to use an SSH tunnel.

The container environment is based on Ubuntu so won't have the usual system-supplied software but should be able to access local user storage and networks as normal.

Using TensorFlow

When running TensorFlow you may see warnings such as "The TensorFlow library wasn't compiled to use XXX instructions". These can be ignored: the install is run on a variety of systems, so we have to use the lowest common denominator build. Installs can be run with extra optimisations or used with GPU acceleration as required.

Tutorials for getting started are available at

https://www.tensorflow.org/get_started/get_started

The Keras frontend is also installed alongside TensorFlow. Instructions are available on the Keras webpage

https://keras.io/

Multi-GPU Systems

By default TensorFlow will claim exclusive use of any suitable GPUs on the system, and reserve almost all of their RAM. This can be a problem if there are multiple users on the system wishing to use TensorFlow.
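
The memory reservation can often be reduced by asking TensorFlow to allocate GPU memory on demand rather than up front. The following is a minimal sketch, assuming TensorFlow 2.x; it is an optional mitigation rather than a site requirement, and the helper name enable_memory_growth is hypothetical.

```python
import importlib.util


def enable_memory_growth():
    """Switch every visible GPU to on-demand memory allocation.

    Returns the number of GPUs configured, or None if TensorFlow is not installed.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before any operation has initialised the GPU,
        # otherwise TensorFlow raises a RuntimeError
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


configured = enable_memory_growth()
print("GPUs set to grow memory on demand:", configured)
```

Call this at the very start of a script, before building any models or tensors. TensorFlow will still occupy the device itself, so this helps with memory sharing but does not replace limiting which GPUs are used.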

The specific GPUs TensorFlow uses can be limited with the environment variable CUDA_VISIBLE_DEVICES. For example if there are 3 GPU devices the TensorFlow session can be limited to just the first 2 with
  • CUDA_VISIBLE_DEVICES=0,1 python
Batch jobs submitted through Slurm are automatically constrained to the number of GPUs requested. For example, in a 1 GPU job on a 2 GPU system the allocated card will always be enumerated as GPU0.
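
Inside a Python script, the same restriction can be applied by setting CUDA_VISIBLE_DEVICES before TensorFlow is imported. The following is a minimal sketch; the helper name restrict_gpus is hypothetical and only illustrates the idea.

```python
import os


def restrict_gpus(indices):
    """Limit CUDA to the given device indices by setting CUDA_VISIBLE_DEVICES.

    Call this before importing tensorflow; changing the variable after CUDA
    has been initialised has no effect on the running process.
    """
    value = ",".join(str(index) for index in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Limit this process to the first two GPUs, as in the shell example above
restrict_gpus([0, 1])

# import tensorflow as tf  # import only after the variable has been set
```

Passing an empty list sets the variable to an empty string, which hides all GPUs and forces CPU-only execution.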

Python processes should be terminated if they are no longer needed, as this will allow any GPU devices claimed by TensorFlow to be made available to other users.
Topic revision: r13 - 08 Mar 2021, JohnBland