Using TensorFlow on HEP Linux systems

TensorFlow is provided on CentOS 7 systems, both for normal CPU processing and with GPU acceleration. Torch is usually provided by the same setup.

Configuring Local TensorFlow Installs

CPU-only TensorFlow can be used on any CentOS 7 system.

The latest supported version can be configured by running the following (note that the configured versions of GCC, Python and TensorFlow can change)

source /user/software/tensorflow/setup-tensorflow-latest.sh

This version will also support GPU acceleration if it is run on a system with a suitable CUDA device. Note that a system with an old CUDA card may run slower with the GPU than on CPU alone.

You can install TensorFlow in your own virtualenv, allowing you to install a different version than default and any other modules. Set up the versions of GCC, Python etc you wish to use then create the virtualenv

virtualenv my-tensorflow
source my-tensorflow/bin/activate
pip install tensorflow

TensorFlow V2 onwards has support for both CPU and GPU in the same package.

TensorFlow V1 (V1.15 is the last supported version) can be installed in the same way, except that the CPU and GPU packages are separate eg

pip install tensorflow==1.15
pip install tensorflow-gpu==1.15

If you wish to use GPU acceleration you will need to set up the correct version of cuDNN (the version required depends on the version of TensorFlow installed) eg

source /user/software/cuda/cudnn-7.6-cuda10.1-x86_64/setup.sh

This must be set up in any new shell before running TensorFlow.
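To choose a matching cuDNN setup script it can help to ask TensorFlow which CUDA and cuDNN versions it was built against. The following is a minimal sketch, assuming TensorFlow 2.3 or later (where tf.sysconfig.get_build_info() is available); the helper name tensorflow_build_info is hypothetical, and the CUDA entries are expected to be empty for CPU-only builds.

```python
import importlib.util


def tensorflow_build_info():
    """Return TensorFlow's version and the CUDA/cuDNN versions it was built against.

    Returns None if TensorFlow cannot be found in the current environment.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    # get_build_info() exists in TensorFlow 2.3+; older releases fall back to {}
    get_build_info = getattr(tf.sysconfig, "get_build_info", None)
    build = get_build_info() if get_build_info else {}
    return {
        "tensorflow": tf.__version__,
        "cuda": build.get("cuda_version"),    # None on CPU-only builds
        "cudnn": build.get("cudnn_version"),  # None on CPU-only builds
    }


print(tensorflow_build_info())
```

Run this inside the activated virtualenv, then source the cuDNN setup script that matches the reported versions.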

Running TensorFlow from Containers

TensorFlow builds in Docker containers are provided by the TensorFlow developers. These can be run on our local systems using Singularity, eg to run the standard CPU-only TensorFlow

singularity run docker://tensorflow/tensorflow

This will take a few minutes to convert the first time it is run, but will start in seconds subsequently.

GPU support can be enabled by running with the gpu tag and enabling NVIDIA support eg

singularity run --nv docker://tensorflow/tensorflow:latest-gpu

Note that this requires running on a system with a supported NVIDIA card. Older cards may well be slower than running directly on CPU.

Support for Jupyter is also included with the jupyter tags eg latest-jupyter or latest-gpu-jupyter. The container will print out a URL for the Jupyter webpage, which can be accessed from any HEP system. To access it from a browser offsite you will need to use an SSH tunnel.
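As an illustration of the SSH tunnel, the sketch below builds an ssh port-forwarding command from the URL that Jupyter prints. The helper name ssh_tunnel_command and the example hostnames (node01, gateway.example.org) are hypothetical, and it assumes the URL contains a hostname that the login host can reach.

```python
import shlex
from urllib.parse import urlparse


def ssh_tunnel_command(jupyter_url, login, local_port=None):
    """Build an ssh command forwarding a local port to the Jupyter server in jupyter_url."""
    parsed = urlparse(jupyter_url)
    remote_port = parsed.port or 8888  # 8888 is Jupyter's default port
    local_port = local_port or remote_port
    # -N: run no remote command; -L: forward local_port to host:remote_port via the login host
    forward = f"{local_port}:{parsed.hostname}:{remote_port}"
    return shlex.join(["ssh", "-N", "-L", forward, login])


# Hypothetical URL and login host, for illustration only
print(ssh_tunnel_command("http://node01:8888/?token=abc", "user@gateway.example.org"))
```

With the tunnel running, the notebook should be reachable in a local browser at the same URL with the hostname replaced by localhost.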

The container environment is based on Ubuntu so won't have the usual system-supplied software but should be able to access local user storage and networks as normal.

Using TensorFlow

When running TensorFlow you may see warnings about "The TensorFlow library wasn't compiled to use XXX instructions". These can be ignored: the install is run on a variety of systems, so we have to use the lowest common denominator build. Installs can be built with extra optimisations or used with GPU acceleration as required.
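If these messages are distracting, one option is TensorFlow's TF_CPP_MIN_LOG_LEVEL environment variable, which filters the output of the C++ backend. This is a minimal sketch; the helper name set_tf_log_level is hypothetical, and the choice of level 2 is an assumption, since whether this message is logged as information or as a warning varies between TensorFlow versions.

```python
import os


def set_tf_log_level(level=2):
    """Set TF_CPP_MIN_LOG_LEVEL: 0 shows all, 1 hides INFO, 2 also hides WARNING, 3 also hides ERROR."""
    if level not in (0, 1, 2, 3):
        raise ValueError("level must be 0, 1, 2 or 3")
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(level)
    return os.environ["TF_CPP_MIN_LOG_LEVEL"]


# Call this before the first "import tensorflow" in the process
set_tf_log_level(2)
```

Note that this also hides other backend warnings, so it is best left unset while debugging.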

Tutorials are provided for getting started, see

https://www.tensorflow.org/get_started/get_started

The Keras frontend is also installed alongside TensorFlow. There are instructions on the webpage

https://keras.io/

Multi-GPU Systems

By default TensorFlow will claim exclusive use of any suitable GPUs on the system, and reserve almost all of their RAM. This can be a problem if there are multiple users on the system wishing to use TensorFlow.

The specific GPUs TensorFlow uses can be limited with the environment variable CUDA_VISIBLE_DEVICES. For example if there are 3 GPU devices the TensorFlow session can be limited to just the first 2 with

CUDA_VISIBLE_DEVICES=0,1 python
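The same restriction can be applied from inside a script or notebook, provided the variable is set before TensorFlow is imported. The sketch below also asks TensorFlow to allocate GPU memory on demand through tf.config.experimental.set_memory_growth, which is one way to avoid reserving almost all of a card's RAM. The helper names restrict_gpus and enable_memory_growth are hypothetical.

```python
import os


def restrict_gpus(indices):
    """Expose only the given GPU indices to CUDA libraries loaded later in this process."""
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


def enable_memory_growth():
    """Ask TensorFlow to grow GPU memory use on demand instead of reserving it all up front."""
    import tensorflow as tf  # imported here so CUDA_VISIBLE_DEVICES is already set

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before the GPU is first used by TensorFlow
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus


# Typical use at the top of a script:
#   restrict_gpus([0, 1])
#   enable_memory_growth()
```

Both calls need to happen at the start of the session, before any TensorFlow operation touches a GPU.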

Slurm automatically constrains batch jobs to the number of GPUs requested. For example, in a 1 GPU job on a 2 GPU system the allocated card will always be enumerated as GPU0, whichever physical card it is.

Python processes should be terminated if they are no longer needed, as this will allow any GPU devices claimed by TensorFlow to be made available to other users.

Latest updates as of May 2025

Things move very fast in the machine learning and AI realm. Here are the steps relevant to people trying out machine learning on the alpha.ph.liv.ac.uk machine (AlmaLinux 9.5 with 6 A100 GPUs). We use JupyterLab as it is the most convenient way to bypass the technical difficulties of dealing with GPUs directly.

Set up and activate a Python 3.12 virtual environment

python3.12 -m venv my_virtual_env
source my_virtual_env/bin/activate

Install RAPIDS using the selector at https://docs.rapids.ai/install/#selector. alpha currently runs CUDA 12, so select CUDA 12 and the pip method, then copy the generated install command and run it.

Install JupyterLab and TensorFlow

pip install jupyterlab tensorflow[and-cuda]

You now have a personal JupyterLab install. Run it with

jupyter lab --ip=0.0.0.0

RAPIDS will provide GPU acceleration for pandas DataFrames if %load_ext cudf.pandas is run in the notebook before pandas is imported.

Check that you can access GPUs with

import tensorflow as tf
tf.config.list_physical_devices('GPU')

Analysing and storing data

In HEP we typically store data as TTrees in ROOT files. There are other ways to read and write such data, for example reading it into a pandas DataFrame and writing it out as a parquet file. One issue when converting a TTree to a DataFrame is variable-sized std::vector branches, which can be handled with awkward.flatten(...). It is also more user-friendly to use uproot instead of PyROOT.

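# Run in a Jupyter notebook: %load_ext is an IPython magic and must come before "import pandas"
import uproot
%load_ext cudf.pandas
import pandas as pd
import matplotlib.pyplot as plt
import awkward as ak
import numpy as np

# Open the ROOT file and read the 'mc_tracks' TTree into a DataFrame
tracksFile = "reco.root"
tracks = uproot.open(tracksFile)['mc_tracks']
arr_tracks = tracks.arrays(library="pd")
# Flatten the variable-sized (std::vector) branches, which uproot exposes with the "awkward" dtype
data_dict = {col: ak.flatten(arr_tracks[col]) for col in arr_tracks if arr_tracks[col].dtype == "awkward"}
arr_tracks = pd.DataFrame(data_dict)
arr_tracks = arr_tracks.dropna()
# Write the flattened table to a gzip-compressed parquet file (needs the fastparquet package)
parquet_file = 'reco.parquet.gzip'
arr_tracks.to_parquet(parquet_file, compression='gzip', engine='fastparquet')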
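To make the flattening step concrete, the sketch below reproduces in plain Python what ak.flatten does along its default axis (axis=1) to a jagged branch. The branch name track_pt and its values are made up for illustration, and the event_index helper is an addition not present in the code above: it shows one way to remember which event each flattened entry came from, since flattening discards that information.

```python
from itertools import chain


def flatten_jagged(events):
    """Concatenate per-event lists into one flat list, as ak.flatten does along axis 1."""
    return list(chain.from_iterable(events))


def event_index(events):
    """Return, for each flattened entry, the index of the event it came from."""
    return [i for i, entries in enumerate(events) for _ in entries]


# Hypothetical branch: one list of track momenta per event (the second event has no tracks)
track_pt = [[1.2, 0.7], [], [3.4]]
print(flatten_jagged(track_pt))  # [1.2, 0.7, 3.4]
print(event_index(track_pt))     # [0, 0, 2]
```

Flattened branches only line up as DataFrame columns if they have the same number of entries in every event, which is worth checking before building the table.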

Topic revision: r16 - 09 May 2025, MarkWong

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.