HPL (Linpack) with CUDA - Install Guide

I am writing this HPL with CUDA install guide to help others get Linpack installed on a CUDA-accelerated machine or cluster. I had trouble getting it installed and configured correctly, so I figured I would help others out by documenting the issues I ran into along the way. The hardware I used for this installation was 3 Dell PowerEdge R720s with the following configuration:

  • CPU: 2 x Intel Xeon E5-2660
  • RAM: 64GB (8 x 8GB) of DDR3-1333MHz Low-Voltage
  • GPU: 2 x nVidia Tesla M2090
  • Storage: 2 x 100GB Dell Solid State Drives
  • Interconnect: Mellanox QDR InfiniBand & built-in 10GbE Ethernet

The software stack I used is as follows (if I missed something in this list, please leave a comment and I will add it):

  • Operating System: StackIQ's Rocks+ 6.0.2 Cluster Software with CentOS 6.2 (64-bit) Download
  • GPU / CUDA Driver: nVidia Linux x64 Driver v310.19 (updated from the standard Rocks+ install)
    Download & Installation Guide
  • Compilers: GNU Compiler Collection v4.4.6 (in the standard Rocks+ install)
    Download & Installation Guide
  • BLAS / LAPACK Libraries: Intel MKL v11 (can be installed using an additional Rocks+ roll)
    Download & Installation Guide
  • InfiniBand Drivers: Mellanox OFED drivers (installed from the default Rocks+ install)
  • MPI: OpenMPI v1.4.5 (I had trouble building HPL with newer versions)

Now that you know what hardware and software stack was used, let's move on to actually installing everything. One note on notation for this guide: /installdir/ is whatever directory you choose to install HPL into. I have used both /home/username/ and /share/apps/ before and they both work, but the choice is up to you.

One last thing before we get started: make sure OpenMPI v1.4.5 is installed on your system, and that you have edited your .bashrc (or the equivalent login profile file on another OS; it is .bashrc on Red Hat Enterprise Linux and CentOS) to include the following:

# OpenMPI v1.4.5 Settings
export PATH=/openmpi-install-dir/bin:$PATH
export INCLUDE=/openmpi-install-dir/include:$INCLUDE
export LD_LIBRARY_PATH=/openmpi-install-dir/lib:$LD_LIBRARY_PATH
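If you want to sanity-check that the prepend pattern above actually takes effect, here is a quick shell test. It uses a throwaway directory under /tmp in place of your real OpenMPI prefix (that path is just an assumption for the demo):

```shell
# Throwaway prefix standing in for the real OpenMPI install dir (assumption).
OPENMPI_PREFIX=/tmp/openmpi-demo
mkdir -p "$OPENMPI_PREFIX/bin"

# Same prepend pattern as in .bashrc:
export PATH="$OPENMPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OPENMPI_PREFIX/lib:$LD_LIBRARY_PATH"

# The shell now searches the OpenMPI bin directory first:
case ":$PATH:" in
  *":$OPENMPI_PREFIX/bin:"*) echo "PATH updated" ;;
  *)                         echo "PATH not updated" ;;
esac
```

On a real system you would instead confirm the OpenMPI binaries resolve, e.g. with `which mpicc` or `mpirun --version`.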

Now we can install HPL on your system:

  1. Download the nVidia HPL v2.0 package (I am using v13 for this guide) from the nVidia Developer Zone after you register (link here) and place it into /installdir/.
  2. Open the shell and change to that directory:
    [root@linux root] $ cd /installdir/
  3. Untar the .tgz archive you just downloaded:
    [root@linux root] $ tar -xzf hpl-2.0_FERMI_v13.tgz
  4. Navigate into the directory that was just created:
    [root@linux root] $ cd hpl-2.0_FERMI_v13
  5. Open the Make.CUDA file for editing using your favorite editor (I prefer vim):
    [root@linux root] $ vim Make.CUDA
  6. Make the following changes to that file:
    NOTE: The "Line #xxx:" prefixes below are not part of the file; they are only there so you can easily find the line you are supposed to change.
    • Line #104: TOPdir = /directory_where_files_were_unzipped/hpl-2.0_FERMI_v13
    • Comment out Lines #119 through #122 (place a # in front of the line).
    • Line #132: LAdir = /opt/intel/composer_xe_2013.1.117/mkl/lib/intel64
    • Line #133: LAinc = -I/opt/cuda/include
    • Line #135: LAlib = -L$(TOPdir)/src/cuda -ldgemm -L/usr/local/cuda/lib64 -lcublas -lcuda -lcudart -L$(LAdir) -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core
    • Comment out Lines #212 through #213 (place a # in front of the line).
    • Line #215: CC = mpicc (mpiicc is Intel's MPI compiler wrapper; since we are using OpenMPI with GCC, use mpicc)
    • Line #216: CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp
  7. Make sure you save those changes to that file.
  8. Just as a special note, if your CUDA Toolkit isn’t installed in the /usr/local/cuda/ directory (mine is at /opt/cuda/), you will need to create a symlink to where it is installed. For me, that looked like this (run as root, or prefix the command with sudo):
    [root@linux root] $ ln -s /opt/cuda /usr/local/cuda
  9. Once you have made that link (if it is necessary), you then need to make everything:
    [root@linux root] $ make arch=CUDA
  10. After that completes, if everything went correctly, you should be able to go to the /installdir/hpl-2.0_FERMI_v13/bin/CUDA/ directory and see the following 5 files:
    • HPL.dat
    • HPL.datexample
    • outputexample
    • run_linpack
    • xhpl
  11. If you see those 5 files, then you are good to go. Now you are free to use the HPL.dat and run_linpack files to start tweaking Linpack for your cluster!
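Taken together, the edits from step 6 should leave the relevant lines of Make.CUDA looking roughly like the sketch below. TOPdir and the MKL/CUDA paths are from my install and are assumptions; substitute your own:

```makefile
# Sketch of the edited Make.CUDA lines (step 6). Paths below are from my
# machines and are assumptions -- substitute the ones for your system.
TOPdir  = /installdir/hpl-2.0_FERMI_v13
LAdir   = /opt/intel/composer_xe_2013.1.117/mkl/lib/intel64
LAinc   = -I/opt/cuda/include
LAlib   = -L$(TOPdir)/src/cuda -ldgemm -L/usr/local/cuda/lib64 \
          -lcublas -lcuda -lcudart \
          -L$(LAdir) -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core
CC      = mpicc
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp
```

If you change anything in this file later, re-run make arch=CUDA from step 9 to rebuild.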

Thanks for reading, and I hope this helped someone. If you have any questions, please feel free to reach out to me on either LinkedIn or Twitter (both links are in the header).