High-performance computing hardware is increasingly complex. Servers feature at least two processors, each with many cores, along with shared caches, non-uniform memory access, and possibly accelerators. The actual organization of these resources has a deep impact on HPC application performance, since computation and data transfer speeds depend on data locality. Unfortunately, this organization, as seen from the application, is unpredictable. Resources may be organized hierarchically or ordered horizontally in different ways from one machine to another, which makes topology assumptions highly non-portable and causes application performance to vary significantly even on apparently similar platforms.
This lecture will first detail the complexity of current hardware architectures and explain why it matters to HPC application performance. We will then introduce hwloc (Hardware Locality), a tool that aims to hide these low-level hardware details and non-portability issues. We will explain how hwloc models the organization of hardware resources in an abstract, portable manner and exposes it to applications in a simple way. Finally, we will show how hwloc eases the building of portable locality optimizations in HPC libraries and applications.