Overview
In this tutorial, we present Multi2Sim, a complete simulation infrastructure supporting the execution of OpenCL and CUDA programs on an architectural model of an APU (Accelerated Processing Unit), discrete GPU, or heterogeneous device. The infrastructure is composed of runtime libraries, compilers, and hardware models. Multi2Sim's runtime libraries replace the real-world, vendor-specific OpenCL and CUDA libraries. The hardware models simulate CPU and GPU pipelines at the ISA level on a cycle-by-cycle basis. A complete OpenCL application is simulated seamlessly by launching vendor-compliant host and device binaries.
As part of this tutorial, we present recently released support for the NVIDIA Kepler platform: a detailed, cycle-based GPU microarchitectural performance simulator that runs NVIDIA's Kepler shader assembly (SASS) code. We provide insight into the architecture of our NVIDIA Kepler GPU simulation, describing our models of the Streaming Multiprocessor, front end, and instruction pipelines.
We also present Multi2Sim-HSA, a heterogeneous system architecture (HSA) emulator that works at the HSA intermediate language (HSAIL) level. HSA is a new system architecture that emphasizes CPU-GPU collaboration, and a tool is needed to better understand the opportunities offered by this closer CPU-GPU relationship. Multi2Sim-HSA provides the user with low-level microarchitectural analysis and a low-level software debugger. In this presentation, we cover the system design of the emulator, including the runtime system, virtual driver, and virtual GPU device. We then take a deep dive into CPU-GPU communication and provide participants with sample use cases showing how Multi2Sim-HSA can be used to analyze CPU-GPU system behavior.
As a third element of the tutorial, we present Multi2C, our own compiler developed for the AMD Southern Islands simulator in Multi2Sim. Multi2C is a replacement for the proprietary compilers provided by GPU vendors. It is composed of a Clang-based frontend, an LLVM-based backend, and an assembler written with Flex and Bison.
The tutorial is organized in three parts, covering the software and hardware components involved in the execution of a GPU application. Each section of the tutorial is accompanied by simulation examples using working demos. All material needed to reproduce these demos, as well as the tutorial slides, will be available to tutorial participants.