
Vulnerability Analysis for GPU Systems

GPUs are highly parallel units designed primarily for graphics processing. They offer little support for detecting errors, especially transient errors, which have long been a significant design constraint in general-purpose processor design. Transient errors (or soft errors) are random bit flips that occur in a system due to radiation from energetic particles. Because protection schemes against soft errors often hurt performance and are very costly, it is essential to add protection where it matters most: at the most vulnerable parts of the system. In this research, we analyze the components of several GPU architectures and use the Architectural Vulnerability Factor (AVF) to classify those components by their vulnerability to soft errors.
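To make the AVF idea concrete, the sketch below uses the commonly cited definition: the AVF of a hardware structure is the fraction of its bits, averaged over the cycles of a run, whose corruption would change the program's visible outcome (often called ACE bits). This is only an illustration under that assumption, not the project's actual tooling. The function names `avf` and `rank_by_avf`, the structure names, and all of the numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def avf(ace_bits_per_cycle: List[int], total_bits: int) -> float:
    """Average fraction of a structure's bits that are ACE over a run.

    ace_bits_per_cycle[i] is the number of bits in cycle i whose corruption
    would affect the program outcome; total_bits is the structure's size.
    """
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    if not ace_bits_per_cycle:
        return 0.0
    cycles = len(ace_bits_per_cycle)
    return sum(ace_bits_per_cycle) / (total_bits * cycles)


def rank_by_avf(
    structures: Dict[str, Tuple[List[int], int]]
) -> List[Tuple[str, float]]:
    """Return (name, AVF) pairs sorted from most to least vulnerable."""
    scored = [(name, avf(ace, size)) for name, (ace, size) in structures.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up ACE-bit traces for two hypothetical GPU structures (not measured data).
example = {
    "register_file": ([256, 512, 256, 0], 1024),
    "shared_memory": ([64, 64, 64, 64], 512),
}

if __name__ == "__main__":
    for name, score in rank_by_avf(example):
        print(f"{name}: AVF = {score:.3f}")
```

In a ranking like this, structures with a higher AVF would be the first candidates for protection, since a bit flip there is more likely to affect the program's result.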
Participants
Fritz Gerald Previlon