fpchecker

Overview

FPChecker is a dynamic analysis tool to detect floating-point errors in HPC applications. It is the only tool of its class that supports the most common programming languages and models in HPC, including C/C++, MPI, OpenMP, and CUDA. It is designed to be easy to use and easy to integrate into applications. The tool provides a detailed HTML report that helps users identify the exact location of floating-point errors in the software.

Features

  • Easy to use: it requires only a few changes to the application build script, such as replacing the compiler (e.g., clang++) with the corresponding FPChecker compiler wrapper (e.g., clang++-fpchecker). The wrapper automatically instruments the code at build time.
  • Accurate detection: it detects errors dynamically (when the code is executed) for the specific inputs of a run, so it does not raise alarms for inputs that are never used.
  • Designed for HPC: it supports the most used programming languages and models in HPC: C/C++, MPI, OpenMP, Pthreads, and CUDA.
  • Detailed report: it provides a detailed report that programmers can use to identify the exact location (file and line number) of floating-point errors in the software.

Errors and Warnings

FPChecker detects the following floating-point issues:

  • Infinity +: detected when operations produce positive infinity, for example, when 1.0 / 0.0 occurs.
  • Infinity -: the same as Infinity +, except that the result is negative infinity, for example, when -1.0 / 0.0 occurs.
  • NaN: NaN (not a number) results from invalid operations, such as 0/0 or sqrt(-1).
  • Division by zero: This occurs when a finite nonzero number is divided by zero. The result is positive or negative infinity, depending on the signs of the operands (0/0 produces NaN instead).
  • Underflow (subnormal): Underflow is detected when an operation produces a subnormal number because the result is too small to be represented as a normal number.
  • Comparison: This occurs when two floating-point numbers are compared for equality. Because of rounding error, two values that are mathematically equal may differ in their floating-point representation, so equality checks can give unexpected results.
  • Cancellation: cancellation occurs when two nearly equal numbers are subtracted, which removes the leading significant digits of the result. By default, this event is detected when a subtraction loses at least ten decimal digits.
  • Latent Infinity +: detected when an operation produces a large normal number that is close to positive infinity.
  • Latent Infinity -: detected when an operation produces a large normal number that is close to negative infinity.
  • Latent underflow: detected when an operation produces a small normal number that is close to becoming an underflow (subnormal number).
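
The event categories above can be illustrated with a minimal C++ sketch. It is not FPChecker code: the helper names `classify_result`, `digits_lost`, and `loses_ten_digits` are hypothetical, and the digit-loss estimate (the base-10 logarithm of the larger operand magnitude over the magnitude of the difference) is only one plausible way to measure cancellation, assumed here for illustration.

```cpp
#include <cmath>
#include <limits>
#include <string>

// Hypothetical helper (not part of FPChecker): names the event category
// that a double-precision result falls into, or "none".
std::string classify_result(double r) {
    if (std::isnan(r)) return "NaN";
    if (std::isinf(r)) return r > 0 ? "Infinity +" : "Infinity -";
    if (std::fpclassify(r) == FP_SUBNORMAL) return "Underflow (subnormal)";
    return "none";
}

// Hypothetical estimate of the decimal digits cancelled by a - b,
// for finite inputs. This measure is an assumption of this sketch.
double digits_lost(double a, double b) {
    double diff = a - b;
    double largest = std::fmax(std::fabs(a), std::fabs(b));
    if (largest == 0.0) return 0.0;  // 0 - 0: nothing to cancel
    if (diff == 0.0)                 // equal operands: every digit cancels
        return std::numeric_limits<double>::max_digits10;
    // Clamp at zero: operands of opposite sign do not cancel.
    return std::fmax(0.0, std::log10(largest / std::fabs(diff)));
}

// Applies the default threshold quoted above (ten decimal digits).
bool loses_ten_digits(double a, double b) {
    return digits_lost(a, b) >= 10.0;
}
```

For instance, with `double zero = 0.0;`, calling `classify_result(1.0 / zero)` yields "Infinity +" and `classify_result(zero / zero)` yields "NaN", while `loses_ten_digits(1.000000000001, 1.0)` is true because the operands agree in roughly their first twelve digits.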

How FPChecker Works

FPChecker is designed as an extension of the Clang/LLVM compiler. When the application is compiled, an LLVM pass instruments the LLVM IR code after optimizations and inserts check code for every floating-point operation. The check code calls routines in the FPChecker runtime system, which detects several floating-point events (see above). When the program finishes executing, traces are saved in the current directory. These traces are then used to build a detailed report of the locations of the detected events.
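
The idea of inserting a check after each floating-point operation can be sketched at the source level. The real instrumentation happens on LLVM IR, and the names below (`EventLog`, `check_result`, `scale_instrumented`, and the file name "scale.cpp") are hypothetical and do not reflect FPChecker's actual runtime API; this is only a conceptual illustration.

```cpp
#include <cmath>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a runtime: counts events per (file, line).
struct EventLog {
    std::map<std::pair<std::string, int>, int> counts;
    void record(const std::string& file, int line) { ++counts[{file, line}]; }
};

// Hypothetical check routine: inspects the result of one operation,
// records an event if needed, and passes the value through unchanged.
inline double check_result(double r, const char* file, int line,
                           EventLog& events) {
    if (std::isnan(r) || std::isinf(r) ||
        std::fpclassify(r) == FP_SUBNORMAL) {
        events.record(file, line);
    }
    return r;
}

// What the programmer writes.
double scale(double x, double y) { return x / y; }

// Conceptually, what runs after instrumentation: the same division,
// followed by a call into the runtime with the source location.
double scale_instrumented(double x, double y, EventLog& events) {
    return check_result(x / y, "scale.cpp", 1, events);
}
```

In this sketch the computed value is unchanged by the check, so the instrumented program produces the same results as the original; only the log of events per source location is added, which is the kind of information a report can later be built from.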

Contact

For questions, contact Ignacio Laguna (ilaguna@llnl.gov).

To cite FPChecker, please use:

Laguna, Ignacio. "FPChecker: Detecting Floating-point Exceptions in GPU Applications." In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 1126-1129. IEEE, 2019.