fpchecker

Overview

FPChecker is a dynamic analysis tool for profiling floating-point behavior in HPC applications. It gives developers clear, execution-based insight into how floating-point arithmetic behaves under real workloads and is currently the only tool in its class tailored to the HPC domain. Designed for straightforward adoption, FPChecker integrates with existing build workflows and produces detailed reports that pinpoint the exact file and line locations of numerical issues, including exceptions, high accumulated rounding error, cancellation, and related events. Its rounding-error reports also support data-driven mixed-precision tuning by highlighting code locations where promotion to higher precision is most beneficial.

Features

  • Easy adoption: FPChecker requires only minor build-script updates, such as replacing the compiler invocation (for example, clang++) with FPChecker wrappers (for example, clang++-fpchecker). Instrumentation is then applied automatically at build time.
  • Execution-accurate detection: FPChecker reports issues based on actual execution for the selected inputs, reducing false alarms from paths that are not exercised.
  • Built for HPC workflows: FPChecker supports key HPC languages and programming models, including C/C++, MPI, and, in selected modes, OpenMP and Pthreads.
  • Actionable reporting: FPChecker produces detailed reports that identify the exact source location (file and line) of floating-point issues.
  • Mixed-precision guidance: Rounding-error accumulation reports highlight high-risk lines, helping users prioritize selective FP32-to-FP64 promotion based on measured numerical impact.

Errors and Warnings

FPChecker detects the following floating-point issues:

  • Infinity +: detected when operations produce positive infinity, for example, when 1.0 / 0.0 occurs.
  • Infinity -: detected when operations produce negative infinity, for example, when -1.0 / 0.0 occurs.
  • NaN: NaN (not a number) results from invalid operations, such as 0/0 or sqrt(-1).
  • Division by zero: occurs when a finite nonzero number is divided by zero. Under IEEE 754 the result is positive or negative infinity; the 0/0 case instead produces NaN.
  • Underflow (subnormal): detected when an operation produces a subnormal number, that is, a result too small in magnitude to be represented as a normal number.
  • Comparison: occurs when two floating-point numbers are compared for equality. Because rounding can make mathematically equal results differ in their last bits, exact equality tests can give unexpected outcomes.
  • Cancellation: occurs when two nearly equal numbers are subtracted, which wipes out leading significant digits. By default, this event is detected when a subtraction loses at least ten decimal digits.
  • Latent Infinity +: detected when an operation produces a large normal number that is close to overflowing to positive infinity.
  • Latent Infinity -: detected when an operation produces a large-magnitude normal number that is close to overflowing to negative infinity.
  • Latent underflow: detected when an operation produces a small normal number that is close to becoming subnormal (underflow).

Exponent Usage

FPChecker profiles the code and quantifies the exponent usage of the application in FP32 (single precision) and FP64 (double precision). In each format, the exponent range determines the magnitude of the numbers that can be represented. While the internal representation uses a base-2 exponent, the equivalent base-10 range is often used to give a more intuitive sense of the scale of numbers supported:

  • FP32: an approximate base-10 range from 10^-38 to 10^38.
  • FP64: an approximate base-10 range from 10^-308 to 10^308.

FPChecker can create histograms of the exponent usage in your application. These histograms show the numerical magnitudes your code operates on, which is useful when porting code to lower precision or mixed precision.

Rounding Error Tracking

FPChecker can also generate line-level rounding-error accumulation reports to identify numerically unstable operations in the code. These reports support a data-driven mixed-precision workflow by showing where selective FP32-to-FP64 promotion is most beneficial for accuracy while minimizing performance impact.

See Rounding Error Tracking for details.

How FPChecker Works

FPChecker is designed as an extension of the clang/LLVM compiler. When the application is compiled, an LLVM pass instruments the LLVM IR after optimizations, inserting check code for every floating-point operation. The check code calls routines in the FPChecker runtime system, which detects the floating-point events described above. When execution ends, traces are saved in the current directory. These traces are then used to build a detailed report of where the detected events occurred.

Contact

For questions, contact Ignacio Laguna (ilaguna@llnl.gov).

To cite FPChecker, please use:

Laguna, Ignacio. "FPChecker: Detecting Floating-point Exceptions in GPU Applications." In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 1126-1129. IEEE, 2019.

Laguna, Ignacio, Tanmay Tirpankar, Xinyi Li, and Ganesh Gopalakrishnan. "FPChecker: Floating-point exception detection tool and benchmark for parallel and distributed HPC." In 2022 IEEE International Symposium on Workload Characterization (IISWC), pp. 39-50. IEEE, 2022.