IEEE 754 Floating-Point Arithmetic Quiz

The following questions concern floating-point arithmetic as defined by the IEEE 754 standard. The 2008 revision generalized floating-point arithmetic and introduced three decimal formats. Here we consider only the binary floating-point formats single precision (32-bit) and double precision (64-bit). Most questions, however, are independent of the radix, the exponent range, and the precision.

In the sequel we denote the floating-point approximation of a real number by the projection \(\fl \colon \mathbb{R} \to \mathbb{F}\), where \(\mathbb{F}\) is any finite and discrete floating-point number system. Unless stated otherwise, we assume “round ties to even”, which is the default rounding direction for binary formats (IEEE 754-2008 §4.3.3).
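
The projection and the tie-breaking rule can be illustrated with a minimal Python sketch. It assumes CPython, whose `float` is a binary64 (double precision) value and whose integer and rational conversions round to nearest with ties to even. The helper name `fl` mirrors the notation above but is otherwise a choice made for this illustration.

```python
from fractions import Fraction


def fl(x):
    """Round an exact integer or rational to the nearest binary64 value.

    Assumption: float(Fraction) is correctly rounded (true for CPython).
    """
    return float(Fraction(x))


# Above 2**53, consecutive doubles are 2 apart, so odd integers are exact ties.
# 2**53 + 1 lies halfway between 2**53 and 2**53 + 2.
tie_low = fl(2**53 + 1)
# 2**53 + 3 lies halfway between 2**53 + 2 and 2**53 + 4.
tie_high = fl(2**53 + 3)

# In both cases the neighbour with an even significand is chosen.
print(tie_low == 2**53)
print(tie_high == 2**53 + 4)

# A projection is idempotent: rounding an already rounded value changes nothing.
one_third = fl(Fraction(1, 3))
print(fl(one_third) == one_third)
```

Under the stated assumptions each comparison should evaluate to `True`. Note that the two ties are resolved in opposite directions (one down, one up), which is the defining feature of rounding ties to even.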

To distinguish between elementary operations over the reals and over floating-point numbers, we introduce the floating-point operators \(\oplus,\ominus,\otimes,\oslash\) as counterparts of the real operations \({+},{-},{\times},{/}\), respectively. We then have \(x \oplus y = \fl(x + y)\) for floating-point numbers \(x,y\), with analogous definitions for \(\ominus,\otimes,\oslash\). Furthermore, we lift these operators to handle NaNs and infinities as defined by the IEEE 754 standard.
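
The identity \(x \oplus y = \fl(x + y)\) can be checked numerically. The following Python sketch computes the real sum exactly with rationals and then rounds once. It assumes that `float` is binary64, that the hardware addition is correctly rounded, and that `float(Fraction)` rounds to nearest with ties to even (as in CPython). The names `fl` and `oplus` are hypothetical helpers for this illustration and handle finite operands only.

```python
from fractions import Fraction


def fl(r):
    """Round an exact rational to the nearest binary64 value."""
    return float(r)


def oplus(x, y):
    """Model of floating-point addition: exact real sum, then one rounding."""
    return fl(Fraction(x) + Fraction(y))


x, y = 0.1, 0.2

# The exact sum of the two doubles, as a rational number.
exact_sum = Fraction(x) + Fraction(y)

# The built-in addition agrees with the model.
print(oplus(x, y) == x + y)

# The exact sum is not representable, so the result carries a rounding error.
print(exact_sum == Fraction(x + y))
```

Under these assumptions the first comparison should be `True` and the second `False`: the floating-point sum is the correctly rounded real sum, but it is not the real sum itself.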

To distinguish between numbers, infinities, and NaNs, the following notation is used. A number is set in normal font, like \(x, y, z\). A datum is either an infinity, a NaN, or a number and is set in boldface, like \(\mathbf{x}, \mathbf{y}, \mathbf{z}\).
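
The distinction between a number and a datum can be made concrete with a short Python sketch. It assumes that `float` follows IEEE 754 binary64 semantics for infinities and NaNs. The function `classify` is a hypothetical helper introduced only for this illustration.

```python
import math


def classify(d):
    """Return which of the three kinds of datum the float d is."""
    if math.isnan(d):
        return "NaN"
    if math.isinf(d):
        return "infinity"
    return "number"


finite = 1.5
neg_inf = -math.inf
# The lifted subtraction applied to two infinities of equal sign yields a NaN.
undefined = math.inf - math.inf

print(classify(finite))
print(classify(neg_inf))
print(classify(undefined))

# A NaN compares unequal to every datum, including itself.
print(undefined == undefined)
```

Only the first value is a number in the sense used here. All three are data.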

