1Learning Outcomes

Normalized numbers are only a fraction (heh) of floating point representations. For single-precision (32-bit), IEEE defines Table 1 based on the exponent field (here, the “biased exponent”):

Table 1:Exponent field values for IEEE 754 single-precision.

| Biased Exponent | Significand field | Description |
|---|---|---|
| 0 (00000000) | all zeros | zero |
| 0 (00000000) | nonzero | Denormalized numbers, aka denorms |
| 1 – 254 | anything | Normalized floating point (mantissa has implicit leading 1) |
| 255 (11111111) | all zeros | infinity |
| 255 (11111111) | nonzero | NaNs |
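
The classification in Table 1 can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the returned labels are our own choices, not part of the standard.

```python
def classify(bits: int) -> str:
    """Classify a 32-bit pattern using its exponent and significand fields."""
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    significand = bits & 0x7FFFFF    # 23-bit significand field
    if exponent == 0:
        return "zero" if significand == 0 else "denorm"
    if exponent == 255:
        return "infinity" if significand == 0 else "NaN"
    return "normalized"              # biased exponent 1 through 254


for pattern in (0x00000000, 0x00000001, 0x3F800000, 0x7F800000, 0x7FC00000):
    print(f"{pattern:#010x}: {classify(pattern)}")
```

The sign bit is ignored on purpose, because every category in Table 1 exists in both a positive and a negative form.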

In this section, we will motivate why these “special numbers” exist by considering the pitfalls of overflow and underflow. Then, we’ll define each of the special numbers.

2Overflow and Underflow

Because 0 and 255 are reserved exponent fields, the range of normalized single-precision floating point is $[-3.4\times 10^{38}, -2^{-126}]$ and $[+2^{-126}, +3.4\times 10^{38}]$. Note $2^{-126} \approx 1.2 \times 10^{-38}$.
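
As a quick check of these endpoints, the sketch below reinterprets the extreme normalized bit patterns with Python's standard `struct` module. The helper name `bits_to_float` is our own.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


largest_normal = bits_to_float(0x7F7FFFFF)   # exponent 254, significand all ones
smallest_normal = bits_to_float(0x00800000)  # exponent 1, significand all zeros

print(largest_normal)    # roughly 3.4e38
print(smallest_normal)   # roughly 1.2e-38, i.e. 2^-126
```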

Because the floating point standard represents fractional components, it must now consider both overflow and underflow (Figure 1):

"TODO"

Figure 1:Floating point representations can encounter both overflow and underflow.

Recall that with integers, integer overflow causes arithmetic results to “wrap around.” This means adding large positive integers can result in negative integers. Unlike integer representations, floating point representations that follow the IEEE 754 standard can handle overflow, underflow, and errors more “gracefully” with special numbers, which we discuss next.

When floating point arithmetic causes overflow, we signal infinity or directly represent arithmetic errors with NaN. With underflow, we gradually move towards zero with denorms.
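
The contrast can be observed directly in Python. Note the assumptions: Python's `float` is double-precision, so the thresholds differ from the single-precision values in this section, and the helper `wrap_int32` is a hypothetical name that simulates 32-bit two's complement with masking (Python integers do not overflow on their own).

```python
import sys


def wrap_int32(n: int) -> int:
    """Simulate 32-bit two's complement wrap-around."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


# Integer overflow: the largest 32-bit integer plus one wraps to a negative.
print(wrap_int32((2**31 - 1) + 1))

# Floating point overflow: the result becomes infinity instead of wrapping.
big = sys.float_info.max
print(big * 2)

# Floating point underflow: below the smallest normalized double,
# results shrink gradually (denorms) rather than jumping straight to zero.
tiny = sys.float_info.min
print(tiny / 4)
```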

3Special Numbers

See the four non-normalized categories shown in Table 1.

3.1Zero

Just like sign-magnitude representation, IEEE 754 floating point has two zeros (Table 2). Recall that the standard was built for scientific computing! Having two zeros is mathematically useful. Two examples: limits towards zero and computing $\pm\infty$ (the latter of which we discuss next).

Table 2:IEEE 754 single-precision: Zero

| value | s | exponent | significand |
|---|---|---|---|
| +0 | 0 | 0000 0000 | 000 0000 0000 0000 0000 0000 |
| -0 | 1 | 0000 0000 | 000 0000 0000 0000 0000 0000 |

If we consider its mathematical representation, zero is our first encounter with a floating point representation that is not normalized. After all, 0.0 in scientific notation has no leading 1!

Floating point hardware often implements zero by reserving the biased exponent value 00000000 to signal no normalization, i.e., not to implicitly add 1. If the significand is additionally all zeros, then the hardware knows the value is zero. If the significand is nonzero, we represent other non-normalized numbers, which we discuss below as denormalized numbers.
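
To see both zeros, we can decode the two all-zero-exponent, all-zero-significand patterns. This is a small sketch with the standard `struct` and `math` modules; `bits_to_float` is our own helper name.

```python
import math
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


pos_zero = bits_to_float(0x00000000)  # sign 0, exponent 0, significand 0
neg_zero = bits_to_float(0x80000000)  # sign 1, exponent 0, significand 0

print(pos_zero, neg_zero)
print(pos_zero == neg_zero)           # the two zeros compare equal
print(math.copysign(1.0, neg_zero))   # but the sign bit is still observable
```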

3.2Infinity

The IEEE 754 standard defines positive infinity ($+\infty$) and negative infinity ($-\infty$), as shown in Table 3. To represent infinity, we reserve the biased exponent value 11111111 and set the significand to zero.

Table 3:IEEE 754 single-precision: Infinity

| value | s | exponent | significand |
|---|---|---|---|
| $+\infty$ | 0 | 1111 1111 | 000 0000 0000 0000 0000 0000 |
| $-\infty$ | 1 | 1111 1111 | 000 0000 0000 0000 0000 0000 |

Because infinity is such an important concept in mathematics, the standard differentiates infinity from other arithmetic errors (which we discuss next). Importantly, dividing a nonzero number by $\pm 0$ yields $\pm\infty$. Comparisons like $x / 0 > y$ should be representable,[1] even if not as actual “numbers.”
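
The sketch below decodes the two infinity patterns from Table 3 and shows that they take part in comparisons. One caveat: Python itself raises `ZeroDivisionError` for `1.0 / 0.0` instead of returning infinity, so we build the values from their bit patterns rather than by dividing. `bits_to_float` is our own helper name.

```python
import math
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


pos_inf = bits_to_float(0x7F800000)  # sign 0, exponent all ones, significand 0
neg_inf = bits_to_float(0xFF800000)  # sign 1, exponent all ones, significand 0

print(pos_inf, neg_inf)
print(pos_inf > 3.4e38)    # larger than any finite single-precision value
print(neg_inf < -3.4e38)   # smaller than any finite single-precision value
```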

3.3Not a Number (NaN)

What if we try to compute invalid arithmetic, like $\sqrt{-4}$ or $0/0$? For scientific computing, it may be more valuable to “bubble” these errors up to the user, instead of explicitly crashing the program or computing incorrect values due to wrap-around (e.g., in integer overflow).

NaNs (Not a Number) are values of the following form (Table 4):

Table 4:IEEE 754 single-precision: NaNs

| s | exponent | significand |
|---|---|---|
| either | 1111 1111 | nonzero |

Because NaNs mark the result of invalid arithmetic (note the reserved all-ones exponent), they contaminate later arithmetic: $\text{op}(\text{NaN}, x) = \text{NaN}$.

Certain proprietary floating point hardware goes further and uses the significand to encode or identify where errors occur. This practice of error codes is not defined in the standard.
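
NaN contamination is easy to observe. The sketch below decodes one NaN pattern (`0x7FC00000`, chosen arbitrarily among the many patterns with a nonzero significand) and pushes it through ordinary arithmetic; `bits_to_float` is our own helper name.

```python
import math
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


nan = bits_to_float(0x7FC00000)        # exponent all ones, significand nonzero

print(math.isnan(nan))
print(math.isnan(nan + 1.0))           # arithmetic on a NaN stays NaN
print(math.isnan(nan * 0.0))
print(math.isnan(math.inf - math.inf)) # an invalid operation produces NaN
print(nan == nan)                      # a NaN is not equal even to itself
```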

3.4Denormalized Numbers

3.4.1Gap around zero

In the case of overflow, infinity seems reasonable; after all, it is one step size past the largest representable normalized float (approximately $3.4\times 10^{38}$). Similarly, with underflow, zero is one step past the smallest representable normalized float ($2^{-126}$).

However, when we consider the mathematical range in question in Figure 2, we observe a large gap around zero.

"TODO"

Figure 2:Because of underflow, there is a “gap” of representable numbers around zero.

Magnitude-wise, this gap is not huge: $2^{-126}$ is tiny! However, consider the 23-bit precision of floats. For normalized numbers in this area, we can use our precision to take tiny step sizes of $2^{-149}$.

In this range, we want to maintain high precision to represent tiny steps between said tiny numbers. However, because of the implicit 1 in the normalized mantissa (and zero’s lack thereof), the step from 0 to the smallest normalized number ($2^{-126}$) is relatively huge: it is $2^{23}$ times the step between the smallest and second-smallest normalized numbers ($2^{-149}$).
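
The size of this gap can be checked numerically. The sketch below compares the distance from zero to the smallest normalized float against the distance between the two smallest normalized floats; `bits_to_float` is our own helper name.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


smallest_normal = bits_to_float(0x00800000)  # 1.00...0 x 2^-126
next_normal = bits_to_float(0x00800001)      # 1.00...1 x 2^-126

step = next_normal - smallest_normal         # spacing between normalized floats here
gap = smallest_normal - 0.0                  # spacing if nothing lies between 0 and 2^-126

print(step == 2.0**-149)
print(gap / step)                            # the gap is 2^23 steps wide
```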

3.4.2Gradual underflow

Given the above, the IEEE 754 standard specifies a range of numbers that can still be used when we encounter underflow, so that not all arithmetic is lost. Denormalized numbers in the standard help support gradual underflow.[2]

The IEEE 754 standard defines denormalized numbers of the form in Equation (1).

$$(-1)^\text{s} \times (\text{significand}) \times 2^{-126} \tag{1}$$

The standard specifies how to interpret fields for representing denormalized numbers, also known as denorms (Table 5):

Table 5:Sign, exponent, and significand fields for denorms

| Field Name | Represents | Denormalized Numbers |
|---|---|---|
| s | Sign | 1 is negative; 0 is positive |
| exponent | 0000 0000 | The exponent for denormalized numbers is always implicitly -126. |
| significand | Fractional Component of the Mantissa | Interpret the significand as a 23-bit fraction (0.xx...xx). Do not add implicit 1 to get the mantissa value. |
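
Equation (1) and Table 5 translate into a short decoder. This is a sketch under our own naming (`decode_denorm`, `bits_to_float`); it cross-checks the formula against the value that Python's `struct` module produces for the same bit pattern.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def decode_denorm(bits: int) -> float:
    """Apply Equation (1) to a pattern whose exponent field is 0000 0000."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    significand = bits & 0x7FFFFF
    if exponent != 0:
        raise ValueError("not a denorm: exponent field is nonzero")
    fraction = significand / 2**23           # 0.xx...xx, no implicit 1
    return (-1) ** sign * fraction * 2.0**-126


for pattern in (0x00000001, 0x00400000, 0x007FFFFF, 0x80000001):
    print(f"{pattern:#010x}: {decode_denorm(pattern)} "
          f"(matches struct: {decode_denorm(pattern) == bits_to_float(pattern)})")
```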

The “implicit exponent” for denorms is the smallest normalized exponent: $2^{1 - 127} = 2^{-126}$. This denormalized exponent therefore enforces a uniform step size of $2^{-149}$ across the denormalized range and the smallest normalized numbers[3]. This consistency also yields the gradual underflow we want, as shown in Figure 3:

"TODO"

Figure 3:Gradual underflow by specifying denormalized numbers in the IEEE 754 standard.
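
The uniform step size can be spot-checked by stepping one bit pattern at a time across the boundary between denorms and normalized numbers. The helper names `bits_to_float` and `step_after` below are our own.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def step_after(bits: int) -> float:
    """Distance from this float to the next larger bit pattern."""
    return bits_to_float(bits + 1) - bits_to_float(bits)


# zero, smallest denorm, largest denorm, smallest normalized number
for pattern in (0x00000000, 0x00000001, 0x007FFFFF, 0x00800000):
    print(f"{pattern:#010x}: step is 2^-149? {step_after(pattern) == 2.0**-149}")
```

The third pattern is the interesting one: the step from the largest denorm to the smallest normalized number has the same size as every step inside the denormalized range.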

Footnotes
  1. We defer to math majors.

  2. If a denormalized number results from arithmetic of two normalized numbers, we still say that underflow occurred. Put another way, denorms help preserve arithmetic precision during underflow.

  3. We leave it to you to work this out.