Special Numbers - CS 61C Course Notes

1 Learning Outcomes

Understand how the IEEE 754 standard represents zero, infinity, and NaNs
Understand what overflow and underflow mean with floating point numbers
Understand how denormalized numbers implement “gradual” underflow
Convert denormalized numbers into their decimal counterpart

🎥 Lecture Video (overflow and underflow)

Overflow and Underflow, 6:54 - 8:40

🎥 Lecture Video (everything else)

Normalized numbers are only a fraction (heh) of floating point representations. For single-precision (32-bit), IEEE 754 distinguishes the categories in Table 1 based on the exponent field (here, the “biased exponent”):

Table 1: Exponent field values for IEEE 754 single-precision.

Biased Exponent	Significand field	Description
0 (`00000000`)	all zeros	zero
0 (`00000000`)	nonzero	Denormalized numbers, aka denorms
1 – 254	anything	Normalized floating point (mantissa has implicit leading 1)
255 (`11111111`)	all zeros	infinity
255 (`11111111`)	nonzero	`NaN`s
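The rules in Table 1 can be sketched as a small classifier. This is only an illustration: the function name `classify_float32` is our own (hypothetical), and it takes the 32 bits as a plain Python integer.

```python
def classify_float32(bits: int) -> str:
    """Classify a 32-bit pattern using the exponent/significand rules of Table 1."""
    exponent = (bits >> 23) & 0xFF    # 8-bit biased exponent field
    significand = bits & 0x7FFFFF     # 23-bit significand field
    if exponent == 0:
        return "zero" if significand == 0 else "denorm"
    if exponent == 255:
        return "infinity" if significand == 0 else "NaN"
    return "normalized"               # biased exponent 1 through 254


for pattern in (0x00000000, 0x00000001, 0x3F800000, 0x7F800000, 0x7FC00000):
    print(f"{pattern:#010x}: {classify_float32(pattern)}")
```

Note that the sign bit plays no role in the classification, so each category has both a positive and a negative member.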

In this section, we will motivate why these “special numbers” exist by considering the pitfalls of overflow and underflow. Then, we’ll define each of the special numbers.

2 Overflow and Underflow

Because 0 and 255 are reserved exponent fields, the range of normalized single-precision floating point is $[-3.4\times 10^{38}, -2^{-126}]$ and $[+2^{-126}, +3.4\times 10^{38}]$ . Note $2^{-126} \approx 1.2 \times 10^{-38}$ .

Explanation:

Largest magnitude number: (1.1…1) $\times 2^{(254 - 127)} \approx 3.4 \times 10^{38}$. Note the largest biased exponent is 254, because 255 is reserved for infinity and NaNs.

s	exponent	significand
`s`	`1111 1110`	`111 1111 1111 1111 1111 1111`

Smallest magnitude number: (1.0) $\times 2^{(1-127)} = 2^{-126} \approx 1.2 \times 10^{-38}$. Note the smallest biased exponent is 1, because 0 (all zeros) is reserved for zero and denorms.

s	exponent	significand
`s`	`0000 0001`	`000 0000 0000 0000 0000 0000`
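As a quick check of the two bit patterns above, we can reinterpret them as single-precision values with Python's standard `struct` module. The helper name `bits_to_float` is our own (hypothetical), and we take the sign bit `s` to be 0.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


largest = bits_to_float(0x7F7FFFFF)   # exponent 1111 1110, significand all ones
smallest = bits_to_float(0x00800000)  # exponent 0000 0001, significand all zeros

print(largest)                                # roughly 3.4e38
print(smallest)                               # roughly 1.2e-38
print(largest == (2 - 2**-23) * 2.0**127)     # (1.1...1) x 2^(254 - 127)
print(smallest == 2.0**-126)                  # (1.0) x 2^(1 - 127)
```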

Because the floating point standard represents fractional components, it must now consider both overflow and underflow (Figure 1):

Overflow: Magnitude of value is too large to represent.
Underflow: Magnitude of value is too small to represent.

"TODO" — Figure 1:Floating point representations can encounter both overflow and underflow.

Recall that with integers, integer overflow causes arithmetic results to “wrap around.” This means adding large positive integers can result in negative integers. Unlike integer representations, floating point representations such as the IEEE 754 standard can more “gracefully” handle overflow, underflow, and errors with special numbers, which we discuss next.

When floating point arithmetic causes overflow, we signal infinity or directly represent arithmetic errors with NaN. With underflow, we gradually move towards zero with denorms.

3 Special Numbers

See the four non-normalized categories shown in Table 1.

3.1 Zero

Just like sign-magnitude integers, IEEE 754 floating point has two zeros (Table 2). Recall that the standard was built for scientific computing! Having two zeros is mathematically useful. Two examples: limits towards zero and computing $\pm \infty$ (the latter of which we discuss next).

Table 2: IEEE 754 single-precision: Zero

value	s	exponent	significand
+0	`0`	`0000 0000`	`000 0000 0000 0000 0000 0000`
-0	`1`	`0000 0000`	`000 0000 0000 0000 0000 0000`

If we consider its mathematical representation, zero is our first encounter with a floating point representation that is not normalized. After all, 0.0 in scientific notation has no leading 1!

Floating point hardware often implements zero by reserving the biased exponent value zero (`00000000`) to signal no normalization, i.e., not to implicitly add 1. If the significand is additionally all zeros, then the hardware knows the value is zero. If the significand is nonzero, we represent other non-normalized numbers, which we discuss below as denormalized numbers.
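To see the two zeros from Table 2 concretely, here is a minimal sketch using Python's standard `struct` and `math` modules. The helper name `float_to_bits` is our own (hypothetical).

```python
import math
import struct


def float_to_bits(x: float) -> int:
    """Return the 32-bit single-precision pattern that encodes x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


pos_zero, neg_zero = 0.0, -0.0

print(f"{float_to_bits(pos_zero):032b}")   # all 32 bits are zero
print(f"{float_to_bits(neg_zero):032b}")   # only the sign bit is set
print(pos_zero == neg_zero)                # True: the two zeros compare equal
print(math.copysign(1.0, neg_zero))        # -1.0: the sign is still observable
```

Even though the two zeros compare equal, the sign bit is preserved, which is what lets later computations distinguish them.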

3.2 Infinity

The IEEE 754 standard defines positive infinity ( $+\infty$ ) and negative infinity ( $-\infty$ ), as shown in Table 3. To represent infinity, we reserve the biased exponent value `11111111` and set the significand to zero.

Table 3: IEEE 754 single-precision: Infinity

value	s	exponent	significand
$+\infty$	`0`	`1111 1111`	`000 0000 0000 0000 0000 0000`
$-\infty$	`1`	`1111 1111`	`000 0000 0000 0000 0000 0000`

Because infinity is such an important concept in mathematics, the standard differentiates infinity from other arithmetic errors (which we discuss next). Importantly, dividing a nonzero number by $\pm 0$ yields $\pm \infty$. Computations like $x / 0 > y$ should be representable,^[1] even if not as actual “numbers.”
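The encodings in Table 3 can be checked with a short sketch (the helper name `bits_to_float` is our own, hypothetical). One caveat: Python itself raises `ZeroDivisionError` on float division by zero instead of returning infinity, so we show the reverse direction, where dividing by $\pm \infty$ yields $\pm 0$.

```python
import math
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


pos_inf = bits_to_float(0x7F800000)   # s = 0, exponent all ones, significand zero
neg_inf = bits_to_float(0xFF800000)   # s = 1, exponent all ones, significand zero

print(pos_inf, neg_inf)
print(pos_inf > 3.4e38)               # True: infinity compares above every finite float
print(1.0 / pos_inf)                  # 0.0
print(1.0 / neg_inf)                  # -0.0, the sign carries through
```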

3.3 Not a Number (NaN)

What if we try to compute invalid arithmetic, like $\sqrt{-4}$ or $0/0$ ? For scientific computing, it may be more valuable to “bubble” these errors up to the user–instead of explicitly crashing the program or computing incorrect values due to wrap-around (e.g., in integer overflow).

NaNs (Not a Number) are values of the following form (Table 4):

Table 4: IEEE 754 single-precision: NaNs

s	exponent	significand
either	`1111 1111`	non-zero

NaNs share the all-ones exponent with infinity, but they signal invalid operations rather than overflow. They also contaminate later arithmetic: $\text{op}(\text{NaN}, x) = \text{NaN}$.

Certain proprietary floating point hardware goes further and uses the significand to encode or identify where errors occur. The standard does not define these error codes.
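A brief sketch of NaN behavior, using one particular NaN bit pattern (`0x7FC00000`, chosen here only as an example of a nonzero significand). The helper name `bits_to_float` is our own (hypothetical).

```python
import math
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


nan = bits_to_float(0x7FC00000)          # exponent all ones, significand nonzero

print(math.isnan(nan))                   # True
print(math.isnan(nan + 1.0))             # True: arithmetic on NaN yields NaN
print(math.isnan(nan * 0.0))             # True
print(nan == nan)                        # False: NaN is not equal even to itself
print(math.isnan(math.inf - math.inf))   # True: an invalid operation produces NaN
```

Because NaN never compares equal to anything, checks should use a dedicated predicate like `math.isnan` rather than `==`.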

3.4 Denormalized Numbers

3.4.1 Gap around zero

In the case of overflow, infinity seems reasonable; after all, it is one step size past the largest representable normalized float (approximately $3.4\times 10^{38}$). Similarly, with underflow, zero is indeed one step size past the smallest representable normalized float ($2^{-126}$).

However, when we consider the mathematical range in question in Figure 2, we observe a large gap around zero.

Magnitude-wise, this gap is not huge: $2^{-126}$ is tiny! However, consider the 23-bit precision of floats. For normalized numbers in this area, we can use our precision to take tiny step sizes of $2^{-149}$.

Explanation:

Smallest normalized number from before: (1.0) $\times 2^{(1-127)} = 2^{-126}$

s	exponent	significand
`0`	`0000 0001`	`000 0000 0000 0000 0000 0000`

Second smallest normalized number: (1.00...001) $\times 2^{(1-127)} = (1 + 2^{-23}) \times 2^{-126} = 2^{-126} + 2^{-149}$

s	exponent	significand
`0`	`0000 0001`	`000 0000 0000 0000 0000 0001`

Smallest normalized step size is this difference: $2^{-149}$
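The arithmetic above can be verified numerically. In this sketch (helper name `bits_to_float` is our own, hypothetical), we compare the step between the two smallest normalized numbers against the gap between zero and the smallest normalized number.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


smallest = bits_to_float(0x00800000)   # (1.0) x 2^-126
second = bits_to_float(0x00800001)     # (1.00...001) x 2^-126, significand ends in 1

step = second - smallest               # distance between neighboring normalized floats
gap = smallest - 0.0                   # distance from zero to the smallest normalized float

print(step == 2.0**-149)               # True
print(gap / step)                      # 8388608.0, i.e. 2^23
```

Without denorms, the jump from zero to the first normalized number would be $2^{23}$ times larger than the very next step.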

In this range, we want to maintain high precision to represent tiny steps between said tiny numbers. However, because of the implicit 1 in the normalized mantissa (and zero’s lack thereof), the step from 0 to the smallest normalized number ($2^{-126}$) is relatively huge compared to the step from the smallest to the second-smallest normalized number ($2^{-149}$).

3.4.2 Gradual underflow

Given the above, the IEEE 754 standard specifies a range of numbers that can still be used when we encounter underflow, so that not all arithmetic is lost. Denormalized numbers in the standard help support gradual underflow.^[2]
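One way to watch gradual underflow happen is to repeatedly halve the smallest normalized number and round each result back to single precision. The sketch below assumes Python's `struct` module rounds to the nearest single-precision value (true on common IEEE 754 platforms); the helper name `to_float32` is our own (hypothetical).

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


x = 2.0**-126          # smallest normalized single-precision value
nonzero_steps = 0      # how many halvings still give a nonzero result

while True:
    x = to_float32(x / 2)
    if x == 0.0:
        break
    nonzero_steps += 1

print(nonzero_steps)   # 23: one denorm per significand bit before reaching zero
```

Instead of dropping straight to zero, the value passes through $2^{-127}, 2^{-128}, \dots, 2^{-149}$ first.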

The IEEE 754 standard defines denormalized numbers of the form in Equation (1).

$$(-1)^{\text{s}} \times (\text{significand}) \times 2^{-126} \tag{1}$$

The standard specifies how to interpret fields for representing denormalized numbers, also known as denorms (Table 5):

Table 5: Sign, exponent, and significand fields for denorms

Field Name	Represents	Denormalized Numbers
s	Sign	1 is negative; 0 is positive
exponent	`0000 0000`	The exponent for denormalized numbers is always implicitly -126.
significand	Fractional Component of the Mantissa	Interpret the significand as a 23-bit fraction (`0.xx...xx`). Do not add implicit 1 to get the mantissa value.
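Equation (1) and Table 5 translate fairly directly into code. The following is a minimal sketch, not a reference implementation: `decode_denorm` and `bits_to_float` are our own (hypothetical) names, and we cross-check the hand decoding against Python's `struct` module.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def decode_denorm(bits: int) -> float:
    """Decode a zero/denorm pattern by hand, following Equation (1)."""
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    significand = bits & 0x7FFFFF
    if exponent != 0:
        raise ValueError("exponent field must be 0000 0000 for a denorm")
    fraction = significand / 2**23     # 0.xx...xx, with no implicit leading 1
    return (-1) ** sign * fraction * 2.0**-126


for pattern in (0x00000001, 0x007FFFFF, 0x80400000):
    print(f"{pattern:#010x}", decode_denorm(pattern),
          decode_denorm(pattern) == bits_to_float(pattern))
```

For example, `0x00000001` is the smallest positive denorm, $2^{-23} \times 2^{-126} = 2^{-149}$, and `0x80400000` is $-0.1_2 \times 2^{-126} = -2^{-127}$.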

The “implicit exponent” for denorms is the smallest normalized exponent: $2^{1 - 127} = 2^{-126}$. This denormalized exponent therefore enforces a uniform step size of $2^{-149}$ across the denormalized range and the smallest normalized numbers^[3]. This consistency also yields the gradual underflow we want, as shown in Figure 3.
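The uniform step size can be spot-checked by measuring the distance between neighboring bit patterns on both sides of the denorm/normalized boundary. This is a sketch over a handful of sample patterns, not an exhaustive proof; `bits_to_float` and `step_after` are our own (hypothetical) helper names.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def step_after(bits: int) -> float:
    """Distance from the value encoded by `bits` to the next bit pattern's value."""
    return bits_to_float(bits + 1) - bits_to_float(bits)


# Zero, small denorms, the largest denorms, and numbers with biased exponent 1.
samples = [0x00000000, 0x00000001, 0x007FFFFE, 0x007FFFFF, 0x00800000, 0x00FFFFFF]
uniform = all(step_after(b) == 2.0**-149 for b in samples)

print(uniform)                                # True: every sampled step is 2^-149
print(step_after(0x01000000) == 2.0**-148)    # True: the step doubles at biased exponent 2
```

In particular, the step from the largest denorm (`0x007FFFFF`) to the smallest normalized number (`0x00800000`) is also $2^{-149}$, so there is no jump at the boundary.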

Footnotes

[1] We defer to math majors.
[2] If a denormalized number results from arithmetic of two normalized numbers, we still say that underflow occurred. Put another way, denorms help preserve arithmetic precision during underflow.
[3] We leave it to you to work this out.