IEEE-754

Floating Point Made Simple!

The IEEE-754 standard defines how computers represent floating point numbers, that is, numbers with a fractional part such as 3.14 or -0.001.

It helps ensure that all systems (computers, languages, CPUs) store and calculate with floating point numbers consistently.

Why Do We Need IEEE-754?

Computers use binary (1s and 0s), and many fractions, such as 1/3 or 0.1, have no finite binary expansion, so they cannot be stored exactly. IEEE-754 defines a binary format for approximating real numbers so they can be stored and calculated efficiently with a predictable level of precision.

What Is a Floating Point Number?

A floating point number has three parts:

  • Sign: is the number positive or negative?
  • Exponent: how far do we shift the point (the binary point, since the format is base 2)?
  • Mantissa (also called significand): the actual digits of the number

It works similarly to scientific notation.

Example:

1234.56 in scientific notation = 1.23456 × 10^3

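In binary, IEEE-754 stores ordinary (normalized) numbers as:

value = (-1)^sign × 1.mantissa × 2^(exponent - bias)

Here "exponent" is the stored exponent field and "bias" is a fixed constant for each format (127 for single precision). The leading 1 in "1.mantissa" is implied and not stored.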
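To make the formula concrete, here is a minimal Python sketch that applies it to a 32-bit pattern. The function name decode_float32 is made up for this illustration, and the sketch assumes single precision (bias 127) and deliberately rejects the special exponent values, where the formula does not apply.

```python
def decode_float32(bits: int) -> float:
    """Decode a 32-bit pattern for a normal number using the formula above."""
    sign = (bits >> 31) & 0x1        # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits, stored with the bias added
    fraction = bits & 0x7FFFFF       # 23 mantissa bits after the implied leading 1

    # Exponent fields of all 0s or all 1s mark zeros, denormals, infinities, NaN.
    if exponent in (0, 0xFF):
        raise ValueError("special value: the normal-number formula does not apply")

    bias = 127                            # single-precision bias
    significand = 1 + fraction / 2**23    # this is "1.mantissa"
    return (-1) ** sign * significand * 2.0 ** (exponent - bias)


print(decode_float32(0x3FC00000))  # 1.5
print(decode_float32(0xC0000000))  # -2.0
```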

IEEE-754 Formats

The most common formats are:

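Format | Total Bits | Sign Bits | Exponent Bits | Mantissa (Fraction) Bits | Approx. Decimal Digits
-------|------------|-----------|---------------|--------------------------|-----------------------
Half   | 16         | 1         | 5             | 10                       | ~3-4
Single | 32         | 1         | 8             | 23                       | ~7
Double | 64         | 1         | 11            | 52                       | ~15-17
Quad   | 128        | 1         | 15            | 112                      | ~33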
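One way to get a feel for these precision levels is to store the same value in several formats and read it back. The sketch below uses Python's standard struct module, whose "e", "f", and "d" format codes correspond to half, single, and double precision. The helper name round_trip is invented for this example, and quad is left out because struct has no format code for it.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store value in the given struct format and read it back as a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# Store 0.1 in each format and show what actually comes back.
for name, fmt in [("half", "<e"), ("single", "<f"), ("double", "<d")]:
    stored = round_trip(0.1, fmt)
    total_bits = struct.calcsize(fmt) * 8
    print(f"{name:6} {total_bits:2d} bits  stored as {stored!r}")
```

The narrower the format, the further the stored value drifts from 0.1. Python's own float is a double, which is why the last line prints 0.1 unchanged.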

Example: IEEE-754 Single Precision (32 bits)

A 32-bit floating point number is structured as:

  • 1 bit for the sign
  • 8 bits for the exponent
  • 23 bits for the mantissa

Let's break down a sample number: 0.15625

  1. Convert to binary:
     0.15625 = 0.00101 (in binary, since 0.15625 = 1/8 + 1/32)
  2. Normalize it:
     0.00101 = 1.01 × 2^-3
  3. Store it in IEEE-754 format:
     • Sign = 0 (positive)
     • Exponent = -3 + 127 = 124 → 01111100
     • Mantissa = 01000... (the bits after the point in 1.01, padded with 0s to 23 bits; the leading 1 is implied and not stored)

So the full 32-bit representation is:

0 01111100 01000000000000000000000
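You can check the worked example above on a real machine. This is a small sketch using Python's struct module to reinterpret the four bytes of a single-precision float as an unsigned integer; the function name float32_fields is just a label chosen for this example.

```python
import struct


def float32_fields(value: float) -> str:
    """Return the single-precision bit pattern as 'sign exponent mantissa'."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    text = f"{bits:032b}"
    return f"{text[0]} {text[1:9]} {text[9:]}"


print(float32_fields(0.15625))  # 0 01111100 01000000000000000000000
```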

What Is the Bias?

IEEE-754 uses a bias to store positive and negative exponents without a sign bit. For single precision, the bias is 127. So an exponent of 0 is stored as 127, 1 as 128, and -1 as 126.
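The bias arithmetic is simple enough to sketch in a few lines of Python. One addition that is not spelled out in the text above: for the standard binary formats, the bias is 2^(k-1) - 1, where k is the number of exponent bits, which gives 127 for single precision. The function names bias and stored_exponent are hypothetical helpers for this illustration, and the sketch ignores the reserved all-0s and all-1s exponent fields.

```python
def bias(exponent_bits: int) -> int:
    """Bias for a binary format with the given number of exponent bits."""
    return 2 ** (exponent_bits - 1) - 1


def stored_exponent(actual_exponent: int, exponent_bits: int = 8) -> int:
    """Value written into the exponent field for a normal number."""
    return actual_exponent + bias(exponent_bits)


print(bias(8))               # 127 (single precision)
print(stored_exponent(0))    # 127
print(stored_exponent(1))    # 128
print(stored_exponent(-1))   # 126
```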

Special Values

IEEE-754 also has special cases for certain values:

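Value                 | Description
----------------------|-----------------------------------------------
0.0                   | All bits 0
-0.0                  | Sign bit 1, rest 0
+∞                    | Sign bit 0, exponent all 1s, mantissa all 0s
-∞                    | Sign bit 1, exponent all 1s, mantissa all 0s
NaN                   | Exponent all 1s, non-zero mantissa
Denormals (subnormals) | Exponent all 0s, non-zero mantissa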

Denormal numbers have no implied leading 1, which lets them represent values very close to zero, below the smallest normal number, at the cost of reduced precision.
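To see these special cases in practice, the sketch below builds a few single-precision bit patterns by hand and asks Python what they mean. The helper name bits_to_float32 and the particular patterns are choices made for this example; 0x7FC00000 is only one of many possible NaN patterns.

```python
import math
import struct


def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit integer as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


patterns = {
    "+0.0": 0x00000000,               # all bits 0
    "-0.0": 0x80000000,               # sign bit 1, rest 0
    "+inf": 0x7F800000,               # exponent all 1s, mantissa all 0s
    "-inf": 0xFF800000,               # same, with sign bit 1
    "NaN": 0x7FC00000,                # exponent all 1s, non-zero mantissa
    "smallest denormal": 0x00000001,  # exponent all 0s, lowest mantissa bit set
}

for name, bits in patterns.items():
    print(f"{name:18} {bits:032b} -> {bits_to_float32(bits)!r}")
```

The last pattern decodes to 2^-149, far smaller than anything a normal single-precision number can hold.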

Rounding and Precision

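IEEE-754 defines rounding rules (by default, round to the nearest representable value, with ties going to the even one). Because every result has to fit in a fixed number of bits, operations can introduce small rounding errors.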

For example:

>>> 0.1 + 0.2
0.30000000000000004

This happens because 0.1 and 0.2 cannot be exactly represented in binary, so the result is close, but not exact.
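If you want to see what is really stored, Python's decimal module can print the exact value behind a float. The sketch below also shows one common way to cope with the problem, comparing with a tolerance via math.isclose instead of ==. Treat it as an illustration rather than a rule; the right tolerance depends on your application.

```python
import math
from decimal import Decimal

# Decimal(float) converts exactly, so this prints the true stored value of 0.1,
# which is slightly above one tenth.
exact_tenth = Decimal(0.1)
print(exact_tenth)

total = 0.1 + 0.2
print(total == 0.3)              # False: the two sides differ in the last bit
print(math.isclose(total, 0.3))  # True: equal within the default relative tolerance
```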

Summary

  • IEEE-754 is the standard for representing floating point numbers in computers.
  • It breaks numbers into sign, exponent, and mantissa.
  • It supports different sizes (32-bit, 64-bit, etc.) for different levels of precision.
  • Special values like infinity and NaN are built into the format.
  • Small rounding errors are expected and normal.

IEEE-754 is a brilliant compromise between precision and performance. Once you understand how it works, debugging floating point issues becomes much easier.