IEEE-754

Floating Point Made Simple!
The IEEE-754 standard defines how computers represent floating point numbers — that is, numbers with decimals like 3.14 or -0.001.
It helps ensure that all systems (computers, languages, CPUs) store and calculate with floating point numbers consistently.
Why Do We Need IEEE-754?
Computers use binary (1s and 0s), so representing fractions like 1/3 or 0.1 isn't straightforward. IEEE-754 defines a binary format for real numbers so they can be stored and calculated efficiently with a predictable level of precision.
What Is a Floating Point Number?
A floating point number has three parts:
- Sign: is the number positive or negative?
- Exponent: how far (and in which direction) the point is shifted
- Mantissa (also called significand): the significant digits of the number
It works similarly to scientific notation.
Example:
1234.56 in scientific notation = 1.23456 × 10^3
In binary, IEEE-754 stores numbers as:
value = (-1)^sign × 1.mantissa × 2^(exponent - bias)
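If you'd like to see this formula in action, here is a rough Python sketch using the standard struct module. The helper name decode_float32 is just made up for this example, and it only handles ordinary (normal) numbers, not the special values described later:

import struct

def decode_float32(x):
    # Reinterpret the 32-bit float as a raw integer bit pattern
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF       # 23-bit fraction
    # The formula above, with bias = 127 (normal numbers only)
    value = (-1) ** sign * (1 + mantissa / 2**23) * 2 ** (exponent - 127)
    return sign, exponent, mantissa, value

print(decode_float32(6.5))  # (0, 129, 5242880, 6.5)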
IEEE-754 Formats
The most common formats are:
Format | Total Bits | Sign | Exponent | Mantissa (Fraction) | Approx. Decimal Digits
---|---|---|---|---|---
Half | 16 | 1 | 5 | 10 | ~3-4 digits
Single | 32 | 1 | 8 | 23 | ~7 digits
Double | 64 | 1 | 11 | 52 | ~15-17 digits
Quad | 128 | 1 | 15 | 112 | ~33 digits
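To get a feel for the precision difference, here is a quick check in Python (a plain Python float is a 64-bit double; the standard struct module can round-trip a value through the 32-bit format):

>>> import struct
>>> struct.unpack(">f", struct.pack(">f", 0.1))[0]   # 0.1 squeezed into 32 bits
0.10000000149011612
>>> 0.1                                              # the same value as a 64-bit double
0.1

The single-precision result is only good to about 7 significant digits, which matches the table above.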
Example: IEEE-754 Single Precision (32 bits)
A 32-bit floating point number is structured as:
- 1 bit for the sign
- 8 bits for the exponent
- 23 bits for the mantissa
Let's break down a sample number: 0.15625
- Convert to binary:
  0.15625 = 0.00101 (in binary)
- Normalize it:
  0.00101 = 1.01 × 2^-3
- Store it in IEEE-754 format:
  - Sign = 0 (positive)
  - Exponent = -3 + 127 = 124 → 01111100
  - Mantissa = 01000000000000000000000 (the bits after the leading "1.", padded with 0s; the leading 1 is implicit and not stored)
So the full 32-bit representation is:
0 01111100 01000000000000000000000
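You can check this bit pattern yourself in Python, again using the standard struct module (this is just one convenient way to print the 32 bits):

>>> import struct
>>> bits = struct.unpack(">I", struct.pack(">f", 0.15625))[0]
>>> format(bits, "032b")
'00111110001000000000000000000000'

Reading left to right, that is the sign bit 0, the exponent 01111100, and the 23-bit mantissa, exactly as worked out above.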
What Is the Bias?
IEEE-754 uses a bias so that both positive and negative exponents can be stored without a separate sign bit for the exponent. For single precision the bias is 127, so an exponent of 0 is stored as 127, 1 as 128, and -1 as 126.
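A quick Python spot check of the bias (purely illustrative, using struct to read the raw exponent field):

>>> import struct
>>> [(struct.unpack(">I", struct.pack(">f", x))[0] >> 23) & 0xFF for x in (1.0, 2.0, 0.5)]
[127, 128, 126]

1.0 has a true exponent of 0 and is stored as 127, 2.0 as 128, and 0.5 as 126, matching the rule above.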
Special Values
IEEE-754 also has special cases for certain values:
Value | Description
---|---
0.0 | All bits 0
-0.0 | Sign bit 1, rest 0
∞ | Exponent all 1s, mantissa all 0s
-∞ | Exponent all 1s, mantissa all 0s, sign bit 1
NaN | Exponent all 1s, non-zero mantissa
Denormals (subnormals) | Exponent all 0s, non-zero mantissa
Denormal numbers fill the gap between zero and the smallest normal number, allowing values very close to zero (at reduced precision).
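You can poke at these special values directly in Python; nothing here is specific to Python, it is just a convenient way to see IEEE-754 behavior (the printed forms assume a 64-bit double):

>>> float("inf") > 1e308
True
>>> float("nan") == float("nan")
False
>>> import math
>>> math.isnan(float("nan"))    # the reliable way to test for NaN
True
>>> -0.0 == 0.0                 # compares equal even though the bit patterns differ
True
>>> 5e-324                      # smallest positive subnormal double
5e-324
>>> import sys
>>> sys.float_info.min          # smallest positive normal double
2.2250738585072014e-308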
Rounding and Precision
IEEE-754 includes rounding rules, and operations are performed with limited precision, which can lead to small rounding errors.
For example:
>>> 0.1 + 0.2
0.30000000000000004
This happens because 0.1 and 0.2 cannot be represented exactly in binary: each is stored as the nearest representable double, so the result is close, but not exact.
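Two handy ways to see what is going on, both from Python's standard library: Decimal can show the exact value a float actually stores, and math.isclose compares with a tolerance instead of exact equality.

>>> from decimal import Decimal
>>> Decimal(0.1)     # the exact value actually stored for 0.1
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> import math
>>> math.isclose(0.1 + 0.2, 0.3)
True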
Summary
- IEEE-754 is the standard for representing floating point numbers in computers.
- It breaks numbers into sign, exponent, and mantissa.
- It supports different sizes (32-bit, 64-bit, etc.) for different levels of precision.
- Special values like infinity and NaN are built into the format.
- Small rounding errors are expected and normal.
IEEE-754 is a brilliant compromise between precision and performance — once you understand how it works, debugging floating point issues becomes much easier.