IEEE-754

Floating Point Made Simple!
The IEEE-754 standard defines how computers represent floating point numbers — that is, numbers with decimals like 3.14 or -0.001.
It helps ensure that all systems (computers, languages, CPUs) store and calculate with floating point numbers consistently.
Why Do We Need IEEE-754?
Computers use binary (1s and 0s), so representing fractions like 1/3 or 0.1 isn't straightforward. IEEE-754 defines a binary format for real numbers so they can be stored and calculated efficiently with a predictable level of precision.
What Is a Floating Point Number?
A floating point number has three parts:
- Sign: is the number positive or negative?
- Exponent: how far (and in which direction) the point is shifted
- Mantissa (also called significand): the significant digits of the number
It works similarly to scientific notation.
Example:
1234.56 in scientific notation = 1.23456 × 10^3
In binary, IEEE-754 stores numbers as:
value = (-1)^sign × 1.mantissa × 2^(exponent - bias)
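If you'd like to see this formula in action, here is a rough Python sketch using the standard struct module. The helper name decode_float32 is just made up for this example, and it only handles ordinary (normal) numbers, not the special values described later:

import struct

def decode_float32(x):
    # Reinterpret the 32-bit float as a raw integer bit pattern
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF       # 23-bit fraction
    # The formula above, with bias = 127 (normal numbers only)
    value = (-1) ** sign * (1 + mantissa / 2**23) * 2 ** (exponent - 127)
    return sign, exponent, mantissa, value

print(decode_float32(6.5))  # (0, 129, 5242880, 6.5)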
IEEE-754 Formats
The most common formats are:
Format | Total Bits | Sign | Exponent | Mantissa (Fraction) | Approx. Decimal Digits
---|---|---|---|---|---
Half | 16 | 1 | 5 | 10 | ~3-4 digits
Single | 32 | 1 | 8 | 23 | ~7 digits
Double | 64 | 1 | 11 | 52 | ~15-17 digits
Quad | 128 | 1 | 15 | 112 | ~33 digits
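To get a feel for the precision difference, here is a quick check in Python (a plain Python float is a 64-bit double; the standard struct module can round-trip a value through the 32-bit format):

>>> import struct
>>> struct.unpack(">f", struct.pack(">f", 0.1))[0]   # 0.1 squeezed into 32 bits
0.10000000149011612
>>> 0.1                                              # the same value as a 64-bit double
0.1

The single-precision result is only good to about 7 significant digits, which matches the table above.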
Example: IEEE-754 Single Precision (32 bits)
A 32-bit floating point number is structured as:
- 1 bit for the sign
- 8 bits for the exponent
- 23 bits for the mantissa
Let's break down a sample number: 0.15625
- Convert to binary:
  0.15625 = 0.00101 (in binary)
- Normalize it:
  0.00101 = 1.01 × 2^-3
- Store it in IEEE-754 format:
  - Sign = 0 (positive)
  - Exponent = -3 + 127 = 124 → 01111100
  - Mantissa = 01000000000000000000000 (the bits after the leading "1.", padded with 0s; the leading 1 is implicit and not stored)
So the full 32-bit representation is:
0 01111100 01000000000000000000000
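You can check this bit pattern yourself in Python, again using the standard struct module (this is just one convenient way to print the 32 bits):

>>> import struct
>>> bits = struct.unpack(">I", struct.pack(">f", 0.15625))[0]
>>> format(bits, "032b")
'00111110001000000000000000000000'

Reading left to right, that is the sign bit 0, the exponent 01111100, and the 23-bit mantissa, exactly as worked out above.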
What Is the Bias?
IEEE-754 uses a bias so that both positive and negative exponents can be stored without a separate sign bit for the exponent. For single precision the bias is 127, so an exponent of 0 is stored as 127, 1 as 128, and -1 as 126.
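A quick Python spot check of the bias (purely illustrative, using struct to read the raw exponent field):

>>> import struct
>>> [(struct.unpack(">I", struct.pack(">f", x))[0] >> 23) & 0xFF for x in (1.0, 2.0, 0.5)]
[127, 128, 126]

1.0 has a true exponent of 0 and is stored as 127, 2.0 as 128, and 0.5 as 126, matching the rule above.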
Special Values
IEEE-754 also has special cases for certain values:
Value | Description
---|---
0.0 | All bits 0
-0.0 | Sign bit 1, rest 0
∞ | Exponent all 1s, mantissa all 0s
-∞ | Exponent all 1s, mantissa all 0s, sign bit 1
NaN | Exponent all 1s, non-zero mantissa
Denormals (subnormals) | Exponent all 0s, non-zero mantissa
Denormal numbers fill the gap between zero and the smallest normal number, allowing values very close to zero (at reduced precision).
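You can poke at these special values directly in Python; nothing here is specific to Python, it is just a convenient way to see IEEE-754 behavior (the printed forms assume a 64-bit double):

>>> float("inf") > 1e308
True
>>> float("nan") == float("nan")
False
>>> import math
>>> math.isnan(float("nan"))    # the reliable way to test for NaN
True
>>> -0.0 == 0.0                 # compares equal even though the bit patterns differ
True
>>> 5e-324                      # smallest positive subnormal double
5e-324
>>> import sys
>>> sys.float_info.min          # smallest positive normal double
2.2250738585072014e-308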
Rounding and Precision
IEEE-754 includes rounding rules, and operations are performed with limited precision, which can lead to small rounding errors.
For example:
>>> 0.1 + 0.2
0.30000000000000004
This happens because 0.1 and 0.2 cannot be represented exactly in binary: each is stored as the nearest representable double, so the result is close, but not exact.
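Two handy ways to see what is going on, both from Python's standard library: Decimal can show the exact value a float actually stores, and math.isclose compares with a tolerance instead of exact equality.

>>> from decimal import Decimal
>>> Decimal(0.1)     # the exact value actually stored for 0.1
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> import math
>>> math.isclose(0.1 + 0.2, 0.3)
True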
Summary
- IEEE-754 is the standard for representing floating point numbers in computers.
- It breaks numbers into sign, exponent, and mantissa.
- It supports different sizes (32-bit, 64-bit, etc.) for different levels of precision.
- Special values like infinity and NaN are built into the format.
- Small rounding errors are expected and normal.
IEEE-754 is a brilliant compromise between precision and performance — once you understand how it works, debugging floating point issues becomes much easier.