# Fixed-Point Numbers

## Introduction

Modern computers have dedicated floating-point units, but older machines (such as the GBA) do not. On such hardware, any fractional numbers in our code have to be represented with integers. Used directly, integers discard the fractional part and lose precision, which is why fixed-point numbers are used.

Fixed-point numbers are integer representations of fractional numbers. To work with them you need to know how many bits are used for the integer part and how many for the fractional part. For example, the decimal number 123.4567 has 4 digits of fractional precision; in binary the same idea applies to bits, where f fractional bits give a resolution of 1/2^f, and we can use more if necessary. We normally denote the number of bits used for the integer part (i) and the fractional part (f) as (i.f). For example, if we are using 32-bit numbers and we want 10 bits of fractional precision, the notation to describe these numbers would be (22.10). Note that in some cases signed numbers are described as (1.i.f), but this doesn’t mean that we have a separate sign bit; we are always using two’s complement integers.

This representation is hard to see with decimal numbers, but in hex it
becomes clear. For example, the 32-bit number 0xAABBCCDD at (20.12) has
0xAABBC as the integer part and 0xCDD as the fractional part. This means
building a fixed-point number is as simple as shifting the integer part
left and adding the fractional part to it:
`fixed_num = (0xAABBC << 12) | 0xCDD`. Likewise, we can get the integer
part with a shift down and the fractional part with a masked `and`:
`integer_part = 0xAABBCCDD >> 12;`,
`fractional_part = 0xAABBCCDD & 0xFFF;`. This only applies to unsigned
numbers: for signed values the sign has to be taken into account when
extracting the fractional part, and a plain mask would effectively
destroy it.
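The packing and unpacking steps for the unsigned (20.12) example can be sketched in C as follows. This is a minimal illustration, not a definitive implementation; the names `FIX_SHIFT`, `FIX_MASK`, `fixed_make`, `fixed_int_part` and `fixed_frac_part` are hypothetical helpers introduced here.

```c
#include <assert.h>
#include <stdint.h>

#define FIX_SHIFT 12u                       /* fractional bits in a (20.12) layout */
#define FIX_MASK  ((1u << FIX_SHIFT) - 1u)  /* 0xFFF, selects the fractional bits  */

/* Pack an integer part and a fractional part into one unsigned (20.12) value. */
static uint32_t fixed_make(uint32_t int_part, uint32_t frac_part)
{
    assert(int_part <= 0xFFFFFu);   /* must fit in 20 bits */
    assert(frac_part <= FIX_MASK);  /* must fit in 12 bits */
    return (int_part << FIX_SHIFT) | frac_part;
}

/* Integer part: shift the fractional bits out. */
static uint32_t fixed_int_part(uint32_t fx)
{
    return fx >> FIX_SHIFT;
}

/* Fractional part: keep only the low 12 bits (unsigned values only). */
static uint32_t fixed_frac_part(uint32_t fx)
{
    return fx & FIX_MASK;
}
```

With these helpers, `fixed_make(0xAABBC, 0xCDD)` yields `0xAABBCCDD`, and the two accessors split it back into `0xAABBC` and `0xCDD`.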

One important consideration with fixed-point numbers is that the number of integer bits we select limits the largest number we can represent. That said, nothing stops us from moving the fractional point around (for example, reducing the number of fractional bits) as needed in our algorithms.
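As a rough sketch of that trade-off, the C helpers below compute the largest whole number an unsigned format can hold and move a value between two fractional precisions. The function names are hypothetical, and the code assumes unsigned 32-bit values with fewer than 32 fractional bits.

```c
#include <assert.h>
#include <stdint.h>

/* Largest whole number representable with `int_bits` integer bits (unsigned). */
static uint32_t fixed_max_int(unsigned int_bits)
{
    assert(int_bits >= 1 && int_bits <= 32);
    return (int_bits == 32) ? UINT32_MAX : ((1u << int_bits) - 1u);
}

/* Re-scale an unsigned value from `from_bits` to `to_bits` fractional bits. */
static uint32_t fixed_rescale(uint32_t fx, unsigned from_bits, unsigned to_bits)
{
    assert(from_bits < 32 && to_bits < 32);
    if (to_bits >= from_bits) {
        /* More fractional bits: finer steps, but less integer range
           (high bits are lost if the value no longer fits). */
        return fx << (to_bits - from_bits);
    }
    /* Fewer fractional bits: more integer range, low bits are dropped. */
    return fx >> (from_bits - to_bits);
}
```

For instance, 1.5 is `0x1800` at (20.12); `fixed_rescale(0x1800, 12, 8)` gives `0x180`, which is 1.5 at (24.8).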

## Math

Working with fixed-point numbers requires some care to ensure the
correctness of arithmetic operations. For addition and subtraction, the
numbers must have the same fixed-point representation, that is, the same
number of bits for the fractional part. When multiplying fixed-point
numbers, the fractional scales are multiplied as well, which translates
to a right shift after the multiplication
(`fpa * fpb = (A * B) >> f`). To keep the highest accuracy when
dividing, we shift the scale before we divide
(`fpa / fpb = (A << f) / B`).
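The four operations can be sketched in C as below. This is one possible arrangement rather than a definitive implementation: the type name `fixed_t`, the (20.12) layout and the helper names are assumptions made for illustration, and the 64-bit intermediate is an addition of this sketch to keep `A * B` and `A << f` from overflowing 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define FIX_SHIFT 12                 /* f: fractional bits, (20.12) assumed */
#define FIX_ONE   (1 << FIX_SHIFT)   /* the value 1.0 in this format        */

typedef int32_t fixed_t;             /* hypothetical type name */

/* Addition and subtraction work directly when both operands share f. */
static fixed_t fixed_add(fixed_t a, fixed_t b) { return a + b; }
static fixed_t fixed_sub(fixed_t a, fixed_t b) { return a - b; }

static fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    /* The raw product carries 2*f fractional bits; shifting right by f
       brings it back to f. Right-shifting a negative value is
       implementation-defined in C; common compilers shift arithmetically. */
    return (fixed_t)(((int64_t)a * b) >> FIX_SHIFT);
}

static fixed_t fixed_div(fixed_t a, fixed_t b)
{
    assert(b != 0);  /* division by zero is the caller's responsibility */
    /* Scale the numerator first so the quotient keeps f fractional bits.
       Multiplying by FIX_ONE is the same as A << f, but stays well
       defined when A is negative. */
    return (fixed_t)(((int64_t)a * FIX_ONE) / b);
}
```

As a check of the scaling, 1.5 (`0x1800`) times 2.25 (`0x2400`) gives `0x3600`, which is 3.375, and dividing `0x3600` by `0x1800` returns `0x2400`.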