Insight 5 - Floating-Point Representation and Its Computational Issues

Floating-point representation

In computing, floating-point representation is an approximation that allows modern computers to trade off range against precision when handling real numbers. Representing numbers as integers in a fixed number of bits has some notable limitations: it cannot handle numbers that have a fractional part, and it is not suitable for very large numbers that do not fit into (e.g.) 32 bits.
In general, a number is represented by an approximation to a fixed number of significant digits (the significand) and scaled using an exponent in some fixed base; the base is normally two, ten, or sixteen. A number that can be represented exactly has the following form:

significand × base^exponent

where significand, base, and exponent are all integers and the base is greater than or equal to two.
As an example, consider 12345 × 10^-4 = 1.2345, or 11111 × 10^-1 = 1111.1.
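This decomposition can be inspected directly in Python; a minimal sketch using `math.frexp`, which uses base two and a fractional significand in [0.5, 1) — a common machine convention, rather than the integer significand described above:

```python
import math

# Decompose a float into m * 2**e with 0.5 <= |m| < 1.
m, e = math.frexp(12.5)
print(m, e)  # 0.78125 4, since 0.78125 * 2**4 == 12.5

# The decomposition is exact: reassembling it recovers the number.
assert m * 2**e == 12.5
```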
The term floating point refers to the fact that a number's radix point (decimal point or, more commonly in computer science, binary point) can "float"; that is, it can be placed anywhere relative to the significant digits of the number. This position is indicated by the exponent component, so the floating-point representation can be thought of as a kind of scientific notation.
One of the great advantages of this representation on modern systems is that it allows numbers of very different orders of magnitude to be represented using just a fixed number of digits.

Even though this representation (and its arithmetic) is vital for the performance and functioning of modern systems, it can suffer from issues typically known as catastrophic cancellation and rounding error.
Catastrophic cancellation occurs when an operation on two numbers increases the relative error substantially more than it increases the absolute error; for instance, when subtracting two nearly equal numbers. A rounding error, on the other hand, occurs when numbers are rounded too early during a computation and/or when the rounding is not done properly: it is the difference between a rounded-off value and the actual value.
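A quick illustration of rounding error in Python (a sketch; any language using IEEE-754 doubles behaves the same way): the decimal values 0.1 and 0.2 have no exact binary representation, so each is stored as the nearest double, and their sum inherits those rounding errors.

```python
# 0.1 and 0.2 are each stored as the nearest representable double,
# so the sum carries a small but visible rounding error.
a = 0.1 + 0.2
print(a)             # 0.30000000000000004, not exactly 0.3
print(a == 0.3)      # False
print(abs(a - 0.3))  # a tiny but nonzero difference
```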

- Catastrophic cancellation example

Consider the decimal number 0.1234567891234567890. On a machine that keeps 10 floating-point digits, it would be represented as 0.1234567891, which is fairly close when the error is measured as a percentage of the value, but very different when measured in order of precision.
Now consider the following subtraction:

0.1234567891234567890 - 0.1234567890000000000

The answer, accurate to 20 significant digits, is 0.0000000001234567890; however, the result on the 10-digit floating-point machine is

0.1234567891 - 0.1234567890 = 0.0000000001

In both cases the result has the correct order of magnitude, but in the second case several significant digits have been cancelled: only a single significant digit of the true answer survives.
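The 10-digit machine above can be mimicked in Python with the `decimal` module; a sketch in which the context precision plays the role of the machine's digit count, and unary `+` applies the context's rounding to each operand, just as storing it on the machine would:

```python
from decimal import Decimal, getcontext

getcontext().prec = 10  # simulate a machine with 10 significant digits

# Unary + rounds each operand to the context precision.
a = +Decimal("0.1234567891234567890")  # stored as 0.1234567891
b = +Decimal("0.1234567890000000000")  # stored as 0.1234567890

# One surviving digit, versus the true 1.234567890E-10.
print(a - b)  # 1E-10
```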

- Rounding error example

As an example of rounding error, consider the speed of light in a vacuum. The official value is 299,792,458 meters per second. In scientific notation (powers of 10), that quantity is expressed as 2.99792458 × 10^8, and rounding it to three decimal places yields 2.998 × 10^8. The rounding error is the difference between the actual value and the rounded value; in this case it is equal to 0.00007542 × 10^8. In scientific notation, the rounding error is 7.542 × 10^3, which equals 7542 in plain decimal notation.
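A quick Python check of this arithmetic (a sketch; both values here are integers well within the range doubles represent exactly, so the subtraction itself introduces no further error):

```python
c = 299_792_458          # speed of light in m/s (exact official value)
rounded = int(2.998e8)   # 2.998 x 10^8: the value rounded to three decimals
error = rounded - c      # rounding error of the approximation
print(error)  # 7542
```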

