Storage of floating-point numbers in memory

1. Introduction to floating point numbers

1. Understanding of floating point numbers

  • First of all, a floating-point number is simply what mathematics calls a decimal: a number with a fractional part.
    But why are they called floating-point numbers in programming languages?

1. Take 12.3, which is a decimal.
2. It can also be written as 1.23*10^1.
3. Comparing the two forms, the same value is written with the decimal point in a different position, depending on the power of 10 chosen. The decimal point "floats", which is why this is called a floating-point number.

2. Introduction to floating point types

1. The floating-point types are:

float, double, long double

2. The range of each floating-point type is defined in float.h, through macros such as FLT_MAX and DBL_MAX.

How floating-point numbers are stored in memory

  • We now know how to write a floating-point number, but how is it stored in memory? Is it stored the same way as an integer?
    Let's explore this question. If you are not yet familiar with how integers are stored in memory, see my previous blog post, "Storage of integers in memory".
    The following code explores it:
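#include <stdio.h>

int main()
{
	int i = 9;
	/* View the bytes of the int as if they were a float.
	   Fine for a demo, but it breaks C's aliasing rules; memcpy is the portable way. */
	float* p = (float*)&i;

	printf("%d\n", i);   /* the bytes read as an integer */
	printf("%f\n", *p);  /* the same bytes read as a float */

	return 0;
}

Output (on a typical platform with a 32-bit int and an IEEE 754 float):
9
0.000000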

  1. Clearly a floating-point number is not stored the same way as an integer. So how is it stored?
    That is the main topic of this post.

Storage rules:

According to the international standard IEEE 754, any binary floating-point number V can be expressed in the following form:

V = (-1)^S * M * 2^E

  1. (-1)^S is the sign: when S is 0, V is positive, and when S is 1, V is negative.
  2. M is the significand, and its range is 1 <= M < 2.
  3. 2^E is the exponent part, where E is the exponent.

To write a floating-point number in this form, we need the values of S, M, and E.

————

So let's see how a floating-point number is converted into this form, taking 5.5 as the example. In binary, 5 is 101 and 0.5 is 0.1, so:

5.5 (decimal) --> 101.1 (binary) --> 1.011 * 2^2

Matching this against V = (-1)^S * M * 2^E
gives S = 0, M = 1.011 (binary), E = 2.

Note: conversely, as long as S, M, and E are known, the floating-point number can be fully reconstructed.

1. Storage of floating-point numbers

  • The international standard IEEE 754 stipulates
    that a single-precision floating-point number is stored in 32 bits: the highest bit is the sign S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M.

In addition, IEEE 754 has some special rules for storing the significand M and the exponent E.

1. As noted above, M is the significand with range 1 <= M < 2, so it always has the form 1.xxxxxx. Because the leading digit is always 1, it is not stored: only the fractional part xxxxxx is saved, and the 1 is added back when the value is read. For example, for 5.5 = 1.011 * 2^2 only 011 is stored, padded with zeros on the right to fill the 23 bits. This saves one bit: 23 stored bits carry 24 significant bits.

2. Storing the exponent E is a bit special:
First, E is stored as an unsigned integer (unsigned int).
With 8 bits its range is 0 ~ 255, and with 11 bits its range is 0 ~ 2047. But the exponent in scientific notation can be negative (for 0.5, E = -1), so IEEE 754 stipulates that an intermediate value (the bias) is added to the real value of E before it is stored in memory. For an 8-bit E the bias is 127, and for an 11-bit E the bias is 1023. For example, a real exponent of 2 is stored in a float as 2 + 127 = 129, that is, 10000001.

A double-precision floating-point number is stored in 64 bits: 1 bit for the sign S, 11 bits for the exponent E, and 52 bits for the significand M.

When the exponent E is read back from memory, there are three cases:
1. E is all 0s: the real exponent is 1 - 127 = -126 (or 1 - 1023 = -1022 for double), and the significand M no longer gets the implicit leading 1. It is read as 0.xxxxxx instead. This makes it possible to represent 0 and very small numbers close to 0.
2. E is all 1s: if the stored bits of M are all 0, the number is infinity (positive or negative depending on S). If they are not all 0, the value is NaN (not a number).
3. E contains both 1s and 0s (the ordinary case): subtract 127 (or 1023) from the stored value of E to get the real exponent, and put the implicit leading 1 back in front of the stored M.

That is how floating-point types are stored in memory. Here is an example to make it concrete:

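#include <stdio.h>

int main()
{
	float n = 9.0f;
	/* View the bytes of the float as if they were an int.
	   Fine for a demo, but it breaks C's aliasing rules; memcpy is the portable way. */
	int* p = (int*)&n;

	printf("%f\n", n);   /* the bytes read as a float */
	printf("%d\n", *p);  /* the same bytes read as an integer */

	return 0;
}

Output (on a typical platform with a 32-bit int and an IEEE 754 float):
9.000000
1091567616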
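Both experiments in this post can be worked out by hand from the rules above, assuming a 32-bit int and an IEEE 754 single-precision float. Storing 9.0 as a float and reading the bits back as an integer:

```latex
\begin{align*}
9.0 &= 1001.0_2 = 1.001_2 \times 2^{3} \\
S &= 0, \qquad E_{\text{stored}} = 3 + 127 = 130 = 10000010_2, \qquad M_{\text{stored}} = 00100000000000000000000_2 \\
\text{bits} &= 0\;10000010\;00100000000000000000000_2 = \mathtt{0x41100000} = 1091567616
\end{align*}
```

Going the other way, storing the integer 9 and reading the bits back as a float:

```latex
\begin{align*}
\text{bits of int } 9 &= 0\;00000000\;00000000000000000001001_2 \\
S &= 0, \qquad E_{\text{stored}} = 0 \;\Rightarrow\; \text{real exponent } 1 - 127 = -126, \qquad M = 0.00000000000000000001001_2 = 9 \times 2^{-23} \\
V &= 9 \times 2^{-23} \times 2^{-126} = 9 \times 2^{-149} \approx 1.26 \times 10^{-44}
\end{align*}
```

Since %f prints only six digits after the decimal point, a value this small is displayed as 0.000000.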
Finally, I hope everyone takes something away from this explanation. Keep it up!


Origin blog.csdn.net/m0_66780695/article/details/131013673