Notes on using binary scientific notation to represent float data

//Reposted from my space log

//2017-10-25 22:01

 

First, let's go over today's difficult topic.

        What the teacher covered can be described as using binary scientific notation to represent float data.

        Everyone should be familiar with decimal scientific notation. The formula the teacher gave is 2^x × 1.y; swap the two factors to get 1.y × 2^x and it looks much more familiar. In decimal, the leading factor lies between 1 and 10 (10 excluded); in binary, 1.y naturally lies between 1 and 2 (2 excluded).
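
        As a rough illustration of the 1.y × 2^x form, the sketch below splits a positive number into its two parts. The helper name normalize is made up for this example, and it relies on Python's math.frexp, which uses a slightly different convention (a leading factor in [0.5, 1)), so one factor of 2 is moved across.

```python
import math

def normalize(value):
    """Return (mantissa, exponent) with value == mantissa * 2**exponent and 1 <= mantissa < 2."""
    if value <= 0:
        raise ValueError("this sketch only handles positive numbers")
    # math.frexp returns m in [0.5, 1) and e with value == m * 2**e,
    # so move one factor of 2 from the exponent into the mantissa.
    m, e = math.frexp(value)
    return m * 2, e - 1

print(normalize(129.96))  # the exponent is 7
print(normalize(0.073))   # the exponent is -4
```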

        Once the formula is understood, the next step is to express it as a float, which is the form actually stored in the computer.

        A float occupies 4 bytes, which is 32 bits; a double occupies 8 bytes (64 bits), so its precision is higher. The first of a float's 32 bits is the sign: 1 is negative, 0 is positive. The next 8 bits are the exponent bits, which hold the binary form of x (padded to 8 bits). The remaining 23 bits are the fraction bits (mantissa), which hold the binary form of y (padded to 23 bits). In a double, x and y take 11 and 52 bits respectively. In fact, if you ignore the sign bit and swap x and y, it is the familiar form of scientific notation.
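
        To look at this layout on a real machine, a minimal sketch with Python's struct module can split a 32-bit float into its three fields. The helper name float32_fields is made up for this example; the values 1.0 and -2.0 are chosen only because their bit patterns are easy to read.

```python
import struct

def float32_fields(value):
    """Return the (sign, exponent, mantissa) bit strings of value stored as a 32-bit float."""
    # Pack as a big-endian float32, then reinterpret the same 4 bytes as an unsigned int.
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]

print(float32_fields(1.0))   # ('0', '01111111', '00000000000000000000000')
print(float32_fields(-2.0))  # ('1', '10000000', '00000000000000000000000')
```

        Note that 1.0 = 1.0 × 2^0 is stored with exponent bits 01111111 (127), not 00000000, so the exponent field is not x itself.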

        Take a floating-point value as an example: 129.96 (> 1)

        First, separate out the largest power of 2 that does not exceed the number: 128, which is 2^7, so x = 7, or 00000111 in binary. One way to see this is to convert the integer part of the original number to binary and take the position of its highest bit.

        The remaining 1.96 equals 2^7 multiplied by 0.y.

        Set the integer part 1 aside first, then convert the fractional part 0.96 to binary. The method was covered in class: multiply the fraction by 2 repeatedly; whenever the result reaches 1, write down 1 and drop the integer part, otherwise write down 0. Take 23 - x bits this way. Skipping the detailed steps, 1.96 comes out as 1.11110101... (23 - x digits after the binary point). Finally, move the binary point to the left by the power of 2 taken out earlier. The analogy with decimal scientific notation makes this easy to understand: the point moves left by as many places as the exponent of 10.
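
        The multiply-by-2 procedure can be sketched as follows. The helper name fraction_bits is made up for this example, and it uses Python's Fraction type so that the decimal input is handled exactly instead of through an already-rounded float.

```python
from fractions import Fraction

def fraction_bits(fraction, count):
    """Return the first `count` binary digits of a fraction in [0, 1)."""
    frac = Fraction(fraction)  # exact arithmetic, so no float rounding creeps in
    bits = []
    for _ in range(count):
        frac *= 2
        if frac >= 1:          # a 1 appeared in the integer part
            bits.append("1")
            frac -= 1
        else:
            bits.append("0")
    return "".join(bits)

print(fraction_bits("0.96", 16))  # 1111010111000010
```

        With x = 7, the 23 - x = 16 digits printed here are the digits after the binary point of 1.96.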

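        But this x is not the final stored value: 127 has to be added to it. Personally I think this is so that x does not end up negative when the original number is less than 1 (that is, to keep the stored x non-negative). Here x = 7 + 127 = 134, which is 10000110.

        The original number is > 0, so the sign bit is 0.

        The final result is: 0 10000110 00000011111010111000011 (the last mantissa bit is rounded up, because the first discarded bit is 1 and nonzero bits follow it)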
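
        One way to sanity-check an encoding is to decode it again. The sketch below rebuilds the value from the three fields, assuming the standard single-precision offset of 127 and a normal (not zero, subnormal, infinite, or NaN) number. The helper name decode_float32 is made up for this example, and the bit string is the pattern IEEE 754 single precision uses for 129.96.

```python
def decode_float32(bit_string):
    """Decode a 'sign exponent mantissa' bit string for a normal float32 value."""
    bits = bit_string.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2) - 127   # remove the offset of 127 to recover x
    fraction = int(bits[9:], 2) / 2**23  # the 23 stored bits form 0.y
    return sign * (1 + fraction) * 2.0**exponent

value = decode_float32("0 10000110 00000011111010111000011")
print(value)  # close to 129.96, but not exactly equal
```

        The decoded value differs slightly from 129.96 because 0.96 has no finite binary expansion, so 23 mantissa bits can only approximate it.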

        Here is another example: 0.073 (< 1)

        Here x is negative. Try the powers one by one: 2^-1, 2^-2, ... until 2^-4 = 0.0625, the largest power of 2 that does not exceed 0.073, so x = -4. The remaining 0.0105 is converted with the same method, multiplying by 2 each time. Since x is -4, we take 23 + 4 = 27 bits, which gives .000000101011000000100000110. Moving the binary point 4 places to the right gives .00101011000000100000110 (23 places).

        The sign is 0, and x = -4 + 127 = 123, which is 01111011.

        The final result: 0 01111011 00101011000000100000110
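
        The whole procedure can be put together in a small sketch and compared with what the machine actually stores. This is only an illustration under stated assumptions: positive normal numbers only, the standard single-precision offset of 127, and round-to-nearest on the last mantissa bit. The helper names encode_float32 and stored_bits are made up for this example.

```python
import struct
from fractions import Fraction

def encode_float32(text):
    """Encode a positive decimal string as 'sign exponent mantissa' bits (normal range only)."""
    value = Fraction(text)
    if value <= 0:
        raise ValueError("this sketch only handles positive numbers")
    # Step 1: find x so that 1 <= value / 2**x < 2.
    x = 0
    while value >= 2:
        value /= 2
        x += 1
    while value < 1:
        value *= 2
        x -= 1
    # Step 2: scale 0.y to a 23-bit integer, rounding to nearest (ties to even).
    mantissa = round((value - 1) * 2**23)
    if mantissa == 2**23:  # rounding carried over into the next power of 2
        mantissa, x = 0, x + 1
    if not 1 <= x + 127 <= 254:
        raise ValueError("outside the normal float32 range")
    # Step 3: store the exponent with the offset of 127 added.
    return "0 " + format(x + 127, "08b") + " " + format(mantissa, "023b")

def stored_bits(number):
    """Return the bits Python's struct module stores for number as a float32."""
    (raw,) = struct.unpack(">I", struct.pack(">f", number))
    b = format(raw, "032b")
    return f"{b[0]} {b[1:9]} {b[9:]}"

for text in ("129.96", "0.073"):
    print(text, encode_float32(text), encode_float32(text) == stored_bits(float(text)))
```

        Both lines end with True, meaning the hand encoding agrees with the stored bits for these two examples.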

        Finally, here are two links for anyone interested:
        1. Sign-magnitude, two's complement, and one's complement explained in detail
        http://www.cnblogs.com/zhangziqiu/archive/2011/03/30/ComputerCode.html

        2. Binary storage and conversion of floats
        http://blog.csdn.net/rayxp/article/details/40855665

        If you have any questions, please leave a comment so we can discuss and learn together. The above is just a freshman's immature summary and guesses.

Origin blog.csdn.net/qq_39586345/article/details/79580948