About Number Bases

While summarizing character sets and encoding methods today, I kept running into hexadecimal, so here is a summary of number bases as well.

 

A number base (radix) is simply a carrying convention that people agreed on. In any base X, a digit position carries over to the next position every time it reaches X: decimal carries every ten, hexadecimal carries every sixteen, and binary carries every two.
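As an illustration of these carrying rules (my own sketch, not part of the original text), Python's built-in conversion functions move one value between bases:

```python
# One value, three bases: decimal 255 in binary and hexadecimal.
n = 255
print(bin(n))   # '0b11111111' : binary, carries every two
print(hex(n))   # '0xff'       : hexadecimal, carries every sixteen

# Parsing digit strings in a given base back to an integer:
print(int("11111111", 2))   # 255
print(int("ff", 16))        # 255
```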

 

Strings and Encoding

    Computers can only process numbers; to process text, the text must first be converted to numbers. The earliest computers were designed with 8 bits per byte, so the largest integer one byte can represent is 255 (binary 11111111 = decimal 255). To represent larger integers, more bytes are needed: two bytes can represent at most 65535, and four bytes at most 4294967295.
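These maximums follow directly from the bit count; a small sketch of my own to check them:

```python
# Largest unsigned integer representable in k bytes: 2**(8*k) - 1.
for k in (1, 2, 4):
    print(f"{k} byte(s): {2**(8*k) - 1}")
# 1 byte(s): 255
# 2 byte(s): 65535
# 4 byte(s): 4294967295
```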

 

【Expanded notes】

① A byte is 8 bits

(The earliest computers were designed with 8 bits per byte.) Each bit can be either 0 or 1, so the number of distinct values follows from counting the combinations.

Specifically: with two possibilities per bit, 8 bits give 2 to the 8th power = 256 possible values, i.e., the 256 numbers from 0 to 255.

 

② Why is 255 the maximum number a byte can store?

A byte is the basic unit of memory and is 8 bits long. Data in computer memory is stored in binary, and each bit is either 0 or 1, so the value is largest when all 8 bits are 1: binary 11111111, which is 255 in decimal.

What a storage unit holds depends on its data type. Usually a byte stores a non-negative integer, in which case the maximum it can hold is 255. If a byte stores character data, that too is represented as a non-negative integer; for example, extended ASCII character sets use the numbers 0 to 255 to represent 256 characters (standard ASCII itself only uses 0 to 127). In that case it is still fair to say the largest value stored is 255.

 

③ What are the largest and smallest decimal numbers a byte can store, and what are their binary forms?

1 byte = 8 bits.

If it stores signed data (two's complement), the range is -128 to 127: 10000000 to 01111111 in binary.

If it stores unsigned data, the range is 0 to 255: 00000000 to 11111111 in binary.

Decimal 0~255 corresponds to binary 00000000~11111111 and to hexadecimal 0x00~0xFF.
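The signed vs. unsigned ranges can be seen with Python's struct module, which packs a value into a single byte (a sketch of my own, not from the original):

```python
import struct

# 'B' = unsigned byte (0..255), 'b' = signed byte (-128..127).
print(struct.pack("B", 255))            # b'\xff'  : unsigned maximum
print(struct.pack("b", -128))           # b'\x80'  : signed minimum
print(struct.unpack("b", b"\x80")[0])   # -128     : same bits, signed view
print(struct.unpack("B", b"\x80")[0])   # 128      : same bits, unsigned view
```

The same 8 bits mean different numbers depending on whether the type treats them as signed or unsigned.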

 

④ The difference between ASCII encoding and Unicode encoding: ASCII uses 1 byte per character, while Unicode usually uses 2 bytes.

      For example:

To convert the ASCII encoding of A to Unicode, you only need to pad zeros in front.

So the Unicode encoding of A is 00000000 01000001.
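This zero padding is easy to see in Python (my own sketch): ord() gives the code point, and formatting it to 16 binary digits reproduces the two-byte form above:

```python
ch = "A"
print(ord(ch))             # 65, the code point of 'A'
print(f"{ord(ch):08b}")    # '01000001'         : the 1-byte ASCII form
print(f"{ord(ch):016b}")   # '0000000001000001' : zero-padded to 2 bytes
```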

      But a new problem appears:

If everything is unified under Unicode, garbled text disappears for good. However, if your text is almost entirely English, Unicode needs twice the storage of ASCII, which is very uneconomical for storage and transmission.

      UTF-8 encoding:

To save space, UTF-8 appeared: a variable-length encoding of Unicode. UTF-8 encodes a Unicode character into 1 to 6 bytes depending on the size of its code point (the current standard restricts this to at most 4): common English letters take 1 byte, Chinese characters usually take 3 bytes, and only very rare characters take 4 to 6 bytes. If the text you want to transfer contains a lot of English characters, encoding it in UTF-8 saves space:

Character   ASCII      Unicode             UTF-8
A           01000001   00000000 01000001   01000001
中          (none)     01001110 00101101   11100100 10111000 10101101
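The rows of this comparison can be reproduced with Python's str.encode() (a sketch of mine; bytes shown in hex rather than binary):

```python
print("A".encode("ascii").hex())    # '41'     : 1 byte in ASCII
print("A".encode("utf-8").hex())    # '41'     : identical byte in UTF-8
print("中".encode("utf-8").hex())   # 'e4b8ad' : 3 bytes in UTF-8
# "中".encode("ascii") would raise UnicodeEncodeError: 中 has no ASCII code.
```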

   The table also reveals an extra benefit of UTF-8: ASCII encoding can be regarded as a subset of UTF-8, so a large amount of legacy software that only supports ASCII can continue to work under UTF-8.

   Having figured out the relationship between ASCII, Unicode and UTF-8, we can summarize how character encodings are commonly used in computer systems:

In memory, text is handled uniformly as Unicode; when it needs to be saved to the hard disk or transmitted, it is converted to UTF-8.

When editing with Notepad, the UTF-8 bytes read from the file are decoded into Unicode characters in memory; when saving, the Unicode text is encoded back to UTF-8 and written to the file.
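This read/modify/save cycle can be sketched in Python (the file name and contents are my own example): str objects in memory hold Unicode code points, and UTF-8 appears only at the file boundary:

```python
import os
import tempfile

# Hypothetical demo file; any writable path would do.
path = os.path.join(tempfile.gettempdir(), "encoding_demo.txt")

with open(path, "w", encoding="utf-8") as f:   # encode Unicode -> UTF-8 on save
    f.write("Hello, 中")

with open(path, "rb") as f:                    # the raw UTF-8 bytes on disk
    print(f.read())                            # b'Hello, \xe4\xb8\xad'

with open(path, encoding="utf-8") as f:        # decode UTF-8 -> Unicode on read
    print(f.read())                            # Hello, 中

os.remove(path)
```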

 

 

 

 

 

 
