History: ASCII~Unicode~UTF-8
For the encoding of a single character, Python provides the ord()
function to obtain the integer representation of the character, and the chr()
function to convert that integer back to the corresponding character:
>>> ord('A')
65
>>> ord('中')
20013
>>> chr(66)
'B'
>>> chr(25991)
'文'
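As a minimal sketch of how ord() and chr() mirror each other, the snippet below maps each character of a string to its integer and back again. The sample string and the variable names are chosen here for illustration only.

```python
# Map each character of a sample string to its Unicode code point and back.
text = 'A中文'  # illustrative sample string

# ord() turns each character into its integer representation.
code_points = [ord(ch) for ch in text]
print(code_points)  # [65, 20013, 25991]

# chr() is the inverse of ord(), so the round trip restores the original text.
restored = ''.join(chr(cp) for cp in code_points)
print(restored == text)  # True
```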
If you know the integer encoding of a character, you can also write the str using hexadecimal escapes, like this:
>>> '\u4e2d\u6587'
'中文'
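The link between the hexadecimal escape and the integer returned by ord() can be checked directly. This is a small sketch; to_escape is a hypothetical helper name, and it assumes characters whose code point fits in four hex digits (up to 0xFFFF).

```python
# The escaped form and the literal characters produce the same str.
print('\u4e2d\u6587' == '中文')  # True

# hex() shows where the digits in the escape come from.
print(hex(ord('中')))  # 0x4e2d


def to_escape(ch):
    """Return the \\uXXXX escape text for one character.

    Hypothetical helper: only valid for code points up to 0xFFFF.
    """
    return '\\u{:04x}'.format(ord(ch))


print(to_escape('文'))  # \u6587
```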
Python represents data of type bytes with single or double quotes prefixed with b:
x = b'ABC'
Pay attention to the difference between 'ABC' and b'ABC': the former is a str,
while the latter, although its content is displayed the same as the former, is bytes,
and each character of the latter occupies only one byte.
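The distinction between the two types can be seen in a short sketch. The variable names are illustrative, and the last line uses encode('utf-8') only to show how many bytes one non-ASCII character takes up.

```python
s = 'ABC'    # str: a sequence of Unicode characters
b = b'ABC'   # bytes: a sequence of integers in the range 0-255

print(type(s).__name__, type(b).__name__)  # str bytes

# The two values look alike when printed but never compare equal.
print(s == b)  # False

# Indexing a str gives a one-character str; indexing bytes gives an int.
print(s[0], b[0])  # A 65

# One character in a str may need several bytes once encoded.
print(len('中'), len('中'.encode('utf-8')))  # 1 3
```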
A str, which holds Unicode text, can be encoded into the specified bytes
with its encode() method, for example:
>>> 'ABC'.encode('ascii')
b'ABC'
>>> '中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'
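The choice of encoding matters, as the sketch below suggests: ASCII has no representation for Chinese characters, so encoding them as 'ascii' raises UnicodeEncodeError. The variable names are illustrative. The final decode() call is not covered in the text above and is included only as an assumption-labeled round-trip check.

```python
text = '中文'

# UTF-8 uses three bytes for each of these two characters.
utf8_bytes = text.encode('utf-8')
print(len(text), len(utf8_bytes))  # 2 6

# ASCII cannot represent these characters, so encode() raises an error.
ascii_failed = False
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_failed = True
    print('cannot encode as ASCII:', exc.reason)

# Not covered above: decode() reverses encode() when the same encoding is used.
print(utf8_bytes.decode('utf-8') == text)  # True
```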