bytes, str, unicode

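When writing Python code, always do encoding and decoding at the outermost boundary of the program. The core of the program should use the Unicode character type (str in Python 3, unicode in Python 2) and should not assume anything about character encodings. For text output, it is best to stick to a single encoding (UTF-8).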
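As a rough illustration of that layering, the sketch below decodes once at the input boundary, works only with str in the middle, and encodes once at the output boundary. The function names (`read_input`, `normalize`, `write_output`) are hypothetical and chosen only for this example; Python 3 is assumed.

```python
def read_input(raw: bytes) -> str:
    # Boundary: decode incoming bytes exactly once.
    return raw.decode('utf-8')


def normalize(text: str) -> str:
    # Core logic: operates on Unicode text only, with no encoding assumptions.
    return text.strip().upper()


def write_output(text: str) -> bytes:
    # Boundary: encode outgoing text exactly once, as UTF-8.
    return text.encode('utf-8')


result = write_output(normalize(read_input(b'  caf\xc3\xa9 \n')))
print(result)  # b'CAF\xc3\x89'
```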

Python 3 has two types that represent sequences of characters: bytes and str.

bytes : instances contain raw 8-bit values (each byte is 8 binary digits)
str : instances contain Unicode characters
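A small Python 3 sketch of the difference between the two types. The sample value is an arbitrary choice for illustration.

```python
text = 'caf\u00e9'            # str: four Unicode characters
data = text.encode('utf-8')   # bytes: five raw 8-bit values

print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 5
print(data[3])                          # 195: indexing bytes yields an int
print(text[3] == '\u00e9')              # True: indexing str yields a one-character str
```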

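Python 2 has two types that represent sequences of characters: str and unicode.

str : instances contain raw 8-bit values
unicode : instances contain Unicode characters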

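There are many ways to represent Unicode characters as binary data; the most common encoding is UTF-8. Python 3 str instances and Python 2 unicode instances have no fixed binary encoding associated with them, but UTF-8 is the recommended choice.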

Unicode -> binary: use the encode method.
binary -> Unicode: use the decode method.
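A minimal round trip in Python 3, with an arbitrary sample string. It also shows that decoding with a codec that does not match the data raises an error rather than guessing.

```python
text = '\u4f60\u597d'               # two Unicode characters
encoded = text.encode('utf-8')      # str -> bytes
decoded = encoded.decode('utf-8')   # bytes -> str

print(encoded)           # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(decoded == text)   # True

# Decoding with the wrong codec fails loudly.
try:
    encoded.decode('ascii')
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
print(wrong_codec_failed)  # True
```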

Python3

Takes bytes or str, returns str

def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value

Takes bytes or str, returns bytes

def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str
    else:
        value = bytes_or_str.encode('utf-8')
    return value
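A quick usage check of the two helpers. They are repeated here in slightly condensed form so that the snippet runs on its own under Python 3.

```python
def to_str(bytes_or_str):
    # Decode only when given bytes; str passes through unchanged.
    if isinstance(bytes_or_str, bytes):
        return bytes_or_str.decode('utf-8')
    return bytes_or_str


def to_bytes(bytes_or_str):
    # Encode only when given str; bytes pass through unchanged.
    if isinstance(bytes_or_str, bytes):
        return bytes_or_str
    return bytes_or_str.encode('utf-8')


print(repr(to_str(b'foo')))     # 'foo'
print(repr(to_str('bar')))      # 'bar'
print(repr(to_bytes(b'foo')))   # b'foo'
print(repr(to_bytes('bar')))    # b'bar'
```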

Notes on Python 3

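In Python 3, the file handle returned by open() (a file object, sometimes loosely called a file pointer or descriptor) operates in text mode by default: it encodes and decodes the data for you, commonly as UTF-8.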

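Python 3's open() gained an encoding parameter. It defaults to None, which selects the platform's preferred encoding; that is UTF-8 on most modern systems, but it is not guaranteed.
In Python 2, open() reads and writes raw bytes by default.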
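Passing `encoding` explicitly removes any dependence on defaults. The sketch below assumes Python 3 and writes to a temporary directory under a made-up file name, so that it does not touch real files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'greeting.txt')   # hypothetical file name

    # Text mode: str in, str out; the encoding is stated rather than assumed.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('caf\u00e9')

    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()

    # Binary mode shows the bytes that were actually stored.
    with open(path, 'rb') as f:
        raw = f.read()

print(text == 'caf\u00e9')   # True
print(raw)                   # b'caf\xc3\xa9'
```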
For example, writing some random binary data to a file fails in Python 3 but works in Python 2:

import os

with open('random.bin', 'w') as f:
    f.write(os.urandom(10))  # Python 3: TypeError, write() expects str, not bytes

To avoid this problem, open the file in binary mode for reading and writing ('rb', 'wb').
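A hedged sketch of both the failure and the fix, assuming Python 3. It writes to a temporary directory rather than the current one.

```python
import os
import tempfile

error = None
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'random.bin')

    # Text mode rejects bytes in Python 3.
    try:
        with open(path, 'w') as f:
            f.write(os.urandom(10))
    except TypeError as exc:
        error = exc
        print('text mode failed:', exc)

    # Binary mode accepts bytes as-is.
    payload = os.urandom(10)
    with open(path, 'wb') as f:
        f.write(payload)

    with open(path, 'rb') as f:
        restored = f.read()

print(restored == payload)   # True
```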
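In Python 3, bytes and str instances cannot be mixed: they never compare equal, not even when both are empty, and they cannot be concatenated with + or ordered with < and >.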
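In Python 3 the two types in question are bytes and str. A minimal sketch of how they refuse to mix, run with the interpreter's default warning settings:

```python
print(b'foo' == 'foo')   # False: bytes and str never compare equal
print(b'' == '')         # False, even when both are empty

failed = []

try:
    b'foo' + 'bar'
except TypeError as exc:
    failed.append('concatenate')
    print('cannot concatenate:', exc)

try:
    b'foo' < 'bar'
except TypeError as exc:
    failed.append('order')
    print('cannot order:', exc)

print(failed)  # ['concatenate', 'order']
```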

Python2

Takes str or unicode, returns str

def to_str(unicode_or_str):
    if isinstance(unicode_or_str, unicode):
        value = unicode_or_str.encode('utf-8')
    else:
        value = unicode_or_str
    return value

Takes str or unicode, returns unicode

def to_unicode(unicode_or_str):
    if isinstance(unicode_or_str, str):
        value = unicode_or_str.decode('utf-8')
    else:
        value = unicode_or_str
    return value

Notes on Python 2

If a str contains only 7-bit ASCII characters, str and unicode are almost interchangeable (in Python 2 only):

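+ can concatenate such a str with a unicode instance
== / != can compare such a str with a unicode instance
a unicode instance can be used with format strings such as '%s'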

Python 2 is on its way out; Python 3 is the future.

Effective Python: bytes, str, unicode
