Converting between GBK, Unicode, and UTF-8 encodings

Notes:
1. In Python 2 the default encoding is ASCII. In Python 3, strings (str) are Unicode and the default encoding is UTF-8.

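2. Unicode has several encoding forms: UTF-32 (4 bytes per character), UTF-16 (2 or 4 bytes), and UTF-8 (1 to 4 bytes). UTF-16 is widely used for in-memory strings (for example in Windows, Java, and JavaScript), but files are usually stored as UTF-8 because it saves space.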

3. In Python 3, encode converts the text to the target encoding and also changes its type from str to bytes; decode does the reverse and turns bytes back into str.
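Notes 2 and 3 can be checked with a short Python 3 sketch. The helper name byte_lengths and the sample characters are chosen here for illustration and are not part of the original article. The "-le" codec variants are used so that Python does not prepend a byte order mark, which would otherwise inflate the counts.

```python
# Python 3 sketch: how many bytes one character takes in each Unicode encoding.
def byte_lengths(ch):
    """Return the encoded size of ch in UTF-8, UTF-16, and UTF-32."""
    codecs = (("utf-8", "utf-8"), ("utf-16", "utf-16-le"), ("utf-32", "utf-32-le"))
    return {name: len(ch.encode(codec)) for name, codec in codecs}

for sample in ("A", "中", "😀"):
    print(sample, byte_lengths(sample))

# encode turns str into bytes; decode turns bytes back into str.
encoded = "中".encode("utf-8")
decoded = encoded.decode("utf-8")
print(type(encoded).__name__, type(decoded).__name__)
```

The last sample lies outside the Basic Multilingual Plane, which is why UTF-16 needs 4 bytes (a surrogate pair) for it rather than 2.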

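Approach for converting between GBK and UTF-8:
Use Unicode as a bridge: decode the source bytes to Unicode, then encode the result in the target encoding (see the flowchart at the end of the article).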
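The bridge idea can be sketched as a small Python 3 helper. The function name transcode and its parameters are hypothetical, introduced only for illustration; the only library calls involved are bytes.decode and str.encode.

```python
# Python 3 sketch: convert bytes between two encodings, using str (Unicode) as the bridge.
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Decode data from the src encoding, then encode it in the dst encoding."""
    text = data.decode(src)   # bytes -> str (Unicode)
    return text.encode(dst)   # str -> bytes in the target encoding

gbk_bytes = "编码".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(utf8_bytes.decode("utf-8"))
```

The two encodings never convert into each other directly; every conversion passes through str, so one helper covers any pair of codecs that Python supports.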

Example code:

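# In Python 2
# -*- coding: utf-8 -*-
# The coding line above is needed because this source file contains non-ASCII text.
msg = "GBK,UTF-8编码的转换"  # in Python 2 this is a byte string holding UTF-8 bytes
msg_gb2312 = msg.decode("utf-8").encode("gb2312")  # UTF-8 bytes -> unicode -> GB2312 bytes
gb2312_to_gbk = msg_gb2312.decode("gbk").encode("gbk")  # works because GBK is a superset of GB2312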

print(msg)
print(msg_gb2312)
print(gb2312_to_gbk)
# In Python 3
msg = "GBK,UTF-8编码的转换"
# msg_gb2312 = msg.decode("utf-8").encode("gb2312")  # fails in Python 3: str has no decode()
msg_gb2312 = msg.encode("gb2312")  # str is already Unicode, so no decode is needed first
gb2312_to_unicode = msg_gb2312.decode("gb2312")
gb2312_to_utf8 = msg_gb2312.decode("gb2312").encode("utf-8")

print(msg)
print(msg_gb2312)
print(gb2312_to_unicode)
print(gb2312_to_utf8)
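The Python 2 example decodes GB2312 bytes with the gbk codec, which relies on GBK being a superset of GB2312. A minimal Python 3 sketch of that relationship follows. The helper can_encode and the sample strings are chosen here for illustration, and the character "镕" is used on the assumption that it is present in GBK but absent from GB2312.

```python
# Python 3 sketch: GB2312 bytes are also valid GBK bytes.
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

msg = "编码转换"
gb2312_bytes = msg.encode("gb2312")

# Bytes produced by the gb2312 codec decode to the same text with the gbk codec.
print(gb2312_bytes.decode("gbk") == msg)

# Assumption: this character exists in GBK but not in GB2312.
rare = "镕"
print(can_encode(rare, "gbk"), can_encode(rare, "gb2312"))
```

The reverse direction is not safe in general: text encoded as GBK may contain characters that the gb2312 codec cannot represent.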

Recommended reading: "ASCII, GB2312, GBK, GB18030, Unicode, UTF-8, BIG5 encodings explained in detail (the most complete on the web)"

This article is based on: https://www.cnblogs.com/alex3714/articles/5717620.html

Reposted from blog.csdn.net/qq_41320433/article/details/104322948