Problem description
s = '\u4f60\u597d'
s = '\xe6\x97\xa9\xe4\xb8\x8a\xe5\xa5\xbd'
s = '\xc4\xe3\xba\xc3'
\u strings
Encoding
import json
print(json.dumps('你好')) # "\u4f60\u597d"
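json.dumps is not the only way to produce \u escapes. As a minimal sketch, the built-in ascii() and the unicode_escape codec (both standard library) give the same escaped form; the variable names are only illustrative.

```python
text = '你好'

# ascii() returns the repr of the string with non-ASCII characters escaped.
escaped_repr = ascii(text)

# The unicode_escape codec produces the escaped form as bytes.
escaped_bytes = text.encode('unicode_escape')

print(escaped_repr)   # '\u4f60\u597d'
print(escaped_bytes)  # b'\\u4f60\\u597d'
```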
Decoding: print the string directly (Python resolves the \u escapes when it parses the literal)
s = '\u4f60\u597d'
print(s) # 你好
print(str(s)) # 你好
print(repr(s)) # '你好'
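When the escapes arrive as literal backslash characters (read from a file or a network response, for example), the string is not yet decoded and print shows it unchanged. A sketch of two ways to decode it, assuming the input holds only ASCII characters and \uXXXX escapes; the name raw is illustrative.

```python
import json

raw = r'\u4f60\u597d'   # real backslashes, not yet decoded
print(len(raw))          # 12

# Option 1: the unicode_escape codec (safe only for ASCII-only input).
decoded = raw.encode('ascii').decode('unicode_escape')
print(decoded)           # 你好

# Option 2: treat the text as the body of a JSON string.
# Assumes raw has no unescaped double quotes or control characters.
decoded_json = json.loads('"' + raw + '"')
print(decoded_json)      # 你好
```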
\x strings
Encoding
print('你好'.encode('utf-8')) # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print('你好'.encode('gbk')) # b'\xc4\xe3\xba\xc3'
Decoding: first call encode('raw_unicode_escape') to turn the str back into bytes,
then decode those bytes with the original encoding
s = '\xe4\xbd\xa0\xe5\xa5\xbd'
print(s.encode('raw_unicode_escape').decode('utf-8')) # 你好
s = '\xc4\xe3\xba\xc3'
print(s.encode('raw_unicode_escape').decode('gbk')) # 你好
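raw_unicode_escape works here because every character in s has a code point below 256, which the codec writes out as a single byte. The latin-1 codec performs the same mapping but raises UnicodeEncodeError for anything above 255 instead of emitting a \uXXXX escape, which can make mistakes easier to spot. A sketch, where redecode is a hypothetical helper name:

```python
def redecode(mangled: str, encoding: str) -> str:
    """Reinterpret a str whose characters are really raw byte values."""
    # latin-1 maps code points 0-255 to identical byte values;
    # any character above 255 raises UnicodeEncodeError.
    return mangled.encode('latin-1').decode(encoding)

print(redecode('\xe4\xbd\xa0\xe5\xa5\xbd', 'utf-8'))  # 你好
print(redecode('\xc4\xe3\xba\xc3', 'gbk'))            # 你好
```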
Hexadecimal strings
Encoding
import base64
print(base64.b16encode('你好'.encode())) # b'E4BDA0E5A5BD'
Decoding with base64
import base64
s = 'E4BDA0E5A5BD'
print(base64.b16decode(s)) # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(base64.b16decode(s).decode()) # 你好
Or use binascii
import binascii
print(binascii.b2a_hex('你好'.encode())) # b'e4bda0e5a5bd'
print(binascii.a2b_hex('e4bda0e5a5bd').decode()) # 你好
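For the same round trip without any import, the built-in bytes.hex() and bytes.fromhex() methods are an option. A short sketch with illustrative variable names:

```python
data = '你好'.encode()   # UTF-8 by default

# bytes.hex() returns a lowercase str rather than bytes.
hex_text = data.hex()
print(hex_text)          # e4bda0e5a5bd

# bytes.fromhex() accepts uppercase or lowercase digits.
restored = bytes.fromhex('E4BDA0E5A5BD').decode()
print(restored)          # 你好
```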
Detecting the encoding
pip install chardet
Call chardet.detect()
import chardet
s = '你好'.encode()
print(chardet.detect(s)) # {'encoding': 'utf-8', 'confidence': 0.7525, 'language': ''}
s = rb'\u4f60\u597d'  # raw bytes literal: \u is not a valid escape inside bytes
print(chardet.detect(s)) # {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
s = b'\xe6\x97\xa9\xe4\xb8\x8a\xe5\xa5\xbd'
print(chardet.detect(s)) # {'encoding': 'utf-8', 'confidence': 0.87625, 'language': ''}
s = b'\xc4\xe3\xba\xc3'
print(chardet.detect(s)) # {'encoding': 'TIS-620', 'confidence': 0.3598212120361634, 'language': 'Thai'}
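The last result is wrong: the bytes are GBK, but chardet reports TIS-620 (Thai) with a confidence of only about 0.36, most likely because four bytes give it too little to go on.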
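When the plausible encodings are known in advance, trying them in order is a simpler alternative to statistical detection for short inputs. The following is a sketch under that assumption; decode_with_candidates and its default candidate list are hypothetical and not part of chardet.

```python
def decode_with_candidates(data: bytes, candidates=('utf-8', 'gbk')):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings matched')

print(decode_with_candidates(b'\xc4\xe3\xba\xc3'))          # ('你好', 'gbk')
print(decode_with_candidates(b'\xe4\xbd\xa0\xe5\xa5\xbd'))  # ('你好', 'utf-8')
```

UTF-8 goes first because it is strict about byte sequences, whereas GBK will decode many byte sequences that are actually UTF-8 into the wrong characters.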