1. Use Python’s built-in ord()
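The ord() function returns the Unicode code point of a character, which can then be compared against the range of Chinese characters. The range U+4E00 to U+9FFF is the CJK Unified Ideographs block, which contains the commonly used Chinese characters: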
Sample code:
def is_chinese(char):
    # U+4E00..U+9FFF is the CJK Unified Ideographs block
    return 0x4E00 <= ord(char) <= 0x9FFF
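The single range above covers only the basic CJK Unified Ideographs block. As a minimal sketch, the same ord() check can be extended to further blocks. The names CJK_RANGES, is_cjk_ideograph and count_chinese are hypothetical, and the selection of blocks is an assumption made for illustration rather than a complete list:

```python
# Hypothetical helper: the selection of blocks is illustrative, not exhaustive.
CJK_RANGES = (
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
)

def is_cjk_ideograph(char):
    """Return True if the code point of char lies in one of CJK_RANGES."""
    code_point = ord(char)
    return any(low <= code_point <= high for low, high in CJK_RANGES)

def count_chinese(text):
    """Count the characters in text that fall in the listed ranges."""
    return sum(1 for ch in text if is_cjk_ideograph(ch))

print(is_cjk_ideograph('中'))        # True
print(is_cjk_ideograph('A'))         # False
print(count_chinese('Python 中文'))  # 2
```

Keeping the ranges in one tuple means another block can be added without touching the function body.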
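2. Use Python’s built-in unicodedata library
The standard-library unicodedata module can determine whether a character is a Chinese character, because the Unicode names of Chinese characters contain "CJK" (for example, the name of 中 is CJK UNIFIED IDEOGRAPH-4E2D).
Sample code: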
import unicodedata

def is_chinese(char):
    # The '' default avoids a ValueError for characters that have no name
    return 'CJK UNIFIED IDEOGRAPH' in unicodedata.name(char, '')
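To see what the name lookup returns, the following sketch prints the Unicode names of a few characters. The helper name describe is hypothetical. The second argument to unicodedata.name() is a default that is returned for characters without a name, which avoids a ValueError:

```python
import unicodedata

def describe(char):
    """Return the Unicode name of char, or '<unnamed>' if it has none."""
    # Without the default argument, unicodedata.name() raises ValueError
    # for unnamed characters such as the newline control character.
    return unicodedata.name(char, '<unnamed>')

for ch in ('中', 'A', '\n'):
    print(repr(ch), describe(ch))
```

The loop prints CJK UNIFIED IDEOGRAPH-4E2D for 中, LATIN CAPITAL LETTER A for A, and the placeholder for the newline.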
3. Use regular expressions
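You can use regular expressions to determine whether a string consists of Chinese characters. For example, [^\u4e00-\u9fa5] matches any character outside the common Chinese range U+4E00 to U+9FA5, and [^\x00-\xff] matches any character outside the single-byte range U+0000 to U+00FF, which includes Chinese characters as well as full-width symbols. The sample below searches for a non-Chinese character and returns False if it finds one.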
Sample code:
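import re

# Matches any character outside the common Chinese range U+4E00..U+9FA5
NON_CHINESE = re.compile(r'[^\u4e00-\u9fa5]')

# Check whether every character in word is a Chinese character
def is_chinese(word):
    # An empty string contains no Chinese characters, so reject it
    return bool(word) and NON_CHINESE.search(word) is None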
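The second pattern mentioned in this section, [^\x00-\xff], does not appear in the sample. As a rough sketch of how the two character classes differ, the snippet below extracts runs of characters with each one. The names CHINESE_RUN and NON_LATIN1_RUN are made up for this example:

```python
import re

# Runs of characters in the range U+4E00..U+9FA5
CHINESE_RUN = re.compile(r'[\u4e00-\u9fa5]+')
# Runs of characters outside U+0000..U+00FF; this also catches
# full-width punctuation, not only Chinese characters
NON_LATIN1_RUN = re.compile(r'[^\x00-\xff]+')

text = 'Hello，世界！abc中文'

print(CHINESE_RUN.findall(text))     # ['世界', '中文']
print(NON_LATIN1_RUN.findall(text))  # ['，世界！', '中文']
```

The full-width comma and exclamation mark show up only in the second result, so the second pattern is too broad when only Chinese characters are wanted.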
4. Use a Chinese character set
You can use a Chinese character set to determine whether a character is a Chinese character. GB2312 and GBK encode each Chinese character as two bytes, so you can encode the character and check whether the resulting bytes fall within the range reserved for Chinese characters. A character that the character set does not contain cannot be encoded at all.
Sample code:
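# Check whether a single character is a Chinese character in GB2312
def is_chinese(char):
    try:
        encoded = char.encode('gb2312')
    except UnicodeEncodeError:
        # Characters outside GB2312 cannot be encoded at all
        return False
    # Level-1 hanzi occupy B0A1..D7F9 and level-2 hanzi D8A1..F7FE
    return len(encoded) == 2 and b'\xb0\xa1' <= encoded <= b'\xf7\xfe'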
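To make the byte ranges concrete, the sketch below encodes a few characters and prints the resulting bytes. The helper name gb_bytes is hypothetical. It returns None when the chosen codec cannot represent the character:

```python
def gb_bytes(char, encoding='gb2312'):
    """Return char encoded with the given codec, or None if it cannot be encoded."""
    try:
        return char.encode(encoding)
    except UnicodeEncodeError:
        return None

print(gb_bytes('啊'))         # b'\xb0\xa1', the first Chinese character in GB2312
print(gb_bytes('中'))         # b'\xd6\xd0'
print(gb_bytes('A'))          # b'A', ASCII stays a single byte
print(gb_bytes('龍'))         # None, this traditional form is not in GB2312
print(gb_bytes('龍', 'gbk'))  # two bytes, because GBK extends GB2312
```

GB2312 contains only simplified characters, so text that may include traditional forms is better checked against GBK or with one of the Unicode-based methods.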
5. Use third-party libraries
You can also use a third-party library to determine whether a character is a Chinese character. For example, the xpinyin library converts Chinese characters to Pinyin, so a character can be treated as Chinese if xpinyin has a Pinyin reading for it.
Sample code:
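from xpinyin import Pinyin

pinyin = Pinyin()

# Check whether every character in word is a Chinese character
def is_chinese(word):
    # Assumes xpinyin returns characters without a known reading unchanged,
    # so a character counts as Chinese only if the conversion alters it
    return bool(word) and all(pinyin.get_pinyin(ch, '') != ch for ch in word)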