Recently I ran into this encoding problem (a UnicodeEncodeError when printing scraped text in PyCharm) in several projects. After reading a lot of documentation and blog posts, the first solution I used was:
from bs4 import BeautifulSoup

with open(r"demo.txt", "r", encoding="utf-8") as demo:  # the page HTML, saved to disk beforehand
    soup = BeautifulSoup(demo.read(), "html.parser")
html_data = soup.find("div", id="J_goodsList")
In other words, the most primitive workaround: write the page to a file first, then read the file back in.
Later I found a simpler method:
text = text.replace('\xaf', '')
This replaces the offending character with an empty string before the text is printed, so PyCharm's console never has to encode it.
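The replace call only handles the one character it names. A hedged generalization (not the method used in this post) is to encode the text with the console's codec using errors="ignore", which drops every character that codec cannot represent while keeping the rest of the line. This sketch assumes the console encoding is GBK, as is typical on Chinese-locale Windows; strip_unencodable is a hypothetical helper name.

```python
def strip_unencodable(text: str, encoding: str = "gbk") -> str:
    """Drop only the characters that `encoding` cannot represent.

    Assumption: the console encodes output as GBK; pass the real
    console encoding instead if it differs.
    """
    # errors="ignore" silently discards each unencodable character;
    # decoding turns the remaining bytes back into a str for print().
    return text.encode(encoding, errors="ignore").decode(encoding)


# Hypothetical scraped text containing a no-break space (\xa0).
sample = "price\xa0list"
print(strip_unencodable(sample))
```

Passing errors="replace" instead would leave a ? in place of each dropped character, which makes the loss visible rather than silent.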
But yesterday I ran into HTML source that contained several such characters, which gave me a headache.
So I tried catching the exception instead:
for line in text.splitlines():
    try:
        print(line.replace('\xaf', ''))
    except UnicodeEncodeError:
        continue  # skip any line that still cannot be printed
This stopped the crash, but every line that failed to print was skipped entirely, so a lot of content was lost, which was frustrating. After digging through more documentation and blogs this morning, I found a fix that gives the best of both worlds:
Change the encoding in PyCharm's File Encodings settings (Settings > Editor > File Encodings) to UTF-8. That's it. It is so simple that all the earlier workarounds will make you laugh.
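If you would rather fix this in code than in the IDE settings, one hedged option (a different mechanism from the PyCharm setting, assuming Python 3.7 or later, where TextIOWrapper.reconfigure() exists) is to switch sys.stdout to UTF-8 at the top of the script:

```python
import sys

# Assumption: Python 3.7+. reconfigure() makes print() encode its output
# as UTF-8 instead of the console's default codec (often GBK on
# Chinese-locale Windows). The hasattr guard skips streams that have
# been replaced by objects without reconfigure(), e.g. io.StringIO.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")

print("macron: \xaf, no-break space: [\xa0]")
```

Setting the environment variable PYTHONIOENCODING=utf-8 in the run configuration has a similar effect without touching the code. Whether the characters then display correctly may still depend on how the console itself decodes the bytes.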