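Recognizing Chinese characters in an image with Python

We will recognize the Chinese characters in the image above.

Install the tesseract software and the Python libraries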

https://www.cnblogs.com/sea-stream/p/10961580.html

Then create a new folder named test, put the image above into it, and create a new file test.py in the same folder

Write the following into it

#coding=utf-8
from PIL import Image
import pytesseract
# The lines above are just imports; the single line below is all it takes to recognize the text in the image
text=pytesseract.image_to_string(Image.open('xxx.png'),lang='chi_sim')
print(text)

The test folder now contains two files: the image (xxx.png) and test.py.

Running it may produce an error:

C:\Users\k\Desktop\test>python test.py
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    text=pytesseract.image_to_string(Image.open('xxx.png'),lang='chi_sim')
  File "C:\Users\k\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 309, in image_to_string
    }[output_type]()
  File "C:\Users\k\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 308, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users\k\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 218, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users\k\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 194, in run_tesseract
    raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\\Program Files (x86)\\Tesseract-OCR/tessdata/chi_sim.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed loading language \'chi_sim\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

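This happens because a default tesseract-ocr installation does not include the Chinese language data, so it cannot recognize Chinese out of the box. Download the file chi_sim.traineddata and put it into the tessdata folder of the Tesseract-OCR installation directory, for example D:\Program Files (x86)\Tesseract-OCR\tessdata (on your machine, use the tessdata path named in the error message).

Link: https://pan.baidu.com/s/1c-fveIYnm1sQHxX9WRpUZw
Extraction code: 9ovq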
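
Before running the script again, it can help to confirm that the language file really landed in the folder Tesseract reads from. The following is only a minimal sketch using the Python standard library: the helper name find_missing_languages is hypothetical (it is not part of pytesseract), and the tessdata path is an assumption copied from the error message above, so replace it with the path on your own machine.

```python
from pathlib import Path


def find_missing_languages(tessdata_dir, languages):
    """Return the language codes that have no <code>.traineddata file in tessdata_dir."""
    tessdata = Path(tessdata_dir)
    return [
        lang for lang in languages
        if not (tessdata / f"{lang}.traineddata").is_file()
    ]


if __name__ == "__main__":
    # Assumed install location, taken from the error message; adjust as needed.
    tessdata_dir = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"
    missing = find_missing_languages(tessdata_dir, ["chi_sim"])
    if missing:
        print("Missing language data:", ", ".join(missing))
    else:
        print("chi_sim.traineddata found in", tessdata_dir)
```

If the script reports chi_sim as missing, the file was probably copied to a different Tesseract-OCR installation than the one pytesseract is calling.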


Run it again

python test.py

The output is shown below

C:\Users\k\Desktop\test>python test.py
风急天高猿啸衷′ 渚麦冒麦少丑弓飞口。
u边洛木萧萧下′ 不〖长江滚滚来。
万 悲禾火常作畜′ 年多病独登台。
艰难苦恨萦霜 渣倒新停澍酉木不=

References:

https://www.cnblogs.com/lizhe860/p/8969171.html

https://blog.csdn.net/showgea/article/details/82656515

Reposted from www.cnblogs.com/sea-stream/p/10961744.html