Installing the pytesseract module on Ubuntu for image text recognition (installing Python modules on Ubuntu)

25-05-01 2

Contents:

Installing the pytesseract module on Ubuntu for image text recognition (installing Python modules on Ubuntu)
Recognizing text with pytesseract on CentOS
Tesseract error in the image_to_string() conversion: pytesseract.pytesseract.TesseractError: (2, 'Usage: pytesseract [-l lang] input_file')
OpenCV Pytesseract "cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)": please give me an answer

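Installing the pytesseract module on Ubuntu for image text recognition (installing Python modules on Ubuntu)

The goal is offline recognition of text in images. Python has a library for this: pytesseract, which wraps the Tesseract OCR engine.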

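1. Install the tesseract-ocr package

sudo apt-get install tesseract-ocr

2. Install PIL. PIL (Python Imaging Library) is Python's image-processing library; on current systems it is provided by the Pillow fork.

sudo apt-get install python3-pil   # older Ubuntu releases shipped it as python-imaging

3. Install pytesseract

pip install pytesseract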
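
A quick way to confirm that steps 2 and 3 worked is to ask Python whether it can locate the two modules. The sketch below uses only the standard library; the helper name `missing_modules` is made up for illustration and is not part of pytesseract or Pillow.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "PIL" is the import name of Pillow; "pytesseract" is the wrapper installed by pip.
    missing = missing_modules(["PIL", "pytesseract"])
    print("missing modules:", missing or "none")
```

If either name is reported as missing, the corresponding install step most likely targeted a different Python interpreter than the one running the script.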

4. Test the code

# -*- coding: UTF-8 -*-
from PIL import Image
import pytesseract

# Recognize Chinese (needs the chi_sim language data, see step 5)
text = pytesseract.image_to_string(Image.open('chinese.png'), lang='chi_sim')
print(text)

# Recognize English
text = pytesseract.image_to_string(Image.open('english.png'))
print(text)

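5. Recognizing Chinese requires the Chinese language data

Find the tessdata folder on the Ubuntu system and put the Chinese language data file into it.

The Chinese language data can also be installed from the package repository:

sudo apt-get install tesseract-ocr-chi-sim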
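
Before calling `image_to_string(..., lang='chi_sim')`, it can help to check that the language file is really in the tessdata folder. The following is a minimal sketch using only the standard library; the function name `missing_languages` and the example directory are assumptions for illustration, since the tessdata location differs between Tesseract versions and installs.

```python
from pathlib import Path


def missing_languages(tessdata_dir, langs):
    """Return the language codes whose <code>.traineddata file is absent."""
    root = Path(tessdata_dir)
    return [lang for lang in langs if not (root / f"{lang}.traineddata").is_file()]


if __name__ == "__main__":
    # Assumed location; adjust it to wherever your tessdata folder actually is.
    print(missing_languages("/usr/share/tesseract-ocr/4.00/tessdata", ["eng", "chi_sim"]))
```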

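6. Tesseract also works from the command line

Commands:
Recognize English:
tesseract e.png 1   # 1 is the output base name; the recognized text is written to 1.txt in the current directory
Recognize Chinese:
tesseract --help  # show help
tesseract --list-langs  # check whether the Chinese data chi_sim is installed
tesseract c.png 1 -l chi_sim  # the result is again written to 1.txt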
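
The same command line can be driven from a script. The sketch below assembles the arguments in the `tesseract IMAGE OUTPUTBASE -l LANG` form used above and runs the command only when both the binary and the sample image are present. The helper name `build_tesseract_cmd` is hypothetical, and `c.png` is the sample file name from the commands above.

```python
import os
import shutil
import subprocess


def build_tesseract_cmd(image, output_base, lang=None):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE [-l LANG]."""
    cmd = ["tesseract", image, output_base]
    if lang:
        cmd += ["-l", lang]
    return cmd


if __name__ == "__main__":
    cmd = build_tesseract_cmd("c.png", "1", lang="chi_sim")
    print(" ".join(cmd))
    # Run only when both the binary and the sample image are available.
    if shutil.which("tesseract") and os.path.exists("c.png"):
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.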

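Recognizing text with pytesseract on CentOS

I had a passing idea to build a CAPTCHA recognition tool. A quick search turned up Tesseract, which led to the hour of setup described below.

ps: On Ubuntu it can apparently be installed straight from the package manager; I built it from source.

Prerequisites

This is my work machine, so the usual build tools are already installed and are not covered here. It is also best to build as root.
tesseract depends on leptonica. Before installing leptonica, install the common image libraries: leptonica is essentially a wrapper around them, and if a library is not found at build time, the corresponding image format will not be supported later.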

yum install libtiff-devel libjpeg-devel libpng-devel -y

Install leptonica: download the source from GitHub, then run

./autogen.sh
./configure --prefix=/usr/local
make -j2   # use a higher -j value on machines with more cores to build faster
make install

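After building leptonica, build tesseract: download the source and run the same commands

./autogen.sh
./configure --prefix=/usr/local
make -j2   # use a higher -j value on machines with more cores to build faster
make install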

make may fail with the libtool error below. If it does, delete aclocal.m4 and run ./autogen.sh again

libtool: Version mismatch error.  This is libtool 2.4.6, but the
libtool: definition of this LT_INIT comes from libtool 2.4.2.
libtool: You should recreate aclocal.m4 with macros from libtool 2.4.6
libtool: and run autoconf again.

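Download the trained data. It is available directly on GitHub; save the files under /usr/local/share/tessdata

https://github.com/tesseract-ocr/tessdata

chi_sim.traineddata  Chinese (Simplified)
eng.traineddata      English
enm.traineddata      Middle English (1100-1500)

Then set the environment variable: add export TESSDATA_PREFIX=/usr/local/share/tessdata to /etc/bashrc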
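
A line in /etc/bashrc only takes effect in newly started shells. When Tesseract is launched from a Python process, the variable can instead be supplied for that one call. The sketch below is one possible way to do this with the standard library; `tessdata_env` is a made-up helper name, and the prefix is the directory used above.

```python
import os


def tessdata_env(prefix, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX set to `prefix`."""
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = prefix
    return env


if __name__ == "__main__":
    env = tessdata_env("/usr/local/share/tessdata")
    print(env["TESSDATA_PREFIX"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run`, which leaves the parent process environment untouched.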

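Usage

Once installed, the tesseract command can be used directly

tesseract cde.png result -l chi_sim

In my own tests, however, the command line had many problems and failed to recognize the text, while going through Python worked

Installing the Python library is simple: run pip install pytesseract and it is ready to use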

Python 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pytesseract
>>> from PIL import Image
>>> image = Image.open("abc.png")
>>> text = pytesseract.image_to_string(image, lang='chi_sim')
>>> print(text)
Bai暨匡'
『 百 度
>>>

The image recognized here is the Baidu homepage logo

Tesseract error in the image_to_string() conversion: pytesseract.pytesseract.TesseractError: (2, 'Usage: pytesseract [-l lang] input_file')

How can this error be resolved?

Please note: I know there are many posts about Tesseract. I have not yet found a working solution that does not produce an error.

I am trying to run simple OCR on an image with Tesseract. I have tried many solutions from various forums without success. I converted a PDF to an image and saved it, then loaded that image with cv2. I was also able to display the image. Now I am trying to apply Tesseract's image_to_string() to it.

I tried adjusting pytesseract.pytesseract.tesseract_cmd and made sure that both the wrapper and the actual tesseract package are installed. The code is below:

from wand.image import Image
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:/Users/Afton/anaconda3/Scripts/pytesseract.exe'
# Convert from pdf and save as image
pdf = 'C:/path/example.pdf'
outputFilename = 'C:/path/example.jpg'
with Image(filename=pdf) as img:
    img.save(filename=outputFilename)
# Read image
imagePath = outputFilename
image = cv2.imread(imagePath)
# Configure OCR with pytesseract
config = r'-l deu --oem 1 --psm 3'
text = pytesseract.image_to_string(image, config=config)
# Print text output
text = text.split('\n')
print(text)

This is the current error:

pytesseract.pytesseract.TesseractError: (2, 'Usage: pytesseract [-l lang] input_file')

Earlier, the error was related to the pytesseract.pytesseract.tesseract_cmd setting.

Any help is appreciated.

Update: the image is in German. I tried to make that clear in the config.

Update 2: I tried the alternative path from this resource (using my own file location)

pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'

I now get this error:

pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\\Program Files\\Tesseract-OCR/tessdata/deu.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'deu\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

Note for others who run into this: download the language pack from https://github.com/tesseract-ocr/tessdata, since I am reading German documents. All language files can be found there. The problem turned out to be the language data.

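Solution

This line is wrong:

pytesseract.pytesseract.tesseract_cmd = r'C:/Users/Afton/anaconda3/Scripts/pytesseract.exe'

tesseract_cmd must point to the Tesseract executable itself (tesseract.exe), not to the pytesseract wrapper script, which only prints its own usage message. Please read the pytesseract documentation.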
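
The mistake is easy to detect in code, because the wrapper script and the real binary have different file names. The sketch below is only an illustration: `points_at_wrapper` is a hypothetical helper, not a pytesseract function, and it looks only at the file name of the configured path.

```python
from pathlib import PureWindowsPath


def points_at_wrapper(tesseract_cmd):
    """True when the path names the pytesseract script instead of the Tesseract binary."""
    # PureWindowsPath accepts both "/" and "\" as separators, so it also handles POSIX-style paths.
    return PureWindowsPath(tesseract_cmd).stem.lower() == "pytesseract"


if __name__ == "__main__":
    print(points_at_wrapper(r"C:/Users/Afton/anaconda3/Scripts/pytesseract.exe"))
    print(points_at_wrapper(r"C:/Program Files/Tesseract-OCR/tesseract.exe"))
```

A check like this could run once at startup, before the path is assigned to `pytesseract.pytesseract.tesseract_cmd`.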

OpenCV Pytesseract "cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)": please give me an answer

How can this problem be solved?

Hi everyone, I am trying to build an application, but I keep getting this error

Traceback (most recent call last):
  File "C:\Users\ratho\AppData\Local\Programs\Python\python39\lib\site-packages\pytesseract\pytesseract.py", line 255, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "C:\Users\ratho\AppData\Local\Programs\Python\python39\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\ratho\AppData\Local\Programs\Python\python39\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\ratho\Documents\Programs\Ml\Dyslexia\test.py", line 58, in <module>
    text = pytesseract.image_to_string(cropped)
  File "C:\Users\ratho\AppData\Local\Programs\Python\python39\lib\site-packages\pytesseract\pytesseract.py", line 409, in image_to_string
    return {
  File "C:\Users\ratho\AppData\Local\Programs\Python\python39\lib\site-packages\pytesseract\pytesseract.py", line 412, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users\ratho\AppData\Local\Programs\Python\python39\lib\site-packages\pytesseract\pytesseract.py", line 287, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users\ratho\AppData\Local\Programs\Python\python39\lib\site-packages\pytesseract\pytesseract.py", line 259, in run_tesseract
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: System_path_to_tesseract.exe is not installed or it's not in your PATH. See README file for more information.

Here is my code:

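import cv2
import pytesseract

# Placeholder from the question: it has to be replaced with the real path to
# tesseract.exe, otherwise pytesseract raises the TesseractNotFoundError shown above.
pytesseract.pytesseract.tesseract_cmd = 'System_path_to_tesseract.exe'

img = cv2.imread("image.jpg")

# Convert to grayscale, then binarize with Otsu's method (inverted).
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh1 = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)

# Dilate with a large rectangular kernel so nearby characters merge into text blocks.
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18))
dilation = cv2.dilate(thresh1, rect_kernel, iterations=1)

contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

im2 = img.copy()

# Start with an empty output file.
file = open("recognized.txt", "w+")
file.write("")
file.close()

for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    # Draw the bounding box, crop the block, and append its OCR text to the file.
    rect = cv2.rectangle(im2, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cropped = im2[y:y + h, x:x + w]
    file = open("recognized.txt", "a")
    text = pytesseract.image_to_string(cropped)
    file.write(text)
    file.write("\n")
    file.close()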
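
The traceback names System_path_to_tesseract.exe as the missing file, which suggests that the placeholder in tesseract_cmd was never replaced with a real path. One way to fail early with a clearer message is sketched below, using only the standard library. `resolve_tesseract` is a hypothetical helper, and the fallback to a PATH lookup is an assumption about what is wanted here.

```python
import os
import shutil


def resolve_tesseract(configured=None):
    """Return a usable Tesseract path, or raise FileNotFoundError with a clear message."""
    # Prefer the explicitly configured path when it really exists on disk.
    if configured and os.path.isfile(configured):
        return configured
    # Otherwise fall back to whatever "tesseract" resolves to on PATH.
    found = shutil.which("tesseract")
    if found:
        return found
    raise FileNotFoundError(
        f"Tesseract not found: {configured!r} is not a file and 'tesseract' is not on PATH"
    )


if __name__ == "__main__":
    try:
        print(resolve_tesseract("System_path_to_tesseract.exe"))
    except FileNotFoundError as exc:
        print(exc)
```

The returned path could then be assigned to `pytesseract.pytesseract.tesseract_cmd` before the first call to `image_to_string()`.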
