
Python: Recognizing Text in Images with the Baidu API (Multithreaded Version)

Posted in Python on December 14, 2020

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Jun 12 09:37:38 2018
Recognize text in images with the Baidu OCR API (multithreaded)
@author: XnCSD
"""

import datetime
import glob
import os
import threading
from os import path
from queue import Queue, Empty

from aip import AipOcr  # Baidu AI SDK (pip install baidu-aip)
from PIL import Image

APP_ID = ''      # the ID obtained earlier; same for the two keys below
API_KEY = ''
SECRET_KEY = ''

SRC_DIR = r'D:\Study\pythonProject\scrapy\IpProxy\端口'   # original images
TMP_DIR = 'tmp'                                          # resized copies
OUTFILE = r'D:\Study\pythonProject\scrapy\IpProxy\port_zidian.txt'
NUM_THREADS = 100

write_lock = threading.Lock()  # only one thread appends to OUTFILE at a time


def convertimg(picfile, outdir):
    '''Resize an image, shrinking oversized ones
    picfile:  path of the source image
    outdir:   directory the resized copy is saved to
    '''
    with Image.open(picfile) as img:
        width, height = img.size
        # Halve both sides until the area is at most 4,000,000 pixels;
        # at this size a compressed image is a bit over 200 KB
        while width * height > 4000000:
            width //= 2
            height //= 2
        new_img = img.resize((width, height), Image.BILINEAR)
    new_img.save(path.join(outdir, path.basename(picfile)))


def baiduOCR(ts_queue, outfile):
    """Worker thread: OCR images from the queue and save the extracted text
    ts_queue:  Queue of image paths
    outfile:   output text file
    """
    client = AipOcr(APP_ID, API_KEY, SECRET_KEY)  # one client per thread, not per image
    while True:
        try:
            # get_nowait() never blocks: with many threads, checking empty()
            # and then calling get() can hang if another thread takes the last item
            picfile = ts_queue.get_nowait()
        except Empty:
            break
        filename = path.basename(picfile)
        with open(picfile, 'rb') as f:
            img = f.read()
        print("Recognizing image:\t" + filename)
        try:
            message = client.basicGeneral(img)  # general OCR, 50,000 free calls per day
            # message = client.basicAccurate(img)  # high-accuracy OCR, 800 free calls per day
        except Exception as e:  # network error etc.; the temp copy is kept for a rerun
            print('Recognition failed: ' + filename + ' ' + str(e))
            continue
        words = message.get('words_result')
        if not words:  # API error (see 'error_msg') or no text found
            print('Recognition failed: ' + filename + ' ' + str(message.get('error_msg', '')))
            continue
        name = path.splitext(filename)[0]
        # One "'name':text," line per recognized text line, in dictionary-literal style
        with write_lock, open(outfile, 'a', encoding='utf-8') as fo:
            for text in words:
                fo.write("'" + name + "':" + text.get('words') + ',\n')
        os.remove(picfile)  # delete the processed temp copy by its full path
        print("Recognized successfully: " + filename)


def duqu_tupian(srcdir, outdir):
    """Compress the images in srcdir into outdir and queue the copies for OCR"""
    ts_queue = Queue()  # unbounded, so put() never blocks before the workers start
    os.makedirs(outdir, exist_ok=True)
    print("Compressing oversized images...")
    # Shrink large images first to speed up recognition; copies go to the temp folder
    for picfile in glob.glob(path.join(srcdir, '*')):
        try:
            convertimg(picfile, outdir)
        except OSError as e:  # not an image, or a mode the format cannot save
            print('Compression failed: ' + picfile + ' ' + str(e))
    for picfile in glob.glob(path.join(outdir, '*')):
        ts_queue.put(picfile)
    return ts_queue


if __name__ == "__main__":
    start = datetime.datetime.now().replace(microsecond=0)
    ts_queue = duqu_tupian(SRC_DIR, TMP_DIR)
    print("Recognizing images...")
    threads = [threading.Thread(target=baiduOCR, name='th-' + str(i),
                                args=(ts_queue, OUTFILE))
               for i in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print('Text extraction finished; results are in ' + OUTFILE)
    end = datetime.datetime.now().replace(microsecond=0)
    print('Elapsed time: ' + str(end - start))
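The script above starts 100 threads that drain a shared Queue and append to one file. As an alternative (a swapped-in technique, not the author's code), a minimal sketch with `concurrent.futures.ThreadPoolExecutor` lets the pool hand out the files and collects every response in the main thread, so no queue polling or file lock is needed. `fake_ocr` is a hypothetical stand-in for `client.basicGeneral` so the sketch runs without Baidu credentials; it assumes the `{'words_result': [{'words': ...}]}` response shape used above. The pool size of 8 is also an assumption; a free API quota may make very large pools counterproductive.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def fake_ocr(picfile):
    """Hypothetical stand-in for client.basicGeneral(open(picfile, 'rb').read()).
    Pretends each image shows its own file stem; files named 'blank*' yield no text."""
    stem = os.path.splitext(os.path.basename(picfile))[0]
    if stem.startswith('blank'):
        return {'words_result': []}
    return {'words_result': [{'words': stem}]}


def recognize_all(picfiles, ocr=fake_ocr, max_workers=8):
    """Run `ocr` on every file with a thread pool.
    Returns ({stem: [text lines]}, [files with no recognized text])."""
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() gives each file to exactly one worker and yields the responses
        # in input order; only this (main) thread touches results and failed
        for picfile, message in zip(picfiles, pool.map(ocr, picfiles)):
            words = message.get('words_result')
            if words:
                stem = os.path.splitext(os.path.basename(picfile))[0]
                results[stem] = [w['words'] for w in words]
            else:
                failed.append(picfile)
    return results, failed


if __name__ == '__main__':
    res, bad = recognize_all(['tmp/a1.png', 'tmp/blank.png', 'tmp/b2.png'])
    print(res)  # {'a1': ['a1'], 'b2': ['b2']}
    print(bad)  # ['tmp/blank.png']
```

For a real run, `ocr` would be a small function that reads the file and calls `client.basicGeneral`; the results dictionary can then be written to the output file in one pass, and the `failed` list gives the images to retry.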

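It is fast, and on the images it does recognize it is about 99% accurate: roughly one in a hundred comes out wrong.

In my test, recognizing 1,500 small, sharp, captcha-sized images took 30 seconds. About 150 could not be recognized at all and about 14 were recognized incorrectly. Overall it is fast, and the output contains no garbled characters.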
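The 99% figure and the 150 failures fit together if accuracy is measured only on images that returned text. A quick check with the author's approximate test numbers (no new data; the variable names are illustrative):

```python
# Figures reported in the test above (all approximate)
total = 1500
unrecognized = 150
misread = 14
seconds = 30

recognized = total - unrecognized       # images that returned text
correct = recognized - misread          # images read correctly
acc_recognized = correct / recognized   # accuracy on recognized images
acc_overall = correct / total           # end-to-end success rate
throughput = total / seconds            # images per second

print(f"{acc_recognized:.1%}  {acc_overall:.1%}  {throughput:.0f} img/s")
# prints: 99.0%  89.1%  50 img/s
```

So about 1 in 100 recognized images is misread, but about 1 in 10 images yields no text, which puts the end-to-end success rate near 89%.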

- Author -

凹凸曼大人
