Python Multithreaded Crawlers Explained, with Example Code

Posted in Python on October 08, 2016

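Python supports multithreading, mainly through two modules: thread and threading. thread is the lower-level module (it was renamed _thread in Python 3); threading wraps it and is more convenient to use.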
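For simple jobs, threading can be used without writing a subclass. The following is a minimal sketch in Python 3 syntax, assuming a made-up worker function and made-up arguments: it hands a callable to threading.Thread through the target and args parameters, starts the threads, and waits for them with join().

```python
import threading

results = []  # shared list; list.append is atomic in CPython, so no lock is used here

def worker(name, count):
    """Hypothetical task: record a message instead of doing real work."""
    results.append('%s handled %d items' % (name, count))

# One thread per task; args is the tuple of positional arguments for worker.
threads = [threading.Thread(target=worker, args=('worker-%d' % i, i))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every worker has finished

print(sorted(results))
```

Subclassing Thread, as the listing later in this article does, is the other common style; both end up running the callable in a new thread.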

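Python threads are constrained by the GIL (global interpreter lock): only one thread executes Python bytecode at a time, so threads do not run in parallel across CPU cores. For I/O-bound work such as a crawler, however, they still improve throughput noticeably, because a thread that is blocked waiting on the network releases the GIL and lets the other threads run.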
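The effect can be seen without any network access. The sketch below (Python 3 syntax) is an illustration under stated assumptions: fake_download is a hypothetical stand-in that only sleeps, and the DELAY and COUNT values are arbitrary. Sleeping releases the GIL in the same way a blocking socket read does, so the waits overlap when threads are used.

```python
import threading
import time

DELAY = 0.2  # seconds per simulated request (arbitrary)
COUNT = 5    # number of simulated requests (arbitrary)

def fake_download(delay):
    """Stand-in for a slow network request: just blocks for `delay` seconds."""
    time.sleep(delay)

# Sequential: each wait starts only after the previous one ends.
start = time.perf_counter()
for _ in range(COUNT):
    fake_download(DELAY)
sequential = time.perf_counter() - start

# Threaded: all waits run at the same time.
start = time.perf_counter()
threads = [threading.Thread(target=fake_download, args=(DELAY,))
           for _ in range(COUNT)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print('sequential: %.2f s, threaded: %.2f s' % (sequential, threaded))
```

The sequential time is roughly COUNT times DELAY, while the threaded time stays close to a single DELAY. A CPU-bound function would not show this gain, because the threads would compete for the GIL.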
The example below measures the speedup on real pages. The code only fetches the pages; it does not parse them.

# -*- coding: utf-8 -*-
# Python 2 code: urllib2 does not exist in Python 3
import urllib2, time
import threading

class MyThread(threading.Thread):
    def __init__(self, func, args):
        threading.Thread.__init__(self)
        self.args = args
        self.func = func

    def run(self):
        # equivalent to the deprecated apply(self.func, self.args)
        self.func(*self.args)

def open_url(url):
    request = urllib2.Request(url)
    html = urllib2.urlopen(request).read()
    print(len(html))
    return html

if __name__ == '__main__':
    # build the URL list (pages 1 through 9)
    urlList = []
    for p in range(1, 10):
        urlList.append('http://s.wanfangdata.com.cn/Paper.aspx?q=%E5%8C%BB%E5%AD%A6&p=' + str(p))

    # sequential version
    n_start = time.time()
    for each in urlList:
        open_url(each)
    n_end = time.time()
    print('the normal way take %s s' % (n_end-n_start))

    # multithreaded version: one thread per URL
    t_start = time.time()
    threadList = [MyThread(open_url, (url,)) for url in urlList]
    for t in threadList:
        t.setDaemon(True)
        t.start()
    for i in threadList:
        i.join()
    t_end = time.time()
    print('the thread way take %s s' % (t_end-t_start))

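Both approaches were used to fetch the same slow-loading pages (nine of them, since range(1, 10) yields pages 1 through 9). The sequential version took about 50 s; the multithreaded version took about 10 s.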
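The listing starts one thread per URL, which is fine for a handful of pages. For longer URL lists, one common alternative (not what the article's code does) is a bounded thread pool from Python 3's concurrent.futures. The sketch below rests on assumptions: fetch is a hypothetical stand-in that fakes a download instead of touching the network, the URLs are placeholders, and max_workers=4 is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(url):
    """Hypothetical stand-in for open_url: pretend to download, return a fake body."""
    time.sleep(0.05)  # simulated network latency
    return '<html>%s</html>' % url

# Placeholder URLs, not the ones used in the article.
urls = ['http://example.com/page/%d' % p for p in range(1, 10)]

# At most 4 requests are in flight at once; map() yields results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    pages = list(pool.map(fetch, urls))

print(len(pages), 'pages fetched')
```

Besides capping concurrency, the pool also hands back each function's return value, which plain Thread objects do not do by themselves.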
Walkthrough of the multithreaded code:

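# Define a thread class that inherits from Thread
class MyThread(threading.Thread):
    def __init__(self, func, args):
        threading.Thread.__init__(self)  # call the parent constructor
        self.args = args
        self.func = func

    def run(self):  # the thread's activity; executed in the new thread after start()
        self.func(*self.args)  # same as apply(self.func, self.args)

# Create one thread object per URL and collect them in a list
threadList = [MyThread(open_url, (url,)) for url in urlList]
for t in threadList:
    t.setDaemon(True)  # daemon thread: it does NOT keep the program alive at exit
    t.start()          # start the thread
for i in threadList:
    i.join()  # block the main thread until this worker finishes; this is what makes it wait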
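In the listing, open_url returns the page, but run() throws that value away. One way to keep it, sketched here in Python 3 syntax, is to store the return value on the thread object and read it only after join() has returned. ResultThread and page_size are hypothetical names introduced for this illustration; page_size stands in for open_url so the sketch runs without network access.

```python
import threading

class ResultThread(threading.Thread):
    """Hypothetical variant of MyThread that remembers what func returned."""
    def __init__(self, func, args):
        super().__init__()
        self.func = func
        self.args = args
        self.result = None

    def run(self):
        self.result = self.func(*self.args)

def page_size(text):
    """Stand-in for open_url: returns a length instead of downloading anything."""
    return len(text)

threads = [ResultThread(page_size, (word,)) for word in ('a', 'bb', 'ccc')]
for t in threads:
    t.start()
for t in threads:
    t.join()  # once join() returns, t.result has been set

sizes = [t.result for t in threads]
print(sizes)  # prints [1, 2, 3]
```

Reading t.result before join() would be a race: the worker may not have finished yet, and the value would still be None.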

That is all for this article. I hope it helps with your studies.

- Author -

lqh
