A Detailed Guide to Using urllib in Python 3 (Headers, Proxies, Timeouts, Authentication, Exception Handling)

Posted in Python on September 21, 2016

urllib can fetch remote data so that you can save or process it. Below are several ways to fetch web resources with Python 3, collected here for reference.

1. The simplest case

import urllib.request
response = urllib.request.urlopen('http://python.org/')
html = response.read()
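
Note that response.read() returns bytes, not str. A minimal sketch of one way to decode the body follows, assuming the server declares its charset in the Content-Type header. The helper name fetch_text and its default_charset fallback are illustrative and not part of urllib. A data: URL is used only so the demo runs offline.

```python
import urllib.request


def fetch_text(url, default_charset="utf-8"):
    """Return the body of `url` as str, using the declared charset if any."""
    # The with-block closes the response even if reading fails.
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset)


# A data: URL keeps the demo offline; an http(s) URL is passed the same way.
print(fetch_text("data:text/plain;charset=utf-8,hello"))
```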

2. Using Request

import urllib.request
req = urllib.request.Request('http://python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()

3. Sending data

#! /usr/bin/env python3
import urllib.parse
import urllib.request
url = 'http://localhost/login.php'
values = {
    'act' : 'login',
    'login[email]' : 'yzhang@i9i8.com',
    'login[password]' : '123456'
}
# In Python 3 the request body must be bytes, so encode the urlencoded string
data = urllib.parse.urlencode(values).encode('utf8')
# Passing data turns the request into a POST
req = urllib.request.Request(url, data)
req.add_header('Referer', 'http://www.python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page.decode("utf8"))
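
The encoding step can be checked without contacting a server. The sketch below builds a Request and inspects it instead of sending it. The form fields and the e-mail address are made-up placeholders.

```python
import urllib.parse
import urllib.request

# Hypothetical form fields, for illustration only.
values = {"act": "login", "login[email]": "user@example.com"}

encoded = urllib.parse.urlencode(values)  # str, with reserved characters escaped
data = encoded.encode("utf-8")            # bytes, as Request expects for a body

get_req = urllib.request.Request("http://localhost/login.php")
post_req = urllib.request.Request("http://localhost/login.php", data=data)

print(encoded)
print(get_req.get_method(), post_req.get_method())
```

Nothing is sent until urlopen is called, so this is a cheap way to confirm that supplying data switches the method from GET to POST.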

4. Sending data and headers

#! /usr/bin/env python3
import urllib.parse
import urllib.request
url = 'http://localhost/login.php'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {
    'act' : 'login',
    'login[email]' : 'yzhang@i9i8.com',
    'login[password]' : '123456'
}
headers = { 'User-Agent' : user_agent }
# The request body must be bytes in Python 3
data = urllib.parse.urlencode(values).encode('utf8')
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page.decode("utf8"))
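
Headers can come from the constructor or from add_header, and both end up in the same place. The sketch below inspects them on an unsent Request. The User-Agent string is a made-up value. One detail worth knowing: Request stores header names capitalized (first letter upper-case, the rest lower-case), so lookups with get_header have to use that spelling.

```python
import urllib.request

req = urllib.request.Request(
    "http://localhost/login.php",
    headers={"User-Agent": "demo-client/1.0"},  # hypothetical User-Agent value
)
req.add_header("Referer", "http://www.python.org/")

# Names are stored as "User-agent" and "Referer".
print(req.get_header("User-agent"))
print(sorted(req.header_items()))
```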

5. HTTP errors

#! /usr/bin/env python3
import urllib.error
import urllib.request
req = urllib.request.Request('https://3water.com')
try:
    urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    # e.code is the HTTP status code; the error body can be read like a response
    print(e.code)
    print(e.read().decode("utf8"))
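
To see an HTTPError without depending on a remote site, one option is a throwaway local server. The sketch below is only a demonstration setup: the AlwaysNotFound handler and its response text are invented for this example, and the empty ProxyHandler is there so that a proxy configured in the environment does not intercept the loopback request.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlwaysNotFound(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a 404 page."""

    def do_GET(self):
        body = b"page not found"
        self.send_response(404)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), AlwaysNotFound)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# An empty ProxyHandler bypasses any proxy set in the environment.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
status, body = None, None
try:
    opener.open("http://127.0.0.1:%d/missing" % server.server_port, timeout=5)
except urllib.error.HTTPError as e:
    status = e.code                   # numeric HTTP status
    body = e.read().decode("utf-8")   # the error page sent by the server
finally:
    server.shutdown()
    server.server_close()

print(status, body)
```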

6. Exception handling (1)

#! /usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError
req = Request("https://3water.com/")
try:
    response = urlopen(req)
# HTTPError is a subclass of URLError, so it must be caught first
except HTTPError as e:
    print("The server couldn't fulfill the request.")
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    print("good!")
    print(response.read().decode("utf8"))

7. Exception handling (2)

#! /usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError
req = Request("https://3water.com/")
try:
    response = urlopen(req)
except URLError as e:
    # In Python 3 an HTTPError has both 'code' and 'reason',
    # so test for 'code' first to tell the two cases apart
    if hasattr(e, 'code'):
        print("The server couldn't fulfill the request.")
        print('Error code: ', e.code)
    elif hasattr(e, 'reason'):
        print('We failed to reach a server.')
        print('Reason: ', e.reason)
else:
    print("good!")
    print(response.read().decode("utf8"))
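
The relationship between the two exception classes can be checked offline by constructing the exceptions by hand. This is a sketch for illustration only: the describe helper, the example.invalid URL, and the error messages are made up, and in real code the exceptions would come from urlopen.

```python
import io
from urllib.error import HTTPError, URLError

# Hand-built exceptions so nothing has to go over the network.
http_err = HTTPError("http://example.invalid/", 404, "Not Found", None, io.BytesIO(b""))
url_err = URLError("connection refused")


def describe(err):
    """Classify an error, testing the more specific class first."""
    if isinstance(err, HTTPError):
        return "HTTP error %d" % err.code
    return "unreachable: %s" % err.reason


print(isinstance(http_err, URLError))                         # subclass relationship
print(hasattr(http_err, "code"), hasattr(http_err, "reason"))
print(describe(http_err))
print(describe(url_err))
```

Since an HTTPError exposes both attributes, a check on reason alone cannot distinguish a server-side error from a failed connection, whereas code (or isinstance) can.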

8. HTTP authentication

#! /usr/bin/env python3
import urllib.request
# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "https://3water.com/"
password_mgr.add_password(None, top_level_url, 'rekfan', 'xxxxxx')
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
# create "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)
# use the opener to fetch a URL
a_url = "https://3water.com/"
x = opener.open(a_url)
print(x.read())
# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)
a = urllib.request.urlopen(a_url).read().decode('utf8')
print(a)
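
How the password manager decides which credentials apply can be explored without any network access. In this sketch the site, realm name, user name, and password are all placeholders. Credentials registered with realm None are used for any realm, and they match URLs at or below the registered one.

```python
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Hypothetical site and credentials, registered for any realm (None).
password_mgr.add_password(None, "https://example.com/", "alice", "s3cret")

# A URL below the registered prefix gets the credentials, whatever the realm.
inside = password_mgr.find_user_password("Members Area", "https://example.com/private/report")
# A different host gets nothing.
outside = password_mgr.find_user_password("Members Area", "https://other.example.org/")

print(inside)
print(outside)
```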

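9. Using a proxy

#! /usr/bin/env python3
import urllib.request
# ProxyHandler maps URL schemes to proxy URLs. urllib has no built-in SOCKS
# support, so this assumes an HTTP proxy is listening on localhost:1080.
proxy_support = urllib.request.ProxyHandler({
    'http': 'http://localhost:1080',
    'https': 'http://localhost:1080',
})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

a = urllib.request.urlopen("https://3water.com").read().decode("utf8")
print(a)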
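
install_opener changes the behaviour of every later urlopen call in the process. When only some requests should go through a proxy, one alternative is to keep separate openers and call their open method directly. The sketch below only builds the openers and sends nothing. The proxy address is an assumption for illustration.

```python
import urllib.request

# Hypothetical HTTP proxy address; the keys are the URL schemes to proxy.
proxy_handler = urllib.request.ProxyHandler({
    "http": "http://localhost:1080",
    "https": "http://localhost:1080",
})
proxied = urllib.request.build_opener(proxy_handler)

# An empty mapping disables proxies, including those set in the environment.
direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))

print(proxy_handler.proxies)
# proxied.open(url) would go through the proxy, direct.open(url) would not.
```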

10. Timeouts

#! /usr/bin/env python3
import socket
import urllib.request
# timeout in seconds
timeout = 2
socket.setdefaulttimeout(timeout)
# this call to urllib.request.urlopen now uses the default timeout
# we have set in the socket module
req = urllib.request.Request('https://3water.com/')
a = urllib.request.urlopen(req).read()
print(a)
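
socket.setdefaulttimeout affects every socket created afterwards. urlopen and opener.open also accept a timeout argument that applies to a single call. The sketch below simulates a stalled server with a local listening socket that never replies. That socket and the 0.5 second value are just a test setup, and the empty ProxyHandler keeps an environment proxy from intercepting the loopback request.

```python
import socket
import urllib.error
import urllib.request

# A listening socket that accepts connections but never answers.
silent = socket.socket()
silent.bind(("127.0.0.1", 0))  # port 0 picks a free port
silent.listen(1)
url = "http://127.0.0.1:%d/" % silent.getsockname()[1]

opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
timed_out = False
try:
    opener.open(url, timeout=0.5)  # per-call timeout, in seconds
except (socket.timeout, urllib.error.URLError):
    # A timeout can surface directly or wrapped in URLError,
    # depending on the stage at which it happens.
    timed_out = True
finally:
    silent.close()

print(timed_out)
```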

Summary

That is all for this article. We hope it helps you learn or use Python. If you have questions, feel free to leave a comment.

- Author -

ifso
