python urllib库的使用详解


Posted in Python onApril 13, 2021

相关:urllib是python内置的http请求库,本文介绍urllib三个模块:请求模块urllib.request、异常处理模块urllib.error、url解析模块urllib.parse。

1、请求模块:urllib.request

python2

import urllib2
response = urllib2.urlopen('http://httpbin.org/robots.txt')

python3

import urllib.request
res = urllib.request.urlopen('http://httpbin.org/robots.txt')
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
urlopen()方法中的url参数可以是字符串,也可以是一个Request对象

#url可以是字符串
import urllib.request

resp = urllib.request.urlopen('http://www.baidu.com')
print(resp.read().decode('utf-8'))  # read()获取响应体的内容,内容是bytes字节流,需要转换成字符串
##url可以也是Request对象
import urllib.request

request = urllib.request.Request('http://httpbin.org')
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))

data参数:post请求

# coding:utf8
import urllib.request, urllib.parse

data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf8')
resp = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(resp.read())

urlopen()中的参数timeout:设置请求超时时间:

# coding:utf8
#设置请求超时时间
import urllib.request

resp = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
print(resp.read().decode('utf-8'))

响应类型:

# coding:utf8
#响应类型
import urllib.request

resp = urllib.request.urlopen('http://httpbin.org/get')
print(type(resp))

python urllib库的使用详解

响应的状态码、响应头:

# coding:utf8
#响应的状态码、响应头
import urllib.request

resp = urllib.request.urlopen('http://www.baidu.com')
print(resp.status)
print(resp.getheaders())  # 数组(元组列表)
print(resp.getheader('Server'))  # "Server"大小写不区分

200
[('Bdpagetype', '1'), ('Bdqid', '0xa6d873bb003836ce'), ('Cache-Control', 'private'), ('Content-Type', 'text/html'), ('Cxy_all', 'baidu+b8704ff7c06fb8466a83df26d7f0ad23'), ('Date', 'Sun, 21 Apr 2019 15:18:24 GMT'), ('Expires', 'Sun, 21 Apr 2019 15:18:03 GMT'), ('P3p', 'CP=" OTI DSP COR IVA OUR IND COM "'), ('Server', 'BWS/1.1'), ('Set-Cookie', 'BAIDUID=8C61C3A67C1281B5952199E456EEC61E:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'BIDUPSID=8C61C3A67C1281B5952199E456EEC61E; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'PSTM=1555859904; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'delPer=0; path=/; domain=.baidu.com'), ('Set-Cookie', 'BDSVRTM=0; path=/'), ('Set-Cookie', 'BD_HOME=0; path=/'), ('Set-Cookie', 'H_PS_PSSID=1452_28777_21078_28775_28722_28557_28838_28584_28604; path=/; domain=.baidu.com'), ('Vary', 'Accept-Encoding'), ('X-Ua-Compatible', 'IE=Edge,chrome=1'), ('Connection', 'close'), ('Transfer-Encoding', 'chunked')]
BWS/1.1

使用代理:urllib.request.ProxyHandler():

# coding:utf8
proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
# This time, rather than install the OpenerDirector, we use it directly:
resp = opener.open('http://www.example.com/login.html')
print(resp.read())

2、异常处理模块:urllib.error

异常处理实例1:

# coding:utf8
from urllib import error, request

try:
    resp = request.urlopen('http://www.blueflags.cn')
except error.URLError as e:
    print(e.reason)

python urllib库的使用详解

异常处理实例2:

# coding:utf8
from urllib import error, request

try:
    resp = request.urlopen('http://www.baidu.com')
except error.HTTPError as e:
    print(e.reason, e.code, e.headers, sep='\n')
except error.URLError as e:
    print(e.reason)
else:
    print('request successfully')

python urllib库的使用详解

异常处理实例3:

# coding:utf8
import socket, urllib.request, urllib.error

try:
    resp = urllib.request.urlopen('http://www.baidu.com', timeout=0.01)
except urllib.error.URLError as e:
    print(type(e.reason))
    if isinstance(e.reason,socket.timeout):
        print('time out')

python urllib库的使用详解

3、url解析模块:urllib.parse

parse.urlencode

# coding:utf8
from urllib import request, parse

url = 'http://httpbin.org/post'
headers = {
    'Host': 'httpbin.org',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'
}
dict = {'name': 'Germey'}
data = bytes(parse.urlencode(dict), encoding='utf8')
req = request.Request(url=url, data=data, headers=headers, method='POST')
resp = request.urlopen(req)
print(resp.read().decode('utf-8'))
{
"args": {},
"data": "",
"files": {},
"form": {
"name": "Thanlon"
},
"headers": {
"Accept-Encoding": "identity",
"Content-Length": "12",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36"
},
"json": null,
"origin": "117.136.78.194, 117.136.78.194",
"url": "https://httpbin.org/post"
}

add_header方法添加请求头:

# coding:utf8
from urllib import request, parse

url = 'http://httpbin.org/post'
dict = {'name': 'Thanlon'}
data = bytes(parse.urlencode(dict), encoding='utf8')
req = request.Request(url=url, data=data, method='POST')
req.add_header('User-Agent',
               'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36')
resp = request.urlopen(req)
print(resp.read().decode('utf-8'))

parse.urlparse:

# coding:utf8
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=1#comment')
print(type(result))
print(result)

<class 'urllib.parse.ParseResult'>
ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=1', fragment='comment')

from urllib.parse import urlparse

result = urlparse('www.baidu.com/index.html;user?id=1#comment', scheme='https')
print(type(result))
print(result)

<class 'urllib.parse.ParseResult'>
ParseResult(scheme='https', netloc='', path='www.baidu.com/index.html', params='user', query='id=1', fragment='comment')

# coding:utf8
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=1#comment', scheme='https')
print(result)

ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=1', fragment='comment')

# coding:utf8
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=1#comment',allow_fragments=False)
print(result)

ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=1', fragment='comment')

parse.urlunparse:

# coding:utf8
from urllib.parse import urlunparse

data = ['http', 'www.baidu.com', 'index.html', 'user', 'name=Thanlon', 'comment']
print(urlunparse(data))

python urllib库的使用详解

parse.urljoin:

# coding:utf8
from urllib.parse import urljoin

print(urljoin('http://www.bai.com', 'index.html'))
print(urljoin('http://www.baicu.com', 'https://www.thanlon.cn/index.html'))#以后面为基准

python urllib库的使用详解

urlencode将字典对象转换成get请求的参数:

# coding:utf8
from urllib.parse import urlencode

params = {
    'name': 'Thanlon',
    'age': 22
}
baseUrl = 'http://www.thanlon.cn?'
url = baseUrl + urlencode(params)
print(url)

python urllib库的使用详解

4、Cookie

cookie的获取(保持登录会话信息):

# coding:utf8
#cookie的获取(保持登录会话信息)
import urllib.request, http.cookiejar

cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
res = opener.open('http://www.baidu.com')
for item in cookie:
    print(item.name + '=' + item.value)

python urllib库的使用详解

MozillaCookieJar(filename)形式保存cookie

# coding:utf8
#将cookie保存为cookie.txt
import http.cookiejar, urllib.request

filename = 'cookie.txt'
cookie = http.cookiejar.MozillaCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
res = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True, ignore_expires=True)

LWPCookieJar(filename)形式保存cookie:

# coding:utf8
import http.cookiejar, urllib.request

filename = 'cookie.txt'
cookie = http.cookiejar.LWPCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
res = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True, ignore_expires=True)

读取cookie请求,获取登陆后的信息

# coding:utf8
import http.cookiejar, urllib.request

cookie = http.cookiejar.LWPCookieJar()
cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
resp = opener.open('http://www.baidu.com')
print(resp.read().decode('utf-8'))

以上就是python urllib库的使用详解的详细内容,更多关于python urllib库的资料请关注三水点靠木其它相关文章!

Python 相关文章推荐
python使用pil生成图片验证码的方法
May 08 Python
解析Python中的二进制位运算符
May 13 Python
详解python字节码
Feb 07 Python
python寻找list中最大值、最小值并返回其所在位置的方法
Jun 27 Python
对python的bytes类型数据split分割切片方法
Dec 04 Python
对python指数、幂数拟合curve_fit详解
Dec 29 Python
python多继承(钻石继承)问题和解决方法简单示例
Oct 21 Python
python实现批量修改文件名
Mar 23 Python
python脚本第一行如何写
Aug 30 Python
Python Merge函数原理及用法解析
Sep 16 Python
Pycharm配置lua编译环境过程图解
Nov 28 Python
Appium+Python实现简单的自动化登录测试的实现
Jan 26 Python
用Python将库打包发布到pypi
python xlwt模块的使用解析
python 爬取豆瓣网页的示例
简述python四种分词工具,盘点哪个更好用?
Apr 13 #Python
python自动化调用百度api解决验证码
利用Python网络爬虫爬取各大音乐评论的代码
用Python制作灯光秀短视频的思路详解
You might like
PHP CURL模拟登录新浪微博抓取页面内容 基于EaglePHP框架开发
2012/01/16 PHP
PHP下获取上个月、下个月、本月的日期(strtotime,date)
2014/02/02 PHP
php计算多维数组中所有值总和的方法
2015/06/24 PHP
thinkPHP多域名情况下使用memcache方式共享session数据的实现方法
2016/07/21 PHP
Yii Framework框架开发微信公众平台示例
2020/04/26 PHP
js自动闭合html标签(自动补全html标记)
2012/10/04 Javascript
javascript四舍五入函数代码分享(保留后几位)
2013/12/10 Javascript
JavaScript的jQuery库中function的存在和参数问题
2015/08/13 Javascript
用JS实现图片轮播效果代码(一)
2016/06/26 Javascript
javascript验证手机号和实现星号(*)代替实例
2016/08/16 Javascript
jquery购物车结算功能实现方法
2020/10/29 Javascript
工作中常用的js、jquery自定义扩展函数代码片段汇总
2016/12/22 Javascript
CentOS环境中MySQL修改root密码方法
2018/01/07 Javascript
vue-cli3.0+element-ui上传组件el-upload的使用
2018/12/03 Javascript
JS温故而知新之变量提升和时间死区
2019/01/27 Javascript
JSON是什么?有哪些优点?JSON和XML的区别?
2019/04/29 Javascript
你可能从未使用过的11+个JavaScript特性(小结)
2020/01/08 Javascript
微信小程序个人中心的列表控件实现代码
2020/04/26 Javascript
jQuery中event.target和this的区别详解
2020/08/13 jQuery
[00:02]DOTA2新版本使用PA至宝后暴击展示
2014/11/19 DOTA
[01:29:46]DOTA2上海特级锦标赛C组资格赛#1 OG VS LGD第二局
2016/02/27 DOTA
[03:36]DOTA2完美大师赛coL战队趣味视频——我演你猜
2017/11/23 DOTA
[47:42]Fnatic vs Liquid 2018国际邀请赛小组赛BO2 第一场 8.16
2018/08/17 DOTA
在Python中使用M2Crypto模块实现AES加密的教程
2015/04/08 Python
Python实现破解12306图片验证码的方法分析
2017/12/29 Python
python3实现elasticsearch批量更新数据
2019/12/03 Python
css3实现二维码扫描特效的示例
2020/10/29 HTML / CSS
阿波罗盒子:Apollo Box
2017/08/14 全球购物
将"引用"作为函数参数有哪些特点
2013/04/05 面试题
法学个人求职信范文
2014/01/27 职场文书
小学中秋节活动方案
2014/02/06 职场文书
学习标兵获奖感言
2014/02/20 职场文书
学校党的群众路线教育实践活动对照检查材料
2014/09/24 职场文书
教代会闭幕词
2015/01/28 职场文书
小学信息技术教学反思
2016/02/16 职场文书
优秀共产党员事迹材料2016
2016/02/29 职场文书