Python requests module: basic usage examples and advanced applications (automatic login, fetching page source) explained


Posted in Python on February 14, 2020

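1. About the Python requests module

requests is an HTTP library released under the Apache 2.0 license.

It is written in Python.

It is more concise than the urllib2 module.

Requests supports HTTP keep-alive and connection pooling, session persistence with cookies, file uploads, automatic decoding of response content, and automatic encoding of internationalized URLs and POST data.

It is a high-level wrapper around Python's built-in modules that makes network requests far more human-friendly; with Requests you can easily perform almost any operation a browser can.

Modern, international, friendly.

requests handles persistent connections (keep-alive) automatically when requests are made through a Session.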

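2. Getting started with the Python requests module

1) Import the module

import requests

2) Sending requests is simple

Example code: fetch a web page (a personal GitHub page)

import requests
r = requests.get('https://github.com/Ranxf')    # the most basic GET request, without parameters
r1 = requests.get(url='http://dict.baidu.com/s', params={'wd': 'python'})   # GET request with parameters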

The requests module also provides the other HTTP methods:

requests.get('https://github.com/timeline.json')     # GET request
requests.post('http://httpbin.org/post')             # POST request
requests.put('http://httpbin.org/put')               # PUT request
requests.delete('http://httpbin.org/delete')         # DELETE request
requests.head('http://httpbin.org/get')              # HEAD request
requests.options('http://httpbin.org/get')           # OPTIONS request

3) Passing parameters in the URL

>>> url_params = {'key': 'value'}    # pass parameters as a dict; keys whose value is None are not added to the URL
>>> r = requests.get('your url', params=url_params)
>>> print(r.url)

your url?key=value
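To see how a params dict becomes a query string without touching the network, a request can be prepared but not sent. This is a minimal sketch; the helper name build_url and the sample parameters are made up for the illustration.

```python
import requests


def build_url(base_url, params):
    """Return the URL requests would send, without any network I/O."""
    prepared = requests.Request("GET", base_url, params=params).prepare()
    return prepared.url


url = build_url(
    "http://httpbin.org/get",
    {"key1": "value1", "skipped": None, "key2": ["a", "b"]},
)
print(url)  # http://httpbin.org/get?key1=value1&key2=a&key2=b
```

The key whose value is None is dropped, and a list value is expanded into one key=value pair per item.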

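4) Response content

r.encoding                       # get the current encoding

r.encoding = 'utf-8'             # set the encoding

r.text                           # the response body as a string, decoded with r.encoding (by default taken from the charset in the response headers)

r.content                        # the response body as bytes; gzip and deflate transfer encodings are decoded automatically

r.headers                        # the response headers as a dict-like object with case-insensitive keys; r.headers.get('name') returns None for a missing key, while r.headers['name'] raises KeyError

r.status_code                    # the response status code

r.raw                            # the raw response, i.e. the underlying urllib3 response object; pass stream=True in the request, then use r.raw.read()

r.ok                             # True when the status code is below 400; on its own it does not prove that a login succeeded

 # *Special methods* #

r.json()                         # the built-in JSON decoder; returns the parsed body and raises an exception if the body is not valid JSON

r.raise_for_status()             # raises requests.HTTPError for a failed request (4xx or 5xx status code)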
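The attributes above can be tried against a throwaway local server, so the result does not depend on any external site. This is only a sketch: DemoHandler, the /data path and the JSON payload are hypothetical, and trust_env is switched off so that proxy environment variables do not interfere with the local request.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint: /data returns JSON, anything else is a 404."""

    def do_GET(self):
        if self.path != "/data":
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = json.dumps({"message": "hello"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

r = session.get(base + "/data", timeout=5)
print(r.status_code, r.ok)           # 200 True
print(r.encoding)                    # charset taken from the Content-Type header
print(r.headers["content-type"])     # header lookup is case-insensitive
print(r.headers.get("x-not-there"))  # None for a missing header
print(r.json())                      # {'message': 'hello'}

missing = session.get(base + "/missing", timeout=5)
try:
    missing.raise_for_status()       # 404 is a 4xx status, so this raises
except requests.HTTPError as exc:
    print("raise_for_status raised:", exc)

server.shutdown()
server.server_close()
```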

Sending JSON with POST:

import requests
import json
 
r = requests.post('https://api.github.com/some/endpoint', data=json.dumps({'some': 'data'}))

print(r.json())
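The difference between data= and json= is easiest to see on a prepared (unsent) request. A minimal sketch follows; the URL is the placeholder endpoint used in the text, and nothing is sent over the network.

```python
import json

import requests

url = "https://api.github.com/some/endpoint"  # placeholder from the text; never contacted
payload = {"some": "data"}

as_form = requests.Request("POST", url, data=payload).prepare()
as_json_text = requests.Request("POST", url, data=json.dumps(payload)).prepare()
as_json = requests.Request("POST", url, json=payload).prepare()

# A dict passed as data= is form-encoded.
print(as_form.headers.get("Content-Type"), as_form.body)
# A string passed as data= is sent as-is, and no Content-Type is set for you.
print(as_json_text.headers.get("Content-Type"), as_json_text.body)
# json= serializes the dict and sets Content-Type: application/json.
print(as_json.headers.get("Content-Type"), as_json.body)
```

When posting a pre-serialized string with data=, set the Content-Type header yourself if the server expects one.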

5) Custom headers and cookies

header = {'user-agent': 'my-app/0.0.1'}
cookie = {'key': 'value'}
r = requests.get('your url', headers=header, cookies=cookie)   # requests.post accepts the same arguments

import json
data = {'some': 'data'}
headers = {'content-type': 'application/json',
      'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}
 
# serialize the dict so that the body matches the JSON content-type header
r = requests.post('https://api.github.com/some/endpoint', data=json.dumps(data), headers=headers)
print(r.text)
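To check which headers and cookies would actually go out, the request can again be prepared without sending it. A small sketch using the same placeholder header and cookie values as above:

```python
import requests

prepared = requests.Request(
    "GET",
    "http://httpbin.org/get",
    headers={"user-agent": "my-app/0.0.1"},
    cookies={"key": "value"},
).prepare()

print(prepared.headers["User-Agent"])  # my-app/0.0.1 (lookup is case-insensitive)
print(prepared.headers["Cookie"])      # key=value
```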

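6) Response status code

Calling a requests method returns a Response object that holds what the server sent back, such as the r.text and r.status_code attributes mentioned above.

Getting the body as text: when you access r.text, the body is decoded using the response's text encoding, and you can change r.encoding so that r.text is decoded with an encoding of your choice.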

r = requests.get('http://www.itwhy.org')
print(r.text, '\n{}\n'.format('*'*79), r.encoding)
r.encoding = 'GBK'
print(r.text, '\n{}\n'.format('*'*79), r.encoding)
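The effect of changing r.encoding can be reproduced locally. In this sketch a hypothetical handler (GbkHandler) serves GBK-encoded bytes without declaring a charset, so requests falls back to ISO-8859-1 for the text/html response until the encoding is corrected by hand.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAGE = "<p>中文页面</p>"


class GbkHandler(BaseHTTPRequestHandler):
    """Hypothetical local page: GBK bytes, but no charset in the Content-Type header."""

    def do_GET(self):
        body = PAGE.encode("gbk")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), GbkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

r = session.get("http://127.0.0.1:%d/" % server.server_port, timeout=5)
initial_encoding = r.encoding  # fallback chosen by requests, since no charset was sent
garbled = r.text               # decoded with the wrong encoding

r.encoding = "gbk"             # tell requests how the bytes are really encoded
fixed = r.text                 # r.text is re-decoded on every access

print(initial_encoding)
print(fixed == PAGE)

server.shutdown()
server.server_close()
```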

Example code:

import requests

r = requests.get('https://github.com/Ranxf')    # the most basic GET request, without parameters
print(r.status_code)                # get the response status
r1 = requests.get(url='http://dict.baidu.com/s', params={'wd': 'python'})   # GET request with parameters
print(r1.url)
print(r1.text)    # print the decoded response body

Output:

/usr/bin/python3.5 /home/rxf/python3_1000/1000/python3_server/python3_requests/demo1.py

200

http://dict.baidu.com/s?wd=python

…………

Process finished with exit code 0

r.status_code                      # if this is a 4xx or 5xx code, r.raise_for_status() raises an exception

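7) Response

r.headers                                  # the response headers (dict-like)

r.request.headers                          # the headers that were sent to the server

r.cookies                                  # the cookies returned by the server

r.history                                  # the redirect history; pass allow_redirects=False in the request to disable redirects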

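8) Timeout

r = requests.get('url', timeout=1)      # timeout in seconds; a single value applies to both the connect and the read timeout, not to the total download time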
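A timeout can also be given as a (connect, read) tuple. The sketch below triggers a read timeout against a deliberately slow local server; SlowHandler and the delay values are assumptions chosen only for the demonstration.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that waits before answering."""

    def do_GET(self):
        time.sleep(1.0)  # longer than the client's read timeout below
        try:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # the client has already given up and closed the connection

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

try:
    session.get(url, timeout=(3.05, 0.2))  # (connect timeout, read timeout) in seconds
    outcome = "completed"
except requests.exceptions.ReadTimeout:
    outcome = "read timeout"
except requests.exceptions.ConnectTimeout:
    outcome = "connect timeout"

print(outcome)

server.shutdown()
server.server_close()
```

Both exception classes derive from requests.exceptions.Timeout, so a single except clause on that class is enough when the distinction does not matter.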

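9) Session objects, which persist certain parameters across requests

s = requests.Session()
s.auth = ('auth', 'passwd')
s.headers.update({'key': 'value'})   # update() keeps the default headers; assigning a new dict would replace them
r = s.get('url')
r1 = s.get('url1')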
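Cookie persistence is what makes the automatic-login examples later in this article work. The following sketch imitates a login with a hypothetical local service (LoginHandler, /login, /whoami and the cookie value are all invented for the demo) and compares a Session that has logged in with a fresh one.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class LoginHandler(BaseHTTPRequestHandler):
    """Hypothetical service: /login sets a cookie, any other path echoes the cookie back."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            body = b"logged in"
            self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
        else:
            cookie = self.headers.get("Cookie", "")
            body = ("cookie seen: " + cookie).encode("utf-8")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), LoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo
session.get(base + "/login", timeout=5)                     # server sets the cookie
with_session = session.get(base + "/whoami", timeout=5).text  # cookie is sent back automatically

fresh = requests.Session()  # a new Session starts with an empty cookie jar
fresh.trust_env = False
without_session = fresh.get(base + "/whoami", timeout=5).text

print(with_session)
print(without_session)

server.shutdown()
server.server_close()
```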

10) Proxies

proxies = {'http':'ip1','https':'ip2' }
requests.get('url',proxies=proxies)

Summary:

import json
import requests

# HTTP request types
# GET
r = requests.get('https://github.com/timeline.json')
# POST
r = requests.post("http://m.ctrip.com/post")
# PUT
r = requests.put("http://m.ctrip.com/put")
# DELETE
r = requests.delete("http://m.ctrip.com/delete")
# HEAD
r = requests.head("http://m.ctrip.com/head")
# OPTIONS
r = requests.options("http://m.ctrip.com/get")

# Get the response content
print(r.content) # as bytes; non-ASCII characters show up as escaped bytes
print(r.text) # as text

# Pass URL parameters
payload = {'keyword': '香港', 'salecityid': '2'}
r = requests.get("http://m.ctrip.com/webapp/tourvisa/visa_list", params=payload)
print(r.url) # e.g. http://m.ctrip.com/webapp/tourvisa/visa_list?keyword=%E9%A6%99%E6%B8%AF&salecityid=2 (non-ASCII values are percent-encoded)

# Get/modify the page encoding
r = requests.get('https://github.com/timeline.json')
print(r.encoding)
r.encoding = 'utf-8'

# JSON handling
r = requests.get('https://github.com/timeline.json')
print(r.json()) # built-in JSON decoder; no need to import json for this

# Custom request headers
url = 'http://m.ctrip.com'
headers = {'User-Agent' : 'Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 4 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19'}
r = requests.post(url, headers=headers)
print(r.request.headers)

# More complex POST request
url = 'http://m.ctrip.com'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload)) # to send the payload as a JSON string instead of form-encoded data, serialize the dict with json.dumps first

# POST a multipart-encoded file
url = 'http://m.ctrip.com'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)

# Response status code
r = requests.get('http://m.ctrip.com')
print(r.status_code)

# Response headers
r = requests.get('http://m.ctrip.com')
print(r.headers)
print(r.headers['Content-Type'])
print(r.headers.get('content-type')) # two ways to access a single response header

# Cookies
url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
r.cookies['example_cookie_name']  # read cookies

url = 'http://m.ctrip.com/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies) # send cookies

# Set a timeout
r = requests.get('http://m.ctrip.com', timeout=0.001)

# Use a proxy
proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "http://10.10.1.100:4444",
}
r = requests.get('http://m.ctrip.com', proxies=proxies)


# If the proxy requires a username and password, use this form:
proxies = {
  "http": "http://user:pass@10.10.1.10:3128/",
}

3. Example code

GET requests

# 1. Example without parameters
 
import requests
 
ret = requests.get('https://github.com/timeline.json')
 
print(ret.url)
print(ret.text)
 
 
 
# 2. Example with parameters
 
import requests
 
payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)
 
print(ret.url)
print(ret.text)

POST requests

# 1. Basic POST example
 
import requests
 
payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)
 
print(ret.text)
 
 
# 2. Example sending headers and data
 
import requests
import json
 
url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}
 
ret = requests.post(url, data=json.dumps(payload), headers=headers)
 
print(ret.text)
print(ret.cookies)

Request parameters

def request(method, url, **kwargs):

    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.

    :param url: URL for the new :class:`Request` object.

    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.

    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.

    :param json: (optional) json data to send in the body of the :class:`Request`.

    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.

    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.

    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.

        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``

        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string

        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers

        to add for the file.

    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.

    :param timeout: (optional) How long to wait for the server to send data

        before giving up, as a float, or a :ref:`(connect timeout, read

        timeout) <timeouts>` tuple.

    :type timeout: float or tuple

    :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.

    :type allow_redirects: bool

    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.

    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.

    :param stream: (optional) if ``False``, the response content will be immediately downloaded.

    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.

    :return: :class:`Response <Response>` object

    :rtype: requests.Response

    Usage::

      >>> import requests

      >>> req = requests.request('GET', 'http://httpbin.org/get')

      >>> req

      <Response [200]>
    """

Parameter example code

def param_method_url():
  # requests.request(method='get', url='http://127.0.0.1:8000/test/')
  # requests.request(method='post', url='http://127.0.0.1:8000/test/')
  pass


def param_param():
  # - can be a dict
  # - can be a string
  # - can be bytes (ASCII characters only)

  # requests.request(method='get',
  # url='http://127.0.0.1:8000/test/',
  # params={'k1': 'v1', 'k2': '水电费'})

  # requests.request(method='get',
  # url='http://127.0.0.1:8000/test/',
  # params="k1=v1&k2=水电费&k3=v3&k3=vv3")

  # requests.request(method='get',
  # url='http://127.0.0.1:8000/test/',
  # params=bytes("k1=v1&k2=k2&k3=v3&k3=vv3", encoding='utf8'))

  # Wrong: bytes passed as params must contain only ASCII characters
  # requests.request(method='get',
  # url='http://127.0.0.1:8000/test/',
  # params=bytes("k1=v1&k2=水电费&k3=v3&k3=vv3", encoding='utf8'))
  pass


def param_data():
  # can be a dict
  # can be a string
  # can be bytes
  # can be a file object

  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # data={'k1': 'v1', 'k2': '水电费'})

  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # data="k1=v1; k2=v2; k3=v3; k3=v4"
  # )

  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # data="k1=v1;k2=v2;k3=v3;k3=v4",
  # headers={'Content-Type': 'application/x-www-form-urlencoded'}
  # )

  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # data=open('data_file.py', mode='r', encoding='utf-8'), # file content: k1=v1;k2=v2;k3=v3;k3=v4
  # headers={'Content-Type': 'application/x-www-form-urlencoded'}
  # )
  pass


def param_json():
  # Serializes the value passed as json into a string with json.dumps(...)
  # and sends it in the request body, with the header {'Content-Type': 'application/json'}
  requests.request(method='POST',
           url='http://127.0.0.1:8000/test/',
           json={'k1': 'v1', 'k2': '水电费'})


def param_headers():
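  # Send request headers to the server; the body is form-encoded (data=) so that it matches the Content-Type header
  requests.request(method='POST',
           url='http://127.0.0.1:8000/test/',
           data={'k1': 'v1', 'k2': '水电费'},
           headers={'Content-Type': 'application/x-www-form-urlencoded'}
           )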


def param_cookies():
  # Send cookies to the server
  requests.request(method='POST',
           url='http://127.0.0.1:8000/test/',
           data={'k1': 'v1', 'k2': 'v2'},
           cookies={'cook1': 'value1'},
           )
  # A CookieJar can also be used (the dict form is a wrapper built on top of it)
  from http.cookiejar import CookieJar
  from http.cookiejar import Cookie

  obj = CookieJar()
  obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,
             discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,
             port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False)
          )
  requests.request(method='POST',
           url='http://127.0.0.1:8000/test/',
           data={'k1': 'v1', 'k2': 'v2'},
           cookies=obj)


def param_files():
  # Send a file
  # file_dict = {
  # 'f1': open('readme', 'rb')
  # }
  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # files=file_dict)

  # Send a file with a custom file name
  # file_dict = {
  # 'f1': ('test.txt', open('readme', 'rb'))
  # }
  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # files=file_dict)

  # Send a string as the file content, with a custom file name
  # file_dict = {
  # 'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")
  # }
  # requests.request(method='POST',
  # url='http://127.0.0.1:8000/test/',
  # files=file_dict)

  # Send a string as the file content, with a custom file name, content type and extra headers
  # file_dict = {
  #   'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})
  # }
  # requests.request(method='POST',
  #         url='http://127.0.0.1:8000/test/',
  #         files=file_dict)

  pass


def param_auth():
  from requests.auth import HTTPBasicAuth, HTTPDigestAuth

  ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
  print(ret.text)

  # ret = requests.get('http://192.168.1.1',
  # auth=HTTPBasicAuth('admin', 'admin'))
  # ret.encoding = 'gbk'
  # print(ret.text)

  # ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
  # print(ret)
  #


def param_timeout():
  # ret = requests.get('http://google.com/', timeout=1)
  # print(ret)

  # ret = requests.get('http://google.com/', timeout=(5, 1))
  # print(ret)
  pass


def param_allow_redirects():
  ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
  print(ret.text)


def param_proxies():
  # proxies = {
  # "http": "61.172.249.96:80",
  # "https": "http://61.185.219.126:3128",
  # }

  # proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}

  # ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)
  # print(ret.headers)


  # from requests.auth import HTTPProxyAuth
  #
  # proxyDict = {
  # 'http': '77.75.105.165',
  # 'https': '77.75.105.165'
  # }
  # auth = HTTPProxyAuth('username', 'mypassword')
  #
  # r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
  # print(r.text)

  pass


def param_stream():
  ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
  print(ret.content)
  ret.close()

  # from contextlib import closing
  # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
  #   # process the response here
  #   for i in r.iter_content():
  #     print(i)


def requests_session():
  import requests

  session = requests.Session()

  ### 1. Visit any page first to obtain a cookie

  i1 = session.get(url="http://dig.chouti.com/help/service")

  ### 2. Log in, sending the cookie from step 1; the backend authorizes the gpsd value in that cookie
  i2 = session.post(
    url="http://dig.chouti.com/login",
    data={
      'phone': "8615131255089",
      'password': "xxxxxx",
      'oneMonth': ""
    }
  )

  i3 = session.post(
    url="http://dig.chouti.com/link/vote?linksId=8589623",
  )
  print(i3.text)

JSON request:

#! /usr/bin/python3
import requests
import json


class url_request():
  def __init__(self):
    ''' init '''

if __name__ == '__main__':
  headers = {'Content-Type': 'application/json'}
  payload = {'CountryName': '中国',
        'ProvinceName': '四川省',
        'L1CityName': 'chengdu',
        'L2CityName': 'yibing',
        'TownName': '',
        'Longitude': '107.33393',
        'Latitude': '33.157131',
        'Language': 'CN'}
  # serialize the payload so that the body matches the JSON Content-Type header
  r = requests.post("http://www.xxxxxx.com/CityLocation/json/LBSLocateCity", headers=headers, data=json.dumps(payload))
  if r.status_code != 200:
    print('LBSLocateCity API Error ' + str(r.status_code))
  else:
    data = r.json()
    print(data['CityEntities'][0]['CityID']) # print the value of a key in the returned JSON
    print(data['ResponseStatus']['Ack'])
    print(json.dumps(data, indent=4, sort_keys=True, ensure_ascii=False)) # pretty-print the JSON; ensure_ascii must be False, otherwise Chinese text is shown as \u escapes

XML request:

#! /usr/bin/python3
import requests

class url_request():
  def __init__(self):
    """init"""

if __name__ == '__main__':
  heards = {'Content-type': 'text/xml'}
  XML = '<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><Request xmlns="http://tempuri.org/"><jme><JobClassFullName>WeChatJSTicket.JobWS.Job.JobRefreshTicket,WeChatJSTicket.JobWS</JobClassFullName><Action>RUN</Action><Param>1</Param><HostIP>127.0.0.1</HostIP><JobInfo>1</JobInfo><NeedParallel>false</NeedParallel></jme></Request></soap:Body></soap:Envelope>'
  url = 'http://jobws.push.mobile.xxxxxxxx.com/RefreshWeiXInTokenJob/RefreshService.asmx'
  r = requests.post(url=url, headers=heards, data=XML)  # the keyword argument is headers
  data = r.text
  print(data)

Handling error statuses

import requests

URL = 'http://ip.taobao.com/service/getIpInfo.php' # Taobao IP address library API
try:
  r = requests.get(URL, params={'ip': '8.8.8.8'}, timeout=1)
  r.raise_for_status() # raise an exception if the status code is 4xx or 5xx
except requests.RequestException as e:
  print(e)
else:
  result = r.json()
  print(type(result), result, sep='\n')

Uploading files

The requests module can also upload files; the file type is handled automatically:

import requests
 
url = 'http://127.0.0.1:8080/upload'
files = {'file': open('/home/rxf/test.jpg', 'rb')}
#files = {'file': ('report.jpg', open('/home/lyb/sjzl.mpg', 'rb'))}   # set the file name explicitly
 
r = requests.post(url, files=files)
print(r.text)

Even more conveniently, requests lets you upload a string as if it were a file:

import requests
 
url = 'http://127.0.0.1:8080/upload'
files = {'file': ('test.txt', b'Hello Requests.')}   # the file name must be set explicitly
 
r = requests.post(url, files=files)
print(r.text)
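What the files argument produces can be inspected without a running upload server by preparing the request instead of sending it. A minimal sketch, reusing the URL and tuple from the example above:

```python
import requests

files = {"file": ("test.txt", b"Hello Requests.")}  # (file name, content) tuple
prepared = requests.Request(
    "POST", "http://127.0.0.1:8080/upload", files=files
).prepare()  # builds the multipart body; nothing is sent

content_type = prepared.headers["Content-Type"]
print(content_type.split(";")[0])               # multipart/form-data
print(b'filename="test.txt"' in prepared.body)  # True
print(b"Hello Requests." in prepared.body)      # True
```

The boundary string after "multipart/form-data;" is generated per request, which is why the Content-Type header should be left to requests when uploading files.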

Authentication

Basic authentication (HTTP Basic Auth)

import requests
from requests.auth import HTTPBasicAuth
 
r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=HTTPBasicAuth('user', 'passwd'))
# r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=('user', 'passwd'))  # shorthand
print(r.json())

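Another very popular form of HTTP authentication is Digest authentication, which Requests also supports out of the box:

from requests.auth import HTTPDigestAuth
requests.get(URL, auth=HTTPDigestAuth('user', 'pass'))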
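For Basic authentication, the header that requests builds from a (user, password) tuple can be checked offline. This sketch prepares a request without sending it; the URL is only a placeholder. Digest authentication cannot be shown this way because it depends on a challenge from the server.

```python
import base64

import requests

prepared = requests.Request(
    "GET",
    "https://httpbin.org/basic-auth/user/passwd",  # placeholder; never contacted
    auth=("user", "passwd"),                       # shorthand for HTTPBasicAuth
).prepare()

# Basic auth is just "user:password" encoded with base64.
expected = "Basic " + base64.b64encode(b"user:passwd").decode("ascii")
print(prepared.headers["Authorization"] == expected)  # True
```

Because base64 is an encoding and not encryption, Basic authentication should only be used over HTTPS.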

Cookies and session objects

If a response contains cookies, you can access them quickly:

import requests
 
r = requests.get('http://www.google.com.hk/')
print(r.cookies['NID'])
print(tuple(r.cookies))

To send your own cookies to the server, use the cookies parameter:

import requests
 
url = 'http://httpbin.org/cookies'
cookies = {'testCookies_1': 'Hello_Python3', 'testCookies_2': 'Hello_Requests'}
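# Version 0 cookies do not allow special characters such as spaces, square brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, @, colons and semicolons in the cookie value.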
r = requests.get(url, cookies=cookies)
print(r.json())

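Session objects let you persist certain parameters across requests; most conveniently, cookies are kept across all requests made from the same Session instance, and this is handled automatically.

Here is a real-world example, a sign-in script for Kuaipan: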

import requests
 
headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Encoding': 'gzip, deflate, compress',
      'Accept-Language': 'en-us;q=0.5,en;q=0.3',
      'Cache-Control': 'max-age=0',
      'Connection': 'keep-alive',
      'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}
 
s = requests.Session()
s.headers.update(headers)
# s.auth = ('superuser', '123')
s.get('https://www.kuaipan.cn/account_login.htm')
 
_URL = 'http://www.kuaipan.cn/index.php'
s.post(_URL, params={'ac':'account', 'op':'login'},
    data={'username':'****@foxmail.com', 'userpwd':'********', 'isajax':'yes'})
r = s.get(_URL, params={'ac':'zone', 'op':'taskdetail'})
print(r.json())
s.get(_URL, params={'ac':'common', 'op':'usersign'})

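Example: fetching page source with requests and saving it to a file

This is a basic file-saving operation, but a few points are worth noting:

1. Install the requests package by running pip install requests on the command line. Many people recommend requests, although the built-in urllib.request can also fetch page source.

2. Set the encoding parameter of open to utf-8; otherwise the saved file may contain garbled text.

3. Printing the fetched content directly in cmd can trigger encoding errors, so save it to a file and inspect it there.

4. Using with open is the better style, because the file is closed automatically once the block finishes.

Example code: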

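#! /usr/bin/python3
import requests

'''Example: fetch a page's source with requests and save it to a file'''
html = requests.get("http://www.baidu.com")
with open('test.txt', 'w', encoding='utf-8') as f:
  f.write(html.text)

'''Example: read a text file line by line and write each line to another text file'''
# both files are closed automatically when the with block ends
with open('test.txt', encoding='utf-8') as f, open('testt.txt', 'w', encoding='utf-8') as ff:
  for line in f:
    ff.write(line)

Printing each line to the command line can cause encoding errors for Chinese text, so each line is written to another file to check that reading works. (Note: specify the encoding when calling open.)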

Python requests module: automatic login example (Chouti):

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests


# ############## Method 1 ##############
"""
# ## 1. Visit any page first to obtain a cookie
i1 = requests.get(url="http://dig.chouti.com/help/service")
i1_cookies = i1.cookies.get_dict()

# ## 2. Log in, sending the cookie from step 1; the backend authorizes the gpsd value in that cookie
i2 = requests.post(
  url="http://dig.chouti.com/login",
  data={
    'phone': "8615131255089",
    'password': "xxooxxoo",
    'oneMonth': ""
  },
  cookies=i1_cookies
)

# ## 3. Upvote (only the authorized gpsd cookie needs to be sent)
gpsd = i1_cookies['gpsd']
i3 = requests.post(
  url="http://dig.chouti.com/link/vote?linksId=8589523",
  cookies={'gpsd': gpsd}
)

print(i3.text)
"""


# ############## Method 2 ##############
"""
import requests

session = requests.Session()
i1 = session.get(url="http://dig.chouti.com/help/service")
i2 = session.post(
  url="http://dig.chouti.com/login",
  data={
    'phone': "8615131255089",
    'password': "xxooxxoo",
    'oneMonth': ""
  }
)
i3 = session.post(
  url="http://dig.chouti.com/link/vote?linksId=8589523"
)
print(i3.text)

"""

Python requests module: automatic login to GitHub

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import requests
from bs4 import BeautifulSoup

# ############## Method 1 ##############
#
# # 1. Visit the login page and get the authenticity_token
# i1 = requests.get('https://github.com/login')
# soup1 = BeautifulSoup(i1.text, features='lxml')
# tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})
# authenticity_token = tag.get('value')
# c1 = i1.cookies.get_dict()
# i1.close()
#
# # 2. Send the authenticity_token together with the username, password and other form fields to log in
# form_data = {
# "authenticity_token": authenticity_token,
#   "utf8": "",
#   "commit": "Sign in",
#   "login": "wupeiqi@live.com",
#   'password': 'xxoo'
# }
#
# i2 = requests.post('https://github.com/session', data=form_data, cookies=c1)
# c2 = i2.cookies.get_dict()
# c1.update(c2)
# i3 = requests.get('https://github.com/settings/repositories', cookies=c1)
#
# soup3 = BeautifulSoup(i3.text, features='lxml')
# list_group = soup3.find(name='div', class_='listgroup')
#
# from bs4.element import Tag
#
# for child in list_group.children:
#   if isinstance(child, Tag):
#     project_tag = child.find(name='a', class_='mr-1')
#     size_tag = child.find(name='small')
#     temp = "Project: %s (%s); project path: %s" % (project_tag.string, size_tag.string, project_tag.get('href'), )
#     print(temp)



# ############## Method 2 ##############
# session = requests.Session()
# # 1. Visit the login page and get the authenticity_token
# i1 = session.get('https://github.com/login')
# soup1 = BeautifulSoup(i1.text, features='lxml')
# tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})
# authenticity_token = tag.get('value')
# c1 = i1.cookies.get_dict()
# i1.close()
#
# # 2. Send the authenticity_token together with the username, password and other form fields to log in
# form_data = {
#   "authenticity_token": authenticity_token,
#   "utf8": "",
#   "commit": "Sign in",
#   "login": "wupeiqi@live.com",
#   'password': 'xxoo'
# }
#
# i2 = session.post('https://github.com/session', data=form_data)
# c2 = i2.cookies.get_dict()
# c1.update(c2)
# i3 = session.get('https://github.com/settings/repositories')
#
# soup3 = BeautifulSoup(i3.text, features='lxml')
# list_group = soup3.find(name='div', class_='listgroup')
#
# from bs4.element import Tag
#
# for child in list_group.children:
#   if isinstance(child, Tag):
#     project_tag = child.find(name='a', class_='mr-1')
#     size_tag = child.find(name='small')
#     temp = "Project: %s (%s); project path: %s" % (project_tag.string, size_tag.string, project_tag.get('href'), )
#     print(temp)

Python requests module: automatic login to Zhihu

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import time

import requests
from bs4 import BeautifulSoup

session = requests.Session()

i1 = session.get(
  url='https://www.zhihu.com/#signin',
  headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
  }
)

soup1 = BeautifulSoup(i1.text, 'lxml')
xsrf_tag = soup1.find(name='input', attrs={'name': '_xsrf'})
xsrf = xsrf_tag.get('value')

current_time = time.time()
i2 = session.get(
  url='https://www.zhihu.com/captcha.gif',
  params={'r': current_time, 'type': 'login'},
  headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
  })

with open('zhihu.gif', 'wb') as f:
  f.write(i2.content)

captcha = input('Open zhihu.gif, read the captcha and enter it: ')
form_data = {
  "_xsrf": xsrf,
  'password': 'xxooxxoo',
  "captcha": captcha,  # send the value the user typed, not the literal string 'captcha'
  'email': '424662508@qq.com'
}
i3 = session.post(
  url='https://www.zhihu.com/login/email',
  data=form_data,
  headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
  }
)

i4 = session.get(
  url='https://www.zhihu.com/settings/profile',
  headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
  }
)

soup4 = BeautifulSoup(i4.text, 'lxml')
tag = soup4.find(id='rename-section')
nick_name = tag.find('span',class_='name').string
print(nick_name)

Python requests module: automatic login to cnblogs

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re
import json
import base64

import rsa
import requests


def js_encrypt(text):
  b64der = 'MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCp0wHYbg/NOPO3nzMD3dndwS0MccuMeXCHgVlGOoYyFwLdS24Im2e7YyhB0wrUsyYf0/nhzCzBK8ZC9eCWqd0aHbdgOQT6CuFQBMjbyGYvlVYU2ZP7kG9Ft6YV6oc9ambuO7nPZh+bvXH0zDKfi02prknrScAKC0XhadTHT3Al0QIDAQAB'
  der = base64.standard_b64decode(b64der)

  pk = rsa.PublicKey.load_pkcs1_openssl_der(der)
  v1 = rsa.encrypt(bytes(text, 'utf8'), pk)
  value = base64.encodebytes(v1).replace(b'\n', b'')
  value = value.decode('utf8')

  return value


session = requests.Session()

i1 = session.get('https://passport.cnblogs.com/user/signin')
rep = re.compile("'VerificationToken': '(.*)'")
v = re.search(rep, i1.text)
verification_token = v.group(1)

form_data = {
  'input1': js_encrypt('wptawy'),
  'input2': js_encrypt('asdfasdf'),
  'remember': False
}

i2 = session.post(url='https://passport.cnblogs.com/user/signin',
         data=json.dumps(form_data),
         headers={
           'Content-Type': 'application/json; charset=UTF-8',
           'X-Requested-With': 'XMLHttpRequest',
           'VerificationToken': verification_token}
         )

i3 = session.get(url='https://i.cnblogs.com/EditDiary.aspx')

print(i3.text)

Python requests module: automatic login to Lagou

#!/usr/bin/env python
# -*- coding:utf-8 -*-

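import re  # needed for the re.findall calls below

import requests


# Step 1: visit the login page and extract X_Anti_Forge_Token and X_Anti_Forge_Code
# 1. Request URL: https://passport.lagou.com/login/login.html
# 2. Request method: GET
# 3. Request headers:
#  User-agent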
r1 = requests.get('https://passport.lagou.com/login/login.html',
         headers={
           'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
         },
         )

X_Anti_Forge_Token = re.findall("X_Anti_Forge_Token = '(.*?)'", r1.text, re.S)[0]
X_Anti_Forge_Code = re.findall("X_Anti_Forge_Code = '(.*?)'", r1.text, re.S)[0]
print(X_Anti_Forge_Token, X_Anti_Forge_Code)
# print(r1.cookies.get_dict())
# Step 2: log in
# 1. Request URL: https://passport.lagou.com/login/login.json
# 2. Request method: POST
# 3. Request headers:
#  cookie
#  User-agent
#  Referer:https://passport.lagou.com/login/login.html
#  X-Anit-Forge-Code:53165984
#  X-Anit-Forge-Token:3b6a2f62-80f0-428b-8efb-ef72fc100d78
#  X-Requested-With:XMLHttpRequest
# 4. Request body:
# isValidate:true
# username:15131252215
# password:ab18d270d7126ea65915c50288c22c0d
# request_form_verifyCode:''
# submit:''
r2 = requests.post(
  'https://passport.lagou.com/login/login.json',
  headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'Referer': 'https://passport.lagou.com/login/login.html',
    'X-Anit-Forge-Code': X_Anti_Forge_Code,
    'X-Anit-Forge-Token': X_Anti_Forge_Token,
    'X-Requested-With': 'XMLHttpRequest'
  },
  data={
    "isValidate": True,
    'username': '15131255089',
    'password': 'ab18d270d7126ea65915c50288c22c0d',
    'request_form_verifyCode': '',
    'submit': ''
  },
  cookies=r1.cookies.get_dict()
)
print(r2.text)
