Pitfalls of sending POST requests with Scrapy in Python


Posted in Python on September 04, 2018

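Sending POST requests with requests

First, let's see how convenient it is to send a POST request with requests.

The simple Requests API makes every HTTP request type obvious. For example, this is how you send an HTTP POST request:

>>> import requests
>>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})

The data argument accepts a dictionary, and it also accepts a tuple of (key, value) tuples, which is useful when one key carries several values:

>>> payload = (('key1', 'value1'), ('key1', 'value2'))
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)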
{
 ...
 "form": {
  "key1": [
   "value1",
   "value2"
  ]
 },
 ...
}

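Passing JSON looks like this:

>>> import json

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, data=json.dumps(payload))

Since version 2.4.2, you can instead pass the dictionary directly through the json parameter and let Requests encode it:

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, json=payload)

In other words, you do not need to transform the parameters yourself. You only need to decide between data= and json=, and requests takes care of the rest.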
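To make the difference between data= and json= concrete, here is a minimal sketch that uses only the standard library to approximate the request bodies produced in the examples above. It does not call requests itself, so treat it as an illustration of the two encodings rather than as the library's actual code path; the variable names are made up for this example.

```python
import json
from urllib.parse import urlencode

# Form-style payloads, as they would be passed via data=
form_dict = {'key': 'value'}
form_pairs = (('key1', 'value1'), ('key1', 'value2'))

# urlencode approximates an application/x-www-form-urlencoded body
form_body_dict = urlencode(form_dict)
form_body_pairs = urlencode(form_pairs)

# A JSON payload, as it would be passed via json=
json_body = json.dumps({'some': 'data'})

print(form_body_dict)   # key=value
print(form_body_pairs)  # key1=value1&key1=value2
print(json_body)        # {"some": "data"}
```

The form-encoded body is a flat list of key=value pairs, while the JSON body keeps the structure of the dictionary. Which one a server accepts depends on the endpoint.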

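Sending POST requests with Scrapy

The source code shows that Scrapy sends GET requests by default. When we need to send a request that carries parameters, or to log in, we need a POST request. Take the following as an example: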

import json

import scrapy
from scrapy.spiders import CrawlSpider


class LaGou(CrawlSpider):
  name = 'myspider'

  def start_requests(self):
    # formdata here plays the role of data in the requests module;
    # it must be given as key/value pairs whose values are strings
    yield scrapy.FormRequest(
      url='https://www.******.com/jobs/positionAjax.json?city=%E5%B9%BF%E5%B7%9E&needAddtionalResult=false',
      formdata={
        'first': 'true',  # a bool True is not allowed here, although requests accepts it
        'pn': '1',  # an int 1 is not allowed here, although requests accepts it
        'kd': 'python'
      },
      callback=self.parse
    )

  def parse(self, response):
    # The response body is JSON; pull out the list of job postings
    datas = json.loads(response.body.decode())['content']['positionResult']['result']
    for data in datas:
      print(data['companyFullName'] + str(data['positionId']))
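Because formdata only accepts string values, parameters that start out as bool or int have to be converted before they are handed to FormRequest. The helper below is a hypothetical sketch of one way to do that; to_formdata is not part of Scrapy, and the lower-case 'true'/'false' spelling is an assumption based on the literal used in the spider above.

```python
def to_formdata(params):
    """Return a copy of params with every value converted to a string.

    Hypothetical helper, not part of Scrapy.
    """
    converted = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # Check bool before anything else: bool is a subclass of int,
            # and str(True) would give 'True' rather than 'true'
            converted[key] = 'true' if value else 'false'
        else:
            converted[key] = str(value)
    return converted


formdata = to_formdata({'first': True, 'pn': 1, 'kd': 'python'})
print(formdata)  # {'first': 'true', 'pn': '1', 'kd': 'python'}
```

The resulting dictionary can then be passed as formdata=formdata when building the request.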

The approach recommended in the official documentation, under "Using FormRequest to send data via HTTP POST":

return [FormRequest(url="http://www.example.com/post/action",
          formdata={'name': 'John Doe', 'age': '27'},
          callback=self.after_post)]

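This uses FormRequest and passes the parameters through formdata, which here is again a dictionary.

But here comes the real pitfall. I struggled with it for a whole afternoon today: no matter how I sent the request this way, something went wrong, and the data returned was never what I wanted.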

return scrapy.FormRequest(url, formdata=(payload))

After searching online for a long time, I finally found an approach that works: sending the request with scrapy.Request returns the expected data.

return scrapy.Request(url, body=json.dumps(payload), method='POST', headers={'Content-Type': 'application/json'},)

Reference: Send Post Request in Scrapy

my_data = {'field1': 'value1', 'field2': 'value2'}
request = scrapy.Request( url, method='POST', 
             body=json.dumps(my_data), 
             headers={'Content-Type':'application/json'} )
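One plausible reason the FormRequest version returned unexpected data is that the endpoint expects a JSON body, while formdata is sent form-encoded. The sketch below illustrates that mismatch with the standard library only. parse_as_json is a hypothetical stand-in for a server that accepts nothing but JSON, and the form body is an approximation of what formdata produces, so this is an assumption about the cause rather than a confirmed diagnosis.

```python
import json
from urllib.parse import urlencode

my_data = {'field1': 'value1', 'field2': 'value2'}

form_body = urlencode(my_data)   # roughly what formdata= puts in the body
json_body = json.dumps(my_data)  # what body=json.dumps(...) puts in the body


def parse_as_json(body):
    """Hypothetical stand-in for a server that only accepts JSON bodies."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


print(parse_as_json(form_body))  # None: the form-encoded body is not valid JSON
print(parse_as_json(json_body))  # {'field1': 'value1', 'field2': 'value2'}
```

If the target endpoint really does expect JSON, building the body with json.dumps and setting the Content-Type header, as in the scrapy.Request call above, is what makes the payload readable on the server side.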

The difference between FormRequest and Request

In the documentation, there is almost no visible difference:

The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) - is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.

It says that FormRequest adds a new argument, formdata, which accepts a dictionary or an iterable of tuples containing the form data and converts it into the request body. FormRequest also inherits from Request:

class FormRequest(Request):

  def __init__(self, *args, **kwargs):
    formdata = kwargs.pop('formdata', None)
    if formdata and kwargs.get('method') is None:
      kwargs['method'] = 'POST'

    super(FormRequest, self).__init__(*args, **kwargs)

    if formdata:
      items = formdata.items() if isinstance(formdata, dict) else formdata
      querystr = _urlencode(items, self.encoding)
      if self.method == 'POST':
        self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
        self._set_body(querystr)
      else:
        self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)
      ###


def _urlencode(seq, enc):
  values = [(to_bytes(k, enc), to_bytes(v, enc))
       for k, vs in seq
       for v in (vs if is_listlike(vs) else [vs])]
  return urlencode(values, doseq=1)
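The encoding step above can be tried outside Scrapy. The following is a simplified sketch, assuming plain str keys and values; encode_formdata and append_query are hypothetical names that mirror the logic of _urlencode and of the non-POST branch, and they use str.encode and an isinstance check in place of Scrapy's to_bytes and is_listlike helpers.

```python
from urllib.parse import urlencode


def encode_formdata(formdata, encoding='utf-8'):
    """Simplified stand-in for the _urlencode step shown above."""
    items = formdata.items() if isinstance(formdata, dict) else formdata
    # A list or tuple value expands into one (key, value) pair per element
    pairs = [(k.encode(encoding), v.encode(encoding))
             for k, vs in items
             for v in (vs if isinstance(vs, (list, tuple)) else [vs])]
    return urlencode(pairs, doseq=True)


def append_query(url, querystr):
    """Mirror of the non-POST branch: append the query string to the URL."""
    return url + ('&' if '?' in url else '?') + querystr


print(encode_formdata({'key': 'value', 'k': 'v'}))
# key=value&k=v
print(encode_formdata({'key1': ['value1', 'value2']}))
# key1=value1&key1=value2
print(append_query('http://www.example.com/search?a=1', 'k=v'))
# http://www.example.com/search?a=1&k=v
```

Because the sketch calls .encode() on every key and value, a bool or int value raises an error here as well, which matches the restriction noted in the spider example.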

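In the end, the {'key': 'value', 'k': 'v'} we pass in is converted to 'key=value&k=v', and when formdata is given without an explicit method, the method defaults to POST. Now let's look at Request: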

class Request(object_ref):

  def __init__(self, url, callback=None, method='GET', headers=None, body=None,
         cookies=None, meta=None, encoding='utf-8', priority=0,
         dont_filter=False, errback=None, flags=None):

    self._encoding = encoding # this one has to be set first
    self.method = str(method).upper()

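The default method is GET, but that does not get in the way: passing method='POST' still sends a POST request. This reminds me of the request function in requests, which is the basic function for constructing requests: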

def request(method, url, **kwargs):
  """Constructs and sends a :class:`Request <Request>`.

  :param method: method for the new :class:`Request` object.
  :param url: URL for the new :class:`Request` object.
  :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
  :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
  :param json: (optional) json data to send in the body of the :class:`Request`.
  :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
  :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
  :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
    ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
    or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
    defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
    to add for the file.
  :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
  :param timeout: (optional) How many seconds to wait for the server to send data
    before giving up, as a float, or a :ref:`(connect timeout, read
    timeout) <timeouts>` tuple.
  :type timeout: float or tuple
  :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
  :type allow_redirects: bool
  :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
  :param verify: (optional) Either a boolean, in which case it controls whether we verify
      the server's TLS certificate, or a string, in which case it must be a path
      to a CA bundle to use. Defaults to ``True``.
  :param stream: (optional) if ``False``, the response content will be immediately downloaded.
  :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
  :return: :class:`Response <Response>` object
  :rtype: requests.Response

  Usage::

   >>> import requests
   >>> req = requests.request('GET', 'http://httpbin.org/get')
   <Response [200]>
  """

  # By using the 'with' statement we are sure the session is closed, thus we
  # avoid leaving sockets open which can trigger a ResourceWarning in some
  # cases, and look like a memory leak in others.
  with sessions.Session() as session:
    return session.request(method=method, url=url, **kwargs)

That is all for this article. I hope it helps with your studies, and thank you for supporting 三水点靠木.
