Pitfalls of sending POST requests with Scrapy in Python


Posted in Python on September 04, 2018

Sending POST requests with requests

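First, let's look at how convenient it is to send a POST request with requests.

The simple Requests API means that every kind of HTTP request is self-evident. For example, this is how you send an HTTP POST request:

>>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})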

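The data argument accepts a dict. It also accepts a tuple of (key, value) pairs, which lets the same key carry several values:

>>> payload = (('key1', 'value1'), ('key1', 'value2'))
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)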
{
 ...
 "form": {
  "key1": [
   "value1",
   "value2"
  ]
 },
 ...
}
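Why does httpbin report two values for key1? A form POST body is application/x-www-form-urlencoded, and requests builds it from a sequence of pairs with the standard library's urlencode. The stdlib-only sketch below reproduces that body without sending a request. It does not call requests itself:

```python
from urllib.parse import urlencode

# A tuple of (key, value) pairs may repeat a key, which a dict cannot do.
payload = (('key1', 'value1'), ('key1', 'value2'))

# Form-encode the pairs in order; this is the body a form POST carries.
body = urlencode(payload)
print(body)  # key1=value1&key1=value2
```

The pairs are encoded in order and the key repeats, so the server sees key1 twice and collects both values into a list.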

To send JSON, you can serialize the payload yourself:

>>> import json

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, data=json.dumps(payload))

New in version 2.4.2, you can pass the object directly with json=:

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, json=payload)

In other words, you don't have to transform the parameters yourself. You only need to choose between data= and json=, and requests handles the rest.
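The three calls above differ in what ends up on the wire. The sketch below approximates this with a hypothetical describe_body() helper, which is not part of requests. It returns the body and Content-Type for each calling style:

- A dict passed as data= is form-encoded.
- A string passed as data= is sent as-is, with no Content-Type added.
- json= serializes the object and sets application/json.

```python
import json
from urllib.parse import urlencode

payload = {'some': 'data'}


def describe_body(data=None, json_payload=None):
    """Hypothetical helper approximating how requests prepares a POST body.

    Returns (body, content_type); not part of the requests API.
    """
    if json_payload is not None:
        # json=: serialize the object and label the body as JSON
        return json.dumps(json_payload), 'application/json'
    if isinstance(data, (str, bytes)):
        # data=<str>: sent unchanged, no Content-Type is added
        return data, None
    # data=<dict or pairs>: form-encoded
    return urlencode(data, doseq=True), 'application/x-www-form-urlencoded'


for style, kwargs in [
    ('data=dict', {'data': payload}),
    ('data=json.dumps(...)', {'data': json.dumps(payload)}),
    ('json=dict', {'json_payload': payload}),
]:
    print(style, describe_body(**kwargs))
```

This is the practical difference between the data=json.dumps(...) and json= versions. The body text is identical, but only json= tells the server that the body is JSON.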

Sending POST requests with Scrapy

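The source code shows that Scrapy sends GET requests by default. When we need to send parameters with a request, or to log in, we need a POST request instead. Take the following spider as an example: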

import json

import scrapy


class LaGou(scrapy.Spider):  # no crawl rules are used, so a plain Spider is enough
  name = 'myspider'

  def start_requests(self):
    yield scrapy.FormRequest(
      url='https://www.******.com/jobs/positionAjax.json?city=%E5%B9%BF%E5%B7%9E&needAddtionalResult=false',
      # formdata plays the role of data in requests; keys and values must be strings
      formdata={
        'first': 'true',  # must be the string 'true'; the bool True is rejected here (requests accepts it)
        'pn': '1',  # must be the string '1'; the int 1 is rejected here (requests accepts it)
        'kd': 'python',
      },
      callback=self.parse,
    )

  def parse(self, response):
    # The JSON response nests the job list under content -> positionResult -> result
    datas = json.loads(response.body.decode())['content']['positionResult']['result']
    for data in datas:
      print(data['companyFullName'] + str(data['positionId']))
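The comments in the spider say that formdata accepts only strings. FormRequest converts every key and value to bytes before url-encoding them, and that conversion rejects other types. The sketch below mimics this behavior with hypothetical to_bytes_strict() and encode_formdata() helpers. They are simplified stand-ins, not Scrapy's own functions:

```python
from urllib.parse import urlencode


def to_bytes_strict(value, encoding='utf-8'):
    """Hypothetical stand-in for Scrapy's internal to_bytes: only str/bytes pass."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError(f'expected str or bytes, got {type(value).__name__}')


def encode_formdata(formdata, encoding='utf-8'):
    # Convert every key and value to bytes, then form-encode, roughly as FormRequest does
    pairs = [(to_bytes_strict(k, encoding), to_bytes_strict(v, encoding))
             for k, v in formdata.items()]
    return urlencode(pairs)


print(encode_formdata({'first': 'true', 'pn': '1', 'kd': 'python'}))
# first=true&pn=1&kd=python

try:
    encode_formdata({'first': True, 'pn': 1, 'kd': 'python'})
except TypeError as exc:
    print('rejected:', exc)  # rejected: expected str or bytes, got bool
```

Note also that str(True) is 'True', not the lowercase 'true' used in the spider. Converting values blindly with str() is therefore not a safe fix. Writing the string literal, as the spider does, avoids the problem.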

The official documentation recommends this pattern, under "Using FormRequest to send data via HTTP POST":

return [FormRequest(url="http://www.example.com/post/action",
          formdata={'name': 'John Doe', 'age': '27'},
          callback=self.after_post)]

This uses FormRequest and passes the parameters through formdata, which, again, is a dict.

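But here comes the real pitfall. I spent a whole afternoon on it today. However I sent the request this way, something went wrong, and the response never contained the data I wanted:

return scrapy.FormRequest(url, formdata=payload)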

After a long search online, I finally found an approach that works: send the request with scrapy.Request instead, and the data comes back normally.

return scrapy.Request(url, body=json.dumps(payload), method='POST', headers={'Content-Type': 'application/json'},)

Reference: Send Post Request in Scrapy

my_data = {'field1': 'value1', 'field2': 'value2'}
request = scrapy.Request(url, method='POST',
                         body=json.dumps(my_data),
                         headers={'Content-Type': 'application/json'})
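Why did FormRequest fail while scrapy.Request worked? With the same payload, the two requests carry differently encoded bodies. FormRequest sends a url-encoded form, while the working version sends JSON with a JSON Content-Type, so presumably the endpoint only parsed JSON. The sketch below compares the two bodies using only the standard library. It also packages the working arguments in a hypothetical json_post_kwargs() helper, which is not a Scrapy function:

```python
import json
from urllib.parse import urlencode

payload = {'first': 'true', 'pn': '1', 'kd': 'python'}


def json_post_kwargs(data):
    """Hypothetical helper: keyword arguments for scrapy.Request(url, **kwargs)
    that send `data` as a JSON body, like json= does in requests."""
    return {
        'method': 'POST',
        'body': json.dumps(data),
        'headers': {'Content-Type': 'application/json'},
    }


# What FormRequest(url, formdata=payload) puts in the body
form_body = urlencode(payload)
# What scrapy.Request(url, **json_post_kwargs(payload)) puts in the body
json_body = json_post_kwargs(payload)['body']

print(form_body)  # first=true&pn=1&kd=python
print(json_body)  # {"first": "true", "pn": "1", "kd": "python"}
```

In a spider, the helper would be used as `yield scrapy.Request(url, callback=self.parse, **json_post_kwargs(payload))`. Newer Scrapy releases (1.8 and later) also provide scrapy.http.JsonRequest, which serializes a data argument and sets the JSON headers for you. Check the documentation for the version you run.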

The difference between FormRequest and Request

The documentation shows almost no difference between them:

The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.

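In other words, FormRequest adds one new parameter, formdata. It accepts a dict or an iterable of tuples containing the form data and converts it into the request body. FormRequest is also a subclass of Request: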

class FormRequest(Request):

  def __init__(self, *args, **kwargs):
    formdata = kwargs.pop('formdata', None)
    if formdata and kwargs.get('method') is None:
      kwargs['method'] = 'POST'

    super(FormRequest, self).__init__(*args, **kwargs)

    if formdata:
      items = formdata.items() if isinstance(formdata, dict) else formdata
      querystr = _urlencode(items, self.encoding)
      if self.method == 'POST':
        self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
        self._set_body(querystr)
      else:
        self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)
  # ... (rest of the class omitted)


def _urlencode(seq, enc):
  values = [(to_bytes(k, enc), to_bytes(v, enc))
       for k, vs in seq
       for v in (vs if is_listlike(vs) else [vs])]
  return urlencode(values, doseq=1)
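The formdata branch of FormRequest.__init__ can be restated as a small pure function. The sketch below is a hypothetical apply_formdata(), not Scrapy's API, that follows the source above:

- formdata without an explicit method implies POST.
- A POST gets a form-encoded body and the form Content-Type.
- Any other method appends the encoded data to the URL's query string.

As a simplification, only lists and tuples are expanded into repeated keys, whereas Scrapy's is_listlike accepts other iterables too.

```python
from urllib.parse import urlencode


def apply_formdata(url, formdata, method=None):
    """Hypothetical re-implementation of FormRequest's formdata handling.

    Returns (method, url, headers, body); not Scrapy's actual API.
    """
    if formdata and method is None:
        method = 'POST'  # formdata given and no method: default to POST
    method = (method or 'GET').upper()
    headers, body = {}, ''
    if formdata:
        items = formdata.items() if isinstance(formdata, dict) else formdata
        # Expand list/tuple values into repeated keys, like _urlencode does
        pairs = [(k, v) for k, vs in items
                 for v in (vs if isinstance(vs, (list, tuple)) else [vs])]
        querystr = urlencode(pairs, doseq=True)
        if method == 'POST':
            headers['Content-Type'] = 'application/x-www-form-urlencoded'
            body = querystr
        else:
            # Non-POST: the data goes into the query string instead of the body
            url = url + ('&' if '?' in url else '?') + querystr
    return method, url, headers, body


method, url, headers, body = apply_formdata(
    'http://www.example.com/post/action', {'name': 'John Doe', 'age': '27'})
print(method, body)  # POST name=John+Doe&age=27

print(apply_formdata('http://www.example.com/search?q=1', {'k': ['a', 'b']}, method='GET')[1])
# http://www.example.com/search?q=1&k=a&k=b
```

Writing the logic this way makes the pitfall explicit. Whatever you pass as formdata, a POST body from FormRequest is always form-encoded, never JSON.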

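So the {'key': 'value', 'k': 'v'} we pass is ultimately converted into 'key=value&k=v'. For a POST, it is sent as the body with the Content-Type application/x-www-form-urlencoded. When formdata is given and no method is specified, the method defaults to POST. Now let's look at Request: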

class Request(object_ref):

  def __init__(self, url, callback=None, method='GET', headers=None, body=None,
         cookies=None, meta=None, encoding='utf-8', priority=0,
         dont_filter=False, errback=None, flags=None):

    self._encoding = encoding # this one has to be set first
    self.method = str(method).upper()

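Here the default method is GET, but that doesn't matter: passing method='POST' still sends a POST request. This reminds me of request() in requests, the basic function that builds and sends every request: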

def request(method, url, **kwargs):
  """Constructs and sends a :class:`Request <Request>`.

  :param method: method for the new :class:`Request` object.
  :param url: URL for the new :class:`Request` object.
  :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
  :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
  :param json: (optional) json data to send in the body of the :class:`Request`.
  :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
  :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
  :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
    ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
    or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
    defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
    to add for the file.
  :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
  :param timeout: (optional) How many seconds to wait for the server to send data
    before giving up, as a float, or a :ref:`(connect timeout, read
    timeout) <timeouts>` tuple.
  :type timeout: float or tuple
  :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
  :type allow_redirects: bool
  :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
  :param verify: (optional) Either a boolean, in which case it controls whether we verify
      the server's TLS certificate, or a string, in which case it must be a path
      to a CA bundle to use. Defaults to ``True``.
  :param stream: (optional) if ``False``, the response content will be immediately downloaded.
  :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
  :return: :class:`Response <Response>` object
  :rtype: requests.Response

  Usage::

   >>> import requests
   >>> req = requests.request('GET', 'http://httpbin.org/get')
   <Response [200]>
  """

  # By using the 'with' statement we are sure the session is closed, thus we
  # avoid leaving sockets open which can trigger a ResourceWarning in some
  # cases, and look like a memory leak in others.
  with sessions.Session() as session:
    return session.request(method=method, url=url, **kwargs)

