Python requests上传文件实现步骤


Posted in Python onSeptember 15, 2020

官方文档:https://2.python-requests.org//en/master/

工作中涉及到一个功能,需要上传附件到一个接口,接口参数如下:

使用http post提交附件 multipart/form-data 格式,url : http://test.com/flow/upload,

字段列表:
md5:      //md5加密(随机值_当时时间戳)
filesize:  //文件大小
file:       //文件内容(须含文件名)
返回值:
{"success":true,"uploadName":"tmp.xml","uploadPath":"uploads\/201311\/758e875fb7c7a508feef6b5036119b9f"}

由于工作中主要用python,并且项目中已有使用requests库的地方,所以计划使用requests来实现,本来以为是很简单的一个小功能,结果花费了大量的时间,requests官方的例子只提到了上传文件,并不需要传额外的参数:

https://2.python-requests.org//en/master/user/quickstart/#post-a-multipart-encoded-file

>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

>>> r = requests.post(url, files=files)
>>> r.text
{
 ...
 "files": {
  "file": "<censored...binary...data>"
 },
 ...
}

但是如果涉及到了参数的传递时,其实就要用到requests的两个参数:data、files,将要上传的文件传入files,将其他参数传入data,request库会将两者合并到一起做一个multi part,然后发送给服务器。

最终实现的代码是这样的:

with open(file_name) as f:
content = f.read()
request_data = {
  'md5':md5.md5('%d_%d' % (0, int(time.time()))).hexdigest(),
  'filesize':len(content),
}
files = {'file':(file_name, open(file_name, 'rb'))}
MyLogger().getlogger().info('url:%s' % (request_url))
resp = requests.post(request_url, data=request_data, files=files)

虽然最终代码可能看起来很简单,但是其实我费了好大功夫才确认这样是OK的,中间还翻了requests的源码,下面记录一下翻阅源码的过程:

首先,找到post方法的实现,在requests.api.py中:

def post(url, data=None, json=None, **kwargs):
  r"""Sends a POST request.

  :param url: URL for the new :class:`Request` object.
  :param data: (optional) Dictionary, list of tuples, bytes, or file-like
    object to send in the body of the :class:`Request`.
  :param json: (optional) json data to send in the body of the :class:`Request`.
  :param \*\*kwargs: Optional arguments that ``request`` takes.
  :return: :class:`Response <Response>` object
  :rtype: requests.Response
  """

  return request('post', url, data=data, json=json, **kwargs)

这里可以看到它调用了request方法,咱们继续跟进request方法,在requests.api.py中:

def request(method, url, **kwargs):
  """Constructs and sends a :class:`Request <Request>`.

  :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
  :param url: URL for the new :class:`Request` object.
  :param params: (optional) Dictionary, list of tuples or bytes to send
    in the query string for the :class:`Request`.
  :param data: (optional) Dictionary, list of tuples, bytes, or file-like
    object to send in the body of the :class:`Request`.
  :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
  :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
  :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
  :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
    ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
    or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
    defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
    to add for the file.
  :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
  :param timeout: (optional) How many seconds to wait for the server to send data
    before giving up, as a float, or a :ref:`(connect timeout, read
    timeout) <timeouts>` tuple.
  :type timeout: float or tuple
  :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
  :type allow_redirects: bool
  :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
  :param verify: (optional) Either a boolean, in which case it controls whether we verify
      the server's TLS certificate, or a string, in which case it must be a path
      to a CA bundle to use. Defaults to ``True``.
  :param stream: (optional) if ``False``, the response content will be immediately downloaded.
  :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
  :return: :class:`Response <Response>` object
  :rtype: requests.Response

  Usage::

   >>> import requests
   >>> req = requests.request('GET', 'https://httpbin.org/get')
   <Response [200]>
  """

  # By using the 'with' statement we are sure the session is closed, thus we
  # avoid leaving sockets open which can trigger a ResourceWarning in some
  # cases, and look like a memory leak in others.
  with sessions.Session() as session:
    return session.request(method=method, url=url, **kwargs)

这个方法的注释比较多,从注释里其实已经可以看到files参数使用传送文件,但是还是无法知道当需要同时传递参数和文件时该如何处理,继续跟进session.request方法,在requests.session.py中:

def request(self, method, url,
      params=None, data=None, headers=None, cookies=None, files=None,
      auth=None, timeout=None, allow_redirects=True, proxies=None,
      hooks=None, stream=None, verify=None, cert=None, json=None):
    """Constructs a :class:`Request <Request>`, prepares it and sends it.
    Returns :class:`Response <Response>` object.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query
      string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
      object to send in the body of the :class:`Request`.
    :param json: (optional) json to send in the body of the
      :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the
      :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the
      :class:`Request`.
    :param files: (optional) Dictionary of ``'filename': file-like-objects``
      for multipart encoding upload.
    :param auth: (optional) Auth tuple or callable to enable
      Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send
      data before giving up, as a float, or a :ref:`(connect timeout,
      read timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Set to True by default.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol or protocol and
      hostname to the URL of the proxy.
    :param stream: (optional) whether to immediately download the response
      content. Defaults to ``False``.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
      the server's TLS certificate, or a string, in which case it must be a path
      to a CA bundle to use. Defaults to ``True``.
    :param cert: (optional) if String, path to ssl client cert file (.pem).
      If Tuple, ('cert', 'key') pair.
    :rtype: requests.Response
    """
    # Create the Request.
    req = Request(
      method=method.upper(),
      url=url,
      headers=headers,
      files=files,
      data=data or {},
      json=json,
      params=params or {},
      auth=auth,
      cookies=cookies,
      hooks=hooks,
    )
    prep = self.prepare_request(req)

    proxies = proxies or {}

    settings = self.merge_environment_settings(
      prep.url, proxies, stream, verify, cert
    )

    # Send the request.
    send_kwargs = {
      'timeout': timeout,
      'allow_redirects': allow_redirects,
    }
    send_kwargs.update(settings)
    resp = self.send(prep, **send_kwargs)

    return resp

先大概看一下这个方法,先是准备request,最后一步是调用send,推测应该是发送请求了,所以我们需要跟进到prepare_request方法中,在requests.session.py中:

def prepare_request(self, request):
    """Constructs a :class:`PreparedRequest <PreparedRequest>` for
    transmission and returns it. The :class:`PreparedRequest` has settings
    merged from the :class:`Request <Request>` instance and those of the
    :class:`Session`.

    :param request: :class:`Request` instance to prepare with this
      session's settings.
    :rtype: requests.PreparedRequest
    """
    cookies = request.cookies or {}

    # Bootstrap CookieJar.
    if not isinstance(cookies, cookielib.CookieJar):
      cookies = cookiejar_from_dict(cookies)

    # Merge with session cookies
    merged_cookies = merge_cookies(
      merge_cookies(RequestsCookieJar(), self.cookies), cookies)

    # Set environment's basic authentication if not explicitly set.
    auth = request.auth
    if self.trust_env and not auth and not self.auth:
      auth = get_netrc_auth(request.url)

    p = PreparedRequest()
    p.prepare(
      method=request.method.upper(),
      url=request.url,
      files=request.files,
      data=request.data,
      json=request.json,
      headers=merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict),
      params=merge_setting(request.params, self.params),
      auth=merge_setting(auth, self.auth),
      cookies=merged_cookies,
      hooks=merge_hooks(request.hooks, self.hooks),
    )
    return p

在prepare_request中,生成了一个PreparedRequest对象,并调用其prepare方法,跟进到prepare方法中,在requests.models.py中:

def prepare(self,
      method=None, url=None, headers=None, files=None, data=None,
      params=None, auth=None, cookies=None, hooks=None, json=None):
    """Prepares the entire request with the given parameters."""

    self.prepare_method(method)
    self.prepare_url(url, params)
    self.prepare_headers(headers)
    self.prepare_cookies(cookies)
    self.prepare_body(data, files, json)
    self.prepare_auth(auth, url)

    # Note that prepare_auth must be last to enable authentication schemes
    # such as OAuth to work on a fully prepared request.

    # This MUST go after prepare_auth. Authenticators could add a hook
    self.prepare_hooks(hooks)

这里调用许多prepare_xx方法,这里我们只关心处理了data、files、json的方法,跟进到prepare_body中,在requests.models.py中:

def prepare_body(self, data, files, json=None):
    """Prepares the given HTTP body data."""

    # Check if file, fo, generator, iterator.
    # If not, run through normal process.

    # Nottin' on you.
    body = None
    content_type = None

    if not data and json is not None:
      # urllib3 requires a bytes-like body. Python 2's json.dumps
      # provides this natively, but Python 3 gives a Unicode string.
      content_type = 'application/json'
      body = complexjson.dumps(json)
      if not isinstance(body, bytes):
        body = body.encode('utf-8')

    is_stream = all([
      hasattr(data, '__iter__'),
      not isinstance(data, (basestring, list, tuple, Mapping))
    ])

    try:
      length = super_len(data)
    except (TypeError, AttributeError, UnsupportedOperation):
      length = None

    if is_stream:
      body = data

      if getattr(body, 'tell', None) is not None:
        # Record the current file position before reading.
        # This will allow us to rewind a file in the event
        # of a redirect.
        try:
          self._body_position = body.tell()
        except (IOError, OSError):
          # This differentiates from None, allowing us to catch
          # a failed `tell()` later when trying to rewind the body
          self._body_position = object()

      if files:
        raise NotImplementedError('Streamed bodies and files are mutually exclusive.')

      if length:
        self.headers['Content-Length'] = builtin_str(length)
      else:
        self.headers['Transfer-Encoding'] = 'chunked'
    else:
      # Multi-part file uploads.
      if files:
        (body, content_type) = self._encode_files(files, data)
      else:
        if data:
          body = self._encode_params(data)
          if isinstance(data, basestring) or hasattr(data, 'read'):
            content_type = None
          else:
            content_type = 'application/x-www-form-urlencoded'

      self.prepare_content_length(body)

      # Add content-type if it wasn't explicitly provided.
      if content_type and ('content-type' not in self.headers):
        self.headers['Content-Type'] = content_type

    self.body = body

这个函数比较长,需要重点关注L52,这里调用了_encode_files方法,我们跟进这个方法:

def _encode_files(files, data):
    """Build the body for a multipart/form-data request.

    Will successfully encode files when passed as a dict or a list of
    tuples. Order is retained if data is a list of tuples but arbitrary
    if parameters are supplied as a dict.
    The tuples may be 2-tuples (filename, fileobj), 3-tuples (filename, fileobj, contentype)
    or 4-tuples (filename, fileobj, contentype, custom_headers).
    """
    if (not files):
      raise ValueError("Files must be provided.")
    elif isinstance(data, basestring):
      raise ValueError("Data must not be a string.")

    new_fields = []
    fields = to_key_val_list(data or {})
    files = to_key_val_list(files or {})

    for field, val in fields:
      if isinstance(val, basestring) or not hasattr(val, '__iter__'):
        val = [val]
      for v in val:
        if v is not None:
          # Don't call str() on bytestrings: in Py3 it all goes wrong.
          if not isinstance(v, bytes):
            v = str(v)

          new_fields.append(
            (field.decode('utf-8') if isinstance(field, bytes) else field,
             v.encode('utf-8') if isinstance(v, str) else v))

    for (k, v) in files:
      # support for explicit filename
      ft = None
      fh = None
      if isinstance(v, (tuple, list)):
        if len(v) == 2:
          fn, fp = v
        elif len(v) == 3:
          fn, fp, ft = v
        else:
          fn, fp, ft, fh = v
      else:
        fn = guess_filename(v) or k
        fp = v

      if isinstance(fp, (str, bytes, bytearray)):
        fdata = fp
      elif hasattr(fp, 'read'):
        fdata = fp.read()
      elif fp is None:
        continue
      else:
        fdata = fp

      rf = RequestField(name=k, data=fdata, filename=fn, headers=fh)
      rf.make_multipart(content_type=ft)
      new_fields.append(rf)

    body, content_type = encode_multipart_formdata(new_fields)

    return body, content_type

OK,到此为止,仔细阅读完这个段代码,就可以搞明白requests.post方法传入的data、files两个参数的作用了,其实requests在这里把它俩合并在一起了,作为post的body。

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持三水点靠木。

Python 相关文章推荐
python实现将内容分行输出
Nov 05 Python
Python自定义线程类简单示例
Mar 23 Python
使用python爬取微博数据打造一颗“心”
Jun 28 Python
Python SELENIUM上传文件或图片实现过程
Oct 28 Python
Python 中如何实现参数化测试的方法示例
Dec 10 Python
基于h5py的使用及数据封装代码
Dec 26 Python
详解基于Jupyter notebooks采用sklearn库实现多元回归方程编程
Mar 25 Python
利用matplotlib为图片上添加触发事件进行交互
Apr 23 Python
Django模板获取field的verbose_name实例
May 19 Python
详解Tensorflow不同版本要求与CUDA及CUDNN版本对应关系
Aug 04 Python
Python如何爬取b站热门视频并导入Excel
Aug 10 Python
深入理解python协程
Jun 15 Python
python -v 报错问题的解决方法
Sep 15 #Python
基于Python正确读取资源文件
Sep 14 #Python
Django框架安装及项目创建过程解析
Sep 14 #Python
通过代码实例了解Python sys模块
Sep 14 #Python
基于python实现简单C/S模式代码实例
Sep 14 #Python
Elasticsearch py客户端库安装及使用方法解析
Sep 14 #Python
基于python实现简单网页服务器代码实例
Sep 14 #Python
You might like
PHP计算一年多少个星期和每周的开始和结束日期
2014/07/01 PHP
php通过array_unshift函数添加多个变量到数组前端的方法
2015/03/18 PHP
Prototype 1.5.0_rc1 及 Prototype 1.5.0 Pre0小抄本
2006/09/22 Javascript
取得一定长度的内容,处理中文
2006/12/20 Javascript
JavaScript高级程序设计 学习笔记 js高级技巧
2011/09/20 Javascript
jQuery CSS()方法改变现有的CSS样式表
2014/09/09 Javascript
js判断登陆用户名及密码是否为空的简单实例
2016/05/16 Javascript
JS实现焦点图轮播效果的方法详解
2016/12/19 Javascript
深入理解在JS中通过四种设置事件处理程序的方法
2017/03/02 Javascript
angular过滤器实现排序功能
2017/06/27 Javascript
浅谈vue项目可以从哪些方面进行优化
2018/05/05 Javascript
Vue cli构建及项目打包以及出现的问题解决
2018/08/27 Javascript
layui 实现表格某一列显示图标
2019/09/19 Javascript
vue 更改连接后台的api示例
2019/11/11 Javascript
jQuery cookie的公共方法封装和使用示例
2020/06/01 jQuery
JS制作简易计算器的实例代码
2020/07/04 Javascript
js实现带积分弹球小游戏
2020/07/21 Javascript
python在windows命令行下输出彩色文字的方法
2015/03/19 Python
利用Python自动监控网站并发送邮件告警的方法
2016/08/24 Python
python实现对指定输入的字符串逆序输出的6种方法
2018/04/26 Python
Python3.6日志Logging模块简单用法示例
2018/06/14 Python
python爬虫租房信息在地图上显示的方法
2019/05/13 Python
Python openpyxl读取单元格字体颜色过程解析
2019/09/03 Python
python global关键字的用法详解
2019/09/05 Python
详解HTML5中ol标签的用法
2015/09/08 HTML / CSS
企划经理的岗位职责
2013/11/17 职场文书
材料会计岗位职责
2014/03/06 职场文书
公务员平时考核实施方案
2014/03/11 职场文书
户外活动总结范文
2014/04/30 职场文书
教师群众路线心得体会
2014/11/04 职场文书
2015新员工试用期工作总结
2014/12/12 职场文书
工程部岗位职责
2015/02/10 职场文书
辩护词范文大全
2015/05/21 职场文书
董事会决议范本
2015/07/01 职场文书
Golang表示枚举类型的详细讲解
2021/09/04 Golang
win10清理dns缓存
2022/04/19 数码科技