Python requests上传文件实现步骤


Posted in Python onSeptember 15, 2020

官方文档:https://2.python-requests.org//en/master/

工作中涉及到一个功能,需要上传附件到一个接口,接口参数如下:

使用http post提交附件 multipart/form-data 格式,url : http://test.com/flow/upload,

字段列表:
md5:      //md5加密(随机值_当时时间戳)
filesize:  //文件大小
file:       //文件内容(须含文件名)
返回值:
{"success":true,"uploadName":"tmp.xml","uploadPath":"uploads\/201311\/758e875fb7c7a508feef6b5036119b9f"}

由于工作中主要用python,并且项目中已有使用requests库的地方,所以计划使用requests来实现,本来以为是很简单的一个小功能,结果花费了大量的时间,requests官方的例子只提到了上传文件,并不需要传额外的参数:

https://2.python-requests.org//en/master/user/quickstart/#post-a-multipart-encoded-file

>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

>>> r = requests.post(url, files=files)
>>> r.text
{
 ...
 "files": {
  "file": "<censored...binary...data>"
 },
 ...
}

但是如果涉及到了参数的传递时,其实就要用到requests的两个参数:data、files,将要上传的文件传入files,将其他参数传入data,request库会将两者合并到一起做一个multi part,然后发送给服务器。

最终实现的代码是这样的:

with open(file_name) as f:
content = f.read()
request_data = {
  'md5':md5.md5('%d_%d' % (0, int(time.time()))).hexdigest(),
  'filesize':len(content),
}
files = {'file':(file_name, open(file_name, 'rb'))}
MyLogger().getlogger().info('url:%s' % (request_url))
resp = requests.post(request_url, data=request_data, files=files)

虽然最终代码可能看起来很简单,但是其实我费了好大功夫才确认这样是OK的,中间还翻了requests的源码,下面记录一下翻阅源码的过程:

首先,找到post方法的实现,在requests.api.py中:

def post(url, data=None, json=None, **kwargs):
  r"""Sends a POST request.

  :param url: URL for the new :class:`Request` object.
  :param data: (optional) Dictionary, list of tuples, bytes, or file-like
    object to send in the body of the :class:`Request`.
  :param json: (optional) json data to send in the body of the :class:`Request`.
  :param \*\*kwargs: Optional arguments that ``request`` takes.
  :return: :class:`Response <Response>` object
  :rtype: requests.Response
  """

  return request('post', url, data=data, json=json, **kwargs)

这里可以看到它调用了request方法,咱们继续跟进request方法,在requests.api.py中:

def request(method, url, **kwargs):
  """Constructs and sends a :class:`Request <Request>`.

  :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
  :param url: URL for the new :class:`Request` object.
  :param params: (optional) Dictionary, list of tuples or bytes to send
    in the query string for the :class:`Request`.
  :param data: (optional) Dictionary, list of tuples, bytes, or file-like
    object to send in the body of the :class:`Request`.
  :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
  :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
  :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
  :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
    ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
    or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
    defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
    to add for the file.
  :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
  :param timeout: (optional) How many seconds to wait for the server to send data
    before giving up, as a float, or a :ref:`(connect timeout, read
    timeout) <timeouts>` tuple.
  :type timeout: float or tuple
  :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
  :type allow_redirects: bool
  :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
  :param verify: (optional) Either a boolean, in which case it controls whether we verify
      the server's TLS certificate, or a string, in which case it must be a path
      to a CA bundle to use. Defaults to ``True``.
  :param stream: (optional) if ``False``, the response content will be immediately downloaded.
  :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
  :return: :class:`Response <Response>` object
  :rtype: requests.Response

  Usage::

   >>> import requests
   >>> req = requests.request('GET', 'https://httpbin.org/get')
   <Response [200]>
  """

  # By using the 'with' statement we are sure the session is closed, thus we
  # avoid leaving sockets open which can trigger a ResourceWarning in some
  # cases, and look like a memory leak in others.
  with sessions.Session() as session:
    return session.request(method=method, url=url, **kwargs)

这个方法的注释比较多,从注释里其实已经可以看到files参数使用传送文件,但是还是无法知道当需要同时传递参数和文件时该如何处理,继续跟进session.request方法,在requests.session.py中:

def request(self, method, url,
      params=None, data=None, headers=None, cookies=None, files=None,
      auth=None, timeout=None, allow_redirects=True, proxies=None,
      hooks=None, stream=None, verify=None, cert=None, json=None):
    """Constructs a :class:`Request <Request>`, prepares it and sends it.
    Returns :class:`Response <Response>` object.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query
      string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
      object to send in the body of the :class:`Request`.
    :param json: (optional) json to send in the body of the
      :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the
      :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the
      :class:`Request`.
    :param files: (optional) Dictionary of ``'filename': file-like-objects``
      for multipart encoding upload.
    :param auth: (optional) Auth tuple or callable to enable
      Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send
      data before giving up, as a float, or a :ref:`(connect timeout,
      read timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Set to True by default.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol or protocol and
      hostname to the URL of the proxy.
    :param stream: (optional) whether to immediately download the response
      content. Defaults to ``False``.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
      the server's TLS certificate, or a string, in which case it must be a path
      to a CA bundle to use. Defaults to ``True``.
    :param cert: (optional) if String, path to ssl client cert file (.pem).
      If Tuple, ('cert', 'key') pair.
    :rtype: requests.Response
    """
    # Create the Request.
    req = Request(
      method=method.upper(),
      url=url,
      headers=headers,
      files=files,
      data=data or {},
      json=json,
      params=params or {},
      auth=auth,
      cookies=cookies,
      hooks=hooks,
    )
    prep = self.prepare_request(req)

    proxies = proxies or {}

    settings = self.merge_environment_settings(
      prep.url, proxies, stream, verify, cert
    )

    # Send the request.
    send_kwargs = {
      'timeout': timeout,
      'allow_redirects': allow_redirects,
    }
    send_kwargs.update(settings)
    resp = self.send(prep, **send_kwargs)

    return resp

先大概看一下这个方法,先是准备request,最后一步是调用send,推测应该是发送请求了,所以我们需要跟进到prepare_request方法中,在requests.session.py中:

def prepare_request(self, request):
    """Constructs a :class:`PreparedRequest <PreparedRequest>` for
    transmission and returns it. The :class:`PreparedRequest` has settings
    merged from the :class:`Request <Request>` instance and those of the
    :class:`Session`.

    :param request: :class:`Request` instance to prepare with this
      session's settings.
    :rtype: requests.PreparedRequest
    """
    cookies = request.cookies or {}

    # Bootstrap CookieJar.
    if not isinstance(cookies, cookielib.CookieJar):
      cookies = cookiejar_from_dict(cookies)

    # Merge with session cookies
    merged_cookies = merge_cookies(
      merge_cookies(RequestsCookieJar(), self.cookies), cookies)

    # Set environment's basic authentication if not explicitly set.
    auth = request.auth
    if self.trust_env and not auth and not self.auth:
      auth = get_netrc_auth(request.url)

    p = PreparedRequest()
    p.prepare(
      method=request.method.upper(),
      url=request.url,
      files=request.files,
      data=request.data,
      json=request.json,
      headers=merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict),
      params=merge_setting(request.params, self.params),
      auth=merge_setting(auth, self.auth),
      cookies=merged_cookies,
      hooks=merge_hooks(request.hooks, self.hooks),
    )
    return p

在prepare_request中,生成了一个PreparedRequest对象,并调用其prepare方法,跟进到prepare方法中,在requests.models.py中:

def prepare(self,
      method=None, url=None, headers=None, files=None, data=None,
      params=None, auth=None, cookies=None, hooks=None, json=None):
    """Prepares the entire request with the given parameters."""

    self.prepare_method(method)
    self.prepare_url(url, params)
    self.prepare_headers(headers)
    self.prepare_cookies(cookies)
    self.prepare_body(data, files, json)
    self.prepare_auth(auth, url)

    # Note that prepare_auth must be last to enable authentication schemes
    # such as OAuth to work on a fully prepared request.

    # This MUST go after prepare_auth. Authenticators could add a hook
    self.prepare_hooks(hooks)

这里调用许多prepare_xx方法,这里我们只关心处理了data、files、json的方法,跟进到prepare_body中,在requests.models.py中:

def prepare_body(self, data, files, json=None):
    """Prepares the given HTTP body data."""

    # Check if file, fo, generator, iterator.
    # If not, run through normal process.

    # Nottin' on you.
    body = None
    content_type = None

    if not data and json is not None:
      # urllib3 requires a bytes-like body. Python 2's json.dumps
      # provides this natively, but Python 3 gives a Unicode string.
      content_type = 'application/json'
      body = complexjson.dumps(json)
      if not isinstance(body, bytes):
        body = body.encode('utf-8')

    is_stream = all([
      hasattr(data, '__iter__'),
      not isinstance(data, (basestring, list, tuple, Mapping))
    ])

    try:
      length = super_len(data)
    except (TypeError, AttributeError, UnsupportedOperation):
      length = None

    if is_stream:
      body = data

      if getattr(body, 'tell', None) is not None:
        # Record the current file position before reading.
        # This will allow us to rewind a file in the event
        # of a redirect.
        try:
          self._body_position = body.tell()
        except (IOError, OSError):
          # This differentiates from None, allowing us to catch
          # a failed `tell()` later when trying to rewind the body
          self._body_position = object()

      if files:
        raise NotImplementedError('Streamed bodies and files are mutually exclusive.')

      if length:
        self.headers['Content-Length'] = builtin_str(length)
      else:
        self.headers['Transfer-Encoding'] = 'chunked'
    else:
      # Multi-part file uploads.
      if files:
        (body, content_type) = self._encode_files(files, data)
      else:
        if data:
          body = self._encode_params(data)
          if isinstance(data, basestring) or hasattr(data, 'read'):
            content_type = None
          else:
            content_type = 'application/x-www-form-urlencoded'

      self.prepare_content_length(body)

      # Add content-type if it wasn't explicitly provided.
      if content_type and ('content-type' not in self.headers):
        self.headers['Content-Type'] = content_type

    self.body = body

这个函数比较长,需要重点关注L52,这里调用了_encode_files方法,我们跟进这个方法:

def _encode_files(files, data):
    """Build the body for a multipart/form-data request.

    Will successfully encode files when passed as a dict or a list of
    tuples. Order is retained if data is a list of tuples but arbitrary
    if parameters are supplied as a dict.
    The tuples may be 2-tuples (filename, fileobj), 3-tuples (filename, fileobj, contentype)
    or 4-tuples (filename, fileobj, contentype, custom_headers).
    """
    if (not files):
      raise ValueError("Files must be provided.")
    elif isinstance(data, basestring):
      raise ValueError("Data must not be a string.")

    new_fields = []
    fields = to_key_val_list(data or {})
    files = to_key_val_list(files or {})

    for field, val in fields:
      if isinstance(val, basestring) or not hasattr(val, '__iter__'):
        val = [val]
      for v in val:
        if v is not None:
          # Don't call str() on bytestrings: in Py3 it all goes wrong.
          if not isinstance(v, bytes):
            v = str(v)

          new_fields.append(
            (field.decode('utf-8') if isinstance(field, bytes) else field,
             v.encode('utf-8') if isinstance(v, str) else v))

    for (k, v) in files:
      # support for explicit filename
      ft = None
      fh = None
      if isinstance(v, (tuple, list)):
        if len(v) == 2:
          fn, fp = v
        elif len(v) == 3:
          fn, fp, ft = v
        else:
          fn, fp, ft, fh = v
      else:
        fn = guess_filename(v) or k
        fp = v

      if isinstance(fp, (str, bytes, bytearray)):
        fdata = fp
      elif hasattr(fp, 'read'):
        fdata = fp.read()
      elif fp is None:
        continue
      else:
        fdata = fp

      rf = RequestField(name=k, data=fdata, filename=fn, headers=fh)
      rf.make_multipart(content_type=ft)
      new_fields.append(rf)

    body, content_type = encode_multipart_formdata(new_fields)

    return body, content_type

OK,到此为止,仔细阅读完这个段代码,就可以搞明白requests.post方法传入的data、files两个参数的作用了,其实requests在这里把它俩合并在一起了,作为post的body。

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持三水点靠木。

Python 相关文章推荐
Python常用正则表达式符号浅析
Aug 13 Python
python基于multiprocessing的多进程创建方法
Jun 04 Python
python日志记录模块实例及改进
Feb 12 Python
Python找出最小的K个数实例代码
Jan 04 Python
Selenium 模拟浏览器动态加载页面的实现方法
May 16 Python
Django JWT Token RestfulAPI用户认证详解
Jan 23 Python
对python中if语句的真假判断实例详解
Feb 18 Python
基于Python的Post请求数据爬取的方法详解
Jun 14 Python
浅析Python与Mongodb数据库之间的操作方法
Jul 01 Python
pytorch动态网络以及权重共享实例
Jan 06 Python
python爬取豆瓣电影排行榜(requests)的示例代码
Feb 18 Python
python爬虫beautifulsoup库使用操作教程全解(python爬虫基础入门)
Feb 19 Python
python -v 报错问题的解决方法
Sep 15 #Python
基于Python正确读取资源文件
Sep 14 #Python
Django框架安装及项目创建过程解析
Sep 14 #Python
通过代码实例了解Python sys模块
Sep 14 #Python
基于python实现简单C/S模式代码实例
Sep 14 #Python
Elasticsearch py客户端库安装及使用方法解析
Sep 14 #Python
基于python实现简单网页服务器代码实例
Sep 14 #Python
You might like
这东西价格,可以买几台TECSUN S-2000
2021/03/02 无线电
php图片加中文水印实现代码分享
2012/10/31 PHP
php生成图片验证码的方法
2016/04/15 PHP
Zend Framework校验器Zend_Validate用法详解
2016/12/09 PHP
IE浏览器PNG图片透明效果代码
2008/09/02 Javascript
jQuery大于号(&gt;)选择器的作用解释
2015/01/13 Javascript
javascript模拟C#格式化字符串
2015/08/26 Javascript
解决js图片加载时出现404的问题
2020/11/30 Javascript
javascript实现的全国省市县无刷新多级关联菜单效果代码
2016/08/01 Javascript
正则表达式替换html元素属性的方法
2016/11/26 Javascript
利用jquery实现验证输入的是否是数字、小数,包含保留几位小数
2016/12/07 Javascript
JavaScript无操作后屏保功能的实现方法
2017/07/04 Javascript
使用canvas进行图像编辑的实例
2017/08/29 Javascript
Three.js利用Detector.js插件如何实现兼容性检测详解
2017/09/26 Javascript
详解node child_process模块学习笔记
2018/01/24 Javascript
javascript实现文件拖拽事件
2018/03/29 Javascript
js实现中文实时时钟
2020/01/15 Javascript
[01:23:24]DOTA2-DPC中国联赛 正赛 PSG.LGD vs Elephant BO3 第三场 2月7日
2021/03/11 DOTA
布同 统计英文单词的个数的python代码
2011/03/13 Python
简单理解Python中基于生成器的状态机
2015/04/13 Python
在Python的web框架中编写创建日志的程序的教程
2015/04/30 Python
Python实现爬虫从网络上下载文档的实例代码
2018/06/13 Python
把csv文件转化为数组及数组的切片方法
2018/07/04 Python
django+xadmin+djcelery实现后台管理定时任务
2018/08/14 Python
Python+OpenCV感兴趣区域ROI提取方法
2019/01/10 Python
TensorBoard 计算图的查看方式
2020/02/15 Python
使用python爬取抖音app视频的实例代码
2020/12/01 Python
canvas离屏技术与放大镜实现代码示例
2018/08/31 HTML / CSS
个人评价范文分享
2014/01/11 职场文书
企业宣传策划方案
2014/05/29 职场文书
移交协议书
2014/08/19 职场文书
乡党政领导班子群众路线教育实践活动个人对照检查材料
2014/09/20 职场文书
2014年法务工作总结
2014/12/11 职场文书
2015年安全生产月活动总结
2015/03/26 职场文书
学习雷锋主题班会
2015/08/14 职场文书
vue实现Toast组件轻提示
2022/04/10 Vue.js