Parsing XML Documents with Namespaces in Python

Posted in Python on August 10, 2020

Problem

You want to parse an XML document that uses XML namespaces.

Solution

Consider the following document, which uses a namespace:

<?xml version="1.0" encoding="utf-8"?>
<top>
  <author>David Beazley</author>
  <content>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Hello World</title>
      </head>
      <body>
        <h1>Hello World!</h1>
      </body>
    </html>
  </content>
</top>

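If you parse this document and try the usual queries, you will find that the namespace makes things awkward: every step inside the namespaced part of the tree has to be written out in full.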

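>>> from xml.etree.ElementTree import parse
>>> doc = parse('ns2.xml')  # the document above, saved as ns2.xml
>>> # Some queries that work
>>> doc.findtext('author')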
'David Beazley'
>>> doc.find('content')
<Element 'content' at 0x100776ec0>
>>> # A query involving a namespace (doesn't work)
>>> doc.find('content/html')
>>> # Works if fully qualified
>>> doc.find('content/{http://www.w3.org/1999/xhtml}html')
<Element '{http://www.w3.org/1999/xhtml}html' at 0x1007767e0>
>>> # Doesn't work
>>> doc.findtext('content/{http://www.w3.org/1999/xhtml}html/head/title')
>>> # Fully qualified
>>> doc.findtext('content/{http://www.w3.org/1999/xhtml}html/'
... '{http://www.w3.org/1999/xhtml}head/{http://www.w3.org/1999/xhtml}title')
'Hello World'
>>>
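
To try the queries above without creating a file, here is a minimal self-contained sketch that parses the same document from a string with `xml.etree.ElementTree.fromstring()` instead of reading `ns2.xml`. The names `XML` and `XHTML` are chosen only for this example, and the XML declaration is left out of the string for simplicity.

```python
import xml.etree.ElementTree as ET

# The sample document from above, held in a string (XML declaration omitted).
XML = """<top>
  <author>David Beazley</author>
  <content>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Hello World</title>
      </head>
      <body>
        <h1>Hello World!</h1>
      </body>
    </html>
  </content>
</top>"""

# ElementTree spells namespaced tags as "{uri}localname".
XHTML = "{http://www.w3.org/1999/xhtml}"

doc = ET.fromstring(XML)  # the root <top> Element

# Unqualified paths only match elements that are in no namespace.
author = doc.findtext("author")
missing = doc.find("content/html")  # None: <html> is in the XHTML namespace

# Every step inside the namespace must carry the {uri} prefix.
title = doc.findtext(f"content/{XHTML}html/{XHTML}head/{XHTML}title")

print(author)   # David Beazley
print(missing)  # None
print(title)    # Hello World
```

`fromstring()` returns the root element rather than an `ElementTree`, but both support `find()` and `findtext()` with paths relative to the root, so the queries read the same.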

You can simplify this by wrapping the namespace handling in a small utility class:

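class XMLNamespaces:
    def __init__(self, **kwargs):
        # Map each short prefix to its expanded "{uri}" form.
        self.namespaces = {}
        for name, uri in kwargs.items():
            self.register(name, uri)

    def register(self, name, uri):
        # ElementTree writes qualified tags as "{uri}tag".
        self.namespaces[name] = '{' + uri + '}'

    def __call__(self, path):
        # Replace every "{prefix}" placeholder in the path with "{uri}".
        return path.format_map(self.namespaces)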

Use the class like this:

>>> ns = XMLNamespaces(html='http://www.w3.org/1999/xhtml')
>>> doc.find(ns('content/{html}html'))
<Element '{http://www.w3.org/1999/xhtml}html' at 0x1007767e0>
>>> doc.findtext(ns('content/{html}html/{html}head/{html}title'))
'Hello World'
>>>
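
For comparison, ElementTree's `find()`, `findall()` and `findtext()` also accept an optional `namespaces` argument that maps prefixes to URIs, which covers much of what `XMLNamespaces` does. The sketch below assumes the same document (trimmed slightly) held in a string; the prefix `h` is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

XML = """<top>
  <content>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>Hello World</title></head>
      <body><h1>Hello World!</h1></body>
    </html>
  </content>
</top>"""

# Prefix -> URI mapping; the prefix name "h" is arbitrary.
NS = {"h": "http://www.w3.org/1999/xhtml"}

doc = ET.fromstring(XML)

# "h:title" is expanded to "{http://www.w3.org/1999/xhtml}title" during the search.
title = doc.findtext("content/h:html/h:head/h:title", namespaces=NS)
heading = doc.find("content/h:html/h:body/h:h1", namespaces=NS)

print(title)         # Hello World
print(heading.text)  # Hello World!
```

The prefixes in the mapping only need to be consistent with the query strings; they do not have to match any prefix used in the document itself.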

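Discussion

Parsing XML documents that use namespaces can be tedious. The XMLNamespaces class above only tidies things up slightly, by letting you write short prefixes in place of full URIs.

Unfortunately, basic ElementTree parsing gives you no way to get at the namespace declarations themselves (which prefix maps to which URI). If you use the iterparse() function instead, however, you can get more information about where each namespace comes into and goes out of scope. For example: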

>>> from xml.etree.ElementTree import iterparse
>>> for evt, elem in iterparse('ns2.xml', ('end', 'start-ns', 'end-ns')):
...     print(evt, elem)
...
end <Element 'author' at 0x10110de10>
start-ns ('', 'http://www.w3.org/1999/xhtml')
end <Element '{http://www.w3.org/1999/xhtml}title' at 0x1011131b0>
end <Element '{http://www.w3.org/1999/xhtml}head' at 0x1011130a8>
end <Element '{http://www.w3.org/1999/xhtml}h1' at 0x101113310>
end <Element '{http://www.w3.org/1999/xhtml}body' at 0x101113260>
end <Element '{http://www.w3.org/1999/xhtml}html' at 0x10110df70>
end-ns None
end <Element 'content' at 0x10110de68>
end <Element 'top' at 0x10110dd60>
>>> elem # This is the topmost element
<Element 'top' at 0x10110dd60>
>>>
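
Building on the iterparse() approach above, here is a minimal sketch that records each `start-ns` event in a prefix-to-URI dictionary. It reads from an in-memory `io.BytesIO` buffer rather than `ns2.xml`, and the function name `collect_namespaces` is hypothetical, not part of ElementTree.

```python
import io
from xml.etree.ElementTree import iterparse

XML = b"""<top>
  <author>David Beazley</author>
  <content>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>Hello World</title></head>
    </html>
  </content>
</top>"""

def collect_namespaces(source):
    """Return (root, namespaces), mapping each declared prefix to its URI.

    The default namespace is recorded under the empty-string prefix ''.
    If the same prefix is declared more than once, the last declaration wins.
    """
    namespaces = {}
    root = None
    for event, item in iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = item       # start-ns yields a (prefix, uri) tuple
            namespaces[prefix] = uri
        elif root is None:
            root = item              # the first "start" element is the document root
    return root, namespaces

root, ns = collect_namespaces(io.BytesIO(XML))
print(root.tag)  # top
print(ns)        # {'': 'http://www.w3.org/1999/xhtml'}
```

Because a default namespace is reported under the prefix `''`, which is not a valid keyword argument, you would have to pick a name yourself to reuse it with the class above, for example `XMLNamespaces(html=ns[''])`.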

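Finally, if the XML you are processing uses namespaces together with other advanced XML features, you are better off using the lxml library instead of ElementTree. For example, lxml offers better support for validating documents against a DTD, fuller XPath support, and other advanced XML features. This recipe really only shows how to make namespace-heavy parsing a little easier.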


Author: David Beazley

