Using pyquery, a jQuery-like Library for Python

Posted in Python on December 02, 2019

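This article walks through how to use pyquery, a jQuery-like library for Python, with examples for reference.

pyquery: a jQuery-like library for Python

pyquery lets you run jQuery-style queries on XML documents, and its API is kept as close to jQuery's as possible. It uses lxml for fast XML and HTML manipulation.

This is not (or at least not yet) a library for generating JavaScript code or interacting with it. The author simply liked the jQuery API and implemented it in Python.

The project is hosted on GitHub and is under active development. The author can grant push access to anyone who wants to contribute and will review their changes. If you want to contribute, email the project author.

Bugs can be reported through the GitHub issue tracker.

Quick start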

You can use the PyQuery class to load an XML document from a string, an lxml document, a file or a URL:

>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> from urllib.request import urlopen
>>> d = pq("<html></html>")
>>> d = pq(etree.fromstring("<html></html>"))
>>> d = pq(url=your_url)
>>> d = pq(url=your_url,
...    opener=lambda url, **kw: urlopen(url).read())
>>> d = pq(filename=path_to_html_file)

Now d works like $ in jQuery:

>>> d("#hello")
[<p#hello.hello>]
>>> p = d("#hello")
>>> print(p.html())
Hello world !
>>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
[<p#hello.hello>]
>>> print(p.html())
you know <a href="http://python.org/">Python</a> rocks
>>> print(p.text())
you know Python rocks

You can also use some pseudo-classes that are available in jQuery but are not part of the CSS standard, such as :first, :last, :even, :odd, :eq, :lt, :gt, :checked, :selected and :file:

>>> d('p:first')
[<p#hello.hello>]

See http://pyquery.rtfd.org/ for the full documentation.

CSS

You can add, toggle and remove CSS classes like this:

>>> p.addClass("toto")
[<p#hello.hello.toto>]
>>> p.toggleClass("titi toto")
[<p#hello.hello.titi>]
>>> p.removeClass("titi")
[<p#hello.hello>]

Or manipulate CSS styles:

>>> p.css("font-size", "15px")
[<p#hello.hello>]
>>> p.attr("style")
'font-size: 15px'
>>> p.css({"font-size": "17px"})
[<p#hello.hello>]
>>> p.attr("style")
'font-size: 17px'

The same can be done in a more Pythonic way ('_' characters are converted to '-'):

>>> p.css.font_size = "16px"
>>> p.attr.style
'font-size: 16px'
>>> p.css['font-size'] = "15px"
>>> p.attr.style
'font-size: 15px'
>>> p.css(font_size="16px")
[<p#hello.hello>]
>>> p.attr.style
'font-size: 16px'
>>> p.css = {"font-size": "17px"}
>>> p.attr.style
'font-size: 17px'

Using pseudo-classes:

:button

Matches all button input elements and the button element

:checkbox

Matches all checkbox input elements

:checked

Matches all elements that are checked

:child

right is an immediate child of left

:contains()

Matches all elements that contain the given text

:descendant

right is a child, grand-child or further descendant of left

:disabled

Matches all elements that are disabled

:empty

Matches all elements that do not contain other elements

:enabled

Matches all elements that are enabled

:eq()

Matches a single element by its index

:even

Matches even elements, zero-indexed

:file

Matches all input elements of type file

:first

Matches the first selected element

:gt()

Matches all elements with an index over the given one

:header

Matches all header elements (h1, ..., h6)

:image

Matches all image input elements

:input

Matches all input elements

:last

Matches the last selected element

:lt()

Matches all elements with an index below the given one

:odd

Matches odd elements, zero-indexed

:parent

Matches all elements that contain other elements

:password

Matches all password input elements

:radio

Matches all radio input elements

:reset

Matches all reset input elements

:selected

Matches all elements that are selected

:submit

Matches all submit input elements

:text

Matches all text input elements

Manipulating

You can also append content to the end of a tag:

>>> d = pq('<p class="hello" id="hello">you know Python rocks</p>')
>>> d('p').append(' check out <a href="http://reddit.com/r/python"><span>reddit</span></a>')
[<p#hello.hello>]
>>> print(d)
<p class="hello" id="hello">you know Python rocks check out <a href="http://reddit.com/r/python"><span>reddit</span></a></p>

Or prepend it to the beginning:

>>> p = d('p')
>>> p.prepend('check out <a href="http://reddit.com/r/python">reddit</a>')
[<p#hello.hello>]
>>> print(p.html())
check out <a href="http://reddit.com/r/python">reddit</a>you know ...

Prepend or append an element into another one:

>>> d = pq('<html><body><div id="test"><a href="http://python.org">python</a> !</div></body></html>')
>>> p.prependTo(d('#test'))
[<p#hello.hello>]
>>> print(d('#test').html())
<p class="hello" ...

Insert an element after another one:

>>> p.insertAfter(d('#test'))
[<p#hello.hello>]
>>> print(d('#test').html())
<a href="http://python.org">python</a> !

Or insert it before another one:

>>> p.insertBefore(d('#test'))
[<p#hello.hello>]
>>> print(d('body').html())
<p class="hello" id="hello">...

Do something for each element:

>>> p.each(lambda i, e: pq(e).addClass('hello2'))
[<p#hello.hello.hello2>]

Remove an element:

>>> d = pq('<html><body><p id="id">Yeah!</p><p>python rocks !</p></body></html>')
>>> d.remove('p#id')
[<html>]
>>> d('p#id')
[]

Remove the content of the selected elements:

>>> d('p').empty()
[<p>]

You can get the modified HTML back:

>>> print(d)
<html><body><p/></body></html>

You can generate HTML fragments:

>>> from pyquery import PyQuery as pq
>>> print(pq('<div>Yeah !</div>').addClass('myclass') + pq('<b>cool</b>'))
<div class="myclass">Yeah !</div><b>cool</b>

Remove all namespaces:

>>> d = pq('<foo xmlns="http://example.com/foo"></foo>')
>>> d
[<{http://example.com/foo}foo>]
>>> d.remove_namespaces()
[<foo>]

Traversing

Some jQuery traversal methods are supported as well. Here are a few examples.

You can filter the selection list using a string selector:

>>> d = pq('<p id="hello" class="hello"><a/></p><p id="test"><a/></p>')
>>> d('p').filter('.hello')
[<p#hello.hello>]

You can select a single element with eq:

>>> d('p').eq(0)
[<p#hello.hello>]

You can find nested elements:

>>> d('p').find('a')
[<a>, <a>]
>>> d('p').eq(1).find('a')
[<a>]

Breaking out of a level of traversal with end is also supported:

>>> d('p').find('a').end()
[<p#hello.hello>, <p#test>]
>>> d('p').eq(0).end()
[<p#hello.hello>, <p#test>]
>>> d('p').filter(lambda i: i == 1).end()
[<p#hello.hello>, <p#test>]

Scraping

pyquery can also load an HTML document from a URL:

>>> pq(your_url)
[<html>]

By default it uses Python's urllib.

If requests is installed, it is used instead, and you can pass most requests parameters:

>>> pq(your_url, headers={'user-agent': 'pyquery'})
[<html>]
>>> pq(your_url, {'q': 'foo'}, method='post', verify=True)
[<html>]

pyquery: for the complete PyQuery API, see http://pyquery.readthedocs.org/en/latest/api.html

pyquery.ajax: PyQuery AJAX extension

If WebOb is installed (it is not a pyquery dependency), you can query WSGI apps. In this example the test app returns a simple input at / and a submit button at /submit:

>>> d = pq('<form></form>', app=input_app)
>>> d.append(d.get('/'))
[<form>]
>>> print(d)
<form><input name="youyou" type="text" value=""/></form>

The app is also available in new nodes:

>>> d.get('/').app is d.app is d('form').app
True

You can also request another path:

>>> d.append(d.get('/submit'))
[<form>]
>>> print(d)
<form><input name="youyou" type="text" value=""/><input type="submit" value="OK"/></form>

If restkit is installed, you can fetch a URL directly through a HostProxy app:

>>> a = d.get(your_url)
>>> a
[<html>]

You can get the app's response:

>>> print(a.response.status)
200 OK

Tips

You can make links absolute, which can be useful for screen scraping:

>>> d = pq(url=your_url, parser='html')
>>> d('form').attr('action')
'/form-submit'
>>> d.make_links_absolute()
[<html>]

Using different parsers

By default, pyquery uses the lxml XML parser and, if that does not work, falls back to the HTML parser from lxml.html. The XML parser can sometimes be problematic when parsing XHTML pages, because it does not raise an error but returns an unusable tree (on w3c.org, for example).

You can also choose explicitly which parser to use:

>>> pq('<html><body><p>toto</p></body></html>', parser='xml')
[<html>]
>>> pq('<html><body><p>toto</p></body></html>', parser='html')
[<html>]
>>> pq('<html><body><p>toto</p></body></html>', parser='html_fragments')
[<p>]

The html and html_fragments parsers both come from lxml.html.

Hopefully this article is helpful for your Python programming.

- Author -

在线疯狂
