Analyzing web page information with BeautifulSoup in Python

Posted in Python on April 04, 2015

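This article shows, by example, how to analyze the information on a web page with BeautifulSoup in Python. It is shared here for reference. The details are as follows: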

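This Python code finds all the links on a web page, then looks up the span tags and prints the contents of every span whose class is titletext. It targets Python 2 (urllib2) and BeautifulSoup 3.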

# Import the library used to query a website
import urllib2
# Import the Beautiful Soup class used to parse the data returned from the website
from BeautifulSoup import BeautifulSoup

# Specify the URL you want to query
url = "http://www.python.org"

# Query the website and read the returned HTML into the string 'html'
page = urllib2.urlopen(url)
html = page.read()

# Parse the HTML and store it in Beautiful Soup format
soup = BeautifulSoup(html)

# soup.head is the head tag and soup.head.title is the title tag
print soup.head
print soup.head.title

# Print the length of the page; len() needs the HTML string, not the response object
print len(html)

# Find all the links (every 'a' tag) and print them
tags = soup.findAll('a')
print tags

# Find every span with class 'titletext' and print the contents of each one
titles = soup.findAll('span', attrs={'class': 'titletext'})
for title in titles:
    print title.contents
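For readers on Python 3, the same steps can be sketched with the newer bs4 package (installed as beautifulsoup4). This is a minimal sketch under a few assumptions, not the author's code: the SAMPLE_HTML string and the analyze helper are hypothetical names introduced for illustration, and an inline string is parsed so the example runs without network access.

```python
from bs4 import BeautifulSoup  # third-party package: pip install beautifulsoup4

# Hypothetical sample page, used here in place of a live download.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/about">About</a>
    <a href="http://www.python.org/downloads/">Downloads</a>
    <span class="titletext">First title</span>
    <span class="titletext featured">Second title</span>
    <span class="other">Not a title</span>
  </body>
</html>
"""


def analyze(html):
    """Return the page title, every link target, and the text of each titletext span."""
    # "html.parser" is the parser bundled with Python, so no extra dependency is needed.
    soup = BeautifulSoup(html, "html.parser")
    page_title = soup.head.title.string
    # href=True skips anchors that have no href attribute.
    links = [a["href"] for a in soup.find_all("a", href=True)]
    # class_ matches any span whose class list includes "titletext".
    titles = [span.get_text(strip=True)
              for span in soup.find_all("span", class_="titletext")]
    return page_title, links, titles


page_title, links, titles = analyze(SAMPLE_HTML)
print(page_title)
print(links)
print(titles)
```

To run it against a live page, the HTML could be fetched with urllib.request.urlopen(url).read() and passed to analyze. Note that class_ also matches spans that carry additional classes, such as the second span in the sample.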

I hope this article is helpful for your Python programming.

- Author -

令狐不聪
