
Extracting div tags with BeautifulSoup in Python 3: a worked example

Posted in Python on May 28, 2017

Introduction

This article shows how to extract div tags with BeautifulSoup in Python 3. The example parses an HTML string and selects a div by its class name.

Example code:

# -*- coding:utf-8 -*-
# Python 3 (urllib.request is only available in Python 3)
# XiaoDeng
# http://tieba.baidu.com/p/2460150866
# Tag operations


from bs4 import BeautifulSoup
import urllib.request
import re


# If the source is a URL, the page can be read like this:
# html_doc = "http://tieba.baidu.com/p/2460150866"
# req = urllib.request.Request(html_doc)
# webpage = urllib.request.urlopen(req)
# html = webpage.read()



html="""
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" rel="external nofollow" class="sister" id="xiaodeng"><!-- Elsie --></a>,
<a href="http://example.com/lacie" rel="external nofollow" rel="external nofollow" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" rel="external nofollow" class="sister" id="link3">Tillie</a>;
<a href="http://example.com/lacie" rel="external nofollow" rel="external nofollow" class="sister" id="xiaodeng">Lacie</a>
and they lived at the bottom of a well.</p>
<div class="ntopbar_loading"><img src="http://simg.sinajs.cn/blog7style/images/common/loading.gif">加载中…</div>

<div class="SG_connHead">
   <span class="title" comp_title="个人资料">个人资料</span>
   <span class="edit">
      </span>
<div class="info_list">  
         <ul class="info_list1">
     <li><span class="SG_txtc">博客等级：</span><span id="comp_901_grade"><img src="http://simg.sinajs.cn/blog7style/images/common/sg_trans.gif" real_src="http://simg.sinajs.cn/blog7style/images/common/number/9.gif" /></span></li>
     <li><span class="SG_txtc">博客积分：</span><span id="comp_901_score"><strong>0</strong></span></li>
     </ul>
     <ul class="info_list2">
     <li><span class="SG_txtc">博客访问：</span><span id="comp_901_pv"><strong>3,971</strong></span></li>
     <li><span class="SG_txtc">关注人气：</span><span id="comp_901_attention"><strong>0</strong></span></li>
     <li><span class="SG_txtc">获赠金笔：</span><strong id="comp_901_d_goldpen">0支</strong></li>
     <li><span class="SG_txtc">赠出金笔：</span><strong id="comp_901_r_goldpen">0支</strong></li>
     <li class="lisp" id="comp_901_badge"><span class="SG_txtc">荣誉徽章：</span></li>
     </ul>
     </div>
<div class="atcTit_more"><span class="SG_more"><a href="http://blog.sina.com.cn/" rel="external nofollow" rel="external nofollow" target="_blank">更多>></a></span></div>     
<p class="story">...</p>
"""
soup = BeautifulSoup(html, 'html.parser')  # document object



# A div whose class name is xxx and whose text content is hahaha
# (only the class filter is active; the text filter is commented out)
for k in soup.find_all('div', class_='atcTit_more'):  # , string='更多'
    print(k)
    # Prints the matching <div class="atcTit_more"> element, including its nested <span> and <a>
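The comment in the listing above describes matching a div by both class name and text content, but the text filter is left commented out. Below is a minimal sketch of one way to combine the two conditions: filter by class with find_all, then check the tag's text with get_text(). The helper name divs_with_class_and_text and the reduced sample_html are assumptions made for illustration and are not part of the original script.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, reduced from the article's HTML.
sample_html = """
<div class="ntopbar_loading">加载中…</div>
<div class="atcTit_more"><span class="SG_more"><a href="http://blog.sina.com.cn/" target="_blank">更多>></a></span></div>
<div class="atcTit_more"><span>other</span></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")


def divs_with_class_and_text(soup, class_name, text):
    """Return div tags that carry class_name and whose text contains text."""
    return [
        div
        for div in soup.find_all("div", class_=class_name)
        if text in div.get_text()
    ]


# Both conditions: class is atcTit_more AND the text contains 更多.
matches = divs_with_class_and_text(soup, "atcTit_more", "更多")
for div in matches:
    print(div.get_text(strip=True))

# The class filter alone, written as a CSS selector.
selected = soup.select("div.atcTit_more")
```

Checking get_text() is a substring test over all nested text, so it still works when the text sits inside child tags such as the span and a elements here. Passing string='更多' to find_all requires an exact match on the tag's .string instead, which is stricter.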

Summary

That is all for this article. I hope it is useful for your study or work. If you have questions, feel free to leave a comment.


- Author -

Xiao|Deng
