编程 Python

Python基于requests库爬取网站信息

Posted in Python onMarch 02, 2020

requests库是一个简介且简单的处理HTTP请求的第三方库

get()是获取网页最常用的方式，其基本使用方式如下

使用requests库获取HTML页面并将其转换成字符串后，需要进一步解析HTML页面格式，这里我们常用的就是beautifulsoup4库，用于解析和处理HTML和XML

下面这段代码便是爬取百度的信息并简单输出百度的界面信息

import requests
from bs4 import BeautifulSoup
r=requests.get('http://www.baidu.com')
r.encoding=None
result=r.text
bs=BeautifulSoup(result,'html.parser')
print(bs.title)
print(bs.title.text)

import requests
from bs4 import BeautifulSoup

#用来解决乱码现象，所以编写爬取信息的代码最好带上（输出出现乱码或者UnicodeEncodeError：'gbk'codec can't encode character） 
import io   
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='gb18030')

#用来防止反爬取，可以了解一下
headers={"User-Agent" : "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1.6)",
"Accept" : "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",

"Accept-Language" : "en-us",

"Connection" : "keep-alive",

"Accept-Charset" : "GB2312,utf-8;q=0.7,*;q=0.7"
}

#获取51job网站的基本信息
r=requests.get('https://search.51job.com/list/000000,000000,0000,00,9,99,python,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99°reefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=')
r.encoding=r.apparent_encoding
result=r.text
bs=BeautifulSoup(result,'html.parser')
print(bs.prettify())

u1=bs.find_all('u1',attrs={'class':'item_con_list'})  #这部分代码便是我们爬取的目标，51job网站上关于python职业的薪资
print(len(u1))
li=bs.find_all('span',attrs={'class':'t4'})
for l in li:
  print(l.text)

上面这段代码便是爬取51job网站上的与python相关职业的薪资

Python基于requests库爬取网站信息

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持三水点靠木。

Python基于requests库爬取网站信息

- Author -

江武555

声明：登载此文出于传递更多信息之目的，并不意味着赞同其观点或证实其描述。

Python 相关文章推荐

Python线程创建和终止实例代码

Jan 20 Python

Python将文本去空格并保存到txt文件中的实例

Jul 24 Python

OpenCV2从摄像头获取帧并写入视频文件的方法

Aug 03 Python

使用python实现http及ftp服务进行数据传输的方法

Oct 26 Python

用python爬取租房网站信息的代码

Dec 14 Python

强悍的Python读取大文件的解决方案

Feb 16 Python

Python操作MySQL数据库的两种方式实例分析【pymysql和pandas】

Mar 18 Python

python GUI库图形界面开发之PyQt5窗口类QMainWindow详细使用方法

Feb 26 Python

Python tkinter实现简单加法计算器代码实例

May 13 Python

Python 读取位于包中的数据文件

Aug 07 Python

class类在python中获取金融数据的实例方法

Dec 10 Python

Python pandas之求和运算和非空值个数统计

Aug 07 Python

使用python3 实现插入数据到mysql

Mar 02 #Python

python数字类型math库原理解析

Mar 02 #Python

Python如何实现在字符串里嵌入双引号或者单引号

Mar 02 #Python

Python random库使用方法及异常处理方案

Mar 02 #Python

python 实现人和电脑猜拳的示例代码

Mar 02 #Python

解决python3插入mysql时内容带有引号的问题

Mar 02 #Python

python统计字符串中字母出现次数代码实例

Mar 02 #Python