Python Crawler Example: Fetching Funny Animated GIF Images

Posted in Python on December 24, 2018

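This article shows, by example, how to use a Python crawler to fetch funny animated GIF images. It is shared here for reference; the details are as follows: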

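Sometimes you come across animated images you like, but saving them one by one is tedious, and some sites do not even allow saving with a right click. Fetching them with Python instead makes for a fun little exercise.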

The site crawled here is 居然搞笑网 (zbjuran.com): http://www.zbjuran.com/dongtai/list_4_1.html

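Approach:

Fetch the content of the current page.

Find the URLs of the animated images in the page.

Save the content at each URL to local disk.

To crawl multiple pages, wrap these steps in a loop.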
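
The last step, looping over several pages, can be sketched as a small generator of list-page URLs. This is only a Python 3 sketch: the helper name `page_urls` is hypothetical, and it assumes that the list pages differ only in the trailing number of the address shown above.

```python
# Hypothetical helper: build the list-page URLs for a range of pages.
# Assumption: pages differ only in the trailing number of the URL.
BASE_URL = "http://www.zbjuran.com/dongtai/list_4_%d.html"


def page_urls(first, last):
    """Yield the list-page URL for every page number in [first, last]."""
    for page_no in range(first, last + 1):
        yield BASE_URL % page_no


if __name__ == "__main__":
    for url in page_urls(1, 3):
        print(url)
```

Each URL produced this way can then be passed through the fetch, parse and save steps in turn.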

Code:

#!/usr/bin/python
# coding: utf-8
import os
import sys
import time
import urllib
import urllib2
import uuid

from bs4 import BeautifulSoup

# Python 2 only: make utf-8 the default encoding so that the unicode titles
# returned by BeautifulSoup can be concatenated with byte strings.
reload(sys)
sys.setdefaultencoding('utf-8')


# Fetch the page content; return None if the request fails.
def getHtml(url):
    try:
        print url
        html = urllib2.urlopen(url).read()
    except (urllib2.URLError, IOError) as e:
        print 'Failed to fetch %s: %s' % (url, e)
        return None
    return html


# Collect the URL and title of every animated image on the page.
def getImagUrl(html):
    ImagUrlList = []
    if not html:
        print 'nothing can be found'
        return ImagUrlList
    soup = BeautifulSoup(html, 'lxml')
    main = soup.find('div', {'class': 'main'})
    if main is None:
        return ImagUrlList
    # Every entry on the list page is a div with class "item".
    for item in main.find_all('div', {'class': 'item'}):
        # Advertisement items have no div with class "text"; skip them.
        text = item.find('div', {'class': 'text'})
        if text and text.find('img') and item.find('h3'):
            ImagUrlList.append({
                'url': text.find('img').get('src'),  # image address
                'name': item.find('h3').text,        # image title
            })
    return ImagUrlList


# Download one image to local disk.
def download(author, imgurl, typename, pageNo):
    # The folder is named after today's date, e.g. 2017-3-16.
    x = time.localtime()
    foldername = '%d-%d-%d' % (x.tm_year, x.tm_mon, x.tm_mday)
    picpath = 'Jimy/%s/%s/%s' % (foldername, typename, str(pageNo))
    # The UUID suffix keeps images with the same title from overwriting each other.
    filename = author + str(uuid.uuid1())
    pic_type = imgurl[-3:]  # file extension, e.g. "gif"
    if not os.path.exists(picpath):
        os.makedirs(picpath)
    target = picpath + "/%s.%s" % (filename, pic_type)
    print "GIF saved to: " + target
    download_img = urllib.urlretrieve(imgurl, target)  # save the image to the target path
    print "Image source: " + imgurl
    return download_img


# Exit the program.
def myquit():
    print "Bye Bye!"
    sys.exit(0)


# Crawl one list page and download every animated image on it.
def start(pageNo):
    targeturl = "http://www.zbjuran.com/dongtai/list_4_%s.html" % str(pageNo)
    html = getHtml(targeturl)
    urllist = getImagUrl(html)
    for imgurl in urllist:
        download(imgurl['name'], imgurl['url'], 'funny_gifs', pageNo)


# Keep asking until the user enters a number in [1, upper]; Q quits.
def askPageNo(prompt, upper):
    pageNo = raw_input(prompt)
    while not pageNo.isdigit() or int(pageNo) > upper or int(pageNo) < 1:
        if pageNo == 'Q':
            myquit()
        print "Param is invalid , please try again."
        pageNo = raw_input("Input the page number you want to crawl >")
    return pageNo


if __name__ == '__main__':
    print '''
                        *****************************************
                        **    Welcome to Spider of GIF         **
                        **      Created on 2017-3-16           **
                        **      @author: Jimy                  **
                        *****************************************'''
    pageNo = askPageNo("Input the page number you want to crawl (1-50), or Q to quit\n>", 50)
    print pageNo
    start(pageNo)
    # The first crawl is done; now ask how many pages to crawl in total.
    pageNo = askPageNo("Input the total number of pages to crawl (1-5000), or Q to quit\n>", 5000)
    # Loop over pages 1..pageNo.
    for num in xrange(int(pageNo)):
        start(str(num + 1))
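
To make the extraction step in `getImagUrl` easier to follow, here is a self-contained Python 3 sketch of the same filtering idea: keep only the items that contain a `div` with class `text` and an image inside it, and skip the rest as advertisements. To stay runnable without third-party packages it swaps BeautifulSoup for the standard library's `html.parser`, so it is not the parser the script uses. The names `GifItemParser`, `extract_gifs` and the sample HTML are hypothetical; the real page markup is only assumed to match the selectors used in the listing.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on the selectors used in the listing.
SAMPLE_HTML = """
<div class="main">
  <div class="item"><h3><a href="/a.html">First gif</a></h3>
    <div class="text"><img src="http://example.com/a.gif"></div></div>
  <div class="item"><h3>Sponsored</h3><div class="ad">buy now</div></div>
  <div class="item"><h3>Second gif</h3>
    <div class="text"><p><img src="http://example.com/b.gif"/></p></div></div>
</div>
"""


class GifItemParser(HTMLParser):
    """Collect {'name', 'url'} for each div.item that has an img in a div.text."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._stack = []     # kind of every open <div>: "item", "text" or ""
        self._item = None    # entry being filled while inside a div.item
        self._in_h3 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            kind = "item" if "item" in classes else "text" if "text" in classes else ""
            self._stack.append(kind)
            if kind == "item" and self._item is None:
                self._item = {"name": "", "url": None}
        elif self._item is not None:
            if tag == "h3":
                self._in_h3 = True
            elif tag == "img" and "text" in self._stack and self._item["url"] is None:
                self._item["url"] = attrs.get("src")

    def handle_data(self, data):
        if self._in_h3 and self._item is not None:
            self._item["name"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
        elif tag == "div" and self._stack:
            kind = self._stack.pop()
            if kind == "item" and self._item is not None and "item" not in self._stack:
                # Items without an image in div.text are treated as ads and dropped.
                if self._item["url"]:
                    self.results.append(self._item)
                self._item = None


def extract_gifs(html):
    """Return a list of {'name', 'url'} dicts found in the given HTML."""
    parser = GifItemParser()
    parser.feed(html or "")
    return parser.results


if __name__ == "__main__":
    print(extract_gifs(SAMPLE_HTML))
```

With BeautifulSoup the same filter is much shorter, which is presumably why the script uses it; the sketch only spells out what the `find` calls are doing.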

The output looks like this:

                        *****************************************
                        **    Welcome to Spider of GIF         **
                        **      Created on 2017-3-16           **
                        **      @author: Jimy                  **
                        *****************************************
Input the page number you want to crawl (1-50), or Q to quit
>1
1
http://www.zbjuran.com/dongtai/list_4_1.html
GIF saved to: Jimy/2017-3-16/funny_gifs/1/真是艰难的选择。3f0fe8f6-09f8-11e7-9161-f8bc12753d1e.gif
Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F206135ZHJ.gif
GIF saved to: Jimy/2017-3-16/funny_gifs/1/这么贱会被打死吧……3fa9da88-09f8-11e7-9161-f8bc12753d1e.gif
Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F206135H35U.gif
GIF saved to: Jimy/2017-3-16/funny_gifs/1/一看就是印度……4064e60c-09f8-11e7-9161-f8bc12753d1e.gif
Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F20613543c50.gif
GIF saved to: Jimy/2017-3-16/funny_gifs/1/新垣结衣的正经工作脸414b4f52-09f8-11e7-9161-f8bc12753d1e.gif
Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F206135250553.gif
GIF saved to: Jimy/2017-3-16/funny_gifs/1/妹子这是在摇什么的421afa86-09f8-11e7-9161-f8bc12753d1e.gif
Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F20613493N03.gif
Input the total number of pages to crawl (1-5000), or Q to quit
>Q
Bye Bye!
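
The input handling seen in this session (a page number within a range, or Q to quit) can be separated from the prompting so that it is easy to test on its own. The following is a minimal Python 3 sketch; `parse_page_input` and its return convention are hypothetical and not part of the original script, and unlike the script it also accepts a lowercase q.

```python
def parse_page_input(text, upper):
    """Classify one line of user input.

    Returns ("quit", None) for Q or q, ("page", n) for a number in
    [1, upper], and ("invalid", None) for anything else.
    """
    text = text.strip()
    if text.upper() == "Q":
        return ("quit", None)
    # isdecimal() only accepts characters that int() can also convert.
    if text.isdecimal() and 1 <= int(text) <= upper:
        return ("page", int(text))
    return ("invalid", None)


if __name__ == "__main__":
    for sample in ["1", "51", "abc", "Q"]:
        print(sample, parse_page_input(sample, 50))
```

The calling loop would then only need to repeat the prompt while the result is "invalid".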

In the end, the animated images are saved to local disk.
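
The saved file names are built from the image title, a UUID and the last three characters of the image URL. That works for the `.gif` addresses shown in the output, but it would misread an extension such as `.jpeg` or a URL with a query string, and a title containing a slash would change the target path. A hedged Python 3 sketch of a more defensive variant follows; the helper names `file_extension`, `safe_name` and `target_filename` are hypothetical, and the set of replaced characters is an assumption rather than a complete rule for every file system.

```python
import os
import re
import uuid
from urllib.parse import urlparse


def file_extension(imgurl, default="gif"):
    """Return the extension of the URL path, lowercased and without the dot."""
    ext = os.path.splitext(urlparse(imgurl).path)[1]
    return ext[1:].lower() if ext else default


def safe_name(title, max_len=50):
    """Replace characters that are unsafe in file names and trim the length."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return cleaned[:max_len] or "untitled"


def target_filename(title, imgurl):
    """Combine the cleaned title, a UUID and the extension, as the script does."""
    return "%s-%s.%s" % (safe_name(title), uuid.uuid1(), file_extension(imgurl))


if __name__ == "__main__":
    print(target_filename("a/b: c?", "http://example.com/pics/demo.GIF?size=large"))
```

Parsing the URL first means a query string never ends up in the extension, and falling back to a default keeps the download from failing on addresses without one.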

I hope this article is helpful for your Python programming.

- Author -

枫奇
