
Tutorial: a Python crawler that grabs the main, color, and detail images from a Taobao/Tmall item page in one step

Posted in Python on May 22, 2018

The example is as follows:

import json
import os
import pprint
import re
import threading

import requests


class spider:
    def __init__(self, sid, name):
        self.id = sid      # Tmall item ID
        self.name = name   # output directory
        self.newdata = {}  # filled in by matchs()
        self.headers = {
            "Accept": "text/html,application/xhtml+xml,application/xml;",
            "Accept-Encoding": "gzip",
            "Accept-Language": "zh-CN,zh;q=0.8",
            "Referer": "http://www.example.com/",
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36",
        }

    def openurl(self, url):
        # Return the page text, or None if the server did not answer with success.
        self.request = requests.get(url, headers=self.headers, timeout=10)
        if self.request.ok:
            return self.request.text
        return None

    def matchs(self):
        tmall_exp = r"Setup\(([\s\S]+?)\);"  # regex matching the item data
        detail = r"src=\"(https://img\S+?[jpgifn]+?)\""  # regex matching the detail images
        html = self.openurl("https://detail.tmall.com/item.htm?id=%s" % self.id)
        data = re.findall(tmall_exp, html or "")
        if not data:
            raise ValueError("item data not found in the page for id %s" % self.id)
        data = json.loads(data[0])
        main_img = data['propertyPics']  # addresses of the main images and the color images
        # color info list: color code, color name, and item SKU ID
        color_data = data['valItemInfo']['skuList']
        detail_html = self.openurl("http:" + data['api']["httpsDescUrl"])
        detail_image = re.findall(detail, detail_html or "")
        self.newdata = {"MAIN": main_img['default'], "DETAIL": detail_image,
                        "id": self.id, "COLOR": []}

        psvs = []  # color codes already handled
        for sku in color_data:
            # drop everything up to the first ";" in the pvs string, if there is one
            if ";" in sku["pvs"]:
                psv = sku["pvs"][sku["pvs"].find(";") + 1:]
            else:
                psv = sku["pvs"]
            if psv in psvs:
                continue
            psvs.append(psv)
            self.newdata['COLOR'].append({sku["names"]: main_img[";" + psv + ";"]})

        pprint.pprint(self.newdata)
        return self.newdata

    def download(self):
        # Start one thread per image.
        if len(self.newdata) > 0:
            for x, url in enumerate(self.newdata['MAIN']):
                threading.Thread(target=self.download_main, args=(url, x)).start()
            for x in self.newdata['COLOR']:
                threading.Thread(target=self.download_color, args=(x,)).start()
            for x, url in enumerate(self.newdata['DETAIL']):
                threading.Thread(target=self.download_detail, args=(url, x)).start()

    def _save(self, url, subdir, filename):
        # Download one image and write it to <name>/<subdir>/<filename>.jpg.
        try:
            img = requests.get(url, headers=self.headers, timeout=10)
        except requests.RequestException as exc:
            print(exc)
            return
        if img.ok:
            folder = os.path.join(self.name, subdir)
            os.makedirs(folder, exist_ok=True)  # safe when several threads get here at once
            with open(os.path.join(folder, "%s.jpg" % filename), "wb") as imgs:
                imgs.write(img.content)

    def download_main(self, url, index):
        self._save("http:" + url, "main", index)

    def download_color(self, url):
        # url is a one-item dict: {color name: [image address, ...]}
        color = list(url.keys())[0]
        # neither kind of slash may appear in the file name
        filename = color.replace("/", "_").replace("\\", "_")
        self._save("http:" + url[color][0], "color", filename)

    def download_detail(self, url, index):
        self._save(url, "detail", index)


if __name__ == "__main__":
    sid = 528766269341  # enter the Tmall item ID here
    taobao = spider(sid, "下载图片/T")
    taobao.matchs()
    taobao.download()
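
To see what `matchs` does without touching the network, the sketch below runs the same `Setup(...)` regex and the same `pvs` de-duplication over a small made-up page fragment. The sample HTML, its field values, and the helper names `extract_setup_json` and `unique_colors` are assumptions for illustration only; the real Tmall page layout may differ and may have changed since this was posted.

```python
import json
import re

SETUP_RE = re.compile(r"Setup\(([\s\S]+?)\);")  # same pattern as tmall_exp above

# Made-up page fragment shaped like the fields the script reads.
SAMPLE_HTML = '''
<script>Setup({"propertyPics": {"default": ["//img.example.com/m1.jpg"],
  ";1627207:28320;": ["//img.example.com/white.jpg"],
  ";1627207:28341;": ["//img.example.com/black.jpg"]},
 "valItemInfo": {"skuList": [
  {"names": "S White", "pvs": "20509:28314;1627207:28320", "skuId": "1"},
  {"names": "M White", "pvs": "20509:28315;1627207:28320", "skuId": "2"},
  {"names": "S Black", "pvs": "20509:28314;1627207:28341", "skuId": "3"}]}}
);</script>
'''


def extract_setup_json(html):
    """Return the dict passed to Setup(...), or None if it is not found."""
    match = SETUP_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def unique_colors(sku_list, property_pics):
    """Map each distinct color key to (first SKU name, image list)."""
    seen = {}
    for sku in sku_list:
        pvs = sku["pvs"]
        # Drop everything up to the first ";", as the script does.
        key = pvs[pvs.find(";") + 1:] if ";" in pvs else pvs
        if key not in seen:
            seen[key] = (sku["names"], property_pics[";" + key + ";"])
    return seen


data = extract_setup_json(SAMPLE_HTML)
colors = unique_colors(data["valItemInfo"]["skuList"], data["propertyPics"])
for key, (name, pics) in colors.items():
    print(key, name, pics[0])
```

Three SKUs collapse to two colors here because the two "White" entries share the same text after the first `;`. Note that the non-greedy pattern stops at the first `);` it meets, so it would cut the JSON short if a string value inside it happened to contain `);`.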

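That is everything for this tutorial on using a Python crawler to grab the main, color, and detail images from a Taobao/Tmall item page in one step. I hope it serves as a useful reference, and thank you for supporting 三水点靠木.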
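
A note for readers adapting the script: `download` starts one thread per image and never waits for them, so a page with many detail images opens that many connections at once. One possible alternative is a bounded pool from the standard library's `concurrent.futures`. The sketch below is a variant under stated assumptions, not the original author's method: `safe_name`, `plan_jobs`, `run_jobs`, and the `fetch` callable are hypothetical names, and `fetch` is stubbed out so the example runs offline.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def safe_name(name):
    """Replace path separators so a color name can be used as a file name."""
    return name.replace("/", "_").replace("\\", "_")


def plan_jobs(newdata, root):
    """Turn the dict built by matchs() into (url, path) pairs."""
    jobs = []
    for i, url in enumerate(newdata["MAIN"]):
        jobs.append(("http:" + url, os.path.join(root, "main", "%s.jpg" % i)))
    for item in newdata["COLOR"]:
        for color, pics in item.items():
            jobs.append(("http:" + pics[0],
                         os.path.join(root, "color", safe_name(color) + ".jpg")))
    for i, url in enumerate(newdata["DETAIL"]):
        jobs.append((url, os.path.join(root, "detail", "%s.jpg" % i)))
    return jobs


def run_jobs(jobs, fetch, workers=4):
    """Fetch every job with at most `workers` threads; return the paths written."""
    def one(job):
        url, path = job
        body = fetch(url)  # fetch returns bytes, or None on failure
        if body is None:
            return None
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(body)
        return path

    # Leaving the with-block waits for all downloads to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [p for p in pool.map(one, jobs) if p is not None]


# Made-up data in the shape matchs() returns.
sample = {
    "MAIN": ["//img.example.com/m1.jpg"],
    "COLOR": [{"Black/White": ["//img.example.com/bw.jpg"]}],
    "DETAIL": ["https://img.example.com/d1.jpg"],
}


def fake_fetch(url):
    # Stand-in for a real HTTP call such as requests.get(url, timeout=10).content
    return url.encode("utf-8")


with tempfile.TemporaryDirectory() as root:
    written = run_jobs(plan_jobs(sample, root), fake_fetch)
    print([os.path.relpath(p, root) for p in written])
```

Keeping the job planning separate from the fetching makes the file layout easy to check without any network access, and `workers` caps how many requests are in flight at a time.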

- Author -

mzbqhbc12
