
Scraping Taobao Product Information with a Python Crawler (Selenium + PhantomJS)

Posted in Python on February 24, 2018

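This article shares a complete code example of a Python crawler that scrapes Taobao product listings, for your reference. The details are as follows.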

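1. Goal:

Open the Taobao site, search for the keyword 耐克 (Nike), and scrape each product's title, link, price, city, Wangwang ID (the seller name), and number of buyers. Then follow each link to the second-level (product detail) page and scrape the sales volume, model number, and so on.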
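The search step described above can be sketched as follows. This is a minimal illustration in Python 3 (the script later in this article targets Python 2), and `build_search_url` and `ITEMS_PER_PAGE` are hypothetical names introduced here. It assumes the search page accepts only the `q`, `ie`, and `s` query parameters; the original script sends several more.

```python
# Python 3 sketch; the article's own script targets Python 2.
from urllib.parse import quote

# Offset step between result pages, taken from the `s=` parameter
# used by the article's script (assumption: 44 items per page).
ITEMS_PER_PAGE = 44


def build_search_url(keyword, page):
    """Return a search URL for a zero-based page index (hypothetical helper)."""
    # quote() percent-encodes the keyword as UTF-8 bytes.
    encoded = quote(keyword)
    return "https://s.taobao.com/search?q={}&ie=utf8&s={}".format(
        encoded, page * ITEMS_PER_PAGE)


print(build_search_url("耐克", 0))
print(build_search_url("耐克", 2))
```

Encoding the keyword at run time avoids hard-coding the percent-encoded string, so another search term can be used without editing the URL by hand.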

2. Results

[Figure: screenshot of the scraped results (image not preserved)]

3. Source code

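# encoding: utf-8
# Python 2 script: reload(sys) and sys.setdefaultencoding() exist only in Python 2.
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import re
import time
import pandas as pd
from lxml import etree
from selenium import webdriver

time1=time.time()  # start time, used to report the total run time
# Drive a headless browser automatically (adjust the path to your own phantomjs.exe)
driver=webdriver.PhantomJS(executable_path='D:/Python27/Scripts/phantomjs.exe')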

# Lists that store each scraped field
title=[]
price=[]
city=[]
shop_name=[]
num=[]
link=[]
sale=[]
number=[]

# Search keyword 耐克 (Nike), percent-encoded as UTF-8 so it can be placed in the URL
keyword="%E8%80%90%E5%85%8B"


for i in range(0,1):

  try:
    print "...............Scraping page "+str(i+1)+"..........................."

    # The s= parameter is the item offset; each results page advances it by 44
    url="https://s.taobao.com/search?q="+keyword+"&imgfile=&js=1&stats_click=search_radio_all%3A1&initiative_id=staobaoz_20170710&ie=utf8&bcoffset=4&ntoffset=4&p4ppushleft=1%2C48&s="+str(i*44)
    driver.get(url)
    time.sleep(5)
    html=driver.page_source

    selector=etree.HTML(html)
    title1=selector.xpath('//div[@class="row row-2 title"]/a')
    for each in title1:
      print each.xpath('string(.)').strip()
      title.append(each.xpath('string(.)').strip())


    price1=selector.xpath('//div[@class="price g_price g_price-highlight"]/strong/text()')
    for each in price1:
      print each
      price.append(each)


    city1=selector.xpath('//div[@class="location"]/text()')
    for each in city1:
      print each
      city.append(each)


    num1=selector.xpath('//div[@class="deal-cnt"]/text()')
    for each in num1:
      print each
      num.append(each)


    shop_name1=selector.xpath('//div[@class="shop"]/a/span[2]/text()')
    for each in shop_name1:
      print each
      shop_name.append(each)


    link1=selector.xpath('//div[@class="row row-2 title"]/a/@href')
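    for each in link1:
      # Build the full URL once. Hrefs that already carry a scheme are used as-is;
      # protocol-relative hrefs ("//detail.tmall.com/...") only need "https:" in front.
      if each.startswith("http"):
        kk=each
      elif each.startswith("//"):
        kk="https:" + each
      else:
        kk="https://" + each

      print kk
      link.append(kk)
      driver.get(kk)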
      time.sleep(3)
      html2=driver.page_source
      selector2=etree.HTML(html2)

      sale1=selector2.xpath('//*[@id="J_DetailMeta"]/div[1]/div[1]/div/ul/li[1]/div/span[2]/text()')
      for each in sale1:
        print each
        sale.append(each)

      sale2=selector2.xpath('//strong[@id="J_SellCounter"]/text()')
      for each in sale2:
        print each
        sale.append(each)

      if "tmall" in kk:
        number1 = re.findall('<ul id="J_AttrUL">(.*?)</ul>', html2, re.S)
        for each in number1:
          m = re.findall('>*号: (.*?)</li>', str(each).strip(), re.S)
          if len(m) > 0:
            for each1 in m:
              print each1
              number.append(each1)

          else:
            number.append("NULL")

      if "taobao" in kk:
        number2=re.findall('<ul class="attributes-list">(.*?)</ul>',html2,re.S)
        for each in number2:
          h=re.findall('>*号: (.*?)</li>', str(each).strip(), re.S)
          if len(h) > 0:
            for each2 in h:
              print each2
              number.append(each2)

          else:
            number.append("NULL")

      if "click" in kk:
        number.append("NULL")

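  except Exception as e:
    # Report the failure instead of silently discarding it
    print "Failed on page " + str(i) + ": " + str(e)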


print len(title),len(city),len(price),len(num),len(shop_name),len(link),len(sale),len(number)

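# Build the DataFrame (all lists must have the same length, so check the counts printed above)
data1=pd.DataFrame({"Title":title,"Price":price,"Wangwang":shop_name,"City":city,"Buyers":num,"Link":link,"Sales":sale,"Model No.":number})
print data1
# Write the result to an Excel file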
# strings_to_urls=False keeps the links as plain text instead of converting them to hyperlinks
writer = pd.ExcelWriter(r'C:\taobao_spider2.xlsx', engine='xlsxwriter', options={'strings_to_urls': False})
data1.to_excel(writer, index=False)
writer.close()

time2 = time.time()
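print u'ok, crawl finished!'
print u'Total time: ' + str(time2 - time1) + 's'
# Shut down the browser; quit() also ends the PhantomJS process, close() only closes the window
driver.quit()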
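The script above keeps each field in its own list, so a product page that lacks a field can leave the lists with different lengths, and `pd.DataFrame` then fails. One possible alternative, sketched here in Python 3, is to build one record per product with a default for missing values. The helper names (`normalize_link`, `extract_model_number`, `make_record`), the sample HTML, and the sample URLs are all hypothetical and only illustrate the idea; real Taobao pages may be structured differently.

```python
# Python 3 sketch; the article's own script targets Python 2.
import re

# Matches the text after a label ending in "号:" (e.g. 款号 or 货号) up to </li>.
MODEL_RE = re.compile(r'号:\s*(.*?)</li>', re.S)


def normalize_link(href):
    """Turn an href into a full URL (hypothetical helper)."""
    if href.startswith("http"):
        return href                # already absolute
    if href.startswith("//"):
        return "https:" + href     # protocol-relative
    return "https://" + href       # bare host and path


def extract_model_number(detail_html, default="NULL"):
    """Return the first model number found, or the default if none is present."""
    match = MODEL_RE.search(detail_html)
    return match.group(1).strip() if match else default


def make_record(title, href, detail_html):
    """Collect all fields of one product in a single dict."""
    return {
        "title": title,
        "link": normalize_link(href),
        "model_number": extract_model_number(detail_html),
    }


# Made-up sample data, for illustration only.
sample_detail = '<ul id="J_AttrUL"><li>品牌: Nike</li><li>款号: 852628-001</li></ul>'
records = [
    make_record("Shoe A", "//detail.tmall.com/item.htm?id=1", sample_detail),
    make_record("Shoe B", "https://click.example.com/ad?id=2", "<ul></ul>"),
]
for record in records:
    print(record)
```

Because every record has the same keys, a list of such dicts can be passed to `pd.DataFrame(records)` directly without any length mismatch.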

That is all for this article. We hope it helps with your learning, and we hope you will continue to support 三水点靠木.


- Author -

开心果汁
