Python 批量刷博客园访问量脚本过程解析


Posted in Python onAugust 30, 2019

今早无聊。。。7点起来突然想写个刷访问量的。。那就动手吧

仅供测试,不建议刷访问量哦~~

很简单的思路,第一步提取代理ip,第二步模拟访问。

提取HTTP代理IP

网上很多收费的代理和免费的代理IP

如:

Python 批量刷博客园访问量脚本过程解析

无论哪个网站,我们需要的就是爬取上面的ip和端口号,整理到一起。

具体的网站根据具体的结构爬取 比如上面那个网站,ip和端口在td标签

Python 批量刷博客园访问量脚本过程解析

这里利用bs4爬取即可。贴上脚本

##获取代理ip
def Get_proxy_ip():
  print("==========批量提取ip刷博客园访问量 By 卿=========")
  print("     Blogs:https://www.cnblogs.com/-qing-/")
  print("     Started!   ")
  global proxy_list
  proxy_list = []
  url = "https://www.kuaidaili.com/free/inha/"
  headers = {
      "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
      "Accept-Encoding":"gzip, deflate, sdch, br",
      "Accept-Language":"zh-CN,zh;q=0.8",
      "Cache-Control":"max-age=0",
      "Connection":"keep-alive",
      "Cookie":"channelid=0; sid=1561681200472193; _ga=GA1.2.762166746.1561681203; _gid=GA1.2.971407760.1561681203; _gat=1; Hm_lvt_7ed65b1cc4b810e9fd37959c9bb51b31=1561681203; Hm_lpvt_7ed65b1cc4b810e9fd37959c9bb51b31=1561681203",
      "Host":"www.kuaidaili.com",
      "Upgrade-Insecure-Requests":"1",
      "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0",
      "Referrer Policy":"no-referrer-when-downgrade",
      }
  for i in range(1,100):
    url = url = "https://www.kuaidaili.com/free/inha/"+str(i)
    html = requests.get(url = url,headers = headers).content
    soup = BeautifulSoup(html,'html.parser')
    ip_list = '';
    port_list = '';
    protocol_list = '';
    for ip in soup.find_all('td'):
      if "IP" in ip.get('data-title') :
        ip_list = ip.get_text()##获取ip       
      if "PORT" in ip.get('data-title'):
        port_list = ip.get_text()##获取port
      if ip_list != '' and port_list != '':
        proxy = ip_list+":"+port_list
        ip_list = '';
        port_list = '';
        proxy_list.append(proxy)
    iv_main()
    time.sleep(2)
    proxy_list = []

这样就把 提取的ip和端口放到列表里

模拟访问刷博客园文章

这里就很简单了 ,遍历上面那个代理ip的列表,使用requests模块取访问就是了

def iv_main():
  proxies = {}
  requests.packages.urllib3.disable_warnings()
  #proxy_ip = random.choice(proxy_list)
  url = 'https://www.cnblogs.com/-qing-/p/11080845.html'
  for proxy_ip in proxy_list:
    headers2 = {
      'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
      'accept-encoding':'gzip, deflate, sdch, br',
      'accept-language':'zh-CN,zh;q=0.8',
      'cache-control':'max-age=0',
      'cookie':'__gads=ID=8c6fd85d91262bb1:T=1561554219:S=ALNI_MZwz0CMKQJK-L19DrX5DPDtYvp63Q; _gat=1; _ga=GA1.2.359634670.1561535095; _gid=GA1.2.1087331661.1561535095',
      'if-modified-since':'Fri, 28 Jun 2019 02:10:23 GMT',
      'referer':'https://www.cnblogs.com/',
      'upgrade-insecure-requests':'1',
      'user-agent':random.choice(user_agent_list),
      }
    proxies['HTTP'] = proxy_ip
    #user_agent = random.choice(user_agent_list)
    try:
      r = requests.get(url,headers=headers2,proxies=proxies,verify=False) #verify是否验证服务器的SSL证书
      print("[*]"+proxy_ip+"访问成功!")
    except:
      print("[-]"+proxy_ip+"访问失败!")

最好带上随机的ua请求头

user_agent_list = [
  "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
           "Chrome/45.0.2454.85 Safari/537.36 115Browser/6.0.3",
  "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
  "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
  "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
  "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
  "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SE 2.X MetaSr 1.0; SE 2.X MetaSr 1.0; .NET CLR 2.0.50727; SE 2.X MetaSr 1.0)",
  "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0",
  "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
]

优化整合

这里可以稍微优化下,加入队列线程优化(虽然python这个没啥用)

最终代码整合:

# -*- coding:utf-8 -*-
#By 卿 
#Blog:https://www.cnblogs.com/-qing-/

import requests
from bs4 import BeautifulSoup
import re
import time
import random
import threading

print("==========批量提取ip刷博客园访问量 By 卿=========")
print("     Blogs:https://www.cnblogs.com/-qing-/")
print("     Started!   ")
user_agent_list = [
  "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
           "Chrome/45.0.2454.85 Safari/537.36 115Browser/6.0.3",
  "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
  "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
  "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
  "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
  "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SE 2.X MetaSr 1.0; SE 2.X MetaSr 1.0; .NET CLR 2.0.50727; SE 2.X MetaSr 1.0)",
  "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0",
  "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
]
def iv_main():
  proxies = {}
  requests.packages.urllib3.disable_warnings()
  #proxy_ip = random.choice(proxy_list)
  url = 'https://www.cnblogs.com/-qing-/p/11080845.html'
  for proxy_ip in proxy_list:
    headers2 = {
      'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
      'accept-encoding':'gzip, deflate, sdch, br',
      'accept-language':'zh-CN,zh;q=0.8',
      'cache-control':'max-age=0',
      'cookie':'__gads=ID=8c6fd85d91262bb1:T=1561554219:S=ALNI_MZwz0CMKQJK-L19DrX5DPDtYvp63Q; _gat=1; _ga=GA1.2.359634670.1561535095; _gid=GA1.2.1087331661.1561535095',
      'if-modified-since':'Fri, 28 Jun 2019 02:10:23 GMT',
      'referer':'https://www.cnblogs.com/',
      'upgrade-insecure-requests':'1',
      'user-agent':random.choice(user_agent_list),
      }
    proxies['HTTP'] = proxy_ip
    #user_agent = random.choice(user_agent_list)
    try:
      r = requests.get(url,headers=headers2,proxies=proxies,verify=False) #verify是否验证服务器的SSL证书
      print("[*]"+proxy_ip+"访问成功!")
    except:
      print("[-]"+proxy_ip+"访问失败!")

##获取代理ip
def Get_proxy_ip():
  global proxy_list
  proxy_list = []
  url = "https://www.kuaidaili.com/free/inha/"
  headers = {
      "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
      "Accept-Encoding":"gzip, deflate, sdch, br",
      "Accept-Language":"zh-CN,zh;q=0.8",
      "Cache-Control":"max-age=0",
      "Connection":"keep-alive",
      "Cookie":"channelid=0; sid=1561681200472193; _ga=GA1.2.762166746.1561681203; _gid=GA1.2.971407760.1561681203; _gat=1; Hm_lvt_7ed65b1cc4b810e9fd37959c9bb51b31=1561681203; Hm_lpvt_7ed65b1cc4b810e9fd37959c9bb51b31=1561681203",
      "Host":"www.kuaidaili.com",
      "Upgrade-Insecure-Requests":"1",
      "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0",
      "Referrer Policy":"no-referrer-when-downgrade",
      }
  for i in range(1,100):
    url = url = "https://www.kuaidaili.com/free/inha/"+str(i)
    html = requests.get(url = url,headers = headers).content
    soup = BeautifulSoup(html,'html.parser')
    ip_list = '';
    port_list = '';
    protocol_list = '';
    for ip in soup.find_all('td'):
      if "IP" in ip.get('data-title') :
        ip_list = ip.get_text()##获取ip       
      if "PORT" in ip.get('data-title'):
        port_list = ip.get_text()##获取port
      if ip_list != '' and port_list != '':
        proxy = ip_list+":"+port_list
        ip_list = '';
        port_list = '';
        proxy_list.append(proxy)
    iv_main()
    time.sleep(2)
    proxy_list = []    
th=[]
th_num=10
for x in range(th_num):
    t=threading.Thread(target=Get_proxy_ip)
    th.append(t)
for x in range(th_num):
    th[x].start()
for x in range(th_num):
    th[x].join()

结果

Python 批量刷博客园访问量脚本过程解析

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持三水点靠木。

Python 相关文章推荐
python3访问sina首页中文的处理方法
Feb 24 Python
python获取文件真实链接的方法,针对于302返回码
May 14 Python
TensorFlow数据输入的方法示例
Jun 19 Python
Numpy之random函数使用学习
Jan 29 Python
Python 利用切片从列表中取出一部分使用的方法
Feb 01 Python
Pandas读写CSV文件的方法示例
Mar 27 Python
PyQt5响应回车事件的方法
Jun 25 Python
python批量修改图片尺寸,并保存指定路径的实现方法
Jul 04 Python
Django 查询数据库并返回页面的例子
Aug 12 Python
解决更改AUTH_USER_MODEL后出现的问题
May 14 Python
python 批量下载bilibili视频的gui程序
Nov 20 Python
Python实现灰色关联分析与结果可视化的详细代码
Mar 25 Python
快速解决docker-py api版本不兼容的问题
Aug 30 #Python
Python 使用 Pillow 模块给图片添加文字水印的方法
Aug 30 #Python
python pillow模块使用方法详解
Aug 30 #Python
docker-py 用Python调用Docker接口的方法
Aug 30 #Python
tesserocr与pytesseract模块的使用方法解析
Aug 30 #Python
Django获取应用下的所有models的例子
Aug 30 #Python
Django自带日志 settings.py文件配置方法
Aug 30 #Python
You might like
php下删除字符串中HTML标签的函数
2008/08/27 PHP
解决163/sohu/sina不能够收到PHP MAIL函数发出邮件的问题
2009/03/13 PHP
基于PHP CURL获取邮箱地址的详解
2013/06/03 PHP
php输入流php://input使用浅析
2014/09/02 PHP
PHP实现的分页类定义与用法示例
2017/07/05 PHP
简单实用的PHP文本缓存类实例
2019/03/22 PHP
nodejs 后缀名判断限制代码
2011/03/31 NodeJs
js变换显示图片的实例
2013/04/16 Javascript
jquery获取div距离窗口和父级dv的距离示例
2013/10/10 Javascript
自动完成的搜索框javascript实现
2016/02/26 Javascript
使用jQuery的toggle()方法对HTML标签进行显示、隐藏的方法(示例)
2016/09/01 Javascript
js实现用户输入的小写字母自动转大写字母的方法
2017/01/21 Javascript
input获取焦点时底部菜单被顶上来问题的解决办法
2017/01/24 Javascript
SVG动画vivus.js库使用小结(实例代码)
2017/09/14 Javascript
详解webpack提取第三方库的正确姿势
2017/12/22 Javascript
jQuery第一次运行页面默认触发点击事件的实例
2018/01/10 jQuery
VUE DOM加载后执行自定义事件的方法
2018/09/07 Javascript
用VueJS写一个Chrome浏览器插件的实现方法
2019/02/27 Javascript
微信小程序12行js代码自己写个滑块功能(推荐)
2020/07/15 Javascript
[45:59]完美世界DOTA2联赛PWL S2 FTD vs GXR 第二场 11.22
2020/11/24 DOTA
python操作ie登陆土豆网的方法
2015/05/09 Python
Python实现图像几何变换
2015/07/06 Python
python安装与使用redis的方法
2016/04/19 Python
python+selenium实现163邮箱自动登陆的方法
2017/12/31 Python
TensorFlow神经网络优化策略学习
2018/03/09 Python
特征脸(Eigenface)理论基础之PCA主成分分析法
2018/03/13 Python
python如何在循环引用中管理内存
2018/03/20 Python
Python简单定义与使用二叉树示例
2018/05/11 Python
selenium跳过webdriver检测并模拟登录淘宝
2019/06/12 Python
使用Django的JsonResponse返回数据的实现
2021/01/15 Python
python实现图片转字符画的完整代码
2021/02/21 Python
使用HTML5进行SVG矢量图形绘制的入门教程
2016/02/19 HTML / CSS
Tirendo比利时:在线购买轮胎
2018/10/22 全球购物
美国快时尚彩妆品牌:Winky Lux(透明花瓣润唇膏)
2018/11/06 全球购物
四年大学生活的个人自我评价
2013/12/11 职场文书
雷锋的观后感
2015/06/10 职场文书