Installing Scrapy on Windows 7 x64


Posted in Python on November 18, 2018

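Scrapy is a web crawling framework written in Python. The installation guides I found online all seemed overly complicated, and many of them were years out of date, so I recorded my own installation process here.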

First install Python from https://www.python.org/downloads/release/python-2710/. Choose the installer that matches your system: 64-bit (Windows x86-64 MSI installer) or 32-bit (Windows x86 MSI installer).

Python 3.6 is now the mainstream version, so installing a Python 3 release is recommended.

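Once Python is installed you can install Scrapy. I recommend installing it with pip: Scrapy depends on many additional libraries, and pip installs all of them for you, so you don't have to hunt them down one by one.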

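pip is already included with the Python installation, so there is nothing extra to install. Just follow the method recommended on the Scrapy website and enter pip install scrapy at the command prompt (Figure 1), then wait for it to finish.

Figure 1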

After the installation, run pip list to see the installed libraries (Figure 2). Quite a few show up; pip really is a handy tool.

Figure 2
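The same check that pip list provides can also be scripted. The following is only a sketch: the helper names installed_packages and is_installed are made up for this example, and it assumes either Python 3.8+ (importlib.metadata) or an older interpreter with setuptools (pkg_resources).

```python
def installed_packages():
    """Return (name, version) pairs for the distributions this interpreter can see."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
        pairs = [(d.metadata.get("Name"), d.metadata.get("Version"))
                 for d in metadata.distributions()]
    except ImportError:
        import pkg_resources  # ships with setuptools on older interpreters
        pairs = [(d.project_name, d.version) for d in pkg_resources.working_set]
    # Skip entries with unreadable metadata, then sort by name, ignoring case.
    return sorted((p for p in pairs if p[0]), key=lambda p: p[0].lower())


def is_installed(name):
    """Report whether a distribution called `name` is installed (case-insensitive)."""
    wanted = name.lower()
    return any(pkg.lower() == wanted for pkg, _ in installed_packages())


if __name__ == "__main__":
    for pkg, version in installed_packages():
        print("%s %s" % (pkg, version))
    print("Scrapy installed: %s" % is_installed("Scrapy"))
```

Run as a script, it prints one line per package followed by whether Scrapy is among them.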

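Next, try creating a crawler project. Following the documentation, enter the command scrapy startproject tutorial; the project directory is created (Figure 3), so things look fine. To verify that the installation really works, though, you still have to run something. The first time you create a project, Scrapy suggests an example you can run (Figure 4). Following that prompt, enter scrapy genspider example example.com to create a spider, then enter scrapy crawl example to run it.

Figure 3

Figure 4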
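If you want to confirm from a script that scrapy startproject produced a usable project, a rough check is to look for the scrapy.cfg file at the project root and a spiders directory inside the project package. This is a sketch under assumptions: the function name looks_like_scrapy_project is hypothetical, and other generated files vary between Scrapy versions, so they are not checked.

```python
import os


def looks_like_scrapy_project(root):
    """Check for scrapy.cfg and a package that contains a spiders/ directory."""
    if not os.path.isfile(os.path.join(root, "scrapy.cfg")):
        return False
    for entry in os.listdir(root):
        # The project package (e.g. tutorial/tutorial) holds the spiders folder.
        if os.path.isdir(os.path.join(root, entry, "spiders")):
            return True
    return False


if __name__ == "__main__":
    print(looks_like_scrapy_project("tutorial"))
```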

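Running the spider produced the result shown below (Figure 5): it failed with an error that appears to be caused by a missing win32api module. I downloaded pywin32 from http://sourceforge.net/projects/pywin32/files/pywin32/Build%20219/, taking care to pick the installer that matches the installed Python version.

Figure 5

After installing win32api, I ran the spider again (Figure 6). This time it succeeded, so the installation should be fine.

Figure 6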

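To sum up: when I first searched online and read that this library and that library had to be installed first, I was quite nervous, but it turned out not to be complicated. pip solves most of the problems, and what remains has to be investigated case by case; I did not run into anything too hard to solve. One complaint: online tutorials copy from each other heavily. A search returns piles of them, yet most are nearly identical and only a few are truly valuable. Without an expert to lean on, it is hard work.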

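One last note: when I did this installation, Scrapy did not yet support Python 3.x, so I used Python 2.7 (Scrapy has supported Python 3 since version 1.1). If you run into inexplicable problems, first check whether you installed a Python version that your Scrapy release supports.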
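A quick way to rule out a mismatched interpreter is to compare sys.version_info with the version you meant to install. The helper below is a minimal sketch; the name version_matches and its parameters are invented for illustration.

```python
import sys


def version_matches(required_major, required_minor=None, version_info=None):
    """Return True if the interpreter matches the required major (and minor) version."""
    info = sys.version_info if version_info is None else version_info
    if info[0] != required_major:
        return False
    return required_minor is None or info[1] == required_minor


if __name__ == "__main__":
    print("Running Python %d.%d" % (sys.version_info[0], sys.version_info[1]))
    print("Is Python 2.7: %s" % version_matches(2, 7))
```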

The following is a supplementary article contributed by another user.

Environment

Windows 7 64-bit
Python 2.7.6 64-bit

Installing Python:

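  • Open http://www.python.org/getit/releases/2.7.6/ and download Python-2.7.6.amd64.msi to install it. After installation, configure the environment variables (the original post referred to a separate article for this step).
  • Test whether Python was installed successfully. If Python is installed and the environment variables are configured, entering python in cmd prints detailed version information (including whether it is 32-bit or 64-bit):
C:\Users\Administrator>python
Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32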
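The bitness shown in that banner can also be read programmatically, which helps when choosing between the win32 and win-amd64 installers later on. This is a small sketch using only the standard library; interpreter_bits is a name chosen for this example.

```python
import struct
import sys


def interpreter_bits():
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python %s, %d-bit" % (sys.version.split()[0], interpreter_bits()))
```

Note that this reports the bitness of the Python interpreter, not of Windows itself: a 32-bit Python on 64-bit Windows reports 32.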

Installing easy_install

Save ez_setup.py locally, for example to drive D: (if the script no longer works, see https://3water.com/article/151027.htm):

#!/usr/bin/env python

"""
Setuptools bootstrapping installer.

Maintained at https://github.com/pypa/setuptools/tree/bootstrap.

Run this script to install or upgrade setuptools.

This method is DEPRECATED. Check https://github.com/pypa/setuptools/issues/581 for more details.
"""

import os
import shutil
import sys
import tempfile
import zipfile
import optparse
import subprocess
import platform
import textwrap
import contextlib

from distutils import log

try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen

try:
    from site import USER_SITE
except ImportError:
    USER_SITE = None

# 33.1.1 is the last version that supports setuptools self upgrade/installation.
DEFAULT_VERSION = "33.1.1"
DEFAULT_URL = "https://pypi.io/packages/source/s/setuptools/"
DEFAULT_SAVE_DIR = os.curdir
DEFAULT_DEPRECATION_MESSAGE = "ez_setup.py is deprecated and when using it setuptools will be pinned to {0} since it's the last version that supports setuptools self upgrade/installation, check https://github.com/pypa/setuptools/issues/581 for more info; use pip to install setuptools"

MEANINGFUL_INVALID_ZIP_ERR_MSG = 'Maybe {0} is corrupted, delete it and try again.'

log.warn(DEFAULT_DEPRECATION_MESSAGE.format(DEFAULT_VERSION))


def _python_cmd(*args):
    """
    Execute a command.

    Return True if the command succeeded.
    """
    args = (sys.executable,) + args
    return subprocess.call(args) == 0


def _install(archive_filename, install_args=()):
    """Install Setuptools."""
    with archive_context(archive_filename):
        # installing
        log.warn('Installing Setuptools')
        if not _python_cmd('setup.py', 'install', *install_args):
            log.warn('Something went wrong during the installation.')
            log.warn('See the error message above.')
            # exitcode will be 2
            return 2


def _build_egg(egg, archive_filename, to_dir):
    """Build Setuptools egg."""
    with archive_context(archive_filename):
        # building an egg
        log.warn('Building a Setuptools egg in %s', to_dir)
        _python_cmd('setup.py', '-q', 'bdist_egg', '--dist-dir', to_dir)
    # returning the result
    log.warn(egg)
    if not os.path.exists(egg):
        raise IOError('Could not build the egg.')


class ContextualZipFile(zipfile.ZipFile):

    """Supplement ZipFile class to support context manager for Python 2.6."""

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.close()

    def __new__(cls, *args, **kwargs):
        """Construct a ZipFile or ContextualZipFile as appropriate."""
        if hasattr(zipfile.ZipFile, '__exit__'):
            return zipfile.ZipFile(*args, **kwargs)
        return super(ContextualZipFile, cls).__new__(cls)


@contextlib.contextmanager
def archive_context(filename):
    """
    Unzip filename to a temporary directory, set to the cwd.

    The unzipped target is cleaned up after.
    """
    tmpdir = tempfile.mkdtemp()
    log.warn('Extracting in %s', tmpdir)
    old_wd = os.getcwd()
    try:
        os.chdir(tmpdir)
        try:
            with ContextualZipFile(filename) as archive:
                archive.extractall()
        except zipfile.BadZipfile as err:
            if not err.args:
                err.args = ('', )
            err.args = err.args + (
                MEANINGFUL_INVALID_ZIP_ERR_MSG.format(filename),
            )
            raise

        # going in the directory
        subdir = os.path.join(tmpdir, os.listdir(tmpdir)[0])
        os.chdir(subdir)
        log.warn('Now working in %s', subdir)
        yield

    finally:
        os.chdir(old_wd)
        shutil.rmtree(tmpdir)


def _do_download(version, download_base, to_dir, download_delay):
    """Download Setuptools."""
    py_desig = 'py{sys.version_info[0]}.{sys.version_info[1]}'.format(sys=sys)
    tp = 'setuptools-{version}-{py_desig}.egg'
    egg = os.path.join(to_dir, tp.format(**locals()))
    if not os.path.exists(egg):
        archive = download_setuptools(version, download_base,
                                      to_dir, download_delay)
        _build_egg(egg, archive, to_dir)
    sys.path.insert(0, egg)

    # Remove previously-imported pkg_resources if present (see
    # https://bitbucket.org/pypa/setuptools/pull-request/7/ for details).
    if 'pkg_resources' in sys.modules:
        _unload_pkg_resources()

    import setuptools
    setuptools.bootstrap_install_from = egg


def use_setuptools(
        version=DEFAULT_VERSION, download_base=DEFAULT_URL,
        to_dir=DEFAULT_SAVE_DIR, download_delay=15):
    """
    Ensure that a setuptools version is installed.

    Return None. Raise SystemExit if the requested version
    or later cannot be installed.
    """
    to_dir = os.path.abspath(to_dir)

    # prior to importing, capture the module state for
    # representative modules.
    rep_modules = 'pkg_resources', 'setuptools'
    imported = set(sys.modules).intersection(rep_modules)

    try:
        import pkg_resources
        pkg_resources.require("setuptools>=" + version)
        # a suitable version is already installed
        return
    except ImportError:
        # pkg_resources not available; setuptools is not installed; download
        pass
    except pkg_resources.DistributionNotFound:
        # no version of setuptools was found; allow download
        pass
    except pkg_resources.VersionConflict as VC_err:
        if imported:
            _conflict_bail(VC_err, version)

        # otherwise, unload pkg_resources to allow the downloaded version to
        # take precedence.
        del pkg_resources
        _unload_pkg_resources()

    return _do_download(version, download_base, to_dir, download_delay)


def _conflict_bail(VC_err, version):
    """
    Setuptools was imported prior to invocation, so it is
    unsafe to unload it. Bail out.
    """
    conflict_tmpl = textwrap.dedent("""
        The required version of setuptools (>={version}) is not available,
        and can't be installed while this script is running. Please
        install a more recent version first, using
        'easy_install -U setuptools'.

        (Currently using {VC_err.args[0]!r})
        """)
    msg = conflict_tmpl.format(**locals())
    sys.stderr.write(msg)
    sys.exit(2)


def _unload_pkg_resources():
    sys.meta_path = [
        importer
        for importer in sys.meta_path
        if importer.__class__.__module__ != 'pkg_resources.extern'
    ]
    del_modules = [
        name for name in sys.modules
        if name.startswith('pkg_resources')
    ]
    for mod_name in del_modules:
        del sys.modules[mod_name]


def _clean_check(cmd, target):
    """
    Run the command to download target.

    If the command fails, clean up before re-raising the error.
    """
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError:
        if os.access(target, os.F_OK):
            os.unlink(target)
        raise


def download_file_powershell(url, target):
    """
    Download the file at url to target using Powershell.

    Powershell will validate trust.
    Raise an exception if the command cannot complete.
    """
    target = os.path.abspath(target)
    ps_cmd = (
        "[System.Net.WebRequest]::DefaultWebProxy.Credentials = "
        "[System.Net.CredentialCache]::DefaultCredentials; "
        '(new-object System.Net.WebClient).DownloadFile("%(url)s", "%(target)s")'
        % locals()
    )
    cmd = [
        'powershell',
        '-Command',
        ps_cmd,
    ]
    _clean_check(cmd, target)


def has_powershell():
    """Determine if Powershell is available."""
    if platform.system() != 'Windows':
        return False
    cmd = ['powershell', '-Command', 'echo test']
    with open(os.path.devnull, 'wb') as devnull:
        try:
            subprocess.check_call(cmd, stdout=devnull, stderr=devnull)
        except Exception:
            return False
    return True
download_file_powershell.viable = has_powershell


def download_file_curl(url, target):
    cmd = ['curl', url, '--location', '--silent', '--output', target]
    _clean_check(cmd, target)


def has_curl():
    cmd = ['curl', '--version']
    with open(os.path.devnull, 'wb') as devnull:
        try:
            subprocess.check_call(cmd, stdout=devnull, stderr=devnull)
        except Exception:
            return False
    return True
download_file_curl.viable = has_curl


def download_file_wget(url, target):
    cmd = ['wget', url, '--quiet', '--output-document', target]
    _clean_check(cmd, target)


def has_wget():
    cmd = ['wget', '--version']
    with open(os.path.devnull, 'wb') as devnull:
        try:
            subprocess.check_call(cmd, stdout=devnull, stderr=devnull)
        except Exception:
            return False
    return True
download_file_wget.viable = has_wget


def download_file_insecure(url, target):
    """Use Python to download the file, without connection authentication."""
    src = urlopen(url)
    try:
        # Read all the data in one block.
        data = src.read()
    finally:
        src.close()

    # Write all the data in one block to avoid creating a partial file.
    with open(target, "wb") as dst:
        dst.write(data)
download_file_insecure.viable = lambda: True


def get_best_downloader():
    downloaders = (
        download_file_powershell,
        download_file_curl,
        download_file_wget,
        download_file_insecure,
    )
    viable_downloaders = (dl for dl in downloaders if dl.viable())
    return next(viable_downloaders, None)


def download_setuptools(
        version=DEFAULT_VERSION, download_base=DEFAULT_URL,
        to_dir=DEFAULT_SAVE_DIR, delay=15,
        downloader_factory=get_best_downloader):
    """
    Download setuptools from a specified location and return its filename.

    `version` should be a valid setuptools version number that is available
    as an sdist for download under the `download_base` URL (which should end
    with a '/'). `to_dir` is the directory where the egg will be downloaded.
    `delay` is the number of seconds to pause before an actual download
    attempt.

    ``downloader_factory`` should be a function taking no arguments and
    returning a function for downloading a URL to a target.
    """
    # making sure we use the absolute path
    to_dir = os.path.abspath(to_dir)
    zip_name = "setuptools-%s.zip" % version
    url = download_base + zip_name
    saveto = os.path.join(to_dir, zip_name)
    if not os.path.exists(saveto):  # Avoid repeated downloads
        log.warn("Downloading %s", url)
        downloader = downloader_factory()
        downloader(url, saveto)
    return os.path.realpath(saveto)


def _build_install_args(options):
    """
    Build the arguments to 'python setup.py install' on the setuptools package.

    Returns list of command line arguments.
    """
    return ['--user'] if options.user_install else []


def _parse_args():
    """Parse the command line for options."""
    parser = optparse.OptionParser()
    parser.add_option(
        '--user', dest='user_install', action='store_true', default=False,
        help='install in user site package')
    parser.add_option(
        '--download-base', dest='download_base', metavar="URL",
        default=DEFAULT_URL,
        help='alternative URL from where to download the setuptools package')
    parser.add_option(
        '--insecure', dest='downloader_factory', action='store_const',
        const=lambda: download_file_insecure, default=get_best_downloader,
        help='Use internal, non-validating downloader'
    )
    parser.add_option(
        '--version', help="Specify which version to download",
        default=DEFAULT_VERSION,
    )
    parser.add_option(
        '--to-dir',
        help="Directory to save (and re-use) package",
        default=DEFAULT_SAVE_DIR,
    )
    options, args = parser.parse_args()
    # positional arguments are ignored
    return options


def _download_args(options):
    """Return args for download_setuptools function from cmdline args."""
    return dict(
        version=options.version,
        download_base=options.download_base,
        downloader_factory=options.downloader_factory,
        to_dir=options.to_dir,
    )


def main():
    """Install or upgrade setuptools and EasyInstall."""
    options = _parse_args()
    archive = download_setuptools(**_download_args(options))
    return _install(archive, _build_install_args(options))

if __name__ == '__main__':
    sys.exit(main())

Run the following in cmd to install setuptools:

d:\>python ez_setup.py

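Running the script produces an error: "ascii codec can't decode byte 0xe8 in position 0:ordinal not in range(128)", meaning that the ASCII codec cannot decode the byte 0xe8.
Fix: find and open Lib/mimetypes.py under the Python root directory and add the following code after import urllib:

reload(sys)
sys.setdefaultencoding('gbk')

This changes the default encoding to gbk (some posts online suggest utf8, but that does not work for this script; it has to be gbk). Run python ez_setup.py again; if installation messages scroll past, the installation succeeded. The Python directory now contains a new Scripts folder, and easy_install is inside it.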
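To see why the choice of codec matters, the sketch below reproduces the same class of error with explicit decoding: bytes produced by the GBK encoding cannot be decoded as ASCII, but decode cleanly as GBK. This is an illustration of the cause, not the fix used above; the function decode_with_fallback and the sample string are made up for this example.

```python
def decode_with_fallback(raw, encodings=("ascii", "gbk")):
    """Decode bytes with the first codec in `encodings` that accepts them."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # this codec cannot represent the bytes; try the next one
    raise ValueError("none of the codecs could decode the data")


if __name__ == "__main__":
    sample = u"\u8bbe\u7f6e".encode("gbk")  # a two-character Chinese word
    try:
        sample.decode("ascii")
    except UnicodeDecodeError as err:
        print("ascii failed: %s" % err)
    print(decode_with_fallback(sample) == u"\u8bbe\u7f6e")
```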

Installing Scrapy's dependencies

Scrapy's dependencies

Install lxml-3.2.4.win32-py2.7.exe (on a 64-bit system, install lxml-3.2.4.win-amd64-py2.7.exe instead)
Install pywin32-218.win32-py2.7.exe (on a 64-bit system, install pywin32-218.win-amd64-py2.7.exe instead)
Install Twisted-13.2.0.win32-py2.7.exe (on a 64-bit system, install Twisted-13.2.0.win-amd64-py2.7.exe instead)
Install pyOpenSSL-0.13.1.win32-py2.7.exe (on a 64-bit system, install pyOpenSSL-0.13.1.win-amd64-py2.7.exe instead)
Copy zope.interface-4.0.5-py2.7-win32.egg to the C:\Python27\Scripts directory and run $ easy_install.exe zope.interface-4.0.5-py2.7-win32.egg
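The installer names above follow a visible pattern: package, version, a platform tag (win32 or win-amd64), and the Python version. The helper below is a hypothetical sketch that rebuilds those names from the pattern, which makes it harder to pick the wrong bitness by accident; it is not part of any of these packages.

```python
def installer_name(package, version, bits, python="2.7"):
    """Build a file name such as lxml-3.2.4.win-amd64-py2.7.exe."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    platform_tag = "win-amd64" if bits == 64 else "win32"
    return "%s-%s.%s-py%s.exe" % (package, version, platform_tag, python)


if __name__ == "__main__":
    for package, version in [("lxml", "3.2.4"), ("pywin32", "218"),
                             ("Twisted", "13.2.0"), ("pyOpenSSL", "0.13.1")]:
        print(installer_name(package, version, 64))
```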

To verify that Scrapy's dependencies were installed successfully:

Run $ python in cmd to enter the Python console.

Run import lxml; if no error is reported, lxml is installed.
Run import twisted; if no error is reported, Twisted is installed.
Run import OpenSSL; if no error is reported, pyOpenSSL is installed.
Run import zope.interface; if no error is reported, zope.interface is installed.
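Typing each import by hand works, but the same verification can be done in one pass. A minimal sketch, assuming the module names listed above; check_imports is a name invented for this example.

```python
import importlib


def check_imports(names):
    """Try to import each module and map its name to True (ok) or False (missing)."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    modules = ["lxml", "twisted", "OpenSSL", "zope.interface"]
    for name, ok in sorted(check_imports(modules).items()):
        print("%-15s %s" % (name, "ok" if ok else "MISSING"))
```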

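Installing Scrapy

Method 1: enter easy_install scrapy in the console.
Method 2: extract Scrapy-0.22.2.tar.gz and run $ python setup.py install in the extracted directory to install Scrapy.

To check whether Scrapy was installed successfully, run $ scrapy in the cmd console; if no error is reported, the installation succeeded.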
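The final check can be automated in the same spirit as the has_curl and has_wget helpers in ez_setup.py: start the command and see whether it exits cleanly. This is only a sketch; command_runs is a hypothetical helper, and it assumes the scrapy executable is on PATH and accepts the version subcommand.

```python
import os
import subprocess
import sys


def command_runs(args):
    """Return True if the command starts and exits with status 0."""
    with open(os.devnull, "wb") as devnull:
        try:
            return subprocess.call(args, stdout=devnull, stderr=devnull) == 0
        except OSError:
            # Raised when the executable cannot be found or started.
            return False


if __name__ == "__main__":
    print("python runs: %s" % command_runs([sys.executable, "-c", "pass"]))
    print("scrapy runs: %s" % command_runs(["scrapy", "version"]))
```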
