详解scrapy内置中间件的顺序


Posted in Python onSeptember 28, 2020

1. 内置下载器中间件顺序

{'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware': 560,
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': 400,
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware': 350,
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300,
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 590,
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': 580,
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware': 600,
 'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': 100,
 'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500}

2. 内置爬虫中间件顺序

{'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
 'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': 500,
 'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware': 800}

3. 内置scrapy的settings

{'AJAXCRAWL_ENABLED': False,
 'AUTOTHROTTLE_DEBUG': False,
 'AUTOTHROTTLE_ENABLED': False,
 'AUTOTHROTTLE_MAX_DELAY': 60.0,
 'AUTOTHROTTLE_START_DELAY': 5.0,
 'AUTOTHROTTLE_TARGET_CONCURRENCY': 1.0,
 'BOT_NAME': 'scrapybot',
 'CLOSESPIDER_ERRORCOUNT': 0,
 'CLOSESPIDER_ITEMCOUNT': 0,
 'CLOSESPIDER_PAGECOUNT': 0,
 'CLOSESPIDER_TIMEOUT': 0,
 'COMMANDS_MODULE': '',
 'COMPRESSION_ENABLED': True,
 'CONCURRENT_ITEMS': 100,
 'CONCURRENT_REQUESTS': 16,
 'CONCURRENT_REQUESTS_PER_DOMAIN': 8,
 'CONCURRENT_REQUESTS_PER_IP': 0,
 'COOKIES_DEBUG': False,
 'COOKIES_ENABLED': True,
 'DEFAULT_ITEM_CLASS': 'scrapy.item.Item',
 'DEFAULT_REQUEST_HEADERS': {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
 'Accept-Language': 'en'},
 'DEPTH_LIMIT': 0,
 'DEPTH_PRIORITY': 0,
 'DEPTH_STATS_VERBOSE': False,
 'DNSCACHE_ENABLED': True,
 'DNSCACHE_SIZE': 10000,
 'DNS_TIMEOUT': 60,
 'DOWNLOADER': 'scrapy.core.downloader.Downloader',
 'DOWNLOADER_CLIENTCONTEXTFACTORY': 'scrapy.core.downloader.contextfactory.ScrapyClientContextFactory',
 'DOWNLOADER_CLIENT_TLS_METHOD': 'TLS',
 'DOWNLOADER_HTTPCLIENTFACTORY': 'scrapy.core.downloader.webclient.ScrapyHTTPClientFactory',
 'DOWNLOADER_MIDDLEWARES': {},
 'DOWNLOADER_MIDDLEWARES_BASE': {'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware': 560,
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': 400,
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware': 350,
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300,
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 590,
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': 580,
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware': 600,
 'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': 100,
 'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500},
 'DOWNLOADER_STATS': True,
 'DOWNLOAD_DELAY': 0,
 'DOWNLOAD_FAIL_ON_DATALOSS': True,
 'DOWNLOAD_HANDLERS': {},
 'DOWNLOAD_HANDLERS_BASE': {'data': 'scrapy.core.downloader.handlers.datauri.DataURIDownloadHandler',
 'file': 'scrapy.core.downloader.handlers.file.FileDownloadHandler',
 'ftp': 'scrapy.core.downloader.handlers.ftp.FTPDownloadHandler',
 'http': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
 'https': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
 's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler'},
 'DOWNLOAD_MAXSIZE': 1073741824,
 'DOWNLOAD_TIMEOUT': 180,
 'DOWNLOAD_WARNSIZE': 33554432,
 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
 'EDITOR': 'D:\\Program Files (x86)\\Notepad++\\notepad++.exe',
 'EXTENSIONS': {},
 'EXTENSIONS_BASE': {'scrapy.extensions.closespider.CloseSpider': 0,
 'scrapy.extensions.corestats.CoreStats': 0,
 'scrapy.extensions.feedexport.FeedExporter': 0,
 'scrapy.extensions.logstats.LogStats': 0,
 'scrapy.extensions.memdebug.MemoryDebugger': 0,
 'scrapy.extensions.memusage.MemoryUsage': 0,
 'scrapy.extensions.spiderstate.SpiderState': 0,
 'scrapy.extensions.telnet.TelnetConsole': 0,
 'scrapy.extensions.throttle.AutoThrottle': 0},
 'FEED_EXPORTERS': {},
 'FEED_EXPORTERS_BASE': {'csv': 'scrapy.exporters.CsvItemExporter',
 'jl': 'scrapy.exporters.JsonLinesItemExporter',
 'json': 'scrapy.exporters.JsonItemExporter',
 'jsonlines': 'scrapy.exporters.JsonLinesItemExporter',
 'marshal': 'scrapy.exporters.MarshalItemExporter',
 'pickle': 'scrapy.exporters.PickleItemExporter',
 'xml': 'scrapy.exporters.XmlItemExporter'},
 'FEED_EXPORT_ENCODING': None,
 'FEED_EXPORT_FIELDS': None,
 'FEED_EXPORT_INDENT': 0,
 'FEED_FORMAT': 'jsonlines',
 'FEED_STORAGES': {},
 'FEED_STORAGES_BASE': {'': 'scrapy.extensions.feedexport.FileFeedStorage',
 'file': 'scrapy.extensions.feedexport.FileFeedStorage',
 'ftp': 'scrapy.extensions.feedexport.FTPFeedStorage',
 's3': 'scrapy.extensions.feedexport.S3FeedStorage',
 'stdout': 'scrapy.extensions.feedexport.StdoutFeedStorage'},
 'FEED_STORE_EMPTY': False,
 'FEED_TEMPDIR': None,
 'FEED_URI': None,
 'FEED_URI_PARAMS': None,
 'FILES_STORE_GCS_ACL': '',
 'FILES_STORE_S3_ACL': 'private',
 'FTP_PASSIVE_MODE': True,
 'FTP_PASSWORD': 'guest',
 'FTP_USER': 'anonymous',
 'HTTPCACHE_ALWAYS_STORE': False,
 'HTTPCACHE_DBM_MODULE': 'dbm',
 'HTTPCACHE_DIR': 'httpcache',
 'HTTPCACHE_ENABLED': False,
 'HTTPCACHE_EXPIRATION_SECS': 0,
 'HTTPCACHE_GZIP': False,
 'HTTPCACHE_IGNORE_HTTP_CODES': [],
 'HTTPCACHE_IGNORE_MISSING': False,
 'HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS': [],
 'HTTPCACHE_IGNORE_SCHEMES': ['file'],
 'HTTPCACHE_POLICY': 'scrapy.extensions.httpcache.DummyPolicy',
 'HTTPCACHE_STORAGE': 'scrapy.extensions.httpcache.FilesystemCacheStorage',
 'HTTPPROXY_AUTH_ENCODING': 'latin-1',
 'HTTPPROXY_ENABLED': True,
 'IMAGES_STORE_GCS_ACL': '',
 'IMAGES_STORE_S3_ACL': 'private',
 'ITEM_PIPELINES': {},
 'ITEM_PIPELINES_BASE': {},
 'ITEM_PROCESSOR': 'scrapy.pipelines.ItemPipelineManager',
 'LOGSTATS_INTERVAL': 0,
 'LOG_DATEFORMAT': '%Y-%m-%d %H:%M:%S',
 'LOG_ENABLED': True,
 'LOG_ENCODING': 'utf-8',
 'LOG_FILE': None,
 'LOG_FORMAT': '%(asctime)s [%(name)s] %(levelname)s: %(message)s',
 'LOG_FORMATTER': 'scrapy.logformatter.LogFormatter',
 'LOG_LEVEL': 'DEBUG',
 'LOG_SHORT_NAMES': False,
 'LOG_STDOUT': False,
 'MAIL_FROM': 'scrapy@localhost',
 'MAIL_HOST': 'localhost',
 'MAIL_PASS': None,
 'MAIL_PORT': 25,
 'MAIL_USER': None,
 'MEMDEBUG_ENABLED': False,
 'MEMDEBUG_NOTIFY': [],
 'MEMUSAGE_CHECK_INTERVAL_SECONDS': 60.0,
 'MEMUSAGE_ENABLED': True,
 'MEMUSAGE_LIMIT_MB': 0,
 'MEMUSAGE_NOTIFY_MAIL': [],
 'MEMUSAGE_WARNING_MB': 0,
 'METAREFRESH_ENABLED': True,
 'METAREFRESH_MAXDELAY': 100,
 'NEWSPIDER_MODULE': '',
 'RANDOMIZE_DOWNLOAD_DELAY': True,
 'REACTOR_THREADPOOL_MAXSIZE': 10,
 'REDIRECT_ENABLED': True,
 'REDIRECT_MAX_TIMES': 20,
 'REDIRECT_PRIORITY_ADJUST': 2,
 'REFERER_ENABLED': True,
 'REFERRER_POLICY': 'scrapy.spidermiddlewares.referer.DefaultReferrerPolicy',
 'RETRY_ENABLED': True,
 'RETRY_HTTP_CODES': [500, 502, 503, 504, 522, 524, 408],
 'RETRY_PRIORITY_ADJUST': -1,
 'RETRY_TIMES': 2,
 'ROBOTSTXT_OBEY': False,
 'SCHEDULER': 'scrapy.core.scheduler.Scheduler',
 'SCHEDULER_DEBUG': False,
 'SCHEDULER_DISK_QUEUE': 'scrapy.squeues.PickleLifoDiskQueue',
 'SCHEDULER_MEMORY_QUEUE': 'scrapy.squeues.LifoMemoryQueue',
 'SCHEDULER_PRIORITY_QUEUE': 'queuelib.PriorityQueue',
 'SPIDER_CONTRACTS': {},
 'SPIDER_CONTRACTS_BASE': {'scrapy.contracts.default.ReturnsContract': 2,
 'scrapy.contracts.default.ScrapesContract': 3,
 'scrapy.contracts.default.UrlContract': 1},
 'SPIDER_LOADER_CLASS': 'scrapy.spiderloader.SpiderLoader',
 'SPIDER_LOADER_WARN_ONLY': False,
 'SPIDER_MIDDLEWARES': {},
 'SPIDER_MIDDLEWARES_BASE': {'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
 'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': 500,
 'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware': 800},
 'SPIDER_MODULES': [],
 'STATSMAILER_RCPTS': [],
 'STATS_CLASS': 'scrapy.statscollectors.MemoryStatsCollector',
 'STATS_DUMP': True,
 'TELNETCONSOLE_ENABLED': 1,
 'TELNETCONSOLE_HOST': '127.0.0.1',
 'TELNETCONSOLE_PASSWORD': None,
 'TELNETCONSOLE_PORT': [6023, 6073],
 'TELNETCONSOLE_USERNAME': 'scrapy',
 'TEMPLATES_DIR': 'd:\\python36\\lib\\site-packages\\scrapy\\templates',
 'URLLENGTH_LIMIT': 2083,
 'USER_AGENT': 'Scrapy/1.6.0 (+https://scrapy.org)',
 'KEEP_ALIVE': True}

到此这篇关于详解scrapy内置中间件的顺序的文章就介绍到这了,更多相关scrapy 中间件顺序内容请搜索三水点靠木以前的文章或继续浏览下面的相关文章希望大家以后多多支持三水点靠木!

Python 相关文章推荐
python连接mongodb操作数据示例(mongodb数据库配置类)
Dec 31 Python
Python版微信红包分配算法
May 04 Python
Python使用os模块和fileinput模块来操作文件目录
Jan 19 Python
如何在Python中编写并发程序
Feb 27 Python
python 添加用户设置密码并发邮件给root用户
Jul 25 Python
基于Python函数的作用域规则和闭包(详解)
Nov 29 Python
Python线性方程组求解运算示例
Jan 17 Python
python查询mysql,返回json的实例
Mar 26 Python
Python pymongo模块用法示例
Mar 31 Python
Python实现定时执行任务的三种方式简单示例
Mar 30 Python
在tensorflow中设置使用某一块GPU、多GPU、CPU的操作
Feb 07 Python
Python函数参数分类原理详解
May 28 Python
Python爬虫代理池搭建的方法步骤
Sep 28 #Python
浅析python 通⽤爬⾍和聚焦爬⾍
Sep 28 #Python
Scrapy 配置动态代理IP的实现
Sep 28 #Python
Scrapy中如何向Spider传入参数的方法实现
Sep 28 #Python
详解向scrapy中的spider传递参数的几种方法(2种)
Sep 28 #Python
小结Python的反射机制
Sep 28 #Python
scrapy与selenium结合爬取数据(爬取动态网站)的示例代码
Sep 28 #Python
You might like
php 防止单引号,双引号在接受页面转义
2008/07/10 PHP
Ajax+PHP 边学边练之四 表单
2009/11/27 PHP
php集成环境xampp中apache无法启动问题解决方案
2014/11/18 PHP
PHP环境中Memcache的安装和使用
2015/11/05 PHP
ThinkPHP3.2.3框架邮件发送功能图文实例详解
2019/04/23 PHP
JavaScript 事件参考手册
2008/12/24 Javascript
javascript web对话框与弹出窗口
2009/02/22 Javascript
Dojo 学习要点
2010/09/03 Javascript
js输出列表实现代码
2010/09/12 Javascript
很棒的学习jQuery的12个网站推荐
2011/04/28 Javascript
JQuery动态创建DOM、表单元素的实现代码
2011/08/09 Javascript
Jquery中国地图热点效果-鼠标经过弹出提示层信息的简单实例
2014/02/12 Javascript
详解JavaScript中this关键字的用法
2016/05/26 Javascript
微信小程序 开发之快递查询功能的实现
2017/01/09 Javascript
angular和BootStrap3实现购物车功能
2017/01/25 Javascript
nodejs制作爬虫实现批量下载图片
2017/05/19 NodeJs
解决layui上传文件提示上传异常,实际文件已经上传成功的问题
2018/08/19 Javascript
js打开word文档预览操作示例【不是下载】
2019/05/23 Javascript
layer.confirm()右边按钮实现href的例子
2019/09/27 Javascript
JS实现长图上下滚动效果
2020/03/19 Javascript
[07:57]DOTA2热力大趴狂欢夜 广州站活动回顾
2013/11/27 DOTA
python正则表达式判断字符串是否是全部小写示例
2013/12/25 Python
python实现按任意键继续执行程序
2016/12/30 Python
python简单实现操作Mysql数据库
2018/01/29 Python
Python/Django后端使用PIL Image生成头像缩略图
2019/04/30 Python
Python从list类型、range()序列简单认识类(class)【可迭代】
2019/05/31 Python
python+numpy按行求一个二维数组的最大值方法
2019/07/09 Python
Python的bit_length函数来二进制的位数方法
2019/08/27 Python
pymysql模块的使用(增删改查)详解
2019/09/09 Python
django model object序列化实例
2020/03/13 Python
Django模板报TemplateDoesNotExist异常(亲测可行)
2020/12/18 Python
德国柯吉澳趣味家居:Koziol
2017/08/24 全球购物
主要的Ajax框架都有什么
2013/11/14 面试题
2016幼儿园中班开学寄语
2015/12/03 职场文书
pytorch 两个GPU同时训练的解决方案
2021/06/01 Python
win10系统xps文件怎么打开?win10打开xps文件的两种操作方法
2022/07/23 数码科技