详解scrapy内置中间件的顺序


Posted in Python onSeptember 28, 2020

1. 内置下载器中间件顺序

{'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware': 560,
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': 400,
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware': 350,
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300,
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 590,
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': 580,
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware': 600,
 'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': 100,
 'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500}

2. 内置爬虫中间件顺序

{'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
 'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': 500,
 'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware': 800}

3. 内置scrapy的settings

{'AJAXCRAWL_ENABLED': False,
 'AUTOTHROTTLE_DEBUG': False,
 'AUTOTHROTTLE_ENABLED': False,
 'AUTOTHROTTLE_MAX_DELAY': 60.0,
 'AUTOTHROTTLE_START_DELAY': 5.0,
 'AUTOTHROTTLE_TARGET_CONCURRENCY': 1.0,
 'BOT_NAME': 'scrapybot',
 'CLOSESPIDER_ERRORCOUNT': 0,
 'CLOSESPIDER_ITEMCOUNT': 0,
 'CLOSESPIDER_PAGECOUNT': 0,
 'CLOSESPIDER_TIMEOUT': 0,
 'COMMANDS_MODULE': '',
 'COMPRESSION_ENABLED': True,
 'CONCURRENT_ITEMS': 100,
 'CONCURRENT_REQUESTS': 16,
 'CONCURRENT_REQUESTS_PER_DOMAIN': 8,
 'CONCURRENT_REQUESTS_PER_IP': 0,
 'COOKIES_DEBUG': False,
 'COOKIES_ENABLED': True,
 'DEFAULT_ITEM_CLASS': 'scrapy.item.Item',
 'DEFAULT_REQUEST_HEADERS': {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
 'Accept-Language': 'en'},
 'DEPTH_LIMIT': 0,
 'DEPTH_PRIORITY': 0,
 'DEPTH_STATS_VERBOSE': False,
 'DNSCACHE_ENABLED': True,
 'DNSCACHE_SIZE': 10000,
 'DNS_TIMEOUT': 60,
 'DOWNLOADER': 'scrapy.core.downloader.Downloader',
 'DOWNLOADER_CLIENTCONTEXTFACTORY': 'scrapy.core.downloader.contextfactory.ScrapyClientContextFactory',
 'DOWNLOADER_CLIENT_TLS_METHOD': 'TLS',
 'DOWNLOADER_HTTPCLIENTFACTORY': 'scrapy.core.downloader.webclient.ScrapyHTTPClientFactory',
 'DOWNLOADER_MIDDLEWARES': {},
 'DOWNLOADER_MIDDLEWARES_BASE': {'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware': 560,
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': 400,
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware': 350,
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300,
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 590,
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': 580,
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware': 600,
 'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': 100,
 'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500},
 'DOWNLOADER_STATS': True,
 'DOWNLOAD_DELAY': 0,
 'DOWNLOAD_FAIL_ON_DATALOSS': True,
 'DOWNLOAD_HANDLERS': {},
 'DOWNLOAD_HANDLERS_BASE': {'data': 'scrapy.core.downloader.handlers.datauri.DataURIDownloadHandler',
 'file': 'scrapy.core.downloader.handlers.file.FileDownloadHandler',
 'ftp': 'scrapy.core.downloader.handlers.ftp.FTPDownloadHandler',
 'http': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
 'https': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
 's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler'},
 'DOWNLOAD_MAXSIZE': 1073741824,
 'DOWNLOAD_TIMEOUT': 180,
 'DOWNLOAD_WARNSIZE': 33554432,
 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
 'EDITOR': 'D:\\Program Files (x86)\\Notepad++\\notepad++.exe',
 'EXTENSIONS': {},
 'EXTENSIONS_BASE': {'scrapy.extensions.closespider.CloseSpider': 0,
 'scrapy.extensions.corestats.CoreStats': 0,
 'scrapy.extensions.feedexport.FeedExporter': 0,
 'scrapy.extensions.logstats.LogStats': 0,
 'scrapy.extensions.memdebug.MemoryDebugger': 0,
 'scrapy.extensions.memusage.MemoryUsage': 0,
 'scrapy.extensions.spiderstate.SpiderState': 0,
 'scrapy.extensions.telnet.TelnetConsole': 0,
 'scrapy.extensions.throttle.AutoThrottle': 0},
 'FEED_EXPORTERS': {},
 'FEED_EXPORTERS_BASE': {'csv': 'scrapy.exporters.CsvItemExporter',
 'jl': 'scrapy.exporters.JsonLinesItemExporter',
 'json': 'scrapy.exporters.JsonItemExporter',
 'jsonlines': 'scrapy.exporters.JsonLinesItemExporter',
 'marshal': 'scrapy.exporters.MarshalItemExporter',
 'pickle': 'scrapy.exporters.PickleItemExporter',
 'xml': 'scrapy.exporters.XmlItemExporter'},
 'FEED_EXPORT_ENCODING': None,
 'FEED_EXPORT_FIELDS': None,
 'FEED_EXPORT_INDENT': 0,
 'FEED_FORMAT': 'jsonlines',
 'FEED_STORAGES': {},
 'FEED_STORAGES_BASE': {'': 'scrapy.extensions.feedexport.FileFeedStorage',
 'file': 'scrapy.extensions.feedexport.FileFeedStorage',
 'ftp': 'scrapy.extensions.feedexport.FTPFeedStorage',
 's3': 'scrapy.extensions.feedexport.S3FeedStorage',
 'stdout': 'scrapy.extensions.feedexport.StdoutFeedStorage'},
 'FEED_STORE_EMPTY': False,
 'FEED_TEMPDIR': None,
 'FEED_URI': None,
 'FEED_URI_PARAMS': None,
 'FILES_STORE_GCS_ACL': '',
 'FILES_STORE_S3_ACL': 'private',
 'FTP_PASSIVE_MODE': True,
 'FTP_PASSWORD': 'guest',
 'FTP_USER': 'anonymous',
 'HTTPCACHE_ALWAYS_STORE': False,
 'HTTPCACHE_DBM_MODULE': 'dbm',
 'HTTPCACHE_DIR': 'httpcache',
 'HTTPCACHE_ENABLED': False,
 'HTTPCACHE_EXPIRATION_SECS': 0,
 'HTTPCACHE_GZIP': False,
 'HTTPCACHE_IGNORE_HTTP_CODES': [],
 'HTTPCACHE_IGNORE_MISSING': False,
 'HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS': [],
 'HTTPCACHE_IGNORE_SCHEMES': ['file'],
 'HTTPCACHE_POLICY': 'scrapy.extensions.httpcache.DummyPolicy',
 'HTTPCACHE_STORAGE': 'scrapy.extensions.httpcache.FilesystemCacheStorage',
 'HTTPPROXY_AUTH_ENCODING': 'latin-1',
 'HTTPPROXY_ENABLED': True,
 'IMAGES_STORE_GCS_ACL': '',
 'IMAGES_STORE_S3_ACL': 'private',
 'ITEM_PIPELINES': {},
 'ITEM_PIPELINES_BASE': {},
 'ITEM_PROCESSOR': 'scrapy.pipelines.ItemPipelineManager',
 'LOGSTATS_INTERVAL': 0,
 'LOG_DATEFORMAT': '%Y-%m-%d %H:%M:%S',
 'LOG_ENABLED': True,
 'LOG_ENCODING': 'utf-8',
 'LOG_FILE': None,
 'LOG_FORMAT': '%(asctime)s [%(name)s] %(levelname)s: %(message)s',
 'LOG_FORMATTER': 'scrapy.logformatter.LogFormatter',
 'LOG_LEVEL': 'DEBUG',
 'LOG_SHORT_NAMES': False,
 'LOG_STDOUT': False,
 'MAIL_FROM': 'scrapy@localhost',
 'MAIL_HOST': 'localhost',
 'MAIL_PASS': None,
 'MAIL_PORT': 25,
 'MAIL_USER': None,
 'MEMDEBUG_ENABLED': False,
 'MEMDEBUG_NOTIFY': [],
 'MEMUSAGE_CHECK_INTERVAL_SECONDS': 60.0,
 'MEMUSAGE_ENABLED': True,
 'MEMUSAGE_LIMIT_MB': 0,
 'MEMUSAGE_NOTIFY_MAIL': [],
 'MEMUSAGE_WARNING_MB': 0,
 'METAREFRESH_ENABLED': True,
 'METAREFRESH_MAXDELAY': 100,
 'NEWSPIDER_MODULE': '',
 'RANDOMIZE_DOWNLOAD_DELAY': True,
 'REACTOR_THREADPOOL_MAXSIZE': 10,
 'REDIRECT_ENABLED': True,
 'REDIRECT_MAX_TIMES': 20,
 'REDIRECT_PRIORITY_ADJUST': 2,
 'REFERER_ENABLED': True,
 'REFERRER_POLICY': 'scrapy.spidermiddlewares.referer.DefaultReferrerPolicy',
 'RETRY_ENABLED': True,
 'RETRY_HTTP_CODES': [500, 502, 503, 504, 522, 524, 408],
 'RETRY_PRIORITY_ADJUST': -1,
 'RETRY_TIMES': 2,
 'ROBOTSTXT_OBEY': False,
 'SCHEDULER': 'scrapy.core.scheduler.Scheduler',
 'SCHEDULER_DEBUG': False,
 'SCHEDULER_DISK_QUEUE': 'scrapy.squeues.PickleLifoDiskQueue',
 'SCHEDULER_MEMORY_QUEUE': 'scrapy.squeues.LifoMemoryQueue',
 'SCHEDULER_PRIORITY_QUEUE': 'queuelib.PriorityQueue',
 'SPIDER_CONTRACTS': {},
 'SPIDER_CONTRACTS_BASE': {'scrapy.contracts.default.ReturnsContract': 2,
 'scrapy.contracts.default.ScrapesContract': 3,
 'scrapy.contracts.default.UrlContract': 1},
 'SPIDER_LOADER_CLASS': 'scrapy.spiderloader.SpiderLoader',
 'SPIDER_LOADER_WARN_ONLY': False,
 'SPIDER_MIDDLEWARES': {},
 'SPIDER_MIDDLEWARES_BASE': {'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
 'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': 500,
 'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware': 800},
 'SPIDER_MODULES': [],
 'STATSMAILER_RCPTS': [],
 'STATS_CLASS': 'scrapy.statscollectors.MemoryStatsCollector',
 'STATS_DUMP': True,
 'TELNETCONSOLE_ENABLED': 1,
 'TELNETCONSOLE_HOST': '127.0.0.1',
 'TELNETCONSOLE_PASSWORD': None,
 'TELNETCONSOLE_PORT': [6023, 6073],
 'TELNETCONSOLE_USERNAME': 'scrapy',
 'TEMPLATES_DIR': 'd:\\python36\\lib\\site-packages\\scrapy\\templates',
 'URLLENGTH_LIMIT': 2083,
 'USER_AGENT': 'Scrapy/1.6.0 (+https://scrapy.org)',
 'KEEP_ALIVE': True}

到此这篇关于详解scrapy内置中间件的顺序的文章就介绍到这了,更多相关scrapy 中间件顺序内容请搜索三水点靠木以前的文章或继续浏览下面的相关文章希望大家以后多多支持三水点靠木!

Python 相关文章推荐
python中函数默认值使用注意点详解
Jun 01 Python
python模块简介之有序字典(OrderedDict)
Dec 01 Python
解决Python3.5+OpenCV3.2读取图像的问题
Dec 05 Python
Python文件读写常见用法总结
Feb 22 Python
Python 仅获取响应头, 不获取实体的实例
Aug 21 Python
Django将默认的SQLite更换为MySQL的实现
Nov 18 Python
Python pandas库中的isnull()详解
Dec 26 Python
python打开音乐文件的实例方法
Jul 21 Python
Python hashlib模块的使用示例
Oct 09 Python
python爬虫scrapy框架的梨视频案例解析
Feb 20 Python
python3实现常见的排序算法(示例代码)
Jul 04 Python
梳理总结Python开发中需要摒弃的18个坏习惯
Jan 22 Python
Python爬虫代理池搭建的方法步骤
Sep 28 #Python
浅析python 通⽤爬⾍和聚焦爬⾍
Sep 28 #Python
Scrapy 配置动态代理IP的实现
Sep 28 #Python
Scrapy中如何向Spider传入参数的方法实现
Sep 28 #Python
详解向scrapy中的spider传递参数的几种方法(2种)
Sep 28 #Python
小结Python的反射机制
Sep 28 #Python
scrapy与selenium结合爬取数据(爬取动态网站)的示例代码
Sep 28 #Python
You might like
《星际争霸II》全新指挥官斯台特曼现已上线
2020/03/08 星际争霸
第六节 访问属性和方法 [6]
2006/10/09 PHP
如何取得中文字符串中出现次数最多的子串
2013/08/08 PHP
php过滤敏感词的示例
2014/03/31 PHP
Laravel的Auth验证Token验证使用自定义Redis的例子
2019/09/30 PHP
JavaScript建立一个语法高亮输入框实现思路
2013/02/26 Javascript
iframe子父页面调用js函数示例
2013/11/07 Javascript
Javascript模块化编程详解
2014/12/01 Javascript
Javascript动态创建div的方法
2015/02/09 Javascript
jQuery实现动态表单验证时文本框抖动效果完整实例
2015/08/21 Javascript
js实现的星星评分功能函数
2015/12/09 Javascript
jQuery中bind(),live(),delegate(),on()绑定事件方法实例详解
2016/01/19 Javascript
JavaScript function函数种类详解
2016/02/22 Javascript
JavaScript实现格式化字符串函数String.format
2016/12/16 Javascript
基于react组件之间的参数传递(详解)
2017/09/05 Javascript
jquery ajaxfileupload异步上传插件
2017/11/21 jQuery
利用VS Code开发你的第一个AngularJS 2应用程序
2017/12/15 Javascript
angularjs 获取默认选中的单选按钮的value方法
2018/02/28 Javascript
JS简单实现查看文档创建日期、修改日期和文档大小的方法示例
2018/04/08 Javascript
vue.js中ref和$refs的使用及示例讲解
2019/08/14 Javascript
[45:46]2014 DOTA2国际邀请赛中国区预选赛5.21 HGT VS DT
2014/05/23 DOTA
python 实现自动远程登陆scp文件实例代码
2017/03/13 Python
Python函数的参数常见分类与用法实例详解
2019/03/30 Python
python学习--使用QQ邮箱发送邮件代码实例
2019/04/16 Python
python实现信号时域统计特征提取代码
2020/02/26 Python
如何解决pycharm调试报错的问题
2020/08/06 Python
使用jTopo给Html5 Canva中绘制的元素添加鼠标事件
2014/05/15 HTML / CSS
HTML5 Canvas中绘制椭圆的4种方法
2015/04/24 HTML / CSS
高二地理教学反思
2014/01/24 职场文书
不错的求职信范文
2014/07/20 职场文书
企业领导对照检查材料
2014/08/20 职场文书
教学改革问题查摆整改措施
2014/09/27 职场文书
关于工作经历的证明书
2014/10/11 职场文书
办公室务虚会发言材料
2014/10/20 职场文书
幼儿园学前班幼儿评语
2014/12/29 职场文书
党风廉洁教育心得体会
2016/01/20 职场文书