A Detailed Look at the Order of Scrapy's Built-in Middlewares


Posted in Python on September 28, 2020

1. Order of the built-in downloader middlewares

{'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware': 560,
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': 400,
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware': 350,
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300,
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 590,
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': 580,
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware': 600,
 'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': 100,
 'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500}
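
The numbers are priorities, not an alphabetical listing. Scrapy sorts the enabled downloader middlewares by value and calls process_request() in increasing order (from the engine toward the downloader) and process_response() in decreasing order. The sketch below uses plain Python and a subset of the dictionary above to make that order visible; the helper and variable names are illustrative and not part of Scrapy.

```python
# Subset of the built-in priorities listed above (values copied from the dict).
DOWNLOADER_MIDDLEWARES_BASE = {
    "scrapy.downloadermiddlewares.cookies.CookiesMiddleware": 700,
    "scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware": 300,
    "scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware": 900,
    "scrapy.downloadermiddlewares.redirect.RedirectMiddleware": 600,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
    "scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware": 100,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
}


def short_name(path):
    """Return only the class name from a dotted import path (illustrative helper)."""
    return path.rsplit(".", 1)[-1]


# Ascending priority: the order in which process_request() is called.
request_order = [
    short_name(path)
    for path, _ in sorted(DOWNLOADER_MIDDLEWARES_BASE.items(), key=lambda kv: kv[1])
]

# Descending priority: the order in which process_response() is called.
response_order = list(reversed(request_order))

print("process_request :", " -> ".join(request_order))
print("process_response:", " -> ".join(response_order))
```

In this subset, RobotsTxtMiddleware (100) sees a request first and HttpCacheMiddleware (900) sees it last; responses travel the opposite way.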

2. Order of the built-in spider middlewares

{'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
 'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': 500,
 'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware': 800}
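
A project's SPIDER_MIDDLEWARES setting is merged with these base values, and a middleware whose value is None is disabled. The following is a minimal sketch of that merge in plain Python, not Scrapy's actual implementation; the custom middleware path and its priority of 543 are hypothetical.

```python
# Built-in spider middleware priorities, copied from the dict above.
SPIDER_MIDDLEWARES_BASE = {
    "scrapy.spidermiddlewares.depth.DepthMiddleware": 900,
    "scrapy.spidermiddlewares.httperror.HttpErrorMiddleware": 50,
    "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": 500,
    "scrapy.spidermiddlewares.referer.RefererMiddleware": 700,
    "scrapy.spidermiddlewares.urllength.UrlLengthMiddleware": 800,
}

# Hypothetical project-level setting: add one middleware, disable another.
SPIDER_MIDDLEWARES = {
    "myproject.middlewares.CustomSpiderMiddleware": 543,  # hypothetical class
    "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": None,  # None disables it
}


def effective_order(base, custom):
    """Merge two priority dicts, drop disabled entries, sort by priority."""
    merged = {**base, **custom}  # project values override base values
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


for path in effective_order(SPIDER_MIDDLEWARES_BASE, SPIDER_MIDDLEWARES):
    print(path)
```

The resulting list is the order in which process_spider_input() runs; process_spider_output() runs in the reverse order.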

3. Scrapy's built-in default settings

{'AJAXCRAWL_ENABLED': False,
 'AUTOTHROTTLE_DEBUG': False,
 'AUTOTHROTTLE_ENABLED': False,
 'AUTOTHROTTLE_MAX_DELAY': 60.0,
 'AUTOTHROTTLE_START_DELAY': 5.0,
 'AUTOTHROTTLE_TARGET_CONCURRENCY': 1.0,
 'BOT_NAME': 'scrapybot',
 'CLOSESPIDER_ERRORCOUNT': 0,
 'CLOSESPIDER_ITEMCOUNT': 0,
 'CLOSESPIDER_PAGECOUNT': 0,
 'CLOSESPIDER_TIMEOUT': 0,
 'COMMANDS_MODULE': '',
 'COMPRESSION_ENABLED': True,
 'CONCURRENT_ITEMS': 100,
 'CONCURRENT_REQUESTS': 16,
 'CONCURRENT_REQUESTS_PER_DOMAIN': 8,
 'CONCURRENT_REQUESTS_PER_IP': 0,
 'COOKIES_DEBUG': False,
 'COOKIES_ENABLED': True,
 'DEFAULT_ITEM_CLASS': 'scrapy.item.Item',
 'DEFAULT_REQUEST_HEADERS': {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
 'Accept-Language': 'en'},
 'DEPTH_LIMIT': 0,
 'DEPTH_PRIORITY': 0,
 'DEPTH_STATS_VERBOSE': False,
 'DNSCACHE_ENABLED': True,
 'DNSCACHE_SIZE': 10000,
 'DNS_TIMEOUT': 60,
 'DOWNLOADER': 'scrapy.core.downloader.Downloader',
 'DOWNLOADER_CLIENTCONTEXTFACTORY': 'scrapy.core.downloader.contextfactory.ScrapyClientContextFactory',
 'DOWNLOADER_CLIENT_TLS_METHOD': 'TLS',
 'DOWNLOADER_HTTPCLIENTFACTORY': 'scrapy.core.downloader.webclient.ScrapyHTTPClientFactory',
 'DOWNLOADER_MIDDLEWARES': {},
 'DOWNLOADER_MIDDLEWARES_BASE': {'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware': 560,
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': 400,
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware': 350,
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300,
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 590,
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': 580,
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware': 600,
 'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': 100,
 'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500},
 'DOWNLOADER_STATS': True,
 'DOWNLOAD_DELAY': 0,
 'DOWNLOAD_FAIL_ON_DATALOSS': True,
 'DOWNLOAD_HANDLERS': {},
 'DOWNLOAD_HANDLERS_BASE': {'data': 'scrapy.core.downloader.handlers.datauri.DataURIDownloadHandler',
 'file': 'scrapy.core.downloader.handlers.file.FileDownloadHandler',
 'ftp': 'scrapy.core.downloader.handlers.ftp.FTPDownloadHandler',
 'http': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
 'https': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
 's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler'},
 'DOWNLOAD_MAXSIZE': 1073741824,
 'DOWNLOAD_TIMEOUT': 180,
 'DOWNLOAD_WARNSIZE': 33554432,
 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
 'EDITOR': 'D:\\Program Files (x86)\\Notepad++\\notepad++.exe',
 'EXTENSIONS': {},
 'EXTENSIONS_BASE': {'scrapy.extensions.closespider.CloseSpider': 0,
 'scrapy.extensions.corestats.CoreStats': 0,
 'scrapy.extensions.feedexport.FeedExporter': 0,
 'scrapy.extensions.logstats.LogStats': 0,
 'scrapy.extensions.memdebug.MemoryDebugger': 0,
 'scrapy.extensions.memusage.MemoryUsage': 0,
 'scrapy.extensions.spiderstate.SpiderState': 0,
 'scrapy.extensions.telnet.TelnetConsole': 0,
 'scrapy.extensions.throttle.AutoThrottle': 0},
 'FEED_EXPORTERS': {},
 'FEED_EXPORTERS_BASE': {'csv': 'scrapy.exporters.CsvItemExporter',
 'jl': 'scrapy.exporters.JsonLinesItemExporter',
 'json': 'scrapy.exporters.JsonItemExporter',
 'jsonlines': 'scrapy.exporters.JsonLinesItemExporter',
 'marshal': 'scrapy.exporters.MarshalItemExporter',
 'pickle': 'scrapy.exporters.PickleItemExporter',
 'xml': 'scrapy.exporters.XmlItemExporter'},
 'FEED_EXPORT_ENCODING': None,
 'FEED_EXPORT_FIELDS': None,
 'FEED_EXPORT_INDENT': 0,
 'FEED_FORMAT': 'jsonlines',
 'FEED_STORAGES': {},
 'FEED_STORAGES_BASE': {'': 'scrapy.extensions.feedexport.FileFeedStorage',
 'file': 'scrapy.extensions.feedexport.FileFeedStorage',
 'ftp': 'scrapy.extensions.feedexport.FTPFeedStorage',
 's3': 'scrapy.extensions.feedexport.S3FeedStorage',
 'stdout': 'scrapy.extensions.feedexport.StdoutFeedStorage'},
 'FEED_STORE_EMPTY': False,
 'FEED_TEMPDIR': None,
 'FEED_URI': None,
 'FEED_URI_PARAMS': None,
 'FILES_STORE_GCS_ACL': '',
 'FILES_STORE_S3_ACL': 'private',
 'FTP_PASSIVE_MODE': True,
 'FTP_PASSWORD': 'guest',
 'FTP_USER': 'anonymous',
 'HTTPCACHE_ALWAYS_STORE': False,
 'HTTPCACHE_DBM_MODULE': 'dbm',
 'HTTPCACHE_DIR': 'httpcache',
 'HTTPCACHE_ENABLED': False,
 'HTTPCACHE_EXPIRATION_SECS': 0,
 'HTTPCACHE_GZIP': False,
 'HTTPCACHE_IGNORE_HTTP_CODES': [],
 'HTTPCACHE_IGNORE_MISSING': False,
 'HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS': [],
 'HTTPCACHE_IGNORE_SCHEMES': ['file'],
 'HTTPCACHE_POLICY': 'scrapy.extensions.httpcache.DummyPolicy',
 'HTTPCACHE_STORAGE': 'scrapy.extensions.httpcache.FilesystemCacheStorage',
 'HTTPPROXY_AUTH_ENCODING': 'latin-1',
 'HTTPPROXY_ENABLED': True,
 'IMAGES_STORE_GCS_ACL': '',
 'IMAGES_STORE_S3_ACL': 'private',
 'ITEM_PIPELINES': {},
 'ITEM_PIPELINES_BASE': {},
 'ITEM_PROCESSOR': 'scrapy.pipelines.ItemPipelineManager',
 'LOGSTATS_INTERVAL': 0,
 'LOG_DATEFORMAT': '%Y-%m-%d %H:%M:%S',
 'LOG_ENABLED': True,
 'LOG_ENCODING': 'utf-8',
 'LOG_FILE': None,
 'LOG_FORMAT': '%(asctime)s [%(name)s] %(levelname)s: %(message)s',
 'LOG_FORMATTER': 'scrapy.logformatter.LogFormatter',
 'LOG_LEVEL': 'DEBUG',
 'LOG_SHORT_NAMES': False,
 'LOG_STDOUT': False,
 'MAIL_FROM': 'scrapy@localhost',
 'MAIL_HOST': 'localhost',
 'MAIL_PASS': None,
 'MAIL_PORT': 25,
 'MAIL_USER': None,
 'MEMDEBUG_ENABLED': False,
 'MEMDEBUG_NOTIFY': [],
 'MEMUSAGE_CHECK_INTERVAL_SECONDS': 60.0,
 'MEMUSAGE_ENABLED': True,
 'MEMUSAGE_LIMIT_MB': 0,
 'MEMUSAGE_NOTIFY_MAIL': [],
 'MEMUSAGE_WARNING_MB': 0,
 'METAREFRESH_ENABLED': True,
 'METAREFRESH_MAXDELAY': 100,
 'NEWSPIDER_MODULE': '',
 'RANDOMIZE_DOWNLOAD_DELAY': True,
 'REACTOR_THREADPOOL_MAXSIZE': 10,
 'REDIRECT_ENABLED': True,
 'REDIRECT_MAX_TIMES': 20,
 'REDIRECT_PRIORITY_ADJUST': 2,
 'REFERER_ENABLED': True,
 'REFERRER_POLICY': 'scrapy.spidermiddlewares.referer.DefaultReferrerPolicy',
 'RETRY_ENABLED': True,
 'RETRY_HTTP_CODES': [500, 502, 503, 504, 522, 524, 408],
 'RETRY_PRIORITY_ADJUST': -1,
 'RETRY_TIMES': 2,
 'ROBOTSTXT_OBEY': False,
 'SCHEDULER': 'scrapy.core.scheduler.Scheduler',
 'SCHEDULER_DEBUG': False,
 'SCHEDULER_DISK_QUEUE': 'scrapy.squeues.PickleLifoDiskQueue',
 'SCHEDULER_MEMORY_QUEUE': 'scrapy.squeues.LifoMemoryQueue',
 'SCHEDULER_PRIORITY_QUEUE': 'queuelib.PriorityQueue',
 'SPIDER_CONTRACTS': {},
 'SPIDER_CONTRACTS_BASE': {'scrapy.contracts.default.ReturnsContract': 2,
 'scrapy.contracts.default.ScrapesContract': 3,
 'scrapy.contracts.default.UrlContract': 1},
 'SPIDER_LOADER_CLASS': 'scrapy.spiderloader.SpiderLoader',
 'SPIDER_LOADER_WARN_ONLY': False,
 'SPIDER_MIDDLEWARES': {},
 'SPIDER_MIDDLEWARES_BASE': {'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
 'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': 500,
 'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware': 800},
 'SPIDER_MODULES': [],
 'STATSMAILER_RCPTS': [],
 'STATS_CLASS': 'scrapy.statscollectors.MemoryStatsCollector',
 'STATS_DUMP': True,
 'TELNETCONSOLE_ENABLED': 1,
 'TELNETCONSOLE_HOST': '127.0.0.1',
 'TELNETCONSOLE_PASSWORD': None,
 'TELNETCONSOLE_PORT': [6023, 6073],
 'TELNETCONSOLE_USERNAME': 'scrapy',
 'TEMPLATES_DIR': 'd:\\python36\\lib\\site-packages\\scrapy\\templates',
 'URLLENGTH_LIMIT': 2083,
 'USER_AGENT': 'Scrapy/1.6.0 (+https://scrapy.org)',
 'KEEP_ALIVE': True}
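
Everything in this dump is a default. A project overrides individual entries by assigning the same names in its settings.py. The fragment below is only a sketch with illustrative values, not recommendations; the bot name is hypothetical.

```python
# settings.py (illustrative overrides of the defaults dumped above)

BOT_NAME = "mybot"        # hypothetical name; the default above is 'scrapybot'
ROBOTSTXT_OBEY = True     # the default above is False
CONCURRENT_REQUESTS = 8   # the default above is 16
DOWNLOAD_DELAY = 1.0      # seconds between requests; the default above is 0
RETRY_TIMES = 3           # the default above is 2
```

Any setting not assigned in settings.py keeps the default value shown in the dump.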
