-
Concurrency (default 16; can be raised to 32 when the target site does not treat the traffic as a crawler)
# Configure maximum concurrent requests performed by Scrapy (default: 16)
CONCURRENT_REQUESTS = 32
-
Download delay
DOWNLOAD_DELAY = 3  # generally worth setting, to avoid bringing down the target server
-
Per-domain and per-IP (proxy IP) concurrency
# The download delay setting will honor only one of:
CONCURRENT_REQUESTS_PER_DOMAIN = 16
CONCURRENT_REQUESTS_PER_IP = 16
-
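To make the "only one of" rule above concrete: according to Scrapy's settings documentation, a non-zero CONCURRENT_REQUESTS_PER_IP takes precedence, CONCURRENT_REQUESTS_PER_DOMAIN is then ignored, and the download delay is enforced per IP instead of per domain. The helper below is a minimal sketch in plain Python; the function name and the dict input are assumptions made for illustration, not part of Scrapy's API.

```python
def effective_slot_limit(settings):
    """Return (scope, limit) for the per-slot concurrency cap that applies.

    Hypothetical helper: it only mirrors the documented precedence rule.
    """
    per_ip = settings.get("CONCURRENT_REQUESTS_PER_IP", 0)
    per_domain = settings.get("CONCURRENT_REQUESTS_PER_DOMAIN", 8)
    if per_ip:  # non-zero per-IP limit wins; the per-domain limit is ignored
        return ("ip", per_ip)
    return ("domain", per_domain)


both_set = {"CONCURRENT_REQUESTS_PER_DOMAIN": 16, "CONCURRENT_REQUESTS_PER_IP": 16}
domain_only = {"CONCURRENT_REQUESTS_PER_DOMAIN": 16}

print(effective_slot_limit(both_set))
print(effective_slot_limit(domain_only))
```

With both values set as in this file, the per-IP limit is the one that should take effect; leaving CONCURRENT_REQUESTS_PER_IP at 0 falls back to the per-domain limit.
-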
Cookies (by default, the cookies sent by the server are used)
# Disable cookies (enabled by default)
COOKIES_ENABLED = False
-
Telnet (enabled by default)
# Disable Telnet Console (enabled by default)
TELNETCONSOLE_ENABLED = False
-
Default request headers
# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
}
-
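As far as I understand Scrapy's default-headers middleware, these defaults only fill in headers that a request has not set itself, so a header passed on an individual request wins. The sketch below imitates that precedence with plain dicts; `apply_default_headers` is a hypothetical name, and unlike real HTTP headers the dict keys here are case-sensitive.

```python
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}


def apply_default_headers(request_headers, defaults=DEFAULT_REQUEST_HEADERS):
    """Return a copy of request_headers with missing defaults filled in."""
    merged = dict(request_headers)
    for name, value in defaults.items():
        # setdefault keeps any value the request already carries
        merged.setdefault(name, value)
    return merged


headers = apply_default_headers({"Accept-Language": "zh-CN"})
print(headers)
```

Here the request keeps its own Accept-Language while Accept comes from the defaults.
-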
Spider middlewares
# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
SPIDER_MIDDLEWARES = {
    'yangguang.middlewares.YangguangSpiderMiddleware': 543,
}
-
Downloader middlewares
# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    'yangguang.middlewares.YangguangDownloaderMiddleware': 543,
}
-
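The number next to each middleware (543 above) is its order. Per the Scrapy documentation, the project setting is merged with the built-in base setting, entries set to None are disabled, and the rest are sorted: process_request() runs in increasing order and process_response() in decreasing order. The sketch below reproduces that merge-and-sort in plain Python; `enabled_in_order` is a hypothetical helper, and the two base entries are an illustrative excerpt whose order values should be checked against your Scrapy version.

```python
def enabled_in_order(base, custom):
    """Merge base and project middlewares, drop disabled ones, sort by order."""
    merged = {**base, **custom}  # project values override base values
    active = {path: order for path, order in merged.items() if order is not None}
    return sorted(active, key=active.get)


# Illustrative excerpt of the built-in defaults (assumed values).
base = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
}
custom = {
    "yangguang.middlewares.YangguangDownloaderMiddleware": 543,
    # None switches a built-in middleware off
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}

request_order = enabled_in_order(base, custom)
response_order = list(reversed(request_order))
print(request_order)
print(response_order)
```

With these values the project middleware (543) sees each request before the retry middleware (550) and sees each response after it.
-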
Extensions
# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
EXTENSIONS = {
    'scrapy.extensions.telnet.TelnetConsole': None,
}
-
Item pipelines
# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'yangguang.pipelines.YangguangPipeline': 300,
    'yangguang.pipelines.MongoPipeline': 301,
}
-
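Items pass through the pipelines in ascending order of these numbers, so the entry at 300 runs before the one at 301. The original pipeline classes are not shown in this post, so the sketch below uses two hypothetical stand-ins (a cleaning step and an in-memory store in place of MongoDB) and a hand-written loop instead of Scrapy's own machinery, purely to illustrate the ordering.

```python
class CleanTitlePipeline:
    """Hypothetical stand-in for a first-stage pipeline (order 300)."""

    def process_item(self, item, spider):
        item["title"] = item["title"].strip()
        return item


class InMemoryStorePipeline:
    """Hypothetical stand-in for a storage pipeline (order 301)."""

    def __init__(self):
        self.saved = []

    def process_item(self, item, spider):
        self.saved.append(dict(item))
        return item


store = InMemoryStorePipeline()
pipelines = {store: 301, CleanTitlePipeline(): 300}

item = {"title": "  hello  "}
# Run the pipelines in ascending order of their configured numbers.
for pipeline in sorted(pipelines, key=pipelines.get):
    item = pipeline.process_item(item, spider=None)

print(store.saved)
```

Because the cleaning step has the lower number, the store only ever sees the already-cleaned item.
-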
AutoThrottle (keeps the crawl from crashing the target server)
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
AUTOTHROTTLE_ENABLED = True
# The initial download delay
AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
AUTOTHROTTLE_DEBUG = False
-
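To show how the AutoThrottle values above interact, here is a minimal sketch of the adjustment rules as listed in the AutoThrottle documentation: the target delay is latency divided by the target concurrency, the next delay is the average of the previous delay and that target, non-200 responses may not lower the delay, and the result stays between DOWNLOAD_DELAY and AUTOTHROTTLE_MAX_DELAY. `next_delay` is a hypothetical function written for illustration; the real extension may differ in details.

```python
def next_delay(current_delay, latency, status=200,
               target_concurrency=1.0, min_delay=0.0, max_delay=60.0):
    """Return the download delay to use after one response (simplified)."""
    target_delay = latency / target_concurrency
    new_delay = (current_delay + target_delay) / 2.0
    # Clamp between DOWNLOAD_DELAY (min_delay) and AUTOTHROTTLE_MAX_DELAY.
    new_delay = min(max(min_delay, new_delay), max_delay)
    # Non-200 responses are not allowed to decrease the delay.
    if status != 200 and new_delay <= current_delay:
        return current_delay
    return new_delay


# Start from AUTOTHROTTLE_START_DELAY = 5 with a fast 1-second response.
print(next_delay(5.0, 1.0))
# The same latency on an error response leaves the delay unchanged.
print(next_delay(5.0, 1.0, status=503))
# A very slow response is capped at AUTOTHROTTLE_MAX_DELAY = 60.
print(next_delay(5.0, 200.0))
```

Passing `min_delay=3.0` would model the DOWNLOAD_DELAY = 3 set earlier in this file as the lower bound.
-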
HTTP cache
# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 0
HTTPCACHE_DIR = 'httpcache'
HTTPCACHE_IGNORE_HTTP_CODES = []
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
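-
One detail of the cache settings that is easy to misread: per the Scrapy documentation, HTTPCACHE_EXPIRATION_SECS = 0 means cached responses never expire, not that they expire immediately. The check below is a small sketch of that rule; `is_expired` is a hypothetical name and the timestamps are plain numbers of seconds.

```python
def is_expired(stored_at, now, expiration_secs):
    """Return True if a cached entry is older than the expiration window.

    An expiration of 0 disables expiry, so the entry is always kept.
    """
    age = now - stored_at
    return 0 < expiration_secs < age


print(is_expired(stored_at=0, now=10**9, expiration_secs=0))   # never expires
print(is_expired(stored_at=0, now=100, expiration_secs=60))    # older than 60 s
print(is_expired(stored_at=0, now=30, expiration_secs=60))     # still fresh
```

With the value 0 used in this file, a cached page would be reused until the cache directory is cleared by hand.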
Details of the settings.py file in Scrapy
Reposted from blog.csdn.net/weixin_43388615/article/details/105102704