The settings.py file in Scrapy, explained

Concurrency (default 16; can go up to 32 when the target site does not treat you as a crawler)
# Configure maximum concurrent requests performed by Scrapy (default: 16)
CONCURRENT_REQUESTS = 32
Download delay
DOWNLOAD_DELAY = 3  # seconds; generally worth setting, to avoid bringing down the target server
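With the default RANDOMIZE_DOWNLOAD_DELAY = True, Scrapy does not wait exactly DOWNLOAD_DELAY seconds between requests to the same site; it waits a random time between 0.5 and 1.5 times that value. A minimal sketch of that range follows. The helper randomized_delay is hypothetical, written for illustration, and is not a Scrapy function.

```python
import random

DOWNLOAD_DELAY = 3  # seconds, same value as in the setting above


def randomized_delay(base_delay: float, rng: random.Random) -> float:
    """Pick a wait between 0.5x and 1.5x the configured delay (illustrative helper)."""
    return rng.uniform(0.5 * base_delay, 1.5 * base_delay)


rng = random.Random(0)  # seeded only so the sketch is repeatable
delays = [randomized_delay(DOWNLOAD_DELAY, rng) for _ in range(5)]
print([round(d, 2) for d in delays])  # every value falls between 1.5 and 4.5
```

So DOWNLOAD_DELAY = 3 means roughly 1.5 to 4.5 seconds between requests, unless randomization is switched off.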
Concurrency per domain and per IP
# The download delay setting will honor only one of:
CONCURRENT_REQUESTS_PER_DOMAIN = 16
CONCURRENT_REQUESTS_PER_IP = 16
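"Only one of" means the two limits are not combined: when CONCURRENT_REQUESTS_PER_IP is non-zero, the per-domain limit is ignored, and both the concurrency limit and the download delay are applied per IP. A small sketch of that rule, using a hypothetical helper effective_slot_limit that is not part of Scrapy:

```python
from typing import Tuple


def effective_slot_limit(per_domain: int, per_ip: int) -> Tuple[str, int]:
    """Return what requests are grouped by, and the limit that applies (illustrative)."""
    if per_ip:  # a non-zero per-IP limit takes precedence
        return ("ip", per_ip)
    return ("domain", per_domain)


both_set = effective_slot_limit(per_domain=16, per_ip=16)
ip_off = effective_slot_limit(per_domain=16, per_ip=0)
print(both_set, ip_off)
```

With both values set to 16 as above, only the per-IP limit would take effect; leaving CONCURRENT_REQUESTS_PER_IP at 0 keeps the per-domain behavior.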
Cookies (by default, the cookies set by the server are used)
# Disable cookies (enabled by default)
COOKIES_ENABLED = False
Telnet console (enabled by default)
# Disable Telnet Console (enabled by default)
TELNETCONSOLE_ENABLED = False
Default request headers
# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
}
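These values are defaults: they only fill in headers that a request has not already set itself. A plain-dict sketch of that precedence follows. The helper apply_defaults is hypothetical, and real Scrapy header objects are case-insensitive, which an ordinary dict is not.

```python
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}


def apply_defaults(request_headers: dict, defaults: dict) -> dict:
    """Fill in missing headers without overwriting ones the request already has."""
    merged = dict(request_headers)
    for name, value in defaults.items():
        merged.setdefault(name, value)
    return merged


# A request that sets its own Accept-Language keeps it; Accept comes from the defaults.
merged = apply_defaults({"Accept-Language": "zh-CN"}, DEFAULT_REQUEST_HEADERS)
print(merged)
```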
Spider middlewares
# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
SPIDER_MIDDLEWARES = {
    'yangguang.middlewares.YangguangSpiderMiddleware': 543,
}
Downloader middlewares
# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    'yangguang.middlewares.YangguangDownloaderMiddleware': 543,
}
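The number next to each middleware is its order. The project dict is merged with Scrapy's built-in defaults, a value of None disables an entry, requests pass through the middlewares in increasing order, and responses come back in decreasing order. The sketch below imitates that resolution with plain dicts. BASE is a two-entry excerpt chosen for illustration (check the defaults of your Scrapy version), disabling UserAgentMiddleware is only an example, and resolve_order is a hypothetical helper.

```python
# Illustrative excerpt of built-in entries, not the full default table.
BASE = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
}

PROJECT = {
    "yangguang.middlewares.YangguangDownloaderMiddleware": 543,
    # None switches a built-in middleware off (example only).
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}


def resolve_order(base: dict, project: dict) -> list:
    """Merge both dicts, drop disabled entries, sort by order value."""
    merged = {**base, **project}
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


request_order = resolve_order(BASE, PROJECT)   # order in which requests are processed
response_order = list(reversed(request_order))  # responses travel the other way
print(request_order)
```

An order of 543 therefore places the project middleware after anything numbered below 543 on the way out, and before it on the way back.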
Extensions
# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
EXTENSIONS = {
    'scrapy.extensions.telnet.TelnetConsole': None,  # None disables the extension
}
Item pipelines
# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'yangguang.pipelines.YangguangPipeline': 300,
    'yangguang.pipelines.MongoPipeline': 301,
}
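Items pass through the pipelines from the lowest number to the highest (values are customarily in the 0-1000 range), and each process_item must return the item so the next stage receives it. The sketch below is not the article's YangguangPipeline or MongoPipeline, whose code is not shown here. The two classes are hypothetical stand-ins, the "storage" stage keeps items in a list instead of MongoDB, and run_pipelines only imitates the ordering.

```python
class StripTitlePipeline:
    """Hypothetical first stage (priority 300): cleans a field."""

    def process_item(self, item, spider):
        item["title"] = item["title"].strip()
        return item  # returning the item hands it to the next pipeline


class CollectPipeline:
    """Hypothetical storage stage (priority 301): keeps items in memory."""

    def __init__(self):
        self.stored = []

    def process_item(self, item, spider):
        self.stored.append(dict(item))
        return item


def run_pipelines(item, pipelines_by_priority, spider=None):
    """Apply pipelines in ascending priority order, as the numbers in the setting imply."""
    for _, pipeline in sorted(pipelines_by_priority, key=lambda pair: pair[0]):
        item = pipeline.process_item(item, spider)
    return item


collector = CollectPipeline()
result = run_pipelines(
    {"title": "  hello  "},
    [(301, collector), (300, StripTitlePipeline())],  # listed out of order on purpose
)
print(collector.stored)
```

Because 300 is lower than 301, the cleaning stage runs first and the storage stage sees the cleaned item.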
AutoThrottle (automatic rate limiting, to avoid crashing the target server)
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
AUTOTHROTTLE_ENABLED = True
# The initial download delay
AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
AUTOTHROTTLE_DEBUG = False
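How these values interact is easier to see in code. The Scrapy documentation describes the throttling rule roughly as follows: the target delay is the response latency divided by AUTOTHROTTLE_TARGET_CONCURRENCY, the next delay is the average of the previous delay and that target, non-200 responses may not lower the delay, and the result is kept between DOWNLOAD_DELAY and AUTOTHROTTLE_MAX_DELAY. The function below is a simplified sketch of that description, not the extension's actual code, and next_delay is a hypothetical name.

```python
def next_delay(prev_delay, latency, status,
               target_concurrency=1.0, min_delay=0.0, max_delay=60.0):
    """Simplified AutoThrottle-style update (illustrative, not Scrapy's implementation)."""
    target = latency / target_concurrency       # delay that would hit the target concurrency
    new = (prev_delay + target) / 2.0            # move halfway toward the target
    new = min(max(min_delay, new), max_delay)    # clamp to [DOWNLOAD_DELAY, MAX_DELAY]
    if status != 200 and new <= prev_delay:
        return prev_delay                        # error responses may not lower the delay
    return new


start = 5.0                                      # AUTOTHROTTLE_START_DELAY
after_fast_ok = next_delay(start, latency=1.0, status=200)
after_error = next_delay(after_fast_ok, latency=0.2, status=503)
after_slow = next_delay(start, latency=200.0, status=200)
print(after_fast_ok, after_error, after_slow)
```

A fast, successful response pulls the delay down from the start value, a quick error page leaves it unchanged, and a very slow response pushes it up only as far as AUTOTHROTTLE_MAX_DELAY.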

HTTP cache
# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 0
HTTPCACHE_DIR = 'httpcache'
HTTPCACHE_IGNORE_HTTP_CODES = []
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
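The value HTTPCACHE_EXPIRATION_SECS = 0 is easy to misread: zero means cached responses never expire, not that they expire immediately. A minimal sketch of that rule, with a hypothetical helper is_expired that is not part of Scrapy:

```python
def is_expired(age_secs: float, expiration_secs: int) -> bool:
    """A cached entry expires only if a positive limit is set and the entry is older than it."""
    return 0 < expiration_secs < age_secs


never_expires = is_expired(age_secs=10**9, expiration_secs=0)  # 0 disables expiry
stale = is_expired(age_secs=120, expiration_secs=60)
fresh = is_expired(age_secs=30, expiration_secs=60)
print(never_expires, stale, fresh)
```

With the settings above, a page fetched once is served from the httpcache directory on every later run until the cache is cleared by hand.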

Leadingme
