Scrapy framework issue: no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)

Category: Other, posted 2019-01-21 09:01:06

Today, while running a Scrapy spider, I ran into the following message:

2019-01-17 16:47:18 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://newhouse.fang.com/house/s/b95/> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2019-01-17 16:47:18 [scrapy.core.engine] INFO: Closing spider (finished)
2019-01-17 16:47:18 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 2653,
 'downloader/request_count': 7,
 'downloader/request_method_count/GET': 7,
 'downloader/response_bytes': 220568,
 'downloader/response_count': 7,
 'downloader/response_status_count/200': 7,
 'dupefilter/filtered': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 1, 17, 8, 47, 18, 37428),
 'log_count/DEBUG': 9,
 'log_count/INFO': 7,
 'request_depth_max': 7,
 'response_received_count': 7,
 'scheduler/dequeued': 7,
 'scheduler/dequeued/memory': 7,
 'scheduler/enqueued': 7,
 'scheduler/enqueued/memory': 7,
 'start_time': datetime.datetime(2019, 1, 17, 8, 47, 5, 279308)}
2019-01-17 16:47:18 [scrapy.core.engine] INFO: Spider closed (finished)
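
The first log line names a setting, DUPEFILTER_DEBUG. By default Scrapy logs only the first filtered duplicate. Enabling this setting makes it log every one, which helps trace where the repeated URLs come from. A minimal fragment, assuming it goes into the project's usual settings.py:

```python
# settings.py (assumed to be the project's settings module)
DUPEFILTER_DEBUG = True  # log every filtered duplicate, not only the first
```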

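Cause: the spider generated a duplicate link, that is, a duplicate request. The DEBUG message appears when a yield scrapy.Request(xxxurl, callback=self.xxxx) produces a request for a URL that has already been scheduled. Scrapy filters duplicate requests by default, so the repeated request is dropped. This is a DEBUG notice rather than an error. Note that dont_filter=True turns off deduplication for that request, so a spider that keeps yielding the same URL may loop. To let the request through, and so stop the message, add dont_filter=True to the Request: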

yield scrapy.Request(xxxurl,callback=self.xxxx,dont_filter=True)
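
To make the behaviour above more concrete, here is a toy model of a duplicate filter in plain Python. It is a sketch under stated assumptions, not Scrapy's actual implementation: the class SimpleDupeFilter, its fingerprint scheme (method plus URL), and the messages list are all hypothetical names made up for illustration. It shows why only the first duplicate is reported and how dont_filter=True bypasses the check.

```python
import hashlib


class SimpleDupeFilter:
    """Toy model of a request duplicate filter (not Scrapy's real class)."""

    def __init__(self, debug=False):
        self.seen = set()         # fingerprints of requests already scheduled
        self.debug = debug        # plays the role of DUPEFILTER_DEBUG
        self.logged_once = False  # whether the one-time notice was emitted
        self.messages = []        # collected log lines, instead of real logging

    def fingerprint(self, method, url):
        # Assumption: method + URL is enough to identify a request here.
        return hashlib.sha1(f"{method} {url}".encode("utf-8")).hexdigest()

    def request_seen(self, method, url, dont_filter=False):
        """Return True if the request should be dropped as a duplicate."""
        if dont_filter:
            return False          # bypass the filter entirely
        fp = self.fingerprint(method, url)
        if fp in self.seen:
            self._log(method, url)
            return True
        self.seen.add(fp)
        return False

    def _log(self, method, url):
        if self.debug:
            # Debug mode: report every duplicate.
            self.messages.append(f"Filtered duplicate request: <{method} {url}>")
        elif not self.logged_once:
            # Default mode: report the first duplicate only.
            self.messages.append(
                f"Filtered duplicate request: <{method} {url}>"
                " - no more duplicates will be shown"
            )
            self.logged_once = True


url = "https://newhouse.fang.com/house/s/b95/"
f = SimpleDupeFilter()
results = [
    f.request_seen("GET", url),                    # first time: scheduled
    f.request_seen("GET", url),                    # duplicate: dropped, logged
    f.request_seen("GET", url),                    # duplicate: dropped, silent
    f.request_seen("GET", url, dont_filter=True),  # bypasses the filter
]
print(results)
print(len(f.messages))
```

In this model the second and third calls are dropped but only one message is recorded, which mirrors the single DEBUG line in the log above. Before reaching for dont_filter=True, it may be worth checking whether the repeated URL is really meant to be fetched again.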

Reposted from blog.csdn.net/qq_40176258/article/details/86527568