Python Scrapy basics

No more chatter, let's get straight to it!

Brief introduction to Scrapy

Scrapy is an application framework written for crawling websites and extracting structured data. It can be used for a range of tasks, including data mining, information processing, and archiving historical data.

Install library

pip install scrapy
pip install pypiwin32

(pypiwin32 is only needed on Windows.)

Create project

Run the following command:

scrapy startproject <project_name>

After the project is created, the following directory structure appears:

|--myspider/                project root
    |--scrapy.cfg           project configuration file
    |--myspider/            spider development module
        |--spiders/         directory holding the spider programs
            |---demo
        |--items.py         model classes defining the scraped data
        |--pipelines.py     validates and stores the data after scraping
        |--middlewares.py   middleware definitions
        |--settings.py      project settings

Start the spider with:

scrapy crawl baidu.com

This starts the project. Note that `scrapy crawl` takes the spider's `name` attribute ('baidu.com' below), not the file name.

Below is the code for baidu_com.py, created under the spiders folder:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.crawler import CrawlerProcess


class BaiduComSpider(scrapy.Spider):
    name = 'baidu.com'
    allowed_domains = ['www.baidu.com']
    start_urls = ['https://www.baidu.com/']

    def parse(self, response):
        # Yield the page title as a scraped item
        yield {
            'title': response.xpath('//title/text()').extract_first()
        }


# Create a CrawlerProcess object (settings can be passed in the parentheses)
process = CrawlerProcess()

process.crawl(BaiduComSpider)
process.start()

The effect is shown below (screenshot of the crawl output omitted).

Origin blog.csdn.net/weixin_37254196/article/details/108233334