Scrapy: selectors

Scrapy has its own mechanism for extracting data, called selectors. A selector picks out part of an HTML document using an XPath or CSS expression.

XPath is a language for selecting nodes in XML documents; it can also be used on HTML.

CSS is a language for styling HTML documents. It defines selectors that associate styles with particular HTML elements.

 

Commonly used XPath expressions:

nodename         selects all nodes with the name "nodename"

/                selects from the root node

//               selects matching nodes anywhere in the document below the current node, regardless of their position

.                selects the current node

..               selects the parent of the current node

@                selects attributes

*                matches any element node

@*               matches any attribute node

node()           matches a node of any kind

  

Commonly used CSS selectors (selector, example, meaning):

.class                 .color              selects all elements with class="color"

#id                    #info               selects the element with id="info"

*                      *                   selects all elements

element                p                   selects all p elements

element, element       div, p              selects all div elements and all p elements

element element        div p               selects all p elements inside div elements

[attribute]            [target]            selects all elements that have a target attribute

[attribute=value]      [target=_blank]     selects all elements with target="_blank"

 

 

 

XPath selectors and CSS selectors compared

The extraction methods are the same for both kinds of selector:

.extract()                             returns all matching elements as a list

.extract_first()                       returns the first matching element as a string, or None if nothing matches

.extract_first(default='not-found')    returns the given value instead of None if nothing matches, for example response.xpath('//title/text()').extract_first(default='not-found')

Get the text (each pair below gives the XPath form first, then the CSS equivalent)

response.xpath('//title/text()')

response.css('title::text')

Get an attribute

response.xpath('//base/@href')

response.css('base::attr(href)')

Get the href attribute of every a tag whose href contains "image"

response.xpath('//a[contains(@href, "image")]/@href')

response.css('a[href*=image]::attr(href)')

Get the src attribute of the img tags inside those a tags

response.xpath('//a[contains(@href, "image")]/img/@src')

response.css('a[href*=image] img::attr(src)')

Note the space before img.

Selectors can also be combined with regular expressions: re() returns a list, and re_first() returns the first matching string.

response.xpath('//a/text()').re(r'Name:\s*(.*)')

response.css('a::text').re(r'Name:\s*(.*)')

 

 


Origin www.cnblogs.com/lanston1/p/11894433.html