Notes on using XPath in Selenium for automated information collection

1. Basic usage

Many tutorials on the Internet use the find_element_by_xpath method, but in the relatively new version of selenium webdriver I used, find_element_by_xpath can no longer be used. Use find_element(By.XPATH,'A XPATH Value') instead.

from selenium.webdriver.common.by import By

web_element = driver.find_element(By.XPATH,'A XPATH Value')

2. About xpath

(1) Copy an xpath from the web page

The naive XPath values used in this article are obtained as follows:
press F12 to open the developer tools,
click the element-selection icon in the upper left corner of the developer tools,
then click an element on the page; its location in the source code is shown on the right.
Right-click that location in the source code to copy an XPath. However, this kind of XPath is "not good": if the source code changes, the XPath is likely to change as well.

(2) Why does xpath look like this

First, do a simple analysis of the xpath just mentioned
//*[@id="root"]/div/main/div/div/div/div/div[2]/div/div[1]/div/div[1]/form/div[2]/div/label/input

username = self.driver.find_element(By.XPATH,'//*[@id="root"]/div/main/div/div/div/div/div[2]/div/div[1]/div/div[1]/form/div[2]/div/label/input').text

The XPath above, combined with the Python 3 statement above, can be read as follows. Search the page loaded in self.driver at any depth (because of the leading //) for an element of any tag name (because of *) whose id attribute has the value "root". From there, follow the path one level at a time: a div child, then a main child, then four nested div levels, then the second div child (div[2]), a div, the first div child (div[1]), a div, the first div child (div[1]), the form, its second div child (div[2]), a div, a label, and finally the input. A step without an index matches every child with that tag name, while [n] selects the n-th such child, counting from 1. Because the find_element method is used (more on this later), only the first element matching the whole path is returned, which here is the input tag.
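
The step-by-step reading above can be tried without a browser. The following is a minimal sketch that assumes Python's standard-library xml.etree.ElementTree as a stand-in for the page. ElementTree supports only a subset of XPath, and the shortened document and the name attributes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened page; the real page nests much deeper.
PAGE = """
<html>
  <body>
    <div id="root">
      <div>
        <main>
          <div><p>banner</p></div>
          <div>
            <form>
              <div><label><input name="ignored"/></label></div>
              <div><label><input name="username"/></label></div>
            </form>
          </div>
        </main>
      </div>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# .//*[@id='root']  any tag at any depth whose id is "root"
# /div/main         a div child, then a main child
# /div[2]           the second div child of main (indexes start at 1)
# /form/div[2]      the form, then its second div child
# /label/input      a label child, then an input child
path = ".//*[@id='root']/div/main/div[2]/form/div[2]/label/input"

# find() returns only the first match, much like Selenium's find_element.
element = root.find(path)
print(element.get("name"))
```

Changing div[2] to div[1] in either position makes the path end somewhere else, which is a quick way to see what each index does.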

3. Customize xpath according to your needs

(1) Multiple xpath methods can correspond to the same HTML tag

Once you roughly understand how XPath works, you can design an XPath according to your own needs.

① For example, the XPath above:
//*[@id="root"]/div/main/div/div/div/div/div[2]/div/div[1]/div/div[1]/form/div[2]/div/label/input
can be changed to:
//div[@id="root"]/div/main/div/div/div/div/div[2]/div/div[1]/div/div[1]/form/div[2]/div/label/input
This is only a small change: * (any tag) is replaced with div.

② Another example is the full XPath of the same tag:
/html/body/div[1]/div/main/div/div/div/div/div[2]/div/div[1]/div/div[1]/form/div[2]/div/label/input
Here /html/body/div[1] and //div[@id="root"] point to the same element, so they can replace each other.
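
That several expressions can lead to the same tag is easy to check on a small document. The sketch below again assumes xml.etree.ElementTree as a browser stand-in; the page is hypothetical, and because ElementTree paths are relative to the element they are called on, the browser path /html/body/div[1] is written as body/div[1].

```python
import xml.etree.ElementTree as ET

# Hypothetical page with the target input inside the first div of <body>.
PAGE = """
<html>
  <body>
    <div id="root"><form><label><input name="phone"/></label></form></div>
    <div id="footer"/>
  </body>
</html>
"""

html = ET.fromstring(PAGE)

candidates = [
    ".//*[@id='root']/form/label/input",    # any tag with id="root"
    ".//div[@id='root']/form/label/input",  # a div with id="root"
    "body/div[1]/form/label/input",         # positional path from <html>
]

found = [html.find(path) for path in candidates]

# All three expressions resolve to the very same element object.
same = all(element is found[0] for element in found)
print(same)
```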

(2) Obtain the custom xpath value according to the attribute information of the tag

Notice that the input tag above has a class attribute whose value is Input i7cW1UcwT6ThdhTakqFm username-input. Searching the console with ctrl+F for "Input i7cW1UcwT6ThdhTakqFm username-input" shows that only two tags have class="Input i7cW1UcwT6ThdhTakqFm username-input" (in other words, two input tags meet this condition). To get the input for the phone number, the corresponding XPath (used in the Python code below) can be designed like this:

phone_number = self.driver.find_element(By.XPATH,'XPATH')

//input[@class="Input i7cW1UcwT6ThdhTakqFm username-input"]
Or
//*[@class="Input i7cW1UcwT6ThdhTakqFm username-input"]
This means: within self.driver, find the first input tag whose class attribute is exactly "Input i7cW1UcwT6ThdhTakqFm username-input" (the condition goes in square brackets; do not leave out the @). In this case the tag name can also be written as * (any tag).
As mentioned above, the XPath copied from the console is generally not very good (in my opinion), because a change in the page source can change the XPath, whereas the attributes of a given tag usually do not change. Locating by attribute therefore makes the code more tolerant of changes in the page source.
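
The fault-tolerance argument can be illustrated with two versions of a page. This is a sketch under stated assumptions: xml.etree.ElementTree stands in for the browser, and both page versions are hypothetical (the "after" version wraps the form in an extra section tag).

```python
import xml.etree.ElementTree as ET

CLASS = "Input i7cW1UcwT6ThdhTakqFm username-input"

# Hypothetical "before" and "after" versions of the same page: the redesign
# adds a <section> wrapper, which shifts every positional step below it.
BEFORE = f'<body><div><form><input class="{CLASS}"/></form></div></body>'
AFTER = (
    f'<body><div><section><form><input class="{CLASS}"/></form>'
    "</section></div></body>"
)

positional = "div/form/input"                 # copied-style path
by_attribute = f".//input[@class='{CLASS}']"  # attribute-based path

results = {}
for label, page in (("before", BEFORE), ("after", AFTER)):
    body = ET.fromstring(page)
    results[label] = {
        "positional": body.find(positional) is not None,
        "attribute": body.find(by_attribute) is not None,
    }

print(results)
```

After the layout change only the attribute-based path still finds the input.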

(3) Get all web elements that meet a certain xpath condition

With an XPath customized from tag attributes as above, you can get all web elements that meet a certain XPath condition at one time. This is very useful when processing similar data in batches. In the example above, the verification code has to be entered after the mobile phone number. If you still used the find_element method with the full XPath, you might write Python statements like this:

phone_number = self.driver.find_element(By.XPATH,'//*[@id="root"]/div/main/div/div/div/div/div[2]/div/div[1]/div/div[1]/form/div[2]/div/label/input')
code = self.driver.find_element(By.XPATH,'//*[@id="root"]/div/main/div/div/div/div/div[2]/div/div[1]/div/div[1]/form/div[3]/div/label/input')

This requires copying an XPath value from the web page every time. With an XPath customized from tag attributes, you can write the following Python code instead:

input_list = self.driver.find_elements(By.XPATH,'//input[@class="Input i7cW1UcwT6ThdhTakqFm username-input"]')
phone_number = input_list[0]
code = input_list[1]

Here the find_elements method is used. Unlike find_element, which returns only the first match (equivalent to find_elements(...)[0] when at least one element matches), it returns a list of web elements. The code above means: input_list is a list of all web elements under self.driver that have class="Input i7cW1UcwT6ThdhTakqFm username-input".
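
One practical difference between the two methods shows up when nothing matches: find_element raises NoSuchElementException, while find_elements returns an empty list. The helper below is a small sketch built on that behaviour; the name first_or_none is hypothetical and not part of Selenium.

```python
def first_or_none(context, by, value):
    """Return the first matching web element, or None when nothing matches.

    `context` is anything with a find_elements(by, value) method, such as a
    Selenium driver or a web element. The helper name is hypothetical.
    """
    matches = context.find_elements(by, value)  # empty list when no match
    return matches[0] if matches else None


# Intended use with Selenium (not executed here):
#   from selenium.webdriver.common.by import By
#   xpath = '//input[@class="Input i7cW1UcwT6ThdhTakqFm username-input"]'
#   phone_number = first_or_none(driver, By.XPATH, xpath)
#   if phone_number is None:
#       print("the input is not on the page")
```

This avoids a try/except around find_element when a missing element is an expected case rather than an error.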

(4) Relative path + xpath value

Sometimes there is a requirement like this: for every answer posted by a user, you need to obtain the number of approvals, the title, the content, and so on. Of course, you can write the following code:

agree_list = self.driver.find_elements(By.XPATH,'//Button[@class="Button VoteButton VoteButton--up FEfUrdfMIKpQDJDqkjte"]/span')
titles = ....
content = ...
......
for index, agree in enumerate(agree_list):
	title = titles[index]
	....

But sometimes an answer may contain more than one element with class="Button VoteButton VoteButton--up FEfUrdfMIKpQDJDqkjte" (the agree button). To give a contrived example: what if another button inside the same answer also had class="Button VoteButton VoteButton--up FEfUrdfMIKpQDJDqkjte"? How would you solve that? (It actually does not; this is only an example.)

One way to approach it is shown below (this code sits inside a method of a class):

list_items = self.driver.find_elements(By.XPATH,'//div[@class="List-item"]')
for list_item in list_items :
	agree = list_item.find_element(By.XPATH,'.//Button[@class="Button VoteButton VoteButton--up FEfUrdfMIKpQDJDqkjte"]/span')
	...
	......

Note: in the code above, the leading // of the XPath is replaced with .//, and self.driver is replaced with another web element (list_item), so that only the button inside this list item is obtained.
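
The effect of searching relative to each list item can be sketched on a small document. The example assumes xml.etree.ElementTree as a stand-in for the page; the markup, the shortened class name VoteButton and the h2 titles are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical answer list; the class names are shortened for readability.
PAGE = """
<div>
  <div class="List-item">
    <h2>First</h2>
    <button class="VoteButton"><span>12</span></button>
  </div>
  <div class="List-item">
    <h2>Second</h2>
    <button class="VoteButton"><span>7</span></button>
  </div>
</div>
"""

root = ET.fromstring(PAGE)
answers = []
for list_item in root.findall(".//div[@class='List-item']"):
    # Both lookups start at list_item, so each answer keeps its own values.
    title = list_item.find("./h2").text
    agree = list_item.find(".//button[@class='VoteButton']/span").text
    answers.append((title, int(agree)))

print(answers)
```

In Selenium the leading dot matters: an XPath that starts with // searches the whole document even when it is called on a web element, so list_item.find_element(By.XPATH, '//...') would return the same first button for every list item.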

4. Explicitly wait for the element corresponding to xpath to appear

If the page has not finished loading because of a slow network, the Python script may call find_element or other methods too early, causing the program to report an error and exit abnormally. To prevent this, you can use the following statement to wait for the element corresponding to the XPath to appear:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(self.driver,10000).until(EC.presence_of_element_located((By.XPATH, 'XPATH')))

Here 10000 is the maximum waiting time, in seconds. If the element corresponding to the XPath still has not appeared when the timeout expires, the program will still report an error (a TimeoutException). Compared with using sleep, the advantage of this method is that the program continues as soon as the element corresponding to the XPath appears on the page, instead of still waiting after the element has appeared.
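
The difference from a fixed sleep comes from polling: the condition is checked repeatedly and the wait ends at the first success. The following is a plain-Python sketch of that idea, not Selenium's implementation; the function wait_until, its parameters and the simulated element are all assumptions made for illustration.

```python
import time


def wait_until(condition, timeout, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value as soon as it appears and raises TimeoutError once
    `timeout` seconds have passed. Name and defaults are assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Demo: the simulated element shows up on the third check.
checks = {"count": 0}


def element_appears():
    checks["count"] += 1
    return "element" if checks["count"] >= 3 else None


print(wait_until(element_appears, timeout=5, poll_interval=0.01))
```

The demo returns after three quick checks instead of using up the full five seconds, which is the behaviour the paragraph above describes for WebDriverWait.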

5. What to do when WEBELEMENT.click() cannot click an element that is not visible (not displayed in the current view)

You can use the following statement

self.driver.execute_script("arguments[0].click();", web_element_clickable)

Here web_element_clickable is a clickable web element.
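
A possible alternative, when the element is merely outside the current view, is to scroll it into view first and then click it normally. The sketch below is one way this might look; the helper name scroll_into_view_and_click is hypothetical, and it relies only on execute_script, click and the standard DOM method scrollIntoView.

```python
def scroll_into_view_and_click(driver, element):
    """Scroll `element` into the viewport, then click it normally.

    The helper name is hypothetical. `driver` needs execute_script() and
    `element` needs click(), as Selenium's driver and web elements do.
    """
    # Element.scrollIntoView is a standard DOM method; block: 'center'
    # places the element in the middle of the viewport.
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    element.click()


# Intended use with Selenium (not executed here):
#   scroll_into_view_and_click(self.driver, web_element_clickable)
```

Whether this works depends on the page; if something still covers the element, the JavaScript click shown above remains the fallback.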

6. The page does not load all the content at one time

Some pages do not load all their content at one time; you have to scroll down the page to load more. You can use the following Python statements. In this example, while the current page has fewer than 10 web elements matching the XPath //div[@class="List-item"], the page is scrolled down until the condition is met.

valid_answers_count = 10  # assumed target, taken from the description above
ask_answers = self.driver.find_elements(By.XPATH,'//div[@class="List-item"]')
ask_answers_count = len(ask_answers)
# Scroll down until the count is at least valid_answers_count
while ask_answers_count < valid_answers_count:
    # Scroll down by 100 pixels
    self.driver.execute_script("window.scrollBy(0, 100);")
    # Re-count the div elements with class "List-item"
    ask_answers = self.driver.find_elements(By.XPATH,'//div[@class="List-item"]')
    ask_answers_count = len(ask_answers)
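
If the page never reaches the required number of elements (for example, the user has fewer answers than expected), a loop like this would never end. The following sketch adds an upper limit on the number of scrolls; the helper name scroll_until_count and its default values are assumptions, not part of Selenium.

```python
def scroll_until_count(driver, by, value, minimum, max_scrolls=200, step=100):
    """Scroll down until at least `minimum` elements match, or give up.

    Returns the most recent list of matched elements. The helper name and
    the default limits are assumptions.
    """
    elements = driver.find_elements(by, value)
    scrolls = 0
    while len(elements) < minimum and scrolls < max_scrolls:
        # Pass the step size to the script as arguments[0].
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        scrolls += 1
        elements = driver.find_elements(by, value)
    return elements


# Intended use with Selenium (not executed here):
#   from selenium.webdriver.common.by import By
#   ask_answers = scroll_until_count(
#       self.driver, By.XPATH, '//div[@class="List-item"]', 10)
```

The caller can compare len(ask_answers) with the target afterwards to tell whether the limit was hit.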
       


Origin blog.csdn.net/weixin_52111404/article/details/129834733