Goals
When a webpage lists a series of articles and you want to collect all of them, you can automate the work with selenium.
Method
Right-click on the webpage -> Inspect -> find the element that contains the target article block, as in the figure below.
Then right-click that element in the code panel and select Copy -> Copy full XPath (this gives /html/body/div[6]/main/div[2]/div[2]).
Then pass it to selenium (in Selenium 4, By is imported from selenium.webdriver.common.by):
driver.find_element(By.XPATH, '/html/body/div[6]/main/div[2]/div[2]')
Key point
Note:
This alone is not enough, because the XPath copied here is only the parent node of all the articles. Searching for it with find_elements and calling len() on the result gives 1, so only the first article can be accessed automatically and the ones after it cannot. To access the following articles, append /div to the copied XPath, which selects all the div child nodes of the parent node; len() then gives the total number of articles. The resulting XPath is /html/body/div[6]/main/div[2]/div[2]/div, and with it you can get all the remaining articles.
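To see the difference without opening a browser, here is a minimal sketch using Python's standard xml.etree.ElementTree, which supports a small subset of XPath. The PAGE markup and the shortened path are invented for illustration and are not the real page; the path stands in for the copied full XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real page: one container div holding three articles.
PAGE = """
<html><body>
  <div>header</div>
  <div>
    <main>
      <div>sidebar</div>
      <div>
        <div>toolbar</div>
        <div>
          <div><a href="/article/1">first</a></div>
          <div><a href="/article/2">second</a></div>
          <div><a href="/article/3">third</a></div>
        </div>
      </div>
    </main>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)  # root is the <html> element

# ElementTree paths are relative to the root, so the leading /html is dropped.
PARENT = "body/div[2]/main/div[2]/div[2]"

parents = root.findall(PARENT)            # the container itself
articles = root.findall(PARENT + "/div")  # every div child of the container
hrefs = [div.find("a").get("href") for div in articles]

print("parent matches:", len(parents))
print("article matches:", len(articles))
print(hrefs)
```

The parent path matches a single node, while the path ending in /div matches one node per article, which is what the loop over articles needs.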
Code:
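from selenium import webdriver
from selenium.webdriver.common.by import By

# XPath that matches every article block (the div children of the parent node)
ARTICLES_XPATH = '/html/body/div[6]/main/div[2]/div[2]/div'

def get_all_article(driver):
    """Open every article on the current page in its own window and return the count."""
    all_articles = driver.find_elements(By.XPATH, ARTICLES_XPATH)
    print('length of article:', len(all_articles))
    # Read the links first: element references are only valid in the list page's window
    hrefs = [a.find_element(By.TAG_NAME, 'a').get_attribute('href') for a in all_articles]
    list_window = driver.current_window_handle
    for href in hrefs:
        known_handles = set(driver.window_handles)
        driver.execute_script('window.open(arguments[0]);', href)
        # Switch to the window that was just opened
        for handle in driver.window_handles:
            if handle not in known_handles:
                driver.switch_to.window(handle)
                break
        # ... work with the article page here ...
        driver.switch_to.window(list_window)  # go back to the list page
    return len(hrefs)

# Usage: create a driver (for example webdriver.Chrome()), open the list page
# with driver.get(...), then call get_all_article(driver).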
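When several windows are already open, comparing each handle only against the current one can select an older window instead of the one just opened. A minimal sketch of a safer selection, written as a plain function so it can be tried without a browser; pick_new_handle and the handle strings are hypothetical names and are not part of selenium.

```python
def pick_new_handle(handles_before, handles_after):
    """Return the handle that appears only in handles_after, or None if there is none."""
    known = set(handles_before)
    for handle in handles_after:
        if handle not in known:
            return handle
    return None


# Made-up handle values; real ones are opaque strings from driver.window_handles.
before = ["list-page", "article-1"]
after = ["list-page", "article-1", "article-2"]

newest = pick_new_handle(before, after)
print(newest)
```

With a real driver, the two lists would be driver.window_handles captured before and after the window.open call, and the result would be passed to driver.switch_to.window.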
Finally
I am very happy to share with everyone again. If you have read this far, please leave a like; writing these posts is not easy.
For other selenium-related content, please see: https://blog.csdn.net/weixin_45386875/article/details/113933541