[Selenium] Querying an article's quality score with Java+Selenium

Articles in this series

Query the article quality score through Java+Selenium
Query the Top40 article quality scores of a blogger through Java+Selenium




Foreword

Hello everyone, I am Qinghua. In this article I will show how to query an article's quality score with Java+Selenium.


1. Environment preparation

Browser: Chrome
Chrome browser version: 113
ChromeDriver version: 113 (see the first article in the Java crawler series)
Java version: JDK 1.8
Selenium version: 4.9.1


2. Query article quality score

2.1. Add the dependencies to pom.xml

    <!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.11.3</version>
    </dependency>

    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>4.9.1</version>
    </dependency>

2.2. Configure Chrome driver

    // Local path of the chromedriver binary (a constant in SeleniumUtil)
    public final static String CHROMEDRIVERPATH = "/Users/apple/Downloads/chromedriver_mac64/chromedriver";
    // Tell Selenium where to find chromedriver
    System.setProperty("webdriver.chrome.driver", SeleniumUtil.CHROMEDRIVERPATH);
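
The `SeleniumUtil` class used here and in later steps is not shown in this article. As a rough idea of what such a helper might contain, here is a minimal sketch that assumes it only holds the driver path constant and a `sleep` wrapper. The method body is a guess for illustration, not the author's actual class.

```java
/**
 * Hypothetical stand-in for the SeleniumUtil helper referenced in this article.
 * The real class is not shown; this sketch assumes it only needs the
 * driver path constant and a sleep wrapper.
 */
final class SeleniumUtil {

    /** Local path of the chromedriver binary (value taken from the article). */
    public static final String CHROMEDRIVERPATH =
            "/Users/apple/Downloads/chromedriver_mac64/chromedriver";

    private SeleniumUtil() {
        // Utility class, not meant to be instantiated.
    }

    /** Pauses the current thread for the given number of milliseconds. */
    public static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still react to it.
            Thread.currentThread().interrupt();
        }
    }
}
```

Wrapping `Thread.sleep` this way keeps the checked `InterruptedException` out of the scraping code, which is presumably why the article calls `SeleniumUtil.sleep(...)` rather than `Thread.sleep(...)` directly.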

2.3. Create the browser options

	 WebDriver driver;
	 ChromeOptions chromeOptions = new ChromeOptions();

2.4. Set headless mode

    chromeOptions.addArguments("--headless");
    chromeOptions.addArguments("--remote-allow-origins=*");

2.5. Start the browser instance with the configured options

	driver = new ChromeDriver(chromeOptions);

2.6. Open the quality score page

    driver.get("https://www.csdn.net/qc");

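2.7. Window settings

Like the other arguments, this one must be added to chromeOptions before the ChromeDriver instance in step 2.5 is created.

    chromeOptions.addArguments("--no-sandbox");  // or --start-maximized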

2.8. Locate the input box and enter the blog post URL

    // Locate the input box
    WebElement inputSelectE = driver.findElement(By.cssSelector("input.el-input__inner"));
    // Enter the article URL
    inputSelectE.sendKeys(blog_url);
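
Before typing the address into the page, it can help to check that it looks like a CSDN article URL, so that a typo fails fast instead of producing an empty result. The sketch below is one possible check; the class name `BlogUrlCheck` is hypothetical, and the accepted shape (`https://blog.csdn.net/<user>/article/details/<id>`) is an assumption inferred from the single URL used in this article.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Pattern;

/** Hypothetical helper: checks that a string looks like a CSDN article URL. */
final class BlogUrlCheck {

    // Assumed path shape: /<user>/article/details/<numeric id>
    private static final Pattern ARTICLE_PATH =
            Pattern.compile("^/[^/]+/article/details/\\d+/?$");

    private BlogUrlCheck() {
    }

    static boolean isCsdnArticleUrl(String url) {
        if (url == null) {
            return false;
        }
        try {
            URI uri = new URI(url.trim());
            return "https".equalsIgnoreCase(uri.getScheme())
                    && "blog.csdn.net".equalsIgnoreCase(uri.getHost())
                    && uri.getPath() != null
                    && ARTICLE_PATH.matcher(uri.getPath()).matches();
        } catch (URISyntaxException e) {
            // Not parseable as a URI at all (for example, it contains spaces).
            return false;
        }
    }
}
```

A caller could skip the `sendKeys` step and log a warning when `BlogUrlCheck.isCsdnArticleUrl(blog_url)` returns false.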

2.9. Locate the query button and click it

    // Locate the query button
    WebElement qcSelectE = driver.findElement(By.cssSelector("div.trends-input-box-btn"));
    // Click the query button
    qcSelectE.click();

2.10. Wait a fixed 1 s, then convert the result to a Jsoup document

    SeleniumUtil.sleep(1000);

    // Get the right-hand area, i.e. the article quality score result area
    WebElement mainSelectE = driver.findElement(By.cssSelector("div.csdn-body-right"));

    // Convert to a Jsoup document for parsing
    Document doc = Jsoup.parse( mainSelectE.getAttribute("outerHTML") );
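
A fixed 1 s sleep can be too short on a slow connection and wasteful on a fast one. Selenium ships its own explicit-wait support (`WebDriverWait`) for this purpose. The sketch below only illustrates the underlying polling idea without any Selenium dependency; `Poller` and `waitFor` are hypothetical names introduced for this example.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper: polls a probe until it yields a value or time runs out. */
final class Poller {

    private Poller() {
    }

    /**
     * Calls probe repeatedly until it returns a non-empty Optional or the
     * timeout elapses. Returns Optional.empty() on timeout or interruption.
     */
    static <T> Optional<T> waitFor(Supplier<Optional<T>> probe,
                                   long timeoutMillis,
                                   long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            Optional<T> result = probe.get();
            if (result.isPresent()) {
                return result;
            }
            // Subtraction-based comparison stays correct if nanoTime wraps around.
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty();
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }
}
```

With Selenium, the probe could call `driver.findElements(By.cssSelector("div.csdn-body-right"))` and return the first element when the list is non-empty, so the code continues as soon as the result area appears instead of always waiting the full second.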

2.11. Obtain the title of the blog post

   String title = doc.select("span.title").text();

2.12. Obtain the blog post author and publish time

    String posttime = doc.select("span.name").text();
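
The `posttime` string holds the author and the timestamp together. If the two are needed separately, they can be split with a regular expression. The sketch below assumes the text has the shape seen in the sample output of section 2.16, for example `- Qinghuasuo 2023-06-21 18:20:46 -`; the class name `AuthorLine` is hypothetical, and other page layouts may need a different pattern.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value class: author name plus publish time. */
final class AuthorLine {

    // Assumed shape: optional dashes, author, "yyyy-MM-dd HH:mm:ss", optional dashes.
    private static final Pattern SHAPE = Pattern.compile(
            "^[-\\s]*(.+?)\\s+(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})[-\\s]*$");
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    final String author;
    final LocalDateTime postedAt;

    private AuthorLine(String author, LocalDateTime postedAt) {
        this.author = author;
        this.postedAt = postedAt;
    }

    /** Returns an empty Optional when the text does not match the assumed shape. */
    static Optional<AuthorLine> parse(String text) {
        if (text == null) {
            return Optional.empty();
        }
        Matcher m = SHAPE.matcher(text.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        try {
            LocalDateTime time = LocalDateTime.parse(m.group(2), FORMAT);
            return Optional.of(new AuthorLine(m.group(1).trim(), time));
        } catch (DateTimeParseException e) {
            // Digits in the right places but not a real date, e.g. month 13.
            return Optional.empty();
        }
    }
}
```

Returning an `Optional` rather than throwing keeps a layout change on the page from aborting the whole run; the caller can fall back to logging the raw `posttime` string.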

2.13. Obtain blog post quality score

    String score = doc.select("p.img").text();
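
The score arrives as text. To compare or sort scores (for example across several articles), it is convenient to convert it to a number. The sketch below extracts the first run of digits and treats anything above 100 as invalid; the 0 to 100 range is an assumption based on the sample score of 86, and `ScoreParser` is a hypothetical name.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: turns the scraped score text into an int. */
final class ScoreParser {

    // First run of one to three digits in the text.
    private static final Pattern DIGITS = Pattern.compile("\\d{1,3}");

    private ScoreParser() {
    }

    /** Returns the score, or an empty OptionalInt if none is found or it exceeds 100. */
    static OptionalInt parseScore(String text) {
        if (text == null) {
            return OptionalInt.empty();
        }
        Matcher m = DIGITS.matcher(text);
        if (!m.find()) {
            return OptionalInt.empty();
        }
        int value = Integer.parseInt(m.group());
        // Assumed scale: 0 to 100.
        return value <= 100 ? OptionalInt.of(value) : OptionalInt.empty();
    }
}
```

An empty result usually means the page had not finished rendering when the HTML was captured, so it is a useful signal to wait longer and retry.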

2.14. Obtain the quality score suggestion for the blog post

    String remark = doc.select("p.desc").text();

2.15. Print the results

    log.info("Article title: {} , author and publish time: {} , quality score: {} , suggestion: {}" , title , posttime , score , remark );

2.16. Result

Article title: "Project Combat" to build the SpringCloud Alibaba project (3. Build the service provider sub-project store-user-service) , author and publish time: - Qinghuasuo 2023-06-21 18:20:46 - , quality score: 86 , suggestion: article quality is good


3. Complete code

    /**
     * Fetches the quality score data for one article.
     */
    void csdnQcBySelenium() {

        log.info("csdnQcBySelenium start!");

        String blog_url = "https://blog.csdn.net/s445320/article/details/131332238";

        // Local path of the chromedriver binary
        System.setProperty("webdriver.chrome.driver", SeleniumUtil.CHROMEDRIVERPATH);
        ChromeOptions chromeOptions = new ChromeOptions();
        chromeOptions.addArguments("--remote-allow-origins=*");
        chromeOptions.addArguments("--no-sandbox");  // or --start-maximized

        WebDriver driver = new ChromeDriver(chromeOptions);

        driver.get("https://www.csdn.net/qc");

        SeleniumUtil.sleep(1000);

        // Locate the input box
        WebElement inputSelectE = driver.findElement(By.cssSelector("input.el-input__inner"));
        // Enter the article URL
        inputSelectE.sendKeys(blog_url);

        SeleniumUtil.sleep(100);

        // Locate the query button
        WebElement qcSelectE = driver.findElement(By.cssSelector("div.trends-input-box-btn"));
        // Click the query button
        qcSelectE.click();

        SeleniumUtil.sleep(1000);

        // Get the right-hand area, i.e. the article quality score result area
        WebElement mainSelectE = driver.findElement(By.cssSelector("div.csdn-body-right"));

        // Convert to a Jsoup document for parsing
        Document doc = Jsoup.parse( mainSelectE.getAttribute("outerHTML") );

        // Article title
        String title = doc.select("span.title").text();

        // Author and publish time
        String posttime = doc.select("span.name").text();

        // Quality score
        String score = doc.select("p.img").text();

        // Quality score suggestion
        String remark = doc.select("p.desc").text();

        // Print the results
        log.info("Article title: {} , author and publish time: {} , quality score: {} , suggestion: {}" , title , posttime , score , remark );

        driver.quit();
        log.info("csdnQcBySelenium end!");

    }
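
In the method above, `driver.quit()` is skipped if any `findElement` call throws (Selenium raises `NoSuchElementException` when a selector matches nothing), which can leave a Chrome process running. One common guard is `try`/`finally`. The sketch below shows the pattern without a Selenium dependency; `FakeBrowser` is a hypothetical stand-in for the `WebDriver`, used only so the behavior can be run and observed.

```java
/** Illustrates quitting the browser on both the success and the failure path. */
final class QuitAlways {

    /** Hypothetical stand-in for a WebDriver; only records whether quit() ran. */
    static final class FakeBrowser {
        boolean quitCalled = false;

        void quit() {
            quitCalled = true;
        }
    }

    private QuitAlways() {
    }

    /** Runs the scraping step and quits the browser even if the step throws. */
    static String scrape(FakeBrowser browser, boolean failMidway) {
        try {
            if (failMidway) {
                // Simulates an element lookup that fails partway through.
                throw new IllegalStateException("element not found");
            }
            return "ok";
        } finally {
            // Runs on normal return and when an exception propagates.
            browser.quit();
        }
    }
}
```

Applied to `csdnQcBySelenium`, everything between `new ChromeDriver(chromeOptions)` and `driver.quit()` would move into the `try` block, with `driver.quit()` in `finally`.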

Summary

That concludes querying the quality score of a single article.

Origin blog.csdn.net/s445320/article/details/131347069