Series Article Directory
Query the article quality score through Java+Selenium
Query the Top40 article quality score of a blogger through Java+Selenium
Article Directory
- Series Article Directory
- Foreword
- 1. Environment preparation
- 2. Query article quality score
- 2.1. Modify pom.xml configuration
- 2.2. Configure Chrome driver
- 2.3. Introduce browser configuration
- 2.4. Set headless mode
- 2.5. Start the browser instance and add configuration information
- 2.6. Open the quality score page
- 2.7. Window settings
- 2.8. Navigate to the input box and enter the address of the blog post
- 2.9. Locate the query button and click
- 2.10. Wait a fixed 1 s, then convert to a jsoup document
- 2.11. Obtain the title of the blog post
- 2.12. Obtain blog post author and release time
- 2.13. Obtain blog post quality score
- 2.14. Obtain the suggestion for the blog post quality score
- 2.15. Print results
- 2.16. Result
- 3. Code
- Summary
Foreword
Hello everyone, I am Qinghua. In this article, I will share with you "Querying Article Quality Scores Through Java+Selenium".
1. Environment preparation
Browser: This article uses Chrome
Chrome browser version: 113
Chrome driver version: 113 (see the first article in the Java crawler series)
Java version: JDK 1.8
Selenium version: 4.9.1
2. Query article quality score
2.1. Modify pom.xml configuration
<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.11.3</version>
</dependency>
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>4.9.1</version>
</dependency>
2.2. Configure Chrome driver
public final static String CHROMEDRIVERPATH = "/Users/apple/Downloads/chromedriver_mac64/chromedriver";
System.setProperty("webdriver.chrome.driver", SeleniumUtil.CHROMEDRIVERPATH );// chromedriver localPath
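The path above is specific to one machine. As a minimal sketch, assuming a hypothetical environment variable named CHROMEDRIVER_PATH and a hypothetical helper class DriverPathSketch (neither appears in the original article), the driver location could be resolved with a fallback to the hard-coded default and checked before it is handed to Selenium:

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the article's SeleniumUtil class.
final class DriverPathSketch {

    // Default taken from the article; adjust it for your own machine.
    static final String DEFAULT_PATH = "/Users/apple/Downloads/chromedriver_mac64/chromedriver";

    private DriverPathSketch() {}

    /** Returns the trimmed override when it is non-blank, otherwise the default path. */
    static String resolve(String override) {
        if (override == null || override.trim().isEmpty()) {
            return DEFAULT_PATH;
        }
        return override.trim();
    }

    /** True when the path exists and is executable for the current process. */
    static boolean isUsable(String path) {
        return Files.isExecutable(Paths.get(path));
    }
}
```

It could then be used as System.setProperty("webdriver.chrome.driver", DriverPathSketch.resolve(System.getenv("CHROMEDRIVER_PATH"))); with isUsable called first to fail early with a clear message.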
2.3. Introduce browser configuration
WebDriver driver;
ChromeOptions chromeOptions = new ChromeOptions();
2.4. Set headless mode
chromeOptions.addArguments("--headless");
chromeOptions.addArguments("--remote-allow-origins=*");
2.5. Start the browser instance and add configuration information
driver = new ChromeDriver(chromeOptions);
2.6. Open the quality score page
driver.get("https://www.csdn.net/qc");
2.7. Window settings
// Options only take effect if they are added before new ChromeDriver(chromeOptions) is called
chromeOptions.addArguments("--no-sandbox"); //--start-maximized
2.8. Navigate to the input box and enter the address of the blog post
// Locate the input box
WebElement inputSelectE = driver.findElement(By.cssSelector("input.el-input__inner"));
// Enter the article address
inputSelectE.sendKeys(blog_url);
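Before the address is typed into the input box, it can be worth checking that it at least looks like a blog post address. The sketch below is an assumption based only on the single example URL used later in this article (https://blog.csdn.net/<user>/article/details/<id>); the class name BlogUrlCheck is hypothetical, and other valid CSDN URL shapes may exist that this pattern rejects.

```java
import java.util.regex.Pattern;

// Hypothetical pre-check; the pattern is inferred from one sample URL only.
final class BlogUrlCheck {

    // https://blog.csdn.net/<user>/article/details/<numeric id>, optional query or fragment
    private static final Pattern ARTICLE_URL = Pattern.compile(
            "^https://blog\\.csdn\\.net/[^/\\s]+/article/details/\\d+(?:[?#]\\S*)?$");

    private BlogUrlCheck() {}

    /** Returns true when the trimmed text matches the assumed article URL shape. */
    static boolean looksLikeArticleUrl(String url) {
        return url != null && ARTICLE_URL.matcher(url.trim()).matches();
    }
}
```

Skipping the query when looksLikeArticleUrl(blog_url) is false avoids starting a browser round trip for an address that cannot be scored.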
2.9. Locate the query button and click
// Locate the query button
WebElement qcSelectE = driver.findElement(By.cssSelector("div.trends-input-box-btn"));
// Click the query button
qcSelectE.click();
2.10. Wait a fixed 1 s, then convert to a jsoup document
SeleniumUtil.sleep(1000);
// Get the right-hand area: the article quality score result area
WebElement mainSelectE = driver.findElement(By.cssSelector("div.csdn-body-right"));
// Convert to a jsoup document for processing
Document doc = Jsoup.parse( mainSelectE.getAttribute("outerHTML") );
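The article calls SeleniumUtil.sleep but does not show its body. The sketch below is one plausible shape for such a helper, plus a small polling method that stops as soon as a result is available instead of always waiting the full second. The class name WaitSketch and the method pollUntil are hypothetical and are not the author's code.

```java
import java.util.function.Supplier;

// Hypothetical helper; the article's own SeleniumUtil class is not shown.
final class WaitSketch {

    private WaitSketch() {}

    /** Sleeps for the given number of milliseconds and restores the interrupt flag if interrupted. */
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Calls the probe repeatedly until it returns a non-null value.
     * Returns null when the timeout elapses or the thread is interrupted.
     */
    static <T> T pollUntil(Supplier<T> probe, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T value = probe.get();
            if (value != null) {
                return value;
            }
            if (System.nanoTime() >= deadline || Thread.currentThread().isInterrupted()) {
                return null;
            }
            sleep(intervalMillis);
        }
    }
}
```

In this article's flow, the probe could check whether the result area is present (for example via driver.findElements, which returns an empty list rather than throwing) and return null until it is. Selenium also ships its own WebDriverWait class for the same purpose.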
2.11. Obtain the title of the blog post
String title = doc.select("span.title").text();
2.12. Obtain blog post author and release time
String posttime = doc.select("span.name").text();
2.13. Obtain blog post quality score
String score = doc.select("p.img").text();
2.14. Obtain the suggestion for the blog post quality score
String remark = doc.select("p.desc").text();
2.15. Print results
log.info("Article title: {} , author and release time: {} , quality score: {} , blog post suggestion: {}" , title , posttime , score , remark );
2.16. Result
Article title: "Project Combat" to build the SpringCloud alibaba project (3. Build the service party sub-project store-user-service), author and release time: - Qinghuasuo 2023-06-21 18:20:46 - , quality score: 86 , blog post suggestion: article quality is good
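All four values above are plain strings. If the score needs to be compared or sorted, or the author and timestamp are needed separately, the text can be parsed further. The sketch below assumes the span.name text always has the shape shown in the sample output ("- author yyyy-MM-dd HH:mm:ss -"); the class QualityResultParser is hypothetical and the format assumption rests on that single sample.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser; the expected text shape is inferred from one sample output line.
final class QualityResultParser {

    private static final Pattern TIMESTAMP =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}");
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private QualityResultParser() {}

    /** Returns the text before the timestamp, without surrounding dashes and whitespace. */
    static String parseAuthor(String nameText) {
        Matcher m = TIMESTAMP.matcher(nameText);
        String head = m.find() ? nameText.substring(0, m.start()) : nameText;
        return head.replaceAll("^[\\s-]+|[\\s-]+$", "");
    }

    /** Returns the first timestamp found in the text, or null when there is none. */
    static LocalDateTime parsePostTime(String nameText) {
        Matcher m = TIMESTAMP.matcher(nameText);
        return m.find() ? LocalDateTime.parse(m.group(), FORMAT) : null;
    }

    /** Returns the digits of the score text as an int, or -1 when no digits are present. */
    static int parseScore(String scoreText) {
        String digits = scoreText.replaceAll("\\D", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }
}
```

With the variables from the steps above, this would be called as QualityResultParser.parseAuthor(posttime), QualityResultParser.parsePostTime(posttime), and QualityResultParser.parseScore(score).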
3. Code
/**
* Fetch the article quality score data.
*/
void csdnQcBySelenium() {
log.info("csdnQcBySelenium start!");
String blog_url = "https://blog.csdn.net/s445320/article/details/131332238";
System.setProperty("webdriver.chrome.driver", SeleniumUtil.CHROMEDRIVERPATH );// chromedriver localPath
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.addArguments("--remote-allow-origins=*");
chromeOptions.addArguments("--no-sandbox"); //--start-maximized
WebDriver driver = new ChromeDriver(chromeOptions);
driver.get("https://www.csdn.net/qc");
SeleniumUtil.sleep(1000);
// Locate the input box
WebElement inputSelectE = driver.findElement(By.cssSelector("input.el-input__inner"));
// Enter the article address
inputSelectE.sendKeys(blog_url);
SeleniumUtil.sleep(100);
// Locate the query button
WebElement qcSelectE = driver.findElement(By.cssSelector("div.trends-input-box-btn"));
// Click the query button
qcSelectE.click();
SeleniumUtil.sleep(1000);
// Get the right-hand area that holds the quality score result
WebElement mainSelectE = driver.findElement(By.cssSelector("div.csdn-body-right"));
// Convert to a jsoup document for processing
Document doc = Jsoup.parse( mainSelectE.getAttribute("outerHTML") );
// Get the article title
String title = doc.select("span.title").text();
// Get the author and release time
String posttime = doc.select("span.name").text();
// Get the quality score
String score = doc.select("p.img").text();
// Get the suggestion for the quality score
String remark = doc.select("p.desc").text();
// Print the results
log.info("Article title: {} , author and release time: {} , quality score: {} , blog post suggestion: {}" , title , posttime , score , remark );
driver.quit();
log.info("csdnQcBySelenium end!");
}
Summary
That concludes querying the quality score of a single article.