Guys, do you remember PowerShell?

Background

While I was having lunch, a colleague suddenly sent me a link and asked whether I could pull down the pictures on the site (note: they are all open resources, so no piracy, privacy, or other sensitive issues are involved).

I took a look: about 400 pictures in total. A script would surely beat saving them by hand, so I replied that I'd do it.

After replying, I suddenly thought of an old friend: PowerShell. With it, this could probably be done in just a few lines of code, with no environment to configure; just write it and run it (Windows only, of course).

So let's give it a try.

Ideas

First of all, the website does not load all its images at once; it uses paging.

I tried it in Postman, and the paging API turned out to be public as well.

That makes things simple! I came up with two options:

  • The first is to request and download in one pass: request a page of data, extract the image path attribute, download those images, and loop until everything is done.

  • The second is to first save all the paginated data into a local JSON file, so that the data source becomes that local file, then read it and fetch the image URLs one by one for downloading.

The second option separates collecting the data from downloading the files, so in the end I went with it.

Download paginated data

Making a request in PowerShell is actually very simple, but any concrete example has limited reference value: with crawler-style scripts, every target website is different, so even if my request code works for me, it most likely won't work for you as-is; you still need to fine-tune it by hand.

Since it involves details of the target website, I won't post my actual data-collection script here. Instead, here is a test script that you can adapt to your own needs.

# Set the API URL and request parameters
$apiUrl = "http://example.com/api/data"
$pageSize = 100

# Send an HTTP request and get the response
$response = Invoke-WebRequest -Uri $apiUrl -Method Get

# Convert the response body into a JSON object
$json = $response.Content | ConvertFrom-Json

# Get the total record count
$totalRecords = $json.total_records

# Work out how many pages need to be fetched
$pageCount = [math]::Ceiling($totalRecords / $pageSize)

# Fetch each page in a loop and collect the results into an array
$data = @()
for ($i = 1; $i -le $pageCount; $i++) {
    $url = "{0}?page={1}&per_page={2}" -f $apiUrl, $i, $pageSize
    $response = Invoke-WebRequest -Uri $url -Method Get
    $json = $response.Content | ConvertFrom-Json
    $data += $json.data
}

# Convert the collected data back to JSON and save it to a local file
$jsonOutput = $data | ConvertTo-Json -Depth 100
$jsonOutput | Out-File -FilePath "tt.json"
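As an aside, Invoke-RestMethod parses JSON responses for you, so the paging loop above can be written a bit more tightly. This is only a sketch against the same placeholder endpoint; the URL and the total_records/data field names are assumptions carried over from the test script, not from any real API:

```powershell
# Same placeholder endpoint as above; substitute your target's real URL
$apiUrl   = "http://example.com/api/data"
$pageSize = 100

# Invoke-RestMethod returns the parsed JSON object directly,
# so there is no separate ConvertFrom-Json step
$first     = Invoke-RestMethod -Uri $apiUrl -Method Get
$pageCount = [math]::Ceiling($first.total_records / $pageSize)

# Collecting loop output into a variable avoids the repeated
# array copies that $data += ... causes on large result sets
$data = foreach ($i in 1..$pageCount) {
    $url = "{0}?page={1}&per_page={2}" -f $apiUrl, $i, $pageSize
    (Invoke-RestMethod -Uri $url -Method Get).data
}

$data | ConvertTo-Json -Depth 100 | Out-File -FilePath "tt.json"
```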

Read the JSON file and download the pictures

With the data sorted out, we can read the file and start downloading. There isn't much to say about this step, so I'll just paste the code here.

$Path = 'tt.json'
$json_data = Get-Content -Raw -Path $Path | ConvertFrom-Json
$cnt = 0
$save = 'C:\Users\Administrator\Desktop\tt\'
foreach ($item in $json_data.pages) {
    # The target site omits the protocol prefix and PowerShell won't
    # fill in a default when downloading, so we add it back here
    $uri = 'https:' + $item.origin_img
    # Save to this path
    $savePath = $save + $item.pic_name
    # Off you go
    Invoke-WebRequest -Uri $uri -OutFile $savePath
    $cnt++
}
Write-Output $cnt

After execution, the download looks like this.

Then just wait for it to slowly finish downloading.

If the download volume is large, you also need to mind the interval between requests, and you may even want to rotate proxy IPs automatically; those topics are beyond the scope of this article.
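For the interval part, the simplest approach is a short randomized pause between requests. The sketch below reuses the $json_data, origin_img, and pic_name names from the script above (they are specific to my target site, so treat them as placeholders), and also wraps each download in try/catch so one bad URL does not stop the whole run:

```powershell
$save = 'C:\Users\Administrator\Desktop\tt\'
foreach ($item in $json_data.pages) {
    $uri      = 'https:' + $item.origin_img
    $savePath = Join-Path $save $item.pic_name
    # Skip files we already have, so the script can be re-run safely
    if (Test-Path $savePath) { continue }
    try {
        Invoke-WebRequest -Uri $uri -OutFile $savePath
    } catch {
        # Log the failure and carry on with the next image
        Write-Warning "Failed to download: $uri"
    }
    # Pause 1-3 seconds between requests to stay polite
    Start-Sleep -Seconds (Get-Random -Minimum 1 -Maximum 4)
}
```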

Summary

In fact, the code itself is the least important part. The point is mainly to remind Windows developers that the PowerShell that ships with our system is also a powerful and efficient development tool. When faced with simple, repetitive tasks, it is well worth considering. Very convenient!

And if you don't remember PowerShell syntax, that's really not a problem these days! Why? Because of large models! Whether it's ChatGPT or domestic models such as Wenxin Yiyan, they can quickly generate PowerShell sample code; we only need to make a few small tweaks. Very smooth~

Okay, that's it.

This article was simultaneously published in the InfoQ writing community under the title "Veterans, do you still remember PowerShell?".


Origin blog.csdn.net/juanhuge/article/details/132536497