Overview
requests is a Python library that can imitate a browser: it sends HTTP or HTTPS requests and retrieves the source code of a web page.
The two main functions for sending requests are get() and post(). get() requests a resource such as a web page; post() submits data to the server and is often used to simulate a user login.
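As a rough illustration of the difference between the two functions, the sketch below builds one GET and one POST request without sending them, so the way each one carries its data can be inspected. The example.com URLs and the username and password form fields are hypothetical placeholders, not taken from a real site.

```python
import requests

# Hypothetical endpoints and field names, used only for illustration.
SEARCH_URL = "https://example.com/search"
LOGIN_URL = "https://example.com/login"

# GET: parameters are appended to the URL as a query string.
get_req = requests.Request("GET", SEARCH_URL, params={"q": "python"}).prepare()
print(get_req.method, get_req.url)

# POST: form data travels in the request body, not in the URL.
post_req = requests.Request(
    "POST", LOGIN_URL, data={"username": "alice", "password": "secret"}
).prepare()
print(post_req.method, post_req.url)
print(post_req.body)
print(post_req.headers["Content-Type"])
```

To actually send such requests, requests.get(SEARCH_URL, params=...) and requests.post(LOGIN_URL, data=...) accept the same arguments.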
1. Obtain the source code of a static web page
Request the Baidu home page and print its source code:
import requests

# Send a GET request to the Baidu home page
rp = requests.get(url='https://www.baidu.com')
# rp.text is the response body decoded as text
print(rp.text)
Result: the HTML source returned by the server is printed to the console.
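For anything beyond a quick test, it may help to set a timeout, check the status code, and choose the text encoding explicitly. The helper below is a minimal sketch: the name fetch_html and the 10-second default are assumptions made for this example. Overriding the encoding with apparent_encoding is one possible remedy when non-ASCII characters come back garbled, which can happen if the server does not declare a charset.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the decoded HTML of `url` (hypothetical helper for illustration)."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    # Let requests guess the encoding from the body rather than the headers.
    response.encoding = response.apparent_encoding
    return response.text
```

A call such as print(fetch_html('https://www.baidu.com')) would then print the page source.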
2. Get dynamically loaded data
For a dynamic web page, the server returns a page template, which is then filled with data through Ajax or similar methods. The data you need is usually in a JSON payload that the server returns separately.
A rule of thumb for telling dynamic pages from static ones: if a page loads more data as you scroll down in the browser, it is dynamic.
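import requests

# Browser-like User-Agent so the request looks like it comes from a browser
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'}
url = 'https://movie.douban.com/j/chart/top_list'
# Query-string parameters sent along with the request
params = {'type': '25', 'interval_id': '100:900', 'action': '', 'start': '0', 'limit': '1'}
rp = requests.get(url=url, headers=header, params=params)
r = rp.json()  # parse the JSON response body into Python objects
print(r)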
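The "load more as you scroll" behavior can be imitated by requesting the same endpoint repeatedly with a growing offset. The sketch below assumes, based only on the parameter names, that start is an offset and limit is a page size, and that the endpoint returns a JSON list that becomes empty when no data is left. The helper names page_params and fetch_movies are hypothetical.

```python
import requests

URL = "https://movie.douban.com/j/chart/top_list"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
}


def page_params(start: int, limit: int) -> dict:
    """Build the query parameters for one page (assumed offset/page-size semantics)."""
    return {"type": "25", "interval_id": "100:900", "action": "",
            "start": str(start), "limit": str(limit)}


def fetch_movies(max_pages: int, limit: int = 20) -> list:
    """Collect up to `max_pages` pages, stopping early on an empty page."""
    movies = []
    for page in range(max_pages):
        response = requests.get(URL, headers=HEADERS,
                                params=page_params(page * limit, limit), timeout=10)
        response.raise_for_status()
        batch = response.json()  # assumed to be a list of records
        if not batch:
            break
        movies.extend(batch)
    return movies
```

Keeping the parameter construction in its own function makes it easy to check the offsets without touching the network.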
3. Get pictures
To get a page's source code, call get() to obtain the response object, then read its text attribute. Downloading a picture also starts with get(), but text is not suitable here because an image is binary data, not text. Use the content attribute instead, which returns the response body as raw bytes.
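import requests

url = ''  # address of the image to download
response = requests.get(url=url)
content = response.content  # raw bytes of the response body
with open('图片.jpg', 'wb') as fp:  # '图片' means 'picture'; 'wb' opens the file for binary writing
    fp.write(content)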
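For large images, reading the whole body into memory through content may be wasteful. One alternative, sketched below, streams the response to disk in chunks. The function name download_file, the chunk size, and the timeout are assumptions chosen for this example.

```python
import requests


def download_file(url: str, path: str, chunk_size: int = 8192) -> int:
    """Stream `url` to `path` and return the number of bytes written."""
    written = 0
    # stream=True defers downloading the body until it is iterated over.
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()
        with open(path, "wb") as fp:
            for chunk in response.iter_content(chunk_size=chunk_size):
                written += fp.write(chunk)
    return written
```

It would be called as download_file(url, 'picture.jpg'), with url set to the address of the image.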