GET requests
Example: scraping Baidu search results
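First, examine the URL pattern. Two query parameters matter: `wd`, the search keyword, and `pn`, the page offset (0 for page 1, 10 for page 2, and so on).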
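The URL pattern can be sketched as a small helper built on `urllib.parse.urlencode`, which percent-encodes the keyword as UTF-8 and joins the parameters with `&`. The helper name `build_search_url` is hypothetical, and the page-to-offset mapping assumes 10 results per page, matching the `(i-1)*10` used in the scraping loop. Note that `urlencode` uses `quote_plus`, so spaces become `+` rather than `%20`.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.baidu.com/s"

def build_search_url(keyword, page):
    """Return the search URL for a 1-based page number (assumes 10 results per page)."""
    params = {"wd": keyword, "pn": (page - 1) * 10}
    # urlencode percent-encodes non-ASCII keywords and joins key=value pairs with &
    return BASE_URL + "?" + urlencode(params)

for page in range(1, 4):
    print(build_search_url("python", page))
```

For a Chinese keyword such as `爬虫`, the `wd` value comes out as `%E7%88%AC%E8%99%AB`.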
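Next, examine the content to scrape. In the page source, result titles appear in two forms: `"title":"..."` and `title:'...'`.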
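A minimal sketch of matching both forms, using the same two regular expressions as the scraping code. The `sample` string is a hypothetical fragment, not real Baidu output. The lazy `.*?` matters: with a greedy `.*`, a line containing several titles would match all the way to the last `",`.

```python
import re

PAT_JSON = re.compile(r'"title":"(.*?)",')  # form 1: "title":"...",
PAT_JS = re.compile(r"title:'(.*?)',")      # form 2: title:'...',

def extract_titles(page_source):
    """Collect titles matched by either pattern, form 1 first."""
    return PAT_JSON.findall(page_source) + PAT_JS.findall(page_source)

# Hypothetical fragment standing in for real page source
sample = '{"title":"Welcome to Python.org","url":"a"} {title:\'Python tutorial\',url:\'b\'}'
print(extract_titles(sample))  # ['Welcome to Python.org', 'Python tutorial']
```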
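import urllib.parse
import urllib.request
import re
keyword = "python"
keyword = urllib.parse.quote(keyword)  # percent-encode the keyword for use in the URL
# Assumption: Baidu may return a verification page instead of results when the
# request has no browser-like User-Agent, so one is sent here
headers = {"User-Agent": "Mozilla/5.0"}
pat1 = re.compile('"title":"(.*?)",')  # titles in the form "title":"...",
pat2 = re.compile("title:'(.*?)',")    # titles in the form title:'...',
for i in range(1, 10):  # pages 1 to 9; pn is the result offset, 10 results per page
    url = "http://www.baidu.com/s?wd=" + keyword + "&pn=" + str((i - 1) * 10)
    req = urllib.request.Request(url, headers=headers)
    data = urllib.request.urlopen(req).read().decode("utf-8")
    for title in pat1.findall(data):
        print(title)
    for title in pat2.findall(data):
        print(title)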
Running the script prints the titles extracted from each results page, one per line.
POST requests
Form test URL: https://www.iqianyue.com/mypost/
import urllib.request
import urllib.parse
posturl = "https://www.iqianyue.com/mypost/"
# Encode the form fields as name=123&pass=456, then convert to bytes
postdata = urllib.parse.urlencode({
    "name": "123",
    "pass": "456",
}).encode("utf-8")
# Supplying data makes urllib send the request as a POST
req = urllib.request.Request(posturl, postdata)
r = urllib.request.urlopen(req).read().decode("utf-8")
print(r)
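To see what the POST code actually sends, the `Request` object can be inspected before calling `urlopen`, with no network access needed. Per the `urllib.request` documentation, `get_method()` returns `"POST"` when `data` is set and `"GET"` when it is `None`.

```python
import urllib.parse
import urllib.request

posturl = "https://www.iqianyue.com/mypost/"
postdata = urllib.parse.urlencode({"name": "123", "pass": "456"}).encode("utf-8")
req = urllib.request.Request(posturl, postdata)

# Inspect the request without sending it
print(req.get_method())                                # POST
print(req.data)                                        # b'name=123&pass=456'
print(urllib.request.Request(posturl).get_method())    # GET (no data)
```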
requests: GET and POST
The requests library is more convenient to use than urllib:
GET:
import requests
url = "http://www.baidu.com"
r = requests.get(url).content.decode()  # .content is the raw body bytes; decode() defaults to UTF-8
print(r)
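POST:
import requests
url = "https://www.iqianyue.com/mypost/"
# data must be key/value pairs, e.g. a dict (a set literal such as {"key","value"} does not work)
postdata = {"name": "123", "pass": "456"}
r = requests.post(url, data=postdata).content.decode()
print(r)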
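A common pitfall in form data: in Python, `{"key","value"}` (comma) builds a set, while `{"key": "value"}` (colon) builds a dict. The sketch below illustrates the difference offline with the standard library's `urllib.parse.urlencode`; requests does its own encoding, but its `data` parameter likewise expects a dict, a list of tuples, bytes, or a file-like object, not a set.

```python
from urllib.parse import urlencode

as_set = {"key", "value"}   # comma: a set of two unrelated strings
as_dict = {"key": "value"}  # colon: a dict mapping "key" to "value"
print(type(as_set).__name__, type(as_dict).__name__)  # set dict

# Form encoding needs key/value pairs: a dict or a list of (key, value) tuples
print(urlencode(as_dict))                             # key=value
print(urlencode([("name", "123"), ("pass", "456")]))  # name=123&pass=456

# A set has no key/value structure, so urlencode rejects it
set_rejected = False
try:
    urlencode(as_set)
except TypeError:
    set_rejected = True
print("set rejected:", set_rejected)  # set rejected: True
```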