Introduction
The text and pictures in this article come from the Internet and are shared for learning and exchange only, not for any commercial purpose. Copyright belongs to the original author; if there is any problem, please contact us and we will handle it.
Author: HOT_and_COOl
Crawlers make data mining possible. For example, you can crawl other people's web pages to collect, integrate, and classify useful data; you can write a simple program that grabs every picture on a page and saves them into a folder of your own; or you can crawl selfies from a social networking site and stitch hundreds of thousands of them together into a composite image. Crawled data can also be processed further into visualizations.
II. The page request process
(Note: the environment used here is Python 3.6.1. Python 2.x and 3.x differ on this point: 2.x has both urllib and urllib2, while 3.x has only urllib.)
This article mainly uses the urllib library.
Requesting a web page, simply put, means sending a request with header information to the server, which then returns a response message.
You can see this by inspecting the page in your browser's developer tools. The most common request methods are GET and POST.
In the captured request you can see a header parameter called User-Agent, which describes the client making the request — usually a browser. If the request instead identifies itself as Python 3.x, some servers refuse it to prevent malicious access, but there are ways to disguise the client.
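The usual disguise is to supply a browser-like User-Agent header yourself. A minimal sketch with urllib (the header string below is just an example value, not a required one):

```python
import urllib.request

url = "http://www.baidu.com"

# Pretend to be an ordinary browser by supplying a User-Agent header;
# the exact string here is only an example value.
headers = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/58.0.3029.110 Safari/537.36"),
}
req = urllib.request.Request(url, headers=headers)

# urllib normalizes header names; get_header expects the capitalized form.
print(req.get_header("User-agent"))

# response = urllib.request.urlopen(req)  # would perform the actual request
```

The request object is only prepared here; passing it to urlopen would send it with the disguised header.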
III. Crawling a simple page
```python
import urllib.request

url = "http://www.baidu.com"
response = urllib.request.urlopen(url)
html = response.read().decode('utf-8')  # read() returns bytes; decode to text
for eachline in html.splitlines():
    print(eachline)
```
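Real requests fail for many reasons (unreachable hosts, error status codes, timeouts), so a fetch is usually wrapped in error handling. A minimal sketch — the `fetch` helper name is mine, not from the article:

```python
import urllib.request
import urllib.error

def fetch(url, timeout=10):
    """Return the decoded page body, or None if the request fails."""
    try:
        response = urllib.request.urlopen(url, timeout=timeout)
        return response.read().decode('utf-8')
    except urllib.error.HTTPError as e:
        print("Server returned an error status:", e.code)
    except urllib.error.URLError as e:
        print("Failed to reach the server:", e.reason)
    return None

# A hostname under the reserved .invalid TLD never resolves, so this
# demonstrates the failure path without depending on a real server.
print(fetch("http://nonexistent.invalid/"))
```

HTTPError is a subclass of URLError, so it must be caught first.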
① The first part is the protocol (also called the service mode).
② The second part is the host that holds the resource: an IP address or domain name, sometimes with a port number.
③ The third part is the specific address of the resource on the host, such as directories and file names.
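These parts can be pulled apart programmatically with urllib.parse.urlparse. A small sketch, using an example Baidu search URL:

```python
from urllib.parse import urlparse

parts = urlparse("http://www.baidu.com/s?wd=python")

print(parts.scheme)   # the protocol / service mode
print(parts.netloc)   # the host (and optional port number)
print(parts.path)     # the resource path on the host
print(parts.query)    # the query string after '?'
```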
IV. A fun translation example, so you can appreciate the fun of crawlers in no time
```python
import urllib.request
import urllib.parse
import json

content = input("Please enter the content to be translated:\n")

url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=https://www.baidu.com/link'

data = {}
data['type'] = 'AUTO'
data['i'] = content
data['doctype'] = 'json'
data['xmlVersion'] = '1.8'
data['keyfrom'] = 'fanyi.web'
data['ue'] = 'UTF-8'
data['action'] = 'FY_BY_CLICKBUTTTON'
data['typoResult'] = 'true'

# urlencode builds the form body; POST data must be bytes
data = urllib.parse.urlencode(data).encode('utf-8')

response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')

target = json.loads(html)
print('Translation: %s' % target['translateResult'][0][0]['tgt'])
```
In urllib.request.urlopen(url, data), data is the request (POST) data shown in the figure above, and url is the request URL shown in the figure above.
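Putting the pieces together, the POST body and a browser-style User-Agent can be prepared in one helper before calling urlopen. A sketch — the helper name is mine, and the Youdao endpoint and field names are taken from the article and may since have changed:

```python
import urllib.request
import urllib.parse

URL = ('http://fanyi.youdao.com/translate?smartresult=dict'
       '&smartresult=rule&smartresult=ugc&sessionFrom=https://www.baidu.com/link')

def build_translate_request(text):
    """Prepare the POST request for one piece of text (does not send it)."""
    data = {
        'type': 'AUTO',
        'i': text,
        'doctype': 'json',
        'keyfrom': 'fanyi.web',
        'ue': 'UTF-8',
    }
    body = urllib.parse.urlencode(data).encode('utf-8')
    headers = {'User-Agent': 'Mozilla/5.0'}  # example value only
    return urllib.request.Request(URL, data=body, headers=headers)

req = build_translate_request('hello')
print(req.get_method())       # a request with a body is sent as POST
print(b'i=hello' in req.data)
```

Because data is supplied, urllib sends the request as POST; urlopen(req) would then return the JSON response to decode as in the example above.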