lxml and proxy IP

pip install lxml

Importing the package
from lxml import etree
1. A local HTML file can be parsed directly with etree.parse.
2. html_etree = etree.parse("test.html")
   print(html_etree)
3. html_etree.xpath("//li")  # // selects every li element in the document
4. Get the class value of every li: html_etree.xpath("//li/@class")
5. Get every span under li: html_etree.xpath("//li//span"). A single / only matches direct children, and here span is not a direct child of li, so // is needed.
6. Get every class attribute inside the a tags under li: html_etree.xpath("//li/a//@class")
7. html_etree.xpath('//li[last()]/a/@href')  # the predicate [last()] selects the last li; this returns the href value of its a tag
8. The penultimate li is [last()-1]: html_etree.xpath('//li[last()-1]/a') gets the a element of the second-to-last li.
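
The queries above can be tried on a small document. This is only a minimal sketch: the sample HTML is made up for illustration, and io.StringIO stands in for the local test.html file. An etree.HTMLParser() is passed explicitly because etree.parse uses an XML parser by default, which rejects HTML that is not well-formed XML.

```python
import io

from lxml import etree

# Made-up sample document standing in for the local test.html file.
SAMPLE_HTML = """
<ul>
  <li class="item-0"><a href="link1.html" class="first"><span>one</span></a></li>
  <li class="item-1"><a href="link2.html">two</a></li>
  <li class="item-2"><a href="link3.html" class="last"><span class="bold">three</span></a></li>
</ul>
"""

# etree.parse accepts a path or a file-like object.
html_etree = etree.parse(io.StringIO(SAMPLE_HTML), etree.HTMLParser())

all_li = html_etree.xpath("//li")                    # every li element
li_classes = html_etree.xpath("//li/@class")         # class value of every li
nested_spans = html_etree.xpath("//li//span")        # span at any depth under li
direct_spans = html_etree.xpath("//li/span")         # span as a direct child of li
a_classes = html_etree.xpath("//li/a//@class")       # class of each a and of its descendants
last_href = html_etree.xpath("//li[last()]/a/@href")
penultimate_href = html_etree.xpath("//li[last()-1]/a/@href")

print(len(all_li), li_classes)
print(len(nested_spans), len(direct_spans))
print(a_classes, last_href, penultimate_href)
```

In this sample the span elements sit inside the a tags, so "//li/span" finds nothing while "//li//span" finds both.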


When there is no local file and the HTML comes from response data you have read, use etree.HTML instead:
# html_etree = etree.HTML(html)
XPath path expressions:
/ Selects from the root node.
// Selects matching nodes anywhere in the document from the current node, regardless of their position.
. Selects the current node.
.. Selects the parent of the current node.
@ Selects attributes.
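
As a rough illustration of these expressions, the sketch below parses an HTML string with etree.HTML and applies each one. The HTML string is a made-up stand-in for response data.

```python
from lxml import etree

# Made-up HTML standing in for response data that has already been read.
html = "<div id='box'><p class='intro'>Hello <b>world</b></p></div>"

# etree.HTML parses a string and returns the root <html> element.
html_etree = etree.HTML(html)

box_id = html_etree.xpath("/html/body/div/@id")   # / walks down from the root; @ selects an attribute
bold = html_etree.xpath("//b")[0]                 # // finds b wherever it sits
current = bold.xpath(".")[0]                      # . is the current node
parent = bold.xpath("..")[0]                      # .. is the parent node

print(box_id, current.tag, parent.tag, parent.xpath("./@class"))
```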

1) The usual form is html_etree.xpath("//div[@class='']/a//text()")
2) With text(), the result is returned as text (strings) rather than as element objects, so it prints as readable data.
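
The difference between selecting elements and selecting their text can be seen in a small sketch. The HTML and the class name "nav" are assumptions made for illustration.

```python
from lxml import etree

# Made-up HTML; the class name "nav" is only an example value.
html = (
    '<div class="nav"><a href="/home">Home <b>page</b></a></div>'
    '<div class="other"><a href="/x">skip</a></div>'
)
html_etree = etree.HTML(html)

# Without text() the result is a list of element objects.
elements = html_etree.xpath("//div[@class='nav']/a")

# With //text() the result is a list of strings, one per text node under the a tag.
texts = html_etree.xpath("//div[@class='nav']/a//text()")
joined = "".join(texts)

print(elements)
print(texts, joined)
```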

CookieJar
1. First import the modules:
import http.cookiejar
from urllib import request
Then build an opener as follows (this way you do not have to write the cookie value into the headers yourself):
# Create a cookie object; it stores the cookies the server sets on the client
cookie = http.cookiejar.CookieJar()
# Create a handler object from the cookie object
handler = request.HTTPCookieProcessor(cookie)
# Create an opener object from the handler object
opener = request.build_opener(handler)
Finally, send the request with opener.open:
req1 = request.Request(url=LOGIN_URL, headers=headers)
response1 = opener.open(req1)
print(response1.read().decode("utf-8"))
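
To see the cookie handling work without a real login page, the sketch below starts a throwaway server on 127.0.0.1. The DemoHandler class, the /login and /profile paths, and the cookie value are all hypothetical, invented for this demo. The first request stores the cookie in the CookieJar and the second sends it back automatically. The empty ProxyHandler is only there so that the local request is not routed through any system proxy.

```python
import http.cookiejar
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /login sets a cookie; every reply echoes the Cookie header."""

    def do_GET(self):
        received = self.headers.get("Cookie", "")
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(received.encode("utf-8"))

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = "http://127.0.0.1:%d" % server.server_port

cookie = http.cookiejar.CookieJar()
opener = request.build_opener(
    request.ProxyHandler({}),             # no proxy for the local demo server
    request.HTTPCookieProcessor(cookie),  # saves and resends cookies
)

opener.open(base_url + "/login").read()   # server sets the cookie here
saved = {c.name: c.value for c in cookie}
echoed = opener.open(base_url + "/profile").read().decode("utf-8")

server.shutdown()
server.server_close()
print(saved, echoed)
```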

Proxy IP
# Build a free proxy
# proxy = {
#     "http": "49.86.183.163:9999",
#     "https": "49.86.183.163:9999"
# }
(Here we use a free IP from the "fast proxy" site; a company usually has its own proxy pool.)
proxy = {
    "http": "http://18632229371:[email protected]:28803",
    "https": "http://18632229371:[email protected]:28803"
}
1. Create the handler with request.ProxyHandler, then build the opener from it:
handler = request.ProxyHandler(proxies=proxy)
opener = request.build_opener(handler)
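
The effect of the ProxyHandler can be checked without a real proxy. In the sketch below a small local server plays the proxy role; the FakeProxy class and the target URL http://example.invalid/page are hypothetical and exist only for this demo. When a proxy is configured for the "http" scheme, urllib connects to the proxy address and puts the full target URL in the request line, which the stand-in simply reports back.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class FakeProxy(BaseHTTPRequestHandler):
    """Stand-in for a real proxy: replies with the URL it was asked to fetch."""

    def do_GET(self):
        body = ("proxied: " + self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), FakeProxy)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Keys are lowercase URL schemes; values are proxy URLs.
proxy = {"http": "http://127.0.0.1:%d" % server.server_port}
handler = request.ProxyHandler(proxies=proxy)
opener = request.build_opener(handler)

# The target host is never contacted; the request goes to the proxy address.
target_url = "http://example.invalid/page"
result = opener.open(target_url, timeout=5).read().decode("utf-8")

server.shutdown()
server.server_close()
print(result)
```

A real proxy would forward the request to the target and relay its response; the stand-in only shows that the traffic is sent to the proxy address.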

Origin www.cnblogs.com/liuxiaomo/p/11967036.html