Review of yesterday
What is a file
A file is a virtual unit that the operating system provides for storing data
Steps for working with a file
1. Locate the path to the file: file_path
2. Open the file: open
3. Read or modify the file: read / write
4. Save the file: flush
5. Close the file: close
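The five steps above can be sketched as follows. The temporary directory and the file name notes.txt are assumptions made only so the example runs anywhere; in practice file_path would point at your own file.

```python
import os
import tempfile

# Step 1: locate the path (a temporary directory keeps the sketch self-contained)
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

# Step 2: open the file for writing
f = open(file_path, "w", encoding="utf-8")
# Step 3: modify the file
f.write("hello file\n")
# Step 4: flush pushes Python's buffered data out to the operating system
f.flush()
# Step 5: close releases the file handle
f.close()

# Reading follows the same open / read / close sequence
f = open(file_path, "r", encoding="utf-8")
content = f.read()
f.close()
print(content)
```

close also flushes, so the explicit flush matters mostly when the file stays open for a long time and other programs need to see the data.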
Opening a file: 3 modes + 2 formats
Modes
1. w: write, emptying the file first
2. r: read-only, no writing
3. a: append to the end of the file
Formats
1. b: binary
2. t: text
Not recommended
1. r+: readable and writable
2. a+: readable and writable
3. w+: readable and writable (empties the file first)
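A small sketch of how the modes and formats combine. The file name modes.txt and the temporary directory are assumptions for illustration only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical file

# w + t: empty the file (or create it), then write text
with open(path, "wt", encoding="utf-8") as f:
    f.write("first\n")

# a + t: keep the existing content and add to the end
with open(path, "at", encoding="utf-8") as f:
    f.write("second\n")

# r + t: read the content back as str
with open(path, "rt", encoding="utf-8") as f:
    text = f.read()

# r + b: read the same file as raw bytes, with no decoding
with open(path, "rb") as f:
    raw = f.read()

# w again: the earlier content is discarded before writing
with open(path, "w", encoding="utf-8") as f:
    f.write("replaced\n")
with open(path, "r", encoding="utf-8") as f:
    after_w = f.read()

print(repr(text), type(raw).__name__, repr(after_w))
```

When no format letter is given, t is the default, so "w" and "wt" behave the same way.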
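The with context manager
# Manual handling: the file must be closed explicitly
f = open(file_path)
f.read()
f.close()
# with closes the file automatically
with open(file_path) as f:
    f.read()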
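One reason to prefer with is that the file is closed even when the body raises an error. The sketch below simulates a failure to show this; the file name demo.txt and the ValueError are assumptions for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file
with open(path, "w", encoding="utf-8") as f:
    f.write("data")

handle = None
try:
    with open(path, encoding="utf-8") as f:
        handle = f  # keep a reference so we can inspect it afterwards
        raise ValueError("simulated failure while reading")
except ValueError:
    pass

# The with block closed the file on the way out, despite the exception
closed_after_error = handle.closed
print(closed_after_error)
```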
How a crawler works
A browser gets content by sending a request; a crawler imitates the browser and sends the same kind of request with requests to get the content
Crawler process
1. Send the request (supply a URL)
2. Get the content
3. Filter out the data you need
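The three steps can be sketched as three small functions. This is only an illustration: it uses the standard library's urllib.request for the request so that it has no third-party dependency, and the names fetch_html, extract_links and crawl, as well as the choice to extract href values, are hypothetical.

```python
import re
from urllib.request import urlopen


def fetch_html(url):
    """Steps 1 and 2: send the request and return the response body as text."""
    with urlopen(url, timeout=10) as response:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html):
    """Step 3: keep only the data we need (here, the href values)."""
    return re.findall(r'href="(.*?)"', html)


def crawl(url, fetch=fetch_html):
    """Run the whole process; fetch can be swapped out, e.g. for testing."""
    return extract_links(fetch(url))
```

A call such as crawl(some_url) performs a real request. Passing a different fetch function makes it possible to try the filtering step on saved HTML without touching the network.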
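Using the requests module
import requests
# url is the address of the page to request
res = requests.get(url)
# Response body as text (str)
res.text
# Response body as a binary stream (bytes)
res.content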
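Because res.content is a bytes object, saving it to disk calls for the b format, while res.text is that same data decoded into str. The sketch below shows the relationship without touching the network: fake_content stands in for res.content, and save_body and body.bin are hypothetical names.

```python
import os
import tempfile


def save_body(content: bytes, path: str) -> int:
    """Write raw response bytes to path and return how many bytes were written."""
    with open(path, "wb") as f:  # binary mode: bytes go to disk unchanged
        return f.write(content)


# Stand-in for res.content (an assumption so the sketch runs offline)
fake_content = "返回主页".encode("utf-8")

target = os.path.join(tempfile.mkdtemp(), "body.bin")
written = save_body(fake_content, target)

# Decoding the bytes gives the kind of str that res.text would hold
decoded = fake_content.decode("utf-8")
print(written, decoded)
```

With a real response, the call would be save_body(res.content, target). Binary mode is the usual choice for images and other non-text downloads.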
The re module
import re
re.S: makes . match newline characters as well, so a pattern can match across the whole text
data = '<img id = "blogLogo" src = "http://www.baidu.com" alt="返回主页">'
re.findall('src = "(.*?)"', data)  # filter the needed content out of data
(.*?): put (.*?) in place of whatever you want to extract; .*? covers 80%-90% of use cases
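A short sketch of why the lazy .*? is usually the right choice, and what re.S changes. The html string is made-up sample data, not taken from a real page.

```python
import re

# Hypothetical sample page spanning several lines
html = '''<div class="post">
<img id="blogLogo" src="http://www.baidu.com" alt="logo">
<p>first
second</p>
</div>'''

# (.*?) captures as little as possible, stopping at the next quote
srcs = re.findall(r'src="(.*?)"', html)

# Without re.S, . does not match a newline, so the two-line <p> is missed
without_flag = re.findall(r'<p>(.*?)</p>', html)
# With re.S, . also matches newlines and the paragraph body is captured
with_flag = re.findall(r'<p>(.*?)</p>', html, re.S)

# Greedy .* runs to the last quote; lazy .*? stops at the first one
greedy = re.findall(r'"(.*)"', '"a" and "b"')
lazy = re.findall(r'"(.*?)"', '"a" and "b"')

print(srcs, without_flag, with_flag, greedy, lazy)
```

The whitespace in the pattern has to match the text exactly: a pattern written as src="..." will not match text written as src = "...".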