3.11 Python and Databases (3)
Insert operation:
# Omit the id column, since it is auto-incremented
In [ ]:
sql = "insert into `class`(`name`) values('高一四班')"
cursor = db.cursor()
cursor.execute(sql)
cursor.execute(sql)  # can be run twice; each run inserts another row
db.commit()
Delete operation:
In [ ]:
sql = "delete from `class` where `name` = '高一五班'"
cursor = db.cursor()  # get a cursor
cursor.execute(sql)   # execute the statement
db.commit()
Update operation:
In [ ]:
sql = "update `class` set `name` = '高一十四班' where `id` = 4;"
cursor = db.cursor()
cursor.execute(sql)
db.commit()
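All three operations follow the same cursor/commit pattern. Here is a minimal, self-contained sketch using the standard library's sqlite3 module in place of the course's MySQL connection `db` (note the quoting difference: MySQL uses backticks around identifiers, sqlite3 needs none here):

```python
import sqlite3

# In-memory database standing in for the course's MySQL connection `db`
db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("create table class (id integer primary key autoincrement, name text)")
db.commit()

# Insert: omit id, it is auto-incremented
cursor.execute("insert into class (name) values ('高一四班')")
db.commit()

# Update: rename the row that received id = 1
cursor.execute("update class set name = '高一十四班' where id = 1")
db.commit()

# Delete: remove the row by its new name
cursor.execute("delete from class where name = '高一十四班'")
db.commit()

print(cursor.execute("select count(*) from class").fetchone()[0])  # 0
```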
3.12 Python and Databases (4)
Catching program exceptions
In [3]:
a = 10
b = a + 'hello'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-ce8b6729d737> in <module>()
1 a = 10
----> 2 b = a + 'hello'
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Use a try statement and catch the error with Exception:
In [4]:
try:
    a = 10
    b = a + 'hello'
except Exception as e:
    print(e)
# There is output but no traceback, because Exception caught the error
unsupported operand type(s) for +: 'int' and 'str'
# When writing code, catch the most specific exception you can; here it is TypeError
In [ ]:
try:
    a = 10
    b = a + 'hello'
except TypeError as e:
    print(e)
# Don't catch unknown exceptions: replace the blanket `except Exception as e:`
# clause with the specific exception you expect
Database rollback: rollback. A rollback means: several statements in a transaction have already been executed and one or more follow; if a later one fails, the whole batch should fail, and you no longer want the earlier statements to take effect, so you undo them instead of committing.
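The pattern can be sketched with the standard library's sqlite3 (the DB-API commit/rollback interface is the same as MySQL's): run the statements inside try, commit only if all succeed, and roll back on failure so none of the earlier statements take effect:

```python
import sqlite3

db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("create table class (id integer primary key, name text)")
db.commit()

try:
    cursor.execute("insert into class (id, name) values (1, '高一四班')")
    # The second insert reuses id 1, violating the primary key, and raises an error
    cursor.execute("insert into class (id, name) values (1, '高一五班')")
    db.commit()
except Exception as e:
    db.rollback()  # undo the first insert as well
    print(e)

print(cursor.execute("select count(*) from class").fetchone()[0])  # 0: nothing was committed
```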
3.13 Python Web Scraping (1)
Scrape the Lianjia rental listings at https://bj.lianjia.com/zufang/ and store the results in a database.
Some sites, like Toutiao (今日头条), expose an API: opening the link returns well-structured JSON, so with an HTTP library and a JSON parser it is easy to pull out the information you want.
Scraping: extracting information from external pages (when no such API is available, you write a scraper to fetch the page and then extract the information you want from its HTML).
Main Python libraries used:
requests —— fetch page content
requests quickstart (Chinese): http://docs.python-requests.org/zh_CN/latest/user/quickstart.html
BeautifulSoup —— extract content from pages (parse the page and pull out what you want)
BeautifulSoup documentation (Chinese): https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html
Goals:
Collect every listing link on the Lianjia results page;
Follow each link to the listing's detail page and extract the corresponding information there.
# Define a variable url and assign it the rental-listings link
In [11]: url = 'https://bj.lianjia.com/zufang/'
https://bj.lianjia.com/zufang/
What follows zufang/ is a parameter used to filter listings on Lianjia; with no parameter set, results are drawn from all rental listings in Beijing.
Install the third-party libraries:
pip install requests
pip install bs4
(BeautifulSoup is distributed inside the bs4 package)
3.14 Python Web Scraping (2)
In [7]: pip install requests  # pip is a shell command, so running it in a cell fails:
File "<ipython-input-7-74dcce72a708>", line 1
pip install requests
^
SyntaxError: invalid syntax
Import the third-party libraries:
In [221]:
import requests
from bs4 import BeautifulSoup
# BeautifulSoup lives inside the bs4 package, hence the `from bs4 import` form
In [222]:
url = 'https://bj.lianjia.com/zufang/'
response = requests.get(url)
# get is what a browser does when it simply opens a page
# requests.post is used for things like submitting a registration form
# response holds the result of the request
soup = BeautifulSoup(response.text, 'lxml')
In [223]: url = 'https://bj.lianjia.com/zufang/'
response = requests.get(url)
response
Out[223]: <Response [200]>  # status 200 means the page was fetched successfully
# where the data lives
In [15]: response.text
# returns all of the page's HTML — a lot of it, and messy
Out[15]:
'<!DOCTYPE html><html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge" /><meta http-equiv="Cache-Control" content="no-transform" /><meta http-equiv="Cache-Control" content="no-siteapp" /><meta http-equiv="Content-language" content="zh-CN" /><meta name="format-detection" content="telephone=no" /><meta name="applicable-device" content="pc"><link
网)</title>\n<meta name="description" content="链家北京租房
网,现有真实房屋租赁10765套
………………
1. In Chrome: right-click → Inspect
2. In 360 Browser: right-click → Inspect Element (审查元素)
(If the menu item is missing, it may be a system issue: right-click → More tools → Developer tools opens a panel for analyzing the page)
In [34]:
url = 'https://bj.lianjia.com/zufang/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
# 'lxml' names the parser; it can be omitted (a warning is shown instead), but the
# default parser may differ across systems, so it is better to be explicit
In [35]:
url = 'https://bj.lianjia.com/zufang/'
response = requests.get(url)
soup = BeautifulSoup(response.text)
# the raw text is now structured:
In [18]:
soup
Out[18]:
<!DOCTYPE html>
<html><head><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><meta content="IE=edge" http-equiv="X-UA-Compatible"/><meta content="no-transform" http-equiv="Cache-
……………………
# (soup prints more readably, but these are not yet the elements we want) the link elements we need are inside divs (each div contains a link)
In [19]:
soup.find_all('div',class_="")
class is used to define classes and is a Python keyword, so BeautifulSoup uses class_ for this argument to avoid the clash.
The goal is the links: clicking a picture opens a listing, so each picture is wrapped in an <a> tag carrying the link.
# a div is simply a box in the page layout
Out[19]:
[<div class="wrapper "><div class="fl"><a class="logo" href="//www.lianjia.com/" title="链家房产网"><!-- <img src="https://s1.ljcdn.com/feroot/pc/asset/img/new-version/logo.png?_v=20180319195424"> --></a></div><div class="fr nav "><div class="fl"><ul>
<li>
<a class="" href="https://bj.lianjia.com/ershoufang/">二手房</a>
</li>
……………………
value="1"/></span>我已阅读并同意</label><a class="toprotocol" href="//www.lianjia.com/zhuanti/protocol" target="_blank">《链家用户使用协议》</a></li><li class="li_btn"><a class="register-user-btn"></a>注册</li></ul></form></div>]
Find the <a> under each pic-panel div (the <a> carries the link).
# links_div holds all the boxes that contain the links
In [20]:
links_div = soup.find_all('div', class_="pic-panel")  # note the hyphen in pic-panel
In [21]:
links_div
# find_all returns many matches, so this behaves like a list
Out[21]:
[<div class="pic-panel"><a href="https://bj.lianjia.com/zufang/101102663605.html" target="_blank"><img alt="西城马甸 双朝南精装干净两居室 采光充足无遮挡" data-apart-layout="https://image1.ljcdn.com/x-se/hdic-frame/a21770d9-d29b-4797-b732-
………………
version/default_block.png?_v=20180319195424"/></a></div>]
# look at the first element
In [22]:
links_div[0]
Out[22]:
<div class="pic-panel"><a href="https://bj.lianjia.com/zufang/101102663605.html" target="_blank"><img alt="西城马甸 双朝南精装干净两居室 采光充足无遮挡" data-apart-layout="https://image1.ljcdn.com/x-se/hdic-frame/a21770d9-d29b-4797-b732-3164e700a48b.png.280x210.jpg" data-img="https://image1.ljcdn.com/110000-inspection/rsp_pic_uploadb536ab4f-1083-4dd7-8d41-0a3d7ae16722.jpg.280x210.jpg" src="https://s1.ljcdn.com/feroot/pc/asset/img/new-version/default_block.png?_v=20180319195424"/></a></div>
links_div[0] is one box with a lot inside; from it we only need to extract the link https://bj.lianjia.com/zufang/101102663605.html — the rest we don't need.
Turn the list of boxes into a list of links
using a list comprehension (a for-loop form that builds one list (the links) from another (the boxes)).
In [ ]:
links_div = soup.find_all('div', class_="pic-panel")
links = [for div in links_div]
# raises a SyntaxError: the comprehension is missing the expression before `for`
# what we want from each div is its a tag
In [25]:
links_div[1].a
Out[25]:
<a href="https://bj.lianjia.com/zufang/101102657620.html" target="_blank"><img alt="花家地西里一区可随时拎包入住一居室" data-apart-layout="https://image1.ljcdn.com/x-se/hdic-frame/aaeb8786-0810-4531-8383-bf11d713f256.png.280x210.jpg" data-img="https://image1.ljcdn.com/110000-inspection/661a4358-b111-4fe7-9fa2-f573c2662251.jpg.280x210.jpg" src="https://s1.ljcdn.com/feroot/pc/asset/img/new-version/default_block.png?_v=20180319195424"/></a>
In [26]:
links_div[1].a.get('href')
# href is the attribute that holds the URL we need
# the link is now extracted
Out[26]:
'https://bj.lianjia.com/zufang/101102657620.html'
# build the new list
In [29]:
links_div = soup.find_all('div', class_="pic-panel")
links = [div.a.get('href') for div in links_div]
Inspect the contents of the links list:
In [30]: links
Out[30]:
['https://bj.lianjia.com/zufang/101102663605.html',
'https://bj.lianjia.com/zufang/101102657620.html',
'https://bj.lianjia.com/zufang/101102627382.html',
'https://bj.lianjia.com/zufang/101102562541.html',
'https://bj.lianjia.com/zufang/101102601891.html',
'https://bj.lianjia.com/zufang/101102612472.html',
'https://bj.lianjia.com/zufang/101102560877.html',
'https://bj.lianjia.com/zufang/101102563962.html',
'https://bj.lianjia.com/zufang/101102565535.html',
'https://bj.lianjia.com/zufang/101102567140.html',
'https://bj.lianjia.com/zufang/101102569458.html',
'https://bj.lianjia.com/zufang/101102595781.html',
'https://bj.lianjia.com/zufang/101102583860.html',
'https://bj.lianjia.com/zufang/101102586630.html',
'https://bj.lianjia.com/zufang/101102589206.html',
'https://bj.lianjia.com/zufang/101102590254.html',
'https://bj.lianjia.com/zufang/101102577318.html',
'https://bj.lianjia.com/zufang/101102577961.html',
'https://bj.lianjia.com/zufang/101102593955.html',
'https://bj.lianjia.com/zufang/101102615116.html',
'https://bj.lianjia.com/zufang/101102601605.html',
'https://bj.lianjia.com/zufang/101102573573.html',
'https://bj.lianjia.com/zufang/101102616903.html',
'https://bj.lianjia.com/zufang/101102402567.html',
'https://bj.lianjia.com/zufang/101102424075.html',
'https://bj.lianjia.com/zufang/101102666239.html',
'https://bj.lianjia.com/zufang/101102627365.html',
'https://bj.lianjia.com/zufang/101102630521.html',
'https://bj.lianjia.com/zufang/101102634527.html',
'https://bj.lianjia.com/zufang/101102658371.html']
In [31]:
# show the list together with its length
links,len(links)
Out[31]:
(['https://bj.lianjia.com/zufang/101102663605.html',
'https://bj.lianjia.com/zufang/101102657620.html',
………………
'https://bj.lianjia.com/zufang/101102658371.html'],
30)
30 links are printed, so we have captured all 30 rental listings on the first page; visiting any one of them gives a specific listing page.
The steps above collect the links for one whole page (30 listings).
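The link-extraction step above can be sketched without network access. This illustrative version uses only the standard library's html.parser instead of requests + BeautifulSoup, and the sample markup below is a hypothetical stand-in for the real page that requests.get(url).text would return:

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the real page markup fetched from Lianjia
HTML = '''
<div class="pic-panel"><a href="https://bj.lianjia.com/zufang/101102663605.html"><img/></a></div>
<div class="pic-panel"><a href="https://bj.lianjia.com/zufang/101102657620.html"><img/></a></div>
<div class="other"><a href="https://bj.lianjia.com/ershoufang/"><img/></a></div>
'''

class LinkExtractor(HTMLParser):
    """Collect the href of the first <a> inside each <div class="pic-panel">."""
    def __init__(self):
        super().__init__()
        self.in_panel = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            self.in_panel = attrs.get('class') == 'pic-panel'
        elif tag == 'a' and self.in_panel and 'href' in attrs:
            self.links.append(attrs['href'])
            self.in_panel = False  # only the first link per panel

    def handle_endtag(self, tag):
        if tag == 'div':
            self.in_panel = False

parser = LinkExtractor()
parser.feed(HTML)
print(parser.links)
# ['https://bj.lianjia.com/zufang/101102663605.html',
#  'https://bj.lianjia.com/zufang/101102657620.html']
```

With BeautifulSoup, as in the course, the entire class collapses to the one-line comprehension `[div.a.get('href') for div in soup.find_all('div', class_="pic-panel")]`.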