This column follows Yang Xiuzhang's crawler book "Python Web Data Crawling and Analysis: From Beginner to Proficiency" as its main thread, with my own learning and understanding as the main content, written in the form of study notes.
This column is both a record of my own study and an attempt to popularize some knowledge about crawlers and share a few small crawler ideas.
Column address: Python web data crawling and analysis "From entry to proficiency"
For more crawler examples, please see the column: Python crawler sledgehammer test
Previous article review:
"Python crawler series explanation" 1. Network data crawling overview
"Python crawler series explanation" 2. Python knowledge beginners
"Python crawler series explanation" 3. Regular expression crawler's powerful test
"Python crawler series explanation" 4. BeautifulSoup Technical
"Python Crawler Series Explanation" 5. Use BeautifulSoup to crawl movie information
Table of Contents
1 MySQL database
1.1 MySQL installation and configuration
1.2 Detailed explanation of basic SQL statements
2 Python operating MySQL database
2.1 Install MySQL extension library
2.2 Program interface DB-API
2.3 Python calls MySQLdb extension library
3 Python operating SQLite 3 database
4 Summary
A database is a warehouse that organizes, stores, and manages data according to a data structure. In a database management system, users can add, delete, update, and query data, transforming it into whatever form they need and managing it flexibly. The corpus crawled with Python in the previous articles was usually stored as TXT text, Excel, or CSV files. This article focuses on the MySQL database, on how to operate MySQL from Python, and on how to store the crawled data in a database so that data analysis and statistics become more convenient.
1 MySQL database
Database technology is the core of information systems such as information management systems, office automation systems, and sales statistics systems, and it is an important technical means for scientific research and decision-making. Commonly used databases include Oracle, DB2, MySQL, SQL Server, Sybase, and VF. Among them, MySQL offers excellent performance, good stability, simple configuration, and support for a wide range of operating systems.
1.1 MySQL installation and configuration
For the installation and configuration of MySQL, please refer to the blog post: https://www.cnblogs.com/2020javamianshibaodian/p/mysql8020anzhuangjiaocheng.html
The details are not repeated here.
1.2 Detailed explanation of basic SQL statements
The most important part of working with a database is SQL (Structured Query Language), the language used to define and manipulate data in databases built on the relational model.
SQL statements are mainly divided into 3 categories, as follows:
DDL (Data Definition Language) statements: data definition statements. They define database objects such as databases, tables, columns, and indexes. Common keywords include create, drop, and alter.
DML (Data Manipulation Language) statements: data manipulation statements. They insert, delete, update, and query database records, are the most commonly used statements in database operations, and can check data integrity. Common keywords include insert, delete, update, and select.
DCL (Data Control Language) statements: data control statements. They control the permissions and access levels on databases, tables, and fields, defining user permissions and security levels. Common keywords include grant and revoke.
1.2.1 Display database
show databases
Note: If a database already exists, you can use the use statement directly; if the database does not exist, you need to use the create statement to create the database.
1.2.2 Using the database
If you want to use the existing database bookmanage, use the following statement:
use bookmanage
1.2.3 Create a database
If you want to create a new database, use the create keyword to create it.
create database course
1.2.4 Create Table
Suppose you want to create a book table named books, with the fields bookid (book number), bookname (book name), price, and bookdate (book date).
create table books(bookid int primary key,
bookname varchar(20),
price float,
bookdate date
)
Here the created table is named books. The book number bookid is of type int and is the primary key, which uniquely identifies each row of the table; the book name is of type varchar with length 20; the price is of type float; the book date is of type date.
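The table definition above can also be tried out from Python. The following is only a minimal sketch: it assumes an in-memory SQLite database (through the standard sqlite3 module) in place of a MySQL server, so that it runs without any installation, and it uses SQLite's "pragma table_info" as a rough counterpart of MySQL's desc statement.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Same column definitions as the books table in this section.
cur.execute(
    "create table books("
    "bookid int primary key, "
    "bookname varchar(20), "
    "price float, "
    "bookdate date)"
)

# table_info rows are (cid, name, type, notnull, default, pk);
# keep the column name, the declared type and the primary-key flag.
columns = [(row[1], row[2], row[5]) for row in cur.execute("pragma table_info(books)")]
for name, col_type, is_pk in columns:
    print(name, col_type, "primary key" if is_pk else "")

conn.close()
```

Note that SQLite is more permissive about column types than MySQL, so this sketch only checks the structure of the statement, not MySQL's type enforcement.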
1.2.5 View table information
If you want to see how many tables exist in the current database, use the show keyword.
show tables
According to the above code, there is only one table books currently. If you want to view the definition of the table, use the desc keyword.
desc books
1.2.6 Delete table
If you want to delete the table books, use the drop keyword.
drop table books
1.2.7 Insert statement
After the database and table are created successfully, you need to insert data into the table. The keyword used is insert.
For example, to insert a record into the table books, the code is as follows:
insert into books(bookid, bookname, price, bookdate) values ('1', '人工智能导论', '88', '2020-07-02')
Use select query to display the results:
select * from books
When executing an insert statement, the field list can be omitted if a value is supplied for every field, in the order in which the fields were defined.
insert into books
values ('2', '软件工程导论', '77.7', '2020-07-02')
Similarly, to insert data into only some of the fields, list those fields and supply the matching values in the same order, such as:
insert into books(bookid, bookname)
values ('3', 'Python程序设计语言')
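The three forms of insert described in this section can be compared side by side. The sketch below again assumes an in-memory SQLite database instead of MySQL, purely so that it is runnable as-is; the statements themselves follow the ones shown above.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the MySQL server for this sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "create table books(bookid int primary key, bookname varchar(20), "
    "price float, bookdate date)"
)

# 1) Field list given explicitly.
cur.execute(
    "insert into books(bookid, bookname, price, bookdate) "
    "values (1, '人工智能导论', 88, '2020-07-02')"
)
# 2) Field list omitted: one value per field, in definition order.
cur.execute("insert into books values (2, '软件工程导论', 77.7, '2020-07-02')")
# 3) Only some fields: the remaining fields are left as NULL.
cur.execute("insert into books(bookid, bookname) values (3, 'Python程序设计语言')")
conn.commit()

rows = cur.execute("select * from books order by bookid").fetchall()
for row in rows:
    print(row)

conn.close()
```

The third row comes back with None for price and bookdate, which is how Python represents the NULL values of the fields that were not listed.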
1.2.8 Query statement
The basic syntax format of the query statement is as follows:
select fields from table_name [where condition]
This statement queries the data of the specified fields. When the field is the "*" symbol, all fields in the table are returned; where is followed by the query condition and can be omitted.
To display all the fields and data in the books table:
select * from books
To display only the required fields, separate them with commas:
select bookid,bookname,price from books
To add query conditions, use the where clause. For example, query the rows whose number is greater than 1, and the rows whose price is not null:
select bookid,bookname,price,bookdate from books where bookid>1;
select bookid,bookname,price,bookdate from books where price is not null;
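To see what the two where conditions return, the sketch below loads the three example books and runs both queries. It is an illustration under assumptions: an in-memory SQLite database is used instead of MySQL, and the third book is deliberately given no price so that the "is not null" filter has something to exclude.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the MySQL server for this sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "create table books(bookid int primary key, bookname varchar(20), "
    "price float, bookdate date)"
)
cur.executemany(
    "insert into books values (?, ?, ?, ?)",
    [
        (1, "人工智能导论", 88, "2020-07-02"),
        (2, "软件工程导论", 77.7, "2020-07-02"),
        (3, "Python程序设计语言", None, None),  # no price, no date
    ],
)

# Rows whose number is greater than 1.
ids_above_one = [r[0] for r in cur.execute(
    "select bookid from books where bookid > 1 order by bookid")]
# Rows whose price is not null.
priced = [r[0] for r in cur.execute(
    "select bookid from books where price is not null order by bookid")]

print(ids_above_one)
print(priced)
conn.close()
```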
1.2.9 Update statement
The update statement uses the update keyword.
For example, change the name of the book with bookid 1 from "人工智能导论" (Introduction to Artificial Intelligence) to "网络数据爬取及分析" (Network Data Crawling and Analysis):
update books set bookname= '网络数据爬取及分析' where bookid='1';
[Figure: the books table before the update statement is executed]
[Figure: the books table after the update statement is executed]
1.2.10 Delete statement
The delete statement uses the delete keyword.
For example, to delete the rows whose date is null, the specific code is as follows:
delete from books where bookdate is null;
[Figure: the books table before the delete statement is executed]
[Figure: the books table after the delete statement is executed]
It is worth noting that when MySQL executes a DELETE or UPDATE command, it may report the following error:
Error Code: 1175. You are using safe update mode and you tried to update a table without a WHERE that uses a KEY column. To disable safe mode, toggle the option in Preferences -> SQL Editor and reconnect.
This happens because MySQL is running in safe-updates mode, which refuses update or delete commands whose where condition does not use a key column. To turn the mode off, execute the following command:
SET SQL_SAFE_UPDATES = 0;
Once the mode has been changed, you can continue to execute DELETE/UPDATE.
To switch back to safe-updates mode, execute the following command:
SET SQL_SAFE_UPDATES = 1;
(Source of error resolution: refer to blog post address: https://blog.csdn.net/m2606707610/article/details/86531526 )
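The update and delete statements from sections 1.2.9 and 1.2.10 can be checked from Python by looking at how many rows each one touched. This is a sketch under assumptions: it runs against an in-memory SQLite database rather than MySQL (so safe-updates mode does not apply), and it relies on the cursor's rowcount attribute from DB-API to report the number of affected rows.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the MySQL server for this sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "create table books(bookid int primary key, bookname varchar(20), "
    "price float, bookdate date)"
)
cur.executemany(
    "insert into books values (?, ?, ?, ?)",
    [
        (1, "人工智能导论", 88, "2020-07-02"),
        (2, "软件工程导论", 77.7, "2020-07-02"),
        (3, "Python程序设计语言", None, None),
    ],
)

# Update: the where clause uses the primary key, so one row is changed.
cur.execute("update books set bookname = ? where bookid = ?", ("网络数据爬取及分析", 1))
updated = cur.rowcount

# Delete: removes every row whose date is null (the third book here).
cur.execute("delete from books where bookdate is null")
deleted = cur.rowcount
conn.commit()

remaining = cur.execute("select bookid, bookname from books order by bookid").fetchall()
print(updated, deleted, remaining)
conn.close()
```

Checking rowcount before committing is a cheap way to notice a where clause that matches more rows than intended.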
2 Python operating MySQL database
To access a database, Python needs a corresponding interface program. It can be understood as a Python module that provides an interface to the database client.
2.1 Install MySQL extension library
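pip install mysql
Note: on Python 3 the MySQLdb module used below is provided by the mysqlclient package, so "pip install mysqlclient" installs it directly.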
2.2 Program interface DB-API
The Python interface program must comply with the Python DB-API specification. DB-API defines a series of necessary operation objects and database access methods in order to provide consistent access interfaces for various underlying database systems and different database interfaces. Because DB-API provides consistent access interfaces for different databases, it makes it easy to port code between different databases.
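The portability argument can be made concrete with a function that only uses calls defined by DB-API. In the sketch below, count_rows is a hypothetical helper written for this illustration, and the standard sqlite3 module supplies the connection; the assumption (not tested here) is that a connection from another DB-API module such as MySQLdb could be passed in unchanged.

```python
import sqlite3


def count_rows(conn, table):
    """Count the rows of a table using only DB-API calls.

    Hypothetical helper: cursor(), execute(), fetchone() and close()
    are all part of the DB-API specification. The table name is
    concatenated into the SQL, so it must come from trusted code.
    """
    cur = conn.cursor()
    try:
        cur.execute("select count(*) from " + table)
        return cur.fetchone()[0]
    finally:
        cur.close()


# Assumption: sqlite3 provides the connection so the sketch is runnable.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table books(bookid int primary key, bookname varchar(20))")
cur.execute("insert into books values (1, 'a')")
cur.execute("insert into books values (2, 'b')")
conn.commit()
cur.close()

total = count_rows(conn, "books")
print(total)
conn.close()
```

What does differ between modules is mainly the connect() arguments and the placeholder style for parameters, so those are the parts worth isolating when code may move between databases.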
2.2.1 Module properties
The module-level attributes defined by DB-API are shown in the following table:
attribute | meaning |
apilevel | DB-API version number the module is compatible with |
threadsafety | Thread safety level of the module |
paramstyle | Parameter style supported in SQL statements |
connect() | Function that connects to the database |
To call MySQL from Python, import the MySQLdb library; the code is "import MySQLdb".
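These module attributes can be inspected on any DB-API module. The sketch below reads them from the standard sqlite3 module, chosen only because it is always available; MySQLdb exposes attributes with the same names, although the values may differ (for example, its placeholder style is %s rather than ?).

```python
import sqlite3

# Collect the DB-API module attributes of sqlite3.
attributes = {
    "apilevel": sqlite3.apilevel,          # DB-API version supported
    "threadsafety": sqlite3.threadsafety,  # integer thread safety level
    "paramstyle": sqlite3.paramstyle,      # placeholder style in SQL
}

for name, value in attributes.items():
    print(name, "=", value)

# connect is the module-level function that opens a connection.
print("connect is callable:", callable(sqlite3.connect))
```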
2.2.2 Link database function
The function that connects to the database is connect(), which returns a connection object used to access the database. Its common parameters are shown in the following table:
parameter | meaning | description |
user | Username | Database username |
password | Password | Database login password |
host | Hostname | Database hostname |
database | Database name | Name of the database to use |
port | Port | Database port number, default 3306 |
dsn | Data source name | Data source name |
MySQLdb also accepts the shorter names passwd and db for password and database; the code in this article uses those.
The following code shows that Python imports the MySQLdb extension library and calls the connect() function to link the database:
import MySQLdb
conn = MySQLdb.connect(host='localhost', db='MySQL', user='root', passwd='123456', port=3306, charset='utf8')
The connection object returned by connect() provides the following methods:
method | meaning |
close() | Close the database connection |
commit() | Commit the current transaction |
rollback() | Cancel the current transaction, usually called a rollback in databases |
cursor() | Create a cursor or cursor-like object |
errorhandler(cxn,errcls,errval) | Serves as an error handler for the given connection's cursor |
Note: A transaction refers to a series of operations performed as a single logical unit of work, either completely executed or not executed at all, thereby ensuring the integrity and security of the data.
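The "all or nothing" behavior of a transaction is easiest to see with commit() and rollback() side by side. The following is a sketch under assumptions: the account table and the transfer function are hypothetical names invented for this illustration, and an in-memory SQLite database is used instead of MySQL.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one transaction (hypothetical helper)."""
    cur = conn.cursor()
    try:
        cur.execute("update account set balance = balance - ? where name = ?", (amount, src))
        if fail_midway:
            # Simulate a crash between the two updates.
            raise RuntimeError("simulated failure")
        cur.execute("update account set balance = balance + ? where name = ?", (amount, dst))
        conn.commit()    # both updates become permanent together
    except Exception:
        conn.rollback()  # neither update is kept
    finally:
        cur.close()


def balances(conn):
    """Return the current balances as a dict."""
    cur = conn.cursor()
    cur.execute("select name, balance from account")
    result = dict(cur.fetchall())
    cur.close()
    return result


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table account(name text primary key, balance int)")
cur.executemany("insert into account values (?, ?)", [("a", 100), ("b", 0)])
conn.commit()
cur.close()

transfer(conn, "a", "b", 50, fail_midway=True)
after_failed = balances(conn)   # unchanged: the first update was rolled back

transfer(conn, "a", "b", 50)
after_ok = balances(conn)       # both updates applied

print(after_failed, after_ok)
conn.close()
```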
2.2.3 Cursor Object
As described above, connect() provides the connection to the database, but to operate on the database you need a cursor object, which offers the following methods:
method | meaning |
fetchone() | Fetch one value, that is, the next row of the result set |
fetchmany(size) | Fetch multiple values: the next rows of the result set, with the parameter size as the limit |
fetchall() | Fetch all remaining rows of the result set |
execute(sql) | Perform a database operation; the parameter is the SQL statement |
close() | Close the cursor. When the cursor is no longer needed, close it promptly |
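The three fetch methods all read from the same result set and each one continues where the previous one stopped. The sketch below shows this on a small table; it assumes an in-memory SQLite database, and the table name nums is made up for the illustration.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the MySQL server for this sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table nums(n int)")
cur.executemany("insert into nums values (?)", [(i,) for i in range(1, 7)])
conn.commit()

cur.execute("select n from nums order by n")
first = cur.fetchone()       # the first row, as a tuple
next_two = cur.fetchmany(2)  # the next two rows, as a list of tuples
rest = cur.fetchall()        # everything that is left
exhausted = cur.fetchone()   # None once the result set is used up

print(first, next_two, rest, exhausted)

cur.close()
conn.close()
```

For large result sets, fetchmany() in a loop keeps memory use bounded, whereas fetchall() loads every remaining row at once.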
2.3 Python calls MySQLdb extension library
The database bookmanage and table books were created earlier to record the book information in the book management system. This section introduces how to display it through Python.
2.3.1 Query database name
To view the names of the databases on the local server, use the "show databases" statement.
import MySQLdb
try:
    # Connect to the local MySQL server as user root
    conn = MySQLdb.connect(host='localhost', user='root', passwd='123456', port=3306)
    cur = conn.cursor()
    # Execute the statement that lists all database names
    res = cur.execute('show databases')
    print(res)
    # Loop over the returned rows
    for data in cur.fetchall():
        print('%s' % data)
    cur.close()
    conn.close()
except MySQLdb.Error as e:
    print('MySQL Error %d: %s' % (e.args[0], e.args[1]))
If a local database already exists but the user has forgotten its name, this method lists all the databases contained in the local MySQL server; the user can then connect to the right one and perform the related operations.
2.3.2 Query table
Here you need to query the contents of the books table in the bookmanage database, the code is as follows:
import MySQLdb
try:
    # Connect to the database
    conn = MySQLdb.connect(host='localhost', user='root', passwd='123456', port=3306, db='bookmanage', charset='utf8')
    # cursor() creates the cursor
    cur = conn.cursor()
    res = cur.execute('select * from books')  # run the query
    print('The table contains', res, 'rows\n')
    # Fetch all rows
    for data in cur.fetchall():
        print('%s %s %s %s' % data)
    cur.close()
    conn.close()
except MySQLdb.Error as e:
    print('MySQL Error %d: %s' % (e.args[0], e.args[1]))
We found that the output results are consistent with the results in MySQL.
2.3.3 New Table
Next, create a students table. The main steps are executing the create table statement and calling the commit() function to commit. The code is as follows:
import MySQLdb
try:
    conn = MySQLdb.connect(host='localhost', user='root', passwd='123456', port=3306, db='bookmanage', charset='utf8')
    cur = conn.cursor()
    sql = "create table students(id int not null primary key auto_increment," \
          "name char(30) not null," \
          "sex char(20) not null)"
    cur.execute(sql)
    # List the tables
    print('Tables after creation:')
    cur.execute('show tables')
    for data in cur.fetchall():
        print('%s' % data)
    cur.close()
    conn.commit()
    conn.close()
except MySQLdb.Error as e:
    print('MySQL Error %d: %s' % (e.args[0], e.args[1]))
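A plain create table statement fails if the table already exists, for example when the script is run a second time. One common way around this is "create table if not exists", which both MySQL and SQLite accept. The sketch below demonstrates it with an in-memory SQLite database as a stand-in; note the assumption that the id column is written in SQLite's spelling ("integer primary key autoincrement") instead of MySQL's auto_increment.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the MySQL server for this sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

sql = (
    "create table if not exists students("
    "id integer primary key autoincrement, "  # SQLite spelling of auto_increment
    "name char(30) not null, "
    "sex char(20) not null)"
)

cur.execute(sql)  # creates the table
cur.execute(sql)  # second run does nothing instead of raising an error
conn.commit()

# sqlite_master is SQLite's catalog; it plays the role of "show tables" here.
tables = [r[0] for r in cur.execute(
    "select name from sqlite_master where type = 'table' and name = 'students'")]
print(tables)

cur.close()
conn.close()
```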
2.3.4 Insert data
Inserting data also works by first defining the SQL statement and then calling the execute() function.
Usually the values of the new data are not fixed, so they are passed in through variables as parameters.
import MySQLdb
try:
    conn = MySQLdb.connect(host='localhost', user='root', passwd='123456', port=3306, db='bookmanage', charset='utf8')
    cur = conn.cursor()
    # %s placeholders are filled in from the tuple passed to execute()
    sql = "insert into students values(%s, %s, %s)"
    cur.execute(sql, ('3', 'zzr', '男'))
    # Show the table contents
    print('Inserted data:')
    cur.execute('select * from students')
    for data in cur.fetchall():
        print('%s %s %s' % data)
    cur.close()
    conn.commit()
    conn.close()
except MySQLdb.Error as e:
    print('MySQL Error %d: %s' % (e.args[0], e.args[1]))
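When the values come from variables, for instance records collected by a crawler, passing them as parameters keeps the SQL text fixed and lets the driver handle quoting. The sketch below illustrates this with executemany(). It is written under assumptions: the records list is invented sample data, an in-memory SQLite database is used instead of MySQL, and the placeholder is therefore ? where MySQLdb would use %s.

```python
import sqlite3

# Hypothetical records, e.g. collected by a crawler.
records = [("zzr", "男"), ("lily", "女")]

# Assumption: in-memory SQLite replaces the MySQL server for this sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "create table students("
    "id integer primary key autoincrement, "
    "name char(30) not null, "
    "sex char(20) not null)"
)

# One statement, many parameter tuples; the id is generated automatically.
cur.executemany("insert into students(name, sex) values (?, ?)", records)

# A value containing a quote is stored verbatim, because it is passed
# as a parameter and never pasted into the SQL text.
tricky_name = "o'brien"
cur.execute("insert into students(name, sex) values (?, ?)", (tricky_name, "男"))
conn.commit()

rows = cur.execute("select id, name, sex from students order by id").fetchall()
for row in rows:
    print(row)

cur.close()
conn.close()
```

Building the statement with string formatting instead would break on the quote in the last name and would open the door to SQL injection, which is why the parameter form is generally preferred.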
3 Python operating SQLite 3 database
SQLite is a lightweight relational database management system that complies with the ACID properties of transactions. It uses very few resources, supports mainstream operating systems such as Windows, Linux, and Unix, and can be used from many programming languages such as C#, PHP, Java, and Python.
Python works with SQLite 3 through the sqlite3 module, which provides a SQL interface compatible with the DB-API 2.0 specification. The module can be used directly, because Python 2.5 and later ship with it by default.
The usage of SQLite 3 is similar to the MySQLdb library introduced above. First create a connection object that represents the database, then optionally create a cursor object, then define and execute the SQL statements, and finally close the cursor and the connection.
method | meaning |
sqlite3.connect(database) | Open a connection to the SQLite database file database |
connection.cursor() | Create a cursor, which is used throughout Python database programming |
cursor.execute(sql) | Execute one SQL statement; note that the statement can be parameterized |
cursor.executescript(sql) | Execute a script made up of multiple SQL statements, which should be separated by semicolons |
connection.commit() | Commit the current transaction |
connection.rollback() | Roll back the changes made to the database since the last call to commit() |
connection.close() | Close the database connection |
cursor.fetchone() | Fetch the next row of the query result set as a single sequence; return None when no more data is available |
cursor.fetchmany(size) | Fetch the next group of rows of the query result set and return a list |
cursor.fetchall() | Fetch all remaining rows of the query result set and return a list |
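The following introduces the basic usage of SQLite 3 from Python (similar to MySQLdb). The main steps are:
- Create a database file named test.db locally.
- Call the cursor's execute() function to create the table PEOPLE, with the fields id, name, age, company, and salary, which cover several data types.
- Insert data; note that the conn.commit() function must be called.
- Run a query with the SQL statement "SELECT id, name, age, company, salary from PEOPLE", then loop over the result with for to display the records of 小杨, 小颜, and 小红.
- Run an update and query the result: the company of the record with id 2 is changed to 华为 (Huawei).
- Run a delete that removes the records whose company is 华为, leaving only the record of 小红.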
import sqlite3
# Connect to the database; the file is created if it does not exist
conn = sqlite3.connect('test.db')
cur = conn.cursor()
print('Database opened successfully\n')
# Create table people (id, name, age, company, salary)
cur.execute('create table people'
            '(id int primary key not null,'
            'name text not null,'
            'age int not null,'
            'company char(50),'
            'salary real);')
print('Table people created successfully!\n')
conn.commit()
# Insert data (string literals inside SQL use single quotes)
cur.execute("insert into people(id, name, age, company, salary) "
            "values(1, '小杨', 26, '华为', 10000.00)")
cur.execute("insert into people(id, name, age, company, salary) "
            "values(2, '小颜', 26, '百度', 8800.00)")
cur.execute("insert into people(id, name, age, company, salary) "
            "values(3, '小红', 28, '腾讯', 98000.00)")
conn.commit()
print('Data inserted successfully!\n')
# Query
cursor = cur.execute('select id, name, age, company, salary from people')
print('Query succeeded!')
print('id', 'name', 'age', 'company', 'salary')
for row in cursor:
    print(row[0], row[1], row[2], row[3], row[4])
print('')
# Update
cur.execute("update people set company='华为' where id=2")
conn.commit()
print('Data updated successfully!')
cursor = cur.execute('select id, name, company from people')
for row in cursor:
    print(row[0], row[1], row[2])
print('')
# Delete
cur.execute("delete from people where company='华为';")
conn.commit()
print('Data deleted successfully!')
cursor = cur.execute('select id, name, company from people')
for row in cursor:
    print(row[0], row[1], row[2])
print('')
# Close the connection
conn.close()
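The same create, insert, update, and delete sequence can be written more compactly with two conveniences of the sqlite3 module. This is a sketch of one possible variant, not a replacement for the code above: it assumes an in-memory database instead of test.db so that it can be rerun freely, uses "with conn:" so that each block is committed on success and rolled back on error, and sets row_factory to sqlite3.Row so that columns can be read by name.

```python
import sqlite3

# Assumption: an in-memory database is used so the sketch can be rerun.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be indexed by column name

people = [
    (1, "小杨", 26, "华为", 10000.00),
    (2, "小颜", 26, "百度", 8800.00),
    (3, "小红", 28, "腾讯", 98000.00),
]

# "with conn:" commits when the block succeeds and rolls back if it raises.
with conn:
    conn.execute(
        "create table people(id int primary key not null, name text not null, "
        "age int not null, company char(50), salary real)"
    )
    conn.executemany("insert into people values (?, ?, ?, ?, ?)", people)

with conn:
    # Same update and delete as above, with the values passed as parameters.
    conn.execute("update people set company = ? where id = ?", ("华为", 2))
    conn.execute("delete from people where company = ?", ("华为",))

remaining = [
    (row["id"], row["name"], row["company"])
    for row in conn.execute("select id, name, company from people")
]
print(remaining)

conn.close()  # "with conn:" does not close the connection
```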
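4 Summary
A database is a warehouse that organizes, stores, and manages data according to a data structure. Users can store and manage the data they need in a database, from simple data tables to massive data sets. Databases are widely used in all kinds of fields, such as information management systems, office automation systems, and cloud information platforms. Why does this article introduce operating databases from Python? On the one hand, data crawling, data storage, data analysis, and data visualization are four inseparable parts: once the data has been crawled, storing it in a database allows it to be managed in a more standardized, intelligent, automated, and convenient way, and also provides strong technical support for the subsequent analysis, since the required blocks of data can be extracted on demand. On the other hand, databases provide a guarantee for data sharing, centralized control of data, and data consistency and maintainability. Learning to operate databases from Python is therefore well worth it.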
Comments are welcome; let's learn and exchange ideas together.
Thanks for reading.