1. Brief introduction to the project
There are many ways to get the distinct values of a field, but most of them come down to copying (Ctrl+C), pasting (Ctrl+V), and then editing the result by hand.
Using Python as the processing tool is much faster. This project needs the following libraries installed: MySQLdb (provided by the mysqlclient package), pandas, and numpy.
For example, suppose we want to wrap each distinct value in double quotes ("") after copying the data as shown below.
Doing this by hand means adding "" around every value and a comma between the values on every line, which is tedious. The rest of this article shows how to produce that quoted format, how to convert the values into a list of strings, and how to print the bare values without the list wrapping.
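As a minimal sketch of the goal, assuming the copied values are already in a plain Python list (the function name quote_values and the sample brands are illustrative and not part of the original script):

```python
def quote_values(values):
    """Wrap each value in double quotes and join the results with ', '."""
    return ", ".join(f'"{value}"' for value in values)


brands = ["vivo", "Glory", "Xiaomi"]  # sample values for illustration only
print(quote_values(brands))
```

The sections below reach a similar result starting from a file and from a database query.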
2. Formatting the values with quotation marks
1. Import the libraries; the name after "as" is the alias
import MySQLdb
import pandas as pd
import numpy as np
2. Read the file data with pandas:
E:/py/txt/var.txt is the file where I store the data. It contains the distinct values of the 品牌 (brand) field, extracted from the table with SQL. The original table holds one year of JD.com (Jingdong) mobile phone sales data:
# Read the file with index_col=None
df = pd.read_csv('E:/py/txt/var.txt', index_col=None, engine='python', names=['品牌'])
print(df)
At this point the data is a DataFrame, that is, a two-dimensional table.
The index_col parameter accepts an integer, a column label, a sequence of these, or False; the default is None.
index_col=None means that no column of the file is chosen as the index, so here the rows get the default index 0, 1, 2, 3, ...
index_col=0 means that the first column of the file is used as the index.
The result is as follows:
    Brand
0   vivo
1   Glory
2   Xiaomi
3   Apple
4   Newman
5   Huawei
6   realme
7   oppo
8   Samsung
9   Nubia
10  OnePlus
11  Meizu
12  Motorola
13  Others
14  Coolpad
15  Dovey
16  Nokia
17  ZTE
18  Philips
19  Nikain
20  Tianyu
21  Coolby
22  Candy
23  Gionee
24  nzone
25  Black Shark
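To illustrate the two index_col settings described above, here is a small sketch that reads from an in-memory string instead of a file. The two-column sample data and the column names brand and sales are assumptions made up for the example:

```python
import io

import pandas as pd

# Hypothetical two-column sample standing in for a real file.
sample = "vivo,120\nGlory,95\nXiaomi,80\n"

# index_col=None: the rows get the default integer index 0, 1, 2, ...
default_index = pd.read_csv(
    io.StringIO(sample), index_col=None, names=["brand", "sales"]
)

# index_col=0: the first column (brand) becomes the index.
first_col_index = pd.read_csv(
    io.StringIO(sample), index_col=0, names=["brand", "sales"]
)

print(list(default_index.index))
print(list(first_col_index.index))
```

With index_col=0 the brand column no longer appears among the regular columns, because it has been moved into the index.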
# Convert the DataFrame to an array
array = np.array(df)
3. Loop over the rows and format the output
The array has 26 rows (indices 0 to 25), so range(1, 26) stays within bounds. If the upper bound is raised above 26, for example range(1, 27), the loop reaches index 26 and raises an IndexError. The lines printed before the error are not affected:
IndexError: index 26 is out of bounds for axis 0 with size 26
for i in range(1, 26):
    s = df[i-1:i]
    array[i] = np.array(s)
    print(f"%c{str(*array[i])}%c," % (34, 34))
Here, s = df[i-1:i] slices one row out of df on each pass, much like selecting a single element of a Series by position.
array[i] = np.array(s) converts that row to an array and stores it in array[i].
%c formats an integer as the character with that code. With % (34, 34), both %c placeholders become the character with ASCII code 34, which is the double quote (").
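The %c formatting described above can be tried on its own, without pandas or numpy. This is a sketch with sample values; the helper name format_line is hypothetical:

```python
# 34 is the ASCII code of the double-quote character.
assert chr(34) == '"'

brands = ["vivo", "Glory", "Xiaomi"]  # sample values for illustration only


def format_line(value):
    """Return the value wrapped in double quotes, followed by a comma."""
    return "%c%s%c," % (34, value, 34)


for brand in brands:
    print(format_line(brand))
```

Passing the value through %s, rather than embedding it with an f-string before the % operator is applied, keeps the formatting safe when a value itself contains a % sign.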
Output as a list:
# Convert the array to a list
a_list = array.tolist()
print(a_list)
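[['vivo'], ['vivo'], ['Glory'], ['Xiaomi'], ['Apple'], ['Newman'], ['Huawei'], ['realme'], ['oppo'], ['Samsung'], ['Nubia'], ['OnePlus'], ['Meizu'], ['Motorola'], ['Others'], ['Coolpad'], ['Duowei'], ['Nokia'], ['ZTE'], ['Philips'], ['Nikane'], ['Tianyu'], ['Coolby'], ['Candy'], ['Gionee'], ['nzone']]
Note that 'vivo' appears twice and the last value, 'Black Shark', is missing from this list: the loop stores row i-1 of df in array[i], so every value is shifted down by one position and array[0] keeps its original value.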
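To drop the inner brackets and get a flat list of strings, a list comprehension is enough. This sketch uses a shortened sample in the same shape as the output of array.tolist():

```python
# Nested list in the shape produced by array.tolist() (shortened sample).
a_list = [["vivo"], ["Glory"], ["Xiaomi"]]

# Take the single element out of each inner list.
flat_list = [row[0] for row in a_list]

print(flat_list)
```

If the NumPy array is still available, array.ravel().tolist() should give the same flat list.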
3. Using the MySQLdb library to extract the data directly from the database
1. The libraries mentioned above must be installed
2. The following code reads the data from the database
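# Open the database connection
db = MySQLdb.connect("localhost", "root", "489000", "test", charset='utf8')
# Get a cursor with the cursor() method
cursor = db.cursor()
# Execute an SQL statement with the execute() method
cursor.execute("SELECT VERSION()")
# Fetch a single row with the fetchone() method
version = cursor.fetchone()
print("Database version : %s " % version)
# SQL statement that groups by brand and returns each distinct brand
sql = """SELECT 品牌 FROM SHEET1 \
GROUP BY 品牌 """
cursor.execute(sql)
# Loop with a generous upper bound and stop as soon as no rows are left
for i in range(1, 10000):
    data = cursor.fetchone()
    if data is None:
        break
    else:
        print(*data)  # Unpack the row tuple and print its values
# Close the database connection
db.close()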
3. To make sure the loop covers every row, set a generous upper bound for the range and add a stop condition on top of it.
Each row read from the cursor is assigned to data. When data is None there is nothing left to read, so the loop ends. To print only the values rather than the tuple, *data unpacks the row.
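The fetch loop can also be written without guessing an upper bound. The sketch below uses the standard-library sqlite3 module purely as a stand-in for MySQLdb, since both follow the Python DB-API and offer the same cursor.fetchone() call. The in-memory table and its rows are made up for the example:

```python
import sqlite3

# sqlite3 is only a stand-in for MySQLdb here; the cursor calls have the
# same shape because both modules follow the Python DB-API.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE sheet1 (brand TEXT)")
cursor.executemany(
    "INSERT INTO sheet1 (brand) VALUES (?)",
    [("vivo",), ("Glory",), ("vivo",), ("Xiaomi",)],
)

cursor.execute("SELECT brand FROM sheet1 GROUP BY brand ORDER BY brand")

brands = []
# iter(callable, sentinel) keeps calling fetchone() until it returns None,
# so no upper bound such as 10000 is needed.
for row in iter(cursor.fetchone, None):
    brands.append(row[0])

conn.close()
print(brands)
```

The same iter(cursor.fetchone, None) pattern should carry over to a MySQLdb cursor unchanged, because it relies only on fetchone() returning None when the rows run out.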
4. The complete code, and a comparison with other tools:
# _*_ coding:utf-8 _*_
# @Time : 2022/9/1 9:30
# @Author : ice_Seattle
# @File : testprogram.py
# @Software: PyCharm
import MySQLdb
import pandas as pd
import numpy as np
# Open the database connection
db = MySQLdb.connect("localhost", "root", "489000", "test", charset='utf8')
# Get a cursor with the cursor() method
cursor = db.cursor()
# Execute an SQL statement with the execute() method
cursor.execute("SELECT VERSION()")
# Fetch a single row with the fetchone() method
version = cursor.fetchone()
print("Database version : %s " % version)
# SQL statement that groups by brand and returns each distinct brand
sql = """SELECT 品牌 FROM SHEET1 \
GROUP BY 品牌 """
cursor.execute(sql)
# Loop with a generous upper bound and stop as soon as no rows are left
for i in range(1, 10000):
    data = cursor.fetchone()
    if data is None:
        break
    else:
        print(*data)  # Unpack the row tuple and print its values
# Close the database connection
db.close()
# Read the file with index_col=None
df = pd.read_csv('E:/py/txt/var.txt', index_col=None, engine='python', names=['品牌'])
# Convert the DataFrame to an array
array = np.array(df)
for i in range(1, 26):
    s = df[i-1:i]
    array[i] = np.array(s)
    print(f"%c{str(*array[i])}%c," % (34, 34))
# Convert the array to a list
a_list = array.tolist()
print(a_list)
Running SELECT 品牌, COUNT(品牌) FROM sheet1 GROUP BY 品牌 in Navicat gives the following results:
The results of the test run with spoon.bat in Kettle are as follows:
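For comparison, the per-brand count that the Navicat query computes can be sketched in plain Python with collections.Counter. The sample rows are hypothetical; in the article the values come from the 品牌 column:

```python
from collections import Counter

# Hypothetical sample rows standing in for the brand column of the table.
rows = ["vivo", "Glory", "vivo", "Xiaomi", "vivo", "Glory"]

# Count how often each brand occurs, like GROUP BY with COUNT in SQL.
counts = Counter(rows)

for brand, count in counts.most_common():
    print(brand, count)
```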
5. Summary
In summary, if you want to extract the distinct values of a field, writing and running the Python code is much faster than working through Navicat or Kettle. You can also add the "" marks and convert the data into other forms such as lists, which cuts down on tedious manual steps.