Acquiring MySQL data with pandas
Working with MySQL from Python is convenient, but fetching data from the server over and over is slow. Today I tried pulling data out of a MySQL database with pandas, and it is really convenient: once the data has been fetched in one go, pandas' rich set of analysis functions is also handy.
- First, establish a database connection and keep it in a shared class attribute conn (a Connection object)
import pymysql
import pandas as pd
class cardb():
    conn = None  # shared Connection object

    def connect_db(self):
        db_host = "XXXX"
        user = "XXXX"
        pw = "XXXX"
        try:
            # Create the database connection (PyMySQL 1.0+ accepts keyword arguments only)
            self.conn = pymysql.connect(host=db_host, user=user, password=pw,
                                        database="yiche_car_info",
                                        use_unicode=True, charset='utf8mb4')
            return self.conn
        except Exception as e:
            print("Database connection error: %s" % e)
            return None
- Fetch the data with pandas over the established connection
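# (body of a cardb method, after connect_db() has set self.conn)
try:
    # Reconnect automatically if the server has dropped the connection
    self.conn.ping(reconnect=True)
except Exception as e:
    print("%s" % e)
    return None
sql = "select * from viewcarinfo"
# pandas 2.x may warn that only SQLAlchemy connectables and sqlite3 are officially
# supported; a PyMySQL connection is still accepted
df = pd.read_sql_query(sql, con=self.conn)
print(df)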
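When a value has to go into the SQL, it is safer to pass it through the params argument of pd.read_sql_query than to build the string with % formatting, because the driver then handles quoting. The sketch below uses an in-memory SQLite database as a stand-in for MySQL so it runs anywhere; the table contents and the headlight column name are hypothetical. Note that sqlite3 uses ? placeholders, whereas PyMySQL uses %s.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the MySQL server (assumption:
# the real code would pass the PyMySQL connection instead).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE viewcarinfo (pz_id TEXT, cartype_id TEXT, headlight TEXT)")
conn.executemany(
    "INSERT INTO viewcarinfo VALUES (?, ?, ?)",
    [("m139120", "m4758", "卤素"), ("m133807", "m3067", "LED"), ("m133409", "m4758", "卤素")],
)

# The driver substitutes the value for the placeholder; no Python string formatting.
sql = "SELECT pz_id, cartype_id FROM viewcarinfo WHERE headlight = ? ORDER BY pz_id"
df = pd.read_sql_query(sql, con=conn, params=("卤素",))
print(df)
conn.close()
```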
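The result is a DataFrame. The row labels are called the index and the column labels are called columns, much like a spreadsheet table. The column names are in Chinese: 车型级别 is the vehicle class, 车身型式 the body style and 前大灯 the headlights; values include 小型车 (small car), 两厢 (hatchback), 三厢 (sedan), 卤素 (halogen), 氙气 (xenon) and 矩阵LED (matrix LED).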
pz_id cartype_id pz_name ... 车型级别 车身型式 前大灯
0 m139120 m4758 2020款1.2L手动超值版 ... 小型车 两厢 卤素
1 m111122 m2790 2014款1.3L标准版 ... 小型车 三厢 卤素
2 m133807 m3067 2019款1.5L手动进取版 ... 小型车 三厢 LED
3 m133409 m4758 2019款1.2LAMT舒适版 ... 小型车 两厢 卤素
4 m139423 m3167 2020款1.4L手动焕新版 ... 小型车 三厢 卤素
.. ... ... ... ... ... ... ...
343 m129677 m4586 2018款5.3L手自一体白宫一号4座 ... 全尺寸SUV SUV LED
344 m136613 m3859 2020款6.0TW12标准版 ... 豪华车 三厢 LED
345 m131852 m4373 2019款S680双调典藏版 ... 豪华车 三厢 矩阵LED
346 m132800 m2078 2019款GT6.0TW12敞篷版 ... 豪华车 敞篷车 矩阵LED
347 m125538 m3044 2017款6.8T手自一体长轴距版 ... 豪华车 三厢 氙气
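A small sketch of the index and columns described above. It uses a hand-built DataFrame that copies the first five rows and the visible columns of the printed output (the columns hidden behind ... are omitted); in the real program the data comes from MySQL.

```python
import pandas as pd

# Hand-built subset of the output above: first five rows, visible columns only.
df = pd.DataFrame(
    {
        "pz_id": ["m139120", "m111122", "m133807", "m133409", "m139423"],
        "cartype_id": ["m4758", "m2790", "m3067", "m4758", "m3167"],
        "pz_name": ["2020款1.2L手动超值版", "2014款1.3L标准版", "2019款1.5L手动进取版",
                    "2019款1.2LAMT舒适版", "2020款1.4L手动焕新版"],
        "车型级别": ["小型车"] * 5,
        "车身型式": ["两厢", "三厢", "三厢", "两厢", "三厢"],
        "前大灯": ["卤素", "卤素", "LED", "卤素", "卤素"],
    }
)

print(df.index)             # row labels: RangeIndex(start=0, stop=5, step=1)
print(list(df.columns))     # column labels
print(df.shape)             # (rows, columns) -> (5, 6)
print(df.loc[2, "前大灯"])  # value at row label 2, column "前大灯" -> LED
```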
- Next, create the Python dictionary that will be saved as JSON
# car_prosys_name and car_pricezone_name are defined elsewhere in the original program
json_res = {}
json_res["item_name"] = []
json_res["item_option"] = {}
json_res["item_value"] = []
json_res["car_prosys_name"] = car_prosys_name
json_res["car_prosys_value"] = []
json_res["car_prosys_series_value"] = []
json_res["car_pricezone_name"] = car_pricezone_name
json_res["car_pricezone_value"] = []
json_res["car_pricezone_series_value"] = []
- Analyze the data with pandas
Querying data: df.loc selects rows; combined with a boolean filter condition it keeps only the rows that match.
A. Filtering by equality or by comparison
item="卤素"
item1 = df.loc[df["前大灯"] == item]
== tests for equality and <, > compare magnitudes; either one filters out the rows whose values satisfy the condition.
n1 = item1[str_feild].count()
This counts the rows whose headlights (前大灯) are exactly "卤素" (halogen).
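The equality filter can be tried on a few rows copied from the output above. The original never shows what str_feild holds, so here it is set to "pz_id" purely as an assumption. Note that count() counts the non-missing values in that column, so it equals the number of matching rows only when the column has no NaN; len() counts rows directly.

```python
import pandas as pd

# Hypothetical sample: a few rows and columns copied from the output above.
df = pd.DataFrame(
    {
        "pz_id": ["m139120", "m111122", "m133807", "m133409", "m139423"],
        "车身型式": ["两厢", "三厢", "三厢", "两厢", "三厢"],
        "前大灯": ["卤素", "卤素", "LED", "卤素", "卤素"],
    }
)

item = "卤素"        # "halogen"
str_feild = "pz_id"  # assumption: the column whose values are counted

# Boolean mask -> df.loc keeps only the rows where the mask is True.
item1 = df.loc[df["前大灯"] == item]
n1 = item1[str_feild].count()  # non-null pz_id values among the matching rows
print(n1)  # 4

# != works the same way (as do <, >, <=, >= on numeric columns).
not_halogen = df.loc[df["前大灯"] != item]
print(len(not_halogen))  # 1
```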
B. Fuzzy text search
For fuzzy (substring) text search, use .str.contains, for example:
item="卤素"
item2 = df.loc[df["前大灯"].str.contains(item)]
This selects every row whose headlight value contains "卤素" (halogen); counting them gives the number of such rows:
n2 = item2[str_feild].count()
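The difference between == and .str.contains shows up clearly in the last five rows of the output, where some headlights are "矩阵LED" (matrix LED). In this sketch the sample rows are copied from the output; regex=False and na=False are optional arguments of str.contains, added here as a precaution rather than taken from the original.

```python
import pandas as pd

# Hypothetical sample: the last five rows' ids and headlight values from the output above.
df = pd.DataFrame(
    {
        "pz_id": ["m129677", "m136613", "m131852", "m132800", "m125538"],
        "前大灯": ["LED", "LED", "矩阵LED", "矩阵LED", "氙气"],
    }
)

exact = df.loc[df["前大灯"] == "LED"]
# regex=False treats the pattern as a plain substring; na=False makes missing
# values count as "no match" instead of putting NaN into the mask.
fuzzy = df.loc[df["前大灯"].str.contains("LED", regex=False, na=False)]

print(exact["pz_id"].count())  # 2: only values equal to "LED"
print(fuzzy["pz_id"].count())  # 4: "LED" and "矩阵LED"
```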
C. Combining conditions
You can also combine two or more conditions: & means "and", | means "or". Wrap each condition in parentheses, because & and | bind more tightly than ==.
item3 = df.loc[(df[str_feild] == item) & (df[prosys_name] == ps_name)]
n3 = item3[str_feild].count()
This counts the rows that satisfy both df[str_feild] == item and df[prosys_name] == ps_name.
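Both operators can be tried on the first five rows of the output. The column choices below are only for illustration; prosys_name and ps_name from the original are not shown in the post, so this sketch filters on body style and headlights instead.

```python
import pandas as pd

# Hypothetical sample copied from the first five rows of the output above.
df = pd.DataFrame(
    {
        "pz_id": ["m139120", "m111122", "m133807", "m133409", "m139423"],
        "车身型式": ["两厢", "三厢", "三厢", "两厢", "三厢"],
        "前大灯": ["卤素", "卤素", "LED", "卤素", "卤素"],
    }
)

# "&" = and: sedans (三厢) with halogen (卤素) headlights.
# Each condition has its own parentheses because & and | bind more tightly than ==.
both = df.loc[(df["车身型式"] == "三厢") & (df["前大灯"] == "卤素")]
print(both["pz_id"].count())  # 2

# "|" = or: hatchbacks (两厢) or cars with LED headlights.
either = df.loc[(df["车身型式"] == "两厢") | (df["前大灯"] == "LED")]
print(either["pz_id"].count())  # 3
```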
- Export the dictionary to JSON, which can then serve as a data source for Flask or Django.
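import json
import sys

save_path = sys.path[0] + json_path + str_feild_id + ".json"
# ensure_ascii=False keeps Chinese text readable, so write the file as UTF-8
with open(save_path, 'w', encoding='utf-8') as wr:
    json.dump(json_res, wr, ensure_ascii=False)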
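One pitfall when filling json_res with the counts: Series.count() returns a NumPy integer rather than a Python int, and json.dump cannot serialize NumPy integers. A minimal sketch, assuming the counts are appended to lists such as json_res["item_value"] (the post does not show this step, so the keys' contents here are hypothetical), converts them with int() first and writes to a temporary directory instead of sys.path[0] + json_path.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({"pz_id": ["m139120", "m111122", "m133807"],
                   "前大灯": ["卤素", "卤素", "LED"]})

json_res = {"item_name": [], "item_value": []}
for item in ["卤素", "LED"]:
    n = df.loc[df["前大灯"] == item, "pz_id"].count()
    json_res["item_name"].append(item)
    # count() gives a NumPy integer, which json cannot serialize: convert to int.
    json_res["item_value"].append(int(n))

# Hypothetical output location in a fresh temporary directory.
save_path = os.path.join(tempfile.mkdtemp(), "example.json")
with open(save_path, "w", encoding="utf-8") as wr:
    json.dump(json_res, wr, ensure_ascii=False)

with open(save_path, encoding="utf-8") as rd:
    loaded = json.load(rd)
print(loaded)  # {'item_name': ['卤素', 'LED'], 'item_value': [2, 1]}
```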
- About pandas
pandas has many uses. It builds on NumPy: NumPy is geared toward arrays and matrices, while pandas targets higher-level, Excel- and SQL-like data processing. It can read directly from a database and supports many import and export formats (commonly CSV, Excel and JSON), which makes it a powerful data processing tool.
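As a small illustration of the export formats, the sketch below writes a DataFrame to CSV and JSON text and reads the CSV back. Excel is left out because to_excel needs an extra engine such as openpyxl; the sample rows are copied from the output above.

```python
import io

import pandas as pd

df = pd.DataFrame({"pz_id": ["m139120", "m133807"], "前大灯": ["卤素", "LED"]})

# With no path argument, to_csv and to_json return the text instead of writing a file.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records", force_ascii=False)
print(csv_text)
print(json_text)

# Read the CSV text back to confirm the round trip.
df_back = pd.read_csv(io.StringIO(csv_text))
print(df_back)
```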