Test learning 110: batch-generating test data and writing it to a CSV file (full version)

Foreword:

In the last two posts, we explained how to use the various methods of Faker, a framework for constructing fake data. Today we will use Faker to generate data and save it to a CSV file for later use.

Promise: the code shown in this post can be copied and run as-is. No tricks.

1. Solving the ID generation problem

The last two articles covered how to use Faker. Powerful as it is, Faker seems to have one gap: I could not find a method for generating record IDs. Any structured record needs a primary key, an ID that uniquely identifies it. To fill this gap, I wrote my own ID construction function to use alongside Faker.

Without further ado, here is the code. Each function has a detailed docstring, so anyone with some programming experience should be able to follow it.

1. ID generation function

def Generate_ID(num):
    '''
    1. Generate IDs with an incrementing tail: BaseHead is the fixed front part,
       and selfAdd is the custom incrementing counter appended to it.
    2. The two are joined into one string and stored in IDlist for later use.
    :param num: number of IDs to generate
    :return: a list of num ID strings
    '''

    BaseHead = "65010000001190000001022017121910051599983015"
    IDlist = []

    for i in range(num):
        # The tail starts at 1001 and increases by 1 for each record
        selfAdd = 1001 + i
        result = BaseHead + str(selfAdd)
        IDlist.append(result)
    # print(IDlist)
    return IDlist

# Example of a final ID: 650100000011900000010220171219100515999830151001

The code above is the ID generation function. BaseHead is the fixed prefix, and the selfAdd part changes with each record, starting at 1001 and increasing by 1. The two are concatenated to form the ID.
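As a compact illustration of the same prefix-plus-counter idea, here is a minimal sketch. The function name `generate_ids` and the `prefix` and `start` parameters are my own hypothetical names, not part of the code above; their defaults simply mirror BaseHead and the 1001 starting value.

```python
def generate_ids(num, prefix="65010000001190000001022017121910051599983015", start=1001):
    """Return num ID strings made of a fixed prefix plus an incrementing counter."""
    return [prefix + str(start + i) for i in range(num)]


ids = generate_ids(3)
# Every ID is distinct because the counter never repeats within one call
assert len(set(ids)) == len(ids)
print(ids[0])  # 650100000011900000010220171219100515999830151001
```

Because the tail is not zero-padded, the IDs become one character longer once the counter reaches 10000, that is, from the 9000th record on. If fixed-length IDs matter for your tests, keep that in mind.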

2. Complete data generation code (copy and use)

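# coding:utf8
from faker import Faker
import datetime
import pandas as pd

'''
Use the faker library to generate random fake data.
Faker has good support for constructing Chinese data such as addresses, names, colors, and times.
Faker does not have good support for constructing IDs.
'''
# zh_CN selects Chinese (China) data
faker = Faker("zh_CN")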

def Generate_ID(num):
    '''
    1. Generate IDs with an incrementing tail: BaseHead is the fixed front part,
       and selfAdd is the custom incrementing counter appended to it.
    2. The two are joined into one string and stored in IDlist for later use.
    :param num: number of IDs to generate
    :return: a list of num ID strings
    '''

    BaseHead = "65010000001190000001022017121910051599983015"
    IDlist = []

    for i in range(num):
        # The tail starts at 1001 and increases by 1 for each record
        selfAdd = 1001 + i
        result = BaseHead + str(selfAdd)
        IDlist.append(result)
    # print(IDlist)
    return IDlist

#------------------------------ The ID generation function ends here ------------------------------

def Generate_oteher_data(IDlist, num):
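    '''
    1. Use the entries of IDlist as the first column.
    2. Use Faker to generate the other fields at random.
    3. Combine the two into one list of rows, one row per ID.

    :param IDlist: the ID list produced by Generate_ID
    :param num: number of records to generate
    :return: the final data, one list of attributes per record
    '''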
    otherDatalist = []
    for i in range(num):
        name = faker.name()
        phone = faker.phone_number()
        address = faker.address()
        country = faker.country()
        cityName = faker.city_name()
        province = faker.province()
        date = faker.date()
        otherDatalist.append([IDlist[i], name, phone, address, country, cityName, province, date])
    # print(otherDatalist)
    return otherDatalist


def gettime():
    '''
    :return: the current time as a formatted string, safe to use in a file name
    '''
    now_time = datetime.datetime.now().strftime('%Y-%m-%d %H-%M-%S')
    return now_time


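def dataMerager_toCSV(otherDatalist):
    '''
    1. Define the column names.
    2. Combine the column names with the data rows.
    3. Use pandas to write the CSV file.

    :param otherDatalist: the data rows
    :return: None; the data is written to a CSV file
    '''
    # "测试数据" means "test data"; the timestamp distinguishes the output of each run
    outputfile = 'data/测试数据' + gettime() + '.csv'
    # Define the column names
    columns = ['ID', 'name', 'phone', 'address', 'country', 'cityName', 'province', 'date']
    # Build the DataFrame with named columns, so the header row comes from the column names
    dataframe = pd.DataFrame(otherDatalist, columns=columns)
    # Write the CSV file without the DataFrame index
    dataframe.to_csv(outputfile, encoding='utf-8', index=False)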


if __name__ == '__main__':
    # Generate 1000 records
    Data = Generate_oteher_data(Generate_ID(1000), 1000)
    dataMerager_toCSV(Data)

Note: Don't forget to create a folder named data in your Python project directory first. The script writes into it, and pandas raises an error if the folder does not exist.
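If you would rather not create the folder by hand, one option is to create it from the script before writing. This is a minimal sketch using only the standard library; the helper name `ensure_output_dir` is hypothetical and not part of the script above.

```python
import os


def ensure_output_dir(path="data"):
    """Create the output folder if it is missing; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    # Call this once before writing the CSV file
    ensure_output_dir("data")
```

With `exist_ok=True` the call is safe to repeat, so it can sit at the top of the main block without any extra check.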

3. Sample of the generated data, viewed in Jupyter Notebook

import pandas as pd
df = pd.read_csv('./测试数据2020-11-23 17-11-55.csv')
df.head(1000)
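One thing to watch when reading the file back: the ID column contains only digits, so pandas will try to infer a type for it. To be sure the IDs stay exact strings, one option is to pass `dtype` to `read_csv`. The sketch below uses a small in-memory CSV with made-up rows (not real output of the script) so that it runs on its own.

```python
import io

import pandas as pd

# A tiny stand-in for the generated file; the name and phone values are made up
sample_csv = (
    "ID,name,phone\n"
    "650100000011900000010220171219100515999830151001,Zhang San,13800000000\n"
    "650100000011900000010220171219100515999830151002,Li Si,13900000000\n"
)

# dtype={"ID": str} keeps the long IDs as text instead of letting pandas guess a type
df = pd.read_csv(io.StringIO(sample_csv), dtype={"ID": str})

print(df["ID"].iloc[0])
```

The same `dtype` argument can be passed when reading the real file from disk.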

Doesn't the data look realistic? Leave a comment if you have any questions, and I will reply as soon as I see it.

 

 


Origin blog.csdn.net/u013521274/article/details/110005978