The pandas API supports numerous file formats, such as CSV, SQL, XLS, JSON, and HDF5.
CSV
- pandas.read_csv(filepath_or_buffer, sep=',')
- filepath_or_buffer: path to the file
- usecols: list of column names to read
- sep: field separator, default ','
# Read the file, keeping only the 'open' and 'close' columns
data = pd.read_csv("./data/stock_day.csv", usecols=['open', 'close'])
- to_csv
- DataFrame.to_csv(path_or_buf=None, sep=',', columns=None, header=True, index=True, mode='w', encoding=None)
- path_or_buf: string or file handle, default None
- sep: character, default ','
- columns: sequence of columns to write, optional
- mode: 'w' overwrites the file, 'a' appends to it
- index: whether to write the row index
- header: boolean or list of strings, default True; whether to write the column names
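A minimal round trip through these parameters (the file name and the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical data for illustration
data = pd.DataFrame({"open": [23.53, 22.80], "close": [24.16, 23.53]})

# Write only the 'open' column and drop the row index
data.to_csv("stock_subset.csv", columns=["open"], index=False)

# Read it back, selecting columns with usecols
loaded = pd.read_csv("stock_subset.csv", usecols=["open"])
print(loaded.columns.tolist())  # ['open']
```

Writing with index=False avoids an unnamed index column appearing when the file is read back.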
HDF5
Reading from and writing to HDF5 files requires specifying a key that identifies the DataFrame to store or retrieve.
Reading data from an h5 file:
- pandas.read_hdf(path_or_buf, key=None, **kwargs)
- path_or_buf: path to the file
- key: key of the object to read
- return: the selected object
- DataFrame.to_hdf(path_or_buf, key, **kwargs)
- key: key under which to save the DataFrame
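A short store-and-read sketch; note that pandas' HDF5 support requires the PyTables package (`pip install tables`), and the file name and key below are hypothetical:

```python
import pandas as pd

# Hypothetical data; requires the PyTables package to be installed
df = pd.DataFrame({"close": [24.16, 23.53, 24.70]})

# Store the DataFrame under the key 'stock'
df.to_hdf("demo.h5", key="stock")

# Read it back using the same key
loaded = pd.read_hdf("demo.h5", key="stock")
print(loaded.equals(df))  # True
```

One h5 file can hold several DataFrames, each under its own key.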
JSON
- read_json: read a JSON file
- orient: specifies the expected JSON layout
- 'records': each row is stored as a {column: value} dictionary
- lines: whether the file stores one record per line
- DataFrame.to_json(path_or_buf=None, orient=None, lines=False)
- Stores a pandas object in JSON format
- path_or_buf=None: file path
- orient: JSON layout to use, one of {'split', 'records', 'index', 'columns', 'values'}
- lines: store one record per line (only valid with orient='records')
- to_json: write to a file; note that lines=True requires orient='records'
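A round trip showing the records/lines combination described above (the file name and sample values are hypothetical):

```python
import pandas as pd

# Hypothetical records for illustration
df = pd.DataFrame({"ticker": ["AAA", "BBB"], "close": [24.16, 23.53]})

# orient='records' writes each row as {column: value};
# lines=True puts one record on each line
df.to_json("demo.json", orient="records", lines=True)

# The lines argument must match how the file was written
loaded = pd.read_json("demo.json", orient="records", lines=True)
print(loaded.equals(df))  # True
```

The one-record-per-line layout is the same as the JSON Lines format used by many log and streaming tools.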
Prefer HDF5 for file storage
- When storing to HDF5, the compression method used by default is blosc, which is also the fastest one pandas supports
- Compression improves disk utilization and saves space
- HDF5 is cross-platform and can easily be migrated to Hadoop
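Compression can be requested explicitly through to_hdf's complib and complevel parameters; this sketch assumes PyTables is installed with blosc support, and the file name and key are hypothetical:

```python
import pandas as pd

# Hypothetical data; requires PyTables with blosc support
df = pd.DataFrame({"x": range(10_000)})

# complib selects the compression library, complevel its strength (0-9)
df.to_hdf("compressed.h5", key="data", complib="blosc", complevel=9)

# Reading back needs no compression arguments; they are stored in the file
loaded = pd.read_hdf("compressed.h5", key="data")
print(len(loaded))  # 10000
```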