File reading and storage

The pandas API supports numerous file formats, such as CSV, SQL, XLS, JSON, and HDF5.

CSV

  • pandas.read_csv(filepath_or_buffer, sep=',')
    • filepath_or_buffer: file path
    • usecols: a list of the column names to read
    • sep: field separator, default ','
# Read the file, keeping only the 'open' and 'close' columns
data = pd.read_csv("./data/stock_day.csv", usecols=['open', 'close'])
  • to_csv
    • DataFrame.to_csv(path_or_buf=None, sep=',', columns=None, header=True, index=True, mode='w', encoding=None)
      • path_or_buf: string or file handle, default None
      • sep: character, default ','
      • columns: sequence of column names to write, optional
      • mode: 'w' overwrites, 'a' appends
      • index: whether to write the row index
      • header: boolean or list of strings, default True; whether to write the column names
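A short sketch of the `to_csv` parameters above, using a small hypothetical DataFrame (the column names and file path are made up for illustration):

```python
import pandas as pd

# Hypothetical data standing in for ./data/stock_day.csv
data = pd.DataFrame({"open": [22.7, 23.0], "close": [23.5, 22.8]})

# Write only the 'open' column, without the row index
data.to_csv("./stock_subset.csv", columns=["open"], index=False)

# Append the same rows with mode='a'; header=False avoids repeating the column names
data.to_csv("./stock_subset.csv", columns=["open"],
            mode="a", header=False, index=False)

# Reading it back yields four rows of 'open' values
print(pd.read_csv("./stock_subset.csv"))
```

Without `header=False` on the append, the column name would appear again as a data row in the middle of the file.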

HDF5

HDF5 reads and stores DataFrames by key: each DataFrame stored in the file must be given a key, which is then used to retrieve it.

Reading data from an .h5 file:

  • pandas.read_hdf(path_or_buf, key=None, **kwargs)
    • path_or_buf: file path
    • key: the key of the object to read
    • return: the selected object
  • DataFrame.to_hdf(path_or_buf, key, **kwargs)
    • key: the key under which to save the object
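A minimal round trip with the two calls above (the file name, key, and data are hypothetical; requires the PyTables package):

```python
import pandas as pd

df = pd.DataFrame({"close": [23.5, 22.8, 24.1]})  # hypothetical data

# Each DataFrame stored in an .h5 file lives under its own key
df.to_hdf("./test.h5", key="close_prices")

# read_hdf locates the object by the same key; if the file holds
# exactly one object, the key argument may be omitted
restored = pd.read_hdf("./test.h5", key="close_prices")
print(restored)
```

Several DataFrames can share one .h5 file as long as each is saved under a different key.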

JSON

  • read_json reads a JSON file
    • orient: the layout of the JSON being read
    • 'records': each row becomes one {column: value} record
    • lines: whether each record sits on its own line
  • DataFrame.to_json(path_or_buf=None, orient=None, lines=False)
    • Stores a pandas object in JSON format
    • path_or_buf=None: file path
    • orient: the JSON layout, one of {'split', 'records', 'index', 'columns', 'values'}
    • lines: store each record as one line
  • to_json stores the file; note: lines=True requires orient='records'
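A sketch of the `orient='records', lines=True` combination described above (file name and data are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"open": [22.7, 23.0], "close": [23.5, 22.8]})  # hypothetical

# orient='records' writes one {column: value} dict per row;
# lines=True puts each record on its own line (JSON Lines format)
df.to_json("./test.json", orient="records", lines=True)

# Reading back requires the matching orient/lines settings
restored = pd.read_json("./test.json", orient="records", lines=True)
print(restored)
```

The resulting file contains one compact JSON object per line, e.g. `{"open":22.7,"close":23.5}`, which is convenient for streaming and log-style processing.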

Prefer HDF5 for file storage

  • HDF5 storage supports compression; the compressor used by default, blosc, is the fastest one pandas supports
  • Compression improves disk utilization and saves space
  • HDF5 is cross-platform, so the files can easily be migrated onto Hadoop
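To see the space savings the bullets above describe, one can store the same highly compressible DataFrame with and without blosc (file names and data are hypothetical; `complevel` ranges from 0 to 9):

```python
import os

import numpy as np
import pandas as pd

# Hypothetical, highly compressible data: a block of zeros
df = pd.DataFrame(np.zeros((1000, 4)), columns=list("abcd"))

# Uncompressed store vs. blosc-compressed store
df.to_hdf("./plain.h5", key="data")
df.to_hdf("./packed.h5", key="data", complib="blosc", complevel=9)

print(os.path.getsize("./plain.h5"), os.path.getsize("./packed.h5"))
```

Real market data will not compress as dramatically as a block of zeros, but the compressed file is still typically noticeably smaller.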


Origin www.cnblogs.com/oklizz/p/11488677.html