Reading binary files in Python and converting them to floats, in detail

Environment used in this article:

Python 3.6.5 |Anaconda custom (64-bit)|

Introduction

If you need to read binary files in Python, the main tool is the struct package, whose main functions are unpack, pack, and calcsize. For details, see the official Python struct documentation. This article focuses on reading binary floating-point data in Python.

A single-precision float (struct format 'f') occupies four bytes.

Binary data can be converted to floats with struct.unpack().
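As a minimal sketch of the three functions named above, the following round-trips a few values through pack and unpack. The sample values and the explicit little-endian '<' prefix are assumptions for illustration; the code in this article uses the native byte order.

```python
import struct

# Hypothetical sample values, chosen to be exactly representable in float32
values = (1.0, 2.5, -0.75)

# '<3f' means: little-endian, three 4-byte single-precision floats
fmt = '<3f'

packed = struct.pack(fmt, *values)      # floats -> 12 raw bytes
size = struct.calcsize(fmt)             # number of bytes the format describes
unpacked = struct.unpack(fmt, packed)   # raw bytes -> tuple of Python floats

print(size)       # 12
print(unpacked)   # (1.0, 2.5, -0.75)
```

Note that unpack always returns a tuple, even for a single value, and that the byte string passed to it must be exactly calcsize(fmt) bytes long.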

Reading small files

Smaller files can be read in one go.

First, import the required packages:

import numpy as np
import struct

For example, suppose I need to read a file named filename that stores floating-point data of shape [100, 1025]. It can be read as follows:

# Load the test data
f = open('filename', 'rb')
# 102500 is the number of values in the file; each float occupies 4 bytes
data_raw = struct.unpack('f' * 102500, f.read(4 * 102500))
f.close()
# Convert the tuple to an array with 1025 columns (100 rows here)
verify_data = np.asarray(data_raw).reshape(-1, 1025)
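The same read can also be sketched with np.fromfile, which parses the bytes directly into a float32 array and skips the intermediate tuple. This is an alternative to the struct-based approach, not the method used in this article; the file name demo_small.bin and the small shape are hypothetical, chosen so the example runs on its own.

```python
import os
import struct
import tempfile

import numpy as np

rows, cols = 4, 1025  # hypothetical small shape; the article uses [100, 1025]

# Write a demo file of native-endian float32 values so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'demo_small.bin')
expected = np.arange(rows * cols, dtype=np.float32).reshape(rows, cols)
expected.tofile(path)

# Approach from the article: struct.unpack, then convert to an array
with open(path, 'rb') as f:
    data_raw = struct.unpack('f' * (rows * cols), f.read(4 * rows * cols))
via_struct = np.asarray(data_raw).reshape(-1, cols)

# Alternative: let NumPy read the float32 values directly
via_numpy = np.fromfile(path, dtype=np.float32).reshape(-1, cols)

print(via_struct.dtype, via_numpy.dtype)      # float64 float32
print(np.array_equal(via_struct, via_numpy))  # True
```

One difference worth noting: np.asarray on a tuple of Python floats produces a float64 array, so the struct route doubles the memory needed compared with the float32 data on disk.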

Handling large files

The file I needed to process is 38.1 GB and stores an array of shape [10000000, 1025].

For handling large files, I consulted this article; however, that method does not work well for converting binary data to floating-point numbers.

So I thought of another way:

Splitting the file with the Linux split command

Use the split command to cut the 38.1 GB file into chunks of a specified size:

split -b 820000k -a 2 filename data_

In this command, -b 820000k sets the size of each chunk to 820000k (820000 × 1024 bytes), -a 2 sets a two-character suffix, and data_ is the prefix of the output file names.

This produces 49 files (suffixes aa through bw, in lexicographic order). Each of the first 48 files contains 204,800 rows, and the last file contains 169,600 rows.
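The chunk and row counts can be checked with a little arithmetic. The sketch below assumes the data is stored as 4-byte floats in rows of 1025 values, and that split's k suffix means 1024 bytes, as in GNU coreutils.

```python
rows, cols, bytes_per_float = 10_000_000, 1025, 4

total_bytes = rows * cols * bytes_per_float   # size of the whole file
chunk_bytes = 820000 * 1024                   # -b 820000k
row_bytes = cols * bytes_per_float            # bytes in one row of 1025 floats

n_chunks = -(-total_bytes // chunk_bytes)     # ceiling division
rows_per_full_chunk = chunk_bytes // row_bytes
last_chunk_bytes = total_bytes - (n_chunks - 1) * chunk_bytes
rows_in_last_chunk = last_chunk_bytes // row_bytes

print(n_chunks)               # 49
print(rows_per_full_chunk)    # 204800
print(rows_in_last_chunk)     # 169600
print(chunk_bytes % row_bytes, last_chunk_bytes % row_bytes)  # 0 0
```

Because the chunk size is an exact multiple of the row size, no row is split across two files, which is what makes it safe to reshape each chunk independently.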

Reading the files in a Python loop

First, build the lists of suffix letters:

# Second letters for the data_a* files (a-z)
voc = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l',
       'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
       'y', 'z']
# Second letters for the full-size data_b* files (a-v); data_bw is handled separately
voc_short = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l',
             'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v']
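As an alternative to typing the letters by hand, the 49 chunk names could be generated from the alphabet. This is only a sketch; the variable names suffixes and names are hypothetical and do not appear in the original code.

```python
import string
from itertools import product

n_chunks = 49  # number of files produced by split

# split -a 2 names chunks aa, ab, ..., az, ba, ... in lexicographic order
suffixes = [a + b for a, b in product(string.ascii_lowercase, repeat=2)][:n_chunks]
names = ['data_' + s for s in suffixes]

print(names[0], names[25], names[26], names[-1])
# data_aa data_az data_ba data_bw
```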

To make later reads easier, convert the 49 binary files into NumPy's own binary format, *.npy:

# Convert data_aa .. data_az (209920000 floats = 204800 rows each)
for i in voc:
    data_name = 'data_a' + str(i)
    f = open(data_name, 'rb')
    data_raw = struct.unpack('f' * 209920000, f.read(4 * 209920000))
    f.close()
    data = np.asarray(data_raw).reshape(-1, 1025)
    np.save(data_name + '.npy', data)  # save data_a*.npy
# Convert data_ba .. data_bv (same size as above)
for i in voc_short:
    data_name = 'data_b' + str(i)
    f = open(data_name, 'rb')
    data_raw = struct.unpack('f' * 209920000, f.read(4 * 209920000))
    f.close()
    data = np.asarray(data_raw).reshape(-1, 1025)
    np.save(data_name + '.npy', data)  # save data_b*.npy
# Convert the last, shorter file (173840000 floats = 169600 rows)
data_name = 'data_bw'
f = open(data_name, 'rb')
data_raw = struct.unpack('f' * 173840000, f.read(4 * 173840000))
f.close()
data = np.asarray(data_raw).reshape(-1, 1025)
np.save(data_name + '.npy', data)  # save data_bw.npy
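Once the chunks are saved as .npy files, they can be read back with np.load. The sketch below uses a small stand-in array rather than a real chunk, and the mmap_mode='r' option is an assumption about how one might want to read large chunks, not something the original workflow uses.

```python
import os
import tempfile

import numpy as np

cols = 1025
tmp_dir = tempfile.mkdtemp()

# Stand-in for one converted chunk (hypothetical small size; real chunks hold 204,800 rows)
chunk = np.arange(8 * cols, dtype=np.float32).reshape(-1, cols)
npy_path = os.path.join(tmp_dir, 'data_aa.npy')
np.save(npy_path, chunk)

# mmap_mode='r' maps the file read-only instead of loading it all into memory
loaded = np.load(npy_path, mmap_mode='r')
print(loaded.shape)   # (8, 1025)
print(loaded.dtype)   # float32

# Copy just the rows that are needed into a regular in-memory array
first_rows = np.array(loaded[:2])
```

Unlike the raw binary chunks, the .npy file records its own shape and dtype, so no element counts or reshape calls are needed when loading.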


Origin www.cnblogs.com/orzhangz/p/11088076.html