Network transfer limits often make it necessary to split a large file on Linux. The file is cut into several smaller pieces, the pieces are transferred, and they are merged back into one file once the transfer is complete.
1. File cutting - split
The split command makes it easy to cut large files on Linux.
[1] Command syntax
# -a: Suffix length of the output file names (default 2: aa, ab, ...)
# -d: Use numeric suffixes (00, 01, ...) instead of alphabetic ones
# -l: Split by lines (number of lines per output file; default 1000)
# -b: Split by bytes (size per output file; units such as k/m are supported)
# -C: Split by size while keeping lines intact (at most this many bytes of whole lines per output file)
split [-a <suffix length>] [-d] [-l <number of lines>] [-b <bytes>] [-C <bytes>] [file to split] [output file name prefix]
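The difference between -b and -C is easiest to see on a tiny file. The following is a minimal sketch, assuming GNU split; the file name words.txt and the prefixes bytes_ and lines_ are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three 6-byte lines (5 letters plus a newline), 18 bytes in total.
printf 'alpha\nbravo\ndelta\n' > words.txt

# -b 10: cut strictly every 10 bytes, even in the middle of a line.
split -b 10 words.txt bytes_

# -C 10: put at most 10 bytes into each piece, but only whole lines.
split -C 10 words.txt lines_

# Compare the piece sizes in bytes.
wc -c bytes_* lines_*
```

With -b the first piece ends in the middle of "bravo", while with -C every piece holds exactly one complete line, because two 6-byte lines would exceed the 10-byte limit.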
[2] Example of use
# Split by lines
$ split -l 300000 users.sql /data/users_
# Split by lines, using numeric suffixes
$ split -d -l 300000 users.sql /data/users_
# Split by byte size
$ split -d -b 100m users.sql /data/users_
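To try the line-based mode without a real database dump, the commands above can be reproduced on generated data. This is a small sketch, assuming GNU split and seq are available; sample.txt and the part_ prefix are placeholder names, not part of the original example.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 2500-line sample file as a stand-in for a large SQL dump.
seq 1 2500 > sample.txt

# Cut it into pieces of at most 1000 lines with 3-digit numeric suffixes.
split -d -a 3 -l 1000 sample.txt part_

# part_000 and part_001 hold 1000 lines each, part_002 the remaining 500.
wc -l part_*
```

A longer suffix (-a 3) is worth considering when the number of pieces may grow large, since the default length of 2 covers only 100 numeric suffixes before split has to lengthen them.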
[3] Help information
# help information
$ split --help
Usage: split [OPTION]... [FILE [PREFIX]]
Output pieces of FILE to PREFIXaa, PREFIXab, ...;
default size is 1000 lines, and default PREFIX is 'x'.

With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   generate suffixes of length N (default 2)
      --additional-suffix=SUFFIX  append an additional suffix to file names
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of records per output file
  -d                      use numeric suffixes starting at 0, not alphabetic
      --numeric-suffixes[=FROM]  same as -d, but allow setting the start value
  -e, --elide-empty-files  do not generate empty output files with '-n'
      --filter=COMMAND    write to shell COMMAND; file name is $FILE
  -l, --lines=NUMBER      put NUMBER lines/records per output file
  -n, --number=CHUNKS     generate CHUNKS output files; see explanation below
  -t, --separator=SEP     use SEP instead of newline as the record separator;
                            '\0' (zero) specifies the NUL character
  -u, --unbuffered        immediately copy input to output with '-n r/...'
      --verbose           print a diagnostic just before each
                            output file is opened
      --help     display this help and exit
      --version  output version information and exit

The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).

CHUNKS may be:
  N       split into N files based on size of input
  K/N     output Kth of N to stdout
  l/N     split into N files without splitting lines/records
  l/K/N   output Kth of N to stdout without splitting lines/records
  r/N     like 'l' but use round robin distribution
  r/K/N   likewise but only output Kth of N to stdout

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Full documentation at: <http://www.gnu.org/software/coreutils/split>
or available locally via: info '(coreutils) split invocation'
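The CHUNKS forms listed in the help text are useful when the number of pieces matters more than their exact size. Here is a brief sketch of the l/N form, assuming GNU split; numbers.txt and the piece_ prefix are hypothetical names chosen for this example.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A 100-line sample file.
seq 1 100 > numbers.txt

# Ask for exactly 3 pieces of roughly equal size, never cutting inside a line.
split -d -n l/3 numbers.txt piece_

# List the pieces with their line counts.
wc -l piece_*

# Joining the pieces in suffix order reproduces the input.
cat piece_* | cmp - numbers.txt && echo "pieces reassemble to the original"
```

Because l/N balances the pieces by byte size rather than by line count, the number of lines per piece can differ slightly from one piece to the next.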
2. File merge - cat
The cat command makes it just as easy to merge the small files back into one file on Linux.
[1] Command syntax
# -n: Number all output lines
# -e: Show a $ at the end of each line (along with other non-printing characters)
# -t: Show TAB characters as ^I (along with other non-printing characters)
cat [-n] [-e] [-t] [file]...
[2] Example of use
# Merge the pieces back into one file
$ cat /data/users_* > users.sql
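After a transfer it is worth checking that the merged file is identical to the original. One way to do this, sketched here under the assumption that GNU coreutils (split, cat, sha256sum) are installed, is to compare checksums taken before splitting and after merging; original.bin, merged.bin, and the chunk_ prefix are illustrative names only.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 300 KiB of random bytes stands in for a large file.
head -c 307200 /dev/urandom > original.bin

# Record the checksum before splitting.
before=$(sha256sum < original.bin)

# Split into 100 KiB pieces: chunk_00, chunk_01, chunk_02.
split -d -b 100K original.bin chunk_

# Merge; the shell expands chunk_* in sorted order, which matches the suffix order.
cat chunk_* > merged.bin

# Compare the checksum of the merged file with the recorded one.
after=$(sha256sum < merged.bin)
if [ "$before" = "$after" ]; then
    result="match"
else
    result="mismatch"
fi
echo "checksums: $result"
```

In a real transfer the first checksum would be computed on the sending machine and the second on the receiving machine after the merge.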
[3] Help information
# help information
$ cat --help
Usage: cat [OPTION]... [FILE]...
Concatenate FILE(s) to standard output.

With no FILE, or when FILE is -, read standard input.

  -A, --show-all           equivalent to -vET
  -b, --number-nonblank    number nonempty output lines, overrides -n
  -e                       equivalent to -vE
  -E, --show-ends          display $ at end of each line
  -n, --number             number all output lines
  -s, --squeeze-blank      suppress repeated empty output lines
  -t                       equivalent to -vT
  -T, --show-tabs          display TAB characters as ^I
  -u                       (ignored)
  -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB
      --help     display this help and exit
      --version  output version information and exit

Examples:
  cat f - g  Output f's contents, then standard input, then g's contents.
  cat        Copy standard input to standard output.

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Full documentation at: <http://www.gnu.org/software/coreutils/cat>
or available locally via: info '(coreutils) cat invocation'