tail and head
- see the first several lines of the file README.txt
head README.txt # show the first 10 lines of README.txt (the default)
head -n 50 README.txt # show the first 50 lines of README.txt
- see the last several lines of the file README.txt
tail README.txt # show the last 10 lines of README.txt (the default)
tail -n 30 README.txt | head -n 10 # first 10 lines of the last 30 lines
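Chaining head and tail in either order selects a range of lines from anywhere in a file. The sketch below is only an illustration: the file demo.txt and the variable names are made up, and the file is generated with seq so the result is easy to check.

```shell
# Build a throwaway 100-line file (hypothetical name demo.txt); line N contains the number N.
tmpdir=$(mktemp -d)
seq 1 100 > "$tmpdir/demo.txt"

# Lines 21-30: keep the first 30 lines, then the last 10 of those.
middle=$(head -n 30 "$tmpdir/demo.txt" | tail -n 10)

# Same pattern as the tail | head line above: first 10 of the last 30 lines (lines 71-80).
near_end=$(tail -n 30 "$tmpdir/demo.txt" | head -n 10)

echo "$middle"
echo "$near_end"
```

Because each line holds its own line number, the printed values show directly which range was selected.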
Word count (wc command)
- get the number of lines in a file
wc -l README.txt # count the lines in README.txt
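When the count is needed inside a script, the file name that wc prints after the number gets in the way. A minimal sketch of one way around that, assuming a small temporary file created just for the example:

```shell
# Create a 3-line temporary file for the demonstration.
tmpfile=$(mktemp)
printf 'a\nb\nc\n' > "$tmpfile"

# With a file argument, wc prints the count followed by the file name.
wc -l "$tmpfile"

# Reading from standard input leaves only the number, which is easier to store in a variable.
# Some wc implementations pad the number with leading spaces, so compare it numerically.
line_count=$(wc -l < "$tmpfile")
echo "lines: $line_count"
```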
split
split -l 1000 README.txt # split the file into chunks of 1000 lines each (named xaa, xab, ... by default)
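split also accepts an output prefix as a second argument, which keeps the chunks out of the current directory. The following is a sketch under assumed names: data.txt and the prefix part_ are arbitrary choices for the example.

```shell
# Generate a 2500-line input file in a scratch directory.
tmpdir=$(mktemp -d)
seq 1 2500 > "$tmpdir/data.txt"

# Split into 1000-line chunks named part_aa, part_ab, part_ac inside the scratch directory.
split -l 1000 "$tmpdir/data.txt" "$tmpdir/part_"

# Two full chunks plus one 500-line remainder.
chunk_count=$(ls "$tmpdir"/part_* | wc -l)
last_chunk_lines=$(wc -l < "$tmpdir/part_ac")
echo "chunks: $chunk_count, lines in last chunk: $last_chunk_lines"

# Concatenating the chunks in name order reproduces the original file.
cat "$tmpdir"/part_* | cmp -s - "$tmpdir/data.txt" && roundtrip=ok || roundtrip=broken
echo "roundtrip: $roundtrip"
```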
shuffle (shuf command)
- shuffle the dataset and then split it; this should be very useful when building datasets for machine learning!
shuf MQ2007 | split -l 1000 # shuffle and then split
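A quick sanity check for the shuffle-then-split step is to confirm that no line is lost or duplicated. This sketch assumes GNU coreutils (shuf is not installed by default on every system); dataset.txt and the prefix shard_ are hypothetical names.

```shell
# Generate a 2500-line stand-in dataset, already in numeric order.
tmpdir=$(mktemp -d)
seq 1 2500 > "$tmpdir/dataset.txt"

# Shuffle, then let split read the shuffled lines from standard input ("-").
shuf "$tmpdir/dataset.txt" | split -l 1000 - "$tmpdir/shard_"

# The shards together hold the same number of lines as the input.
total_lines=$(cat "$tmpdir"/shard_* | wc -l)

# Sorting the shards back into numeric order should reproduce the input exactly.
cat "$tmpdir"/shard_* | sort -n | cmp -s - "$tmpdir/dataset.txt" && lossless=yes || lossless=no
echo "total lines: $total_lines, lossless: $lossless"
```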
cat
cat file | grep 'carnegie mellon university' # print every line containing 'carnegie mellon university'
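grep can also read the file itself, so the cat stage is optional. A small sketch with made-up sample lines, showing that both forms print the same matches and that -i makes the match case-insensitive:

```shell
# Three sample lines; only the first matches the lowercase pattern exactly.
tmpfile=$(mktemp)
printf '%s\n' 'carnegie mellon university' 'Carnegie Mellon University' 'stanford university' > "$tmpfile"

# Piping through cat and passing the file name directly give the same result.
piped=$(cat "$tmpfile" | grep 'carnegie mellon university')
direct=$(grep 'carnegie mellon university' "$tmpfile")

# -i ignores case, -c prints the number of matching lines instead of the lines.
any_case_count=$(grep -ic 'carnegie mellon university' "$tmpfile")

echo "$direct"
echo "case-insensitive matches: $any_case_count"
```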
uniq
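- The Linux uniq command detects and removes repeated lines in a text file; it is generally used together with the sort command. (quoted from 菜鸟教程)
sort testfile1 | uniq -c # sort the file first, then deduplicate, printing how many times each line occurs to standard output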
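uniq only compares adjacent lines, which is why the sort step matters. The sketch below uses made-up fruit names to show the difference, and adds a second sort to rank lines by frequency; the variable names are illustrative only.

```shell
# Six lines with repeats, but no two identical lines are adjacent.
tmpfile=$(mktemp)
printf '%s\n' apple banana apple cherry banana apple > "$tmpfile"

# Without sort, uniq finds no adjacent duplicates and keeps all 6 lines.
unsorted_groups=$(uniq "$tmpfile" | wc -l)

# After sort, identical lines sit together and collapse to 3 distinct lines.
sorted_groups=$(sort "$tmpfile" | uniq | wc -l)

# Count each line, order by count (largest first), and keep the most frequent one.
top=$(sort "$tmpfile" | uniq -c | sort -rn | head -n 1 | awk '{print $2, $1}')

echo "without sort: $unsorted_groups, with sort: $sorted_groups, most frequent: $top"
```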