The sort, uniq, tr and cut commands, shell regular expressions, and related exercises

One, sort command (sorts the contents of files)

1. Overview: sorts the contents of a file line by line, either as text or according to a data type (for example, numerically)

2. Syntax format:

sort [options] file
cat file | sort [options]

3. Common options

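  • -f: ignore case when comparing; without it, uppercase letters sort before lowercase ones in the C locale
  • -b: ignore leading blanks on each line
  • -n: sort numerically
  • -r: reverse the sort order
  • -u: output only one line from each group of identical lines, similar to piping the result through uniq
  • -t: specify the field separator; without -t, fields are separated at each transition from a non-blank to a blank character (space or tab)
  • -k: specify the sort key (field), e.g. -k 3,3 for the third field only
  • -o <output file>: write the sorted result to the specified file, which may be the input file itself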

4. Examples
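A minimal sketch of the basic options, assuming GNU coreutils sort; the sample words and numbers are made up. LC_ALL=C forces plain byte order, because other locales (such as en_US.UTF-8) compare case and punctuation differently.

```shell
#!/bin/sh
# Text: in byte order (LC_ALL=C) uppercase letters come before lowercase ones
printf '%s\n' apple Zebra mango | LC_ALL=C sort      # Zebra, apple, mango
printf '%s\n' apple Zebra mango | LC_ALL=C sort -f   # -f ignores case: apple, mango, Zebra

# Numbers: without -n they are compared character by character
printf '%s\n' 10 9 100 | LC_ALL=C sort               # 10, 100, 9
printf '%s\n' 10 9 100 | sort -n                     # -n: 9, 10, 100
printf '%s\n' 10 9 100 | sort -nr                    # -n -r: 100, 10, 9
```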

-u: identical lines are shown only once

Deduplication is performed after sorting, so identical lines are removed even if they were not adjacent in the input.
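A small sketch with made-up input, contrasting uniq alone with sort | uniq and sort -u. uniq only compares neighbouring lines, which is why sorting has to come first.

```shell
#!/bin/sh
# Hypothetical input: the two "b" lines are not adjacent
printf '%s\n' b a b | uniq           # b, a, b: uniq alone misses the duplicate
printf '%s\n' b a b | sort | uniq    # a, b
printf '%s\n' b a b | sort -u        # a, b: the same result in one command
```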

-t is generally used together with -k.
Sort the contents of /etc/passwd by UID (the third field). Because UIDs are numbers, -n is needed, and 3,3 restricts the key to the third field only:

sort -t ":" -k 3,3n /etc/passwd
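To see why -n matters here, a sketch with three made-up lines in /etc/passwd format instead of the real file. A plain -k 3 would make the key run from field 3 to the end of the line; -k 3,3 stops it at field 3.

```shell
#!/bin/sh
# Hypothetical sample in /etc/passwd format: name:x:uid:gid:comment:home:shell
sample='bob:x:1001:1001::/home/bob:/bin/bash
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2::/sbin:/sbin/nologin'

printf '%s\n' "$sample" | sort -t ':' -k 3,3     # text order of the UIDs: root (0), bob (1001), daemon (2)
printf '%s\n' "$sample" | sort -t ':' -k 3,3n    # numeric order: root (0), daemon (2), bob (1001)
```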

Use with du, sorting its first column (the size) numerically
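A sketch of the du combination; /var/log is only an example directory. du -s prints one "size<TAB>path" line per argument and -k reports sizes in KiB, so sort -nr puts the largest entries first. With GNU coreutils, du -sh combined with sort -hr does the same on human-readable sizes such as 12K or 3.4M.

```shell
#!/bin/sh
# Five largest entries under an example directory (errors for unreadable entries are discarded)
du -sk /var/log/* 2>/dev/null | sort -nr | head -n 5
```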

-b: ignore the leading blanks of each line
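-b only makes a visible difference when leading blanks would otherwise change the order. A sketch with made-up lines, in the C locale so that the space is compared as an ordinary byte; this assumes GNU sort, where -b without -k applies to the whole line.

```shell
#!/bin/sh
printf '%s\n' '  c' a b | LC_ALL=C sort      # "  c" comes first: a space sorts before letters
printf '%s\n' '  c' a b | LC_ALL=C sort -b   # a, b, "  c": the leading blanks are ignored
```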

Two, uniq command

1. Overview: reports or filters out consecutive (adjacent) repeated lines in a file. It is often combined with sort, because duplicates that are not adjacent are not detected.
2. Syntax format:

uniq [options] file
cat file | uniq [options]

3. Common options:

  • -c: prefix each line with the number of times it occurred in a row, printing each group once
  • -d: display only repeated lines, one copy per group
  • -u: display only lines that are not repeated
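A quick sketch of the three options on made-up, already sorted input. The width of the count column printed by -c may differ between implementations.

```shell
#!/bin/sh
printf '%s\n' a a b c c c | uniq -c   # 2 a / 1 b / 3 c
printf '%s\n' a a b c c c | uniq -d   # a, c: lines that are repeated
printf '%s\n' a a b c c c | uniq -u   # b: lines that are not repeated
```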

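Because uniq only compares adjacent lines, it is usually fed by sort through a pipe, and its output can be piped on to further commands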
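A common pipeline, sketched with hypothetical words: count how often each line occurs and list the most frequent first.

```shell
#!/bin/sh
printf '%s\n' apple pear apple fig apple pear |
    sort |          # bring identical lines together
    uniq -c |       # collapse each group and prefix its count
    sort -nr        # highest count first: 3 apple, 2 pear, 1 fig
```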

Three, tr command

1. Overview: translates, squeezes (compresses) or deletes characters read from standard input, and writes the result to standard output
Syntax format:

tr [options] character-set-1 [character-set-2]

2. Common options:

  • -c: use the complement of character set 1: every character NOT in set 1 (including the newline \n) is replaced using character set 2
  • -d: delete all characters belonging to character set 1
  • -s: squeeze each run of a repeated character from set 1 into a single character; if character set 2 is also given, set 1 is first translated to set 2 and runs of set 2 characters are squeezed
  • -t: truncate character set 1 to the length of character set 2 before translating; when both sets have the same length, the result is the same as without options

3. Parameters:
Character set 1:
Specifies the original characters to be translated or deleted. When translating, "character set 2" must also be given to specify the target characters; when deleting (-d), "character set 2" is not required.
Character set 2:
Specify the target character set to be converted

4. Examples

echo abc | tr 'a-z' 'A-Z'
echo abc | tr 'ac'  'AZ'

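The results of the two commands above, plus the same upper-casing written with POSIX character classes (a sketch; any POSIX tr should behave the same):

```shell
#!/bin/sh
echo abc | tr 'a-z' 'A-Z'               # ABC
echo abc | tr 'ac' 'AZ'                 # AbZ: a->A, c->Z, b is unchanged
echo abc | tr '[:lower:]' '[:upper:]'   # ABC
```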

Use of -c
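A sketch of -c on made-up input. Because the complement also contains the newline, the trailing newline printed by echo is replaced too, so an extra echo is added to end the output line. Combined with -d, -c keeps only the listed characters.

```shell
#!/bin/sh
echo 'abc123' | tr -c 'a-z' '*'; echo   # abc****: the three digits and the newline become *
echo 'abc123' | tr -dc 'a-z'; echo      # abc: everything except a-z is deleted
```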

-d: delete every character in character set 1

echo 'hello world' | tr -d 'od'
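The expected output of the command above, plus a variant that deletes a whole character class (sample strings are made up):

```shell
#!/bin/sh
echo 'hello world' | tr -d 'od'       # hell wrl: every o and d is removed
echo 'room 101' | tr -d '[:digit:]'   # "room ": all digits are removed
```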

-s: squeezes each run of a repeated set 1 character into one. With character set 2, set 1 is first translated to set 2, and runs of the resulting characters are squeezed.

echo "thissss is a text 1 innnnnnne." | tr -s 'sn'
echo "thissss is a text 1 innnnnnne." | tr -s 'sn' 'AB'

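The expected output of both commands, plus the everyday use of -s for collapsing runs of spaces:

```shell
#!/bin/sh
echo "thissss is a text 1 innnnnnne." | tr -s 'sn'        # this is a text 1 ine.
echo "thissss is a text 1 innnnnnne." | tr -s 'sn' 'AB'   # thiA iA a text 1 iBe.
echo "a   b    c" | tr -s ' '                             # a b c
```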

Delete blank lines (squeeze consecutive newlines into one)
echo -e "aa\n\n\n\n\nbb" | tr -s "\n"
cat testfile4| tr -s "\n"

Replace the colons ":" in the PATH variable with newlines "\n" so that each directory is printed on its own line
echo "$PATH" | tr -s ":" "\n"

Separate the lines with colons instead of newlines
echo -e "aa\n\n\n\n\nbb" | tr -s "\n" ":"
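The three commands above with fixed sample data, so the output is predictable. The PATH-like string is made up, and printf is used in place of echo -e because it interprets \n the same way but is portable to every POSIX shell. Note that the final newline also becomes ":" in the last command.

```shell
#!/bin/sh
echo '/usr/local/bin:/usr/bin:/bin' | tr -s ':' '\n'   # three lines, one directory each
printf 'aa\n\n\n\n\nbb\n' | tr -s '\n'                 # aa and bb, blank lines removed
printf 'aa\n\n\n\n\nbb\n' | tr -s '\n' ':'; echo       # aa:bb:
```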

Delete the "^M" characters (carriage returns) from a file created on Windows

1. Method one
cat abc.txt | tr -s "\r" "\n" > new_file
or
cat abc.txt | tr -d "\r" > new_file

cat -v abc.txt        # -v displays each carriage return as ^M, useful for checking a file before or after conversion
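A self-contained check using a temporary file that imitates a Windows text file. One difference between the two tr variants is worth knowing: tr -s "\r" "\n" first turns every \r\n into \n\n and then squeezes all runs of newlines, so it also deletes genuine blank lines, while tr -d "\r" leaves them intact.

```shell
#!/bin/sh
f=$(mktemp)
printf 'line1\r\n\r\nline2\r\n' > "$f"   # CRLF line endings, with one blank line in the middle
cat -v "$f"                              # line1^M / ^M / line2^M
tr -d '\r' < "$f"                        # line1 / (blank) / line2: the blank line is kept
tr -s '\r' '\n' < "$f"                   # line1 / line2: the blank line is squeezed away
rm -f "$f"
```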

2. Method two
yum install -y dos2unix
dos2unix abc.txt

On Linux, a newline character ("\n") alone ends a line and acts as carriage return + line feed, while a carriage return ("\r") is just an ordinary character that is displayed as the control character "^M" and does not move the cursor to a new line. On Windows, a line must end with carriage return + line feed ("\r\n"); if either character is missing or they are in the wrong order, a new line is not started correctly. This is why text files copied from Windows show ^M at the end of every line on Linux.

Array sort
Instead of writing a sorting algorithm in the shell, it is simpler to turn the array elements into lines with tr, sort them, and join them again with pipes:

arr=(3 5 6 2 1 7)
echo ${arr[@]} | tr " " "\n" | sort -n | tr "\n" " "     

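A bash sketch showing what the pipeline prints and how to keep the sorted values as a new array. The name sorted is illustrative; the array assignment relies on word splitting, which is fine for plain numbers but not for elements containing spaces.

```shell
#!/usr/bin/env bash
arr=(3 5 6 2 1 7)

# The pipeline from the text: prints "1 2 3 5 6 7 " (trailing space, no final newline)
echo "${arr[@]}" | tr ' ' '\n' | sort -n | tr '\n' ' '
echo

# Keep the result as an array: each line printed by sort becomes one element
sorted=($(printf '%s\n' "${arr[@]}" | sort -n))
echo "${sorted[@]}"    # 1 2 3 5 6 7
```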

Four, cut command

1. Cut overview
Displays the specified parts of each line, such as selected fields; with --complement it prints each line with the specified fields removed. The file itself is not modified.
2. Syntax format:

cut [options] file
cat file | cut [options]

3. Common options

  • -f: select the fields to extract; cut uses the tab character as the default field separator
  • -d: use the given character as the field separator instead of tab
  • --complement: output every field except the specified ones
  • --output-delimiter=STRING: use STRING as the field separator in the output

cut -d ':' -f 1 /etc/passwd
grep '/bin/bash' /etc/passwd | cut -d ':' -f 1-4,6,7       # start-end gives a range and commas separate the items (shows fields 1-4, 6 and 7)
grep '/bin/bash' /etc/passwd | cut -d ':' --complement -f 2      # exclude the second field
cut -d ':' -f1,7 --output-delimiter=' ' /etc/passwd              # use a space as the output delimiter
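The same commands applied to one sample /etc/passwd line, so the output of each can be seen. This assumes GNU cut, since --complement and --output-delimiter are GNU extensions.

```shell
#!/bin/sh
line='root:x:0:0:root:/root:/bin/bash'   # sample line in /etc/passwd format

echo "$line" | cut -d ':' -f 1                            # root
echo "$line" | cut -d ':' -f 1-4,6,7                      # root:x:0:0:/root:/bin/bash
echo "$line" | cut -d ':' --complement -f 2               # root:0:0:root:/root:/bin/bash
echo "$line" | cut -d ':' -f 1,7 --output-delimiter=' '   # root /bin/bash
```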

Five, regular expressions

Regular expressions are commonly used in conditional tests to check whether a string matches a certain format.
Regular expressions are composed of ordinary characters and metacharacters.
Ordinary characters include uppercase and lowercase letters, digits, punctuation marks and some other symbols.
Metacharacters are characters with a special meaning in regular expressions; some of them, such as *, specify how often the preceding character (the character before the metacharacter) may appear in the target text.
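A sketch of a regular expression used in a conditional test. The helper name is_number is hypothetical; grep -q prints nothing and only reports through its exit status whether the line matched.

```shell
#!/bin/sh
# is_number (hypothetical helper): succeeds only if the whole argument consists of digits
is_number() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+$'
}

for s in 2024 20a4; do
    if is_number "$s"; then
        echo "$s: number"
    else
        echo "$s: not a number"
    fi
done
```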

1. Common metacharacters of basic regular expressions (supported tools: grep, egrep, sed, awk)

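\ : escape character, removes the special meaning of the following symbol, e.g. \$, \., \*

^ : matches the start of the string (for grep, the start of each line), e.g. ^a, ^the, ^#, ^[a-z]

$ : matches the end of the string (for grep, the end of each line), e.g. word$; ^$ matches an empty line

. : matches any single character except the newline \n, e.g. go.d, g..d

* : matches the preceding subexpression zero or more times, e.g. goo*d, go.*d

[list] : matches one character from list, e.g. go[ola]d, [abc], [a-z], [a-z0-9]; [0-9] matches any single digit

[^list] : matches one character that is not in list, e.g. [^0-9], [^A-Z0-9]; [^a-z] matches any single character that is not a lowercase letter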
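A sketch that runs several of these metacharacters through grep on a handful of made-up lines (the function name lines is just a holder for the sample data):

```shell
#!/bin/sh
# Hypothetical test data, including one empty line
lines() { printf '%s\n' god good gd 'the end' '#comment' '' g1d; }

lines | grep '^g'         # god, good, gd, g1d: lines starting with g
lines | grep -c '^$'      # 1: the number of empty lines
lines | grep 'go*d'       # god, good, gd: o zero or more times
lines | grep 'g.d'        # god, g1d: exactly one character between g and d
lines | grep '^[^g#]'     # the end: first character is neither g nor #
```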

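\{n\} : matches the preceding subexpression exactly n times, e.g. go\{2\}d; '[0-9]\{2\}' matches two digits

\{n,\} : matches the preceding subexpression at least n times, e.g. go\{2,\}d; '[0-9]\{2,\}' matches two or more digits

\{n,m\} : matches the preceding subexpression n to m times, e.g. go\{2,3\}d; '[0-9]\{2,3\}' matches two to three digits

Note: egrep and awk write these as {n}, {n,} and {n,m}, without a "\" before "{" and "}".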
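A sketch of the interval expressions with grep on made-up data. Without -x, a pattern such as '[0-9]\{2\}' matches any line that merely contains two digits in a row; -x requires the whole line to match.

```shell
#!/bin/sh
words() { printf '%s\n' gd god good goood 7 42 123 2024; }   # hypothetical test data

words | grep 'go\{2\}d'           # good
words | grep 'go\{2,\}d'          # good, goood
words | grep '[0-9]\{2\}'         # 42, 123, 2024: lines that contain two digits in a row
words | grep -x '[0-9]\{2,3\}'    # 42, 123: the whole line is two to three digits
words | grep -E -x '[0-9]{2,3}'   # the same in ERE syntax, without backslashes
```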

2. Extended regular expression metacharacters (supported tools: egrep, awk)

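+ : matches the preceding subexpression one or more times, e.g. go+d matches at least one o, as in god, good, goood

? : matches the preceding subexpression zero times or once, e.g. go?d matches gd or god

() : treats the string in parentheses as a unit, e.g. g(oo)+d matches oo as a unit one or more times, as in good, gooood

| : matches either alternative, e.g. g(oo|la)d matches good or glad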
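A sketch of the extended metacharacters on made-up words. grep -E is used here because it is equivalent to egrep, and recent GNU grep releases print a warning that egrep is obsolescent.

```shell
#!/bin/sh
words() { printf '%s\n' gd god good goood glad gold; }   # hypothetical test data

words | grep -E 'go+d'        # god, good, goood
words | grep -E 'go?d'        # gd, god
words | grep -E 'g(oo)+d'     # good: goood has an odd number of o's
words | grep -E 'g(oo|la)d'   # good, glad
```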

3. Example: matching email addresses

Requirements: the user name starts with a letter, may contain at most 2 symbols in the middle, must not end with a symbol, and is at least 6 characters long. Example domains:
@sohu.com
@qq.com
@163.com
@wo.cn
@sina.com.cn
Format: username@subdomain.top-level-domain; the top-level domain is generally 2 to 5 characters long

egrep '^[a-zA-Z0-9]{1,}[a-zA-Z0-9.-]{4,}[a-zA-Z0-9]@[a-zA-Z0-9_.-]+\.([a-zA-Z]{2,5})$'
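A sketch that tests the email pattern on made-up addresses. The pattern is quoted so the shell passes \., $ and the parentheses to grep unchanged; inside a bracket expression a backslash is an ordinary character, so "." and "-" are listed there without one. The pattern only approximates the stated rules: it also accepts a digit as the first character and does not limit the number of symbols, as the last sample shows.

```shell
#!/bin/sh
re='^[a-zA-Z0-9]{1,}[a-zA-Z0-9.-]{4,}[a-zA-Z0-9]@[a-zA-Z0-9_.-]+\.([a-zA-Z]{2,5})$'

# Matches: zhangsan@qq.com and li.si-01@sina.com.cn
# Rejected: abc@163.com (user name too short), wang5.@wo.cn (ends with a symbol)
printf '%s\n' 'zhangsan@qq.com' 'li.si-01@sina.com.cn' 'abc@163.com' 'wang5.@wo.cn' | grep -E "$re"

# Accepted although it contains three symbols
printf '%s\n' 'a.b.c.d1@qq.com' | grep -E "$re"
```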

4. Example: filtering phone numbers


egrep '^[1][35][0-9][ ]?[0-9]{4}[ ]?[0-9]{4}$' number.txt
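A sketch of the phone-number filter on made-up numbers instead of number.txt. The pattern accepts 11 digits that start with 13 or 15, optionally with a space after the 3rd and the 7th digit. Inside brackets "|" is an ordinary character, so [3|5] would also accept a literal "|"; [35] is the intended set. Without quotes, the spaces in [ ] would split the pattern into several shell words.

```shell
#!/bin/sh
re='^[1][35][0-9][ ]?[0-9]{4}[ ]?[0-9]{4}$'

# Expected matches: 13812345678 and "158 1234 5678"
printf '%s\n' 13812345678 '158 1234 5678' 12345678901 '139-1234-5678' 1381234567 |
    grep -E "$re"
```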


Origin blog.csdn.net/weixin_53567573/article/details/114803228