Shell programming: sort for sorting, uniq for removing duplicates, tr for replacing, squeezing and deleting characters, cut for extracting fields, and regular expression metacharacters


sort command

  • sort command: sorts the contents of a file line by line, and can compare lines according to different data types
  • Syntax format
    sort [options] [file ...]
    cat file | sort [options]
  • Common options:
    • -f: ignore case (lowercase letters are treated as uppercase when comparing)
    • -b: ignore leading blanks on each line
    • -n: sort by numeric value
    • -r: reverse the sort order
    • -u: like uniq, output only one line for each set of identical lines
    • -t: specify the field separator; by default, fields are separated by the transition from a non-blank character to a blank (space or tab)
    • -k: specify the field to sort by
    • -o <output file>: write the sorted result to the specified file
  • Checking what the options do (explanations get copied from one source to another; testing them yourself is what shows the truth)
    • No options: in the author's test, blank lines are listed first, leading spaces are skipped, and lines are then ordered by their first character. The priority is space > digits > letters a to z (lowercase before uppercase). The exact order depends on the locale; with LC_ALL=C, bytes are compared directly and uppercase sorts before lowercase

    • -f option: case is ignored when comparing. In the author's test the priority is space > digits > letters a to z, with the uppercase form listed before the lowercase one

    • -n option: sort by numeric value

    • -r option: reverses the sort order, and can be combined with other options

    • -u option: identical lines are output only once

    • -t and -k options: -t sets the field separator (without it, fields are split at blanks), and -k selects which field is compared and sorted

    • View disk usage, largest first
      du -a | sort -n -r
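
The sort options above can be tried on a few lines of made-up data. This is a minimal sketch: the name:score records and the variable names are assumptions for illustration, and LC_ALL=C is set so that the result does not depend on the system locale.

```shell
# Use the C locale so byte order, not the system locale, decides the order.
export LC_ALL=C

# Hypothetical sample data: name:score records, one of them duplicated.
data=$(printf '%s\n' 'carol:9' 'alice:10' 'bob:2' 'alice:10')

# Plain sort compares text, so "alice:10" comes before "bob:2".
plain=$(printf '%s\n' "$data" | sort)

# -u keeps a single copy of identical lines.
unique=$(printf '%s\n' "$data" | sort -u)

# -t sets the separator, -k 2,2 picks field 2, -n compares it as a number.
by_score=$(printf '%s\n' "$data" | sort -u -t ':' -k 2,2 -n)

# -r reverses the result: highest score first.
by_score_desc=$(printf '%s\n' "$data" | sort -u -t ':' -k 2,2 -n -r)

printf '%s\n' "$by_score_desc"
```

Writing the key as -k 2,2 limits the comparison to the second field only; a bare -k 2 would compare from field 2 to the end of the line.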

uniq command

  • uniq command: reports or omits consecutive repeated lines in a file; often used together with the sort command
  • Syntax format
    uniq [options] [file]
    cat file | uniq [options]
  • Common options:
    • -c: prefix each output line with the number of times it occurred consecutively
    • -d: display only the lines that are repeated consecutively (one copy of each)
    • -u: display only the lines that are not repeated
  • Practice reveals the truth
    • No options: each run of consecutive repeated lines is displayed once

    • Because uniq alone does not remove duplicates that are not adjacent, combine it with sort (for example, sort file | uniq)

    • -c option: count how many times each line is repeated consecutively

    • -d option: display only the lines that are repeated consecutively

    • -u option: display only the lines that are not repeated consecutively
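
The difference between uniq alone and sort followed by uniq can be sketched with a short made-up input. The letters and variable names are assumptions for illustration; the awk step only normalises the padding that uniq -c puts in front of each count.

```shell
# Hypothetical input: "a" repeats both consecutively and again later on.
data=$(printf '%s\n' a a b a c c)

# uniq alone only collapses adjacent repeats, so the later "a" survives.
adjacent_only=$(printf '%s\n' "$data" | uniq)

# Sorting first makes all equal lines adjacent, so every duplicate goes.
fully_unique=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq)

# -d: one copy of each line that repeats; -u: only lines that never repeat.
repeated=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq -d)
singles=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq -u)

# -c prefixes each line with its count; awk rewrites it as count:line.
counts=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq -c | awk '{print $1 ":" $2}')

printf '%s\n' "$counts"
```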

tr command

  • tr command: commonly used to replace, squeeze and delete characters read from standard input
  • Syntax format
    tr [options] set1 [set2]
  • Common options:
    • -c: complement character set 1: characters in set 1 are kept, and all other characters (including the newline \n) are replaced with character set 2
    • -d: delete all characters that belong to character set 1
    • -s: squeeze each run of a repeated character from character set 1 into a single character; if character set 2 is also given, set 1 is first translated to set 2 and the repeats are then squeezed
    • -t: truncate character set 1 to the length of character set 2 before translating; when both sets have the same length, the result is the same as using no option
  • Parameters:
    • Character set 1: the original character set to be translated or deleted. A translation also requires "character set 2" as the target character set; a deletion does not need "character set 2"
    • Character set 2: the target character set for the translation
  • Practice reveals the truth
    • No option and -t: the characters in character set 1 are replaced with the corresponding characters in character set 2
    • -c option: the characters listed in character set 1 are left alone; every other character, including the newline, is replaced with character set 2
    • -d option: delete all characters listed in character set 1
    • -s option: runs of the characters listed in character set 1 are squeezed into one; if character set 2 is given, the characters are translated to set 2 first and the repeats are then squeezed
  • Delete blank lines
    • echo -e "aa\n\n\n\nbb" | tr -s "\n"
    • cat test.txt | tr -s "\n"
  • Replace the colons ":" in the PATH variable with newline characters (\n)
    • echo $PATH | tr -s ":" "\n"
    • echo -e "aa\n\n\n\n\nbb" | tr -s "\n" ":"
  • Delete the '^M' characters found in files that come from Windows
    • cat abc.txt | tr -s "\r" "\n" > abc1.txt
    • cat abc.txt | tr -d "\r" > abc2.txt
    • yum install -y dos2unix
    • dos2unix abc.txt
    • On Linux, a line feed ("\n") on its own both ends the line and starts a new one. A carriage return ("\r") triggers no action there; it is only shown as the control character "^M". On Windows, starting a new line requires carriage return + line feed ("\r\n"); if either control character is missing or the two are in the wrong order, the new line is not started correctly.
  • Array sort
    • echo "${array[@]}" | tr ' ' '\n' | sort -n > abc.txt
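
The tr uses covered in this section can be checked on small inline strings. This is a sketch under stated assumptions: every sample string and variable name is made up, printf replaces echo -e for portability, and a space-separated string stands in for the ${array[@]} expansion.

```shell
# Translate: each character of set 1 maps to the one at the same position in set 2.
upper=$(printf 'hello\n' | tr 'a-z' 'A-Z')

# -d deletes every character listed in set 1.
no_digits=$(printf 'a1b22c333\n' | tr -d '0-9')

# -s squeezes runs of the listed character, which removes blank lines here.
squeezed=$(printf 'aa\n\n\n\nbb\n' | tr -s '\n')

# -c complements set 1: everything that is NOT a digit (newline included) becomes "-".
masked=$(printf 'ab12cd\n' | tr -c '0-9' '-')

# Split a PATH-like string (made up for this example) into one entry per line.
path_lines=$(printf '/usr/bin:/bin:/usr/local/bin\n' | tr ':' '\n')

# Strip the carriage returns from Windows-style CRLF text.
unix_text=$(printf 'line1\r\nline2\r\n' | tr -d '\r')

# Sort space-separated numbers; the string stands in for "${array[@]}".
numbers='30 4 100 25'
sorted=$(printf '%s\n' "$numbers" | tr ' ' '\n' | sort -n)

printf '%s\n' "$sorted"
```

Note that tr -d '\r' only removes the carriage returns, whereas the tr -s "\r" "\n" variant also squeezes away any genuinely blank lines.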

cut command

  • cut command: extracts the specified part of each line (for example, selected fields) and prints it; the file itself is not modified
  • Syntax format:
    cut [options] [file ...]
    cat file | cut [options]
  • Common options:
    • -f: select the fields to extract. cut uses TAB as the default field separator
    • -d: set the field delimiter, replacing the default TAB
    • --complement: output everything except the specified fields
    • --output-delimiter: change the delimiter used in the output
  • Practice, practice, and practice again
    • -d and -f options: -d sets the separator to ':' and -f 1 extracts the first field
      cut -d ':' -f 1 /etc/passwd
      grep '/bin/bash' /etc/passwd | cut -d ':' -f 1-4,6,7
    • --complement option:
      grep '/bin/bash' /etc/passwd | cut -d ':' --complement -f 1-4
    • --output-delimiter option: sets the delimiter placed between fields in the output
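
The cut options can be tried without reading /etc/passwd by feeding in a single record in the same layout. This is a minimal sketch: the record, the ' -> ' output delimiter and the variable names are assumptions, and --complement and --output-delimiter are GNU cut options, so other implementations may not accept them.

```shell
# Hypothetical record in /etc/passwd layout, so no system file is needed.
line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# -d sets the input delimiter, -f selects fields (here 1, 6 and 7).
picked=$(printf '%s\n' "$line" | cut -d ':' -f 1,6,7)

# --complement inverts the selection: drop fields 1-4, keep the rest.
rest=$(printf '%s\n' "$line" | cut -d ':' --complement -f 1-4)

# --output-delimiter rewrites the separator between the selected fields.
relabelled=$(printf '%s\n' "$line" | cut -d ':' -f 1,7 --output-delimiter=' -> ')

printf '%s\n' "$picked" "$rest" "$relabelled"
```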

Regular expression

  • Regular expression: usually used in conditional statements to check whether a string matches a certain format
  • Regular expressions are composed of ordinary characters and metacharacters
    • Ordinary characters include uppercase and lowercase letters, digits, punctuation marks and some other symbols
    • Metacharacters are characters with a special meaning in regular expressions; they specify how the preceding character (the character before the metacharacter) may appear in the target text.

Basic regular expression common metacharacters

  • Supported by: grep, egrep, sed, awk
    • \: escape character, removes the special meaning of a symbol, for example: \!, \n, \$
    • ^: matches the position at the start of the string, for example: ^a, ^the, ^#, ^[a-z]
    • $: matches the position at the end of the string, for example: word$; ^$ matches an empty line
    • .: matches any single character except \n, for example: go.d, g..d
    • *: matches the preceding sub-expression 0 or more times, for example: goo*d, go.*d
    • [list]: matches one character from the list, for example: go[ola]d, [abc], [a-z], [a-z0-9]; [0-9] matches any single digit
    • [^list]: matches one character that is not in the list, for example: [^0-9], [^A-Z0-9]; [^a-z] matches any single character that is not a lowercase letter
    • \{n\}: matches the preceding sub-expression exactly n times, for example: go\{2\}d; '[0-9]\{2\}' matches two digits
    • \{n,\}: matches the preceding sub-expression at least n times, for example: go\{2,\}d; '[0-9]\{2,\}' matches two or more digits
    • \{n,m\}: matches the preceding sub-expression n to m times, for example: go\{2,3\}d; '[0-9]\{2,3\}' matches two to three digits
    • Note: when egrep and awk use {n}, {n,}, {n,m}, there is no need to put "\" before "{" and "}"
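
A few of the basic metacharacters can be checked with plain grep, which uses basic regular expressions by default. This is a sketch only: the word list and variable names are made up, and the patterns are anchored with ^ and $ so that each one has to match the whole line.

```shell
# Fix the locale so ranges such as [a-z] behave predictably.
export LC_ALL=C

# Hypothetical word list to match against.
words=$(printf '%s\n' gd god good goood gold go2d)

# * allows zero or more of the preceding character, so "goo*d" needs at least one o.
star=$(printf '%s\n' "$words" | grep '^goo*d$')

# . matches any single character.
dot=$(printf '%s\n' "$words" | grep '^go.d$')

# A bracket list matches exactly one character taken from the list.
list=$(printf '%s\n' "$words" | grep '^go[ola]d$')

# \{2,3\} is the basic-regex interval: two to three o's.
interval=$(printf '%s\n' "$words" | grep '^go\{2,3\}d$')

# ^$ matches empty lines; -c counts the matching lines.
blank_count=$(printf 'a\n\nb\n\n' | grep -c '^$')

printf '%s\n' "$interval"
```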

Extended regular expression metacharacters

  • Supported by: egrep, awk
    • +: matches the preceding sub-expression one or more times, for example: go+d matches at least one o, such as god, good, goood
    • ?: matches the preceding sub-expression 0 or 1 times, for example: go?d matches gd or god
    • (): treats the string in parentheses as a single unit, for example: g(oo)+d matches oo as a unit one or more times, such as good, gooood
    • |: matches any one of the alternatives, for example: g(oo|la)d matches good or glad
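
The extended metacharacters can be tried the same way. The sketch below uses grep -E, which accepts the same extended syntax as egrep; the word list and variable names are assumptions for illustration.

```shell
# Hypothetical word list to match against.
words=$(printf '%s\n' gd god good goood gooood glad)

# + requires one or more of the preceding character.
plus=$(printf '%s\n' "$words" | grep -E '^go+d$')

# ? allows zero or one of the preceding character.
optional=$(printf '%s\n' "$words" | grep -E '^go?d$')

# () groups "oo" so that + repeats the pair, not a single o.
group=$(printf '%s\n' "$words" | grep -E '^g(oo)+d$')

# | chooses between the alternatives inside the group.
either=$(printf '%s\n' "$words" | grep -E '^g(oo|la)d$')

printf '%s\n' "$group"
```

Because the group repeats the pair "oo", a word with three o's such as goood is not matched by g(oo)+d.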

Example

  • Practice, practice, and practice again
  • Display only mobile phone numbers that start with 13 or 15; the number may be written as three digits, a space, four digits, a space, and four digits
    egrep "^1[35][0-9] ?[0-9]{4} ?[0-9]{4}$" 1.txt
  • E-mail format: the user name must begin with a letter, may contain at most two of the symbols - or . in the middle, must not end with a symbol, and must be at least 6 characters long
  • username@subdomain.top-level-domain
  • The top-level domain is generally 2 to 5 characters long
  • @sohu.com
  • @qq.com
  • @163.com
  • @wo.cn
  • @sina.com.cn

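username@: ^[a-zA-Z][a-zA-Z0-9.-]{4,}[a-zA-Z0-9]@
subdomain: [a-zA-Z0-9_.-]+
.top-level domain: \.([a-zA-Z]{2,5})$

Inside a bracket expression a backslash is an ordinary character, so - is written last in the brackets to be taken literally and . needs no escape. The pattern does not enforce the limit of two symbols in the user name.

egrep "^[a-zA-Z][a-zA-Z0-9.-]{4,}[a-zA-Z0-9]@([a-zA-Z0-9_.-]+)\.([a-zA-Z]{2,5})$" email.txt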
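
The phone and e-mail patterns from this section can be exercised against inline sample data instead of 1.txt and email.txt. This is a sketch under stated assumptions: all numbers and addresses are invented, grep -E stands in for egrep, and the bracket expressions place - last so that it is taken literally, since a backslash does not escape anything inside a POSIX bracket expression.

```shell
# Fix the locale so ranges such as [a-z] behave predictably.
export LC_ALL=C

# Patterns as described in the text; the hyphen is last inside the brackets.
phone_re='^1[35][0-9] ?[0-9]{4} ?[0-9]{4}$'
email_re='^[a-zA-Z][a-zA-Z0-9.-]{4,}[a-zA-Z0-9]@[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,5}$'

# Invented phone numbers: wrong prefix (148) and too few digits must be rejected.
phones=$(printf '%s\n' '13812345678' '150 1234 5678' '14812345678' '1381234567' \
  | grep -E "$phone_re")

# Invented addresses: too short, leading digit, trailing dot and a
# one-letter top-level domain must all be rejected.
emails=$(printf '%s\n' 'zhangsan@qq.com' 'li.si-01@sina.com.cn' 'abc@163.com' \
  '1abcdef@wo.cn' 'abcdef.@sohu.com' 'abcdefg@sohu.c' \
  | grep -E "$email_re")

printf '%s\n' "$phones" "$emails"
```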


Origin blog.csdn.net/weixin_53496398/article/details/114747522