Regular expressions and text processors for Shell programming (grep, sort, uniq, tr, cut)


Regular Expression Concept

REGEXP (regular expression): a pattern written with a class of special characters together with ordinary text characters. Some of these characters (metacharacters) do not stand for their literal meaning but for control or wildcard functions. Regular expressions resemble an enhanced form of wildcards, but the two are different: wildcards are used to match file names, while regular expressions are used to match characters in text content.

Regular expressions are widely supported by many programs and programming languages: vim, less, grep, sed, awk, nginx, mysql, etc.

The role of regular expressions

Mainly used to match strings (command results, text content)

Wildcards are only used to match the names of existing files and directories; they cannot be used to match file contents.

Wildcards exist mainly so that users can describe files or directories conveniently. For example, when a user only needs the files ending in ".sh", a wildcard does this easily.
Every shell supports wildcards. They are special characters that users can place in command-line arguments to match file names or path names.
The shell expands each wildcard pattern into all the file names or path names that match it, passes them to the command as arguments, and then executes the command.

*: wildcard, matches any string of zero or more characters
ls *.txt

?: wildcard, matches any single character
[root@localhost opt]# ls ?.txt

[]: wildcard, [list] matches any single character in the list (a range such as a-z counts as a list)
ls [a-z].txt
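
The three wildcards can be compared side by side. The following is a minimal sketch that assumes a throwaway directory and made-up file names (a.txt, b.txt, ab.txt, notes.sh); none of these names come from the original article.

```shell
# Scratch directory with made-up file names, so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.txt b.txt ab.txt notes.sh

star=$(echo *.txt)        # all three .txt files (their order depends on the locale)
question=$(echo ?.txt)    # a.txt b.txt (exactly one character before .txt)
bracket=$(echo [ab].txt)  # a.txt b.txt (a single a or b before .txt)

echo "*.txt    -> $star"
echo "?.txt    -> $question"
echo "[ab].txt -> $bracket"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

Note that the shell, not the command, performs the expansion: echo simply receives the matching names as arguments.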

metacharacter

. Matches any single character, which can also be a Chinese character
[] Matches any single character within the specified set, examples: [dn] [0-9] [a-zA-Z] [[:alpha:]]
[^] Matches any single character outside the specified set, examples: [^dn] [^a-z]
[:alnum:] letters and digits, that is, [0-9a-zA-Z]
[:alpha:] any English letter, uppercase or lowercase, that is, [A-Za-z]
[:lower:] lowercase letters, example: [[:lower:]], equivalent to [a-z]
[:upper:] uppercase letters, [A-Z]
[:blank:] blank characters (space and tab)
[:space:] whitespace, including space, tab (horizontal and vertical), newline, carriage return, etc.; a wider set than [:blank:]
[:cntrl:] non-printable control characters (backspace, delete, bell...)
[:digit:] decimal digits
[:xdigit:] hexadecimal digits
[:graph:] printable non-blank characters
[:print:] printable characters
[:punct:] punctuation marks
\w # matches a word constituent, equivalent to [_[:alnum:]]
\W # matches a non-word constituent, equivalent to [^_[:alnum:]]
\S # matches any non-whitespace character, equivalent to [^ \f\n\r\t\v]
\s # matches any whitespace character, including space, tab, form feed, etc., equivalent to [ \f\n\r\t\v]
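
To see a few of these classes at work, here is a small sketch run against a made-up sample string. It assumes GNU grep, since \w and \+ are GNU extensions; grep -o prints each match on its own line.

```shell
# Hypothetical sample text, chosen only for illustration.
sample='user_01 Tom 42 #tag'

# Collect every match of a class and join the matches into one string.
digits=$(echo "$sample" | grep -o '[[:digit:]]' | tr -d '\n')   # 0142
uppers=$(echo "$sample" | grep -o '[[:upper:]]' | tr -d '\n')   # T
puncts=$(echo "$sample" | grep -o '[[:punct:]]' | tr -d '\n')   # _#

# \w includes the underscore, so the first "word" is user_01.
first_word=$(echo "$sample" | grep -o '\w\+' | head -n 1)       # user_01

echo "digits=$digits uppers=$uppers puncts=$puncts first_word=$first_word"
```

The class names are only valid inside a bracket expression, which is why they are written as [[:digit:]] rather than [:digit:].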

Metacharacter dot (.)

[root@localhost ~]# ls /etc/ | grep 'rc[.0-6]'
The ls /etc/ command lists all files and subdirectories under the /etc/ directory, and the | symbol passes that output to the grep command as its input.
grep uses the regular expression rc[.0-6] to match names that contain rc followed by a dot or a digit from 0 to 6. The square brackets [] denote a character set, so [.0-6] matches any one of the characters . and 0-6.
# Inside the brackets the dot is a literal dot, not "any single character". The pattern is quoted so that the shell does not expand it as a wildcard first.
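
The difference between a dot inside brackets and a bare dot can be checked without touching /etc. This sketch feeds grep a short, made-up list of names instead of real directory contents.

```shell
# Hypothetical names standing in for the output of ls /etc/.
names='rc.local
rc3.d
rcS.d
src'

# Inside brackets the dot is literal: rc must be followed by "." or 0-6.
bracket_dot=$(printf '%s\n' "$names" | grep 'rc[.0-6]')

# A bare dot matches any single character after rc.
plain_dot=$(printf '%s\n' "$names" | grep -c 'rc.')

printf 'rc[.0-6] matches:\n%s\n' "$bracket_dot"
echo "rc. matches $plain_dot lines"
```

Here rcS.d is matched only by the bare dot, and src is matched by neither pattern because nothing follows rc.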

The grep command looks for a specified string in text

grep [options]... search-pattern target-file

options:

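-m N: stop after N matching lines
grep -m 1 root /etc/passwd   # of several matches, take only the first

-v: show the lines NOT matched by the pattern (inverted match)
grep -Ev '^[[:space:]]*#|^$' /etc/fstab

-i: ignore letter case
-n: show the line number of each matching line
-c: count the matching lines
-o: show only the matched string
-q: quiet mode, print nothing
-A N: after, also show the N lines after each match
-B N: before, also show the N lines before each match
-C N: context, also show N lines before and after each match
-e: combine several patterns with a logical OR, e.g. grep -e 'cat' -e 'dog' file
-w: match whole words only
-E: use ERE (extended regular expressions), equivalent to egrep
-F: treat the pattern as a fixed string, without regular expressions
-f file: read the patterns from a file; with two files, the first serves as the match condition, which finds the lines the two files have in common
-r: recurse into directories, but do not follow symbolic links
-R: recurse into directories and follow symbolic links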
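
Several of these options can be tried on a small temporary file. The sketch below uses made-up file contents (not the real /etc/passwd), and assumes GNU grep for -m and -o.

```shell
# Temporary sample file with hypothetical contents.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# comment line
root:x:0:0
Root user note

daemon:x:1:1
EOF

first_hit=$(grep -m 1 -n root "$sample")               # 2:root:x:0:0
ci_count=$(grep -ci root "$sample")                    # 2 (root and Root)
content=$(grep -Ev '^[[:space:]]*#|^$' "$sample")      # drops comment and empty lines
either=$(grep -c -e daemon -e Root "$sample")          # 2 (lines matching either pattern)
only_match=$(grep -o 'x:[0-9]' "$sample" | head -n 1)  # x:0

# -q prints nothing; only the exit status is used.
if grep -q daemon "$sample"; then found=yes; else found=no; fi

echo "$first_hit | $ci_count | $either | $only_match | $found"
rm -f "$sample"
```

Because -q reports only through the exit status, it is the usual choice inside if conditions in scripts.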


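Repetition counts
*  # matches the preceding character any number of times, including 0 times; greedy mode: matches as much as possible
.* # any characters of any length, including length 0, so it matches everything
\? # matches the preceding character 0 or 1 time, i.e. the character is optional
\+ # matches the preceding character at least once, i.e. it must occur >=1 times
\{n\}   # matches the preceding character exactly n times
\{m,n\} # matches the preceding character at least m times and at most n times
\{,n\}  # matches the preceding character at most n times, <=n
\{n,\}  # matches the preceding character at least n times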
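
As an illustration of the repetition operators, the following sketch counts how many words from a made-up list each pattern matches. It assumes GNU grep, because \? and \+ are GNU extensions to basic regular expressions.

```shell
# Hypothetical word list: g, then zero to three o characters, then d.
words='gd
god
good
goood'

star=$(printf '%s\n' "$words" | grep -c 'go*d')         # 4: zero or more o
plus=$(printf '%s\n' "$words" | grep -c 'go\+d')        # 3: at least one o
optional=$(printf '%s\n' "$words" | grep -c 'go\?d')    # 2: gd and god
exact=$(printf '%s\n' "$words" | grep 'go\{2\}d')       # good
range=$(printf '%s\n' "$words" | grep -c 'go\{2,3\}d')  # 2: good and goood

echo "star=$star plus=$plus optional=$optional exact=$exact range=$range"
```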


The sort command sorts lines

Sorts file content line by line, and can also sort according to different data types

Syntax format:
sort [options] file
cat file | sort [options]

Commonly used options:
-f: Ignore case; by default (in the C locale) uppercase letters sort first
-b: Ignore leading blanks on each line
-n: Sort by numeric value
-r: Reverse the sort order
-u: Equivalent to uniq; identical lines are shown only once (deduplication)
-t: Specify the field separator; by default fields are separated by runs of blanks (spaces or tabs)
-k: Specify the field to sort by
-o <output file>: Write the sorted result to the specified file
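
A short sketch of the most common combinations, using made-up name:count records. LC_ALL=C is set for the alphabetical sort so that the order does not depend on the local language settings.

```shell
# Hypothetical records in the form name:count.
data='pear:10
apple:2
fig:33
apple:2'

# Alphabetical order with duplicate lines removed.
by_name=$(printf '%s\n' "$data" | LC_ALL=C sort -u)

# Use ":" as separator, sort on field 2 numerically, largest first.
largest=$(printf '%s\n' "$data" | sort -t ':' -k 2 -n -r | head -n 1)

printf '%s\n' "$by_name"
echo "largest: $largest"
```

Without -n the second field would be compared as text, so 10 would sort before 2.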


The uniq command removes duplicates quickly

The uniq command reports or omits consecutive duplicate lines in a file, and is often used together with the sort command.

Format:
uniq [options] file
cat file | uniq [options]
-c: Merge consecutive duplicate lines and prefix each line with its number of occurrences
-u: Show only the lines that are not repeated consecutively (a line that repeats non-consecutively still counts as unique)
-d: Show only the lines that are repeated (the repeats must be consecutive)
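
Because uniq only compares adjacent lines, its result changes depending on whether the input is sorted first. The sketch below shows this on a made-up list of letters.

```shell
# Hypothetical input: "a" appears in two separate runs.
input='a
a
b
a
c
c'

collapsed=$(printf '%s\n' "$input" | uniq)       # a b a c (only adjacent repeats merged)
only_once=$(printf '%s\n' "$input" | uniq -u)    # b and the lone a
repeated=$(printf '%s\n' "$input" | uniq -d)     # a and c
deduped=$(printf '%s\n' "$input" | sort | uniq)  # a b c (sorting makes all repeats adjacent)

# Most frequent line: count after sorting, then sort the counts in descending order.
top=$(printf '%s\n' "$input" | sort | uniq -c | sort -rn | head -n 1)

printf '%s\n' "$deduped"
echo "most frequent:" $top
```

The sort | uniq -c | sort -rn chain is the common idiom for frequency counts in shell scripts.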


The tr command replaces, compresses, and deletes

Commonly used to replace, compress (squeeze), and delete characters from standard input

Syntax format:
tr [options] set1 [set2]

Common options:
-c: Use the complement of character set 1: keep the characters of set 1 and replace all other characters (including newline \n) with character set 2
-d: Delete all characters that belong to character set 1
-s: Squeeze each run of a repeated character into a single character
-t: Truncate character set 1 to the length of character set 2 before replacing

Parameters:

Character set 1:
Specifies the original character set to be converted or deleted. When converting, the parameter "character set 2" must also be given to specify the target character set.
When deleting, "character set 2" is not required.

Character set 2:
Specifies the target character set to convert to
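
The replace, delete, and squeeze modes can be sketched with a few one-liners. The input strings are made up for illustration.

```shell
# Replace: map lowercase letters to uppercase.
upper=$(echo 'hello world' | tr 'a-z' 'A-Z')         # HELLO WORLD

# Delete: remove every digit.
no_digits=$(echo 'a1b2c3' | tr -d '0-9')             # abc

# Squeeze: collapse runs of spaces into one space.
squeezed=$(echo 'a    b   c' | tr -s ' ')            # a b c

# Complement + delete: remove everything that is NOT a digit (newline included).
only_digits=$(echo 'tel: 555-0100' | tr -cd '0-9')   # 5550100

echo "$upper | $no_digits | $squeezed | $only_digits"
```

tr reads only from standard input, so a file is processed with a pipe or with input redirection rather than as an argument.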


The cut command extracts parts of lines

Extracting with expr substr
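
A minimal sketch of this method, assuming GNU expr (substr is a GNU extension and is not available in every expr implementation). The form is expr substr STRING POSITION LENGTH, with positions counted from 1; the sample string is made up.

```shell
s='hello world'

first5=$(expr substr "$s" 1 5)   # hello (5 characters starting at position 1)
last5=$(expr substr "$s" 7 5)    # world (5 characters starting at position 7)

echo "$first5 / $last5"
```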


Extracting with cut

Extracts fields or character ranges from each line
Format:
Format 1: cut [options] file
Format 2: cat file | cut [options]

-d: Specify the field delimiter (the default delimiter is Tab)
-f: Select by field; specify the number of the field or fields to keep
-b: Select by byte position
-c: Select by character position
--complement: Exclude the specified fields and keep the rest
--output-delimiter: Change the delimiter used in the output
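
The options can be tried on a single passwd-style line. The line below is a typical example written out by hand rather than read from a real system, and --complement and --output-delimiter assume GNU cut.

```shell
# Hypothetical passwd-style record with ":" as the field delimiter.
line='root:x:0:0:root:/root:/bin/bash'

user=$(echo "$line" | cut -d ':' -f 1)         # root
login_shell=$(echo "$line" | cut -d ':' -f 7)  # /bin/bash
uid_gid=$(echo "$line" | cut -d ':' -f 3,4)    # 0:0
first4=$(echo "$line" | cut -c 1-4)            # root (characters 1 to 4)

# Drop fields 2-6 and print the remaining fields separated by a space.
rest=$(echo "$line" | cut -d ':' -f 2-6 --complement --output-delimiter=' ')

echo "$user | $login_shell | $uid_gid | $first4 | $rest"
```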


The split command splits files

The split command splits a large file into several smaller files under Linux.

Format: split [option] value source-file output-prefix
-l: Specify the number of lines per output file
-b: Specify the size of each output file
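
As a rough sketch, the following splits a ten-line file into pieces of four lines each. The directory, the file name big.txt, and the prefix part_ are made up for the example; split appends the suffixes aa, ab, ac, and so on to the prefix.

```shell
# Scratch directory and a 10-line sample file.
work=$(mktemp -d)
seq 1 10 > "$work/big.txt"

# Split into pieces of 4 lines: part_aa (4), part_ab (4), part_ac (2).
split -l 4 "$work/big.txt" "$work/part_"

pieces=$(ls "$work" | grep -c '^part_')
last_lines=$(wc -l < "$work/part_ac")

# Concatenating the pieces in order restores the original file.
if cat "$work"/part_* | cmp -s - "$work/big.txt"; then same=yes; else same=no; fi

echo "pieces=$pieces last_lines=$last_lines same=$same"
rm -rf "$work"
```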


The paste command merges files

Merges files side by side, line by line
Format:

paste [options] file1 file2
-d: Specify the delimiter placed between the merged columns (Tab by default)
-s: Paste one file at a time, turning the lines of each file into a single row (rows and columns are swapped)

Key point: what is the difference between paste a b and cat a b?
cat joins the files top to bottom, paste joins them left to right
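
The difference is easy to see with two tiny files. This sketch creates them in a scratch directory; the file names a and b follow the question above, and their contents are made up.

```shell
# Two hypothetical two-line files.
work=$(mktemp -d)
printf '1\n2\n' > "$work/a"
printf 'x\ny\n' > "$work/b"

side_by_side=$(paste -d ':' "$work/a" "$work/b")  # 1:x and 2:y (left-right)
stacked=$(cat "$work/a" "$work/b")                # 1 2 x y (top-bottom)
one_row=$(paste -s -d ',' "$work/a")              # 1,2 (lines of one file joined into a row)

printf '%s\n' "$side_by_side"
echo "$one_row"
rm -rf "$work"
```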

Interview question: count the connection states of the current host
Count the number of currently connected hosts
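
One possible approach to both questions, built only from the commands covered in this article, is sketched below. It is an assumption about how such a pipeline could look, not a reconstruction of a specific original solution. To stay runnable anywhere, it works on a made-up sample laid out like the output of ss -tan (state in column 1, peer address in column 5); on a live system the sample could be replaced by the real command output, whose column layout may differ between versions.

```shell
# Hypothetical sample in the style of `ss -tan`; all addresses are made up.
sample='State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB  0      0      10.0.0.5:22         10.0.0.9:51000
ESTAB  0      0      10.0.0.5:22         10.0.0.7:51002
LISTEN 0      128    0.0.0.0:22          0.0.0.0:*
ESTAB  0      0      10.0.0.5:80         10.0.0.9:51044'

# Connection states: skip the header, squeeze spaces, keep column 1, count each state.
states=$(printf '%s\n' "$sample" | tail -n +2 | tr -s ' ' | cut -d ' ' -f 1 \
         | sort | uniq -c | sort -rn)

# Connected hosts: peer address (column 5) of established connections,
# port stripped, duplicates removed, then counted.
hosts=$(printf '%s\n' "$sample" | grep '^ESTAB' | tr -s ' ' | cut -d ' ' -f 5 \
        | cut -d ':' -f 1 | sort -u | wc -l)

printf '%s\n' "$states"
echo "distinct connected hosts: $hosts"
```

Stripping the port with cut -d ':' works for IPv4 addresses only; IPv6 addresses contain colons themselves and would need different handling.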

The eval command scans variables twice

When eval is placed before a command, the shell scans the command line twice before executing it: the first scan performs all substitutions, and the second scan executes the resulting command. This suits variables whose value cannot be obtained in a single scan.

Script application test:

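#!/bin/bash

# A script that demonstrates the second scan performed by eval
a=100
b=a

# Plain echo expands $b once and prints the text $a
echo "Value of variable b printed by plain echo:" \$$b
# eval scans the resulting $a again and prints 100
eval echo "Value of variable b printed after the eval scan:" \$$b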


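Position anchors

^ # start-of-line anchor, used at the far left of the pattern
$ # end-of-line anchor, used at the far right of the pattern
^root$ # matches a whole line (a line that contains only root)
^$ # empty line
^[[:space:]]*$ # blank line (empty or whitespace only)

\< or \b # word-start anchor, used on the left side of a word pattern (a run of consecutive digits, letters, and underscores counts as one word)
\> or \b # word-end anchor, used on the right side of a word pattern
\<root\> # matches the whole word root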
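
The anchors can be compared by counting matching lines in a small, made-up text. The word anchors \< and \> assume GNU grep.

```shell
# Hypothetical six-line text: the 4th line is empty, the 5th holds only spaces.
text=$(printf 'root\nroot user\nchroot jail\n\n   \nend')

whole_line=$(printf '%s\n' "$text" | grep -c '^root$')      # 1: the line that is exactly root
line_start=$(printf '%s\n' "$text" | grep -c '^root')       # 2: lines beginning with root
whole_word=$(printf '%s\n' "$text" | grep -c '\<root\>')    # 2: chroot does not count
empty=$(printf '%s\n' "$text" | grep -c '^$')               # 1: the truly empty line
blank=$(printf '%s\n' "$text" | grep -c '^[[:space:]]*$')   # 2: empty or spaces only

echo "$whole_line $line_start $whole_word $empty $blank"
```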

Grouping and alternation

Grouping: \( \) bundles multiple characters together and treats them as a whole
Or: \|


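Extended regular expressions

The metacharacters that match characters are much the same as in basic regular expressions

To use extended regular expressions, run grep -E (or egrep) and sed -r

Repetition counts

*     # matches the preceding character any number of times
?     # 0 or 1 time
+     # 1 or more times
{n}   # exactly n times
{m,n} # at least m times, at most n times
{,n}  # at most n times (<=n); n can be 0
{n,}  # at least n times (>=n); n can be 0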

Grouping

() # grouping: bundles multiple characters together and treats them as a whole, such as (root)
| # or
a|b # a or b
C|cat # C or cat
(C|c)at # Cat or cat
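
The effect of grouping on alternation can be checked with grep -E. The word list in this sketch is made up for the purpose.

```shell
# Hypothetical word list.
animals='Cat
cat
C
concat
bat'

# Without a group, | splits the whole pattern: "C" or "cat" anywhere in the line.
alt=$(printf '%s\n' "$animals" | grep -Ec 'C|cat')           # 4 (everything except bat)

# With a group, only the first letter alternates; anchors force a whole-line match.
grouped=$(printf '%s\n' "$animals" | grep -Ec '^(C|c)at$')   # 2 (Cat and cat)

# A repetition count after a group applies to the whole group.
twice=$(printf '%s\n' 'ab' 'abab' | grep -E '^(ab){2}$')     # abab

echo "alt=$alt grouped=$grouped twice=$twice"
```

In extended syntax the parentheses, braces, and | are written without backslashes, which is the main visible difference from the basic syntax.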


Origin blog.csdn.net/ll945608651/article/details/129762615