Shell Programming: Regular Expressions and the grep Command

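# In practice, a regular expression filters data: lines that do not satisfy the pattern it defines are rejected, and only the data that matches the regular expression is kept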

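# Metacharacters: characters to which regular expressions give a special meaning beyond their literal one

# Mastering the basic elements of regular expressions mostly means mastering what each metacharacter means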

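*: matches the preceding character zero or more times
   hel*o: matches heo, helo, hello, hellllo
.: matches any single character
   ...73.: note that "." also matches a space
^: anchors the match to the start of a line
   ^cloud: matches lines that begin with cloud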

# ^...X86*: any three characters at the start of the line, then X8, then zero or more 6s

$: anchors the match to the end of a line
micky$: matches lines that end with micky

# Match all empty lines: ^$
# Match lines that contain exactly one character: ^.$
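A quick way to try these metacharacters is to pipe a few sample lines into grep. The sketch below assumes bash and GNU grep, and the sample words are made up for illustration. Note that hel*o also matches heo, because * allows zero occurrences of l.

```shell
#!/usr/bin/env bash
# Hypothetical sample words, one per line.
words=$(printf '%s\n' heo helo hello hellllo help)

# *: zero or more of the preceding character, so heo matches as well.
star=$(printf '%s\n' "$words" | grep 'hel*o')
echo "$star"

# ^$: nothing between line start and line end, i.e. an empty line.
empty_count=$(printf 'a\n\nb\n\n' | grep -c '^$')
echo "$empty_count"

# ^.$: a line consisting of exactly one character.
single=$(printf '%s\n' a ab '' c | grep '^.$')
echo "$single"
```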

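[]: matches one character from a set. The set can be listed exhaustively, or written as a range with '-', which runs from the character to the left of '-' to the character to its right
# Match any single digit
[0123456789]
[0-9]
# Match a letter
[a-z]
[A-Z]
[b-p]
# A ^ placed first inside [] negates the set
# Any character except b through d
[^b-d]
# Match English words
# One letter followed by zero or more further letters
[A-Za-z][A-Za-z]*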
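You can try the bracket expressions above the same way. This sketch assumes bash and GNU grep and uses hypothetical sample lines. LC_ALL=C is set so that ranges such as [a-z] cover only the ASCII letters, because in some other locales ranges may behave differently.

```shell
#!/usr/bin/env bash
# C locale: [a-z] and [A-Z] then mean plain ASCII letter ranges.
export LC_ALL=C
# Hypothetical sample lines.
input=$(printf '%s\n' a1 b2 cc dd Zz)

# [0-9]: the line contains at least one digit.
digits=$(printf '%s\n' "$input" | grep '[0-9]')
# ^[^b-d]: the first character is anything except b, c or d.
not_bcd=$(printf '%s\n' "$input" | grep '^[^b-d]')
# ^[A-Za-z][A-Za-z]*$: the whole line is a word made of letters only.
letters=$(printf '%s\n' "$input" | grep '^[A-Za-z][A-Za-z]*$')

printf '%s\n---\n' "$digits" "$not_bcd" "$letters"
```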

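\<\>: word anchors for exact (whole-word) matching. \< matches the start of a word and \> its end. The backslash is what makes < and > special; without it they are ordinary characters
# Match the word the exactly (not there or other)
\<the\>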
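The sketch below compares the exact word match with a plain substring match. It assumes GNU grep, which supports \< and \>, and uses made-up sample lines.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines.
input=$(printf '%s\n' 'the cat' 'there' 'other' 'in the end')

# \<the\>: "the" only as a complete word.
exact=$(printf '%s\n' "$input" | grep '\<the\>')
echo "$exact"

# Without the anchors, "the" also matches inside there and other.
loose_count=$(printf '%s\n' "$input" | grep -c 'the')
echo "$loose_count"
```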

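\{\}: like *, it repeats the preceding character. The difference is that * means zero or more times, while \{\} lets you state the number of repetitions
# \{n\}: the preceding character occurs exactly n times
# \{n,\}: the preceding character occurs at least n times
# \{n,m\}: the preceding character occurs n to m times
JO\{3\}B # O repeated exactly 3 times: JOOOB
JO\{3,\}B # O repeated at least 3 times
JO\{3,6\}B # O repeated 3 to 6 times

# Match 5 consecutive lowercase letters (anywhere in the line, so longer words match too)
[a-z]\{5\}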
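The sketch below checks the repetition intervals against a few hypothetical strings, assuming bash and GNU grep. The last two commands show that [a-z]\{5\} also matches inside longer words. Adding -x, which requires the whole line to match, restricts it to exactly five letters.

```shell
#!/usr/bin/env bash
export LC_ALL=C
# Hypothetical strings with 1, 2, 3, 4 and 7 O characters.
input=$(printf '%s\n' JOB JOOB JOOOB JOOOOB JOOOOOOOB)

exactly3=$(printf '%s\n' "$input" | grep 'JO\{3\}B')    # exactly 3 O
atleast3=$(printf '%s\n' "$input" | grep 'JO\{3,\}B')   # 3 or more O
from3to6=$(printf '%s\n' "$input" | grep 'JO\{3,6\}B')  # 3 to 6 O

words=$(printf '%s\n' abc abcde abcdefg)
# Five lowercase letters in a row, anywhere in the line.
five_anywhere=$(printf '%s\n' "$words" | grep '[a-z]\{5\}')
# -x: the whole line must consist of exactly five lowercase letters.
five_exact=$(printf '%s\n' "$words" | grep -x '[a-z]\{5\}')

printf '%s\n' "$exactly3" "$atleast3" "$from3to6" "$five_anywhere" "$five_exact"
```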

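awk also supports the extended regular expression syntax, which grep -E accepts as well. It adds:
?: matches the preceding character zero or one time
JO?B # matches JB and JOB, but not JOOB
# i.e. the preceding character appears at most once

+: matches the preceding character one or more times (unlike *, it requires at least one occurrence; * also allows zero)
S+EU # matches SEU, SSEU, SSSSEU; EU alone does not match

() and |: used together they denote a group of alternatives
re(a|e|o)d == re[aeo]d
# any one of a, e, or o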
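In grep, -E turns on the extended syntax, so ?, +, () and | need no backslashes. This sketch uses hypothetical input and assumes GNU grep and a POSIX awk.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines for each extended metacharacter.
opt=$(printf '%s\n' JB JOB JOOB | grep -E '^JO?B$')             # ?: 0 or 1 O
plus=$(printf '%s\n' EU SEU SSEU SSSSEU | grep -E 'S+EU')       # +: 1 or more S
alt=$(printf '%s\n' read reed reid rod | grep -E 're(a|e|o)d')  # (|): a, e or o
# awk /.../ patterns use the same extended syntax.
awk_alt=$(printf '%s\n' read reed reid rod | awk '/re(a|e|o)d/')

printf '%s\n' "$opt" "$plus" "$alt" "$awk_alt"
```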

"""
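Apart from the =~ operator inside [[ ]] (bash 3.0 and later), bash itself does not use regular expressions. They are used by shell commands and tools such as grep, sed, and awk. bash does reuse some of the same metacharacters, such as * and [], for wildcard (globbing) matching, but with glob meanings: for example, * matches any string rather than repeating the previous character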
"""

The grep command
# Source: Linux shell编程从初学到精通 (Linux Shell Programming: From Beginner to Expert), page 52

1. -c: print the number of matching lines. By default grep prints every line that contains the pattern; with -c it prints only how many such lines there are
[root@server1 ~]# grep -c root /etc/passwd
2

2. -n: print all matching lines, each prefixed with its line number

[root@server1 ~]# grep -n root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
10:operator:x:11:0:operator:/root:/sbin/nologin

3. -v: print all lines that do not contain the pattern

[root@server1 ~]# grep -v root /etc/passwd
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
# Combined with -c, it prints the number of lines that do not contain the keyword
[root@server1 ~]# grep -vc root /etc/passwd
19

4. -i: ignore case

# grep is case-sensitive by default; with -i, root also matches ROOT
[root@server1 ~]# grep -i root /tmp/passwd
root:x:0:0:root:/root:/bin/bash
ROOT:x:0:0:ROOT:/ROOT:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

5. -s: suppress error messages about nonexistent or unreadable files

[root@server1 ~]# grep -s root dd
[root@server1 ~]# grep  root dd
grep: dd: No such file or directory

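6. -r: by default grep searches only the files named on the command line and does not descend into directories. The -r option searches recursively, including all subdirectories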
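The sketch below shows the difference. It assumes GNU grep and uses a throwaway directory created with mktemp, with hypothetical file names. -l prints only the names of matching files, and the output is sorted because the order in which directories are read is not guaranteed.

```shell
#!/usr/bin/env bash
# Throwaway directory tree; the file names are hypothetical.
dir=$(mktemp -d)
mkdir "$dir/sub"
echo "needle top" > "$dir/top.txt"
echo "needle deep" > "$dir/sub/deep.txt"

# -r descends into sub/; -l lists only the names of matching files.
found=$(grep -rl needle "$dir" | LC_ALL=C sort)
echo "$found"

# Without -r, only the files given on the command line are searched.
flat=$(grep -l needle "$dir"/*.txt)
echo "$flat"

rm -r "$dir"
```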

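7. -w: match whole words only. The matched text must be preceded and followed by a non-word character, or by the start or end of the line. The pattern is still a regular expression, so its metacharacters keep their special meaning; to treat the pattern as a literal string, use -F instead. In the example below, the regex roo* matches inside root (and ro matches inside trousers), but never as a complete word, so grep -w prints nothing. Quoting the pattern, as in 'roo*', keeps the shell from expanding * as a file-name wildcard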

[root@server1 ~]# grep roo* /tmp/passwd 
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
[root@server1 ~]# grep -w roo* /tmp/passwd
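The sketch below runs -w and -F on a few made-up lines. With -w the pattern keeps its regular-expression meaning; with -F it does not. It assumes GNU grep.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines.
input=$(printf '%s\n' root roo 'roo* here')

# -w: roo* is still a regex, but the match must be a whole word.
# In "root" the match "roo" is followed by the word character t, so it fails.
whole=$(printf '%s\n' "$input" | grep -w 'roo*')
echo "$whole"

# -F: the pattern is a fixed string, so * is just an asterisk.
fixed=$(printf '%s\n' "$input" | grep -F 'roo*')
echo "$fixed"
```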

8. -x: match whole lines. grep prints a line only if the entire line matches the pattern

[root@server1 ~]# grep -w 'World' world.txt 
Hello World
World
World Cup
[root@server1 ~]# grep -x 'World' world.txt 
World
[root@server1 ~]# cat world.txt 
Hello World
World
World Cup
Westos
One One world

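9. -q: quiet mode. grep prints nothing and reports whether the search succeeded only through its exit status: 0 if a line matched, 1 if none did, and 2 if an error occurred (such as a missing file)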

[root@server1 ~]# grep -q -x 'World' world.txt 
[root@server1 ~]# echo $?
0
[root@server1 ~]# grep -q -x 'World dd' world.txt 
[root@server1 ~]# echo $?
1
[root@server1 ~]# grep -q -x 'World' world
grep: world: No such file or directory
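Because -q communicates only through the exit status, it is usually used in an if statement. This sketch assumes bash and GNU grep. The temporary file is made up, and the path /nonexistent/world.txt is hypothetical and assumed not to exist.

```shell
#!/usr/bin/env bash
file=$(mktemp)
printf '%s\n' 'Hello World' 'World' 'World Cup' > "$file"

# -q prints nothing; the if statement tests grep's exit status.
if grep -qx 'World' "$file"; then r1=found; else r1=missing; fi
if grep -qx 'World dd' "$file"; then r2=found; else r2=missing; fi

# A missing file is an error, reported as exit status 2 (-s hides the message).
status=0
grep -qs 'World' /nonexistent/world.txt || status=$?

echo "$r1 $r2 $status"
rm -f "$file"
```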

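Examples combining grep with regular expressions
# Lines that begin with -, in files whose names end in d (*d is a shell wildcard that expands to file names; it is not part of the regex)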
[root@server1 ~]# grep ^- *d
----------------BEGIN
----------------BEGIN
----------------BEGIN
----------------BEGIN
----------------BEGIN
----------------BEGIN

# Find empty lines (print only the count)
[root@server1 ~]# grep -c ^$ /tmp/passwd 
3

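# Find non-empty lines (print only the count). Inside [] the $ is literal, so ^[^$] means "the line starts with any character other than $"; grep -vc '^$' states the intent more directly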
[root@server1 ~]# grep -c ^[^$] /tmp/passwd 
22

# Use [] to match part of the pattern case-insensitively (here only the first letter)
[root@server1 ~]# grep -n [Rr]oot /tmp/passwd 
1:root:x:0:0:root:/root:/bin/bash
2:Root:x:0:0:Root:/Root:/bin/bash
15:operator:x:11:0:operator:/root:/sbin/nologin

# Lines that start with /, followed by any 4 characters, with / again as the sixth character
[root@server1 ~]# grep ^/..../ /tmp/passwd 
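The sketch below compares the empty-line patterns above on a small made-up sample, assuming bash and GNU grep. The line $HOME shows that ^[^$] skips lines beginning with a literal $, whereas grep -vc '^$' counts every non-empty line.

```shell
#!/usr/bin/env bash
# Hypothetical sample: two empty lines and one line starting with a literal $.
sample() { printf '%s\n' 'root' '' 'bin' '$HOME' '' '/home/x'; }

blank=$(sample | grep -c '^$')            # empty lines
nonblank=$(sample | grep -vc '^$')        # every non-empty line
not_dollar=$(sample | grep -c '^[^$]')    # lines whose first character is not $
slash=$(sample | grep '^/..../')          # /, any four characters, then / again

echo "$blank $nonblank $not_dollar $slash"
```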

sed

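Vim is an interactive text editor: you insert, delete, or replace text in the data with keyboard commands. The sed command covered in this section works differently. It is a stream editor, and its most obvious feature is that you supply a set of rules before sed processes the data. sed then edits the data according to those rules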

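sed processes the data in a text file according to script commands, which are either typed on the command line or stored in a text file. It handles the data in this order:
It reads one line at a time;
It matches and modifies the data according to the supplied rule commands. Note that by default sed does not modify the source file: it copies the data into a buffer and changes only the data in that buffer;
It outputs the result.

When one line has been processed, sed reads the next line and repeats the process until all the data in the file has been handled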
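The sketch below is a rough emulation, in plain bash, of the read, edit, print cycle for a single s command. The function name mini_sed_s is hypothetical. It uses bash's ${line/old/new} substitution with old quoted, so old is treated as literal text rather than a regular expression. This means it only mirrors sed for fixed strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mini_sed_s OLD NEW mimics sed 's/OLD/NEW/' for fixed text.
mini_sed_s() {
  local old=$1 new=$2 line
  # 1. Read one line at a time (the || test also handles a last line without a newline).
  while IFS= read -r line || [[ -n $line ]]; do
    # 2. Edit the copy held in $line; only the first occurrence is replaced.
    line=${line/"$old"/"$new"}
    # 3. Print the result, then continue with the next line.
    printf '%s\n' "$line"
  done
}

ours=$(printf '%s\n' 'a b a' 'b' | mini_sed_s a X)
theirs=$(printf '%s\n' 'a b a' 'b' | sed 's/a/X/')
echo "$ours"
```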

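The basic syntax of the sed command is:
[root@localhost ~]# sed [options] [script commands] filename

sed s (substitute) script command
Its basic syntax is:
[address]s/pattern/replacement/flags
Here address specifies the lines to operate on, pattern is the text to be replaced, replacement is the new text, and flags modify how the substitution is done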
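# A number flag tells sed which occurrence of the pattern to replace
# With the flag 2, sed replaces only the second occurrence of the pattern on each line; lines with a single occurrence are left unchanged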

[root@server1 ~]# sed 's/Tue/Tua/2' date.txt 
Tue Tua Dec 17 15:40:54 CST 2019
Tue Tua Dec 17 15:40:57 CST 2019
Tue Tua Dec 17 15:40:57 CST 2019
Tue Dec 17 15:40:59 CST 2019
Tue Dec 17 15:41:00 CST 2019
Tue Dec 17 15:41:01 CST 2019
[root@server1 ~]# cat date.txt 
Tue Tue Dec 17 15:40:54 CST 2019
Tue Tue Dec 17 15:40:57 CST 2019
Tue Tue Dec 17 15:40:57 CST 2019
Tue Dec 17 15:40:59 CST 2019
Tue Dec 17 15:41:00 CST 2019
Tue Dec 17 15:41:01 CST 2019

# To replace every matching string with the new text, use the g flag
[root@server1 ~]# cat date.txt 
Tue Dec 17 15:40:54 CST 2019
Tue Dec 17 15:40:57 CST 2019
Tue Dec 17 15:40:57 CST 2019
Tue Dec 17 15:40:59 CST 2019
Tue Dec 17 15:41:00 CST 2019
Tue Dec 17 15:41:01 CST 2019
[root@server1 ~]# sed 's/Tue/Tua/g' date.txt 
Tua Dec 17 15:40:54 CST 2019
Tua Dec 17 15:40:57 CST 2019
Tua Dec 17 15:40:57 CST 2019
Tua Dec 17 15:40:59 CST 2019
Tua Dec 17 15:41:00 CST 2019
Tua Dec 17 15:41:01 CST 2019
[root@server1 ~]# cat date.txt 
Tue Dec 17 15:40:54 CST 2019
Tue Dec 17 15:40:57 CST 2019
Tue Dec 17 15:40:57 CST 2019
Tue Dec 17 15:40:59 CST 2019
Tue Dec 17 15:41:00 CST 2019
Tue Dec 17 15:41:01 CST 2019
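The sketch below shows the occurrence flags on a single made-up line. The last command assumes GNU sed: combining a number with g, as in 2g ("this occurrence and every later one"), is a GNU extension.

```shell
#!/usr/bin/env bash
line='Tue Tue Tue'

second=$(echo "$line" | sed 's/Tue/Tua/2')        # only the 2nd occurrence
all=$(echo "$line" | sed 's/Tue/Tua/g')           # every occurrence
from_second=$(echo "$line" | sed 's/Tue/Tua/2g')  # GNU: 2nd and all later ones

printf '%s\n' "$second" "$all" "$from_second"
```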

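# The w flag writes each line on which a substitution was made to the specified file; sed still prints all lines to standard output. Here every line contains Tue, so every line ends up in date2.txt: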
[root@server1 ~]# sed 's/Tue/Tua/w date2.txt' date.txt 
Tua Tue Dec 17 15:40:54 CST 2019
Tua Tue Dec 17 15:40:57 CST 2019
Tua Tue Dec 17 15:40:57 CST 2019
Tua Dec 17 15:40:59 CST 2019
Tua Dec 17 15:41:00 CST 2019
Tua Dec 17 15:41:01 CST 2019
[root@server1 ~]# cat date2.txt 
Tua Tue Dec 17 15:40:54 CST 2019
Tua Tue Dec 17 15:40:57 CST 2019
Tua Tue Dec 17 15:40:57 CST 2019
Tua Dec 17 15:40:59 CST 2019
Tua Dec 17 15:41:00 CST 2019
Tua Dec 17 15:41:01 CST 2019
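w writes only the lines on which a substitution happened, so input that mixes matching and non-matching lines makes the difference visible. This sketch assumes GNU sed and a temporary output file from mktemp.

```shell
#!/usr/bin/env bash
out=$(mktemp)
# Only the first line contains Tue, so only it is written to $out;
# standard output (discarded here) would still show both lines.
printf '%s\n' 'Tue Dec 17' 'Wed Dec 18' | sed "s/Tue/Tua/w $out" > /dev/null
written=$(cat "$out")
echo "$written"
rm -f "$out"
```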

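# The -n option suppresses sed's automatic output, while the p flag prints each line that was modified. Together they output only the lines changed by the substitution, for example: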

[root@server1 ~]#  sed -n 's/test/trial/p' data3.txt
This is a trial line.
[root@server1 ~]# cat data3.txt 
This is a test line.
This is a different line.

# With the s command, replacing strings that look like file paths is awkward, because every forward slash in the path must be escaped, for example:

[root@server1 ~]# sed 's/\/bin\/bash/\/bin\/csh/' /tmp/passwd 
root:x:0:0:root:/root:/bin/csh
Root:x:0:0:Root:/Root:/bin/csh
ROOT:x:0:0:ROOT:/ROOT:/bin/csh
bin:x:1:1:bin:/bin:/sbin/nologin

daemon:x:2:2:daemon:/sbin:/sbin/nologin

adm:x:3:4:adm:/var/adm:/sbin/nologin

lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
varnishlog:x:998:996:varnishlog user:/dev/null:/sbin/nologin
varnish:x:997:996:Varnish Cache:/var/lib/varnish:/sbin/nologin
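sed accepts almost any single character (other than backslash or newline) as the delimiter of the s command, which avoids escaping the slashes. The sketch below applies the substitution to one sample passwd-style line, using # as the delimiter.

```shell
#!/usr/bin/env bash
line='root:x:0:0:root:/root:/bin/bash'

# Slashes escaped, as in the example above.
escaped=$(echo "$line" | sed 's/\/bin\/bash/\/bin\/csh/')
# Same substitution with # as the delimiter: no escaping needed.
hashed=$(echo "$line" | sed 's#/bin/bash#/bin/csh#')

echo "$hashed"
```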

# http://c.biancheng.net/view/4028.html

sed d (delete) script command
Its basic syntax is:
[address]d

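# To delete specific lines from the text, use the d script command, which deletes everything on the addressed lines. Be careful with it: if you forget to specify the lines, every line is deleted from the output, for example: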

[root@server1 ~]# cat data4.txt 
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
# Nothing is printed because every line was deleted from the output; the file itself is unchanged, as cat shows
[root@server1 ~]# sed 'd' data4.txt
[root@server1 ~]# cat data4.txt 
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog

# Lines can be addressed by number, for example to delete line 3 of data6.txt:

[root@server1 ~]# cat data6.txt 
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
[root@server1 ~]# sed '3d' data6.txt
This is line number 1.
This is line number 2.
This is line number 4.
[root@server1 ~]# cat data6.txt 
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.

# Or address a range of lines, for example to delete lines 2 and 3 of data6.txt:
[root@server1 ~]# sed '2,3d' data6.txt
This is line number 1.
This is line number 4.

# To stress this again: by default sed does not modify the original file. The deleted lines only disappear from sed's output, and the original file is unchanged
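Besides line numbers, d also accepts a regular-expression address, which links it back to the patterns at the start of this article. The sketch below uses hypothetical input. It deletes every empty line, and in a second command every line that contains a digit.

```shell
#!/usr/bin/env bash
# /^$/d: delete every empty line (compare grep -v '^$').
kept=$(printf '%s\n' one '' two '' three | sed '/^$/d')
# /[0-9]/d: delete every line that contains a digit.
no_digits=$(printf '%s\n' a1 b c2 | sed '/[0-9]/d')
printf '%s\n' "$kept" "$no_digits"
```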

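sed a and i script commands
The a command appends a line after the addressed line, and the i command inserts a line before it. They are covered together because their basic syntax is identical:

[address]a (or i)\new text

# To insert a new line before line 3 of the data stream, run: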

[root@server1 ~]# sed '3i\
> This is an inserted line.' data6.txt
This is line number 1.
This is line number 2.
This is an inserted line.
This is line number 3.
This is line number 4.

# Likewise, to append a new line after line 3 of the data stream, run:

[root@server1 ~]# sed '3a\
> This is an appended line.' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is an appended line.
This is line number 4.

# To add several lines at once, end every line of the inserted or appended text except the last with a backslash, for example:

[root@server1 ~]# sed '1i\
> This is one line of new text.\
>  This is another line of new text.' data6.txt
This is one line of new text.
 This is another line of new text.
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.

# As shown, both lines are added to the data stream
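GNU sed also accepts a one-line form in which the text follows a or i directly, without the backslash and newline. This form is a GNU extension, so the sketch below assumes GNU sed. The input lines are made up.

```shell
#!/usr/bin/env bash
lines() { printf '%s\n' 'line 1' 'line 2'; }

# GNU one-line form: the text follows a or i on the same line.
appended=$(lines | sed '1a appended after line 1')
inserted=$(lines | sed '$i inserted before the last line')

printf '%s\n' "$appended" "$inserted"
```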

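sed c (change) script command
The c command replaces the entire contents of the addressed line with the text that follows it. Its basic syntax is:

[address]c\replacement text
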
[root@server1 ~]# sed '3c\
> This is a changed line of text.' data6.txt
This is line number 1.
This is line number 2.
This is a changed line of text.
This is line number 4.

# In this example sed changes the text of line 3. A text-pattern address achieves the same result:

[root@server1 ~]# sed '/number 3/c\
> This is a changed line of text.' data6.txt
This is line number 1.
This is line number 2.
This is a changed line of text.
This is line number 4.
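When c is given a range of lines, a single copy of the new text replaces the whole range; POSIX sed behaves the same way. The sketch below uses made-up numeric lines.

```shell
#!/usr/bin/env bash
# Lines 2 and 3 are replaced by one single line of text.
changed=$(printf '%s\n' 1 2 3 4 | sed '2,3c\
two and three')
echo "$changed"
```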

sed y (transform) script command
The y command is the only sed script command that operates on individual characters. Its basic syntax is:

[address]y/inchars/outchars/

The y command maps inchars to outchars one to one: the first character of inchars becomes the first character of outchars, the second becomes the second, and so on through the last specified character. If inchars and outchars differ in length, sed reports an error

[root@server1 ~]# cat data6.txt 
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
[root@server1 ~]#  sed 'y/123/789/' data6.txt
This is line number 7.
This is line number 8.
This is line number 9.
This is line number 4.

# Every instance of a character listed in inchars is replaced with the character at the same position in outchars

# The y command is global: it converts every listed character found on a line, regardless of where it appears. Another example:
[root@server1 ~]# echo "This 1 is a test of 1 try." | sed 'y/123/456/'
This 4 is a test of 4 try.
# sed converted both instances of the character 1 on the line; you cannot restrict the conversion to occurrences at particular positions
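A common use of y is converting lowercase letters to uppercase, which tr can also do. The sketch below assumes GNU sed and GNU tr in the C locale. It also checks that mismatched inchars and outchars lengths make sed fail.

```shell
#!/usr/bin/env bash
export LC_ALL=C

upper_sed=$(echo 'hello sed' | sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')
upper_tr=$(echo 'hello sed' | tr 'a-z' 'A-Z')

# inchars (3 characters) and outchars (2 characters) differ in length: sed fails.
if echo x | sed 'y/abc/de/' 2>/dev/null; then y_status=ok; else y_status=error; fi

printf '%s\n' "$upper_sed" "$upper_tr" "$y_status"
```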

sed p (print) script command
The p command prints the lines selected by the address. Its basic syntax is:
[address]p

The most common use of p is printing the lines that contain a text pattern, for example:

[root@server1 ~]# cat data6.txt 
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
[root@server1 ~]# sed -n '/number 3/p' data6.txt
This is line number 3.
# Using -n together with p suppresses all other lines, so only the lines that contain the pattern are printed
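sed -n with /pattern/p behaves much like grep pattern. Without -n, each matching line is printed twice: once by p and once by sed's automatic output. The sketch below uses generated sample lines.

```shell
#!/usr/bin/env bash
data() { printf 'This is line number %d.\n' 1 2 3 4; }

via_sed=$(data | sed -n '/number 3/p')
via_grep=$(data | grep 'number 3')
# Without -n, line 3 appears twice: once from p, once from automatic output.
dup_count=$(data | sed '/number 3/p' | grep -c 'number 3')
# p also accepts numeric addresses.
range=$(data | sed -n '2,3p')

printf '%s\n' "$via_sed" "$dup_count" "$range"
```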

sed w (write) script command
The w command writes the addressed lines to a file. Its basic syntax is:
[address]w filename

# filename can be a relative or an absolute path; either way, the user running sed must have write permission for the file

# Write the first two lines of the data stream to a text file:

[root@server1 ~]# sed '1,2w test.txt' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
[root@server1 ~]# cat test.txt 
This is line number 1.
This is line number 2.

If you don't want the lines printed to the screen as well, add the -n option. Another example:

[root@server1 ~]# sed -n '/Browncoat/w Browncoats.txt' data11.txt
[root@server1 ~]# cat Browncoats.txt 
Blum, R       Browncoat
Bresnahan, C  Browncoat
[root@server1 ~]# cat data11.txt 
Blum, R       Browncoat
McGuiness, A  Alliance
Bresnahan, C  Browncoat
Harken, C     Alliance

# With the w script command, sed writes the lines that contain the text pattern to the target file

sed r (read) script command
The r command inserts the contents of a separate file into the data stream at the specified position. Its basic syntax is:
[address]r filename

# sed inserts the contents of filename after the line given by address

[root@server1 ~]# cat data12.txt 
This is an added line.
This is the second added line.
[root@server1 ~]#  sed '3r data12.txt' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is an added line.
This is the second added line.
This is line number 4.
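With the $ address, r appends a file after the last line. The sketch below uses two temporary files created with mktemp, with made-up contents.

```shell
#!/usr/bin/env bash
main=$(mktemp)
extra=$(mktemp)
printf '%s\n' 'line 1' 'line 2' > "$main"
printf '%s\n' 'added line' > "$extra"

# $r FILE: append FILE after the last line ($ is escaped from the shell).
at_end=$(sed "\$r $extra" "$main")
# 1r FILE: insert FILE after line 1.
after_first=$(sed "1r $extra" "$main")

printf '%s\n' "$at_end" "$after_first"
rm -f "$main" "$extra"
```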

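sed q (quit) script command
The q command makes sed quit as soon as the addressed line has been processed, without handling any later data

[root@server1 ~]# sed '1q' test.txt
This is line number 1.
# As you can see, sed stopped after printing line 1; the q command caused this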
[root@server1 ~]# cat test.txt 
This is line number 1.
This is line number 2.
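q stops reading input, so sed 3q works like head -n 3. With a pattern address, sed stops after the first matching line. The sketch below uses generated lines.

```shell
#!/usr/bin/env bash
data() { printf 'line %d\n' 1 2 3 4 5; }

first3=$(data | sed 3q)               # print lines 1-3, then quit
via_head=$(data | head -n 3)
until_match=$(data | sed '/line 2/q') # quit after the first line matching "line 2"

printf '%s\n' "$first3" "$until_match"
```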

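Specifying line ranges numerically
With numeric line addressing, you refer to lines by their position in the text stream. sed numbers the first line of the stream 1 and numbers each following line in sequence.
In a script command, the address can be a single line number, or a range written as a starting line number, a comma, and an ending line number. Here is an example of a sed command applied to a specific line number: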

[root@server1 ~]# cat data4.txt 
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
[root@server1 ~]# sed '2s/dog/cat/' data4.txt
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy cat
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog

# sed changes only the text on line 2, as the address specifies. The next example uses a line address range

[root@server1 ~]# sed '2,3s/dog/cat/' data4.txt 
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy cat
The quick brown fox jumps over the lazy cat
The quick brown fox jumps over the lazy dog

To apply a command to every line from a given line through the end of the text, use the special address $ (dollar sign):

[root@server1 ~]#  sed '2,$s/dog/cat/' data4.txt
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy cat
The quick brown fox jumps over the lazy cat
The quick brown fox jumps over the lazy cat
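The $ address can also be used on its own to select the last line. The sketch below uses generated lines and assumes GNU sed.

```shell
#!/usr/bin/env bash
data() { printf 'line %d\n' 1 2 3 4; }

last=$(data | sed -n '$p')       # only the last line, like tail -n 1
but_last=$(data | sed '$d')      # everything except the last line
from3=$(data | sed -n '3,$p')    # line 3 through the end

printf '%s\n' "$last" "$but_last" "$from3"
```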

silence-1
