Linux：shell脚本进阶（sed/gawk）

sed命令进阶

保持空间和模式空间

Sed进阶篇实例应用

gawk命令进阶

#本文章为学习笔记

sed命令进阶

1>n命令：移动到下一行

小写n命令会告诉sed编辑器移动到数据流下一行进行处理

1.1 n命令原理

n命令简单来说就是提前读取下一行，覆盖模型空间前一行（并没有删除，因此依然打印至标准输出），如果命令未执行成功（并非跳过：前端条件不匹配），则放弃之后的任何命令，并对新读取的内容，重头执行sed。

例子：

从aaa文件中取出偶数行

cat aaa

This is 1

This is 2

This is 3

This is 4

This is 5

sed -n 'n;p' aaa //-n表示隐藏默认输出内容

This is 2

This is 4

注释：读取This is 1，执行n命令，此时模式空间为This is 2，执行p，打印模式空间内容This is 2，之后读取 This is 3，执行n命令，此时模式空间为This is 4，执行p，打印模式空间内容This is 4，之后读取This is 5，执行n 命令，因为没有了，所以退出，并放弃p命令。

因此，最终打印出来的就是偶数行。

2>N命令：合并文本行

大写N会将下一文本行添加到模式空间中已有的文本后，这样两个文本行合并到同一个模式空间中，文本行仍然用换行符分隔。如果要在文件中查找可能分散在两行中的文本，这个是很实用的功能。

2.1 N命令原理：

N命令简单来说就是追加下一行到模式空间，同时将两行看做一行，但是两行之间依然含有\n换行符，如果命令未执行成功（并非跳过：前端条件不匹配），则放弃之后任何命令，并对新读取的内容，重头执行sed。

例子：

从aaa文件中读取奇数行

cat aaa

This is 1

This is 2

This is 3

This is 4

This is 5

sed -n '$!N;P' aaa

This is 1

This is 3

This is 5

注释中1代表This is 1 2代表This is 2 以此类推

注释：读取1，$!条件满足（不是尾行），执行N命令，得出1\n2，执行P，打印得1，读取3，$!条件满足（不是尾行），执行N命令，得出3\n4，执行P，打印得3，读取5，$!条件不满足，跳过N，执行P，打印得5

2.2 配置示例

例1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.

$ sed '/first/{N;s/\n/ /}' data2.txt
this is the header line.
this is the first data line. this is the second data line.
this is the last line.

例2
$ cat data3.txt
on Tuesday,the linux system
administrator's group meeting will be held.
all system administrators should attend.
thank you for your attendance.

$ sed 'N;s/system.administrator/desktop user/' data3.txt
on Tuesday,the linux desktop user's group meeting will be held.
all desktop users should attend.
thank you for your attendance.

3>只删除前一行的命令，D命令

当有多行匹配出现时，D命令只会删除模式空间中的第一行

3.1 D命令工作原理

D命令是删除当前模式空间开端至\n的内容（不在传至标准输出），放弃之后的命令，但是对剩余模式空间重新执行sed。

D命令例子

从aaa文件中读取最后一行

cat aaa

This is 1

This is 2

This is 3

This is 4

This is 5

sed 'N;D' aaa

This is 5

注释：读取1，执行N，得出1\n2，执行D，得出2，执行N，得出2\n3，执行D，得出3，依此类推，得出5，执行N，条件失败退出，因无-n参数，故输出5

3.2 配置示例


$ cat data5.txt

this is the header line.
this is a data line.

this is the last line.

例1:
$ sed '/^$/{N;/header/D}' data5.txt
this is the header line.
this is a data line.

this is the last line.

例2:打印最后两行
sed '$!N;$!D' data5.txt

this is the last line.

4>只打印前一行，P命令

当有多行匹配出现时，P命令只会打印模式空间中的第一行

$ sed -n 'N;/system\nadministrator/P' data3.txt
on Tuesday,the linux system

5>排除命令（!）

例1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.

$ sed -n '/headeer/!p' data2.txt
this is the first data line.
this is the second data line.
this is the last line.

6>分支。用于改变数据流

格式：[address]b [:label]

address参数决定了哪些行的数据会触发分支命令，label参数定义了要跳转到的位置。

例1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.
$ sed '{2,3b;s/this is /is this/;s/line./test?/}' data2.txt
is this the header test?
this is the first data line.
this is the second data line.
is this the last test?

例2
$ sed '{/first/b jump1;s/this is the/no jump on/
>:jump1
>s/this is the/jump here on/}' data2.txt

no jump on header line
jump here on first data line
no jump on second data line
no jump on last line

例3
$ sed -n '{
> ：start
> s/,//1p
> b start
> }'

7>测试（test）命令。用来改变sed编辑器脚本的执行流程，相当于逻辑运算符里面的“或”

例1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.
$ sed '{
>s/first/matched/
>t
>s/this is the/no match on/
}' data2.txt
no match on header line
this is the matched data line
no match on second data line
no match on last line

例2
$ echo "this,is,a,test,to,remove,commas."|sed -n '{
>:start
>s/,/ /1p
>b start
}'
#当无需替换时，测试命令不会跳转而是继续执行剩下的脚本

8>替代模式/后向引用（&）

&符号用来代表替换命令中的匹配模式。当要匹配字符串的其中一部分内容时，可用圆括号定义模式中的子模式，用\1、\2等来代表调用子模式。

例1
$ echo 'the cat sleeps in his hat'|sed 's/.at/"&"/g'
the "cat" sleeps in his "hat"

例2
$ echo 'the system administrator manual'|sed '
>s/\(system\) administrator/\1 user/'
the system user manual

例3
$ echo '1234567'|sed '{
>:start
>s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
>t start
>}'
1,234,567

保持空间和模式空间

模式空间：就是sed编辑器存放待处理文本的缓冲区。sed会每次读取一行文本，并放入模式空间，并执行编辑命令。

保持空间：临时保存一些行的缓冲区。

有五条命令可用来操作保持空间：

h	将模式空间复制到保持空间（会覆盖）
H	将模式空间追加到保持空间
g	将保持空间复制到模式空间
G	将保持空间追加到模式空间
x	交换模式空间和保持空间的内容

例1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.
$ sed -n '{1!G;h;$p}' data2.txt
this is the last line.
this is the second data line.
this is the first data line.
this is the header line.

Sed进阶篇实例应用

示例文本：

cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.


实例：
例1：加倍行间距，每行后面加一行空白行；
$ sed 'G' data2.txt
this is the header line.

this is the first data line.

this is the second data line.

this is the last line.

$ sed '$!G' data2.txt      #每行后面加一空白行，最后一行不加
this is the header line.

this is the first data line.

this is the second data line.

this is the last line.
$

例2：给文件中的行编号；
$ sed '=' data2.txt | sed 'N;s/\n'/ /'
1 this is the header line.
2 this is the first data line.
3 this is the second data line.
4 this is the last line.
#如果有空白行的话，需要注意先去除空白行

例3：打印末尾10行
$ sed '{
> :start
> $q;N;11,$D
> b start
> }' data7.txt

例4：删除空白行
$ sed '/./,/^$/!d' data8.txt     #删除连续空白行；
$ sed '/./,$!d' data9.txt        #删除开头n行空白行；
$ sed '{                         #删除末尾空白行；
> :start
> /^\n*$/{$d;N;b start}
> }'

gawk命令进阶

1>内建变量

FIELDWIDTHS	由空格分隔的一系列数字，定义了每个数据字段的确切宽度
FS	输入字段分隔符，默认是空格符
RS	输入记录分隔符，每行是一条记录，默认是换行符
OFS	输出字段分隔符，默认是空格符
ORS	输出记录分隔符，每行是一条记录，默认是换行符

数据变量：

NF	字段总数
NR	已处理的行数
ENVIRON	关联数组组成的shell环境变量值，格式如下 print ENVIRON["HOME"]

例1：将逗号分隔符替换为“-”，并输出前三项
$ cat data1
data11,data12,data13,data14,data15
data21,data22,data23,data24,data25
data31,data32,data33,data34,data35

$ gawk 'BEGIN{FS=",";OFS="-"} {print $1,$2,$3}' data1
data11-data12-data13
data21-data22-data23
data31-data32-data33

例2：将每行当作一个字段，把空白行当作记录分隔符
$ cat data2
riley mullen
123 main street
chicago,il 60601
555-1234

frank williams
456 oak street
indianapolis ,in 46201
5559876

haley snell
4231 elm street
detroit,mi 48201
555-4938

$ gawk 'BEGIN{FS="\n";RS=""} {print $1,$4}' data2
riley mullen 555-1234
frank williams 555-9876
haley snell 555-4938

例3：按固定宽度分隔字段
$ cat data1b
1005.3247596.37
115-2.349194.00
05810.1298100.1

$ gawk 'BEGIN{FIELDWIDTHS="3 5 2 5"}{print $1,$2,$3,$4}' data1b
100 5.324 75 96.37
115 -2.34 91 94.00
058 10.12 98 100.1

例4：在脚本中使用变量赋值
$ gawk '
> BEGIN{
> testing="this is a test"
> print testing
> testing=46
> print testing
> }'

$ this a testing
$ 45

遍历关联数组
$ gawk 'BEGIN{
> var[a]=1
> var[g]=2
> var[u]=3
> for (test in var)
> {
> print "index:",test," "- Value:",var[test]
> }
> }'

index: u  - Value: 3
index: a  - Value: 1
index: g  - Value: 2

删除数组元素值：
delete array[index]

匹配限定
$ gawk -F: '$1 ~ /rich/{print $1,$NF}' data1
rich /bin/bash
#这个例子会在第一个数据字段中查找文本rich。如果在记录中找到了这个模式，它会打印
该记录的第一个和最后一个数据字段值；

数学表达式应用
$ gawk -F: '$4 == 0 {print $1}' /etc/passwd
root
sync
shutdown
halt
operator

结构化命令，if语句
if (condition)
statement1
也可以将它放在一行上，像这样：
if (condition) statement1

$ gawk '{if ($1 > 20) print $1}' data4
$ gawk '{
> if ($1 > 20)
> {
> x=$1 * 2
> print x
> }
> }' data4


$ gawk '{
> if ($1 > 20)
> {
> x = $1 * 2
> print x
> } else
> {
> x = $1 / 2
> print x
> }}' data4

可以在单行上使用else子句，但必须在if语句部分之后使用分号。
if (condition) statement1; else statement2


结构化命令，while语句
while (condition)
{
statements
}

$ gawk '{
> total = 0
> i = 1
> while (i < 4)
> {
> total += $i
> i++
> }
> avg = total / 3
> print "Average:",avg
> }' data5


$ gawk '{
> total = 0
> i = 1
> while (i < 4)
> {
> total += $i
> if (i == 2)
> break
> i++
> }
> avg = total / 2
> print "The average of the first two data elements is:",avg
> }' data5

结构化命令，do-while语句类似于while语句，但会在检查条件语句之前执行命令。下面是do-while语
句的格式。
do
{
statements
} while (condition)
这种格式保证了语句会在条件被求值之前至少执行一次。当需要在求值条件前执行语句时，
这个特性非常方便。
$ gawk '{
> total = 0
> i = 1
> do
> {
> total += $i
> i++
> } while (total < 150)
> print total }' data5



for语句是许多编程语言执行循环的常见方法。gawk编程语言支持C风格的for循环。
for( variable assignment; condition; iteration process)
将多个功能合并到一个语句有助于简化循环。
$ gawk '{
> total = 0
> for (i = 1; i < 4; i++)
> {
> total += $i
> }
> avg = total / 3
> print "Average:",avg
> }' data5

格式化输出，printf，和c语言用法一样
c 将一个数作为ASCII字符显示
d 显示一个整数值
i 显示一个整数值（跟d一样）
e 用科学计数法显示一个数
f 显示一个浮点值
g 用科学计数法或浮点数显示（选择较短的格式）
o 显示一个八进制值
s 显示一个文本字符串
x 显示一个十六进制值
X 显示一个十六进制值，但用大写字母A~F


注意，你需要在printf命令的末尾手动添加换行符来生成新行。没添加的话，printf命令
会继续在同一行打印后续输出。

printf默认输出是右对齐，可通过“-”来控制成左对齐：
$ gawk 'BEGIN{FS="\n"; RS=""} {printf "%-16s %s\n", $1, $4}' data2
Riley Mullen (312)555-1234
Frank Williams (317)555-9876
Haley Snell (313)555-4938
$

处理浮点数：
$ gawk '{
> total = 0
> for (i = 1; i < 4; i++)
> {
> total += $i
> }
> avg = total / 3
> printf "Average: %5.1f\n",avg
> }' data5
Average: 128.3
Average: 137.7
Average: 176.7
$

#打印test.txt的第3行至第5行
awk 'NR==3,NR==5 {print}' test.txt
#打印test.txt的第3行至第5行的第一列与最后一列
awk 'NR==3,NR==5 {print $1,$NF}' test.txt

#打印test.txt中长度大于80的行号
awk 'length($0)>80 {print NR}' test.txt

#计算test.txt中第一列的总和
cat test.txt|awk '{sum+=$1}END{print sum}'

#添加自定义字符
ifconfig eth0|grep 'Bcast'|awk '{print "ip_" $2}'


#格式化输出
awk -F: '{printf "% -12s % -6s % -8s\n",$1,$2,$NF}' /etc/passwd

注：本章内容为读书笔记，摘自《Linux命令行与shell脚本编程大全》第3版。