Learning Shell Programming with Sanxian (Part 9)

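In the previous post, Sanxian showed how to build a better interactive shell on Linux. This post looks at sed and gawk, two of the most commonly used text-processing tools on Linux.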

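sed is a stream editor. Unlike an interactive editor such as vim, sed needs a set of editing rules supplied in advance, which it then applies to the data stream.
The sed command format is: sed options script file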

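(1) -e script adds the commands given in script to the commands run while processing the input
(2) -f file adds the commands given in file to the commands run while processing the input
(3) -n suppresses the automatic output of each line; output is produced only by the print command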

[search@h1 819]$ echo "this is a test" | sed 's/test/big test/'
this is a big test
[search@h1 819]$

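In the example above, the s command replaces the first text string between the slashes with the second one. Next, let's see how to apply a substitution to a file: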

[search@h1 819]$ cat abc.txt 


this is cat
this is a cat 
this is a big cat
this is cat
[search@h1 819]$ sed 's/cat/dog/' abc.txt 


this is dog
this is a dog 
this is a big dog
this is dog
[search@h1 819]$

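sed does not modify the original file. It only sends the modified data to STDOUT, so if you look at the original file afterwards, the original data is still there.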
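A quick way to convince yourself of this is to compare the sed output with the file afterwards. The sketch below works on a throwaway temporary file; the variable names tmp, edited and original are only for illustration.

```shell
# Create a scratch file so nothing real is touched.
tmp=$(mktemp)
printf 'this is cat\n' > "$tmp"

# sed writes the edited text to STDOUT; capture it here.
edited=$(sed 's/cat/dog/' "$tmp")

# Re-read the file: it still holds the original text.
original=$(cat "$tmp")

echo "sed output: $edited"
echo "file now:   $original"
rm -f "$tmp"
```

To keep the edited text, redirect the output to a new file instead of just viewing it.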

Now let's see how to run several sed commands in a single invocation:

[search@h1 819]$ cat abc.txt 


this is cat
this is a cat 
this is a big cat
this is cat
[search@h1 819]$ sed -e 's/ is/ are/; s/cat/dog/' abc.txt 


this are dog
this are a dog 
this are a big dog
this are dog
[search@h1 819]$

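Note the space in front of is and are. Without it, s/is/are/ would match the "is" inside "this" first, so if a substitution does not hit the word you expect, try adding a space to the pattern!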
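To see why the leading space matters, here is a small comparison on one sample line. The variable names are just for illustration.

```shell
# Without the space, the first match is the "is" inside "this".
no_space=$(echo "this is cat" | sed 's/is/are/')

# With the space, only the standalone word " is" matches.
with_space=$(echo "this is cat" | sed 's/ is/ are/')

echo "$no_space"     # thare is cat
echo "$with_space"   # this are cat
```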

We can also put the script in a file and use it from there:

[search@h1 819]$ cat abc.txt 


this is cat
this is a cat 
this is a big cat
this is cat
[search@h1 819]$ cat script 
s/cat/dog/
s/ is/ are/
[search@h1 819]$ sed -f script abc.txt 


this are dog
this are a dog 
this are a big dog
this are dog
[search@h1 819]$

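Now for gawk. sed has its limits, and gawk makes up for them nicely. gawk is the GNU version of the original Unix awk program. It takes stream editing a step further by offering a programming language rather than just editor commands. In gawk you can:
(1) define variables to hold data
(2) use arithmetic and string operators to work on data
(3) use structured programming constructs, such as if-then statements and loops, to add logic to your data processing
(4) extract data elements from a data file and rearrange them in a different order to produce a formatted report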
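The four points above can be sketched in one short program. This is only an illustration on made-up input (fruit names and counts); it assumes gawk is installed and falls back to the system awk otherwise, which is an assumption of this sketch rather than something the post requires.

```shell
# Assumption: prefer gawk, fall back to whatever awk is available.
AWK=$(command -v gawk || command -v awk)

report=$(printf 'apple 3\npear 12\nplum 7\n' | "$AWK" '
{
    total = total + $2        # (1) a variable, (2) arithmetic
    if ($2 > 5)               # (3) structured logic
        print $2 " " $1       # (4) fields printed in a new order
}
END { print "total " total }')

echo "$report"
```

Only the lines whose count is above 5 are printed, with the two fields swapped, followed by the running total.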

The gawk command format:
gawk options program file

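(1) -F fs specifies the separator that delimits the data fields in a line
(2) -f file specifies the file to read the program from
(3) -v var=value defines a variable in the gawk program and its default value
(4) -mf N specifies the maximum number of fields to process in the data file
(5) -mr N specifies the maximum number of records (lines) in the data file
(6) -W keyword specifies the compatibility mode or warning level for gawk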
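As a small illustration of the two options you are likely to use most, -v and -F, here is a sketch. The variable name and the sample input are made up, and the fallback to plain awk is an assumption for systems without gawk.

```shell
# Assumption: prefer gawk, fall back to whatever awk is available.
AWK=$(command -v gawk || command -v awk)

# -v sets a program variable before the program starts running.
greeting=$("$AWK" -v name=hadoop 'BEGIN { print "hello, " name }')

# -F sets the field separator, here a comma.
second=$(echo "a,b,c" | "$AWK" -F, '{ print $2 }')

echo "$greeting"   # hello, hadoop
echo "$second"     # b
```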

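A gawk program script is defined by a pair of curly braces, and the script commands must go between them. Because the command line treats the script as a single text string, you must also enclose it in single quotes: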

[search@h1 819]$ gawk '{ print "This is my first gawk program!" }'
a
This is my first gawk program!
a
This is my first gawk program!
a
This is my first gawk program!

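If you type this script and just press Enter, you may be disappointed: since no file was specified, gawk reads its data from the console (STDIN) by default. It prints only after you type a line and press Enter. Press Ctrl+D to signal end-of-file and exit!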

gawk's data field variables, by default:

$0 is the entire line of text
$1 is the first field in the line
$2 is the second field in the line
$n is the nth field in the line

When gawk reads a line of text, the default field separator is any whitespace character, such as a space or a tab:

[search@h1 819]$ cat abc.txt 


this is cat
this is a cat 
this is a big cat
this is cat
[search@h1 819]$ gawk  ' { print $1 }' abc.txt 


this
this
this
this
[search@h1 819]$
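The default splitting also collapses runs of whitespace and ignores leading blanks. A minimal sketch on a made-up line that mixes spaces and a tab (again assuming gawk, with a fallback to plain awk):

```shell
# Assumption: prefer gawk, fall back to whatever awk is available.
AWK=$(command -v gawk || command -v awk)

# The input has leading spaces, a tab, and repeated spaces between words.
fields=$(printf '  this \t  is   cat\n' | "$AWK" '{ print NF ": " $2 }')

echo "$fields"   # 3: is
```

NF is awk's built-in count of fields on the current line, so the output shows that the line still splits into exactly three fields.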

Here is an example that specifies the field separator:

[search@h1 819]$ gawk -F: ' { print $1}    '  /etc/passwd
root
bin
daemon
adm
lp
sync
shutdown
halt
mail
uucp
operator
games
gopher
ftp
nobody
vcsa
saslauth
postfix
sshd
mysql
search
[search@h1 819]$

Next, let's see how to use several commands in one program, separated by semicolons:

[search@h1 819]$ echo "my name is solr"  | gawk  '{  $4="hadoop"; print $0  }'
my name is hadoop
[search@h1 819]$

Next, let's see how to store a gawk program in a file and run it from there:

[search@h1 819]$ cat script2 
{ print $1  "'s directory is " $6  }
[search@h1 819]$ gawk -F: -f script2  /etc/passwd
root's directory is /root
bin's directory is /bin
daemon's directory is /sbin
adm's directory is /var/adm
lp's directory is /var/spool/lpd
sync's directory is /sbin
shutdown's directory is /sbin
halt's directory is /sbin
mail's directory is /var/spool/mail
uucp's directory is /var/spool/uucp
operator's directory is /root
games's directory is /usr/games
gopher's directory is /var/gopher
ftp's directory is /var/ftp
nobody's directory is /
vcsa's directory is /dev
saslauth's directory is /var/empty/saslauth
postfix's directory is /var/spool/postfix
sshd's directory is /var/empty/sshd
mysql's directory is /var/lib/mysql
search's directory is /home/search
[search@h1 819]$

You can also put several commands in the program file. Just place each command on its own line:

[search@h1 819]$ cat s3 
{
text = "'s directory is "
print $1 text $6


}
[search@h1 819]$ gawk  -F: -f s3  /etc/passwd
root's directory is /root
bin's directory is /bin
daemon's directory is /sbin
adm's directory is /var/adm
lp's directory is /var/spool/lpd
sync's directory is /sbin
shutdown's directory is /sbin
halt's directory is /sbin
mail's directory is /var/spool/mail
uucp's directory is /var/spool/uucp
operator's directory is /root
games's directory is /usr/games
gopher's directory is /var/gopher
ftp's directory is /var/ftp
nobody's directory is /
vcsa's directory is /dev
saslauth's directory is /var/empty/saslauth
postfix's directory is /var/spool/postfix
sshd's directory is /var/empty/sshd
mysql's directory is /var/lib/mysql
search's directory is /home/search
[search@h1 819]$

To run a command before any data is processed, use the BEGIN keyword:

[search@h1 819]$ gawk  'BEGIN { print "Hello, hadoop" }'
Hello, hadoop
[search@h1 819]$

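Because this program consists only of a BEGIN block, gawk prints the message and exits without waiting for console input. BEGIN can also be combined with a main block and an END block, which runs after all the data has been read: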

[search@h1 819]$ gawk  'BEGIN { print "Start reading:"  } {print $0 }  END { print "Done printing"} '  abc.txt 
Start reading:


this is cat
this is a cat 
this is a big cat
this is cat
Done printing
[search@h1 819]$

Each section (BEGIN, the main block, and END) is simply enclosed in its own pair of braces!

[search@h1 819]$ cat s
BEGIN {

print "Adding a column header"
print "UserID     shell "
print "-------      ------"
FS=":"
}


{

print $1 "      " $7

}

END {

print  "Done....."

}
[search@h1 819]$ gawk  -f s /etc/passwd
Adding a column header
UserID     shell 
-------      ------
root      /bin/bash
bin      /sbin/nologin
daemon      /sbin/nologin
adm      /sbin/nologin
lp      /sbin/nologin
sync      /bin/sync
shutdown      /sbin/shutdown
halt      /sbin/halt
mail      /sbin/nologin
uucp      /sbin/nologin
operator      /sbin/nologin
games      /sbin/nologin
gopher      /sbin/nologin
ftp      /sbin/nologin
nobody      /sbin/nologin
vcsa      /sbin/nologin
saslauth      /sbin/nologin
postfix      /sbin/nologin
sshd      /sbin/nologin
mysql      /bin/bash
search      /bin/bash
Done.....

As these few simple examples show, gawk is very powerful!

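Sanxian gave a brief introduction to sed above. Now let's look at the flags of the sed substitution command:
s/pattern/replacement/flags
The trailing flags can be any of the following:
a number: replace only that occurrence of the pattern on each line
g: replace every occurrence on the line (global replacement)
p: print the line on which a substitution was made
w file: write the result of the substitution to a file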

[search@h1 819]$ cat t.txt 
this name is name

this name is hadoop
[search@h1 819]$ sed 's/name/hadoop/2' t.txt 
this name is hadoop

this name is hadoop
[search@h1 819]$ sed 's/name/solr/2' t.txt       
this name is solr

this name is hadoop
[search@h1 819]$

Global replacement:

[search@h1 819]$ cat t.txt 
this name is name

this name is hadoop
[search@h1 819]$ sed 's/name/solr/g' t.txt 
this solr is solr

this solr is hadoop
[search@h1 819]$

[search@h1 819]$ cat t.txt 
this name is name

this name is hadoop
[search@h1 819]$ sed 's/name/solr/p' t.txt 
this solr is name
this solr is name

this solr is hadoop
this solr is hadoop
[search@h1 819]$ sed -n  's/name/solr/p' t.txt 
this solr is name
this solr is hadoop
[search@h1 819]$

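The -n option suppresses sed's normal output, and the p flag prints only the lines changed by the substitution, so combining the two shows just the modified lines.

The w flag saves the modified lines to a file: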

[search@h1 819]$ sed   's/name/solr/w tt' t.txt    
this solr is name

this solr is hadoop
[search@h1 819]$ cat tt 
this solr is name
this solr is hadoop
[search@h1 819]$

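Special characters in the pattern must be escaped with a backslash (\), as in most programming languages. The forward slash is the usual culprit, because it is also the delimiter, so a path has to be written as \/bin\/bash.
To avoid that, sed lets you use a different delimiter such as !, for example: sed 's!/bin/bash!/bin/csh!' /etc/passwd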
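Here is a short sketch that puts the two styles side by side. The sample passwd-style line is made up for illustration.

```shell
# A made-up /etc/passwd-style line to work on.
line='search:x:500:500::/home/search:/bin/bash'

# Default delimiter: every slash in the path needs a backslash.
escaped=$(echo "$line" | sed 's/\/bin\/bash/\/bin\/csh/')

# Alternate delimiter: the path can be written as-is.
alternate=$(echo "$line" | sed 's!/bin/bash!/bin/csh!')

echo "$escaped"
echo "$alternate"
```

Both commands produce the same line; the second is simply easier to read.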

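sed also supports more flexible ways of choosing which lines to work on.
You can use line addressing:
For example: sed '2s/dog/cat/' data1 changes only the second line

sed '2,4s/dog/cat/' data1 applies the command to a range of lines, 2 through 4
If you don't know how many lines there are, you can use
sed '2,$s/dog/cat/' data1 which applies from line 2 through the last line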
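The three address forms can be tried on a small made-up input without creating a data1 file. The variable names here are only for illustration.

```shell
# Four sample lines standing in for the data1 file.
input='dog 1
dog 2
dog 3
dog 4'

single=$(echo "$input" | sed '2s/dog/cat/')      # line 2 only
range=$(echo "$input" | sed '2,3s/dog/cat/')     # lines 2 through 3
to_end=$(echo "$input" | sed '3,$s/dog/cat/')    # line 3 to the last line

printf '%s\n---\n%s\n---\n%s\n' "$single" "$range" "$to_end"
```

Keep the script in single quotes so the shell does not try to expand the $ that stands for the last line.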

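Now for sed's delete command:

sed '3d' file deletes the third line of the file from the output; with no address at all (sed 'd' file), every line is deleted
You can also delete a range of lines: sed '2,3d' file
or delete from a given line to the end: sed '3,$d' file

You can also delete by pattern match:
sed '/number 1/d' file deletes the lines that contain number 1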
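A minimal sketch of the four delete forms on made-up input; the variable names are only for illustration.

```shell
# Four sample lines to delete from.
input='number 1
number 2
number 3
number 4'

by_line=$(echo "$input" | sed '3d')              # drops line 3
by_range=$(echo "$input" | sed '2,3d')           # drops lines 2 and 3
to_end=$(echo "$input" | sed '3,$d')             # drops line 3 onward
by_pattern=$(echo "$input" | sed '/number 1/d')  # drops matching lines

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$by_line" "$by_range" "$to_end" "$by_pattern"
```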

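Next, inserting and appending text:
The insert command (i) adds a new line before the specified line
The append command (a) adds a new line after the specified line
Insert: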

[search@h1 819]$ echo "1" | sed 'i\"one"'
"one"
1
[search@h1 819]$

Append:

[search@h1 819]$ echo "line2" | sed 'a\ line3'
line2
 line3
[search@h1 819]$

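An insert can likewise be placed before a specific line number:

sed '3i\some text' inserts the text before the third line

sed '3a\some text' appends the text after the third line

How do you append directly after the last line?

sed '$a\ some text '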
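The one-line forms above rely on GNU sed. As a hedged alternative, the sketch below uses the traditional form in which the text goes on the line after the backslash, which should also work with other sed implementations. The input and variable names are made up.

```shell
# Three sample lines.
input='line 1
line 2
line 3'

# Insert a new line before line 3.
inserted=$(echo "$input" | sed '3i\
new line')

# Append a new line after the last line.
appended=$(echo "$input" | sed '$a\
last line')

printf '%s\n---\n%s\n' "$inserted" "$appended"
```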

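Now for changing lines. The c command replaces the whole line:
sed '3c\This is the changed line'

You can also change lines by pattern match

sed '/number 3/c\ some text '

You can also change an address range: sed '2,3c\ some text '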
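One detail worth seeing in practice is what happens with a range: the whole range is replaced by a single copy of the text, not one copy per line. A small sketch on made-up input, using the portable form with the text on its own line:

```shell
# Four sample lines.
input='number 1
number 2
number 3
number 4'

# Change the line that matches a pattern.
by_pattern=$(echo "$input" | sed '/number 3/c\
replaced line')

# Change a range: lines 2 and 3 collapse into one new line.
by_range=$(echo "$input" | sed '2,3c\
replaced line')

printf '%s\n---\n%s\n' "$by_pattern" "$by_range"
```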

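Besides these, there is also the transform command y, which maps each character in the first set to the character at the same position in the second set: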

[search@h1 819]$ echo "this 1 a test of 1 try" | sed 'y/123/456/'
this 4 a test of 4 try
[search@h1 819]$
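Unlike s, the y command works character by character and always touches every occurrence on the line. A short sketch, with a second made-up input added for illustration:

```shell
# The example from above: 1 becomes 4, 2 becomes 5, 3 becomes 6.
digits=$(echo "this 1 a test of 1 try" | sed 'y/123/456/')

# Each of a, b, c is mapped to its uppercase form wherever it appears.
letters=$(echo "abc cab" | sed 'y/abc/ABC/')

echo "$digits"    # this 4 a test of 4 try
echo "$letters"   # ABC CAB
```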

In addition, sed's p command prints a line of text
The = command prints the line number
The l command lists a line

[search@h1 819]$ echo "test a test" | sed 'p'
test a test
test a test
[search@h1 819]$ echo "test a test" | sed '='
1
test a test
[search@h1 819]$

The -n option suppresses all other lines, so only the lines selected for p are shown
sed -n '/number/p' file
sed -n '2,3p' file
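These two commands can be tried on made-up input as follows. The last line of the sketch also combines -n with the = command, which then prints only the line number of the match.

```shell
# Three sample lines.
input='number 1
other line
number 3'

by_pattern=$(echo "$input" | sed -n '/number/p')   # only matching lines
by_range=$(echo "$input" | sed -n '2,3p')          # only lines 2 and 3
line_no=$(echo "$input" | sed -n '/other/=')       # line number of the match

printf '%s\n---\n%s\n---\n%s\n' "$by_pattern" "$by_range" "$line_no"
```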

Find the lines containing the digit 3 and run two commands on them: print the current line, then substitute and print again

sed -n '/3/{p; s/ is/ test/p;}' bb
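Grouping commands in braces can also be written with one command per line, which is easier to read and avoids separator problems. A sketch on two made-up lines standing in for the bb file:

```shell
# Two sample lines; only the first contains the digit 3.
input='this is 3 txt
this is 4 text'

grouped=$(echo "$input" | sed -n '/3/{
p
s/ is/ test/p
}')

echo "$grouped"
```

The matching line is printed twice, once as it was and once after the substitution, while the other line is suppressed by -n.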

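The l command lists each line, marking the end of the line with $ and showing non-printable characters as escape sequences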

[search@h1 819]$ sed -n 'l' bb
this is 3 txt$
this is 4 text$
this is a text$
this name hadoop$
[search@h1 819]$

sed -n '1,3w test' file writes lines 1-3 of file to the file test; the -n option keeps the output stream from being displayed on the console during the write
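A minimal sketch of the write command, run inside a temporary directory so it leaves nothing behind. The file contents are made up.

```shell
# Work in a scratch directory.
dir=$(mktemp -d)
printf 'l1\nl2\nl3\nl4\n' > "$dir/file"

# Write lines 1-3 to a second file; -n keeps the console quiet.
shown=$(sed -n "1,3w $dir/test" "$dir/file")
saved=$(cat "$dir/test")

echo "console output: [$shown]"
echo "$saved"
rm -rf "$dir"
```

Everything after w up to the end of the command is taken as the file name, so nothing else should follow it on the same line.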

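Likewise, sed '3r data12' data7 reads the contents of data12 and inserts them after the third line of data7

A pattern address works the same way: sed '/number 2/r data12' data7

To add the data at the end of the text:
sed '$r data12' data7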
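The read command can be sketched the same way with two scratch files named after the ones in the text. Their contents are made up.

```shell
# Work in a scratch directory.
dir=$(mktemp -d)
printf 'a\nb\nc\nd\n' > "$dir/data7"
printf 'X\nY\n' > "$dir/data12"

# Insert data12 after line 3 of data7.
after_3=$(sed "3r $dir/data12" "$dir/data7")

# Insert data12 after the last line; \$ stops the shell expanding $r.
at_end=$(sed "\$r $dir/data12" "$dir/data7")

printf '%s\n---\n%s\n' "$after_3" "$at_end"
rm -rf "$dir"
```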
