Linux text processing's three musketeers: awk

awk overview

The name awk is formed from the first letters of the last names of its three creators (Alfred Aho, Peter Weinberger and Brian Kernighan).
awk is a report generator that produces formatted text output. The awk shipped with GNU/Linux distributions is currently developed and maintained by the Free Software Foundation (FSF) and is usually called GNU AWK (gawk).

You can run "rpm -qi gawk" to see the official description of awk: "The gawk package contains the GNU version of AWK text processing utility. AWK is a programming language designed for text processing and typically used as a data extraction and reporting tool."

The key point is "AWK is a programming language...". So awk turns out to be a full programming language, which is impressive. No wonder it is regarded as the leader of the three musketeers (grep, sed and awk).

[root@rocky8 ~]# rpm -qi gawk
Name        : gawk
Version     : 4.2.1
Release     : 2.el8
Architecture: x86_64
Install Date: Sat 12 Nov 2022 05:24:52 PM CST
Group       : Unspecified
Size        : 2699078
License     : GPLv3+ and GPLv2+ and LGPLv2+ and BSD
Signature   : RSA/SHA256, Wed 19 May 2021 01:32:23 PM CST, Key ID 15af5dac6d745a60
Source RPM  : gawk-4.2.1-2.el8.src.rpm
Build Date  : Wed 19 May 2021 10:35:13 AM CST
Build Host  : ord1-prod-x86build001.svc.aws.rockylinux.org
Relocations : (not relocatable)
Packager    : [email protected]
Vendor      : Rocky
URL         : https://www.gnu.org/software/gawk/
Summary     : The GNU version of the AWK text processing utility
Description :
The gawk package contains the GNU version of AWK text processing utility.
AWK is a programming language designed for text processing and typically used as a data extraction and reporting tool.

The gawk utility can be used to do quick and easy text pattern matching, extracting or reformatting.
It is considered to be a standard Linux tool for text processing.

Basic use of awk

awk [options] 'program' var=value file...
awk [options] -f programfile var=value file...

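program format:  pattern{action statements;...}
   pattern: decides when the action statements are triggered, for example BEGIN, END, or a regular expression
   action statements: process the data; they are written inside {}. Common ones: print, printf

   The program is usually placed in single quotes and can consist of three parts (a BEGIN block, a general pattern-matching block, and an END block)

   Common options
     -F "separator"  specifies the field separator used for input; the default separator is a run of consecutive whitespace characters
     -v var=value    assigns a value to a variable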
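As a minimal sketch of the three-part structure described above, the program below runs a BEGIN block once before any input, a general block once per input line, and an END block once after the last line. The sample data and the variable name sum are made up for illustration.

```shell
# Hypothetical sample data: a name and a number on each line.
sample='a 1
b 2
c 3'

# BEGIN runs once before the input, the middle block once per line,
# END once after the last line has been read.
printf '%s\n' "$sample" | awk 'BEGIN{print "start"} {sum+=$2} END{print "total:", sum}'
```

The middle block has no pattern, so it applies to every line; it only accumulates the second field and prints nothing itself.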
      

A command such as "awk '{print "hello world"}'" reads from standard input when no file is given and prints "hello world" once for every line it reads, so you can feed it with a pipe or with standard input redirection.
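A small sketch of this behaviour, using made-up input: the main block runs once per input line, so the number of lines fed in decides how many times hello world is printed. A BEGIN block, by contrast, needs no input at all.

```shell
# The main block runs once per input line: two lines in, two lines out.
printf 'x\ny\n' | awk '{print "hello world"}'

# A BEGIN block runs before any input is read, so no input is needed.
awk 'BEGIN{print "hello world"}'
```

Redirection works the same way as the pipe, for example awk '{print "hello world"}' < somefile, where somefile stands for any text file.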

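To print a string in awk, you must enclose it in double quotes ("").
Java works the same way: whatever is inside double quotes is a string. Without the quotes, awk treats the word as a variable.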

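The difference between a quoted string and a bare word can be sketched as follows. The names hello and world are just illustrative.

```shell
# "hello" is a string literal; bare hello is an uninitialized variable,
# which prints as an empty string between the brackets.
awk 'BEGIN{print "hello"; print "[" hello "]"}'

# Once the variable is given a value with -v, the bare word prints that value.
awk -v hello=world 'BEGIN{print hello}'
```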

[root@rocky8 ~]# awk --help
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:		GNU long options: (standard)
	-f progfile		--file=progfile
	-F fs			--field-separator=fs
	-v var=val		--assign=var=val
Short options:		GNU long options: (extensions)
	-b			--characters-as-bytes
	-c			--traditional
	-C			--copyright
	-d[file]		--dump-variables[=file]
	-D[file]		--debug[=file]
	-e 'program-text'	--source='program-text'
	-E file			--exec=file
	-g			--gen-pot
	-h			--help
	-i includefile		--include=includefile
	-l library		--load=library
	-L[fatal|invalid]	--lint[=fatal|invalid]
	-M			--bignum
	-N			--use-lc-numeric
	-n			--non-decimal-data
	-o[file]		--pretty-print[=file]
	-O			--optimize
	-p[file]		--profile[=file]
	-P			--posix
	-r			--re-interval
	-s			--no-optimize
	-S			--sandbox
	-t			--lint-old
	-V			--version

awk's default field delimiter

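When checking disk utilization, the df command prints 6 formatted columns of information, and you can see that the columns are separated by spaces.
awk automatically recognizes columns separated by spaces or tabs.
In awk, $1, $2, $3 ... $n refer to the first, second, third ... nth column.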

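As an illustration, here is a sketch that picks two columns out of df-style output. The sample lines are invented to mimic the shape of df output; on a real system you would pipe df itself into awk.

```shell
# Made-up lines in the shape of df output (six whitespace-separated columns).
sample='Filesystem 1K-blocks Used Available Use% Mounted
/dev/sda1 1038336 200000 838336 20% /boot
tmpfs 921932 0 921932 0% /dev/shm'

# $1 is the first column and $5 the fifth; $0 would be the whole line.
printf '%s\n' "$sample" | awk '{print $1,$5}'
```

The real-world equivalent would be df | awk '{print $1,$5}'.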

Extract the user name and UID from /etc/passwd

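One way to do this, sketched on two made-up lines in /etc/passwd format so the result is predictable, is to split on ":" with -F and print fields 1 and 3.

```shell
# Two made-up lines in /etc/passwd format (seven colon-separated fields).
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# Field 1 is the user name and field 3 is the UID.
printf '%s\n' "$sample" | awk -F: '{print $1,$3}'
```

Against the real file the command would be awk -F: '{print $1,$3}' /etc/passwd.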

Find the IPs that appear most frequently

Use awk to filter

awk '{print $1}' /data/nginx.access.log-20230721 |sort |uniq -c|sort -nr|head -3

Filter using cut

cut -d" " -f1  /data/nginx.access.log-20230721 |sort |uniq -c|sort -nr|head -3


Get the partition utilization figures

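A possible approach, sketched on invented df-style lines, is to give -F a regular expression so that both runs of spaces and the % sign count as separators. The fifth field is then the bare number.

```shell
# Made-up lines in the shape of df output.
sample='Filesystem 1K-blocks Used Available Use% Mounted
/dev/sda1 1038336 200000 838336 20% /boot
tmpfs 921932 0 921932 0% /dev/shm'

# -F takes a regular expression: any run of spaces and/or % separates fields,
# so "20% " is swallowed as a separator and $5 becomes 20.
printf '%s\n' "$sample" | awk -F'[ %]+' '{print $5}'
```

The header line is still processed here, so the first line of output is the word Use rather than a number.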

Extract the IP address from the network interface information

ifconfig ens160 | sed -n '2p' | awk '{print $2}'


awk common built-in variables

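Variables in awk fall into two groups: built-in variables and user-defined variables

The built-in variables are documented in the man page:
   press "/" to start searching for a keyword,
   press "n" (lowercase) to jump to the next match,
   press "N" (uppercase) to jump to the previous match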
[root@rocky8 ~]# man awk


FS

 FS    The input field separator, a space by default.  See Fields, above.

 FS: the input field separator, whitespace by default; it does the same job as -F
 
	-F fs			--field-separator=fs

Print the first field of each line in the /etc/passwd file


awk -v FS=":" '{print $1}' /etc/passwd


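Although "-F" achieves the same effect as "FS", the two differ in kind:
"-F" is only a command-line option (it works by setting FS), whereas "FS" is a variable, and a variable can be referenced anywhere in the program.

Most of the content in /etc/passwd is separated by ":", so when printing, the variable set with FS=":" can be reused as the separator in the output.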

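Reusing FS in the output can be sketched like this, again on made-up passwd-style lines. Placing FS between two fields without a comma concatenates the three pieces.

```shell
# Two made-up lines in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# FS holds ":" and is referenced inside the program as the output separator.
printf '%s\n' "$sample" | awk -v FS=":" '{print $1 FS $3}'
```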

OFS

OFS         The output field separator, a space by default.
OFS: the output field separator; a single space by default.

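A short sketch of OFS on made-up passwd-style lines: the comma in print is what gets replaced by OFS, and the separator "--" is an arbitrary choice for illustration.

```shell
# Two made-up lines in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# Input is split on ":"; the comma between $1 and $3 is printed as OFS.
printf '%s\n' "$sample" | awk -v FS=":" -v OFS="--" '{print $1,$3}'
```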

RS

RS          The input record separator, by default a newline.
RS: the input record separator, that is, the character that ends one record (line) on input.

Use the semicolon ";" as the record separator, so that each semicolon-delimited piece of the input is printed on its own line.

awk -v RS=";" '{print}' test.txt

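Since the contents of test.txt are not shown, here is a sketch with made-up input in its place. The input is a single line, and RS=";" makes awk treat it as three records.

```shell
# printf adds no trailing newline here, so the last record is exactly "c,3".
printf 'a,1;b,2;c,3' | awk -v RS=";" '{print}'
```

If the input ended with a newline, that newline would become part of the last record, because newline is no longer the record separator.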

ORS

ORS         The output record separator, by default a newline.
ORS: the output record separator; on output, the specified string is written in place of the newline.

awk -v RS=";" -v FS="," -v ORS="->" '{print $1}' test.txt

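The same command can be tried on made-up input standing in for test.txt, whose contents are not shown. Records are split on ";", fields on ",", and every printed record is followed by "->" instead of a newline.

```shell
# Three records, two fields each; only the first field of each is printed.
printf 'a,1;b,2;c,3' | awk -v RS=";" -v FS="," -v ORS="->" '{print $1}'
```

Note that ORS is written after every record, including the last one, so the output ends with "->" and has no final newline.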

NF

 NF          The number of fields in the current input record.
 NF: the number of fields in the current input record.

Each line of /etc/passwd has 7 fields separated by ":", so NF is 7 and "$NF" is the value of the last field.

awk -F: '{print $NF}' /etc/passwd


Get connection status information

netstat -nta | awk '{print $NF}'


Get the penultimate field of the /etc/passwd file

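One way to reach the penultimate field, sketched on made-up passwd-style lines, is to do arithmetic on NF inside parentheses.

```shell
# Two made-up lines in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# $(NF-1) is the field whose number is NF-1, i.e. the second-to-last field.
printf '%s\n' "$sample" | awk -F: '{print $(NF-1)}'
```

The parentheses matter: $NF-1 without them means "the value of the last field, minus 1", which is a different thing.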

NR

 NR          The total number of input records seen so far.
 NR: the number of the current input record, i.e. the line number.

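A brief sketch of NR on made-up input: it can be printed as a line number, and it can also be compared against a number to pick out a single line.

```shell
# Prefix every line with its record number.
printf 'alpha\nbeta\ngamma\n' | awk '{print NR, $0}'

# Use NR as a condition: print only the second line.
printf 'alpha\nbeta\ngamma\n' | awk 'NR==2{print}'
```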

FNR

FNR         The input record number in the current input file.
FNR: the record number within the current file; the count restarts for each file.

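The difference between NR and FNR only shows up with more than one input file. The sketch below creates two throwaway files (the names a.txt and b.txt are arbitrary) and prints both counters side by side.

```shell
# Create two small temporary files to read from.
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/a.txt"
printf 'b1\nb2\n' > "$dir/b.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
awk '{print NR, FNR, $0}' "$dir/a.txt" "$dir/b.txt"

# Clean up the temporary files.
rm -r "$dir"
```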

ARGC and ARGV

ARGC is the number of command-line arguments, and ARGV is the array that holds the arguments themselves
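A minimal sketch of both variables follows. The file names a.txt and b.txt are hypothetical and never opened, because the program consists of a BEGIN block only.

```shell
# ARGC counts the command name plus the two file arguments (the program
# text itself is not counted); ARGV[1] and ARGV[2] are the file arguments.
awk 'BEGIN{print ARGC; print ARGV[1]; print ARGV[2]}' a.txt b.txt
```

ARGV[0] holds the command name, which is why ARGC here is 3 rather than 2.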

pattern

PATTERN: filters lines by the pattern condition; only the matching lines are then processed

1. If not specified: empty pattern, matches every line

     For example: awk -F: '{print $1,$3}' /etc/passwd


2. /regular expression/: processes only the lines that match the pattern; the expression must be enclosed in //

     For example: awk '/UUID/{print $1}' /etc/fstab
                  awk '!/UUID/{print $1}' /etc/fstab    ("!" negates the match)


3. relational expression: the line is processed only if the result of the expression is "true"

 True: the result is a non-zero value or a non-empty string
 False: the result is 0 or an empty string

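Relational expressions as patterns might look like the sketch below, run on three made-up passwd-style lines (the user alice is invented). It shows a numeric comparison, a string comparison, and a bare field used as a true/false value.

```shell
# Three made-up lines in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash'

# Numeric comparison: only lines whose UID (field 3) is at least 1000.
printf '%s\n' "$sample" | awk -F: '$3>=1000{print $1,$3}'

# String comparison: only lines whose last field equals /bin/bash.
printf '%s\n' "$sample" | awk -F: '$NF=="/bin/bash"{print $1}'

# A bare field as the pattern: UID 0 counts as false, so root is skipped.
printf '%s\n' "$sample" | awk -F: '$3{print $1}'
```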

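4. line ranges

Line numbers cannot be written directly as a range, but they can be specified indirectly through the variable NR
/part1/,/part2/   numbers cannot be given directly in this form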

Print the lines in the /etc/passwd file from a line starting with b through a line starting with h
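Both ways of selecting a range can be sketched on made-up input: a numeric range expressed through NR, and a /part1/,/part2/ range that starts at the first line matching the first expression and ends at the next line matching the second.

```shell
# Five made-up lines.
sample='alpha
bravo
charlie
hotel
india'

# Lines 2 through 3, selected indirectly through NR.
printf '%s\n' "$sample" | awk 'NR>=2 && NR<=3{print NR, $0}'

# From the first line starting with b through the next line starting with h.
printf '%s\n' "$sample" | awk '/^b/,/^h/{print}'
```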

Conditional statements: if-else

[root@rocky8 ~]# awk 'BEGIN{score=88;if(score>=80){print "good"}else if(score>=60){print "pass"}else{print "no pass"}}'
good
[root@rocky8 ~]# awk 'BEGIN{score=78;if(score>=80){print "good"}else if(score>=60){print "pass"}else{print "no pass"}}'
pass
[root@rocky8 ~]# awk 'BEGIN{score=58;if(score>=80){print "good"}else if(score>=60){print "pass"}else{print "no pass"}}'
no pass
[root@rocky8 ~]# 


[root@rocky8 ~]# cat score.txt 
name       score
zhangsan   100
lisi       90
zhaoliu    70
wangwu     55
[root@rocky8 ~]# awk 'NR!=1{score=$2;if(score>=80){print $1,"good"}else if(score>=60){print $1, "pass"}else{print $1, "no pass"}}' score.txt
zhangsan good
lisi good
zhaoliu pass
wangwu no pass
[root@rocky8 ~]# 

while loop

[root@rocky8 ~]# awk 'BEGIN{i=1;sum=0;while(i<=100){sum+=i;i++};print sum}'
5050
[root@rocky8 ~]# sum=0;for((i=1;i<=100;i++));do let sum+=i;done;echo $sum
5050
[root@rocky8 ~]# 


Origin blog.csdn.net/flytalei/article/details/131852439