If you mostly work on the command line and deal with a lot of text files every day, you should know about the uniq command. It helps you find duplicate lines in files. Besides finding duplicates, it can also remove them, count how many times each line occurs, show only the duplicated lines, or show only the unique lines. Since uniq is part of the GNU coreutils package, it comes preinstalled on most Linux distributions, so there is nothing to install. Let's look at some practical examples.
Note that uniq will not remove duplicate lines unless they are adjacent. So you may need to sort the input first, typically by piping the output of the sort command into uniq.
First, let's create a file with some duplicate lines:
vi ostechnix.txt
welcome to ostechnix
welcome to ostechnix
Linus is the creator of Linux.
Linux is secure by default
Linus is the creator of Linux.
Top 500 super computers are powered by Linux
As you can see, the file contains some duplicated lines: the first and second lines are identical, and so are the third and fifth.
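If you would rather not open an editor, the same file can be created from the shell. This is only a sketch: it works in a scratch directory made with mktemp (our choice, not part of the original walkthrough) so that no existing file is overwritten.

```shell
# Work in a scratch directory so no existing file is overwritten.
cd "$(mktemp -d)" || exit 1

# Write the six sample lines to ostechnix.txt without opening an editor.
cat > ostechnix.txt <<'EOF'
welcome to ostechnix
welcome to ostechnix
Linus is the creator of Linux.
Linux is secure by default
Linus is the creator of Linux.
Top 500 super computers are powered by Linux
EOF

# Confirm that the file has six lines.
wc -l < ostechnix.txt
```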
If you use the uniq command without any options, it collapses each run of consecutive duplicate lines into a single line and prints the result.
uniq ostechnix.txt
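Sample output:
welcome to ostechnix
Linus is the creator of Linux.
Linux is secure by default
Linus is the creator of Linux.
Top 500 super computers are powered by Linux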
As you can see, the uniq command removes consecutive duplicate lines in the given file. You may also notice that the second and fourth lines of the output are still the same. This is because uniq removes duplicate lines only if they are adjacent. We can remove non-consecutive duplicates as well by sorting the file first. See the second example below.
sort ostechnix.txt | uniq
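Sample output:
Linus is the creator of Linux.
Linux is secure by default
Top 500 super computers are powered by Linux
welcome to ostechnix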
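See? There are no repeated lines. In other words, the above command prints each distinct line of ostechnix.txt exactly once. We combine the sort command with uniq because, as I mentioned, uniq will not remove duplicate lines unless they are adjacent.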
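To make the effect of sorting concrete, here is a small sketch that counts the lines each approach leaves behind. It recreates the sample file in a scratch directory so it runs on its own. The sort -u shortcut is our addition, not part of the original walkthrough. For this sample it yields the same lines as sort | uniq.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'welcome to ostechnix' \
  'welcome to ostechnix' \
  'Linus is the creator of Linux.' \
  'Linux is secure by default' \
  'Linus is the creator of Linux.' \
  'Top 500 super computers are powered by Linux' > ostechnix.txt

# uniq alone only collapses adjacent repeats, so the two
# non-adjacent "Linus" lines both survive.
plain=$(uniq ostechnix.txt | wc -l)

# Sorting first makes identical lines adjacent, so only distinct lines remain.
sorted=$(sort ostechnix.txt | uniq | wc -l)

# sort -u sorts and drops repeated lines in a single step.
short=$(sort -u ostechnix.txt | wc -l)

echo "uniq: $plain, sort | uniq: $sorted, sort -u: $short"
```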
To display only the unique lines in the file, that is, the lines that occur exactly once, run:
sort ostechnix.txt | uniq -u
Sample output:
Linux is secure by default
Top 500 super computers are powered by Linux
As you can see, only two lines in the given file are unique.
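Sorting matters for -u as well. The sketch below runs uniq -u on the sample file with and without sort. It recreates the file in a scratch directory so it can be run independently.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'welcome to ostechnix' \
  'welcome to ostechnix' \
  'Linus is the creator of Linux.' \
  'Linux is secure by default' \
  'Linus is the creator of Linux.' \
  'Top 500 super computers are powered by Linux' > ostechnix.txt

# Without sorting, the two "Linus" lines are not adjacent, so uniq -u
# treats each of them as unique and prints them both.
uniq -u ostechnix.txt

echo "---"

# After sorting, only the lines that occur exactly once are printed.
sort ostechnix.txt | uniq -u
```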
Similarly, we can display only the duplicated lines in a file, like this:
sort ostechnix.txt | uniq -d
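Sample output:
Linus is the creator of Linux.
welcome to ostechnix
These two lines are the duplicated lines in ostechnix.txt. Note that -d (lowercase d) prints only the duplicated lines, one line per group. To print all duplicate lines, use -D (uppercase D), as shown below: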
sort ostechnix.txt | uniq -D
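The difference between the two options is that -d prints one line per group of duplicates, while -D prints every line of each group. With this file, -D therefore shows each of the two duplicated lines twice.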
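The two options can be compared side by side with a short sketch. It assumes GNU uniq, since -D is not among the options POSIX requires, and it recreates the sample file in a scratch directory.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'welcome to ostechnix' \
  'welcome to ostechnix' \
  'Linus is the creator of Linux.' \
  'Linux is secure by default' \
  'Linus is the creator of Linux.' \
  'Top 500 super computers are powered by Linux' > ostechnix.txt

# -d prints one representative line per group of duplicates.
sort ostechnix.txt | uniq -d

echo "---"

# -D prints every member of each group of duplicates.
sort ostechnix.txt | uniq -D
```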
You may also want to check how many times each line occurs in a given file. To do so, use the -c option, as shown below:
sort ostechnix.txt | uniq -c
Sample output:
      2 Linus is the creator of Linux.
      1 Linux is secure by default
      1 Top 500 super computers are powered by Linux
      2 welcome to ostechnix
We can also sort the lines by their number of occurrences and then display them, as shown below:
sort ostechnix.txt | uniq -c | sort -nr
Sample output:
      2 welcome to ostechnix
      2 Linus is the creator of Linux.
      1 Top 500 super computers are powered by Linux
      1 Linux is secure by default
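Because uniq -c puts the count in the first field, its output is easy to filter further. As a sketch that goes slightly beyond the original examples, the following uses awk (our addition) to keep only the lines that occur more than once, together with their counts. The variable name repeated is just for illustration.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'welcome to ostechnix' \
  'welcome to ostechnix' \
  'Linus is the creator of Linux.' \
  'Linux is secure by default' \
  'Linus is the creator of Linux.' \
  'Top 500 super computers are powered by Linux' > ostechnix.txt

# Count every distinct line, then keep only the entries whose count
# (the first field of the uniq -c output) is greater than 1.
repeated=$(sort ostechnix.txt | uniq -c | awk '$1 > 1')
printf '%s\n' "$repeated"
```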
We can use the -w option to limit the comparison to the first N characters of each line. For example, let's compare only the first four characters of each line and display the duplicated lines:
uniq -d -w 4 ostechnix.txt
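To see what this command does with the sample file, here is a sketch that assumes GNU uniq (the -w option is a GNU extension) and recreates the file in a scratch directory. Since the file is not sorted here, only adjacent lines are compared.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'welcome to ostechnix' \
  'welcome to ostechnix' \
  'Linus is the creator of Linux.' \
  'Linux is secure by default' \
  'Linus is the creator of Linux.' \
  'Top 500 super computers are powered by Linux' > ostechnix.txt

# Compare only the first four characters of each line. The three adjacent
# lines starting with "Linu" now count as one group of duplicates, and
# -d prints the first line of each group.
uniq -d -w 4 ostechnix.txt
```

Note that "Linux is secure by default" is treated as a duplicate of the "Linus" lines here, because the comparison stops after the fourth character.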
Just as -w limits the comparison to the first N characters of each line, the -s option skips the first N characters before comparing.
The following command ignores the first four characters of each line when comparing:
uniq -d -s 4 ostechnix.txt
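The -s option is most useful when lines begin with a prefix that should not affect the comparison. The sketch below first runs the command above on the sample file, then on a hypothetical file ids.txt (invented for illustration) whose lines start with a three-digit ID and a space.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'welcome to ostechnix' \
  'welcome to ostechnix' \
  'Linus is the creator of Linux.' \
  'Linux is secure by default' \
  'Linus is the creator of Linux.' \
  'Top 500 super computers are powered by Linux' > ostechnix.txt

# Skipping four characters, only the two adjacent "welcome" lines still match.
uniq -d -s 4 ostechnix.txt

# Hypothetical file: each line begins with a 3-digit ID and a space.
printf '%s\n' '001 apple' '002 apple' '003 pear' > ids.txt

# Skip the 4-character prefix so that only the text after the ID is compared.
uniq -d -s 4 ids.txt
```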
To skip the first N fields (that is, the first N whitespace-separated columns) instead of characters, use the -f option in the above command.
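As a sketch of -f in action, consider a hypothetical log file events.txt (the name and contents are invented for illustration) in which every line starts with a timestamp field. Skipping that field lets uniq treat lines as duplicates when only the timestamp differs.

```shell
# Create a hypothetical log file in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  '10:01 disk full' \
  '10:02 disk full' \
  '10:03 network down' > events.txt

# Skip the first field (the timestamp) when comparing adjacent lines.
# The first line of each group is printed.
uniq -f 1 events.txt

echo "---"

# Combine with -c to see how many lines each group contained.
uniq -c -f 1 events.txt
```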
For more details, please refer to the help section:
uniq --help
You can also use the man command to view:
man uniq
That's all for today! I hope you now have a basic understanding of the uniq command and its purpose. If you find our guide useful, please share it on your social networks and continue to support us. More good things are coming, so stay tuned.
Address of this article: https://www.linuxprobe.com/linux-uniq-student.html