Linux shell command notes: how to strip the numbers from literature citations [regular expressions]

A note up front: I know the list could be exported as a table, but it was already in this form when I received it, and looking up several hundred papers again is not realistic.
That is, convert
[1]A. Ben Hameda,S. Elosta,J. Havel. Optimization of the capillary zone electrophoresis method for Huperzine A determination using experimental design and artificial neural networks[J]. Elsevier B.V.,2004,1084(1).
in bulk into
A. Ben Hameda,S. Elosta,J. Havel. Optimization of the capillary zone electrophoresis method for Huperzine A determination using experimental design and artificial neural networks[J]. Elsevier B.V.,2004,1084(1).

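First, find the markers [1] … [999]. The following commands are useful:

grep: only prints the matching lines; it cannot edit them

grep -E '\[[0-9]{1,}]' x.txt
Finds "[at least one digit]". The opening [ must be escaped; a ] outside a bracket expression is already literal.
grep -E '\[[0-9]{1,3}]' x.txt
Finds "[one to three digits]"
grep '\[[0-9]*]' x.txt
Finds "[any number of digits]"; this is the simplest, though it also matches an empty "[]"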
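A quick way to check such a pattern is to run it on a small sample first. The sketch below assumes bash and a POSIX grep; the sample lines and the variable names are made up for illustration and are not part of the original notes.

```shell
#!/bin/bash
# Hypothetical sample data written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
    '[1]A. Author. First title[J]. Journal,2004,1(1).' \
    '[27]B. Author. Second title[J]. Journal,2010,2(3).' \
    'A line without a leading number[J].' > "$sample"

# -E enables extended regex, so {1,3} needs no backslashes.
# ^ anchors the match to the start of the line, so "[J]" is never considered.
# -c prints the number of matching lines instead of the lines themselves.
numbered=$(grep -cE '^\[[0-9]{1,3}]' "$sample")
echo "$numbered"

rm -f "$sample"
```

Anchoring with ^ is a design choice here: without it, a pattern such as '\[[0-9]*]' could also hit bracketed numbers elsewhere in a citation.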

cut: selects fields and prints them

cut -d ']' -f 2,3 x.txt >> y.txt
Splits each line on ] and appends fields 2 and 3 (the text after the number) to y.txt
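Since cut splits on every ], it depends on how many brackets a line contains. As an alternative, a single sed substitution can remove only the leading number. This is a minimal sketch, assuming each citation starts with "[digits]"; the function name strip_numbers is hypothetical and not from the original notes.

```shell
#!/bin/bash
# strip_numbers is a hypothetical helper name chosen for this example.
# It deletes a leading "[digits]" marker and any whitespace right after it.
# With no file arguments, sed reads standard input.
strip_numbers() {
    sed -E 's/^\[[0-9]+][[:space:]]*//' "$@"
}

cleaned=$(printf '%s\n' '[1]A. Author. Some title[J]. Journal,2004,1(1).' | strip_numbers)
echo "$cleaned"
```

With a file it could be called as strip_numbers list.txt > rmnum.txt; the later "[J]" marker is left untouched because the pattern is anchored to the start of the line.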

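The following script extracts all titles from list.txt into title.txt, and writes the citations with their numbers removed to rmnum.txt

#!/bin/bash
# Extract all titles from list.txt
if [ -f "./list.txt" ]; then
# Drop the leading "[n" field and keep the rest of the citation
cut -d ']' -f 2,3 list.txt >> rmnum.txt
fi
if [ -f "./rmnum.txt" ]; then
# Drop the text before the first "." (assumed to be the author list)
cut -d "." -f 2,3,4,5,6 rmnum.txt > title_mid.txt
fi
if [ -f "./title_mid.txt" ]; then
# Cut at the "[J]" marker, leaving only the title
cut -d "[" -f 1 title_mid.txt > title.txt
echo "GOOD JOB!"
fi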

Revised code:
The following script extracts all titles from $1 into $2, and writes the citations with their numbers removed to rmnum.txt

#!/bin/bash
# Extract all titles; $1 is the input file, $2 is the output file
if [ -f "./$1" ]; then
cut -d ']' -f 2,3 "$1" > rmnum.txt
fi
if [ -f "./rmnum.txt" ]; then
cut -d "." -f 2,3,4,5,6 rmnum.txt > title_mid.txt
fi
if [ -f "./title_mid.txt" ]; then
cut -d "[" -f 1 title_mid.txt > "$2"
echo "GOOD JOB!"
fi

Remove leading whitespace from each line
sed 's/^[ \t]*//' x.txt >> y.txt

Remove trailing periods from each line
sed 's/[.]*$//' x.txt >> y.txt
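The two clean-up steps can also run in a single sed call. The sketch below is one way to do it, assuming a sed that supports -E; it uses the POSIX class [[:space:]] rather than \t inside brackets, since \t there is a GNU extension. The sample string is invented.

```shell
#!/bin/bash
# The input line starts with two spaces and a tab and ends with three periods.
# First expression: delete leading whitespace.
# Second expression: delete one or more periods at the end of the line.
trimmed=$(printf '  \tSome title here...\n' | sed -E 's/^[[:space:]]+//; s/\.+$//')
echo "$trimmed"
```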

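Convert uppercase to lowercase
tr "[:upper:]" "[:lower:]" < x.txt > y.txt

Merge duplicates
sort ./x1.txt ./x2.txt | uniq > y.txt
Plain uniq keeps one copy of each line; uniq -u would instead discard every line that occurs more than once.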
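The difference between the uniq variants is easy to mix up, so here is a small comparison on made-up input. The variable names are only for illustration.

```shell
#!/bin/bash
# Four sample lines; "b" occurs twice.
lines=$(printf '%s\n' b a b c)

# uniq only collapses adjacent duplicates, hence the sort before each call.
merged=$(printf '%s\n' "$lines" | sort | uniq)       # one copy of every line
only_once=$(printf '%s\n' "$lines" | sort | uniq -u) # lines that occur exactly once
repeated=$(printf '%s\n' "$lines" | sort | uniq -d)  # one copy of each repeated line

printf 'merged:\n%s\nonly once:\n%s\nrepeated:\n%s\n' "$merged" "$only_once" "$repeated"
```

Concatenating the uniq -u and uniq -d outputs gives the same set of lines as plain uniq, only in a different order.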

——————————————————
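I gave up on the approach above, because many author names contain ".", which makes it very hard to separate the names from the title.
So the names and the article title are kept together instead: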

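#!/bin/bash
# Build a deduplicated "year<TAB>authors and title" list from the file in $1

# Convert everything to lowercase
tr "[:upper:]" "[:lower:]" < "$1" > "$1.x"
if [ -f "$1.x" ]; then
# Strip the leading "[n]" number
cut -d ']' -f 2,3 "$1.x" > 1.x
# Strip leading whitespace, then trailing periods
sed 's/^[ \t]*//' 1.x > 2.x
sed 's/[.]*$//' 2.x > 3.x

# Delete blank lines
sed -i '/^\s*$/d' 3.x

# Year: the first year-like number after the "[j]" marker.
# awk prints exactly one line per input line, so paste stays aligned
# (grep -o would print every match, e.g. both 2004 and 1084 for the sample citation).
cut -d "[" -f 2 3.x | awk '{ if (match($0, /[12][890][0-9][0-9]/)) print substr($0, RSTART, RLENGTH); else print "" }' > 4.x
# Authors and title: everything before the "[j]" marker
cut -d "[" -f 1 3.x > 5.x
paste 4.x 5.x > 6.x

# Merge 6.x into the cumulative file "all"; each uniq -u / uniq -d pair keeps one copy of every line
sort 6.x | uniq -u >> all
sort 6.x | uniq -d >> all
sort all | uniq -u > allnew
sort all | uniq -d >> allnew
sort allnew | uniq -u > all
sort allnew | uniq -d >> all
echo "success"
fi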
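The repeated uniq -u / uniq -d rounds at the end of the script keep one copy of every line in the cumulative file. A shorter way to get the same set of lines is sort -u. The following is a sketch under assumptions: it works in a temporary directory with invented entries, the file names all and 6.x mirror the script, and the resulting order is plain sorted order rather than the order the original rounds produce.

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Hypothetical existing cumulative file and a new batch with one overlapping entry.
printf '%s\t%s\n' 2004 'title one' 2010 'title two' > "$workdir/all"
printf '%s\t%s\n' 2010 'title two' 1999 'title three' > "$workdir/6.x"

# Append the new batch, then sort and drop duplicate lines in place.
# sort reads all of its input before writing, so -o may name the input file.
cat "$workdir/6.x" >> "$workdir/all"
sort -u -o "$workdir/all" "$workdir/all"

count=$(wc -l < "$workdir/all" | tr -d ' ')
first=$(head -n 1 "$workdir/all" | cut -f 1)
echo "$count"

rm -rf "$workdir"
```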
