Linux shell command notes: stripping the citation numbers from references [regular expressions]


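Preface: I know the references could have been exported as a table, but they were already in this form when I got them, and going back to look up several hundred papers again isn't realistic, OK.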
In other words, batch-convert
[1]A. Ben Hameda,S. Elosta,J. Havel. Optimization of the capillary zone electrophoresis method for Huperzine A determination using experimental design and artificial neural networks[J]. Elsevier B.V.,2004,1084(1).
into
A. Ben Hameda,S. Elosta,J. Havel. Optimization of the capillary zone electrophoresis method for Huperzine A determination using experimental design and artificial neural networks[J]. Elsevier B.V.,2004,1084(1).
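The number only ever appears at the start of a line, so one anchored sed substitution can also do the whole conversion in a single step. This sketch uses a different technique from the grep/cut route that follows. It assumes a sed that supports `-E` (GNU or BSD sed) and one reference per line. The file names refs.txt and refs_nonum.txt are hypothetical, and the second input line is made-up sample data.

```shell
#!/bin/bash
# Work in a throwaway directory so the sample files don't clobber anything
cd "$(mktemp -d)" || exit 1

# Hypothetical refs.txt: the example line plus one made-up line
cat > refs.txt <<'EOF'
[1]A. Ben Hameda,S. Elosta,J. Havel. Optimization of the capillary zone electrophoresis method for Huperzine A determination using experimental design and artificial neural networks[J]. Elsevier B.V.,2004,1084(1).
[23] B. Author. Another title[J]. Some Journal,2010,5(2).
EOF

# ^ anchors at the start of the line, so only the leading "[digits]" (and any
# blanks right after it) is deleted; the "[J]" later in the line is untouched
sed -E 's/^\[[0-9]+][[:blank:]]*//' refs.txt > refs_nonum.txt
cat refs_nonum.txt
```

The `]` after `[0-9]+` is left unescaped on purpose. Outside a bracket expression it is already a literal character.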

First, find [1]…[999]. The following commands can do this:

grep: only prints the matching lines (it does not change the file)

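  1. grep -E '\[[0-9]{1,}]' x.txt
    Finds "[at least one digit]" (-E enables extended regex so {1,} is a repeat count; \[ is a literal "[")
  2. grep -E '\[[0-9]{1,3}]' x.txt
    Finds "[one to three digits]"
  3. grep '\[[0-9]*]' x.txt
    Finds "[any digits]". This is the simplest, but because * also allows zero digits it matches "[]" as well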
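Here is a small sketch, on made-up data, to check which lines these patterns pick up; x.txt is hypothetical. It illustrates two points about grep's default basic regex (BRE):

- `{m,n}` is literal unless written `\{m,n\}`, so the interval forms need either `-E` or backslashed braces.
- The opening `[` must be escaped to mean a literal bracket.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Hypothetical x.txt: two numbered references and one without a number
printf '%s\n' '[1]A. Author. Title one[J]. Journal,2004,1(1).' \
              'No number here[J]. Journal,2005.' \
              '[128] B. Author. Title two[J]. Journal,2010,5(2).' > x.txt

# ERE (-E): \[ is a literal "[", {1,3} means one to three digits, and a "]"
# outside a bracket expression is already literal
grep -E '\[[0-9]{1,3}]' x.txt

# The same search in basic regex (no -E): the interval is written \{1,3\}
grep '\[[0-9]\{1,3\}]' x.txt

# -n prefixes each match with its line number; ^ only accepts numbers at line start
grep -n -E '^\[[0-9]+]' x.txt
```

The `[J]` in the second line is never matched, because the bracket must contain digits only.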

cut: select fields and print them

  1. cut -d ']' -f 2,3 x.txt >> y.txt
    Splits each line on "]" and appends fields 2 and 3 (the text after the number) to y.txt
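`-f 2,3` is enough for the example line, which has only one `]` after the number. A short sketch on hypothetical data shows where it falls short. A reference with two type tags has more fields, and the open-ended range `-f 2-` keeps them all.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Hypothetical x.txt: a line shaped like the example, and one with an extra "]"
printf '%s\n' '[1]A. Author. Some title[J]. Publisher,2004,1084(1).' \
              '[2]C. Author. Chapter title[M]//Book title[C]. 2001.' > x.txt

# Line 2 split on "]": "[2" | "C. Author. Chapter title[M" | "//Book title[C" | ". 2001."
# -f 2,3 prints only fields 2 and 3, re-joined with "]", so the tail is lost
cut -d ']' -f 2,3 x.txt
# -f 2- prints field 2 and every field after it, so nothing is cut off
cut -d ']' -f 2- x.txt
```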

The following script extracts all titles from list.txt into title.txt, and writes the references with their numbers removed to rmnum.txt.

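#!/bin/bash
# Extract all titles from list.txt
if [ -e "./list.txt" ]; then
# drop the "[n]" number (> rather than >> so a rerun doesn't duplicate lines)
cut -d ']' -f 2,3 list.txt > rmnum.txt
fi
if [ -e "./rmnum.txt" ]; then
# drop everything before the first "." (meant to be the author list)
cut -d "." -f 2,3,4,5,6 rmnum.txt > title_mid.txt
fi
if [ -e "./title_mid.txt" ]; then
# keep only the text before "[J]" etc.
cut -d "[" -f 1 title_mid.txt > title.txt
# print is not a bash builtin; echo is
echo "GOOD JOB!"
fi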

Revised code:
The following script extracts all titles from $1 into $2, and writes the references with their numbers removed to rmnum.txt.

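#!/bin/bash
# Extract all titles. Usage: ./script.sh input.txt output.txt (script name is arbitrary)
# "$1" instead of "./$1" so absolute paths work too; quotes protect spaces in names
if [ -e "$1" ]; then
cut -d ']' -f 2,3 "$1" > rmnum.txt
fi
if [ -e "./rmnum.txt" ]; then
cut -d "." -f 2,3,4,5,6 rmnum.txt > title_mid.txt
fi
if [ -e "./title_mid.txt" ]; then
cut -d "[" -f 1 title_mid.txt > "$2"
echo "GOOD JOB!"
fi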
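To see what each of the three `cut` steps keeps, the sketch below traces them on the example reference from the top of this note. It runs the same commands on a shell variable instead of files. The variable names `ref`, `step1`, `step2` and `title` are made up for illustration.

```shell
#!/bin/bash
# The example reference from the top of this note
ref='[1]A. Ben Hameda,S. Elosta,J. Havel. Optimization of the capillary zone electrophoresis method for Huperzine A determination using experimental design and artificial neural networks[J]. Elsevier B.V.,2004,1084(1).'

step1=$(printf '%s\n' "$ref" | cut -d ']' -f 2,3)          # remove the "[1" field
step2=$(printf '%s\n' "$step1" | cut -d '.' -f 2,3,4,5,6)   # remove text before the first "."
title=$(printf '%s\n' "$step2" | cut -d '[' -f 1)           # remove "[J]" and what follows

printf 'step1: %s\n' "$step1"
printf 'step2: %s\n' "$step2"
printf 'title: %s\n' "$title"
```

The "title" still starts with most of the author list. The first "." in the line belongs to the initial "A.", so `-f 2,3,4,5,6` only drops that "A". This is the periods-in-names problem that led the author to abandon this approach.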

Remove leading whitespace (spaces and tabs) from each line
cat x.txt | sed 's/^[ \t]*//g' >>y.txt
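This sketch does the same cleanup with the POSIX class `[[:blank:]]` (space or tab) instead of `[ \t]`. The `\t` form inside brackets relies on GNU sed, while `[[:blank:]]` should work in any POSIX sed. The `g` flag can also go, since a `^`-anchored pattern matches at most once per line. x.txt and y.txt are hypothetical file names.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Hypothetical x.txt: a line indented with two spaces and a tab
printf '  \tA. Author. Title[J].\n' > x.txt

# Delete any run of spaces/tabs at the start of each line
sed 's/^[[:blank:]]*//' x.txt > y.txt
cat y.txt
```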

Remove trailing periods at the end of each line
cat x.txt | sed 's/[.]*$//g' >>y.txt
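In `[.]*$`, the bracketed `.` is a literal dot, not "any character". The pattern removes a run of periods only when they are the very last characters on the line. If a line ends in a period followed by spaces, the period survives. The sketch below runs the original pattern and a variant on hypothetical data. The variant, which also strips blanks, is an extension and not part of the original command.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Hypothetical x.txt: a trailing dot, a trailing "...", and a dot followed by a space
printf '%s\n' 'Elsevier B.V.,2004,1084(1).' 'Title ends with dots...' 'Trailing space. ' > x.txt

# Original pattern: strips periods only if they are the last characters
sed 's/[.]*$//' x.txt
# Variant: strips any mix of spaces, tabs and periods at the end of the line
sed 's/[[:blank:].]*$//' x.txt
```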

Convert uppercase to lowercase
tr "[:upper:]" "[:lower:]" < x.txt > y.txt

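Combine two lists and keep only the lines that are not duplicated (uniq -u prints only lines that occur exactly once, so entries present in both files are dropped entirely)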
sort ./x1.txt ./x2.txt | uniq -u > y.txt
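The difference between `uniq -u`, `uniq -d` and `sort -u` is easiest to see on two tiny lists; x1.txt and x2.txt here are hypothetical. `uniq` only compares adjacent lines, which is why the input is sorted first.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'paper A' 'paper B' > x1.txt
printf '%s\n' 'paper B' 'paper C' > x2.txt

echo "uniq -u (lines found in only one list):"
sort ./x1.txt ./x2.txt | uniq -u
echo "uniq -d (lines found in both lists, one copy each):"
sort ./x1.txt ./x2.txt | uniq -d
echo "sort -u (merged list, one copy of every line):"
sort -u ./x1.txt ./x2.txt
```

To merge two lists while keeping a single copy of each shared entry, `sort -u` (or `sort | uniq` with no flag) is the one to use.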

——————————————————
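I gave up on the approach above: many author names contain ".", which made it very hard to separate the names from the titles.
So instead I keep the author names and the title together: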

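#!/bin/bash
# Extract all entries (authors and title kept together), each prefixed with its year
# Usage: ./script.sh input.txt   (script name is arbitrary; results accumulate in "all")

if [ -e "$1" ]; then
# lowercase everything so that case differences don't hide duplicates
tr "[:upper:]" "[:lower:]" < "$1" > "$1.x"
# drop the "[n]" number
cut -d ']' -f 2,3 "$1.x" > 1.x
# strip leading spaces/tabs (\t inside brackets needs GNU sed), then trailing periods
sed 's/^[ \t]*//' 1.x > 2.x
sed 's/[.]*$//' 2.x > 3.x
# delete blank lines in place (GNU sed -i)
sed -i '/^\s*$/d' 3.x

# year: first 4-digit match after "[". awk prints exactly one line per input
# line, so 4.x stays aligned with 5.x (grep -o would also print e.g. 1084)
cut -d "[" -f 2 3.x | awk '{ if (match($0, /[12][890][0-9][0-9]/)) print substr($0, RSTART, RLENGTH); else print "" }' > 4.x
# authors + title: everything before the first "["
cut -d "[" -f 1 3.x > 5.x
paste 4.x 5.x > 6.x

# merge 6.x into "all" (kept between runs): sorted, one copy of every line
touch all
sort -u 6.x all -o all
echo "success"
fi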
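The year step must produce exactly one output line per input line, because `paste` pairs 4.x and 5.x line by line. On the example reference, the part after "[" is `j]. elsevier b.v.,2004,1084(1)`, and `[12][890][0-9][0-9]` matches both 2004 and the volume number 1084. The sketch below compares two commands on hypothetical data (sample.txt is a made-up file name):

- `grep -o` prints every match on its own line.
- An awk `match()` call keeps only the leftmost match and prints an empty line when there is none.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Hypothetical sample.txt: text after "[" for three references, the middle one without a year
printf '%s\n' 'j]. elsevier b.v.,2004,1084(1)' 'j]. no year here' 'j]. some journal,2010,5(2)' > sample.txt

# grep -o: 2004, 1084, 2010. Three lines, but the second one belongs to the first reference
grep -o '[12][890][0-9][0-9]' sample.txt

# awk: match() sets RSTART/RLENGTH for the leftmost match; print "" keeps the
# output at one line per input line even when no year is found
awk '{ if (match($0, /[12][890][0-9][0-9]/)) print substr($0, RSTART, RLENGTH); else print "" }' sample.txt
```

Printing an empty line for a reference without a year is a deliberate choice. `paste` then leaves that entry's year column blank instead of shifting every later year up by one line.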


Reposted from blog.csdn.net/mushroom234/article/details/109019450