Python regular expression to extract/filter numbers in a string


Background

  • Training a text classification model requires preprocessing the numbers and special symbols in the text.

Ideas

1 Since the goal is to extract numbers, note that a number usually takes one of two forms: an integer (e.g. 5), or a decimal, i.e. an integer part followed by a decimal point and a fractional part (e.g. 1.45);

2 So the general shape is digits.digits, where the decimal point and the fractional digits may be absent;

3 Based on this, the regular expression can be written as "\d+\.?\d*";

4 \d+ matches one or more digits. Do not use * here: even a decimal must have at least one digit before the decimal point. \.? matches the decimal point, which may or may not be present; the backslash is required, because an unescaped . matches any character. \d* matches the digits after the decimal point, so it allows zero or more of them.
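The pattern from step 3 can be written out piece by piece with re.VERBOSE, so that each part carries its own comment. The sample sentence is made up for illustration. It also shows one side effect of making the fractional digits optional: a number followed by a full stop, as in "chapter 3.", comes back with the trailing point. The stricter variant \d+(?:\.\d+)? is a swapped-in alternative, not from the original; it only accepts the point when digits follow it.

```python
import re

# Same pattern as "\d+\.?\d*", split up with re.VERBOSE so each part can be commented
NUMBER = re.compile(r"""
    \d+     # integer part: at least one digit
    \.?     # optional decimal point (escaped, so it is a literal dot)
    \d*     # fractional part: zero or more digits
""", re.VERBOSE)

# Stricter alternative (assumption, not the original pattern):
# the point is only consumed when at least one digit follows it
STRICT_NUMBER = re.compile(r"\d+(?:\.\d+)?")

text = "See chapter 3. It costs 12.50 for 7 items"  # hypothetical sample sentence
print(NUMBER.findall(text))         # ['3.', '12.50', '7']
print(STRICT_NUMBER.findall(text))  # ['3', '12.50', '7']
```

Which one to use depends on the corpus: if sentences often end right after a number, the stricter form avoids swallowing the full stop.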

Code

import re

string = "A1.45,b5,6.45,8.82"
# Find: list every number (integer or decimal) in the string
print(re.findall(r"\d+\.?\d*", string))
# ['1.45', '5', '6.45', '8.82']
# Filter: delete every number from the string
res = re.sub(r"\d+\.?\d*", "", string)
print(res)
# A,b,,
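re.findall returns strings. If the extracted values are needed as actual numbers (for example as features), a small helper can convert them. The name extract_numbers is hypothetical; the helper reuses the same pattern, so a match with a trailing point such as "3." is still accepted, since float("3.") is valid Python.

```python
import re

def extract_numbers(text):
    """Hypothetical helper: return every number in text as an int or a float."""
    numbers = []
    for token in re.findall(r"\d+\.?\d*", text):
        # A token containing a decimal point becomes a float, otherwise an int
        numbers.append(float(token) if "." in token else int(token))
    return numbers

print(extract_numbers("A1.45,b5,6.45,8.82"))  # [1.45, 5, 6.45, 8.82]
```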
  • Other similar tasks:
  • Filtering Chinese and English punctuation and special symbols
  • Filtering special symbols such as line breaks
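# Remove whitespace: spaces, \t, \r and \n
import re

str1 = '123  456  7\t8\r9\n10'
# \s+ matches one or more whitespace characters (inside [ ] the + would be a literal plus sign);
# the raw string passes \s to the regex engine unchanged
str1 = re.sub(r'\s+', '', str1)
print(str1)
# 12345678910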
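The item on filtering Chinese and English punctuation has no code in the original. A minimal sketch, assuming that "English punctuation" means Python's string.punctuation and that the short, hand-picked set of full-width Chinese marks below is enough for your corpus (it is not exhaustive). CHINESE_PUNCTUATION, PUNCT_PATTERN and remove_punctuation are hypothetical names.

```python
import re
import string

# Assumption: a hand-picked, non-exhaustive set of common Chinese (full-width) punctuation
CHINESE_PUNCTUATION = "，。！？；：、（）【】《》“”‘’…"

# re.escape makes characters such as ], ^, - and \ literal inside the character class
PUNCT_PATTERN = re.compile("[" + re.escape(string.punctuation + CHINESE_PUNCTUATION) + "]")

def remove_punctuation(text):
    """Delete ASCII punctuation and the listed Chinese punctuation from text."""
    return PUNCT_PATTERN.sub("", text)

print(remove_punctuation("Hello, world! 你好，世界！"))  # Hello world 你好世界
```

Building the class from string.punctuation plus an explicit list keeps the set easy to audit; extend CHINESE_PUNCTUATION if your data contains other marks.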

Origin blog.csdn.net/m0_38024592/article/details/113667274