Android: differences between Java and Python regular expressions

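When you switch back and forth between Python and Java, how do you port a regular expression that works well in Python to Java, and vice versa?

Python regular expressions distinguish between match and search
Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string.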

Python's regular expression module is called re.

Java's regular expression package is called java.util.regex.

The java.util.regex package introduces one interface, two classes, and one exception. A regular expression itself is represented by the Pattern class. The actual matching is performed by a Matcher instance, which is why it is called the engine. The result of a match operation is represented by the MatchResult interface. Finally, if your regular expression violates the allowed syntax, you get a PatternSyntaxException.
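
As a rough illustration of how these four types fit together, here is a minimal sketch. The class name RegexTypesDemo and its two helper methods are made up for this example and are not part of java.util.regex.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, for illustration only.
class RegexTypesDemo {
    // Returns true if the regex compiles, false if Pattern.compile rejects its syntax.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns an immutable snapshot of the first match, or null if there is none.
    static MatchResult firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.toMatchResult() : null;
    }

    public static void main(String[] args) {
        System.out.println(isValid("a*b"));   // true
        System.out.println(isValid("(a*b"));  // false: unclosed group
        MatchResult r = firstMatch("\\d+", "abc123def");
        System.out.println(r.group() + " at " + r.start() + ".." + r.end()); // 123 at 3..6
    }
}
```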

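1】Judging by the names, Java has no counterpart to Python's search function, and Java's Matcher.matches() is not Python's match: matches() requires the entire input to match, which corresponds to Python's re.fullmatch. [This is the point that is most easily confused.]

2】Escapes in the regular expression: a Java string literal needs \\d, whereas Python needs only \d (ideally inside a raw string such as r"\d").

3】Unicode: Java strings are Unicode by default, whereas in Python 2.7 a literal containing Chinese characters must be written as u"xxxx".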
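
To make the escaping point concrete, here is a small sketch of how backslashes double up in Java source. The class EscapeDemo and its members are illustrative names, not anything from a library.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class EscapeDemo {
    // The Java literal "\\d+" holds the three characters \d+ that the regex engine sees.
    static final String DIGITS = "\\d+";

    // A regex matching one literal backslash is \\ which takes four backslashes in Java source.
    static final String ONE_BACKSLASH = "\\\\";

    static boolean isNumber(String s) {
        return Pattern.matches(DIGITS, s);
    }

    static boolean hasBackslash(String s) {
        return Pattern.compile(ONE_BACKSLASH).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(DIGITS.length());          // 3
        System.out.println(isNumber("2024"));         // true
        System.out.println(hasBackslash("C:\\temp")); // true
    }
}
```

When porting from Python, r"\d+" becomes "\\d+" in Java; going the other way, each doubled backslash in the Java literal collapses to a single one in a Python raw string.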

Of course, whatever Python can do, Java can do too. The rough correspondence is as follows (in Python, search is the one generally used):
Java's matcher.find() and Python's re.search( regex, input ) match any part of the string.
Java's matcher.lookingAt() and Python's re.match( regex, input ) match the beginning of the string.
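
The correspondence can be sketched in Java as three thin wrappers. The class MatchModes and the method names search, matchAtStart, and fullMatch are invented here to mirror the Python names and are not Java API.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class MatchModes {
    // Roughly re.search: the pattern may match anywhere in the input.
    static boolean search(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Roughly re.match: the pattern must match at the start of the input.
    static boolean matchAtStart(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // Roughly re.fullmatch: the pattern must match the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(search("\\d+", "abc123"));       // true
        System.out.println(matchAtStart("\\d+", "abc123")); // false
        System.out.println(matchAtStart("\\d+", "123abc")); // true
        System.out.println(fullMatch("\\d+", "123abc"));    // false
        System.out.println(fullMatch("\\d+", "123"));       // true
    }
}
```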

There are two ways to use it in Java:

boolean result = Pattern.matches("a*b", "aaaaab"); // The first argument is the regular expression; the second is the string to process.

Pattern pattern = Pattern.compile("a*b");
Matcher matcher = pattern.matcher("aaaaab");
boolean result = matcher.matches();

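1.1 Regular expression (Pattern.compile) -> pattern
Pattern pattern = Pattern.compile("a*b");
1.2 Input string (pattern.matcher) -> matcher
Matcher matcher = pattern.matcher("aaaaab");
1.3 The matcher's matching methods
boolean result = matcher.matches(); // true only if the entire input matches
// or
boolean result = matcher.find(); // true if any part of the input matches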

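1.3.2 How do you get the matched text?
matcher.group(0) returns the whole match; matcher.group(1), matcher.group(2), ... return the corresponding capture groups.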
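
A short sketch of group numbering, assuming a made-up pattern with two capture groups; GroupDemo and parse are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class GroupDemo {
    // Returns {whole match, group 1, group 2}, or null when nothing matches.
    static String[] parse(String input) {
        Matcher m = Pattern.compile("(\\d+)-(\\w+)").matcher(input);
        if (!m.find()) {
            return null; // group() may only be called after a successful match
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("id: 42-abc!");
        System.out.println(parts[0]); // 42-abc
        System.out.println(parts[1]); // 42
        System.out.println(parts[2]); // abc
    }
}
```

Python numbers groups the same way: on a match object, group(0) is the whole match and group(1), group(2) are the capture groups.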

Finally, let's look at some concrete examples:

String testText = "8个文件"; // "8 files"
Pattern pat = Pattern.compile("(\\d+)([\u4E00-\u9FA5]{3})"); // Unicode escapes or literal Chinese characters both work
Matcher mat = pat.matcher(testText);
if (mat.find()) {
    Log.i(TAG, "how many files: " + mat.group(1));
    Log.i(TAG, "what?: " + mat.group(2));
}
// Keep the text from the first Chinese character, digit, letter, or opening mark such as “ （ 【 《 [ ( onward
Pattern p1 = Pattern.compile("([\u4E00-\u9FD5\\d\\w\u201c\uff08\u3010\u300a\\[(].*)");
Matcher m1 = p1.matcher("。新华社报道"); // "。Xinhua News Agency reports"
if (m1.find()) {
    Log.i(TAG, "leading punctuation stripped: " + m1.group(1));
}
m1 = p1.matcher("[新华社报道");
if (m1.find()) {
    Log.i(TAG, "leading punctuation stripped ([ is kept): " + m1.group(1));
}
// Strip a leading item number such as １、 or 1)
Pattern p2 = Pattern.compile("(^[\uff08(]?[\uff10-\uff19\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d\u53410-9]{1,2}[)\uff09\uff1a.\u3001])(.*)");
Matcher m2 = p2.matcher("（一）今天天气"); // "(1) Today's weather"
if (m2.find()) {
    Log.i(TAG, "leading item number stripped: " + m2.group(2));
}

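// Some String methods accept regular expressions (matches, replaceAll, replaceFirst, split); contains() does not and takes a literal string
String str = "\n今天"; // a newline followed by "today"
if (str.contains("\n")) {
    Log.i(TAG, "string contains a newline");
}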
}
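
For comparison, the String methods that do interpret their argument as a regular expression can be sketched as follows. StringRegexDemo and its methods are illustrative names, and the inputs are made up.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class StringRegexDemo {
    // replaceAll treats its first argument as a regex.
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // split treats its argument as a regex.
    static String[] splitOnCommas(String s) {
        return s.split("\\s*,\\s*");
    }

    // String.matches requires the whole string to match, like Matcher.matches().
    static boolean isAllDigits(String s) {
        return s.matches("\\d+");
    }

    public static void main(String[] args) {
        System.out.println(collapseWhitespace("  a \n b\t c "));        // a b c
        System.out.println(Arrays.toString(splitOnCommas("a , b,c"))); // [a, b, c]
        System.out.println(isAllDigits("123"));                        // true
        System.out.println("a.b".contains("."));                       // true: "." is a literal here
    }
}
```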

Reference:

https://javastring.wordpress.com/2013/10/19/java-vs-python-regular-expressions/
