Replacing the content of group(n) with a regex

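Replacing text with a regular expression ought to be easy, but the standard API has no single call that replaces only the content of a given capture group, so that part has to be implemented by hand.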

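Let's set a requirement first: in the string below, replace the first 01 with 1234 and the second 01 with 2345. In practice there may of course be more occurrences of 01, or other strings entirely: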

		String hex = "00 00 00 01 00 01";
		String regex = "[0-9a-zA-Z\\s]{6}[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})\\s[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})";

The parentheses in the regex are capture groups, and the goal is to replace what they capture with other content.

Exploring the API

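Java's String class can replace content by regex with replaceAll/replaceFirst, but those work on whole matches across the entire string, not on an individual group: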

// String.java
    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }

    public String replaceFirst(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceFirst(replacement);
    }
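To make the "whole match" behaviour concrete, here is a minimal sketch. The class and method names are made up for illustration, and the pattern is simply the literal "01" rather than the full regex above.

```java
final class GlobalReplaceDemo {

    // Replaces every match of the pattern; there is no way to pick a capture group.
    static String replaceEvery(String input) {
        return input.replaceAll("01", "1234");
    }

    // Replaces only the first match of the pattern.
    static String replaceOnlyFirst(String input) {
        return input.replaceFirst("01", "1234");
    }

    public static void main(String[] args) {
        String hex = "00 00 00 01 00 01";
        System.out.println(replaceEvery(hex));     // 00 00 00 1234 00 1234
        System.out.println(replaceOnlyFirst(hex)); // 00 00 00 1234 00 01
    }
}
```

Neither call can give the two occurrences different replacements in one pass, which is what the requirement asks for.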

Matcher's appendReplacement/appendTail do not help either (the two String methods above simply delegate to Matcher as well):

    public Matcher appendReplacement(StringBuffer sb, String replacement)
    public StringBuffer appendTail(StringBuffer sb)

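The former, appendReplacement, replaces each match in turn, so it only fits when the regex matches nothing but the text you want to change. Otherwise you get this: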

		String hex = "00 00 00 01 00 01";
		String regex1 = "[0-9a-zA-Z]{2}";
		Pattern pattern = Pattern.compile(regex1);
		Matcher matcher = pattern.matcher(hex);

		StringBuffer sb = new StringBuffer();
		while (matcher.find()) {
			matcher.appendReplacement(sb, "1234");
		}
		System.out.println(sb.toString());

Output:

1234 1234 1234 1234 1234 1234

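Every matching substring was replaced, which is clearly not what we want here.
The latter, appendTail, only appends the text left over after the last match to the StringBuffer.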
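As a small illustration of what appendTail contributes, the sketch below replaces every "01" and then copies the unmatched remainder. The class name, the method name and the sample input are assumptions made for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AppendTailDemo {

    // Replaces every "01", then lets appendTail copy whatever follows the last match.
    static String replaceWithTail(String input, String replacement) {
        Matcher matcher = Pattern.compile("01").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps '$' and '\' in the replacement literal
            matcher.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(sb); // without this call the trailing " ff" would be lost
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceWithTail("00 01 00 01 ff", "1234")); // 00 1234 00 1234 ff
    }
}
```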

So the API itself does not seem to offer a suitable method, and we have to implement one ourselves.

Getting the indices

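To replace content, we first need the position of the original content within the string. But where do those indices come from? How does Matcher cut out the substring for group(n)?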

		String hex = "00 00 00 01 00 01";
		String regex = "[0-9a-zA-Z\\s]{6}[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})\\s[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})";

		Pattern pattern = Pattern.compile(regex);
		Matcher matcher = pattern.matcher(hex);
		if (matcher.matches()) {
			int count = matcher.groupCount();
			for (int i = 1; i <= count; i++) {
				System.out.println(matcher.group(i));
			}
		}

Output:

01
01

Don't ask; there has to be something going on inside group(n):

    public String group(int group) {
        if (first < 0)
            throw new IllegalStateException("No match found");
        if (group < 0 || group > groupCount())
            throw new IndexOutOfBoundsException("No group " + group);
        if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
            return null;
        return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
    }

    CharSequence getSubSequence(int beginIndex, int endIndex) {
        return text.subSequence(beginIndex, endIndex);
    }

So group itself just takes a substring. But what is the groups array, and why does indexing it with group * 2 give the right positions?

public final class Matcher implements MatchResult {

    /**
     * The storage used by groups. They may contain invalid values if
     * a group was skipped during the matching.
     */
    int[] groups;

	... // omitted
}

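This field is not public, and there is no getter that returns the array itself. The public methods start(int) and end(int) do return the same offsets one group at a time, but to look at the raw array, let's try reflection: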

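	/**
	 * Reads the group offsets via reflection.
	 *
	 * @param clazz           the Matcher class
	 * @param matcherInstance the Matcher instance
	 * @return the array of offsets, or null if the field could not be read
	 */
	public static int[] getOffsets(Class<Matcher> clazz, Object matcherInstance) {
		try {
			Field field = clazz.getDeclaredField("groups");
			// On recent JDKs (16 and later by default) this call throws InaccessibleObjectException
			// unless the JVM is started with --add-opens java.base/java.util.regex=ALL-UNNAMED.
			field.setAccessible(true);

			return (int[]) field.get(matcherInstance);
		} catch (NoSuchFieldException | IllegalAccessException e) {
			e.printStackTrace();
		}
		return null;
	}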

Let's test it:

		Pattern pattern = Pattern.compile(regex);
		Matcher matcher = pattern.matcher(hex);
		matcher.matches();

		System.out.println(Arrays.toString(getOffsets(Matcher.class,matcher)));

Output:

[0, 17, 9, 11, 15, 17, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1]

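From what we generally know about regexes, the first pair, "0, 17", belongs to group(0), the whole match: it holds the start index and the (exclusive) end index of the full match.
Then "9, 11" are the start and end indices of group(1),
and so on.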

This also explains why groups[group * 2] and groups[group * 2 + 1] are exactly what is needed to cut out the substring.
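The same pairs can also be read without reflection, through the public Matcher.start(int) and Matcher.end(int) methods. The following is a minimal sketch of that; the class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupOffsetsDemo {

    // Returns {start0, end0, start1, end1, ...} for a full match, or null if the input does not match.
    // A group that did not take part in the match yields -1 for both values.
    static int[] groupOffsets(String input, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        int[] offsets = new int[(matcher.groupCount() + 1) * 2];
        for (int g = 0; g <= matcher.groupCount(); g++) {
            offsets[g * 2] = matcher.start(g);
            offsets[g * 2 + 1] = matcher.end(g);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String hex = "00 00 00 01 00 01";
        String regex = "[0-9a-zA-Z\\s]{6}[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})\\s[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})";
        System.out.println(Arrays.toString(groupOffsets(hex, regex))); // [0, 17, 9, 11, 15, 17]
    }
}
```

Unlike the internal array, the result here has no unused trailing slots, because it is sized from groupCount().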

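Clearly, once a match succeeds, the regex engine has already recorded each group's boundary indices in the groups array.
So doesn't that mean…?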

Implementing the replacement

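With the indices in hand, everything is in place; all that remains is to split the string and stitch it back together ourselves.
That gives: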

	/**
	 * Replaces the content of the given group(n) captures.
	 *
	 * @param origin      the original string
	 * @param regex       a regex matching the whole string, with parentheses around each part to be replaced
	 * @param groupIndice indices of the groups to replace
	 * @param content     the replacement text for each group index, in the same order
	 * @return the resulting string, or origin unchanged if nothing could be replaced
	 */
	public static String replaceMatcherContent(String origin, String regex, int[] groupIndice, String... content) {
		if (groupIndice.length != content.length) {
			return origin;
		}
		Pattern pattern = Pattern.compile(regex);
		Matcher matcher = pattern.matcher(origin);
		if (matcher.matches()) {
			int count = matcher.groupCount();
			String[] resSubArray = new String[count * 2 + 1];
			int[] offsets = getOffsets(Matcher.class, matcher);
			if (offsets == null) {
				return origin;
			}
			// Split the string into alternating pieces: the text before a group, then the group itself.
			// This assumes the groups appear left to right, are not nested, and all took part in the match.
			int lastIndex = 0;
			for (int i = 1; i <= count; i++) {
				int startIndex = offsets[i * 2];
				int endIndex = offsets[i * 2 + 1];
				resSubArray[i * 2 - 2] = origin.substring(lastIndex, startIndex);
				resSubArray[i * 2 - 1] = origin.substring(startIndex, endIndex);
				lastIndex = endIndex;
			}
			resSubArray[count * 2] = origin.substring(lastIndex);

			// Swap in the new content at the slots of the requested groups.
			for (int i = 0; i < groupIndice.length; i++) {
				resSubArray[groupIndice[i] * 2 - 1] = content[i];
			}

			// Join the pieces back together.
			StringBuilder sb = new StringBuilder();
			for (String sub : resSubArray) {
				sb.append(sub);
			}
			return sb.toString();
		}

		return origin;
	}

Finally, put it into a utility class and test again:

	public static void main(String[] args) {
		String hex = "00 00 00 01 00 01";
		String regex = "[0-9a-zA-Z\\s]{6}[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})\\s[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})";

		System.out.println(TextUtil.replaceMatcherContent(hex, regex, new int[]{1, 2}, new String[]{"1234", "2345"}));
	}

Output:

00 00 00 1234 00 2345

It works.
There may still be some small problems I have not thought of, but this is the basic idea for now.
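One such problem is the reliance on a private field. As a possible variation, not the implementation described above, the sketch below does the same replacement with only the public Matcher.start(int) and Matcher.end(int) methods. The names GroupReplaceSketch and replaceGroups are hypothetical, and the sketch assumes the groups being replaced do not overlap or nest.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupReplaceSketch {

    // Replaces the text captured by each requested group with the matching entry of content.
    // Returns origin unchanged if the lengths differ or the regex does not match the whole string.
    static String replaceGroups(String origin, String regex, int[] groupIndices, String... content) {
        if (groupIndices.length != content.length) {
            return origin;
        }
        Matcher matcher = Pattern.compile(regex).matcher(origin);
        if (!matcher.matches()) {
            return origin;
        }

        // Process the requested groups from right to left, so that an edit never
        // shifts the offsets of a group that still has to be replaced.
        Integer[] order = new Integer[groupIndices.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) ->
                Integer.compare(matcher.start(groupIndices[b]), matcher.start(groupIndices[a])));

        StringBuilder sb = new StringBuilder(origin);
        for (int i : order) {
            int start = matcher.start(groupIndices[i]);
            if (start < 0) {
                continue; // this group did not take part in the match
            }
            sb.replace(start, matcher.end(groupIndices[i]), content[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hex = "00 00 00 01 00 01";
        String regex = "[0-9a-zA-Z\\s]{6}[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})\\s[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})";
        System.out.println(replaceGroups(hex, regex, new int[]{1, 2}, "1234", "2345")); // 00 00 00 1234 00 2345
    }
}
```

Because it avoids setAccessible, this version should not be affected by the module access restrictions of newer JDKs, and it skips groups that did not participate instead of failing on a -1 offset.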
