The Quirky appendReplacement and replaceAll

1. Origin
The following code replaces every ${param} placeholder in a string with the matching value from a map:

    private static String replaceVariantOldVersion(String str, Map<String,String> variantMap){
        Matcher m = Pattern.compile("\\$\\{.*?\\}").matcher(str);
        StringBuffer rtn = new StringBuffer();
        while(m.find()){
            String foundStr = m.group();
            String key = foundStr.substring("${".length(),foundStr.length()-1).trim();
            String value = variantMap.get(key);
            if(null == value){
                value = "{UNKNOWN_VALUE}";
            }
            m.appendReplacement(rtn, value);
        }
        m.appendTail(rtn);
        return rtn.toString();
    }

For example:

    String str = "my name is ${name}";
    Map<String,String> map = new HashMap<String,String>();
    map.put("name", "lazy");
    System.out.println(replaceVariantOldVersion(str,map)); // prints: my name is lazy

But if the value contains a $, as in

    String str = "my name is ${name}";
    Map<String,String> map = new HashMap<String,String>();
    map.put("name", "$.lazy");
    System.out.println(replaceVariantOldVersion(str,map));

an exception is thrown:

Exception in thread "main" java.lang.IllegalArgumentException: Illegal group reference
at java.util.regex.Matcher.appendReplacement(Unknown Source)
at lgc.tools.struts2.ConventionUtil.replaceVariantOldVersion(ConventionUtil.java:241)
at lgc.tools.struts2.ConventionUtil.main(ConventionUtil.java:269)

2. Analysis
Let's look at the source code of appendReplacement:

    public Matcher appendReplacement(StringBuffer sb, String replacement) {
        // If no match, return error
        if (first < 0)
            throw new IllegalStateException("No match available");

        // Process substitution string to replace group references with groups
        int cursor = 0;
        String s = replacement;
        StringBuffer result = new StringBuffer();

        while (cursor < replacement.length()) {
            char nextChar = replacement.charAt(cursor);
            if (nextChar == '\\') {
                cursor++;
                nextChar = replacement.charAt(cursor);
                result.append(nextChar);
                cursor++;
            } else if (nextChar == '$') {
                // Skip past $
                cursor++;

                // The first number is always a group
                int refNum = (int)replacement.charAt(cursor) - '0';
                if ((refNum < 0)||(refNum > 9))
                    throw new IllegalArgumentException(
                        "Illegal group reference");
                cursor++;

                // Capture the largest legal group string
                boolean done = false;
                while (!done) {
                    if (cursor >= replacement.length()) {
                        break;
                    }
                    int nextDigit = replacement.charAt(cursor) - '0';
                    if ((nextDigit < 0)||(nextDigit > 9)) { // not a number
                        break;
                    }
                    int newRefNum = (refNum * 10) + nextDigit;
                    if (groupCount() < newRefNum) {
                        done = true;
                    } else {
                        refNum = newRefNum;
                        cursor++;
                    }
                }

                // Append group
                if (group(refNum) != null)
                    result.append(group(refNum));
            } else {
                result.append(nextChar);
                cursor++;
            }
        }

        // Append the intervening text
        sb.append(getSubSequence(lastAppendPosition, first));
        // Append the match substitution
        sb.append(result.toString());

        lastAppendPosition = last;
        return this;
    }

So the second parameter, replacement, can use $n to refer to a capture group! That is why both '$' and '\' are treated as special characters: $1 in the replacement string stands for whatever group 1 matched. In "$.lazy" the character after the $ is not a digit, hence the "Illegal group reference" exception.
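What is a group? Consider this program:

    Matcher m = Pattern.compile("((\\d\\d)(\\w))").matcher("11a22b");
    StringBuffer sb = new StringBuffer();
    while(m.find()){
        m.appendReplacement(sb, "$0,$1,$2,$3;");
    }
    m.appendTail(sb);
    System.out.println(sb);

The output is 11a,11a,11,a;22b,22b,22,b;

The regular expression ((\d\d)(\w)) has three groups. Each pair of parentheses encloses one group, and groups are numbered 1, 2, 3, ... in the order their opening parentheses appear. Group 0 stands for the entire matched string. In the example, when "11a" is matched by ((\d\d)(\w)):
$0 is the whole match, 11a; the output so far is 11a
$1 is what the outer group ((\d\d)(\w)) matched, 11a; the output so far is 11a,11a
$2 is what (\d\d) matched, 11; the output so far is 11a,11a,11
$3 is what (\w) matched, a; the output so far is 11a,11a,11,a;
This should give a rough idea of what groups are. If anything is unclear, please leave a comment.

3. Conclusion
Back to the main topic. Since '$' and '\' are special in the replacement parameter of appendReplacement(StringBuffer sb, String replacement), I escape these two characters before passing the value in.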
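(The backslash is special because \$ is how a literal $ is written in the replacement string, so '\' itself has to be escapable too. You can try running

    String str = "my name is ${name}";
    Map<String,String> map = new HashMap<String,String>();
    map.put("name", "\\.lazy");
    System.out.println(replaceVariantOldVersion(str,map));

Going by the source above, the backslash is silently consumed and the output is "my name is .lazy" rather than "my name is \.lazy". A value that ends in a lone backslash is worse: there is no character left to escape, so appendReplacement throws.)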
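
To make the three cases concrete, here is a small sketch that feeds different replacement strings through the same appendReplacement/appendTail loop. The class name ReplacementEscapes and the helper substitute are made up for this illustration; only the java.util.regex calls come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplacementEscapes {
    // Hypothetical helper: replaces every match of regex in input with
    // replacement, using the appendReplacement/appendTail loop.
    static String substitute(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Replacement is \.lazy : the backslash is consumed, the dot is kept.
        System.out.println(substitute("name=X", "X", "\\.lazy"));   // name=.lazy
        // Replacement is \$.lazy : \$ yields a literal dollar sign.
        System.out.println(substitute("name=X", "X", "\\$.lazy"));  // name=$.lazy
        // Replacement is \\.lazy : \\ yields one literal backslash.
        System.out.println(substitute("name=X", "X", "\\\\.lazy")); // name=\.lazy
        // Replacement is $.lazy : '$' is not followed by a group number.
        try {
            substitute("name=X", "X", "$.lazy");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```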
In the end, my corrected code is:

    private static String replaceVariant(String str, Map<String,String> variantMap){
        Matcher m = Pattern.compile("\\$\\{.*?\\}").matcher(str);
        StringBuffer rtn = new StringBuffer();
        while(m.find()){
            String foundStr = m.group();
            String key = foundStr.substring("${".length(),foundStr.length()-1).trim();
            String value = variantMap.get(key);
            if(null == value){
                value = "{UNKNOWN_VALUE}";
            }
            String valueReplacement = value.replaceAll("\\\\","\\\\\\\\").replaceAll("\\$", "\\\\\\$");
            m.appendReplacement(rtn, valueReplacement);
        }
        m.appendTail(rtn);
        return rtn.toString();
    }

The line to focus on is

    String valueReplacement = value.replaceAll("\\\\","\\\\\\\\").replaceAll("\\$", "\\\\\\$");

which simply turns each \ into \\ and each $ into \$. The backslashes must be doubled first; otherwise the backslashes added in front of each $ would themselves be doubled by the second pass.
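Someone may ask: the first parameter of replaceAll is a regular expression, so it needs four backslashes, but why does the second parameter need just as many? This puzzled me too when I first wrote the code. In fact, replaceAll ends up calling the function we just analyzed,

    public Matcher appendReplacement(StringBuffer sb, String replacement)

and the second parameter of replaceAll is passed through as replacement. So:

    '\' and '$' in the second parameter of replaceAll are special characters too!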
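
As a quick check of that claim, the sketch below calls String.replaceAll with a replacement that uses group references and one that escapes the dollar sign. The class and method names are invented for the example.

```java
public class ReplaceAllSpecials {
    // Swaps "first last" into "last, first" using $1 and $2 in the replacement.
    static String swapNames(String fullName) {
        return fullName.replaceAll("(\\w+) (\\w+)", "$2, $1");
    }

    // Replaces every dot with a literal dollar sign; the replacement has to be
    // \$ (written "\\$" in Java source) so '$' is not read as a group reference.
    static String dotsToDollars(String s) {
        return s.replaceAll("\\.", "\\$");
    }

    public static void main(String[] args) {
        System.out.println(swapNames("John Smith")); // Smith, John
        System.out.println(dotsToDollars("a.b.c"));  // a$b$c
    }
}
```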
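This also explains why replacing the path separator / with \ takes so many backslashes:

    String s = "E:/mydir/mydir2/mdir3";
    //System.out.println(s.replaceAll("/","\\")); // throws: the replacement is a lone \
    System.out.println(s.replaceAll("/","\\\\")); // correct

When using replaceAll, remember that in both the first and the second parameter it takes four backslashes in Java source to stand for one literal \. I also hope Java will someday offer a raw string syntax, like Python's r"\n".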
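
When neither argument needs regex semantics, one way to sidestep the escaping altogether is String.replace, which treats both arguments literally. The sketch below compares it with the replaceAll form; the class and method names are only for illustration.

```java
public class PathSeparators {
    // Literal replacement: String.replace(char, char) involves no regex and
    // gives '$' and '\' no special meaning.
    static String toBackslashes(String path) {
        return path.replace('/', '\\');
    }

    // The same result through replaceAll needs the replacement \\ ,
    // which is written as four backslashes in Java source.
    static String toBackslashesRegex(String path) {
        return path.replaceAll("/", "\\\\");
    }

    public static void main(String[] args) {
        String s = "E:/mydir/mydir2/mdir3";
        System.out.println(toBackslashes(s));      // E:\mydir\mydir2\mdir3
        System.out.println(toBackslashesRegex(s)); // E:\mydir\mydir2\mdir3
    }
}
```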
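
Thank you for your valuable reply!
I looked at the source code of quoteReplacement, and it matches my idea: put a \ in front of every $ or \.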

    public static String quoteReplacement(String s) {
        if ((s.indexOf('\\') == -1) && (s.indexOf('$') == -1))
            return s;
        StringBuffer sb = new StringBuffer();
        for (int i=0; i<s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                sb.append('\\'); sb.append('\\');
            } else if (c == '$') {
                sb.append('\\'); sb.append('$');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
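
With that in mind, the placeholder replacement could be written with Matcher.quoteReplacement instead of the two chained replaceAll calls. This is a sketch rather than the original method: the class name VariantReplacer is made up, and the key is read from a capture group instead of via substring, which is a small change of my own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariantReplacer {
    // Group 1 captures the text between "${" and "}".
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(.*?)\\}");

    static String replaceVariant(String str, Map<String, String> variantMap) {
        Matcher m = PLACEHOLDER.matcher(str);
        StringBuffer rtn = new StringBuffer();
        while (m.find()) {
            String key = m.group(1).trim();
            String value = variantMap.get(key);
            if (value == null) {
                value = "{UNKNOWN_VALUE}";
            }
            // quoteReplacement escapes '\' and '$' so the value is inserted literally.
            m.appendReplacement(rtn, Matcher.quoteReplacement(value));
        }
        m.appendTail(rtn);
        return rtn.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put("name", "$.lazy");
        System.out.println(replaceVariant("my name is ${name}", map)); // my name is $.lazy
    }
}
```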