A pinyin4j utility class for Chinese Pinyin processing

We often need to process strings that contain Chinese characters, for example to extract their Pinyin initials. In the examples below, the quoted inputs stand for the original Chinese text.

  • Getting the A-Z initial of a customer's name for an address-book style index: "Zhao" -> "Z"
  • Generating an abbreviation from a Chinese name, like the letter abbreviations commonly used for insurance companies and banks: "Program Life" -> "CXRS"
  • Converting Chinese characters to Hanyu Pinyin while leaving English characters unchanged: "Qiuyuejava" -> "qiuyuejava"
  • Converting Chinese characters to their Pinyin first letters while leaving English characters unchanged: "Programmer a" -> "CXYa"

The commonly used methods below can be collected into a utility class of your own.

First, add the pinyin4j dependency (Maven):

<!-- Pinyin -->
<dependency>
    <groupId>com.belerweb</groupId>
    <artifactId>pinyin4j</artifactId>
    <version>2.5.0</version>
</dependency>

1. Use a regular expression to check whether a string contains letters

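    /**
     * Uses a regular expression to check whether a string contains a letter.
     * Requires java.util.regex.Pattern.
     *
     * @param str the string to check
     * @return true if the string contains at least one letter (a-z or A-Z), false otherwise
     */
    public static boolean judgeContainsStr(String str) {
        // find() stops at the first letter. Unlike matches() with ".*[a-zA-Z]+.*",
        // it also works when the string contains line breaks, which '.' does not match.
        return Pattern.compile("[a-zA-Z]").matcher(str).find();
    }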
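
As a rough illustration of this check, here is a minimal standalone sketch. The class name LetterCheckDemo and the method containsLetter are made up for this example, and the null guard is an addition that the method above does not have. It also prints what the whole-string form ".*[a-zA-Z]+.*" returns for an input with a line break.

```java
import java.util.regex.Pattern;

final class LetterCheckDemo {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern LETTER = Pattern.compile("[a-zA-Z]");

    /** Returns true if the text contains at least one ASCII letter (hypothetical helper). */
    static boolean containsLetter(String text) {
        return text != null && LETTER.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLetter("123abc")); // true
        System.out.println(containsLetter("12345"));  // false
        System.out.println(containsLetter("12\nab")); // true
        // The whole-string form fails here because '.' does not match '\n' by default.
        System.out.println("12\nab".matches(".*[a-zA-Z]+.*")); // false
    }
}
```

Precompiling the pattern avoids recompiling the regular expression on every call, which may matter when the check runs once per character.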

2. Get the first letter of each Chinese character (typically used for address book lookup)

    /**
     * Returns the uppercase first Pinyin letter of each Chinese character.
     * ASCII letters are kept as they are; other ASCII characters become "#".
     * Returns "#" if the conversion fails.
     */
    public static String getAlpha(String chines) {
        StringBuilder pinyinName = new StringBuilder();
        char[] nameChar = chines.toCharArray();
        HanyuPinyinOutputFormat defaultFormat = new HanyuPinyinOutputFormat();
        defaultFormat.setCaseType(HanyuPinyinCaseType.UPPERCASE);
        defaultFormat.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
        for (int i = 0; i < nameChar.length; i++) {
            if (nameChar[i] > 128) {
                try {
                    // Look the character up once; the result is null for non-Chinese characters.
                    String[] ac = PinyinHelper.toHanyuPinyinStringArray(nameChar[i], defaultFormat);
                    if (ac != null && ac.length > 0) {
                        pinyinName.append(ac[0].charAt(0));
                    }
                } catch (Exception e) {
                    // Covers BadHanyuPinyinOutputFormatCombination and any runtime failure.
                    return "#";
                }
            } else if (judgeContainsStr(String.valueOf(nameChar[i]))) {
                pinyinName.append(nameChar[i]);
            } else {
                pinyinName.append("#");
            }
        }
        return pinyinName.toString();
    }
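
To show how such initials might feed an address book index, here is a minimal sketch. The class AddressBookIndexDemo and the method groupByInitial are hypothetical names, and the initials function is passed in as a parameter so that the example runs without pinyin4j. In real use it would presumably be a method like getAlpha.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

final class AddressBookIndexDemo {
    /**
     * Groups names under the first character of their initials string.
     * Names whose initials do not start with A-Z go into the "#" bucket.
     */
    static Map<Character, List<String>> groupByInitial(List<String> names,
                                                       Function<String, String> initials) {
        Map<Character, List<String>> index = new TreeMap<>();
        for (String name : names) {
            String code = initials.apply(name);
            char key = (code == null || code.isEmpty()) ? '#' : code.charAt(0);
            if (key < 'A' || key > 'Z') {
                key = '#';
            }
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        return index;
    }

    public static void main(String[] args) {
        // Stand-in for getAlpha (an assumption): it only upper-cases ASCII names.
        Function<String, String> asciiInitials = s -> s.toUpperCase(Locale.ROOT);
        List<String> names = Arrays.asList("zhao", "Chen", "zhou", "42");
        // prints {#=[42], C=[Chen], Z=[zhao, zhou]}
        System.out.println(groupByInitial(names, asciiInitials));
    }
}
```

A TreeMap keeps the buckets sorted by key, so "#" comes first and the letters follow in alphabetical order.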

3. Get the Pinyin first-letter abbreviation

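    /**
     * Returns the uppercase Pinyin first-letter abbreviation of a string.
     * ASCII letters are kept as they are; other ASCII characters become "#".
     * Returns "#" if the conversion fails or produces nothing.
     */
    public static String getAlphaDefaultCode(String chines) {
        StringBuilder pinyinName = new StringBuilder();
        char[] nameChar = chines.toCharArray();
        HanyuPinyinOutputFormat defaultFormat = new HanyuPinyinOutputFormat();
        defaultFormat.setCaseType(HanyuPinyinCaseType.UPPERCASE);
        defaultFormat.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
        for (int i = 0; i < nameChar.length; i++) {
            if (nameChar[i] > 128) {
                try {
                    // Look the character up once; the result is null for non-Chinese characters.
                    String[] ac = PinyinHelper.toHanyuPinyinStringArray(nameChar[i], defaultFormat);
                    if (ac != null && ac.length > 0) {
                        pinyinName.append(ac[0].charAt(0));
                    }
                } catch (Exception e) {
                    // Covers BadHanyuPinyinOutputFormatCombination and any runtime failure.
                    return "#";
                }
            } else if (judgeContainsStr(String.valueOf(nameChar[i]))) {
                pinyinName.append(nameChar[i]);
            } else {
                pinyinName.append("#");
            }
        }
        // Only letters and "#" are ever appended, so an empty result is the only blank case.
        // This replaces the original StringUtils.isBlank check and needs no extra library.
        if (pinyinName.length() == 0) {
            return "#";
        }
        return pinyinName.toString();
    }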

4. Convert the Chinese characters in a string to Pinyin, leaving English characters unchanged

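    /**
     * Converts the Chinese characters in a string to Pinyin; English characters are left unchanged.
     * Punctuation and whitespace are removed first by cleanChar.
     *
     * @param inputString the text to convert
     * @return the lowercase, toneless Pinyin string, or "*" if the input is null, empty, or "null"
     */
    public static String getPingYin(String inputString) {
        // Check for null before cleaning; cleanChar would throw a NullPointerException on null.
        if (inputString == null) {
            return "*";
        }
        inputString = cleanChar(inputString);
        if (inputString.length() == 0 || "null".equals(inputString)) {
            return "*";
        }
        HanyuPinyinOutputFormat format = new HanyuPinyinOutputFormat();
        format.setCaseType(HanyuPinyinCaseType.LOWERCASE);
        format.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
        format.setVCharType(HanyuPinyinVCharType.WITH_V);
        StringBuilder output = new StringBuilder();
        try {
            for (char c : inputString.toCharArray()) {
                String[] temp = null;
                // Only characters in the range U+4E00 to U+9FA5 are looked up.
                if (c >= 0x4E00 && c <= 0x9FA5) {
                    temp = PinyinHelper.toHanyuPinyinStringArray(c, format);
                }
                if (temp != null && temp.length > 0) {
                    output.append(temp[0]);
                } else {
                    // Non-Chinese characters, and characters without a reading, are kept as they are.
                    output.append(c);
                }
            }
        } catch (BadHanyuPinyinOutputFormatCombination e) {
            e.printStackTrace();
        }
        return output.toString();
    }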
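
The method above decides which characters count as Chinese by testing the range U+4E00 to U+9FA5. The following is a minimal sketch comparing that range test with a check based on the Unicode script property from the standard library. The class name HanCheckDemo and both method names are made up for this example, and whether pinyin4j has readings for characters outside the basic range is not covered here.

```java
final class HanCheckDemo {
    /** Range test for U+4E00 to U+9FA5, applied to a single char. */
    static boolean inBasicCjkRange(char c) {
        return c >= 0x4E00 && c <= 0x9FA5;
    }

    /** Broader test: true for any code point whose Unicode script is Han. */
    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    public static void main(String[] args) {
        char zhong = '\u4E2D';    // a common Chinese character ("middle")
        char extension = '\u3400'; // first character of CJK Extension A
        System.out.println(inBasicCjkRange(zhong) + " " + isHan(zhong));         // true true
        System.out.println(inBasicCjkRange(extension) + " " + isHan(extension)); // false true
        System.out.println(inBasicCjkRange('a') + " " + isHan('a'));             // false false
    }
}
```

The range test is enough for everyday text. The script-based test also accepts rarer characters, so a null check on the lookup result remains important with either approach.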

5. Convert Chinese characters to their Pinyin first letters, leaving English characters unchanged

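    /**
     * Converts Chinese characters to the first letter of their Pinyin; English characters are left unchanged.
     * Punctuation and whitespace are removed first by cleanChar.
     *
     * @param chines the text to convert
     * @return the uppercase Pinyin first letters, with other characters kept as they are
     */
    public static String converterToFirstSpell(String chines) {
        chines = cleanChar(chines);
        StringBuilder pinyinName = new StringBuilder();
        char[] nameChar = chines.toCharArray();
        HanyuPinyinOutputFormat defaultFormat = new HanyuPinyinOutputFormat();
        defaultFormat.setCaseType(HanyuPinyinCaseType.UPPERCASE);
        defaultFormat.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
        for (int i = 0; i < nameChar.length; i++) {
            if (nameChar[i] > 128) {
                try {
                    String[] pinyin = PinyinHelper.toHanyuPinyinStringArray(nameChar[i], defaultFormat);
                    if (pinyin != null && pinyin.length > 0) {
                        pinyinName.append(pinyin[0].charAt(0));
                    } else {
                        // The lookup returns null for non-Chinese characters; keep those unchanged
                        // instead of failing with a NullPointerException.
                        pinyinName.append(nameChar[i]);
                    }
                } catch (BadHanyuPinyinOutputFormatCombination e) {
                    e.printStackTrace();
                }
            } else {
                pinyinName.append(nameChar[i]);
            }
        }
        return pinyinName.toString();
    }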
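
PinyinHelper.toHanyuPinyinStringArray returns null for characters that are not Chinese, so indexing its result directly can fail for accented letters and similar input. The guard can be sketched without pinyin4j as follows. The class FirstLetterDemo, the method firstLetters, and the small lookup table are all stand-ins invented for this example, not part of the library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class FirstLetterDemo {
    /**
     * Replaces each character that has a reading with the uppercase first letter
     * of that reading. The lookup function stands in for the pinyin4j call and
     * may return null, in which case the character is kept unchanged.
     */
    static String firstLetters(String text, Function<Character, String[]> lookup) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            String[] readings = c > 128 ? lookup.apply(c) : null;
            if (readings != null && readings.length > 0 && !readings[0].isEmpty()) {
                out.append(Character.toUpperCase(readings[0].charAt(0)));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Tiny hand-written table for the three characters of the word "programmer".
        Map<Character, String[]> table = new HashMap<>();
        table.put('\u7A0B', new String[] {"cheng"});
        table.put('\u5E8F', new String[] {"xu"});
        table.put('\u5458', new String[] {"yuan"});
        // The input is the three characters above followed by "a"; prints CXYa
        System.out.println(firstLetters("\u7A0B\u5E8F\u5458a", table::get));
    }
}
```

Keeping the unknown character is one possible choice. Skipping it or substituting "#" would be equally easy, depending on what the caller expects.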

6. Remove special characters

    /**
     * Removes punctuation, whitespace, and special symbols from a string.
     * Requires java.util.regex.Pattern.
     *
     * @param chines the text to clean
     * @return the cleaned text
     */
    public static String cleanChar(String chines) {
        // Remove all ASCII punctuation and whitespace.
        chines = chines.replaceAll("[\\p{Punct}\\p{Space}]+", "");
        // Remove the remaining special symbols, including Chinese punctuation.
        String regEx = "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}<>《》【】‘;:”“’。,、?]";
        return Pattern.compile(regEx).matcher(chines).replaceAll("").trim();
    }
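
In Java, \p{Punct} and \p{Space} match only ASCII characters by default, which is why the method needs a second pass for Chinese punctuation. As a possible alternative, here is a minimal sketch that uses the Unicode categories \p{P} (punctuation) and \p{S} (symbols) in a single pattern. The class CleanCharDemo and the method clean are hypothetical, and the set of characters removed is not guaranteed to be identical to that of cleanChar.

```java
import java.util.regex.Pattern;

final class CleanCharDemo {
    // \p{P} = Unicode punctuation, \p{S} = Unicode symbols, \s = ASCII whitespace.
    private static final Pattern NOISE = Pattern.compile("[\\p{P}\\p{S}\\s]+");

    /** Removes punctuation, symbols, and whitespace in one pass (hypothetical helper). */
    static String clean(String text) {
        return NOISE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // "Hi, " + four Chinese characters with a full-width comma and a
        // full-width exclamation mark + " (v1.0)"
        String sample = "Hi, \u4F60\u597D\uFF0C\u4E16\u754C\uFF01 (v1.0)";
        // The ASCII-only classes leave the full-width comma and exclamation mark in place.
        String asciiOnly = sample.replaceAll("[\\p{Punct}\\p{Space}]+", "");
        System.out.println(asciiOnly.length());     // 11
        System.out.println(clean(sample).length()); // 9
    }
}
```

The single pattern is shorter to maintain, while the explicit list in cleanChar makes it obvious exactly which symbols are removed.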

Origin blog.csdn.net/weixin_47061482/article/details/131833736