[Java code] A Java version of NGender: guess gender and the degree of masculinity/femininity from a Chinese name (Python version address + Java source code + base data)


【Resource link】

Link: pan.baidu.com/s/1NSH5T0qk…

Extraction code: nnx6

【Included files】

(Image: list of the included files)

1. Requirements description

Our project needed to infer gender from a name. I found the Python NGender package online, but the project's technology stack is Java. My first idea was to run the Python code through jython-standalone. That worked when called from IDEA, but after deployment the NGender module could not be found, and I never managed to solve that deployment problem. Hence this Java version of NGender :smile: If you have deployed the Jython approach successfully, please share your experience. Notes on the Java version:

  • 82% accuracy (same as the Python version)
  • Can be used to guess gender
  • Can be used to determine how masculine/feminine a name is

2. Code implementation

2.1 Dependencies

Hutool is used here to parse the CSV file. It is optional: you can parse the file yourself instead.

<!-- Used to parse the CSV file -->
<dependency>
	<groupId>cn.hutool</groupId>
	<artifactId>hutool-all</artifactId>
	<version>5.6.6</version>
</dependency>
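If you prefer not to pull in Hutool, the file can be read with the standard library. The following is only a minimal sketch: the class name `CharFreqCsv` and the header text are hypothetical, and the column layout (character, male count, female count, with a header in the first row) is assumed from how the source code in section 2.2 reads the rows. It also assumes the file contains no quoted fields or embedded commas.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: parses "char,male,female" rows without Hutool. */
final class CharFreqCsv {

    /** Returns char -> {maleCount, femaleCount}. */
    static Map<String, int[]> parse(List<String> lines) {
        Map<String, int[]> counts = new HashMap<>();
        for (int i = 1; i < lines.size(); i++) { // row 0 is assumed to be a header
            String[] cols = lines.get(i).split(",");
            if (cols.length < 3) {
                continue; // skip blank or malformed rows
            }
            counts.put(cols[0].trim(), new int[] {
                    Integer.parseInt(cols[1].trim()),
                    Integer.parseInt(cols[2].trim())});
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not real data from charfreq.csv.
        // U+82B3 is "fang", U+521A is "gang".
        List<String> sample = Arrays.asList(
                "char,male,female",
                "\u82b3,10,90",
                "\u521a,80,5");
        Map<String, int[]> counts = parse(sample);
        System.out.println(counts.size() + " characters loaded");
    }
}
```

For the real file, the lines could be obtained with `Files.readAllLines(Paths.get("data/ngender/charfreq.csv"), StandardCharsets.UTF_8)` and passed to `parse`.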

2.2 Source code

The source code was ported from the Python code and has not been optimized.

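import cn.hutool.core.text.csv.CsvData;
import cn.hutool.core.text.csv.CsvRow;
import cn.hutool.core.text.csv.CsvUtil;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

// On Spring Boot 3+ this annotation lives in jakarta.annotation instead
import javax.annotation.PostConstruct;
import java.io.File;
import java.util.HashMap;
import java.util.Map;

@Slf4j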
@Component
public class GenderUtils {
    
    private Map<String, String> genderMap = new HashMap<>(9443);
    private int maleTotal = 0;
    private int femaleTotal = 0;
    private int genderTotal = 0;

    @PostConstruct
    private void init() {
        // Load the data file
        File toFile = new File("data/ngender/charfreq.csv");
        // Parse the CSV file (row 0 is skipped below as the header)
        CsvData rows = CsvUtil.getReader().read(toFile);
        for (int i = 1, rowCount = rows.getRowCount(); i < rowCount; i++) {
            CsvRow row = rows.getRow(i);
            maleTotal += Integer.parseInt(row.get(1));
            femaleTotal += Integer.parseInt(row.get(2));
        }
        genderTotal = maleTotal + femaleTotal;
        // Store each character's frequencies as a "female,male" string
        for (int i = 1, rowCount = rows.getRowCount(); i < rowCount; i++) {
            CsvRow row = rows.getRow(i);
            String nameChar = row.get(0);
            int maleNum = Integer.parseInt(row.get(1));
            int femaleNum = Integer.parseInt(row.get(2));
            genderMap.put(nameChar, 1.0 * femaleNum / femaleTotal + "," + 1.0 * maleNum / maleTotal);
        }
    }

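    /**
     * Guess gender from a name (Chinese names only).
     *
     * @param nameString full name
     * @return gender information
     */
    public Map<String, String> guessGenderByName(String nameString) {
        // Take all characters of the given name (the first character is dropped as a one-character surname)
        char[] nameChars = nameString.substring(1).toCharArray();
        // Compute the unnormalized probability for each gender
        double maleProb = getGenderProb(nameChars, 1);
        double femaleProb = getGenderProb(nameChars, 0);
        // Return the result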
        if (maleProb > femaleProb) {
            return new HashMap<String, String>(2) {{
                put("male", String.valueOf(maleProb / (maleProb + femaleProb)));
            }};
        } else if (femaleProb > maleProb) {
            return new HashMap<String, String>(2) {{
                put("female", String.valueOf(femaleProb / (maleProb + femaleProb)));
            }};
        } else {
            return new HashMap<String, String>(2) {{
                put("unknown", "0");
            }};
        }
    }

    /**
     * Compute the probability for one gender.
     *
     * @param nameChars  all characters of the given name
     * @param genderFlag 0 female 1 male
     * @return unnormalized probability for that gender
     */
    private double getGenderProb(char[] nameChars, int genderFlag) {
        double baseProb;
        if (genderFlag == 0) {
            baseProb = 1.0 * femaleTotal / genderTotal;
        } else {
            baseProb = 1.0 * maleTotal / genderTotal;
        }
        for (char nameChar : nameChars) {
            // Characters missing from the table fall back to "0,0" and zero out the product
            baseProb *= Double.parseDouble(genderMap.getOrDefault(String.valueOf(nameChar), "0,0").split(",")[genderFlag]);
        }
        return baseProb;
    }

}
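The class above is essentially a naive Bayes calculation: start from the share of each gender in the data, multiply by each character's frequency within that gender, then normalize. The sketch below reproduces that arithmetic on made-up counts so the numbers can be checked by hand. `GenderProbSketch` and its counts are hypothetical, and unlike the class above it returns both normalized probabilities instead of a single label.

```java
import java.util.HashMap;
import java.util.Map;

final class GenderProbSketch {

    // Made-up counts for illustration only: char -> {male, female}
    static final Map<Character, int[]> COUNTS = new HashMap<>();
    static {
        COUNTS.put('\u82b3', new int[] {10, 90}); // U+82B3 "fang"
        COUNTS.put('\u521a', new int[] {80, 10}); // U+521A "gang"
    }

    /** Returns {P(male | givenName), P(female | givenName)}, or {0, 0} if undecidable. */
    static double[] posterior(String givenName) {
        long maleTotal = 0;
        long femaleTotal = 0;
        for (int[] c : COUNTS.values()) {
            maleTotal += c[0];
            femaleTotal += c[1];
        }
        double total = maleTotal + femaleTotal;
        double pm = maleTotal / total;   // prior for male
        double pf = femaleTotal / total; // prior for female
        for (char ch : givenName.toCharArray()) {
            int[] c = COUNTS.get(ch);
            // Unknown characters contribute 0, as in the class above
            pm *= (c == null) ? 0.0 : (double) c[0] / maleTotal;
            pf *= (c == null) ? 0.0 : (double) c[1] / femaleTotal;
        }
        double sum = pm + pf;
        return sum == 0.0
                ? new double[] {0.0, 0.0}
                : new double[] {pm / sum, pf / sum};
    }

    public static void main(String[] args) {
        double[] p = posterior("\u82b3");
        System.out.println("male=" + p[0] + ", female=" + p[1]);
    }
}
```

With these toy counts, the single character U+82B3 comes out at roughly 0.1 male and 0.9 female, which matches its raw 10:90 split. Repeating the character pushes the female probability higher, which is how a name like the one in the usage example can score well above any single character.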

2.3 Usage

charfreq.csv contains 9943 records. Loading the whole utility class took about 88 ms (measured only once).

Map<String, String> resultMap = genderUtils.guessGenderByName("刘芳芳");
// "female": "0.9835037905504539"
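The returned map holds a single entry whose key is the label ("male", "female" or "unknown") and whose value is the probability as a string. If a single number for "how masculine or feminine" is more convenient, one option is a signed score. This is only a sketch: `GenderResultReader` and the sign convention (positive for male, negative for female) are hypothetical and not part of NGender.

```java
import java.util.Collections;
import java.util.Map;

final class GenderResultReader {

    /** Returns +p for "male", -p for "female", and 0 for anything else. */
    static double signedScore(Map<String, String> result) {
        if (result == null || result.isEmpty()) {
            return 0.0;
        }
        // The map produced by guessGenderByName has exactly one entry
        Map.Entry<String, String> entry = result.entrySet().iterator().next();
        double p = Double.parseDouble(entry.getValue());
        switch (entry.getKey()) {
            case "male":
                return p;
            case "female":
                return -p;
            default:
                return 0.0;
        }
    }

    public static void main(String[] args) {
        // Same shape as the result shown in the usage example above
        Map<String, String> result =
                Collections.singletonMap("female", "0.9835037905504539");
        System.out.println(signedScore(result));
    }
}
```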

3. Other

Python version of NGender download address and introduction.


Origin juejin.im/post/7087402016871284743