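Java [Code 20]: Escaping special characters in field values written to a CSV file

Escaping special characters in CSV field values

1. The CSV format

RFC 4180 describes the format as follows.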

  1. Each record is located on a separate line, delimited by a line break (CRLF).

That is, every record occupies its own line, and lines are terminated by CRLF (\r\n).

aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
  2. The last record in the file may or may not have an ending line break.

A writer may therefore omit the line break after the final record, so a reader should accept both forms.

aaa,bbb,ccc CRLF
zzz,yyy,xxx
  3. There may be an optional header line appearing as the first line of the file with the same format as normal record lines. This header will contain names corresponding to the fields in the file and should contain the same number of fields as the records in the rest of the file (the presence or absence of the header line should be indicated via the optional "header" parameter of this MIME type).

The header line, when present, looks like any other record: it names the fields and should have as many fields as the records that follow. Whether it is present is signaled by the optional "header" parameter of the text/csv MIME type.

field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
  4. Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the record must not be followed by a comma.

Fields are separated by commas, every line should have the same number of fields, spaces belong to the field and should not be trimmed, and the last field must not be followed by a comma.

aaa,bbb,ccc
  5. Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

Quoting a field is optional. Note that "may not" here is a prohibition: a field that is not enclosed in double quotes must not contain double quotes.

"aaa","bbb","ccc" CRLF
zzz,yyy,xxx
  6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.

A field that contains any one of these (a line break, a double quote, or a comma) should be enclosed in double quotes.

"aaa","b CRLF bb","ccc" CRLF
zzz,yyy,xxx
  7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.

Inside a quoted field, each double quote is therefore written twice.

"aaa","b""bb","ccc"

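The layout rules above (records on separate lines, an optional trailing line break, an optional header, comma-separated fields of equal count) can be sketched in a few lines of Java. This is only an illustration: the class name CsvLayoutSketch and the method joinRecords are hypothetical, and the sketch assumes that no field needs quoting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvLayoutSketch {
    static final String CRLF = "\r\n";

    // Joins records into CSV text. Assumption: no field contains a comma,
    // a double quote, or a line break, so no quoting is applied here.
    static String joinRecords(List<List<String>> records, boolean trailingLineBreak) {
        if (records.isEmpty()) {
            return "";
        }
        int expected = records.get(0).size();
        List<String> lines = new ArrayList<>();
        for (List<String> record : records) {
            // Every line should contain the same number of fields.
            if (record.size() != expected) {
                throw new IllegalArgumentException(
                        "expected " + expected + " fields but got " + record.size());
            }
            // Commas go between fields only, so the last field is never followed by one.
            lines.add(String.join(",", record));
        }
        String text = String.join(CRLF, lines);
        // The line break after the last record is optional.
        return trailingLineBreak ? text + CRLF : text;
    }

    public static void main(String[] args) {
        List<List<String>> records = Arrays.asList(
                Arrays.asList("field_name", "field_name", "field_name"), // optional header
                Arrays.asList("aaa", "bbb", "ccc"),
                Arrays.asList("zzz", "yyy", "xxx"));
        System.out.print(joinRecords(records, true));
    }
}
```
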
Summary table:

Field before escaping    Field after escaping
abc,d                    "abc,d"
ab"c,d                   "ab""c,d"
"abcd                    """abcd"
ab CRLF cd               "ab CRLF cd"
""                       """"""
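
One way to sanity-check the table is to reverse it: strip the enclosing quotes and collapse each doubled quote, which should give back the value in the left column. The helper below is a hypothetical sketch (unquoteField is not part of the original code) and assumes it receives a single, well-formed field rather than a whole line.

```java
class CsvUnquoteSketch {
    // Reverses the quoting shown in the table.
    // Assumption: `raw` is exactly one well-formed CSV field.
    static String unquoteField(String raw) {
        boolean quoted = raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"");
        if (!quoted) {
            return raw; // unquoted fields are taken literally
        }
        String inner = raw.substring(1, raw.length() - 1);
        // Each pair of double quotes stands for one literal double quote.
        return inner.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        String[] escaped = {
            "\"abc,d\"",
            "\"ab\"\"c,d\"",
            "\"\"\"abcd\"",
            "\"ab\r\ncd\"",
            "\"\"\"\"\"\""
        };
        for (String field : escaped) {
            System.out.println(field + " -> " + unquoteField(field));
        }
    }
}
```
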

2. Java implementation

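private String escapeCsvString(String fieldValue) {
    final String CSV_DELIM = ",";
    final String CSV_QUOTE = "\"";
    // Rule 6: a field containing a comma, a double quote, or a line break
    // should be enclosed in double quotes. Checking for a lone \r or \n also
    // covers CRLF and catches fields that use Unix-style line breaks.
    if (fieldValue.contains(CSV_DELIM) || fieldValue.contains(CSV_QUOTE)
            || fieldValue.contains("\r") || fieldValue.contains("\n")) {
        // Rule 7: escape each double quote inside the field by doubling it,
        // then wrap the whole field in double quotes.
        fieldValue = CSV_QUOTE + fieldValue.replace(CSV_QUOTE, CSV_QUOTE + CSV_QUOTE) + CSV_QUOTE;
    }
    return fieldValue;
}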
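
To show where such a method fits, here is a minimal sketch that escapes each field, joins the fields with commas, ends every record with CRLF, and writes the result to a file. The class CsvWriteSketch, the helper toLine, and the temporary file name are all hypothetical; escape is a local stand-in for escapeCsvString that, as an assumption, also treats a lone CR or LF as a line break.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvWriteSketch {
    // Local stand-in for escapeCsvString: quote the field if it contains
    // a comma, a double quote, or a line break, doubling inner quotes.
    static String escape(String fieldValue) {
        if (fieldValue.contains(",") || fieldValue.contains("\"")
                || fieldValue.contains("\r") || fieldValue.contains("\n")) {
            return "\"" + fieldValue.replace("\"", "\"\"") + "\"";
        }
        return fieldValue;
    }

    // Builds one record: escaped fields joined by commas, terminated by CRLF.
    static String toLine(List<String> fields) {
        List<String> escaped = new ArrayList<>();
        for (String field : fields) {
            escaped.add(escape(field));
        }
        return String.join(",", escaped) + "\r\n";
    }

    public static void main(String[] args) throws IOException {
        StringBuilder csv = new StringBuilder();
        csv.append(toLine(Arrays.asList("id", "comment")));
        csv.append(toLine(Arrays.asList("1", "abc,d")));
        csv.append(toLine(Arrays.asList("2", "ab\"c,d")));

        // The temp file is only for demonstration; a real path would go here.
        Path out = Files.createTempFile("escape-demo", ".csv");
        Files.write(out, csv.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + out);
    }
}
```

Escaping each field before joining matters: joining first and escaping afterwards would quote the delimiters themselves and collapse the record into a single field.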