In Java, how do you deal with double quotes inside a CSV that you need to parse?

Question

Here is what I want to do.

This is my spend.csv file:

    "Date","Description","Detail","Amount"
    "5/03/21","Cinema","Batman","7.90"
    "15/02/20","Groceries","Potatoes","23.00"
    "9/12/21","DIY","Wood Plates","33.99"
    "9/07/22","Fuel","Shell",".00"
    "23/08/19","Lamborghini","Aventador","800,000.00"
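Two details in this data matter for parsing. Every value is wrapped in double quotes, and the last amount, "800,000.00", contains a comma. The following is a quick illustration of what a plain split(",") does with that last row. The class name NaiveSplitDemo is hypothetical.

```java
// Illustration only: what String.split(",") does with a quoted CSV row.
public class NaiveSplitDemo {
    public static String[] naiveSplit(String row) {
        // split treats EVERY comma as a separator, including the one inside "800,000.00".
        return row.split(",");
    }

    public static void main(String[] args) {
        String row = "\"23/08/19\",\"Lamborghini\",\"Aventador\",\"800,000.00\"";
        String[] parts = naiveSplit(row);
        System.out.println(parts.length); // 5
        for (String p : parts) {
            System.out.println(p); // each piece still carries its double quotes
        }
    }
}
```

The row yields five pieces instead of four, and each piece keeps its quote characters.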

Here is the same data in table view:

[Image: table view of the CSV]

This is the output file, spend.xml, that I want:

    <?xml version="1.0" encoding="UTF-8"?>
    <SPEND>
        <RECORD DATE="5/03/21">
            <DESC>Cinema</DESC>
            <DETAIL>Batman</DETAIL>
            <AMOUNT>7.90</AMOUNT>
        </RECORD>
        <RECORD DATE="15/02/20">
            <DESC>Groceries</DESC>
            <DETAIL>Potatoes</DETAIL>
            <AMOUNT>23.00</AMOUNT>
        </RECORD>
        <RECORD DATE="9/12/21">
            <DESC>DIY</DESC>
            <DETAIL>Wood Plates</DETAIL>
            <AMOUNT>33.99</AMOUNT>
        </RECORD>
        <RECORD DATE="9/07/22">
            <DESC>Fuel</DESC>
            <DETAIL>Shell</DETAIL>
            <AMOUNT>.00</AMOUNT>
        </RECORD>
        <RECORD DATE="23/08/19">
            <DESC>Lamborghini</DESC>
            <DETAIL>Aventador</DETAIL>
            <AMOUNT>800,000.00</AMOUNT>
        </RECORD>
    </SPEND>

To do this, I pieced together a few things I found here and there and ended up with this:

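    import java.io.BufferedReader;
    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.OutputStreamWriter;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.StringTokenizer;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Result;
    import javax.xml.transform.Source;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    
    public class Main {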
    
       public static void main(String[] args) throws FileNotFoundException {
         
            List<String> headers = new ArrayList<String>(5);
    
            File file = new File("spend.csv");
            BufferedReader reader = null;
    
            try {
    
                DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
                DocumentBuilder domBuilder = domFactory.newDocumentBuilder();
    
                Document newDoc = domBuilder.newDocument();
                // Root element
                Element rootElement = newDoc.createElement("XMLCreators");
                newDoc.appendChild(rootElement);
    
                reader = new BufferedReader(new FileReader(file));
                int line = 0;
    
                String text = null;
                while ((text = reader.readLine()) != null) {
    
                    StringTokenizer st = new StringTokenizer(text, "", false);
    
                    int index = 0;
    
    
                    String[] rowValues = text.split(",");
    
                    if (line == 0) { // Header row
                        for (String col : rowValues) {
                            headers.add(col);
                        }
                    } else { // Data row
                        Element rowElement = newDoc.createElement("RECORDS");
                        rootElement.appendChild(rowElement);
                        for (int col = 0; col < headers.size(); col++) {
                            String header = headers.get(col);
                            String value = null;
    
                            if (col < rowValues.length) {
                                value = rowValues[col];
                            } else {
                                value = "";
                            }
    
                            Element curElement = newDoc.createElement(header);
                            curElement.appendChild(newDoc.createTextNode(value));
                            rowElement.appendChild(curElement);
                        }
                    }
                    line++;
                }
    
                ByteArrayOutputStream baos = null;
                OutputStreamWriter osw = null;
    
                try {
                    baos = new ByteArrayOutputStream();
                    osw = new OutputStreamWriter(baos);
    
                    TransformerFactory tranFactory = TransformerFactory.newInstance();
                    Transformer aTransformer = tranFactory.newTransformer();
                    aTransformer.setOutputProperty(OutputKeys.INDENT, "yes");
                    aTransformer.setOutputProperty(OutputKeys.METHOD, "xml");
                    aTransformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
    
                    Source src = new DOMSource(newDoc);
                    Result result = new StreamResult(osw);
                    aTransformer.transform(src, result);
    
                    osw.flush();
                    System.out.println(new String(baos.toByteArray()));
                } catch (Exception exp) {
                    exp.printStackTrace();
                } finally {
                    try {
                        osw.close();
                    } catch (Exception e) {
                    }
                    try {
                        baos.close();
                    } catch (Exception e) {
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
    
        }
    }

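At this point the program should print the XML to the terminal. Unfortunately, because every value in my CSV file is wrapped in double quotes, I get this error: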

org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
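The exception is raised by the DOM, not by the CSV reading. With text.split(","), the header cells come back as "Date", "Description" and so on, with the quote characters still included. The code then passes them to createElement, which rejects any name containing a character that is not legal in an XML name. The following minimal sketch reproduces this in isolation, assuming the JDK's built-in DOM implementation, which checks element names by default. ElementNameCheck and isAcceptedElementName are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical helper: does the DOM accept this string as an element name?
public class ElementNameCheck {
    public static boolean isAcceptedElementName(String name) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        try {
            doc.createElement(name);
            return true;
        } catch (DOMException e) {
            // INVALID_CHARACTER_ERR: the name contains a character that is not
            // legal in an XML name, e.g. the double quotes around "Date".
            if (e.code == DOMException.INVALID_CHARACTER_ERR) {
                return false;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        // Header cell as produced by text.split(","): quotes included.
        System.out.println(isAcceptedElementName("\"Date\"")); // false
        System.out.println(isAcceptedElementName("Date"));     // true
    }
}
```

Quotes in element values are not the problem: text nodes can hold them, and the serializer escapes them where needed. Only the element names built from the header row have to be free of them.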

I think I'm missing something around these lines:

    StringTokenizer st = new StringTokenizer(text, "", false);
    int index = 0;
    String[] rowValues = text.split(",");
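Those are indeed the lines to change, because split(",") cannot tell a separating comma from one inside quotes. The following is a minimal quote-aware splitter, written as a hypothetical CsvLine.parse helper. It reads each line as-is, so the quotes stay in spend.csv, and it returns the values without their surrounding quotes. It assumes comma separators, RFC 4180-style quoting with "" as an escaped quote, and one record per physical line.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of an RFC 4180-style field splitter for ONE CSV line.
// Not handled: records that span several lines (newlines inside quotes).
public class CsvLine {
    public static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote: "" -> "
                        i++;
                    } else {
                        inQuotes = false;    // closing quote, not part of the value
                    }
                } else {
                    current.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote, not part of the value
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        String row = "\"23/08/19\",\"Lamborghini\",\"Aventador\",\"800,000.00\"";
        System.out.println(parse(row)); // [23/08/19, Lamborghini, Aventador, 800,000.00]
    }
}
```

Used in place of text.split(","), this gives header names such as Date without quotes, which createElement accepts. For messier files, a library parser such as OpenCSV (used in Answer 2) covers more edge cases.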

I want to keep the double quotes in my CSV. If anyone has an idea, please let me know!

Answer 1

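Before you run the conversion, replace the double quotes with a placeholder:

text = text.replaceAll("\"", "####");

Then run the conversion. When it has finished, reverse the step by replacing every "####" in the string with a double quote.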
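The following is a sketch of that replace-and-restore round trip. QuotePlaceholder, hide and restore are hypothetical names. It uses String.replace, which treats both arguments literally, and it returns a new string, because Java strings are immutable and the result must be kept. The placeholder has to be a sequence that never occurs in the real data.

```java
// Sketch of the placeholder round trip; PLACEHOLDER is an arbitrary choice that
// must never occur in the real data, or restore() will corrupt it.
public class QuotePlaceholder {
    static final String PLACEHOLDER = "####";

    // Replace every double quote with the placeholder.
    // String.replace is literal (no regex) and returns a NEW string.
    public static String hide(String text) {
        return text.replace("\"", PLACEHOLDER);
    }

    // Reverse step: turn every placeholder back into a double quote.
    public static String restore(String text) {
        return text.replace(PLACEHOLDER, "\"");
    }

    public static void main(String[] args) {
        String line = "\"5/03/21\",\"Cinema\",\"Batman\",\"7.90\"";
        String hidden = hide(line);
        System.out.println(hidden);                       // ####5/03/21####,####Cinema####,####Batman####,####7.90####
        System.out.println(restore(hidden).equals(line)); // true
    }
}
```

One caveat applies when combining this with the question's code. While the placeholder is in place, it also ends up in the header cells, and "#" is not a legal character in an XML element name either. As a result, createElement would still reject a name like "####Date####", so the header names need their quotes removed rather than replaced.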

Answer 2

Another possible approach, using OpenCSV and Jackson:

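import com.fasterxml.jackson.databind.SerializationFeature;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import com.fasterxml.jackson.dataformat.xml.ser.ToXmlGenerator;
import com.opencsv.bean.CsvToBeanBuilder;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;

public class FileProcessor {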
    public static void main(String[] args) throws IOException {
        List<DataStructure> importList =  new CsvToBeanBuilder<DataStructure>(
                new FileReader("pathIn"))
                    .withIgnoreEmptyLine(true)
                    .withType(DataStructure.class)
                    .build()
                    .parse();

        ListLoader exportList = new ListLoader(importList);

        XmlMapper xmlMapper = new XmlMapper();
        xmlMapper.configure(ToXmlGenerator.Feature.WRITE_XML_DECLARATION, true)
                .enable(SerializationFeature.INDENT_OUTPUT)
                .writeValue(new File("pathOut"), exportList);
    }
}

The class that maps and serializes each record:

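import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty;
import com.opencsv.bean.CsvBindByName;
import lombok.Data;

// Lombok's @Data generates the getters, setters, equals/hashCode and toString.
@Data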
public class DataStructure {
    @CsvBindByName
    @JacksonXmlProperty(isAttribute = true, localName = "DATE")
    private String date;
    @CsvBindByName
    @JacksonXmlProperty(localName = "DESC")
    private String description;
    @CsvBindByName
    @JacksonXmlProperty(localName = "DETAIL")
    private String detail;
    @CsvBindByName
    @JacksonXmlProperty(localName = "AMOUNT")
    private String amount;
}

The class that serializes the complete list:

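import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlElementWrapper;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlRootElement;
import java.util.List;

@JacksonXmlRootElement(localName = "SPEND")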
public class ListLoader {
    @JacksonXmlElementWrapper(useWrapping = false)
    @JacksonXmlProperty(localName = "RECORD")
    private List<DataStructure> list;

    public ListLoader(List<DataStructure> list){
        this.list = list;
    }
}
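For comparison, the following is a dependency-free sketch that keeps the question's DOM/Transformer approach. It splits lines with a quote-aware parser instead of split(","). It also produces the SPEND/RECORD layout from the question, with DATE as an attribute, rather than the XMLCreators/RECORDS names and header-derived element names in the question's code. The class name SpendCsvToXml, the fixed column order (Date, Description, Detail, Amount), and the choice to read spend.csv and write spend.xml in the working directory are all assumptions.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: spend.csv -> SPEND/RECORD XML using only the JDK.
// Assumes a header row first and the columns Date, Description, Detail, Amount.
public class SpendCsvToXml {

    public static String toXml(List<String> lines) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("SPEND");
            doc.appendChild(root);
            for (String line : lines.subList(1, lines.size())) { // skip the header row
                if (line.trim().isEmpty()) {
                    continue;
                }
                List<String> f = parseLine(line);
                if (f.size() < 4) {
                    throw new IllegalArgumentException("Expected 4 fields: " + line);
                }
                Element record = doc.createElement("RECORD");
                record.setAttribute("DATE", f.get(0));
                addText(doc, record, "DESC", f.get(1));
                addText(doc, record, "DETAIL", f.get(2));
                addText(doc, record, "AMOUNT", f.get(3));
                root.appendChild(record);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Element names are fixed constants, so only values come from the CSV;
    // the serializer escapes special characters in text content.
    private static void addText(Document doc, Element parent, String tag, String value) {
        Element child = doc.createElement(tag);
        child.appendChild(doc.createTextNode(value));
        parent.appendChild(child);
    }

    // Quote-aware split: commas inside "..." are data, "" is a literal quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"');
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("spend.csv"), StandardCharsets.UTF_8);
        String xml = toXml(lines);
        System.out.println(xml);
        Files.write(Paths.get("spend.xml"), xml.getBytes(StandardCharsets.UTF_8));
    }
}
```

Some formatting details, such as the exact XML declaration written by the JDK's Transformer, may differ slightly from the sample spend.xml. The element structure and values should match.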


Tags: java, xml, csv, parsing