"Content is not allowed in trailing section." when parsing with SAX java

This is a follow-up question. I get this error when I try to parse my XML file:

Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 68; columnNumber: 12; Content is not allowed in trailing section.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$TrailingMiscDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at convert.ExcelXmlReader.getAndParseFile(ExcelXmlReader.java:55)
at convert.ExcelXmlReader.main(ExcelXmlReader.java:24)

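The "lineNumber: 68; columnNumber: 12;" position points at the last ">" in my XML file. Even after deleting the whitespace after it, I still get the error. I ran the file through an XML validator, but it didn't report anything. I'm really not sure what I'm doing wrong. I tried the solutions from other Stack Overflow questions (checking the file for stray characters after the XML content, making sure all tags are closed), but none of them worked for me.

Can anyone tell me where to go from here? Which direction is best?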

<?xml version="1.0" encoding="utf-16"?>
<?mso-application progid="Excel.Sheet"?>

<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>marc</Author>
<LastAuthor>ESDI</LastAuthor>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>7560</WindowHeight>
<WindowWidth>12300</WindowWidth>
<WindowTopX>360</WindowTopX>
<WindowTopY>135</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21">
<NumberFormat ss:Format="Short Date"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table x:FullColumns="1" x:FullRows="1">
            <Row>
    <Cell><Data ss:Type="String">Crt. Dte</Data></Cell>
    <Cell><Data ss:Type="String">WR Status</Data></Cell>
    <Cell><Data ss:Type="String">Request Plant</Data></Cell>
    <Cell><Data ss:Type="String">Request #</Data></Cell>    
    <Cell><Data ss:Type="String">Item#</Data></Cell>
    <Cell><Data ss:Type="String">Request Cost Center</Data></Cell>
    <Cell><Data ss:Type="String">WR Description</Data></Cell>
    <Cell><Data ss:Type="String">W/O No</Data></Cell>
    <Cell><Data ss:Type="String">Charge Plant</Data></Cell>
    <Cell><Data ss:Type="String">Charge Cost Center</Data></Cell>
    <Cell><Data ss:Type="String">Equip NO</Data></Cell>
    <Cell><Data ss:Type="String">Equipment Name</Data></Cell>
    <Cell><Data ss:Type="String">Required Date</Data></Cell>
    <Cell><Data ss:Type="String">WO Type</Data></Cell>
    <Cell><Data ss:Type="String">Exec. C/C</Data></Cell>
    <Cell><Data ss:Type="String">Exec. Plant</Data></Cell>  
    <Cell><Data ss:Type="String">Plant1</Data></Cell>
    <Cell><Data ss:Type="String">Area</Data></Cell>
    <Cell><Data ss:Type="String">Confirmed</Data></Cell>
    <Cell><Data ss:Type="String">WO Status</Data></Cell>
    <Cell><Data ss:Type="String">W/R Requester</Data></Cell>

            </Row>

</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Selected/>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>

This is my current parsing code. Most of the other code is in the previous question linked above.

private static void getAndParseFile() throws Exception {
        System.out.println("getAndParseFile");
        String fileName="C:\\Users\\windowsUserName\\Downloads\\F7BAH1P_List.xml";

        File file = new File(fileName);

        removeLineFromFile(file.getAbsolutePath());
        System.out.println("Finished Removing Lines");


        String fileContent = IOUtils.toString(new FileInputStream(file));
        fileContent = fileContent.substring(0, fileContent.lastIndexOf('>')+1);
        fileContent = fileContent.replaceAll("&#","");



        PrintWriter pw = null;
        pw = new PrintWriter(new FileWriter("C:\\Users\\windowsUserName\\Downloads\\tempfile.txt"));
        pw.println(fileContent);
        pw.flush();

        ByteArrayInputStream bis = new ByteArrayInputStream(Charset.forName("UTF-16").encode(fileContent).array());


        SAXParserFactory parserFactor = SAXParserFactory.newInstance();
        SAXParser parser = parserFactor.newSAXParser();
        SAXHandler handler = new SAXHandler();

        parser.parse(bis, handler);

    }

removeLineFromFile deletes the first two and the last two <Row></Row> elements of the XML file; those rows are either empty or contain some counter/title data.

private static void removeLineFromFile(String file) {

        BufferedReader br = null;
        PrintWriter pw = null;
        try {
            File inFile = new File(file);
            if (!inFile.isFile()) {
                return;
            }

            br = new BufferedReader(new FileReader(file));

            String line = null;
            int totalRows=0;
            boolean continueMethod = false;
            //Count total number of rows in file
            while ((line = br.readLine()) != null) {
                //check if file is already formatted
                if (line.contains("List for Work")){
                    continueMethod = true;
                }

                if (line.toLowerCase().contains("</row>")){
                        ++totalRows;
                    }
                }

            if (continueMethod)
            {
                //Create a temporary file to hold the file with deleted lines.
                File tempFile = new File(inFile.getAbsolutePath() + ".tmp");
                pw = new PrintWriter(new FileWriter(tempFile));

                line = null;
                br.close();
                br = null;
                br = new BufferedReader(new FileReader(file));
                boolean ignoreMe = false;
                int rowCounter = 0;
                int rowCloser = 0;
                //begin cycling through file and writing to new one.
                while((line = br.readLine()) != null)
                {
                    //if runs into a row, count it.
                    if (line.toLowerCase().contains("<row>")){
                        rowCounter++;
                    }
                    if (line.toLowerCase().contains("</row>")){
                        rowCloser++;
                    }
                    //Delete the first two, and last two lines
                    if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
                    {
                        ignoreMe = true;
                        //If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
                        if (rowCloser==totalRows)
                            rowCounter++;                   
                    }
                    else
                    {
                        ignoreMe = false;
                    }
                    //copy over other lines
                    if (!ignoreMe)
                    {
                        pw.println(line);
                        pw.flush();
                    }
                }   
                br.close();
                pw.close();
                //Delete the original file
                if (!inFile.delete()) {
                    System.out.println("Could not delete original file");
                    return;
                }

                //Rename the new file to the filename the original file had.
                if (!tempFile.renameTo(inFile))
                    System.out.println("Could not rename temp file");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

Here is the XML file before it goes through removeLineFromFile:
<?xml version="1.0" encoding="utf-16"?>
<?mso-application progid="Excel.Sheet"?>

<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>marc</Author>
<LastAuthor>ESDI</LastAuthor>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>7560</WindowHeight>
<WindowWidth>12300</WindowWidth>
<WindowTopX>360</WindowTopX>
<WindowTopY>135</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21">
<NumberFormat ss:Format="Short Date"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table x:FullColumns="1" x:FullRows="1">
<Row>
<Cell><Data ss:Type="String">List for Work Request(F7BAH1P)</Data></Cell>
</Row>
<Row>
</Row>
            <Row>
    <Cell><Data ss:Type="String">Crt. Dte</Data></Cell>
    <Cell><Data ss:Type="String">WR Status</Data></Cell>
    <Cell><Data ss:Type="String">Request Plant</Data></Cell>
    <Cell><Data ss:Type="String">Request #</Data></Cell>    
    <Cell><Data ss:Type="String">Item#</Data></Cell>
    <Cell><Data ss:Type="String">Request Cost Center</Data></Cell>
    <Cell><Data ss:Type="String">WR Description</Data></Cell>
    <Cell><Data ss:Type="String">W/O No</Data></Cell>
    <Cell><Data ss:Type="String">Charge Plant</Data></Cell>
    <Cell><Data ss:Type="String">Charge Cost Center</Data></Cell>
    <Cell><Data ss:Type="String">Equip NO</Data></Cell>
    <Cell><Data ss:Type="String">Equipment Name</Data></Cell>
    <Cell><Data ss:Type="String">Required Date</Data></Cell>
    <Cell><Data ss:Type="String">WO Type</Data></Cell>
    <Cell><Data ss:Type="String">Exec. C/C</Data></Cell>
    <Cell><Data ss:Type="String">Exec. Plant</Data></Cell>  
    <Cell><Data ss:Type="String">Plant1</Data></Cell>
    <Cell><Data ss:Type="String">Area</Data></Cell>
    <Cell><Data ss:Type="String">Confirmed</Data></Cell>
    <Cell><Data ss:Type="String">WO Status</Data></Cell>
    <Cell><Data ss:Type="String">W/R Requester</Data></Cell>

            </Row>






 <Row>
</Row>
<Row>
<Cell><Data ss:Type="String">Count: 244</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Selected/>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>

If the actual encoding of your file does not match the encoding in the XML declaration, you can get parse errors:

<?xml version="1.0" encoding="utf-16"?>

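FileWriter and FileReader assume that the platform's default character encoding is acceptable (on my system it is UTF-8). You can't rely on them to handle UTF-16-encoded files portably. Here is their documentation: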

Convenience class for writing character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable. To specify these values yourself, construct an OutputStreamWriter on a FileOutputStream.

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

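So you need to do what the documentation suggests and use the alternative: an InputStreamReader on a FileInputStream for reading, and an OutputStreamWriter on a FileOutputStream for writing, each constructed with an explicit charset.

Below is some quick test code that demonstrates the problem you are running into, using three different implementations of the removeLineFromFile method: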
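As a minimal sketch of that alternative (the class name ExplicitCharsetIo and its two helpers are my own, not from the question), these are drop-in replacements for `new FileReader(file)` and `new FileWriter(file)` where the caller chooses the charset. With StandardCharsets.UTF_16, Java writes a big-endian byte-order mark on output and uses the byte-order mark, if present, on input.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExplicitCharsetIo {

    // Replacement for new FileReader(file): the charset is chosen by the caller,
    // not taken from the platform default.
    static List<String> readLines(File file, Charset cs) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), cs))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Replacement for new FileWriter(file): same idea for writing.
    static void writeLines(File file, List<String> lines, Charset cs) throws IOException {
        try (PrintWriter pw = new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), cs)))) {
            for (String line : lines) {
                pw.println(line);
            }
            // PrintWriter swallows IOExceptions; checkError() flushes and reports them.
            if (pw.checkError()) {
                throw new IOException("Error while writing " + file);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sample-utf16", ".xml");
        tmp.deleteOnExit();
        List<String> xml = Arrays.asList(
                "<?xml version=\"1.0\" encoding=\"utf-16\"?>",
                "<Workbook/>");

        writeLines(tmp, xml, StandardCharsets.UTF_16);
        System.out.println("Bytes on disk: " + tmp.length());
        System.out.println("Round trip OK: " + readLines(tmp, StandardCharsets.UTF_16).equals(xml));
    }
}
```

On Java 7 and later, `Files.newBufferedReader(path, StandardCharsets.UTF_16)` and `Files.newBufferedWriter(path, StandardCharsets.UTF_16)` give you the same explicit-charset behaviour in a single call.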

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Encoding {
    
    private static File removeLineFromFile2(String file) {

        File ret = null;
        
        BufferedReader br = null;
        PrintWriter pw = null;
        try {
            File inFile = new File(file);
            if (!inFile.isFile()) {
                return ret;
            }

            ret = inFile;
            
            br = new BufferedReader(new InputStreamReader(
                    new FileInputStream(file), "UTF-16"));

            String line = null;
            int totalRows=0;
            boolean continueMethod = false;
            //Count total number of rows in file
            while ((line = br.readLine()) != null) {
                //check if file is already formatted
                if (line.contains("List for Work")){
                    continueMethod = true;
                }

                if (line.toLowerCase().contains("</row>")){
                        ++totalRows;
                    }
                }

            if (continueMethod)
            {
                //Create a temporary file to hold the file with deleted lines.
                File tempFile = new File(inFile.getAbsolutePath() + ".2.tmp");
                pw = new PrintWriter(new OutputStreamWriter(
                    new FileOutputStream(tempFile), "UTF-16"));

                line = null;
                br.close();
                br = null;
                br = new BufferedReader(new InputStreamReader(
                    new FileInputStream(file), "UTF-16"));
                boolean ignoreMe = false;
                int rowCounter = 0;
                int rowCloser = 0;
                //begin cycling through file and writing to new one.
                while((line = br.readLine()) != null)
                {
                    //if runs into a row, count it.
                    if (line.toLowerCase().contains("<row>")){
                        rowCounter++;
                    }
                    if (line.toLowerCase().contains("</row>")){
                        rowCloser++;
                    }
                    //Delete the first two, and last two lines
                    if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
                    {
                        ignoreMe = true;
                        //If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
                        if (rowCloser==totalRows)
                            rowCounter++;                   
                    }
                    else
                    {
                        ignoreMe = false;
                    }
                    //copy over other lines
                    if (!ignoreMe)
                    {
                        pw.println(line);
                        pw.flush();
                    }
                }   
                br.close();
                pw.close();
                System.out.println("Temp file is: " + tempFile.getAbsolutePath());
                ret = tempFile;
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return ret;
    }
    
    private static File removeLineFromFile1(String file) {
        
        File ret = null;

        BufferedReader br = null;
        PrintWriter pw = null;
        try {
            File inFile = new File(file);
            if (!inFile.isFile()) {
                return ret;
            }

            ret = inFile;
            
            br = new BufferedReader(new InputStreamReader(
                    new FileInputStream(file), "UTF-16"));

            String line = null;
            int totalRows=0;
            boolean continueMethod = false;
            //Count total number of rows in file
            while ((line = br.readLine()) != null) {
                //check if file is already formatted
                if (line.contains("List for Work")){
                    continueMethod = true;
                }

                if (line.toLowerCase().contains("</row>")){
                        ++totalRows;
                    }
                }

            if (continueMethod)
            {
                //Create a temporary file to hold the file with deleted lines.
                File tempFile = new File(inFile.getAbsolutePath() + ".1.tmp");
                pw = new PrintWriter(new FileWriter(tempFile));

                line = null;
                br.close();
                br = null;
                br = new BufferedReader(new InputStreamReader(
                    new FileInputStream(file), "UTF-16"));
                boolean ignoreMe = false;
                int rowCounter = 0;
                int rowCloser = 0;
                //begin cycling through file and writing to new one.
                while((line = br.readLine()) != null)
                {
                    //if runs into a row, count it.
                    if (line.toLowerCase().contains("<row>")){
                        rowCounter++;
                    }
                    if (line.toLowerCase().contains("</row>")){
                        rowCloser++;
                    }
                    //Delete the first two, and last two lines
                    if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
                    {
                        ignoreMe = true;
                        //If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
                        if (rowCloser==totalRows)
                            rowCounter++;                   
                    }
                    else
                    {
                        ignoreMe = false;
                    }
                    //copy over other lines
                    if (!ignoreMe)
                    {
                        pw.println(line);
                        pw.flush();
                    }
                }   
                br.close();
                pw.close();
                System.out.println("Temp file is: " + tempFile.getAbsolutePath());
                ret = tempFile;
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return ret;
    }
    
    private static File removeLineFromFile(String file) {

        File ret = null;
        
        BufferedReader br = null;
        PrintWriter pw = null;
        try {
            File inFile = new File(file);
            if (!inFile.isFile()) {
                return ret;
            }

            ret = inFile;
            
            br = new BufferedReader(new FileReader(file));

            String line = null;
            int totalRows=0;
            boolean continueMethod = false;
            //Count total number of rows in file
            while ((line = br.readLine()) != null) {
                //check if file is already formatted
                if (line.contains("List for Work")){
                    continueMethod = true;
                }

                if (line.toLowerCase().contains("</row>")){
                        ++totalRows;
                    }
                }

            if (continueMethod)
            {
                //Create a temporary file to hold the file with deleted lines.
                File tempFile = new File(inFile.getAbsolutePath() + ".tmp");
                pw = new PrintWriter(new FileWriter(tempFile));

                line = null;
                br.close();
                br = null;
                br = new BufferedReader(new FileReader(file));
                boolean ignoreMe = false;
                int rowCounter = 0;
                int rowCloser = 0;
                //begin cycling through file and writing to new one.
                while((line = br.readLine()) != null)
                {
                    //if runs into a row, count it.
                    if (line.toLowerCase().contains("<row>")){
                        rowCounter++;
                    }
                    if (line.toLowerCase().contains("</row>")){
                        rowCloser++;
                    }
                    //Delete the first two, and last two lines
                    if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
                    {
                        ignoreMe = true;
                        //If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
                        if (rowCloser==totalRows)
                            rowCounter++;                   
                    }
                    else
                    {
                        ignoreMe = false;
                    }
                    //copy over other lines
                    if (!ignoreMe)
                    {
                        pw.println(line);
                        pw.flush();
                    }
                }   
                br.close();
                pw.close();
                System.out.println("Temp file is: " + tempFile.getAbsolutePath());
                ret = tempFile;
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return ret;
    }
    
    private static void parse(File file) {
        try {
            System.out.println("Parsing " + file.getAbsolutePath());
            
            SAXParserFactory parserFactor = SAXParserFactory.newInstance();
            SAXParser parser = parserFactor.newSAXParser();
            DefaultHandler handler = new DefaultHandler();
            
            parser.parse(file, handler);
        } catch (Exception ex) {
            System.out.println("An exception occurred: " + ex.getMessage());
        } finally {
            System.out.println("Done with " + file.getAbsolutePath());
        }
    }
    
    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        System.out.println("getAndParseFile");
        String fileName=args[0];

        File file = new File(fileName);

        File f2 = removeLineFromFile2(file.getAbsolutePath());
        File f1 = removeLineFromFile1(file.getAbsolutePath());
        File f = removeLineFromFile(file.getAbsolutePath());
        System.out.println("Finished Removing Lines");
        
        parse(f2);
        parse(f1);
        parse(f);
    }
}

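removeLineFromFile2 shows what you need to do; removeLineFromFile1 shows what happens if you read the content correctly but write it the wrong way (which I suspect is your situation); and removeLineFromFile is your implementation, which does nothing on my system: decoding the UTF-16 bytes as UTF-8, it never finds "List for Work", so the original file is left unchanged. Output: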
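To make the "does nothing" case concrete, here is a small sketch (the class and method names are mine) that simulates what a UTF-8 FileReader sees in a UTF-16 file: it encodes a line from the sample file as UTF-16 and decodes those bytes as UTF-8. Each ASCII character ends up preceded by a NUL character (U+0000), so neither `contains("List for Work")` nor `contains("</row>")` can match.

```java
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {

    // Simulates new FileReader(file) on a UTF-8 platform reading a UTF-16 file:
    // the bytes are UTF-16, but they are decoded as UTF-8.
    static String decodeUtf16BytesAsUtf8(String text) {
        byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16); // BOM + 2 bytes per char
        return new String(utf16Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String title = "<Cell><Data ss:Type=\"String\">List for Work Request(F7BAH1P)</Data></Cell>";
        String closing = "</Row>";

        // The checks removeLineFromFile performs, on correctly decoded text...
        System.out.println(title.contains("List for Work"));
        System.out.println(closing.toLowerCase().contains("</row>"));

        // ...and on the same text after the wrong decoding.
        System.out.println(decodeUtf16BytesAsUtf8(title).contains("List for Work"));
        System.out.println(decodeUtf16BytesAsUtf8(closing).toLowerCase().contains("</row>"));
    }
}
```

This prints true, true, false, false: with the wrong decoding, continueMethod stays false and totalRows stays 0, so the method returns without touching the file.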

getAndParseFile
Temp file is: \path\to\sample-utf16.xml.2.tmp
Temp file is: \path\to\sample-utf16.xml.1.tmp
Finished Removing Lines
Parsing \path\to\sample-utf16.xml.2.tmp
Done with \path\to\sample-utf16.xml.2.tmp
Parsing \path\to\sample-utf16.xml.1.tmp
An exception occurred: Content is not allowed in prolog.
Done with \path\to\sample-utf16.xml.1.tmp
Parsing \path\to\sample-utf16.xml
Done with \path\to\sample-utf16.xml

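All of the above assumes that your input file really is UTF-16, as stated in its XML declaration. I suspect it isn't. If you created the file yourself, you didn't create it correctly. Try opening it in Notepad++ (or a similar tool) and check the encoding via the Encoding menu: it should say UCS-2 or UTF-16, not ANSI, UTF-8, etc.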
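If you'd rather check from code than in Notepad++, here is a rough sketch (BomSniffer is a hypothetical name) that looks at the first four bytes and applies the non-normative autodetection table from Appendix F of the XML 1.0 specification, leaving out the UTF-32 cases. It is only a heuristic: it tells you the byte layout the file starts with, not whether every byte after that is valid.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class BomSniffer {

    // Guesses the encoding family from the first bytes of an XML file
    // (XML 1.0 Appendix F; UTF-32 cases are not handled here).
    static String guess(byte[] head) {
        int[] b = new int[4];
        for (int i = 0; i < 4; i++) {
            b[i] = i < head.length ? head[i] & 0xFF : -1; // -1 marks "no byte"
        }
        if (b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8 with BOM";
        if (b[0] == 0xFE && b[1] == 0xFF) return "UTF-16BE with BOM";
        if (b[0] == 0xFF && b[1] == 0xFE) return "UTF-16LE with BOM";
        if (b[0] == 0x00 && b[1] == 0x3C && b[2] == 0x00 && b[3] == 0x3F) return "UTF-16BE without BOM";
        if (b[0] == 0x3C && b[1] == 0x00 && b[2] == 0x3F && b[3] == 0x00) return "UTF-16LE without BOM";
        if (b[0] == 0x3C && b[1] == 0x3F && b[2] == 0x78 && b[3] == 0x6D) return "8-bit (UTF-8, ASCII, ANSI, ...)";
        return "unknown";
    }

    // Reads up to n bytes; a single read() call is allowed to return fewer.
    static byte[] readHead(String path, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        try (InputStream in = new FileInputStream(path)) {
            int r;
            while (total < n && (r = in.read(buf, total, n - total)) != -1) {
                total += r;
            }
        }
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the XML file to inspect.
        System.out.println(guess(readHead(args[0], 4)));
    }
}
```

If this reports an 8-bit encoding for a file whose declaration says utf-16, the declaration is wrong for that file.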

Your code should always explicitly specify the file encoding it expects.
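Applied to getAndParseFile, a minimal sketch might look like the one below. It assumes the file really is UTF-16, and the class and method names are my own. It decodes the bytes with an explicit charset and gives the parser a Reader instead of re-encoding the String. According to the InputSource documentation, a character stream is read directly and the encoding declaration is ignored. There is one more thing worth checking in the question's code: `Charset.encode(...).array()` returns the buffer's entire backing array, not just the bytes up to its limit. If that array is larger than the encoded content, the unused tail is zero bytes, and a parser would see them as NUL characters after `</Workbook>`. That alone could produce "Content is not allowed in trailing section" just after the last `>`.

```java
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseInMemory {

    // Parses XML that is already a String. With a character stream the parser
    // disregards the encoding in the XML declaration, so there is no second
    // encode/decode step that could disagree with it.
    static void parseString(String xml, DefaultHandler handler) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
    }

    // If you do need bytes, String.getBytes returns exactly the encoded bytes,
    // whereas ByteBuffer.array() returns the whole backing array, which may be
    // longer than the buffer's limit.
    static byte[] encodeExactly(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the XML file; assumed to really be UTF-16.
        String content = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_16);
        // Same trimming as the question's code: drop anything after the last '>'.
        content = content.substring(0, content.lastIndexOf('>') + 1);

        // Compare what the question's ByteBuffer approach would hand to the parser.
        ByteBuffer bb = StandardCharsets.UTF_16.encode(content);
        System.out.println("encoded bytes: " + bb.remaining() + ", array() length: " + bb.array().length);

        // The question's SAXHandler could be passed here instead of DefaultHandler.
        parseString(content, new DefaultHandler());
        System.out.println("Parsed OK");
    }
}
```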