RSS Reader using SAX Parser losing characters from title

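I'm trying to use a SAX parser to return the contents of the RSS feed at http://pitchfork.com/rss/news/, but I have a problem displaying the titles: some show only part of the text, or only a closing ">".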

How can I modify my handler class to prevent this? I think I should use a StringBuilder or StringBuffer, but I'm not sure how to implement it.

RssParseHandler.java

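import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class RssParseHandler extends DefaultHandler {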
//Parsed items
private List<RssItem> rssItems;
private RssItem currentItem;
private boolean parsingTitle;
private boolean parsingLink;
private boolean parsing_id;
private boolean parsingDescription;

public RssParseHandler() {
    rssItems = new ArrayList<RssItem>();
}

public List<RssItem> getItems() {
    return rssItems;
}

//Creates empty RssItem object during the process of an item start tag
//Indicators are set to true when particular tag is being processed
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {

    if ("item".equals(qName)) {
        currentItem = new RssItem();

    } else if ("title".equals(qName)) {
        parsingTitle = true;


    } else if ("link".equals(qName)) {
        parsingLink = true;


    } else if ("_id".equals(qName)) {
        parsing_id = true;


    } else if ("description".equals(qName)) {
        parsingDescription = true;

    }
}

//Current RssItem is added to the list following process of end tag
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {

    if ("item".equals(qName)) {
        rssItems.add(currentItem);
        currentItem = null;

    } else if ("title".equals(qName)) {
        parsingTitle = false;

    } else if ("link".equals(qName)) {
        parsingLink = false;

    } else if ("_id".equals(qName)) {
        parsing_id = false;

    } else if ("description".equals(qName)) {
        parsingDescription = false;
    }
}

@Override
public void characters(char[] ch, int start, int length) throws SAXException {

    if (parsingTitle) {
        if (currentItem != null)
            currentItem.setTitle(new String(ch, start, length));

    } else if (parsingLink) {
        if (currentItem != null) {
            currentItem.setLink(new String(ch, start, length));
            parsingLink = false;
        }

    } else if (parsing_id) {
        if (currentItem != null) {
            currentItem.set_id(new String(ch, start, length));
            parsing_id = false;
        }

    } else if (parsingDescription) {
        if (currentItem != null) {
            currentItem.setDescription(new String(ch, start, length));
            parsingDescription = false;
        }

    }
}
} // RssParseHandler
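The question's `characters()` stores each chunk with `setTitle(new String(ch, start, length))`, so every call replaces the previous one and only the last chunk survives. The minimal sketch below reproduces that by driving two handlers by hand and feeding one title in three chunks, which a SAX parser is allowed to do. `ChunkDemo`, `OverwritingHandler`, `AccumulatingHandler` and the chunk boundaries are all hypothetical, chosen for illustration; a real parser decides for itself where to split.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkDemo {

    // Common base so both handlers expose the title they captured
    abstract static class TitleHandler extends DefaultHandler {
        String title;
    }

    // Mimics the question's handler: every characters() call replaces the title
    static class OverwritingHandler extends TitleHandler {
        private boolean parsingTitle;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("title".equals(qName)) parsingTitle = true;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (parsingTitle) title = new String(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) parsingTitle = false;
        }
    }

    // Appends every chunk and reads the buffer only when the element ends
    static class AccumulatingHandler extends TitleHandler {
        private final StringBuilder sb = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            sb.setLength(0); // fresh buffer for each element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            sb.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) title = sb.toString().trim();
        }
    }

    // Feeds the text of <title>Rock &amp; Roll</title> as three hand-picked chunks
    static String run(TitleHandler h) {
        try {
            h.startElement("", "title", "title", new AttributesImpl());
            for (String chunk : new String[] {"Rock ", "&", " Roll"}) {
                char[] buf = chunk.toCharArray();
                h.characters(buf, 0, buf.length);
            }
            h.endElement("", "title", "title");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return h.title;
    }

    public static void main(String[] args) {
        System.out.println("overwrite:  [" + run(new OverwritingHandler()) + "]");
        System.out.println("accumulate: [" + run(new AccumulatingHandler()) + "]");
    }
}
```

With these chunks the overwriting handler ends up with `" Roll"`, while the accumulating one returns `"Rock & Roll"`. If a title ended with an entity such as `&gt;`, the last chunk could plausibly be just `>`, which would match the symptom described above.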

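Build the element text with a StringBuilder instead of creating a new String instance on every call because, as the documentation for characters() says: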

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.

@CommonsWare says exactly this in his post here.
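To watch this happen with a real parser, the sketch below parses a small document through the JDK's `SAXParserFactory`, records every chunk passed to `characters()` inside `<title>`, and appends each one to a `StringBuilder`. The class and method names (`ChunkLogger`, `parseTitle`) and the sample XML are hypothetical. How many chunks you get depends on the parser implementation and its internal buffering (an entity such as `&amp;` may or may not cause a split), but the appended text is always the complete title.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkLogger {

    // Raw chunks reported inside <title>, and the text they add up to
    static class Result {
        final List<String> chunks = new ArrayList<>();
        final StringBuilder text = new StringBuilder();
    }

    static Result parseTitle(String xml) {
        Result result = new Result();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTitle;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) inTitle = true;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) {
                    result.chunks.add(new String(ch, start, length)); // one entry per callback
                    result.text.append(ch, start, length);            // accumulate, never overwrite
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) inTitle = false;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = parseTitle("<rss><channel><item><title>Rock &amp; Roll</title></item></channel></rss>");
        System.out.println(r.chunks.size() + " chunk(s): " + r.chunks);
        System.out.println("accumulated: " + r.text);
    }
}
```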

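Build your element text with a StringBuilder, as shown below, because the characters arrive in chunks rather than as one whole string (which explains the incomplete titles!). You may or may not need the isBuilding flag; I don't know your whole implementation, so I added it just in case.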

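    // Accumulates the text of the element currently being parsed
    StringBuilder mSb;
    boolean isBuilding;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {

        // Start a fresh buffer for every element
        mSb = new StringBuilder();
        isBuilding = true;

        if (qName.equals("title")) {
            parsingTitle = true;
        }
        // ...
        // ...
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one element's text in several chunks,
        // so append each chunk instead of replacing the previous one
        if (mSb != null && isBuilding) {
            mSb.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {

        if (parsingTitle) {
            // All chunks have arrived by now, so the buffer holds the full title.
            // currentItem is null for the channel-level <title>, which precedes any <item>.
            if (currentItem != null) {
                currentItem.setTitle(mSb.toString().trim());
            }
            parsingTitle = false;
            isBuilding = false;
        }
    }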
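The same idea extends to every field: clear one buffer when an element starts, append in `characters()`, and read the buffer only in `endElement()`. The self-contained sketch below does this for `title`, `link` and `description`. It assumes a stub `RssItem` with public fields rather than the question's `RssItem` class with setters, and the class name `BufferedRssHandler`, the `parse` helper and the sample feed are hypothetical. It also drops the per-tag boolean flags, because `endElement()` already receives the tag name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class BufferedRssHandler extends DefaultHandler {

    // Hypothetical stand-in for the question's RssItem class
    public static class RssItem {
        public String title, link, description;
    }

    private final List<RssItem> items = new ArrayList<>();
    private final StringBuilder text = new StringBuilder(); // text of the current element
    private RssItem currentItem;

    public List<RssItem> getItems() { return items; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // discard text collected for the previous element
        if ("item".equals(qName)) {
            currentItem = new RssItem();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (currentItem == null) {
            return; // ignore channel-level <title>, <link>, etc.
        }
        String value = text.toString().trim();
        switch (qName) {
            case "title": currentItem.title = value; break;
            case "link": currentItem.link = value; break;
            case "description": currentItem.description = value; break;
            case "item":
                items.add(currentItem);
                currentItem = null;
                break;
            default: break;
        }
        text.setLength(0);
    }

    // Parses an RSS document held in a string
    public static List<RssItem> parse(String xml) {
        BufferedRssHandler handler = new BufferedRssHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("RSS parse failed", e);
        }
        return handler.getItems();
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><title>Feed</title>"
                + "<item><title>Rock &amp; Roll &gt; Jazz</title>"
                + "<link>http://example.com/a</link>"
                + "<description>Hello</description></item>"
                + "</channel></rss>";
        for (RssItem item : parse(xml)) {
            System.out.println(item.title + " | " + item.link);
        }
    }
}
```

Resetting the buffer in `startElement()` works here because `title`, `link` and `description` are leaf elements in RSS. An element with child elements would need a stack of buffers instead of a single one.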