Converting RSS Feed XML to JSON using Java is Displaying Special Characters
I created a Spring MVC-based RESTful controller that takes a hard-coded RSS HTTP URL and converts it from XML to JSON:
RssFeedController:
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import org.apache.commons.io.IOUtils;
import org.apache.log4j.Logger;
import org.json.JSONObject;
import org.json.XML;
import org.springframework.http.HttpHeaders;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;
import com.fasterxml.jackson.databind.ObjectMapper;

@RestController
public class RssFeedController {

    private HttpHeaders headers = null;

    public RssFeedController() {
        headers = new HttpHeaders();
        headers.add("Content-Type", "application/json");
    }

    @RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json")
    public String getRssFeedAsJson() throws IOException {
        InputStream xml = getInputStreamForURLData("http://www.samplefeed.com/feed");
        String xmlString = IOUtils.toString(xml);
        JSONObject jsonObject = XML.toJSONObject(xmlString);
        ObjectMapper objectMapper = new ObjectMapper();
        Object json = objectMapper.readValue(jsonObject.toString(), Object.class);
        String response = objectMapper.writeValueAsString(json);
        return response;
    }

    public static InputStream getInputStreamForURLData(String targetUrl) {
        URL url = null;
        HttpURLConnection httpConnection = null;
        InputStream content = null;
        try {
            url = new URL(targetUrl);
            URLConnection conn = url.openConnection();
            conn.setRequestProperty("User-Agent", "Mozilla/5.0");
            httpConnection = (HttpURLConnection) conn;
            int responseCode = httpConnection.getResponseCode();
            content = (InputStream) httpConnection.getInputStream();
        }
        catch (MalformedURLException e) {
            e.printStackTrace();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        return content;
    }
}
pom.xml:
<dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20170516</version>
</dependency>
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.5</version>
</dependency>
The original RSS feed contains the following:
<item>
<title>October Fest Weekend</title>
<link>http://www.samplefeed.com/feed/OctoberFestWeekend</link>
<comments>http://www.samplefeed.com/feed/OctoberFestWeekend/#comments</comments>
<pubDate>Wed, 04 Oct 2017 17:08:48 +0000</pubDate>
<dc:creator><![CDATA[John Doe]]></dc:creator>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">http://www.samplefeed.com/feed/?p=9227</guid>
<description><![CDATA[<p>
</p>
<p>Doors Open:6:30pm<br />
Show Begins: 7:30pm<br />
Show Ends (Estimated time): 11:00pm<br />
Location: Staples Center</p>
<p>Directions</p>
<p>Map of ...</p>
<p>The post <a rel="nofollow" href="http://www.samplefeed.com/feed/OctoberFestWeekend/">OctoberFest Weekend</a> appeared first on <a rel="nofollow" href="http://www.samplefeed.com">SampleFeed</a>.</p>
]]></description>
This is rendered as JSON as follows:
{
"guid": {
"content": "http://www.samplefeed.com/feed/?p=9227",
"isPermaLink": false
},
"pubDate": "Wed, 04 Oct 2017 17:08:48 +0000",
"category": "Uncategorized",
"title": "October Fest Weekend",
"description": "<p>\n??</p>\n<p>Doors Open:6:30pm<br />\nShow Begins:?? 7:30pm<br />\nShow Ends (Estimated time):??11:00pm<br />\nLocation: Staples Center</p>\n<p>Directions</p>\n<p>Map of ...</p>\n<p>The post <a rel=\"nofollow\" href=\"http://www.samplefeed.com/feed/OctoberFestWeekend/\">OctoberFest Weekend</a> appeared first on <a rel=\"nofollow\" href=\"http://www.samplefeed.com\">Sample Feed</a>.</p>\n",
"dc:creator": "John Doe",
"link": "http://www.samplefeed.com/feed/OctoberFestWeekend",
"comments": "http://www.samplefeed.com/feed/OctoberFestWeekend/#comments"
}
Notice that the rendered JSON has two question marks ("??") in the value of the "description" key, like this:
"description": "<p>\n??</p>\n
There are also two question marks after "Show Begins:" here:
<br />\nShow Begins:??
And before 11:00pm:
Show Ends (Estimated time):??11:00pm<br />
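This is not the only pattern of special characters being displayed; there are other places where three question marks (???) are generated, and some with four (????).
For example,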
<title>Today’s 20th Annual Karaoke</title>
is rendered like this in JSON:
"title": "Today???s 20th Annual Karaoke"
and
<content:encoded><![CDATA[(Monte Vista High School, NY.). </span></p>]]></content:encoded>
is rendered like this in JSON:
"content:encoded": "(Monte Vista High School, NY.).????</span></p>
In some places the XML contains a dash ("–"):
<strong>Welcome</strong> – Welcome to the Party!
which is rendered in JSON as:
<strong>Welcome</strong>????? Welcome to the Party!
Does anyone know how to set the correct encoding in my code so that I can avoid these bad/special character rendering issues?
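After going through your code line by line, I found the solution, and I am updating my answer for you.
The problem is that special characters come back in the response as "?".
Change this line of code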
@RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json")
to
@RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json;charset=UTF-8")
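When producing JSON, you need to specify the UTF-8 charset in the value of the produces parameter. I apologize for misunderstanding the question in my earlier answer; I have updated it now.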
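To illustrate why the charset parameter can matter, here is a minimal sketch in plain Java (no Spring involved). It rests on an assumption: that a String response with no declared charset ends up encoded as ISO-8859-1, which is the default charset of Spring's StringHttpMessageConverter, although the actual behavior depends on the Spring version and configuration. The class name CharsetEncodingDemo and the helper encodeAndDecode are hypothetical and exist only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original controller.
class CharsetEncodingDemo {

    // Encodes the text to bytes with the given charset (as a server does when
    // writing a response body) and decodes it back with the same charset
    // (as a client honoring the declared charset would).
    static String encodeAndDecode(String text, Charset charset) {
        byte[] wire = text.getBytes(charset);
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        // \u2019 is the right single quotation mark from the feed's title.
        String title = "Today\u2019s 20th Annual Karaoke";

        // ISO-8859-1 cannot represent U+2019, so getBytes substitutes '?'.
        System.out.println(encodeAndDecode(title, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent every Unicode character, so the text survives.
        System.out.println(encodeAndDecode(title, StandardCharsets.UTF_8));
    }
}
```

With ISO-8859-1 the quotation mark is replaced by a single "?", while with UTF-8 the string round-trips unchanged. Declaring charset=UTF-8 in produces asks Spring to take the second path.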
This is how I got rid of the unknown characters (???):
import static java.nio.charset.StandardCharsets.ISO_8859_1;
import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.BufferedReader;
import java.io.InputStreamReader;

@RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json;charset=UTF-8")
public String getRssFeedAsJson() throws IOException, IllegalArgumentException {
    String xmlString = readUrlToString("http://www.sample.com/feed");
    JSONObject xmlJSONObj = XML.toJSONObject(xmlString);
    // Re-interpret the JSON text's ISO-8859-1 bytes as UTF-8.
    byte[] ptext = xmlJSONObj.toString().getBytes(ISO_8859_1);
    String jsonResponse = new String(ptext, UTF_8);
    return jsonResponse;
}

public static String readUrlToString(String url) {
    BufferedReader reader = null;
    String result = null;
    String retValue = null;
    try {
        URL u = new URL(url);
        HttpURLConnection conn = (HttpURLConnection) u.openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        conn.setRequestMethod("GET");
        conn.setDoOutput(true);
        conn.setReadTimeout(2 * 1000);
        conn.connect();
        // No charset is given here, so the platform default is used for decoding.
        reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        StringBuilder builder = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            builder.append(line).append("\n");
        }
        result = builder.toString();
        // Remove every character outside the 7-bit ASCII range.
        retValue = result.replaceAll("[^\\x00-\\x7F]", "");
    }
    catch (IOException e) {
        e.printStackTrace();
    }
    finally {
        if (reader != null) {
            try {
                reader.close();
            }
            catch (IOException ignoreOnClose) {
            }
        }
    }
    return retValue;
}
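Note that stripping every non-ASCII character removes the question marks, but it also discards legitimate characters such as ’ and –. As a possible alternative, the sketch below decodes the feed bytes with an explicit charset instead of relying on the platform default. It assumes the feed is served as UTF-8, which this thread does not confirm; in practice the charset should come from the HTTP Content-Type header or the XML declaration. The class FeedDecodingDemo and the method readAll are hypothetical, and a ByteArrayInputStream stands in for the HTTP connection so the example runs without network access.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original controller.
class FeedDecodingDemo {

    // Reads the whole stream as text, decoding its bytes with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder builder = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                builder.append((char) c);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulated feed content, encoded as UTF-8 (an assumption).
        String original = "<title>Today\u2019s 20th Annual Karaoke</title>";
        byte[] feedBytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset restores the original text.
        String asUtf8 = readAll(new ByteArrayInputStream(feedBytes), StandardCharsets.UTF_8);
        System.out.println(asUtf8.equals(original));

        // Decoding with a mismatched charset does not: the UTF-8 bytes of
        // U+2019 are invalid in US-ASCII and become replacement characters.
        String asAscii = readAll(new ByteArrayInputStream(feedBytes), StandardCharsets.US_ASCII);
        System.out.println(asAscii.equals(original));
    }
}
```

A multi-byte UTF-8 character decoded byte by byte in the wrong charset yields several bad characters, which would be consistent with the "??" and "???" runs in the question, though nothing in this thread establishes that this is what happened. With commons-io, IOUtils.toString(inputStream, StandardCharsets.UTF_8) performs the same explicit decoding in a single call.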
It is frustrating that nobody except SamDev tried to help...