Character encoding not preserved when reading with InputStreamReader, even with UTF-8
I have this data:
batiment:Kube D
etage:4ème
description:some_description
I want to read this data through an InputStreamReader:
SharedByteArrayInputStream sbais = (SharedByteArrayInputStream) content;
Reader reader = new InputStreamReader(sbais, Charset.forName("UTF8"));
int size = sbais.available();
char[] theChars = new char[size];
int data = reader.read();
int i = 0;
while (data != -1) {
theChars[i] = (char) data;
i++;
data = reader.read();
}
String parse = new String(theChars);
String[] parties = parse.split("Content-Transfer-Encoding: quoted-printable");
String partie = (parties[1]).trim();
parties = partie.split("\\R");
String ret = "";
for(String ligne : parties) {
if (ligne == null || ligne.trim().equals(""))
break;
ret = ret.concat(ligne).concat(System.lineSeparator());
}
return ret;
At runtime, the data 4ème is converted to 4=E8me. What is going wrong?
Edit:
Here is the content of the headers:
--_008_DB6P190MB0166B6F4DE5E31397B4A7B558C3C9DB6P190MB0166EURP_
Content-Type: multipart/alternative;
boundary="_000_DB6P190MB0166B6F4DE5E31397B4A7B558C3C9DB6P190MB0166EURP_"
--_000_DB6P190MB0166B6F4DE5E31397B4A7B558C3C9DB6P190MB0166EURP_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
batiment:KUBE D
etage:4=E8me
description:andrana
Cordialement,
...
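We can see that you discard everything in the content before the string Content-Transfer-Encoding: quoted-printable.
This means your initial content really is 4=E8me, which is an ISO-8859-1 string encoded as quoted-printable.
If you want to turn it into 4ème, you have to decode it.
There is nothing out of the box, but this answer will give you some ideas of libraries you can use.
For example, with Apache Commons Codec it would look like this: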
partie = new QuotedPrintableCodec(StandardCharsets.ISO_8859_1).decode(partie);
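To make clear what that decode step does, here is a minimal sketch of a quoted-printable decoder using only the JDK. The class name QuotedPrintableSketch and its decode method are made up for illustration and are not part of Apache Commons Codec or any other library. It assumes the input holds only ASCII literals, =XX hex escapes and soft line breaks, and it is not a replacement for a tested library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a library class.
final class QuotedPrintableSketch {

    // Turns quoted-printable text back into bytes, then decodes those
    // bytes with the charset announced in the Content-Type header.
    static String decode(String encoded, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int len = encoded.length();
        int i = 0;
        while (i < len) {
            char c = encoded.charAt(i);
            if (c != '=') {
                // Assumption: literal characters are plain ASCII.
                bytes.write(c);
                i++;
                continue;
            }
            if (i + 1 >= len) {
                // A trailing '=' is a soft line break: emit nothing.
                i++;
                continue;
            }
            char next = encoded.charAt(i + 1);
            if (next == '\n') {
                i += 2; // soft line break "=\n"
                continue;
            }
            if (next == '\r') {
                // soft line break "=\r\n" (or a bare "=\r")
                i += (i + 2 < len && encoded.charAt(i + 2) == '\n') ? 3 : 2;
                continue;
            }
            if (i + 2 >= len) {
                throw new IllegalArgumentException("Truncated escape at index " + i);
            }
            int hi = Character.digit(next, 16);
            int lo = Character.digit(encoded.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid escape at index " + i);
            }
            bytes.write((hi << 4) | lo); // "=E8" becomes the single byte 0xE8
            i += 3;
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // 0xE8 is 'è' in ISO-8859-1, so this yields "etage:4ème".
        System.out.println(decode("etage:4=E8me", StandardCharsets.ISO_8859_1));
    }
}
```

The charset only matters in the last step: =E8 is pure ASCII on the wire, so reading the stream as UTF-8 or ISO-8859-1 gives the same 4=E8me, and only decoding the byte 0xE8 as ISO-8859-1 gives è.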
Following Obourgain's answer, here is the working code:
SharedByteArrayInputStream sbais = (SharedByteArrayInputStream) content;
Reader reader = new InputStreamReader(sbais, StandardCharsets.ISO_8859_1);
int size = sbais.available();
char[] theChars = new char[size];
int data = reader.read();
int i = 0;
while (data != -1) {
theChars[i] = (char) data;
i++;
data = reader.read();
}
String parse = new String(theChars);
String[] parties = parse.split("Content-Transfer-Encoding: quoted-printable");
String partie = (parties[1]).trim();
String[] lignes = partie.split("\\R");
String ret = "";
for(String ligne : lignes) {
if (ligne == null || ligne.trim().equals(""))
break;
String tmp = new QuotedPrintableCodec(StandardCharsets.ISO_8859_1).decode(ligne, StandardCharsets.ISO_8859_1);
ret = ret.concat(tmp).concat(System.lineSeparator());
}
return ret;
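One side note on the reading loop: the char array is sized from available(), which the InputStream contract only describes as an estimate of the bytes readable without blocking. It happens to work for an in-memory stream, but a sketch that does not depend on it could look like the following. The class StreamTextSketch and its two methods are hypothetical names chosen for this example, and the marker string is the one already used in the code above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a library class.
final class StreamTextSketch {

    // Reads the stream to the end in chunks, then decodes all bytes at once.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    // Returns the trimmed text after the first occurrence of marker,
    // or an empty string when the marker is absent.
    static String bodyAfter(String message, String marker) {
        int at = message.indexOf(marker);
        if (at < 0) {
            return "";
        }
        return message.substring(at + marker.length()).trim();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = ("Content-Type: text/plain; charset=\"iso-8859-1\"\r\n"
                + "Content-Transfer-Encoding: quoted-printable\r\n\r\n"
                + "etage:4=E8me\r\n").getBytes(StandardCharsets.ISO_8859_1);
        String text = readAll(new ByteArrayInputStream(raw), StandardCharsets.ISO_8859_1);
        // Still quoted-printable at this point; decoding is a separate step.
        System.out.println(bodyAfter(text, "Content-Transfer-Encoding: quoted-printable"));
    }
}
```

Using indexOf instead of split also avoids an ArrayIndexOutOfBoundsException on parties[1] when the header is missing.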