Character encoding problem with HttpClient.PostAsync
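We have a legacy web application that works fine when used manually in a browser. When I try to drive the same web application from code with HTTP POSTs, some Turkish characters come back as ?.
I have the following code to make the HTTP POST: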
var httpClient = new HttpClient(); //static readonly in real code
var content = new StringContent("id_6=some text with Turkish characters öçşığüÖÇŞİĞÜ", Encoding.GetEncoding("ISO-8859-9"), "application/x-www-form-urlencoded");
var response = httpClient.PostAsync(url, content).Result; //I know this is not a good way, I'll focus on it later
var responseInString = response.Content.ReadAsStringAsync().Result;
File.WriteAllText(@"c:\temp\a.htm", responseInString); //verbatim string, so \t and \a are not treated as escape sequences
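The web application returns HTML containing some input values, including the ones posted by my code. The form values posted by my code, and the form values computed from them, have broken Turkish characters, while the hardcoded submit button with Turkish characters looks fine.
The web application returns this HTML (truncated for simplicity) to my code: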
<!-- BELOW IS THE HARDCODED FORM FIELD WITH TURKISH CHARS OK! DISPLAYED AS: Programı Çağır -->
<input type="submit" value="Programı Çağır" name="j_id_jsp_262293626_16"/>
<!-- IRRELEVANT HTML REMOVED -->
<!-- BELOW IS THE OUTPUT FORM FIELD WITH CHAR ş BAD! DISPLAYED AS: some text with Turkish characters öç???üÖÇ???Ü -->
<input type="text" value="some text with Turkish characters öç???üÖÇ???Ü" id="id_2" name="id_2"/>
<!-- BELOW IS THE INPUT FORM FIELD WITH CHAR ş BAD! -->
<input type="text" value="some text with Turkish characters öç???üÖÇ???Ü" id="id_6" name="id_6" />
The response headers look fine.
What is wrong?
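One way to read the pattern in the returned HTML: the six characters that come back as ? (ş ı ğ Ş İ Ğ) are the ones in the test string that ISO-8859-9 has and ISO-8859-1 lacks, while ö ç ü Ö Ç Ü exist in both. The sketch below only demonstrates that observation; it does not prove what the server does internally. The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class Latin1Probe
{
    // Round-trips text through ISO-8859-1 and replaces every character
    // that character set cannot represent with '?'.
    public static string RoundTripThroughLatin1(string text)
    {
        // Explicit replacement fallbacks are requested here because the
        // default encoder fallback may "best fit" a character such as ş
        // to a similar-looking one instead of emitting '?'.
        Encoding latin1 = Encoding.GetEncoding(
            "ISO-8859-1",
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));

        byte[] bytes = latin1.GetBytes(text);
        return latin1.GetString(bytes);
    }
}
```

Calling Latin1Probe.RoundTripThroughLatin1("öçşığüÖÇŞİĞÜ") returns "öç???üÖÇ???Ü", the same shape as the id_2 and id_6 values above, which suggests that somewhere along the way the posted text is being handled as ISO-8859-1 rather than ISO-8859-9.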
Edit: similar code posting to a sample form works fine:
static readonly HttpClient httpClient = new HttpClient();

[TestMethod]
public void TestHttpClientForTurkish()
{
    var data = new Dictionary<string, string>()
    {
        {"fname", "öçşığü" },
        {"lname", "ÖÇŞİĞÜ" }
    };
    var content = new FormUrlEncodedContent(data);
    var response = httpClient.PostAsync("https://www.w3schools.com/action_page.php", content).Result;
    var responseInString = response.Content.ReadAsStringAsync().Result;
    Assert.IsTrue(responseInString.Contains("öçşığü") && responseInString.Contains("ÖÇŞİĞÜ"));
}
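To see why the sample form accepts this request, it can help to look at the body FormUrlEncodedContent actually produces. The sketch below reads the content back as a string instead of posting it; FormBodyInspector and GetBodyAsync are hypothetical names used only for this illustration.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class FormBodyInspector
{
    // Returns the request body that FormUrlEncodedContent would send,
    // without making any network call.
    public static async Task<string> GetBodyAsync(
        IEnumerable<KeyValuePair<string, string>> fields)
    {
        using (var content = new FormUrlEncodedContent(fields))
        {
            // The body is plain ASCII: every non-ASCII character has
            // already been percent-encoded by the content class.
            return await content.ReadAsStringAsync();
        }
    }
}
```

For a single field fname=öçşığü this returns fname=%C3%B6%C3%A7%C5%9F%C4%B1%C4%9F%C3%BC, that is, each character percent-encoded as its UTF-8 bytes. A server that decodes form data as UTF-8 reads this correctly; one that expects ISO-8859-9 would not.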
Try the code below:
public static async Task SendRequestAsync()
{
    // Encode each value as UTF-16 bytes
    var data = new Dictionary<string, byte[]>();
    var key1 = "fname";
    var val1 = Encoding.Unicode.GetBytes("öçşığü");
    data.Add(key1, val1);
    var key2 = "lname";
    var val2 = Encoding.Unicode.GetBytes("ÖÇŞİĞÜ");
    data.Add(key2, val2);

    // Serialize the dictionary to a byte array (BinaryFormatter is marked obsolete in current .NET)
    MemoryStream fs = new MemoryStream();
    BinaryFormatter formatter = new BinaryFormatter();
    formatter.Serialize(fs, data);
    var barr = fs.ToArray();

    var client = new HttpClient
    {
        BaseAddress = new Uri("http://www.yourservicelocation.com")
    };
    client.DefaultRequestHeaders.Accept.Clear();
    client.DefaultRequestHeaders.Accept.Add(
        new MediaTypeWithQualityHeaderValue("application/bson"));

    // Post the serialized bytes as the request body
    var byteArrayContent = new ByteArrayContent(barr);
    byteArrayContent.Headers.ContentType = new MediaTypeHeaderValue("application/bson");
    var result = await client.PostAsync(
        "api/SomeData/Incoming", byteArrayContent);
    result.EnsureSuccessStatusCode();
}
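My findings:
- The FormUrlEncodedContent class has no encoding parameter (so it could not produce the ISO-8859-9 encoded Turkish characters this application expects), so I had to use StringContent.
- I had to encode the form values with HttpUtility.UrlEncode (using ISO-8859-9 as the encoding).
Here is the final code; Turkish characters in the form fields now come through without any problem: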
var httpClient = new HttpClient(); //static readonly in real code
var iso = Encoding.GetEncoding("ISO-8859-9");
var content = new StringContent(
    "id_6=" + HttpUtility.UrlEncode("some text with Turkish characters öçşığüÖÇŞİĞÜ", iso),
    iso,
    "application/x-www-form-urlencoded");
var response = httpClient.PostAsync(url, content).Result;//Using Result because I don't have a UI thread or the context is not ASP.NET
var responseInString = response.Content.ReadAsStringAsync().Result;
File.WriteAllText(@"c:\temp\a.htm", responseInString); //verbatim string, so \t and \a are not treated as escape sequences
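If more than one field has to be posted, the same approach can be wrapped in a small helper. This is only a sketch of the idea in the final code, not a library API: LegacyForm, BuildBody and CreateContent are hypothetical names. It assumes System.Web.HttpUtility is available (a reference to System.Web on .NET Framework). On .NET Core and later, ISO-8859-9 is not registered by default, so Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has to be called before Encoding.GetEncoding("ISO-8859-9").

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Web;

public static class LegacyForm
{
    // Builds "k1=v1&k2=v2", percent-encoding keys and values as bytes of
    // the given encoding instead of UTF-8.
    public static string BuildBody(
        IEnumerable<KeyValuePair<string, string>> fields, Encoding encoding)
    {
        return string.Join("&", fields.Select(f =>
            HttpUtility.UrlEncode(f.Key, encoding) + "=" +
            HttpUtility.UrlEncode(f.Value, encoding)));
    }

    // Wraps the body in a StringContent that declares the same charset
    // in its Content-Type header, as the final code above does.
    public static StringContent CreateContent(
        IEnumerable<KeyValuePair<string, string>> fields, Encoding encoding)
    {
        return new StringContent(
            BuildBody(fields, encoding),
            encoding,
            "application/x-www-form-urlencoded");
    }
}
```

With ISO-8859-9, the value öçşığü is sent as six single percent-encoded bytes (F6 E7 FE FD F0 FC) rather than the twelve bytes UTF-8 would need. Usage would look like httpClient.PostAsync(url, LegacyForm.CreateContent(fields, iso)), with url and fields supplied by the caller.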