Content-Length can't be trusted when reading a response from SslStream?
Using TcpClient and NetworkStream on .NET Core 2.2.
Trying to get the content from https://www.google.com/.
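Before going further, I want to make it clear that I do not want to use the WebClient, HttpWebRequest or HttpClient classes. There are plenty of questions where people hit some problem with TcpClient and the answerers or commenters suggest using something else for the task, so please don't.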
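Let's assume we have an SslStream instance that was obtained from the TcpClient's NetworkStream and properly authenticated.
Let's also assume there is one StreamWriter for writing HTTP messages to that stream and one StreamReader for reading HTTP message headers from the response: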
var tcpClient = new TcpClient("google.com", 443);
var stream = tcpClient.GetStream();
var sslStream = new SslStream(stream, false);
sslStream.AuthenticateAsClient("google.com");
var streamWriter = new StreamWriter(sslStream);
var streamReader = new StreamReader(sslStream);
Let's assume we send the request the same way the Firefox browser does:
GET / HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: sr,sr-RS;q=0.8,sr-CS;q=0.6,en-US;q=0.4,en;q=0.2
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Cache-Control: max-age=0
This causes the following response to be sent:
HTTP/1.1 200 OK
Date: Sun, 28 Apr 2019 17:28:27 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=UTF-8
Strict-Transport-Security: max-age=31536000
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Content-Encoding: br
Server: gws
Content-Length: 55786
... etc
Now, after reading all response headers with streamReader.ReadLine() and parsing the content length found in them, let's read the response content into a buffer:
var totalBytesRead = 0;
int bytesRead;
var buffer = new byte[contentLength];
do
{
    bytesRead = sslStream.Read(buffer,
        totalBytesRead,
        contentLength - totalBytesRead);
    totalBytesRead += bytesRead;
} while (totalBytesRead < contentLength && bytesRead > 0);
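However, this do..while loop only exits once the remote server closes the connection, which means that the last call to Read hangs. It means we have already read the entire response content and the server is already listening for another HTTP message on that stream. Is contentLength incorrect? Does streamReader read too much when calling ReadLine, and thereby mess up the SslStream position so that invalid data gets read?
What gives? Has anyone had experience with this?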
P.S. Here is a sample console application, with all safety checks omitted, that demonstrates this:
private static void Main(string[] args)
{
    using (var tcpClient = new TcpClient("google.com", 443))
    {
        var stream = tcpClient.GetStream();
        using (var sslStream = new SslStream(stream, false))
        {
            sslStream.AuthenticateAsClient("google.com");
            using (var streamReader = new StreamReader(sslStream))
            using (var streamWriter = new StreamWriter(sslStream))
            {
                streamWriter.WriteLine("GET / HTTP/1.1");
                streamWriter.WriteLine("Host: www.google.com");
                streamWriter.WriteLine("User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0");
                streamWriter.WriteLine("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
                streamWriter.WriteLine("Accept-Language: sr,sr-RS;q=0.8,sr-CS;q=0.6,en-US;q=0.4,en;q=0.2");
                streamWriter.WriteLine("Accept-Encoding: gzip, deflate, br");
                streamWriter.WriteLine("Connection: keep-alive");
                streamWriter.WriteLine("Upgrade-Insecure-Requests: 1");
                streamWriter.WriteLine("Cache-Control: max-age=0");
                streamWriter.WriteLine();
                streamWriter.Flush();
                var lines = new List<string>();
                var line = streamReader.ReadLine();
                var contentLength = 0;
                while (!string.IsNullOrWhiteSpace(line))
                {
                    var split = line.Split(": ");
                    if (split.First() == "Content-Length")
                    {
                        contentLength = int.Parse(split[1]);
                    }
                    lines.Add(line);
                    line = streamReader.ReadLine();
                }
                var totalBytesRead = 0;
                int bytesRead;
                var buffer = new byte[contentLength];
                do
                {
                    bytesRead = sslStream.Read(buffer,
                        totalBytesRead,
                        contentLength - totalBytesRead);
                    totalBytesRead += bytesRead;
                    Console.WriteLine(
                        $"Bytes read: {totalBytesRead} of {contentLength} (last chunk: {bytesRead} bytes)");
                } while (totalBytesRead < contentLength && bytesRead > 0);
                Console.WriteLine(
                    "--------------------");
            }
        }
    }
    Console.ReadLine();
}
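EDIT
This always happens right after I submit a question. I had been scratching my head for days without finding the cause of the problem, but as soon as I submitted it, I knew it was the StreamReader messing things up when trying to read a line.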
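To illustrate the effect, here is a minimal sketch that uses a MemoryStream as a stand-in for the SslStream (an assumption made so the example is self-contained; the class and method names are made up for illustration). StreamReader fills its internal buffer from the underlying stream, so a single ReadLine call can consume far more than one line, and those bytes are no longer available to a later direct Read on the stream.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original program.
public static class StreamReaderBufferingDemo
{
    // Returns how far the underlying stream has advanced after ONE ReadLine call.
    public static long BytesConsumedByOneReadLine(byte[] payload)
    {
        using (var stream = new MemoryStream(payload))
        using (var reader = new StreamReader(stream, Encoding.ASCII))
        {
            // Asks for a single line, but the reader pulls a whole
            // buffer's worth of bytes out of the stream to find it.
            reader.ReadLine();
            return stream.Position;
        }
    }
}
```

With a payload such as "HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\nBODY", the first line is only 17 bytes long, yet the returned position lies beyond it, because the header lines and the body that follow were swallowed into the reader's buffer as well. On a network stream the same thing happens to the first part of the response content, so a loop that then waits for contentLength bytes from the stream itself never gets them all.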
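So, if I stop using StreamReader and replace the calls to ReadLine with something that reads byte by byte, everything seems to work fine. The replacement code can be written like this: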
private static IEnumerable<string> ReadHeader(Stream sslStream)
{
    // One-byte buffer for reading bytes from the stream
    var buffer = new byte[1];
    // Initialize a four-character string to keep the last four bytes of the message
    var check = new StringBuilder("....");
    int bytes;
    var responseBuilder = new StringBuilder();
    do
    {
        // Read the next byte from the stream and write it into the buffer
        bytes = sslStream.Read(buffer, 0, 1);
        if (bytes == 0)
        {
            // If nothing was read, break the loop
            break;
        }
        // Add the received byte to the response builder.
        // We expect the header to be ASCII encoded so it's OK to just cast to char and append
        responseBuilder.Append((char) buffer[0]);
        // Always remove the first char from the string and append the latest received one
        check.Remove(0, 1);
        check.Append((char) buffer[0]);
        // \r\n\r\n marks the end of the message header, so break here
        if (check.ToString() == "\r\n\r\n")
        {
            break;
        }
    } while (bytes > 0);
    var headerText = responseBuilder.ToString();
    return headerText.Split("\r\n", StringSplitOptions.RemoveEmptyEntries);
}
...which would make our sample console application look like this:
private static void Main(string[] args)
{
    using (var tcpClient = new TcpClient("google.com", 443))
    {
        var stream = tcpClient.GetStream();
        using (var sslStream = new SslStream(stream, false))
        {
            sslStream.AuthenticateAsClient("google.com");
            using (var streamWriter = new StreamWriter(sslStream))
            {
                streamWriter.WriteLine("GET / HTTP/1.1");
                streamWriter.WriteLine("Host: www.google.com");
                streamWriter.WriteLine("User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0");
                streamWriter.WriteLine("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
                streamWriter.WriteLine("Accept-Language: sr,sr-RS;q=0.8,sr-CS;q=0.6,en-US;q=0.4,en;q=0.2");
                streamWriter.WriteLine("Accept-Encoding: gzip, deflate, br");
                streamWriter.WriteLine("Connection: keep-alive");
                streamWriter.WriteLine("Upgrade-Insecure-Requests: 1");
                streamWriter.WriteLine("Cache-Control: max-age=0");
                streamWriter.WriteLine();
                streamWriter.Flush();
                var lines = ReadHeader(sslStream);
                var contentLengthLine = lines.First(x => x.StartsWith("Content-Length"));
                var split = contentLengthLine.Split(": ");
                var contentLength = int.Parse(split[1]);
                var totalBytesRead = 0;
                int bytesRead;
                var buffer = new byte[contentLength];
                do
                {
                    bytesRead = sslStream.Read(buffer,
                        totalBytesRead,
                        contentLength - totalBytesRead);
                    totalBytesRead += bytesRead;
                    Console.WriteLine(
                        $"Bytes read: {totalBytesRead} of {contentLength} (last chunk: {bytesRead} bytes)");
                } while (totalBytesRead < contentLength && bytesRead > 0);
                Console.WriteLine(
                    "--------------------");
            }
        }
    }
    Console.ReadLine();
}
private static IEnumerable<string> ReadHeader(Stream sslStream)
{
    // One-byte buffer for reading bytes from the stream
    var buffer = new byte[1];
    // Initialize a four-character string to keep the last four bytes of the message
    var check = new StringBuilder("....");
    int bytes;
    var responseBuilder = new StringBuilder();
    do
    {
        // Read the next byte from the stream and write it into the buffer
        bytes = sslStream.Read(buffer, 0, 1);
        if (bytes == 0)
        {
            // If nothing was read, break the loop
            break;
        }
        // Add the received byte to the response builder.
        // We expect the header to be ASCII encoded so it's OK to just cast to char and append
        responseBuilder.Append((char)buffer[0]);
        // Always remove the first char from the string and append the latest received one
        check.Remove(0, 1);
        check.Append((char)buffer[0]);
        // \r\n\r\n marks the end of the message header, so break here
        if (check.ToString() == "\r\n\r\n")
        {
            break;
        }
    } while (bytes > 0);
    var headerText = responseBuilder.ToString();
    return headerText.Split("\r\n", StringSplitOptions.RemoveEmptyEntries);
}
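To answer the title question: yes, Content-Length can be trusted, as long as you read the message header properly, i.e. without using StreamReader.ReadLine.
Here is a utility method that does the job: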
private static string ReadStreamUntil(Stream stream, string boundary)
{
    // One-byte buffer for reading bytes from the stream
    var buffer = new byte[1];
    // Initialize a string builder with placeholder chars of the same length as the boundary
    var boundaryPlaceholder = string.Join(string.Empty, boundary.Select(x => "."));
    var check = new StringBuilder(boundaryPlaceholder);
    var responseBuilder = new StringBuilder();
    do
    {
        // Read the next byte from the stream and write it into the buffer
        var byteCount = stream.Read(buffer, 0, 1);
        if (byteCount == 0)
        {
            // If nothing was read, break the loop
            break;
        }
        // Add the received byte to the response builder.
        responseBuilder.Append((char)buffer[0]);
        // Always remove the first char from the string and append the latest received one
        check.Remove(0, 1);
        check.Append((char)buffer[0]);
        // boundary marks the end of the message, so break here
    } while (check.ToString() != boundary);
    return responseBuilder.ToString();
}
Then, to read the header, we just call ReadStreamUntil(sslStream, "\r\n\r\n").
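Once the header text is in hand, the Content-Length value still has to be extracted from it. The following is a rough sketch of one way to do that, not the code from the question; the HeaderParsing class and its method names are invented for illustration. It matches field names case-insensitively, since HTTP header field names are case-insensitive and a server may send "content-length" in lower case.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original program.
public static class HeaderParsing
{
    // Splits raw header text into a case-insensitive name -> value map.
    public static Dictionary<string, string> ParseHeaders(string headerText)
    {
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        var lines = headerText.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            // Split on the FIRST colon only; values such as dates contain colons too.
            var separator = line.IndexOf(':');
            if (separator <= 0)
            {
                continue; // status line ("HTTP/1.1 200 OK") or a malformed line
            }
            var name = line.Substring(0, separator).Trim();
            var value = line.Substring(separator + 1).Trim();
            headers[name] = value; // simplification: a repeated header keeps the last value
        }
        return headers;
    }

    // Returns the declared content length, or null if it is missing or not a valid number.
    public static int? GetContentLength(IDictionary<string, string> headers)
    {
        return headers.TryGetValue("Content-Length", out var raw)
               && int.TryParse(raw, out var length)
               && length >= 0
            ? length
            : (int?)null;
    }
}
```

A caller would pass the string returned by ReadStreamUntil to ParseHeaders and only allocate the content buffer when GetContentLength returns a value; a response without Content-Length (for example one using chunked transfer encoding) needs different handling that this sketch does not cover.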
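The key here is to read the stream byte by byte until a known byte sequence (\r\n\r\n in this case) is encountered.
After reading with this method, the stream is left at the right position for the response content to be read correctly.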
If it is of any benefit, this method can easily be converted into an async variant by calling await ReadAsync instead of Read.
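A possible shape for that async variant is sketched below. The names AsyncHeaderReader and ReadStreamUntilAsync are assumptions for illustration, and instead of the sliding placeholder window it simply checks whether the text collected so far ends with the boundary, which gives the same stopping point.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original program.
public static class AsyncHeaderReader
{
    // Reads one byte at a time until the collected text ends with the boundary
    // or the stream ends. Assumes ASCII content, like the synchronous version.
    public static async Task<string> ReadStreamUntilAsync(Stream stream, string boundary)
    {
        var buffer = new byte[1];
        var result = new StringBuilder();
        while (true)
        {
            var byteCount = await stream.ReadAsync(buffer, 0, 1).ConfigureAwait(false);
            if (byteCount == 0)
            {
                break; // end of stream before the boundary was seen
            }
            result.Append((char)buffer[0]);
            if (EndsWith(result, boundary))
            {
                break;
            }
        }
        return result.ToString();
    }

    // True when the last characters of text are exactly the suffix.
    private static bool EndsWith(StringBuilder text, string suffix)
    {
        if (text.Length < suffix.Length)
        {
            return false;
        }
        var offset = text.Length - suffix.Length;
        for (var i = 0; i < suffix.Length; i++)
        {
            if (text[offset + i] != suffix[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

It would be called as await AsyncHeaderReader.ReadStreamUntilAsync(sslStream, "\r\n\r\n"), after which the content can be read with ReadAsync in the same kind of loop as before.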
It is worth noting that the method above only works if the text is ASCII-encoded.
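The limitation comes from casting each byte straight to a char, which garbles any multi-byte sequence. If that ever matters, one option is to match the boundary on raw bytes and decode only at the end. The sketch below takes that approach; ByteBoundaryReader and its signature are invented for illustration, and the choice of encoding is left to the caller.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original program.
public static class ByteBoundaryReader
{
    // Collects raw bytes until they end with the boundary bytes (or the stream ends),
    // then decodes everything at once with the supplied encoding.
    public static string ReadStreamUntil(Stream stream, byte[] boundary, Encoding encoding)
    {
        var collected = new List<byte>();
        while (true)
        {
            var next = stream.ReadByte(); // returns -1 at end of stream
            if (next < 0)
            {
                break;
            }
            collected.Add((byte)next);
            if (EndsWith(collected, boundary))
            {
                break;
            }
        }
        return encoding.GetString(collected.ToArray());
    }

    // True when the last bytes of data are exactly the suffix.
    private static bool EndsWith(List<byte> data, byte[] suffix)
    {
        if (suffix.Length == 0 || data.Count < suffix.Length)
        {
            return false;
        }
        var offset = data.Count - suffix.Length;
        for (var i = 0; i < suffix.Length; i++)
        {
            if (data[offset + i] != suffix[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

For example, ByteBoundaryReader.ReadStreamUntil(sslStream, Encoding.ASCII.GetBytes("\r\n\r\n"), Encoding.UTF8) would stop at the same place as the string-based version while decoding non-ASCII bytes correctly.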