Why do my TCP clients fail to connect to my server with a socket exception when stress-testing it with many connections?
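While stress-testing my TCP server with a large number of connections, I found that connection requests start throwing a SocketException after a while. The exception randomly shows either
Only one usage of each socket address (protocol/network address/port) is normally permitted.
or
No connection could be made because the target machine actively refused it.
as its message.
This usually happens at random after a few seconds, once tens of thousands of connections have been opened and closed. To connect, I use the local endpoint IPEndPoint clientEndPoint = new(IPAddress.Any, 0); which I believe gives me the next free ephemeral port.
To isolate the problem, I wrote this simple program, which runs both a TCP server and many parallel clients of a simple counter: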
using System.Diagnostics;
using System.Net;
using System.Net.Sockets;

CancellationTokenSource cancellationTokenSource = new();
CancellationToken cancellationToken = cancellationTokenSource.Token;

const int serverPort = 65000;
const int counterRequestMessage = -1;
const int randomCounterResponseMinDelay = 10; //ms
const int randomCounterResponseMaxDelay = 1000; //ms
const int maxParallelCounterRequests = 10000;

#region server

int counterValue = 0;

async void RunCounterServer()
{
    TcpListener listener = new(IPAddress.Any, serverPort);
    listener.Start(maxParallelCounterRequests);
    while (!cancellationToken.IsCancellationRequested)
    {
        HandleCounterRequester(await listener.AcceptTcpClientAsync(cancellationToken));
    }
    listener.Stop();
}

async void HandleCounterRequester(TcpClient client)
{
    await using NetworkStream stream = client.GetStream();
    Memory<byte> memory = new byte[sizeof(int)];

    //read requestMessage
    await stream.ReadAsync(memory, cancellationToken);
    int requestMessage = BitConverter.ToInt32(memory.Span);
    Debug.Assert(requestMessage == counterRequestMessage);

    //increment counter
    int updatedCounterValue = Interlocked.Add(ref counterValue, 1);
    //write outside Debug.Assert, because calls inside it are compiled out of release builds
    bool isResponseWritten = BitConverter.TryWriteBytes(memory.Span, updatedCounterValue);
    Debug.Assert(isResponseWritten);

    //wait random timeout
    await Task.Delay(GetRandomCounterResponseDelay());

    //write back response
    await stream.WriteAsync(memory, cancellationToken);
    client.Close();
    client.Dispose();
}

int GetRandomCounterResponseDelay()
{
    return Random.Shared.Next(randomCounterResponseMinDelay, randomCounterResponseMaxDelay);
}

RunCounterServer();

#endregion

IPEndPoint clientEndPoint = new(IPAddress.Any, 0);
IPEndPoint serverEndPoint = new(IPAddress.Parse("127.0.0.1"), serverPort);
ReaderWriterLockSlim isExceptionEncounteredLock = new(LockRecursionPolicy.NoRecursion);
bool isExceptionEncountered = false;

async Task RunCounterClient()
{
    try
    {
        int counterResponse;
        using (TcpClient client = new(clientEndPoint))
        {
            await client.ConnectAsync(serverEndPoint, cancellationToken);
            await using (NetworkStream stream = client.GetStream())
            {
                Memory<byte> memory = new byte[sizeof(int)];

                //send counter request
                //write outside Debug.Assert, because calls inside it are compiled out of release builds
                bool isRequestWritten = BitConverter.TryWriteBytes(memory.Span, counterRequestMessage);
                Debug.Assert(isRequestWritten);
                await stream.WriteAsync(memory, cancellationToken);

                //read counter response
                await stream.ReadAsync(memory, cancellationToken);
                counterResponse = BitConverter.ToInt32(memory.Span);
            }
            client.Close();
        }

        isExceptionEncounteredLock.EnterReadLock();
        //log response if there was no exception encountered so far
        if (!isExceptionEncountered)
        {
            Console.WriteLine(counterResponse);
        }
        isExceptionEncounteredLock.ExitReadLock();
    }
    catch (SocketException exception)
    {
        bool isFirstEncounteredException = false;
        isExceptionEncounteredLock.EnterWriteLock();
        //log exception and note that one was encountered if it is the first one
        if (!isExceptionEncountered)
        {
            Console.WriteLine(exception.Message);
            isExceptionEncountered = true;
            isFirstEncounteredException = true;
        }
        isExceptionEncounteredLock.ExitWriteLock();

        //if this is the first exception encountered, rethrow it
        if (isFirstEncounteredException)
        {
            throw;
        }
    }
}

async void RunParallelCounterClients()
{
    SemaphoreSlim clientSlotCount = new(maxParallelCounterRequests, maxParallelCounterRequests);

    async void RunCounterClientAndReleaseSlot()
    {
        await RunCounterClient();
        clientSlotCount.Release();
    }

    while (!cancellationToken.IsCancellationRequested)
    {
        await clientSlotCount.WaitAsync(cancellationToken);
        RunCounterClientAndReleaseSlot();
    }
}

RunParallelCounterClients();

while (true)
{
    ConsoleKeyInfo keyInfo = Console.ReadKey(true);
    if (keyInfo.Key == ConsoleKey.Escape)
    {
        cancellationTokenSource.Cancel();
        break;
    }
}
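My initial guess was that I run out of ephemeral ports because I somehow do not free them correctly. When a request is done, I just Close() and Dispose() my TcpClients in both my client and my server code. I thought this would release the port automatically, but when I run netstat -ab in a console, it gives me countless entries like these, even after closing the application: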
TCP 127.0.0.1:65000 kubernetes:59996 TIME_WAIT
TCP 127.0.0.1:65000 kubernetes:59997 TIME_WAIT
TCP 127.0.0.1:65000 kubernetes:59998 TIME_WAIT
TCP 127.0.0.1:65000 kubernetes:59999 TIME_WAIT
TCP 127.0.0.1:65000 kubernetes:60000 TIME_WAIT
TCP 127.0.0.1:65000 kubernetes:60001 TIME_WAIT
TCP 127.0.0.1:65000 kubernetes:60002 TIME_WAIT
TCP 127.0.0.1:65000 kubernetes:60003 TIME_WAIT
TCP 127.0.0.1:65000 kubernetes:60004 TIME_WAIT
TCP 127.0.0.1:65000 kubernetes:60005 TIME_WAIT
TCP 127.0.0.1:65000 kubernetes:60006 TIME_WAIT
TCP 127.0.0.1:65000 kubernetes:60007 TIME_WAIT
TCP 127.0.0.1:65000 kubernetes:60009 TIME_WAIT
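In addition, my PC sometimes stutters a lot after I exit the application. I assume this is Windows cleaning up my leaked port usage?
So I wonder, what am I doing wrong here?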
Only one usage of each socket address (protocol/network address/port) is normally permitted. ... My initial guess is, that I run out of ephemeral ports because I somehow do not free them correctly.
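TIME_WAIT is a perfectly normal state. Every TCP connection enters it on the side that actively closes the connection, that is, by explicitly calling close after sending data or implicitly by exiting the application. See the TCP state diagram (source: https://en.wikipedia.org/wiki/File:Tcp_state_diagram_fixed.svg).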
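Leaving TIME_WAIT and entering CLOSED takes some time. As long as a connection is in TIME_WAIT, its specific combination of source IP, source port, destination IP and destination port cannot be used for a new connection. This effectively limits how many connections can be made within a given period from one specific IP address to another specific IP address. Note that a typical server does not run into this limit, because it gets many connections from different systems and only a few from each source IP.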
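The limit described above can be estimated with simple arithmetic. The following is a rough sketch, not a measurement. The function name MaxConnectionsPerSecond is made up for illustration, the port range 49152-65535 is the usual default dynamic range on current Windows versions (it can be checked with netsh int ipv4 show dynamicport tcp), and the 120-second TIME_WAIT duration is an assumed value that depends on the OS version and configuration.

```csharp
using System;

// Upper bound on sustained new connections per second from one client IP to one
// server IP and port, if every closed connection parks its port pair in TIME_WAIT.
double MaxConnectionsPerSecond(int firstPort, int lastPort, TimeSpan timeWait)
{
    int usablePorts = lastPort - firstPort + 1;
    return usablePorts / timeWait.TotalSeconds;
}

// Assumed inputs: dynamic port range 49152-65535 and a 120 s TIME_WAIT period.
double rate = MaxConnectionsPerSecond(49152, 65535, TimeSpan.FromSeconds(120));
Console.WriteLine($"{rate:F1} connections per second");
```

With these assumed inputs the bound is about 137 new connections per second between one client address and one server endpoint, which is well below the rate the stress test in the question tries to sustain.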
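Apart from not actively closing the connection, there is not much that can be done about this. If the other side initiates the close (sends a FIN) and you merely continue the close (ACK the FIN and send your own FIN), no TIME_WAIT occurs on your side. Of course, in your specific scenario of a single client and a single server, this just shifts the problem to the server.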
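One way to close connections less often is to keep a connection open and send many requests over it. The sketch below illustrates that idea under stated assumptions: it changes the question's one-request-per-connection protocol so that the server keeps answering until the client closes, and the names ReadExactAsync, ServeCounterAsync and RequestManyAsync are hypothetical helpers, not part of the original program.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Reads exactly buffer.Length bytes; returns false if the peer closed the connection first.
async Task<bool> ReadExactAsync(NetworkStream stream, Memory<byte> buffer)
{
    int total = 0;
    while (total < buffer.Length)
    {
        int read = await stream.ReadAsync(buffer.Slice(total));
        if (read == 0) return false;
        total += read;
    }
    return true;
}

// Server side: accept one client and answer its requests until it closes the connection.
async Task ServeCounterAsync(TcpListener listener)
{
    using TcpClient client = await listener.AcceptTcpClientAsync();
    NetworkStream stream = client.GetStream();
    byte[] buffer = new byte[sizeof(int)];
    int counter = 0;
    while (await ReadExactAsync(stream, buffer))
    {
        counter++;
        BitConverter.TryWriteBytes(buffer, counter);
        await stream.WriteAsync(buffer);
    }
}

// Client side: one connection (and one ephemeral port) carries all requests.
async Task<int> RequestManyAsync(IPEndPoint server, int requestCount)
{
    using TcpClient client = new();
    await client.ConnectAsync(server.Address, server.Port);
    NetworkStream stream = client.GetStream();
    byte[] buffer = new byte[sizeof(int)];
    int last = 0;
    for (int i = 0; i < requestCount; i++)
    {
        BitConverter.TryWriteBytes(buffer, -1); // same request marker as in the question
        await stream.WriteAsync(buffer);
        if (!await ReadExactAsync(stream, buffer))
        {
            throw new IOException("Server closed the connection unexpectedly.");
        }
        last = BitConverter.ToInt32(buffer, 0);
    }
    return last;
}

TcpListener tcpListener = new(IPAddress.Loopback, 0); // port 0: let the OS pick a free port
tcpListener.Start();
Task serverTask = ServeCounterAsync(tcpListener);
int lastValue = await RequestManyAsync((IPEndPoint)tcpListener.LocalEndpoint, 1000);
await serverTask;
tcpListener.Stop();
Console.WriteLine(lastValue);
```

Here 1000 requests leave a single connection behind in TIME_WAIT instead of 1000. Whether this fits depends on what the stress test is meant to measure: it exercises request throughput, not connection setup.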
No connection could be made because the target machine actively refused it.
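This error has a different cause. The server calls listen on the socket and passes the intended backlog size (the OS might not use exactly this value). The backlog holds new TCP connections that the OS kernel has already accepted, and the server then calls accept to retrieve them. If the server calls accept less often than new connections are established, the backlog fills up. Once the backlog is full, new connections are refused, which results in the error you see. In other words: this happens if the server cannot keep up with the clients.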
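If refusals are expected under load, the client can slow down instead of failing on the first one. The following is a minimal sketch of a retry with exponential backoff. ConnectWithBackoffAsync, its parameters and the chosen delays are assumptions for illustration and do not appear in the original program.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Tries to connect, waiting twice as long after each refusal so the server can drain its backlog.
async Task<TcpClient> ConnectWithBackoffAsync(IPEndPoint server, int maxAttempts, TimeSpan initialDelay)
{
    TimeSpan delay = initialDelay;
    for (int attempt = 1; ; attempt++)
    {
        TcpClient client = new();
        try
        {
            await client.ConnectAsync(server.Address, server.Port);
            return client;
        }
        catch (SocketException ex) when (ex.SocketErrorCode == SocketError.ConnectionRefused
                                         && attempt < maxAttempts)
        {
            // Refused and attempts remain: release the socket, wait, then retry.
            client.Dispose();
            await Task.Delay(delay);
            delay = TimeSpan.FromTicks(delay.Ticks * 2);
        }
        catch
        {
            // Any other failure, or the last refused attempt: clean up and rethrow.
            client.Dispose();
            throw;
        }
    }
}

TcpListener tcpListener = new(IPAddress.Loopback, 0); // port 0: let the OS pick a free port
tcpListener.Start();
using TcpClient connected = await ConnectWithBackoffAsync(
    (IPEndPoint)tcpListener.LocalEndpoint, 5, TimeSpan.FromMilliseconds(50));
Console.WriteLine(connected.Connected);
tcpListener.Stop();
```

Backing off only treats the symptom. If the server cannot accept connections as fast as the clients create them, the load has to drop or the accept loop has to get faster.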