How to fix "Tried to associate with unreachable remote address [akka.tcp://actorsystem@address:port]" error?
I deployed 3 Lighthouse pods and 3 crawler pods on Kubernetes, based on this example.
The cluster currently looks like this:
akka.tcp://webcrawler@crawler-1.crawler:5213 | [crawler] | up |
akka.tcp://webcrawler@crawler-2.crawler:5213 | [crawler] | up |
akka.tcp://webcrawler@lighthouse-0.lighthouse:4053 | [lighthouse] | up |
akka.tcp://webcrawler@lighthouse-1.lighthouse:4053 | [lighthouse] | up |
akka.tcp://webcrawler@lighthouse-2.lighthouse:4053 | [lighthouse] | up |
As you can see, there is no crawler-0.crawler node. Let's look at that node's logs.
[WARNING][05/26/2020 10:07:24][Thread 0011][[akka://webcrawler/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fwebcrawler%40lighthouse-1.lighthouse%3A4053-940/endpointWriter#501112873]] AssociationError [akka.tcp://webcrawler@crawler-0.crawler:5213] -> akka.tcp://webcrawler@lighthouse-1.lighthouse:4053: Error [Association failed with akka.tcp://webcrawler@lighthouse-1.lighthouse:4053] []
[WARNING][05/26/2020 10:07:24][Thread 0009][[akka://webcrawler/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fwebcrawler%40lighthouse-2.lighthouse%3A4053-941/endpointWriter#592338082]] AssociationError [akka.tcp://webcrawler@crawler-0.crawler:5213] -> akka.tcp://webcrawler@lighthouse-2.lighthouse:4053: Error [Association failed with akka.tcp://webcrawler@lighthouse-2.lighthouse:4053] []
[WARNING][05/26/2020 10:07:24][Thread 0008][remoting] Tried to associate with unreachable remote address [akka.tcp://webcrawler@lighthouse-1.lighthouse:4053]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [Association failed with akka.tcp://webcrawler@lighthouse-1.lighthouse:4053] Caused by: [System.AggregateException: One or more errors occurred. (No such device or address) ---> System.Net.Internals.SocketExceptionFactory+ExtendedSocketException: No such device or address
at System.Net.Dns.InternalGetHostByName(String hostName)
at System.Net.Dns.ResolveCallback(Object context)
--- End of stack trace from previous location where exception was thrown ---
at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)
at System.Net.Dns.EndGetHostEntry(IAsyncResult asyncResult)
at System.Net.Dns.<>c.b__27_1(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
at Akka.Remote.Transport.DotNetty.DotNettyTransport.ResolveNameAsync(DnsEndPoint address, AddressFamily addressFamily)
at Akka.Remote.Transport.DotNetty.DotNettyTransport.DnsToIPEndpoint(DnsEndPoint dns)
at Akka.Remote.Transport.DotNetty.TcpTransport.MapEndpointAsync(EndPoint socketAddress)
at Akka.Remote.Transport.DotNetty.TcpTransport.AssociateInternal(Address remoteAddress)
at Akka.Remote.Transport.DotNetty.DotNettyTransport.Associate(Address remoteAddress)
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at Akka.Remote.Transport.ProtocolStateActor.<>c.b__11_54(Task`1 result)
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
---> (Inner Exception #0) System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (00000005, 6): No such device or address
at System.Net.Dns.InternalGetHostByName(String hostName)
at System.Net.Dns.ResolveCallback(Object context)
--- End of stack trace from previous location where exception was thrown ---
at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)
at System.Net.Dns.EndGetHostEntry(IAsyncResult asyncResult)
at System.Net.Dns.<>c.b__27_1(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
at Akka.Remote.Transport.DotNetty.DotNettyTransport.ResolveNameAsync(DnsEndPoint address, AddressFamily addressFamily)
at Akka.Remote.Transport.DotNetty.DotNettyTransport.DnsToIPEndpoint(DnsEndPoint dns)
at Akka.Remote.Transport.DotNetty.TcpTransport.MapEndpointAsync(EndPoint socketAddress)
at Akka.Remote.Transport.DotNetty.TcpTransport.AssociateInternal(Address remoteAddress)
at Akka.Remote.Transport.DotNetty.DotNettyTransport.Associate(Address remoteAddress)<---
]
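While this node keeps throwing these exceptions, the other 2 crawlers stay quiet and appear to do nothing.
These are the 2 YAML files I used to deploy the services: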
apiVersion: v1
kind: Service
metadata:
  name: crawler
  labels:
    app: crawler
spec:
  clusterIP: None
  ports:
  - port: 5213
  selector:
    app: crawler
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: crawler
  labels:
    app: crawler
spec:
  serviceName: crawler
  replicas: 3
  selector:
    matchLabels:
      app: crawler
  template:
    metadata:
      labels:
        app: crawler
    spec:
      terminationGracePeriodSeconds: 35
      containers:
      - name: crawler
        image: myregistry.ru:443/crawler:3
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "pbm 127.0.0.1:9110 cluster leave"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: CLUSTER_IP
          value: "$(POD_NAME).crawler"
        - name: CLUSTER_SEEDS
          value: akka.tcp://webcrawler@lighthouse-0.lighthouse:4053,akka.tcp://webcrawler@lighthouse-1.lighthouse:4053,akka.tcp://webcrawler@lighthouse-2.lighthouse:4053
        livenessProbe:
          tcpSocket:
            port: 5213
        ports:
        - containerPort: 5213
          protocol: TCP
apiVersion: v1
kind: Service
metadata:
  name: lighthouse
  labels:
    app: lighthouse
spec:
  clusterIP: None
  ports:
  - port: 4053
  selector:
    app: lighthouse
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: lighthouse
  labels:
    app: lighthouse
spec:
  serviceName: lighthouse
  replicas: 3
  selector:
    matchLabels:
      app: lighthouse
  template:
    metadata:
      labels:
        app: lighthouse
    spec:
      terminationGracePeriodSeconds: 35
      containers:
      - name: lighthouse
        image: myregistry.ru:443/lighthouse:1
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "pbm 127.0.0.1:9110 cluster leave"]
        env:
        - name: ACTORSYSTEM
          value: webcrawler
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: CLUSTER_IP
          value: "$(POD_NAME).lighthouse"
        - name: CLUSTER_SEEDS
          value: akka.tcp://webcrawler@lighthouse-0.lighthouse:4053,akka.tcp://webcrawler@lighthouse-1.lighthouse:4053,akka.tcp://webcrawler@lighthouse-2.lighthouse:4053
        livenessProbe:
          tcpSocket:
            port: 4053
        ports:
        - containerPort: 4053
          protocol: TCP
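I assume that once the error above is fixed, everything should work. Is there a way to solve this?
OK, I managed to fix it. One of the Kubernetes nodes could not resolve DNS names. Simply restarting that node solved the problem.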
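To confirm that DNS is the culprit before restarting a node, it may help to try resolving each seed hostname from the affected pod, or from a debug pod scheduled on the same Kubernetes node. The following is a minimal sketch, assuming a Python interpreter is available there (the images in the question may not ship one). `parse_seed` and `check_seeds` are hypothetical helper names for illustration, not part of Akka.NET or Kubernetes; the script reads the same `CLUSTER_SEEDS` variable that the manifests set.

```python
import os
import socket
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlsplit


def parse_seed(seed: str) -> Tuple[str, int]:
    """Extract (host, port) from an address such as akka.tcp://system@host:4053."""
    parts = urlsplit(seed.strip())
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"not a valid seed address: {seed!r}")
    return parts.hostname, parts.port


def check_seeds(
    cluster_seeds: str,
    resolve: Callable = socket.getaddrinfo,
) -> Dict[str, List[str]]:
    """Resolve every seed host; an empty list means the DNS lookup failed.

    `resolve` is injectable so the function can be exercised without a network.
    """
    results: Dict[str, List[str]] = {}
    for seed in cluster_seeds.split(","):
        if not seed.strip():
            continue  # tolerate empty input or trailing commas
        host, port = parse_seed(seed)
        try:
            infos = resolve(host, port)
            # Each getaddrinfo entry ends with a sockaddr tuple whose first item is the IP.
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            results[host] = []
    return results


if __name__ == "__main__":
    # CLUSTER_SEEDS is the environment variable defined in the manifests above.
    for host, addresses in check_seeds(os.environ.get("CLUSTER_SEEDS", "")).items():
        status = ", ".join(addresses) if addresses else "DNS lookup FAILED"
        print(f"{host}: {status}")
```

If the lookups fail only on pods scheduled to one particular node while the same names resolve elsewhere, that points at node-level DNS rather than at the Akka.NET or StatefulSet configuration, which is consistent with the fix described above.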