Standalone Service Fabric - AWS - FileStoreService - Copy-ServiceFabricApplicationPackage Fails
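I have a 3-node standalone Windows Service Fabric setup in AWS. The TestConfiguration and CreateCluster scripts run successfully, but when I try to deploy any application to the cluster I get the following error from PowerShell.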
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath .\pkg\<packagename> -ImageStoreConnectionString fabric:ImageStore
Copy-ServiceFabricApplicationPackage : An error occurred during this operation. Please check the trace logs for more
details.
At line:1 char:1
+ Copy-ServiceFabricApplicationPackage -ApplicationPackagePath .\pkg\ ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [Copy-ServiceFabricApplicationPackage], FabricException
+ FullyQualifiedErrorId : CopyApplicationPackageErrorId,Microsoft.ServiceFabric.Powershell.CopyApplicationPackage
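I'm not sure which trace logs are available to diagnose the error, but checking the Windows event log on one of the nodes I see the following errors, all for FileStoreService.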
ImpersonateAndCopyFile for SourcePath:\<ipaddress>\StoreShare_Node31601795137630192.0.232.9494_01601794828730764_8589934592_1.ClusterManifest.xml, DestinationPath:C:\ProgramData\SF\Node1\Fabric\work\Applications\__FabricSystem_App4294967295\work\Store1601795317314061.0.232.9494_01601794828730764_8589934592_1.ClusterManifest.xml failed: 0x8007052e. Have tried all access tokens.
CopyFile: SourcePath:\<ip address>\StoreShare_Node31601795137630192.0.232.9494_01601794828730764_8589934592_1.ClusterManifest.xml, DestinationPath:C:\ProgramData\SF\Node1\Fabric\work\Applications\__FabricSystem_App4294967295\work\Store1601795317314061.0.232.9494_01601794828730764_8589934592_1.ClusterManifest.xml, Error:0x8007052e, ElapsedTime:80
CopyFile: no new token is found. current token count: 2
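Any idea what this could be? I have recreated a new cluster with no security, and the firewalls both in AWS and on the node machines have all ports open (trying to remove everything that could be blocking the copy). In AWS I'm using SimpleAD, so all nodes run as the same AD administrator and can communicate with each other to create the cluster.
Below is the cluster config I am using, kept as simple as possible to limit the possible causes of the problem.
Any help with diagnosing the file copy issue, or even a pointer to the relevant trace logs, would be great.
Also, I noticed that ImageStoreService is showing a warning in Service Fabric Explorer: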
Unhealthy event: SourceId='System.FM', Property='State', HealthState='Warning', ConsiderWarningAsError=false.
Partition reconfiguration is taking longer than expected.
ImageStoreService 3 3 00000000-0000-0000-0000-000000003000
P/P Ready Node3 131601795137630192
S/S InBuild Node1 131601795317314061
S/S InBuild Node2 131601795317314062
(Showing 3 out of 3 replicas. Total available replicas: 1)
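Edit
Additional information
While investigating further, I ran Copy-ServiceFabricApplicationPackage with the -Debug flag and now get the following error. It suggests that the user name or password is incorrect either for uploading the package from my machine to the cluster, or for the cluster distributing it from node to node. I assume node-to-node copies use the local account that Service Fabric creates (the one ending in fffff), and I don't know why it would create invalid user credentials. If the problem is between the machine uploading the package and the cluster, I am currently running without security enabled, so I don't see why that would be an issue. Any help is greatly appreciated.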
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath ..\pkg\Release -ImageStoreConnectionString fabric:imagestore -Debug
VERBOSE: System.Fabric.FabricException: An error occurred during this operation. Please check the trace logs for more details. ---> System.Runtime.InteropServices.COMException: The user name or password is incorrect. (Exception from HRESULT: 0x8007052E)
Thanks
{
    "name": "SampleCluster",
    "clusterConfigurationVersion": "1.0.0",
    "apiVersion": "08-2017",
    "nodes": [
        {
            "nodeName": "Node1",
            "iPAddress": "<node 1 internal ip address>",
            "nodeTypeRef": "StandardNodeType",
            "faultDomain": "fd:/0",
            "upgradeDomain": "UD0"
        },
        {
            "nodeName": "Node2",
            "iPAddress": "<node 2 internal ip address>",
            "nodeTypeRef": "StandardNodeType",
            "faultDomain": "fd:/1",
            "upgradeDomain": "UD1"
        },
        {
            "nodeName": "Node3",
            "iPAddress": "<node 3 internal ip address>",
            "nodeTypeRef": "StandardNodeType",
            "faultDomain": "fd:/2",
            "upgradeDomain": "UD2"
        }
    ],
    "properties": {
        "diagnosticsStore": {
            "metadata": "Please replace the diagnostics store with an actual file share accessible from all cluster machines.",
            "dataDeletionAgeInDays": "7",
            "storeType": "FileShare",
            "IsEncrypted": "false",
            "connectionstring": "c:\\ProgramData\\SF\\DiagnosticsStore"
        },
        "nodeTypes": [
            {
                "name": "StandardNodeType",
                "clientConnectionEndpointPort": "19000",
                "clusterConnectionEndpointPort": "19001",
                "leaseDriverEndpointPort": "19002",
                "serviceConnectionEndpointPort": "19003",
                "httpGatewayEndpointPort": "19080",
                "reverseProxyEndpointPort": "19081",
                "applicationPorts": {
                    "startPort": "20000",
                    "endPort": "30000"
                },
                "ephemeralPorts": {
                    "startPort": "49152",
                    "endPort": "65534"
                },
                "isPrimary": true
            }
        ],
        "fabricSettings": [
            {
                "name": "Setup",
                "parameters": [
                    {
                        "name": "FabricDataRoot",
                        "value": "C:\\ProgramData\\SF"
                    },
                    {
                        "name": "FabricLogRoot",
                        "value": "C:\\ProgramData\\SF\\Log"
                    }
                ]
            }
        ],
        "addOnFeatures": [
            "DnsService",
            "RepairManager"
        ]
    }
}
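After more investigation I found that this was caused by file sharing not being correctly enabled on the Windows boxes, even though it showed as enabled in the network adapter's properties. I hadn't realised that the setting also needs to be enabled under the Advanced sharing settings (Control Panel\Network and Internet\Network and Sharing Center\Advanced sharing settings).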
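As a side note for anyone hitting the same error: the HRESULT 0x8007052E that appears in both the event log and the -Debug output can be split into its standard fields to see which Win32 error it wraps. The snippet below is only a rough sketch of that decoding, not anything Service Fabric ships; the function name `decode_hresult` and the constant names are my own for illustration. The bit layout (severity bit, facility, 16-bit code) is the standard HRESULT layout, and Win32 error 1326 is ERROR_LOGON_FAILURE, which is where the "user name or password is incorrect" message comes from.

```python
# Illustrative helper (hypothetical name): split an HRESULT into its fields.
def decode_hresult(hresult: int) -> dict:
    hresult &= 0xFFFFFFFF  # treat the value as an unsigned 32-bit integer
    return {
        "failure": bool(hresult >> 31),       # top bit set means failure
        "facility": (hresult >> 16) & 0x7FF,  # 11-bit facility field
        "code": hresult & 0xFFFF,             # low 16 bits: the error code
    }

FACILITY_WIN32 = 7          # the HRESULT wraps a plain Win32 error code
ERROR_LOGON_FAILURE = 1326  # "The user name or password is incorrect."

info = decode_hresult(0x8007052E)
print(info)
print("Win32 logon failure:",
      info["facility"] == FACILITY_WIN32 and info["code"] == ERROR_LOGON_FAILURE)
```

So the error is a generic Windows logon failure surfaced through the file copy, which fits with the copy to the node's StoreShare being rejected rather than a problem with the application package itself.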