gRPC on Google Cloud Run: upstream connect error or disconnect/reset before headers. reset reason: remote reset
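EDIT
It turns out the first error I described is easy to reproduce. In fact, Cloud Run seems unable to serve any gRPC call to a .NET 5 gRPC server (it definitely used to work, but as of today, February 21, something seems to have changed). To reproduce:
- Create a .NET 5 gRPC server (it also fails with .NET 6):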
dotnet new grpc -o TestGrpc
- Change Program.cs so that it listens on $PORT, typically:
public static IHostBuilder CreateHostBuilder(string[] args)
{
var port = Environment.GetEnvironmentVariable("PORT") ?? "8080";
var url = string.Concat("http://0.0.0.0:", port);
return Host.CreateDefaultBuilder(args)
.ConfigureWebHostDefaults(webBuilder =>
{
webBuilder.UseStartup<Startup>().UseUrls(url);
});
}
- Write a very simple Dockerfile for the server image (it also fails with a more standard image, such as the one here):
FROM mcr.microsoft.com/dotnet/sdk:5.0
COPY . ./
RUN dotnet restore ./TestGrpc.csproj
RUN dotnet build ./TestGrpc.csproj -c Release
CMD dotnet run --project ./TestGrpc.csproj
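- Build the image and push it to Google Artifact Registry.
- Create a Cloud Run instance with HTTP/2 enabled (Kestrel requires HTTP/2, so we need to set HTTP/2 end-to-end; I also tested without it, and it was no better).
- Use grpcurl, for example, and try: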
grpcurl {CLOUD_RUN_URL}:443 list
You will get the same error as in my (more complex) project:
Failed to list services: rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: remote reset
On the Cloud Run instance, the only log entry I get is:
2022-02-21T16:44:32.528530Z POST 200 1.02 KB 41 ms grpcurl/v1.8.6 grpc-go/1.44.1-dev https://***/grpc.reflection.v1alpha.ServerReflection/ServerReflectionInfo
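(I really do not understand why it is a 200, and the request never seems to reach the actual server implementation, as if some kind of middleware were blocking it before it gets there.)
I am pretty sure this used to work, since I started my project this way (and then changed the protos, services, and so on). If anyone has a clue, I would be very grateful :-)
INITIAL POST (less precise than the explanation above, but I am leaving it here in case it provides a clue)
I have a server running in Docker (a .NET 5 gRPC application). The server works fine when deployed locally. Recently, however, I started getting the following error when deploying it on Google Cloud Run, where it used to work fine: upstream connect error or disconnect/reset before headers. reset reason: remote reset
I get this error from every client I use, for example with curl: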
curl -v https://{ENDPOINT}/{Proto-base}/{Method} --http2
* Trying ***...
* TCP_NODELAY set
* Connected to *** (***) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=*.a.run.app
* start date: Feb 7 02:07:06 2022 GMT
* expire date: May 2 02:07:05 2022 GMT
* subjectAltName: host "***" matched cert's "*.a.run.app"
* issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1C3
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5564aad30860)
> GET /{Proto}/{Method} HTTP/2
> Host: ***
> user-agent: curl/7.68.0
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 503
< content-length: 85
< content-type: text/plain
< date: Mon, 21 Feb 2022 13:51:31 GMT
< server: Google Frontend
< traceparent: 00-5a74487dafb5687961deeb17e0158ca9-5ab63cd23680e7d7-01
< x-cloud-trace-context: 5a74487dafb5687961deeb17e0158ca9/6536478782730069975;o=1
< alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
<
* Connection #0 to host *** left intact
upstream connect error or disconnect/reset before headers. reset reason: remote reset
The same happens with grpcurl:
grpcurl ***:443 list {Proto-base}
Failed to list methods for service "***.Company": rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: remote reset
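I could not find many resources about this error, because most of the threads I read deal with a different reset reason (such as protocol or connection errors). I have no idea what remote reset means or what I am doing wrong.
Looking at the logs in Google Cloud Run, I can see that the server is definitely being hit. However, the trace logging I added to the routes is never triggered, so the request never reaches my code: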
2022-02-21T14:44:22.840580Z POST 200 1.01 KB 1 ms grpc-python/1.44.0 grpc-c/22.0.0 (linux; chttp2) https://***/{Protos-base}/{Method}
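(If the request reached my code, it would print "hello" all over the place, but it does not.)
Has anyone run into this?
P.S.: There is a lot of material about Envoy, but I am not even using it. I only have a Cloud Run instance (with HTTP/2; I tried without it, but that failed with protocol errors).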
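This is an actual bug involving Envoy and Google Cloud Run. If you are using .NET 6 there is a quick fix; otherwise it is a bit more involved. I am copying here the answer that Amanda Tarafa Mas from Google Cloud Platform gave on the GitHub issue I opened: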
Here are the potential fixes:
- When using .NET 6 you can set KestrelServerOptions.AllowAlternateSchemes to true.
- If on a lower .NET version, consider something like GRPC :scheme pseudo-header passed from proxy/loadbalancer causes ConnectionAbortedException dotnet/aspnetcore#30532 (comment). Or consider upgrading to .NET 6.
What's happening:
- Cloud Run has dependency on Envoy, which has a behavior change since 04/15/2021, see "preserve_downstream_scheme" in release notes:
https://www.envoyproxy.io/docs/envoy/latest/version_history/v1.18.0
Envoy recently removed the old behaviour: https://www.envoyproxy.io/docs/envoy/latest/version_history/current#removed-config-or-runtime
- In turn, this exposes this .NET issue: GRPC :scheme pseudo-header passed from proxy/loadbalancer causes ConnectionAbortedException dotnet/aspnetcore#30532, for which the Kestrel configuration flag was added, but only for .NET 6.
I'm looking into having this documented somewhere. @meteatamel can you update the tutorial so that it uses the Kestrel option?
For me, setting KestrelServerOptions.AllowAlternateSchemes to true was enough to get my gRPC server working again.
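As a minimal sketch of that fix, the CreateHostBuilder method from the question could be extended as shown here. It assumes the project has been retargeted to .NET 6 while keeping the Startup class generated by the .NET 5 template; the ConfigureKestrel call is the only addition, and the rest mirrors the question's code. Treat it as an illustration rather than the exact change that was made.

```csharp
using System;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Hosting;

public class Program
{
    public static void Main(string[] args) =>
        CreateHostBuilder(args).Build().Run();

    public static IHostBuilder CreateHostBuilder(string[] args)
    {
        // Cloud Run passes the listening port in the PORT environment variable.
        var port = Environment.GetEnvironmentVariable("PORT") ?? "8080";
        var url = string.Concat("http://0.0.0.0:", port);

        return Host.CreateDefaultBuilder(args)
            .ConfigureWebHostDefaults(webBuilder =>
            {
                webBuilder
                    .ConfigureKestrel(options =>
                    {
                        // Available from .NET 6 onwards: accept requests whose
                        // :scheme pseudo-header differs from the transport scheme,
                        // which is the case behind the Cloud Run proxy.
                        options.AllowAlternateSchemes = true;
                    })
                    // Startup is assumed to be the class from the gRPC template.
                    .UseStartup<Startup>()
                    .UseUrls(url);
            });
    }
}
```

The option only relaxes the scheme check, so the PORT handling and the HTTP/2 setting on the Cloud Run service stay as described in the question.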
As @Craig said, you can track the issue here to see whether it gets resolved.