Too many (recovered) exceptions in Azure App Insights
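I have a problem: Application Insights is raising more and more false positives and sending exception emails, yet after investigating we found nothing wrong with our application.
Summary (TL;DR)
This is an X->Y problem. Problem Y is that AAI is logging a large number of server exceptions (see the detailed description below) and sending us alerts. Problem X is that the JWT authentication middleware throws exceptions about unmatched keys, but it recovers from all of them and switches to a different OIDC provider. As a result, the call succeeds.
What can I do in order to either fix or whitelist these exceptions?
Question 2: when do exceptions get logged to AAI? Only when they are unhandled or when the logger decides to?
Context
Our application receives email data from Twilio Sendgrid through authenticated webhooks. It also lets users of our B2C tenant access the application and browse data/statistics.
B2C does not allow the client credentials flow, and Sendgrid does not support scopes. In the end we settled on two OIDC providers: Azure AD B2C for interactive users, and an in-memory OpenIddict instance that authenticates the Sendgrid service to us.
Some code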
public void ConfigureServices(IServiceCollection services)
{
    services.AddLogging(
        configuration => configuration
            .AddApplicationInsights()
            .SetMinimumLevel(LogLevel.Trace)
            .AddConsole()
    );
    services.ConfigureOpenIddictAuthentication();
    services
        .AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
        .AddMicrosoftIdentityWebApi(Configuration)
        //.EnableTokenAcquisitionToCallDownstreamApi()
        //.AddInMemoryTokenCaches()
        ;
    services.AddAuthorization(authorization => authorization
        .AddPolicy("AzureSendgridPolicy", policy => policy
            .RequireAuthenticatedUser()
            .AddAuthenticationSchemes(JwtBearerDefaults.AuthenticationScheme,
                OpenIddictValidationAspNetCoreDefaults.AuthenticationScheme)
        )
    );
}
public static IServiceCollection ConfigureOpenIddictAuthentication(this IServiceCollection services)
{
    services.AddDbContext<OpenIddictDbContext>(ef => ef
            // Configure the context to use an in-memory store.
            // This prevents multiple cluster instances from deployment
            .UseInMemoryDatabase(nameof(OpenIddictDbContext))
            // Register the entity sets needed by OpenIddict.
            .UseOpenIddict()
        )
        .AddOpenIddict(options =>
            options.AddServer(server => server
                    .DisableAccessTokenEncryption() //Just for development
                    //Development: no time to waste on certificate management today
                    .AddEphemeralEncryptionKey()
                    .AddEphemeralSigningKey()
                    .RegisterClaims(OpenIddictConstants.Claims.Role)
                    .RegisterScopes(OpenIddictConstants.Scopes.Roles)
                    .SetTokenEndpointUris("/api/v1/Auth/token")
                    .SetAuthorizationEndpointUris("/api/v1/Auth/authorize")
                    .AllowClientCredentialsFlow() //Only one supported by Sendgrid
                    .UseAspNetCore()
                    .EnableTokenEndpointPassthrough())
                .AddCore(core => core.UseEntityFrameworkCore(ef => ef.UseDbContext<OpenIddictDbContext>()))
                .AddValidation(validation => validation
                    .UseLocalServer(_ => {})
                    .UseAspNetCore(_ => {})
                )
        )
        .AddHostedService<OpenIddictHostedService>()
        .AddAuthentication()
        ;
    return services;
}
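Azure Application Insights
On AAI, I found that the most frequently thrown exception is SecurityTokenUnableToValidateException.
It is thrown very often, far more often than real 401s. Because the development environment uses ephemeral keys, OpenIddict regenerates the JWKs every time the application restarts.
But looking closely at some traces, I found that these are not errors.
Here are my findings:
- The server is returning 204
- Since a database is involved, the data is written to the database 100% of the time (a 401 would never reach EF or the database)
- Analysing the exception, I can find the JWK named in it
Exception analysis
Looking at the thrown exception, here is the description text from AAI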
IDX10516: Signature validation failed. Unable to match key:
kid: 'RMHQYNQ4TV9KUHI2EI-INM-XYSOF_1RETVYMWQGE'.
Exceptions caught:
''.
token: '{"alg":"RS256","kid":"RMHQYNQ4TV9KUHI2EI-INM-XYSOF_1RETVYMWQGE","typ":"at+jwt"}.{"sub":"SendgridWebhook","name":"Sendgrid Webhook API","oi_prst":"SendgridWebhook","client_id":"SendgridWebhook","oi_tkn_id":"8d0d5f94-2094-4a21-b84d-304d1d99e3fb","exp":1629910230,"iss":"https://****.azurewebsites.net/","iat":1629906630}'. Valid Lifetime: 'True'. Valid Issuer: 'False'
Stack trace
Microsoft.IdentityModel.Tokens.SecurityTokenUnableToValidateException:
at Microsoft.IdentityModel.Tokens.InternalValidators.ValidateLifetimeAndIssuerAfterSignatureNotValidatedJwt (Microsoft.IdentityModel.Tokens, Version=6.10.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at System.IdentityModel.Tokens.Jwt.JwtSecurityTokenHandler.ValidateSignature (System.IdentityModel.Tokens.Jwt, Version=6.10.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at System.IdentityModel.Tokens.Jwt.JwtSecurityTokenHandler.ValidateToken (System.IdentityModel.Tokens.Jwt, Version=6.10.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
at Microsoft.AspNetCore.Authentication.JwtBearer.JwtBearerHandler+<HandleAuthenticateAsync>d__6.MoveNext (Microsoft.AspNetCore.Authentication.JwtBearer, Version=5.0.5.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
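I have determined that the following happens
- The application receives a JWT
- Because multiple OIDC providers are registered, the runtime fetches the JWKs of both B2C and the application itself
- The JWT is validated against the B2C keys, which fails
- The JWT is validated against the application's own keys, which succeeds
- Access is granted
I believe that somewhere in the framework there is code structured roughly as follows. Since there are multiple providers to try, an exception is thrown only if all of them fail. Otherwise, the simple for loop acts as the recovery from the exception.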
// My guess at the framework's behaviour, not actual framework code
object principal = null;
Exception toThrow = null;
foreach (IAuthenticationProvider provider in GetProviders())
{
    try
    {
        principal = provider.Authenticate(jwt);
    }
    catch (SomeKindOfJwtException ex)
    {
        toThrow = ex;
    }
}
if (principal == null) // and perhaps the exception is not null
    throw toThrow;
As for that JWK, RMHQYNQ4TV9KUHI2EI-INM-XYSOF_1RETVYMWQGE, I can easily find it in the browser at https://***.azurewebsites.net/.well-known/jwks
{
"keys": [
{
"kid": "RMHQYNQ4TV9KUHI2EI-INM-XYSOF_1RETVYMWQGE",
"use": "sig",
"kty": "RSA",
"alg": "RS256",
"e": "AQAB",
"n": "rMhqYnq4tv9kuHi2Ei-Inm-xysof_1retVymwqGeQ4hnlCRgrMAODGD4qxybhnpufuitEQRckCb4P49O_qafSQ0ocgRRIIuQJc-vLhLJHGp681_9cZT-jGxHnGw5Jdr0NZxH8RwV6cXcmpRN6f2WupujyhLLNwuu8aaTrucHA3JXshib9ad9R96OacT1r6X77HHXdSzURLRWH-f2JFwpBjOvVfJPxW4dzPY0BT7CzP3lxVvGiNXOp4-E8kVz1jER2EP5wO0Ho2qjlIbGUvGF1ui7GxLItldDs-PkZOGGvsO7yS7aeQHSiMTJt7EO-w-ffCJYv-ZColAiHO9jNL0NmQ"
}
]
}
I also went a bit too far and peeked at the Microsoft sources. Here should be the point where the exception is thrown, and maybe here is where the exception is logged.
What can I do in order to either fix or whitelist these exceptions?
Add a telemetry filter. Based on the exception telemetry, you can decide to drop the telemetry item.
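A telemetry filter of this kind could look roughly like the sketch below. It assumes the app registers the Application Insights SDK through AddApplicationInsightsTelemetry(), so that telemetry processors are part of the pipeline. The class name RecoveredJwtExceptionFilter is made up for illustration. Note that it drops every SecurityTokenUnableToValidateException, including the ones behind a genuine 401, which may or may not be acceptable for you.

```csharp
using System;
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.IdentityModel.Tokens;

// Hypothetical filter: drops exception telemetry caused by JWT key mismatches
// so that it never reaches Application Insights.
public sealed class RecoveredJwtExceptionFilter : ITelemetryProcessor
{
    private readonly ITelemetryProcessor _next;

    // Application Insights passes in the next processor of the chain.
    public RecoveredJwtExceptionFilter(ITelemetryProcessor next)
    {
        _next = next ?? throw new ArgumentNullException(nameof(next));
    }

    public void Process(ITelemetry item)
    {
        // ExceptionTelemetry.Exception can be null, which the "is" check handles.
        if (item is ExceptionTelemetry exceptionTelemetry
            && exceptionTelemetry.Exception is SecurityTokenUnableToValidateException)
        {
            return; // not forwarded, so the item is discarded
        }

        _next.Process(item);
    }
}
```

It would be registered in ConfigureServices with services.AddApplicationInsightsTelemetryProcessor<RecoveredJwtExceptionFilter>(). If only the recovered cases should be hidden, the filter needs an extra condition that tells them apart from real failures.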
Question 2: when do exceptions get logged to AAI? Only when they are unhandled or when the logger decides to?
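When they are unhandled, or when instructed to. For example, when an exception is logged using ILogger, it is also logged to AAI if the AAI ILogger provider is in use (see docs).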
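Since the question's setup sends everything from Trace upwards to the AAI logger, a log filter is another option. The sketch below assumes the recovered failures are logged by the JWT bearer handler below Warning level under a category starting with Microsoft.AspNetCore.Authentication.JwtBearer; verify that against your own traces first. LoggingSetup and AddFilteredLogging are made-up names.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.ApplicationInsights;

// Hypothetical helper that replaces the AddLogging call from the question.
public static class LoggingSetup
{
    public static IServiceCollection AddFilteredLogging(this IServiceCollection services)
    {
        return services.AddLogging(logging => logging
            .AddApplicationInsights()
            .AddConsole()
            .SetMinimumLevel(LogLevel.Trace)
            // For the Application Insights provider only: let through just
            // Warning and above from the JWT bearer handler's category.
            // The console logger still receives everything.
            .AddFilter<ApplicationInsightsLoggerProvider>(
                "Microsoft.AspNetCore.Authentication.JwtBearer",
                LogLevel.Warning));
    }
}
```

A rule that names both a provider and a category is more specific than the global minimum level, so it applies even though SetMinimumLevel(LogLevel.Trace) is kept.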