Too many (recovered) exceptions in Azure App Insights

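I have a problem: Application Insights is producing more and more false positives and sending exception emails, yet after investigating we found nothing wrong with our application.

Summary, TL;DR

This is an X->Y problem. Problem Y is that AAI is logging a large number of server exceptions (see the detailed description) and sending us alerts. Problem X is that the JWT authentication middleware throws exceptions about a mismatching key, but it recovers from all of them and switches to a different OIDC provider. As a result, the call succeeds.

What can I do in order to either fix or whitelist these exceptions?

Question 2: when do exceptions get logged to AAI? Only when they are unhandled or when the logger decides to?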

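Context

Our application receives email data from Twilio Sendgrid via authenticated webhooks. It also lets the users of our B2C tenant access the application and browse data/statistics.

B2C does not allow the client credentials flow, and Sendgrid does not support scopes. We ended up using two OIDC providers: Azure AD B2C for interactive users, and an in-memory OpenIddict to authenticate the Sendgrid service to us.

Some code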

    public void ConfigureServices(IServiceCollection services)
    {
        services.AddLogging(
            configuration => configuration
                .AddApplicationInsights()
                .SetMinimumLevel(LogLevel.Trace)
                .AddConsole()
        );

        services.ConfigureOpenIddictAuthentication();

        services
            .AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
            .AddMicrosoftIdentityWebApi(Configuration)
            //.EnableTokenAcquisitionToCallDownstreamApi()
            //.AddInMemoryTokenCaches()
            ;

        services.AddAuthorization(authorization => authorization
            .AddPolicy("AzureSendgridPolicy", policy => policy
                .RequireAuthenticatedUser()
                .AddAuthenticationSchemes(JwtBearerDefaults.AuthenticationScheme,
                    OpenIddictValidationAspNetCoreDefaults.AuthenticationScheme)
            )
        );
    }

    public static IServiceCollection ConfigureOpenIddictAuthentication(this IServiceCollection services)
    {
        services.AddDbContext<OpenIddictDbContext>(ef => ef
                // Configure the context to use an in-memory store.
                // This prevents multiple cluster instances from deployment
                .UseInMemoryDatabase(nameof(OpenIddictDbContext))
                // Register the entity sets needed by OpenIddict.
                .UseOpenIddict()
            )
            .AddOpenIddict(options =>
                options.AddServer(server => server
                        .DisableAccessTokenEncryption() //Just for development

                        //Development: no time to waste on certificate management today
                        .AddEphemeralEncryptionKey()
                        .AddEphemeralSigningKey()
                        .RegisterClaims(OpenIddictConstants.Claims.Role)
                        .RegisterScopes(OpenIddictConstants.Scopes.Roles)
                        .SetTokenEndpointUris("/api/v1/Auth/token")
                        .SetAuthorizationEndpointUris("/api/v1/Auth/authorize")
                        .AllowClientCredentialsFlow() //Only one supported by Sendgrid
                        .UseAspNetCore()
                        .EnableTokenEndpointPassthrough())
                    .AddCore(core => core.UseEntityFrameworkCore(ef => ef.UseDbContext<OpenIddictDbContext>()))
                    .AddValidation(validation => validation
                        .UseLocalServer(_ => {})
                        .UseAspNetCore(_ => {})
                    )
            )
            .AddHostedService<OpenIddictHostedService>()
            .AddAuthentication()
            ;

        return services;
    }

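Azure Application Insights

On AAI, I found that the most frequently thrown exception is SecurityTokenUnableToValidateException

It is thrown a great many times, far more often than real 401s occur. Because of the ephemeral keys in the development environment, OpenIddict regenerates the JWK every time the application restarts.

However, looking closely at some traces, I found that these are not errors

Here are my findings:

Exception analysis

Looking at the thrown exception, here is the description text from AAI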

IDX10516: Signature validation failed. Unable to match key: 
kid: 'RMHQYNQ4TV9KUHI2EI-INM-XYSOF_1RETVYMWQGE'.
Exceptions caught:
 ''. 
token: '{"alg":"RS256","kid":"RMHQYNQ4TV9KUHI2EI-INM-XYSOF_1RETVYMWQGE","typ":"at+jwt"}.{"sub":"SendgridWebhook","name":"Sendgrid Webhook API","oi_prst":"SendgridWebhook","client_id":"SendgridWebhook","oi_tkn_id":"8d0d5f94-2094-4a21-b84d-304d1d99e3fb","exp":1629910230,"iss":"https://****.azurewebsites.net/","iat":1629906630}'. Valid Lifetime: 'True'. Valid Issuer: 'False' 

Stack trace

Microsoft.IdentityModel.Tokens.SecurityTokenUnableToValidateException:
   at Microsoft.IdentityModel.Tokens.InternalValidators.ValidateLifetimeAndIssuerAfterSignatureNotValidatedJwt (Microsoft.IdentityModel.Tokens, Version=6.10.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.IdentityModel.Tokens.Jwt.JwtSecurityTokenHandler.ValidateSignature (System.IdentityModel.Tokens.Jwt, Version=6.10.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.IdentityModel.Tokens.Jwt.JwtSecurityTokenHandler.ValidateToken (System.IdentityModel.Tokens.Jwt, Version=6.10.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at Microsoft.AspNetCore.Authentication.JwtBearer.JwtBearerHandler+<HandleAuthenticateAsync>d__6.MoveNext (Microsoft.AspNetCore.Authentication.JwtBearer, Version=5.0.5.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)

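I have determined that the following happens

  1. The application receives a JWT
  2. Since multiple OIDC providers are registered, the runtime fetches the JWKs of both B2C and the application itself
  3. The JWT is validated against the B2C keys, which fails
  4. The JWT is validated against the application's own keys, which succeeds
  5. Access is granted

I believe the code somewhere in the framework is well structured, along the lines of the following. Since there are multiple providers to try, an exception is thrown only if all of them fail. Otherwise, a simple for loop acts as the recovery from the exception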

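// Hypothetical sketch of the suspected logic, not actual framework code
object principal = null;
Exception toThrow = null;
foreach (IAuthenticationProvider provider in GetProviders())
{
    try
    {
        principal = provider.Authenticate(jwt);
        break; // stop at the first provider that accepts the token
    }
    catch (SomeKindOfJwtException ex)
    {
        toThrow = ex; // remember the failure and try the next provider
    }
}
if (principal == null && toThrow != null)
    throw toThrow;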

Looking at that JWK, RMHQYNQ4TV9KUHI2EI-INM-XYSOF_1RETVYMWQGE, I can easily find it by opening https://***.azurewebsites.net/.well-known/jwks in a browser

{
  "keys": [
    {
      "kid": "RMHQYNQ4TV9KUHI2EI-INM-XYSOF_1RETVYMWQGE",
      "use": "sig",
      "kty": "RSA",
      "alg": "RS256",
      "e": "AQAB",
      "n": "rMhqYnq4tv9kuHi2Ei-Inm-xysof_1retVymwqGeQ4hnlCRgrMAODGD4qxybhnpufuitEQRckCb4P49O_qafSQ0ocgRRIIuQJc-vLhLJHGp681_9cZT-jGxHnGw5Jdr0NZxH8RwV6cXcmpRN6f2WupujyhLLNwuu8aaTrucHA3JXshib9ad9R96OacT1r6X77HHXdSzURLRWH-f2JFwpBjOvVfJPxW4dzPY0BT7CzP3lxVvGiNXOp4-E8kVz1jER2EP5wO0Ho2qjlIbGUvGF1ui7GxLItldDs-PkZOGGvsO7yS7aeQHSiMTJt7EO-w-ffCJYv-ZColAiHO9jNL0NmQ"
    }
  ]
}

I also went further than I needed to and peeked at the Microsoft sources. Here should be the point where the exception is thrown, and maybe here is where the exception is logged

What can I do in order to either fix or whitelist these exceptions?

Add a telemetry filter. Based on the exception telemetry, you can decide to drop the telemetry item.
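A minimal sketch of such a filter, assuming the Application Insights SDK for ASP.NET Core is already registered via services.AddApplicationInsightsTelemetry() (not shown in the question). The class name RecoveredJwtExceptionFilter is made up for illustration. Note that this drops every SecurityTokenUnableToValidateException, including those from requests that really end in a 401, so you may want to narrow the condition.

```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.IdentityModel.Tokens;

// Hypothetical filter: discards exception telemetry caused by JWT key mismatches.
public sealed class RecoveredJwtExceptionFilter : ITelemetryProcessor
{
    private readonly ITelemetryProcessor _next;

    // The SDK passes in the next processor of the chain.
    public RecoveredJwtExceptionFilter(ITelemetryProcessor next)
    {
        _next = next;
    }

    public void Process(ITelemetry item)
    {
        if (item is ExceptionTelemetry exceptionTelemetry
            && exceptionTelemetry.Exception is SecurityTokenUnableToValidateException)
        {
            // Not forwarding the item to the next processor drops it.
            return;
        }

        _next.Process(item);
    }
}

// Registration, inside ConfigureServices:
//     services.AddApplicationInsightsTelemetryProcessor<RecoveredJwtExceptionFilter>();
```

The filter only affects what is sent to AAI; the console logger still receives the log entry.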

Question 2: when do exceptions get logged to AAI? Only when they are unhandled or when the logger decides to?

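When they are unhandled, or when the code is told to log them. For example, when an exception is logged through ILogger, it is also recorded in AAI if the AAI ILogger provider is in use (see docs)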
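Since the question sets the minimum level to Trace, a log filter scoped to the AAI provider is another option. The sketch below assumes the recovered exception reaches AAI through a log entry written by the JwtBearer handler at a level below Warning. Check the category name and severity shown on the telemetry item in AAI before relying on it. AddFilteredLogging is a made-up helper name.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.ApplicationInsights;

public static class LoggingSetup
{
    // Hypothetical helper replacing the services.AddLogging(...) call from the question.
    public static IServiceCollection AddFilteredLogging(this IServiceCollection services)
    {
        return services.AddLogging(logging => logging
            .AddApplicationInsights()
            .AddConsole()
            .SetMinimumLevel(LogLevel.Trace)
            // Only for the AAI provider: ignore JwtBearer log entries below Warning.
            // The category prefix is an assumption based on the stack trace.
            .AddFilter<ApplicationInsightsLoggerProvider>(
                "Microsoft.AspNetCore.Authentication.JwtBearer", LogLevel.Warning));
    }
}
```

Because the rule is bound to ApplicationInsightsLoggerProvider, the console output keeps the full Trace-level detail.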