Profiling my NodeJS server shows it's hanging... But how to figure out why?

Question

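I have a NodeJS server running NestJS, deployed on AKS.
I created an Interceptor so that if I send an HTTP header 'profile', it runs the usual code, but instead of sending the normal response it replaces the body with the profiler output.

Here is the code: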

import {
  CallHandler, ExecutionContext, Injectable, NestInterceptor,
} from '@nestjs/common';
import { Observable } from 'rxjs';
import { map } from 'rxjs/operators';
import { Session } from 'inspector';

/**
 * This interceptor will replace the body of the response with the result of the profiling
 * if the request has a header 'profile'
 *
 * To use on a controller or a method, simply decorate it with @UseInterceptors(Profiler)
 */
@Injectable()
export class Profiler implements NestInterceptor {
  intercept(context: ExecutionContext, next: CallHandler): Observable<any> | Promise<Observable<any>> {
    const profile = context.switchToHttp().getRequest().get('profile');
    // if the request doesn't have a 'profile' header, we deal with the request as usual
    if (!profile) return next.handle();

    // start a profiling session
    const session = new Session();
    session.connect();

    return new Promise((resolve) => {
      session.post('Profiler.enable', () => {
        session.post('Profiler.start', () => {
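          // once the handler has produced its result, stop profiling and return the profile instead
          resolve(next.handle().pipe(map(() => new Promise((resolveProfile) => {
            session.post('Profiler.stop', (_, { profile: cpuProfile }) => {
              // release the inspector session so it does not leak across requests
              session.disconnect();
              resolveProfile(cpuProfile);
            });
          }))));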
        });
      });
    });
  }
}

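Doing this, I get a JSON file that I can then open with Chrome DevTools:

As you can see, each individual function takes only a few milliseconds to run, but in between them there are long pauses.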
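
To put numbers on those gaps instead of eyeballing them in DevTools, the profile JSON can be summarized directly. The following is a minimal sketch, assuming the profile has the standard V8 shape (`nodes`, `samples`, `timeDeltas`, as in the DevTools Protocol `Profiler.Profile` type). The helper name `timeByFunction` and the sample data are made up for illustration, and attributing each delta to the sample at the same index is an approximation.

```typescript
// Minimal shape of a V8 CPU profile (only the fields used here).
interface CpuProfileNode {
  id: number;
  callFrame: { functionName: string };
}
interface CpuProfile {
  nodes: CpuProfileNode[];
  samples: number[];    // node id hit by each sample
  timeDeltas: number[]; // microseconds between adjacent samples
}

// Hypothetical helper: sums sampled time per function name, in milliseconds.
function timeByFunction(profile: CpuProfile): Map<string, number> {
  const nameById = new Map<number, string>();
  for (const node of profile.nodes) {
    nameById.set(node.id, node.callFrame.functionName);
  }
  const totals = new Map<string, number>();
  profile.samples.forEach((nodeId, i) => {
    const name = nameById.get(nodeId) ?? '(unknown)';
    const ms = (profile.timeDeltas[i] ?? 0) / 1000;
    totals.set(name, (totals.get(name) ?? 0) + ms);
  });
  return totals;
}

// Tiny hand-made profile for illustration; a real one comes from Profiler.stop.
const sampleProfile: CpuProfile = {
  nodes: [
    { id: 1, callFrame: { functionName: '(root)' } },
    { id: 2, callFrame: { functionName: '(idle)' } },
    { id: 3, callFrame: { functionName: 'handleRequest' } },
  ],
  samples: [3, 2, 2, 3],
  timeDeltas: [1000, 2000000, 500000, 2000],
};

const totals = timeByFunction(sampleProfile);
console.log('idle ms:', totals.get('(idle)'));
console.log('handleRequest ms:', totals.get('handleRequest'));
```

If most of the time lands on `(idle)`, the process was waiting on something external (network, database, disk) rather than executing JavaScript, which narrows down where to add logging.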

Here is an excerpt from my deployment.yaml file, which shows that my pod should have 2GB of memory, which I think should be enough.

spec:
      serviceAccount: {{ include "api.fullname" . }}-service-account
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          resources:
            limits:
              memory: "2000Mi"
            requests:
              memory: "2000Mi"
          ports:
            - name: http
              containerPort: {{ .Values.port }}
              protocol: TCP

So how can we explain these long pauses, and how can we prevent them?

Answer 1

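By adding more logging around the database requests, we realized that the first few requests took around 2-3 seconds, while later requests took only ~10 ms or less.

The solution was to use PgBouncer to maintain a pool of connections to the database.
More information here: https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/steps-to-install-and-setup-pgbouncer-connection-pooling-proxy/ba-p/730555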
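
The kind of logging described above can be sketched roughly as follows. This is not the code we actually used; `timed` and `fakeQuery` are hypothetical names, and `fakeQuery` only stands in for a real database call.

```typescript
// Hypothetical helper: runs an async operation and logs how long it took.
async function timed<T>(label: string, op: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await op();
  } finally {
    // logged even if the operation throws, so slow failures are visible too
    console.log(`${label} took ${Date.now() - start} ms`);
  }
}

// Stand-in for a real database query (assumption for illustration only).
function fakeQuery(delayMs: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve('row'), delayMs));
}

async function main(): Promise<void> {
  // Wrap each database call; comparing the first calls with later ones
  // shows whether connection setup dominates the latency.
  await timed('first query', () => fakeQuery(50));
  await timed('second query', () => fakeQuery(5));
}

main();
```

Millisecond resolution from `Date.now()` is enough here, since the difference being looked for is seconds versus tens of milliseconds.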

profiling

node.js

docker

kubernetes

nestjs