How to detect memory pressure in a Java program?

Question

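I have a batch process, written in Java, that analyzes extremely long sequences of tokens (possibly billions or even trillions of them!) and observes bigram patterns (also known as word pairs).

In this code, bigrams are represented as pairs of Strings, using the ImmutablePair class from Apache Commons. I won't know the cardinality of the tokens in advance. They might be highly repetitive, or every token might be completely unique.

The more data I can fit into memory, the better the analysis will be!

But I definitely can't process the whole job at once. So I need to load as much data as possible into a buffer, perform a partial analysis, flush my partial results to a file (or to an API, or whatever), then clear my caches and start over.

One way I'm optimizing memory usage is by using Guava interners to de-duplicate my String instances.

Right now, my code looks essentially like this: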

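import com.google.common.collect.Interner;
import com.google.common.collect.Interners;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;
import org.apache.commons.lang3.tuple.ImmutablePair;
import org.apache.commons.lang3.tuple.Pair;

int BUFFER_SIZE = 100_000_000;

Map<Pair<String, String>, LongAdder> bigramCounts = new HashMap<>(BUFFER_SIZE);

Interner<String> interner = Interners.newStrongInterner();

String prevToken = null;
Iterator<String> tokens = getTokensFromSomewhere();
while (tokens.hasNext()) {
  String token = interner.intern(tokens.next());
  if (prevToken != null) {
    Pair<String, String> bigram = new ImmutablePair<>(prevToken, token);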
    LongAdder bigramCount = bigramCounts.computeIfAbsent(
        bigram,
        (c) -> new LongAdder()
    );
    bigramCount.increment();
    // If our buffer is full, we need to flush!
    boolean tooMuchMemoryPressure = bigramCounts.size() > BUFFER_SIZE;
    if (tooMuchMemoryPressure) {
      // Analyze the data, and write the partial results somewhere
      doSomeFancyAnalysis(bigramCounts);
      // Clear the buffer and start over
      bigramCounts.clear();
    }
  }
  prevToken = token;
}

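The problem with this code is that this is a very crude way of determining whether there is tooMuchMemoryPressure.

I want to run this job on many different kinds of hardware, with varying amounts of memory. Regardless of the instance, I want this code to adjust automatically to maximize memory consumption.

Rather than using a hard-coded constant like BUFFER_SIZE (derived through experimentation, heuristics, and guesswork), I really just want to ask the JVM whether memory is almost full. But that's a very complicated question, considering the complexities of mark/sweep algorithms and all the different generational collectors.

What would be a good general approach for accomplishing something like this, assuming this batch job might run on a variety of machines with different amounts of available memory? I don't need this to be extremely precise... I'm just looking for a rough signal that tells me I need to flush the buffer soon, based on the state of the actual heap.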

Answer 1

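The simplest way to get a first glimpse of what is going on in the process' heap space is Runtime.freeMemory() together with .maxMemory() and .totalMemory(). However, the first does not account for garbage, so it is an underestimate at best and may be completely misleading just before a GC kicks in.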
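As a rough illustration of the Runtime-based estimate, here is a minimal sketch. The class name HeapHeadroom and the helper headroomBytes are made up for this example. Keep in mind that the "used" figure still includes uncollected garbage, so the result is a pessimistic lower bound on what is really available.

```java
final class HeapHeadroom {
    /**
     * Bytes the JVM could still hand out before reaching its maximum heap,
     * treating everything currently allocated (garbage included) as used.
     */
    static long headroomBytes(long max, long total, long free) {
        long used = total - free; // live objects plus garbage not yet collected
        return max - used;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long headroom = headroomBytes(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
        System.out.println("approx. headroom: " + headroom / (1024 * 1024) + " MiB");
    }
}
```

Note that maxMemory() returns Long.MAX_VALUE when the JVM has no inherent limit, in which case the number is not meaningful.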

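Assuming that for your application "memory pressure" basically means "(soon) not enough", the interesting value is the free memory right after a GC.

This is available through GarbageCollectorMXBean, which provides a GcInfo with the memory usage after GC.

This bean can be observed right after a GC, since it is a NotificationEmitter, although this is not advertised in the Javadoc. Some minimal code, modeled after a longer example, is: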
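If polling is acceptable, the same GcInfo can also be read on demand instead of waiting for a notification. The sketch below assumes a HotSpot-style JVM, where the beans implement the com.sun.management.GarbageCollectorMXBean subinterface; the standard java.lang.management interface has no GcInfo accessor. The class and method names (LastGcUsage, usedAfterLastGc) are invented for this example.

```java
import com.sun.management.GcInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.HashMap;
import java.util.Map;

final class LastGcUsage {
    /** Pool name -> bytes used after the most recent GC; empty if no GC has run yet. */
    static Map<String, Long> usedAfterLastGc() {
        Map<String, Long> result = new HashMap<>();
        long newestEnd = -1;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Assumption: only the HotSpot-specific subinterface exposes getLastGcInfo().
            if (!(bean instanceof com.sun.management.GarbageCollectorMXBean)) {
                continue;
            }
            GcInfo info = ((com.sun.management.GarbageCollectorMXBean) bean).getLastGcInfo();
            if (info == null || info.getEndTime() <= newestEnd) {
                continue; // this collector has not run, or another one ran more recently
            }
            newestEnd = info.getEndTime();
            result.clear();
            for (Map.Entry<String, MemoryUsage> e : info.getMemoryUsageAfterGc().entrySet()) {
                result.put(e.getKey(), e.getValue().getUsed());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.gc(); // only a hint, so the map may still be empty afterwards
        usedAfterLastGc().forEach((pool, used) -> System.out.println(pool + " -> " + used));
    }
}
```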

  void registerCallback() {
    List<GarbageCollectorMXBean> gcbeans =
      java.lang.management.ManagementFactory.getGarbageCollectorMXBeans();
    for (GarbageCollectorMXBean gcbean : gcbeans) {
      System.out.println(gcbean.getName());
      NotificationEmitter emitter = (NotificationEmitter) gcbean;
      emitter.addNotificationListener(this::handle, null, null);
    }
  }

  private void handle(Notification notification, Object handback) {
    if (!notification.getType()
      .equals(GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION)) {
      return;
    }
    GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
      .from((CompositeData) notification.getUserData());
    GcInfo gcInfo = info.getGcInfo();
    gcInfo.getMemoryUsageAfterGc().forEach((name, memUsage) -> {
      System.err.println(name+ "->" + memUsage);
    });
  }

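There will be several memUsage entries, and these will also differ depending on the GC. But from the values provided (used, committed, and max) we can derive an upper limit for the free memory, which again should give the "rough signal" the OP is asking for.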
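One way to turn those entries into a single number is sketched below. It is only one possible interpretation: it sums "used" over the heap pools (non-heap pools such as Metaspace also appear in the map and are skipped) and subtracts that from the maximum heap size. The names PostGcHeadroom, heapPoolNames, usedHeapAfterGc, and freeUpperBound are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class PostGcHeadroom {
    /** Names of the pools that belong to the Java heap on this JVM. */
    static Set<String> heapPoolNames() {
        Set<String> names = new HashSet<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    /** Sum of bytes used after GC, restricted to the given heap pools. */
    static long usedHeapAfterGc(Map<String, MemoryUsage> usageAfterGc, Set<String> heapPools) {
        long used = 0;
        for (Map.Entry<String, MemoryUsage> e : usageAfterGc.entrySet()) {
            if (heapPools.contains(e.getKey())) {
                used += e.getValue().getUsed();
            }
        }
        return used;
    }

    /** Rough upper bound on free heap: maximum heap minus what survived the collection. */
    static long freeUpperBound(Map<String, MemoryUsage> usageAfterGc,
                               Set<String> heapPools, long maxHeap) {
        return Math.max(0, maxHeap - usedHeapAfterGc(usageAfterGc, heapPools));
    }
}
```

Inside handle, this could be called as freeUpperBound(gcInfo.getMemoryUsageAfterGc(), heapPoolNames(), Runtime.getRuntime().maxMemory()). After a young-generation collection the old generation may still hold garbage, so the figure is more trustworthy after a full or mixed collection.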

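doSomeFancyAnalysis will of course also need its share of fresh memory, so with a very rough estimate of how much it needs per bigram to analyze, this could be the limit to watch for.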
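To connect such a limit to the processing loop, the GC listener can raise a flag that the loop checks on each iteration. The following is a minimal sketch, not a tuned implementation: FlushSignal and its methods are hypothetical names, and the threshold (for example 0.7, leaving about 30% of the heap for doSomeFancyAnalysis) is an assumption that would need measuring for the real workload.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class FlushSignal {
    private final double maxUsedFraction;
    private final AtomicBoolean flushRequested = new AtomicBoolean(false);

    /** @param maxUsedFraction post-GC heap occupancy (0..1) above which a flush is requested */
    FlushSignal(double maxUsedFraction) {
        this.maxUsedFraction = maxUsedFraction;
    }

    /** Called from the GC notification thread with heap usage measured after a collection. */
    void onGcFinished(long usedAfterGc, long maxHeap) {
        if (maxHeap > 0 && (double) usedAfterGc / maxHeap > maxUsedFraction) {
            flushRequested.set(true);
        }
    }

    /** Called from the processing loop; returns true once per request and resets the flag. */
    boolean pollFlushRequested() {
        return flushRequested.getAndSet(false);
    }
}
```

In the loop from the question, the check could then read boolean tooMuchMemoryPressure = signal.pollFlushRequested() || bigramCounts.size() > BUFFER_SIZE;, which keeps the size limit as a safety net. The AtomicBoolean is there because notifications arrive on a different thread than the one running the loop.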

