What does pprof -call_tree do?

Question

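go tool pprof has a -call_tree option that, according to -help, should "Create a context-sensitive call tree". However, pprof -tree on a CPU profile gives me exactly the same output with and without this option. It looks like this (one representative node):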

      flat  flat%   sum%        cum   cum%   calls calls% + context          
----------------------------------------------------------+-------------
                                             0.07s 35.00% |   google.golang.org/grpc/internal/transport.(*http2Server).operateHeaders
                                             0.04s 20.00% |   golang.org/x/net/http2.(*Framer).readMetaFrame
                                             0.02s 10.00% |   github.com/Shopify/sarama.(*FetchResponse).decode
     0.06s  0.79% 51.18%      0.20s  2.63%                | runtime.mapassign_faststr
                                             0.05s 25.00% |   runtime.newobject (inline)
                                             0.03s 15.00% |   aeshashbody
                                             0.03s 15.00% |   runtime.mallocgc

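This is clearly not a tree, especially since it shows Sarama alongside HTTP/2: consuming from Kafka (via Sarama) and serving HTTP/2 are two independent things this process does concurrently.

Why doesn't -call_tree affect this output of -tree? More generally, what does -call_tree do?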

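Extra credit

What is the exact meaning of the output node I showed above? Does it mean that 35% of the samples containing mapassign_faststr also contained operateHeaders, and 10% contained decode? What about the lines below it, such as mallocgc?

What documents could I read to answer the above questions?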

Answer 1

Why doesn’t -call_tree affect this output of -tree

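I don't think the -call_tree option changes the output of -tree. That output isn't actually a tree; it prints the nodes of the tree one by one (more on this in the extra credit section).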

In general, what does -call_tree do?

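You can see the difference when you compare the images generated with the -png flag, one without the -call_tree flag and one with it.

With -call_tree, pprof tries to create separate trees based on context instead of having a single call tree. In my case (I'll list them, since the text in the images isn't readable) the roots are: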

testing.(*B).launch (benchmark/test framework)
runtime.gcBgMarkWorker (part of the runtime GC)
runtime.bgsweep (part of the runtime GC)
runtime.mcall (part of the runtime scheduler)
runtime.morestack (something to do with the stack :))

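In the image without -call_tree these nodes still exist, but they start in the middle of the tree, as if our code called these background routines directly.

Basically, what the option appears to do is remove or hide infrequent calls between functions, so you end up with one tree for each group of functions that frequently call each other.

I haven't tested this, but I imagine pprof also performs this context-aware tree separation for user code. All in all, it returns a tree that is subjectively more readable, or at least more relevant.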

What is the exact meaning of the output node I showed above?

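The -tree option tries to output the tree as shown in the images. But since it is text output, it shows one node of the tree at a time: the non-indented line in the context column is the current node, the lines above it are the nodes that call the current node, and the lines below it are the nodes that the current node calls (the arrows in the images).

calls% is the "weight" of the incoming or outgoing edge, so it is indeed the percentage of calls to or from a function.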

What documents could I read to answer the above questions?

I figured all of this out by looking at the source code. Here are some key parts, in case you are interested:

The file that generates most of the output: https://github.com/google/pprof/blob/2007db6d4f53c44a417ddae675d50f56b8e8c2fd/internal/report/report.go
The function for the -tree option: https://github.com/google/pprof/blob/2007db6d4f53c44a417ddae675d50f56b8e8c2fd/internal/report/report.go#L1047
The line that explains when -call_tree is actually used: https://github.com/google/pprof/blob/2007db6d4f53c44a417ddae675d50f56b8e8c2fd/internal/report/report.go#L133
