Frequent GC on docprocservice and container

Question

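I'm running a performance test against Vespa, and the container seems slow: it can't handle any more incoming requests. In vespa.log there are a lot of GC "Allocation Failure" messages. However, system resource usage is very low (CPU < 30%, memory < 35%). Is there any configuration I can tune?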

By the way, it looks like docprocservice runs on the content nodes by default. How do I tune the jvmargs for docprocservice?

1523361302.261056        24298   container       stdout  info    [GC (Allocation Failure)  3681916K->319796K(7969216K), 0.0521448 secs]
1523361302.772183        24301   docprocservice  stdout  info    [GC (Allocation Failure)  729622K->100400K(1494272K), 0.0058702 secs]
1523361306.478681        24301   docprocservice  stdout  info    [GC (Allocation Failure)  729648K->99337K(1494272K), 0.0071413 secs]
1523361308.275909        24298   container       stdout  info    [GC (Allocation Failure)  3675316K->325043K(7969216K), 0.0669859 secs]
1523361309.798619        24301   docprocservice  stdout  info    [GC (Allocation Failure)  728585K->100538K(1494272K), 0.0060528 secs]
1523361313.530767        24301   docprocservice  stdout  info    [GC (Allocation Failure)  729786K->100561K(1494272K), 0.0088941 secs]
1523361314.549254        24298   container       stdout  info    [GC (Allocation Failure)  3680563K->330211K(7969216K), 0.0531680 secs]
1523361317.571889        24301   docprocservice  stdout  info    [GC (Allocation Failure)  729809K->100551K(1494272K), 0.0062653 secs]
1523361320.736348        24298   container       stdout  info    [GC (Allocation Failure)  3685729K->316908K(7969216K), 0.0595787 secs]
1523361320.839502        24301   docprocservice  stdout  info    [GC (Allocation Failure)  729799K->99311K(1494272K), 0.0069882 secs]
1523361324.948995        24301   docprocservice  stdout  info    [GC (Allocation Failure)  728559K->99139K(1494272K), 0.0127939 secs]

services.xml:
<container id="container" version="1.0">
    <config name="container.handler.threadpool">
        <maxthreads>10000</maxthreads>
    </config>

    <config name="config.docproc.docproc">
        <numthreads>500</numthreads>
    </config>

    <config name="search.config.qr-start">
        <jvm>
            <heapSizeAsPercentageOfPhysicalMemory>60</heapSizeAsPercentageOfPhysicalMemory>
        </jvm>
    </config>
    <document-api />

    <search>
        <provider id="music" cluster="music" cachesize="64M" type="local" />
    </search>

    <nodes>
        <node hostalias="admin0" />
        <node hostalias="node2" />
    </nodes>
</container>

# free -lh
              total        used        free      shared  buff/cache   available
Mem:           125G         43G         18G        177M         63G         80G
Low:           125G        106G         18G
High:            0B          0B          0B
Swap:            0B          0B          0B

Answer 1

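Those GC messages come from the JVM and are normal, not real failures: "Allocation Failure" is only the reason the collection was triggered (the young generation had no room left for a new allocation). This is how the JVM works, collecting the garbage the application creates, and all of these are minor collections of the young generation. If you start seeing Full GC messages, then you need to tune.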
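To check this on your own logs, a minimal sketch like the following tallies the GC lines per service. It assumes the JDK 8 `-XX:+PrintGC` message format and the six-column layout of the excerpt in the question (time, pid, service, component, level, message); the raw vespa.log may have extra columns such as the host, so the indices may need adjusting. `parse` and `summarize` are hypothetical helper names, not part of Vespa. Each GC line reads as heap in use before -> after (committed heap size), pause time; the healthy pattern is many short minor collections and no Full GC.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# JDK 8 -XX:+PrintGC style message, as in the excerpt above, e.g.
# "[GC (Allocation Failure)  3681916K->319796K(7969216K), 0.0521448 secs]"
GC_RE = re.compile(
    r"\[(?P<kind>Full GC|GC) \((?P<cause>[^)]*)\)\s+"
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<heap>\d+)K\), (?P<secs>[\d.]+) secs\]"
)


@dataclass
class GcEvent:
    timestamp: float  # first column (epoch seconds)
    service: str      # third column, e.g. "container" or "docprocservice"
    full: bool        # True for "Full GC", False for a minor (young-generation) GC
    freed_kb: int     # heap in use before minus after the collection
    pause_s: float    # reported pause time in seconds


def parse(lines):
    """Extract GC events; lines that are not GC messages are skipped."""
    events = []
    for line in lines:
        # Assumed layout: time, pid, service, component, level, message
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        m = GC_RE.search(fields[5])
        if m is None:
            continue
        events.append(GcEvent(
            timestamp=float(fields[0]),
            service=fields[2],
            full=m["kind"] == "Full GC",
            freed_kb=int(m["before"]) - int(m["after"]),
            pause_s=float(m["secs"]),
        ))
    return events


def summarize(events):
    """Per service: collections, Full GCs, mean pause, mean freed, GC time share."""
    by_service = defaultdict(list)
    for e in events:
        by_service[e.service].append(e)
    summary = {}
    for service, evs in by_service.items():
        total_pause = sum(e.pause_s for e in evs)
        span = evs[-1].timestamp - evs[0].timestamp
        summary[service] = {
            "collections": len(evs),
            "full_gcs": sum(1 for e in evs if e.full),
            "avg_pause_ms": 1000 * total_pause / len(evs),
            "avg_freed_mb": sum(e.freed_kb for e in evs) / len(evs) / 1024,
            # rough share of wall-clock time spent in GC pauses
            "gc_time_fraction": total_pause / span if span > 0 else 0.0,
        }
    return summary
```

On the excerpt from the question this reports four container collections averaging about 58 ms and seven docprocservice collections averaging under 8 ms, with no Full GC, i.e. roughly 1.3% and 0.25% of wall-clock time spent paused.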

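'docprocservice' is also not involved in serving queries, so you can safely ignore those messages for serving tests. Your bottleneck is most likely the underlying content layer; what does resource usage look like there? In any case, running with 10K maxthreads seems excessive, the default of 500 is more than enough. What kind of benchmarking client are you using?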
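The thread-count remark can be sanity-checked with Little's law: the average number of requests in flight equals throughput times mean latency. The sketch below is a back-of-the-envelope estimate under the assumption that each in-flight query holds one handler thread for its whole latency (as with a synchronous handler); it is not how Vespa sizes its pools, and `headroom` is a hypothetical safety margin for bursts, not a Vespa setting.

```python
import math


def threads_needed(throughput_qps: float, mean_latency_s: float,
                   headroom: float = 2.0) -> int:
    """Estimate handler threads via Little's law (L = lambda * W).

    Assumes one busy thread per in-flight query; `headroom` is a
    hypothetical multiplier for bursts.
    """
    if throughput_qps < 0 or mean_latency_s < 0 or headroom < 1:
        raise ValueError("rates/latencies must be >= 0 and headroom >= 1")
    in_flight = throughput_qps * mean_latency_s  # average concurrent queries
    return math.ceil(in_flight * headroom)


if __name__ == "__main__":
    # Hypothetical load: 2000 queries/s at 100 ms mean latency.
    print(threads_needed(2000, 0.1))  # 400, below the default of 500
```

With the same headroom, keeping 10000 threads busy would take something like 50,000 queries/s at 100 ms mean latency; a pool larger than the in-flight load does not add throughput by itself.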

Answer 2

In general, it is easier to help if you provide:

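The setup and hardware configuration (e.g. services.xml and the document schema).
What types of queries/ranking profiles are being used, which fields are searched, etc. The total number of documents and, if you use a custom ranking profile, how the results compare with the built-in 'unranked' ranking profile.
The average number of hits returned (the &hits=x parameter) and the average total hit count.
Resource usage (e.g. vmstat/top/network utilities on the container and content nodes) when latency starts climbing past the target latency SLA (bottleneck reached / max throughput).
Same as above, but with only one client (no concurrency). If you are already past the target latency SLA/expectation without concurrency, you may have to check the features in use (e.g. adding rank:filter to unranked fields, adding fast-search to attributes involved in the query, etc.).
The benchmarking client used (e.g. number of connections and the parameters used). We typically use the vespa-fbench tool.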
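Several of the items above refer to the point where latency climbs past a target SLA. If you collect raw per-request latencies yourself, a minimal sketch like the following checks a run against the SLA using a nearest-rank percentile; the function names and the p99 default are illustrative assumptions. Running it once with a single client and once under load helps separate a per-query cost problem from a saturation problem.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of the samples, for p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_sla(latencies_ms: Sequence[float], sla_ms: float, p: float = 99.0) -> bool:
    """True if the p-th percentile latency is within the SLA."""
    return percentile(latencies_ms, p) <= sla_ms


if __name__ == "__main__":
    single_client = [12.0, 14.5, 13.1, 15.2, 12.8]  # hypothetical samples (ms)
    print(percentile(single_client, 99), meets_sla(single_client, sla_ms=50))
    # prints "15.2 True"
```

The nearest-rank method always returns an observed sample, which keeps the reported value easy to trace back to a real request.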

Some general resources on benchmarking and profiling Vespa:

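Benchmarking Vespa (covers our own benchmarking client, which uses persistent connections; if you benchmark with non-persistent connections, you may end up benchmarking the OS's ability to maintain TCP connections): http://docs.vespa.ai/documentation/performance/vespa-benchmarking.html
Profiling and sizing: http://docs.vespa.ai/documentation/performance/
Feature tuning: http://docs.vespa.ai/documentation/performance/feature-tuning.html
Scaling Vespa: http://docs.vespa.ai/documentation/performance/sizing-search.html (this has some interesting graphs, e.g. the expected relationship between overall latency and total hits, and the expected latency breakdown when saturation is reached).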
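The point about persistent connections in the first link can be illustrated without Vespa. In the sketch below, a local Python HTTP server (a hypothetical stand-in for the container's search endpoint; the `/search/?query=test` path is only illustrative) counts the TCP connections it accepts, while the client sends the same queries either over one reused keep-alive connection or over a fresh connection per query. It uses only the standard library and is not vespa-fbench.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Stand-in search endpoint that counts accepted TCP connections."""
    protocol_version = "HTTP/1.1"  # enables keep-alive on the server side
    connections = 0
    _lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted connection, not once per request
        super().setup()
        with CountingHandler._lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b'{"root": {}}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the output quiet


def send_queries(port, n, persistent):
    """Send n GET requests; return how many TCP connections the server saw."""
    before = CountingHandler.connections
    conn = None
    for _ in range(n):
        if conn is None or not persistent:
            if conn is not None:
                conn.close()
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/search/?query=test")
        conn.getresponse().read()  # drain the body so the socket can be reused
    if conn is not None:
        conn.close()
    return CountingHandler.connections - before


def demo(n=20):
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return {
            "persistent": send_queries(port, n, persistent=True),
            "new_connection_per_query": send_queries(port, n, persistent=False),
        }
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

With `demo(20)` the persistent run uses one connection and the other run uses twenty; every extra connection is a TCP handshake and teardown the OS has to handle, which is the overhead a non-persistent benchmark ends up measuring.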
