Absolute achievable minimum of GC with string concat in Lua?

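Runtime: Lua 5.1.x compiled for ARM64; C modules are not allowed.

Sample code, ready to run: https://paste.gg/p/anonymous/08f364480a5f470e9da610ab565e11c0

I need to concatenate a bunch of strings in a loop every X milliseconds. As I understand it, Lua supports string interning, which means string literals are "cached" rather than allocated each time. So only direct calls to tostring() (or the .. operator) should allocate. All other existing string values are passed by reference.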
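A minimal sketch of that interning assumption, measured with collectgarbage("count") while the collector is paused. The helper name allocated_kb and the loop count N are made up for this illustration, and the exact numbers depend on the Lua build:

```lua
-- Hypothetical helper: KB allocated by fn while the collector is paused.
local function allocated_kb(fn)
  collectgarbage("collect")
  collectgarbage("stop")
  local before = collectgarbage("count")
  fn()
  local after = collectgarbage("count")
  collectgarbage("restart")
  return after - before
end

local N = 1000

-- The same value every time: the resulting string is already interned,
-- so tostring() can hand back the existing string object.
local same_kb = allocated_kb(function()
  for _ = 1, N do
    local s = tostring(42.5)
  end
end)

-- A different value every time: each call has to create a new string.
local distinct_kb = allocated_kb(function()
  for i = 1, N do
    local s = tostring(i + 0.5)
  end
end)

print(string.format("same value:      %.3f KB", same_kb))
print(string.format("distinct values: %.3f KB", distinct_kb))
```

If the assumption holds, the second figure should be far larger than the first, which is why values that change on every tick (timestamps, lag) are the part that cannot be cached away.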

What I have done so far:

The end result still makes me sad:

Allocated pre-concat: 2.486328125 KB
Allocated post-concat: 39.7451171875 KB
Total table meta bytes: 1544 B
Total tostring meta bytes: 273 B

Am I missing something, or have I reached the limits of Lua?

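I assume the problem you mention concerns the memory consumption of the function CONTAINER.PopulateState. I think your code is fine, but you are not measuring the right thing. I removed all the collectgarbage calls and gathered them in a single part of the code: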

print("Allocated PRE-concat:                " ..  tostring(collectgarbage("count")))

-- First time
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" ..  tostring(collectgarbage("count")))
collectgarbage("collect") 
print("Allocated POST-concat  AFTER-COLLECT:" ..  tostring(collectgarbage("count")))

-- One more try
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" ..  tostring(collectgarbage("count")))
collectgarbage("collect") 
print("Allocated POST-concat  AFTER-COLLECT:" ..  tostring(collectgarbage("count")))

-- One more try
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" ..  tostring(collectgarbage("count")))
collectgarbage("collect") 
print("Allocated POST-concat  AFTER-COLLECT:" ..  tostring(collectgarbage("count")))

The results are very different and make more sense (all values in KB):

Allocated PRE-concat:                48.70703125
Allocated POST-concat BEFORE-COLLECT:54.3232421875
Allocated POST-concat  AFTER-COLLECT:51.8515625
Allocated POST-concat BEFORE-COLLECT:54.5576171875
Allocated POST-concat  AFTER-COLLECT:51.8515625
Allocated POST-concat BEFORE-COLLECT:54.5576171875
Allocated POST-concat  AFTER-COLLECT:51.8515625

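After program initialization and before the first call to CONTAINER.PopulateState(), the program already uses 48.7 KB.

The first call to CONTAINER.PopulateState() adds roughly 3 KB (48.7 KB to 51.85 KB after collection) that appears to be persistent: this memory does not seem to be released during program execution. It may be due to bytecode compilation, caching, or internal use.

Subsequent executions of CONTAINER.PopulateState(), however, use about 2.7 KB of memory (51.85 KB to 54.56 KB), and this memory is released every time. The program's behavior looks very consistent: executing CONTAINER.PopulateState() does not make the program use more and more memory. In fact, compared with the rest of the program (48 KB), the temporary memory used by CONTAINER.PopulateState() (2.7 KB) is negligible.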
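The distinction between memory that is allocated during a call and memory that survives a collection can be wrapped in a small helper. This is only a sketch: measure, work and cache are hypothetical names, and work is a stand-in for CONTAINER.PopulateState(), with a persistent table playing the role of CONTAINER.strCACHE.

```lua
local cache = {}  -- persistent table, similar in role to CONTAINER.strCACHE

-- Stand-in workload: fills the persistent cache and creates some garbage.
local function work()
  for i = 1, 50 do
    cache[i] = "item" .. i              -- still referenced after the call
  end
  local tmp = {}
  for i = 1, 50 do
    tmp[i] = tostring(math.random())    -- becomes garbage once work() returns
  end
  return #table.concat(tmp)
end

-- Returns (KB allocated during fn, KB still in use after a full collection).
local function measure(fn)
  collectgarbage("collect")
  collectgarbage("stop")                -- keep the collector out of the reading
  local base = collectgarbage("count")
  fn()
  local peak = collectgarbage("count")
  collectgarbage("restart")
  collectgarbage("collect")
  local kept = collectgarbage("count")
  return peak - base, kept - base
end

local allocated1, kept1 = measure(work)
local allocated2, kept2 = measure(work)
print(string.format("1st call: allocated %.3f KB, kept %.3f KB", allocated1, kept1))
print(string.format("2nd call: allocated %.3f KB, kept %.3f KB", allocated2, kept2))
```

The pattern to look for is the one described above: the first call keeps some memory, while later calls allocate a similar amount but give nearly all of it back.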

If you want finer control over what is happening, you can implement this part in C and provide an interface to Lua.

Full code:

CONTAINER =
{
      Ver = "0.3",
      --- integer lookup for the DateTime
      timeLUT = {[0]="00",[1]="01",[2]="02",[3]="03",[4]="04",[5]="05",[6]="06",[7]="07",[8]="08",[9]="09"},
      strCACHE = { [100] = ""},
      SubStrA  = "Unknown",
      SubAPrst = "ASjdasda",
}
     
    
for i = 10,99,1 do
  CONTAINER.timeLUT[i] = tostring(i)
end
        
DataBlob = {
  vAng = { x = 1.0, y = 2.0, z = 3.0},
  vPos = { x = 2131.0, y = 42.0, z = -433.0},
    
  Composite =
  {
        VARIANT1 = { isFirst = true, isMiddle = false, isLast = true },
        VARIANT2 = { isIgnored = true},
        VARIANT3 = { isAccurate = false },
        VARIANT4 = { bEnabled = false },
        VARIANT5 = { isLocked = false, ImpactV = 1.8 },
        VARIANT6 = { troCoWal = true },
        VARIANT7 = { isBroCal = false }
  } 
} 
    
Global = {
  isLocked = function(x)return false end,
  GetTimeStamp = function(x)return math.random() + math.random(1, 99) end,
  GetLocalTimeStamp = function(x)return math.random() + math.random(1, 99) end,
  GetTotalPTime = function(x)return math.random() + math.random(1, 99) end,
  GetDataBlob = function(x)return DataBlob end,
  GetName = function(x)return "AThing" end
}
    
function CONTAINER.PopulateState()
 
    local gcInit = 0
    local gcLast = 0
    
  -- Caching globals
    
  local floor, mod, tostring = math.floor, math.mod, tostring
  local G = Global
  local intCache = CONTAINER.timeLUT
  local strBuilder = CONTAINER.strCACHE
    
  -- Fetching & caching data
  local locDB, Name = G.GetDataBlob(), G.GetName()
  local ts = G.GetTimeStamp()
  local lag = math.random() + math.random(1, 2)
    
  -- Local helpers
  local function sBool(bool)
    return bool and "1" or "0"
  end
    
  local t = 0
    
  function cAppend(cTbl, ...)
    -- Lua 5.1 passes the varargs in the implicit table `arg`; its entries are 1..arg.n
    for i=1, arg.n do
      cTbl[#cTbl+1] = arg[i]
      t = t +1
    end
  end

  function cClear(cTbl)
    for _=0, #cTbl do
      cTbl[#cTbl] = nil
    end
  end
        
  -- Populating table
  cClear(strBuilder)
        
  if locDB ~= nil then
    locDB = G.GetDataBlob()
    local PC = locDB.Composite
    local tp = G.GetTotalPTime()
    local d, h, m, s = floor(tp/86400), floor(mod(tp, 86400)/3600), floor(mod(tp,3600)/60), floor(mod(tp,60))
    
    cAppend(strBuilder,  "[", Name, "]:\n",
            "Ang :",      "(", tostring(locDB.vAng.x),",",tostring(locDB.vAng.y),",",tostring(locDB.vAng.z), ")\n",
            "Pos :",      "(", tostring(locDB.vPos.x),",",tostring(locDB.vPos.y),",",tostring(locDB.vPos.z), ")\n",
            "isLocked: ", sBool(G.isLocked()),  "\n")
        
    if (locDB.Composite["VARIANT1"] ~= nil) then
      cAppend(strBuilder, "isFirst / isLast: ", sBool(PC.VARIANT1.isFirst)," / ",sBool(PC.VARIANT1.isLast), "\n",   
              "isMiddle: ",         sBool(PC.VARIANT1.isMiddle), "\n")
    end
    
    if (locDB.Composite["VARIANT2"] ~= nil) then
      cAppend(strBuilder, "isIgnored: ",  sBool(PC.VARIANT2.isIgnored),  "\n")
    end
    
    if (locDB.Composite["VARIANT4"] ~= nil) then
      cAppend(strBuilder, "bEnabled: ",   sBool(PC.VARIANT4.bEnabled),   "\n")
    end
    
    if (locDB.Composite["VARIANT3"] ~= nil) then
      cAppend(strBuilder, "isAccurate: ", sBool(PC.VARIANT3.isAccurate), "\n")
    end
    
    if (locDB.Composite["VARIANT5"] ~= nil) then
      cAppend(strBuilder, "isLocked: ",   sBool(PC.VARIANT5.isLocked),   "\n",
              "ImpactV: ",    tostring(PC.VARIANT5.ImpactV), "\n")
    end
    
    if (locDB.Composite["VARIANT6"]) then
      cAppend(strBuilder, "troCoWal: ",   sBool(PC.VARIANT6.troCoWal),   "\n")
    end
    
    if (locDB.Composite["VARIANT7"]) then
          cAppend(strBuilder, "isBroCal: ",   sBool(PC.VARIANT7.isBroCal),   "\n")
    end
    
    cAppend(strBuilder, "Time taken: ",intCache[d],":",intCache[h],":",intCache[m],":",intCache[s], "\n",
    
                        "TS: ",        tostring(ts),                   "\n",    
                        "local TS: ",  tostring(G.GetLocalTimeStamp()),"\n",    
                        "Lag: ",       string.format("%.5f", lag) , " ms\n",    
                        "Heap: ",      tostring(gcLast),             "KB\n")
    
 
    
    cAppend(strBuilder, "Alloc: ",     tostring(gcLast-gcInit),"KB"," (v", CONTAINER.Ver, ")","\n",
    
                        "Extra: ",     CONTAINER.SubStrA, "_", CONTAINER.SubAPrst,             "\n")    
  end
end
    
 
print("Allocated PRE-concat:                " ..  tostring(collectgarbage("count")))

-- First time
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" ..  tostring(collectgarbage("count")))
collectgarbage("collect") 
print("Allocated POST-concat  AFTER-COLLECT:" ..  tostring(collectgarbage("count")))

-- One more try
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" ..  tostring(collectgarbage("count")))
collectgarbage("collect") 
print("Allocated POST-concat  AFTER-COLLECT:" ..  tostring(collectgarbage("count")))

-- One more try
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" ..  tostring(collectgarbage("count")))
collectgarbage("collect") 
print("Allocated POST-concat  AFTER-COLLECT:" ..  tostring(collectgarbage("count")))
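One detail in the listing worth checking, assuming a stock Lua 5.1 built with the default LUA_COMPAT_VARARG: a vararg function that reads its arguments through the implicit arg table, as cAppend does, gets a fresh table built for it on every call. A sketch of a buffer helper that reads the arguments with select instead; append and clear are hypothetical names, not part of the original code:

```lua
-- Appends every extra argument to buf without building an `arg` table.
local function append(buf, ...)
  local n = #buf
  for i = 1, select("#", ...) do
    -- select(i, ...) returns arguments i..last; the parentheses keep the first.
    buf[n + i] = (select(i, ...))
  end
  return buf
end

-- Empties buf in place so the same table can be reused on the next tick.
local function clear(buf)
  for i = #buf, 1, -1 do
    buf[i] = nil
  end
end

local buf = {}
append(buf, "[", "AThing", "]:\n", "isLocked: ", "0", "\n")
local text = table.concat(buf)
clear(buf)
print(text)
```

Declaring both helpers as local, and outside the function that runs every tick, also avoids creating two new closures on each call. A nil among the arguments would leave a hole in buf, so this sketch assumes every argument is a string or a number.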

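You want to minimize the number of intermediate string objects that get allocated, in order to reduce GC pressure and make collections less frequent. In that case, I suggest limiting yourself to a single call to string.format with the string you want to format: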

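  • The format string can be declared globally, so it is interned only once.
  • The code of string.format can be read here. From that code we can see that intermediate string conversions are done on the C stack, in a buffer of LUAL_BUFFERSIZE bytes. This size is declared in luaconf.h and can be tuned to your needs. This approach should be the most efficient for your use case, because you simply drop all the intermediate steps (table insertion, table.concat, etc.).
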
local MY_STRING_FORMAT = [[My Very Big String
param-string-1 %d
param-string-2 %x
param-string-3 %f
param-string-4 %d
param-string-5 %d
]]

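-- Param1 .. Param5 are placeholders for your own values:
-- pass one argument per format specifier, in order.
string.format(MY_STRING_FORMAT,
              Param1,
              Param2,
              Param3,
              Param4,
              Param5)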
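Applied to a few of the fields from the question, the single-call approach could look roughly like this. STATE_FORMAT and format_state are hypothetical names, the chosen fields and precisions are assumptions, and the %02d specifiers take over the job of the timeLUT lookup table:

```lua
-- One format string, created once when the chunk is loaded.
local STATE_FORMAT =
  "[%s]:\n" ..
  "Ang :(%.1f,%.1f,%.1f)\n" ..
  "Pos :(%.1f,%.1f,%.1f)\n" ..
  "isLocked: %s\n" ..
  "Time taken: %02d:%02d:%02d:%02d\n"

local floor = math.floor

-- Builds the whole report with a single string.format call.
-- tp is a duration in seconds; ang and pos are tables with x, y, z fields.
local function format_state(name, ang, pos, locked, tp)
  local d = floor(tp / 86400)
  local h = floor(tp % 86400 / 3600)
  local m = floor(tp % 3600 / 60)
  local s = floor(tp % 60)
  return string.format(STATE_FORMAT,
                       name,
                       ang.x, ang.y, ang.z,
                       pos.x, pos.y, pos.z,
                       locked and "1" or "0",
                       d, h, m, s)
end

local report = format_state("AThing",
                            { x = 1.0, y = 2.0, z = 3.0 },
                            { x = 2131.0, y = 42.0, z = -433.0 },
                            false,
                            93784)
print(report)
```

Only the final string is created per call; the numbers never become separate Lua strings on the way.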