Absolute achievable minimum of GC with string concat in Lua?
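Runtime: Lua 5.1.x compiled for ARM64; C modules are not allowed.
Sample code, ready to run:
https://paste.gg/p/anonymous/08f364480a5f470e9da610ab565e11c0
I need to concatenate a bunch of strings in a loop every X milliseconds. As I understand it, Lua supports string interning, which means string literals are "cached" rather than allocated every time. So only direct calls to tostring() (or the .. sugar) allocate; all other existing string values are passed by reference.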
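What I have done so far:
- Eliminated all integer -> string allocations (via a LUT)
- Removed tostring(bool) as well, even though it returns interned strings from the cache
- Created a pseudo string builder out of a table that works by index (about 16 B per entry)
- "Pre-sized" said table to avoid the cost of associative insertion and made it global, so it is not collected and re-created every time
- Used table.concat() for the final big string concatenation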
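For reference, the table-based builder described in the list above might look roughly like the sketch below. The names buffer, append, reset and build are made up for illustration and are not taken from the linked paste. It tracks the length by hand and passes explicit bounds to table.concat, so the table never needs clearing. It also reads varargs with select rather than arg, since in stock Lua 5.1 (as far as I know) a vararg function that refers to arg gets a fresh table built on every call.

```lua
-- Reusable string builder: one table kept alive across calls (hypothetical names).
local buffer = {}
local size = 0            -- logical length, tracked by hand instead of using #buffer

local function append(...)
  for i = 1, select("#", ...) do
    size = size + 1
    buffer[size] = select(i, ...)   -- assignment keeps only the first returned value
  end
end

local function reset()
  size = 0                -- old slots are simply overwritten on the next pass
end

local function build()
  -- table.concat(t, sep, i, j) joins only slots i..j, so stale entries past `size` are ignored
  return table.concat(buffer, "", 1, size)
end

reset()
append("Pos: (", "1", ",", "2", ")\n")
local first = build()

reset()
append("Ang: (", "3", ")\n")
local second = build()

print(first, second)
```

The final string returned by build() is still a new allocation on every pass; the sketch only avoids garbage from the builder itself.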
The final result still makes me sad:
Allocated pre-concat: 2.486328125 KB
Allocated post-concat: 39.7451171875 KB
Total table meta bytes: 1544 B
Total tostring meta bytes: 273 B
Am I missing something, or have I reached the limits of Lua?
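I assume the problem you mention concerns the memory consumption of the function CONTAINER.PopulateState. I think your code is fine, but you are not measuring the right thing. I removed all the collectgarbage calls and gathered them into a single section of the code: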
print("Allocated PRE-concat: " .. tostring(collectgarbage("count")))
-- First time
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" .. tostring(collectgarbage("count")))
collectgarbage("collect")
print("Allocated POST-concat AFTER-COLLECT:" .. tostring(collectgarbage("count")))
-- One more try
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" .. tostring(collectgarbage("count")))
collectgarbage("collect")
print("Allocated POST-concat AFTER-COLLECT:" .. tostring(collectgarbage("count")))
-- One more try
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" .. tostring(collectgarbage("count")))
collectgarbage("collect")
print("Allocated POST-concat AFTER-COLLECT:" .. tostring(collectgarbage("count")))
The results are very different and make much more sense:
Allocated PRE-concat: 48.70703125
Allocated POST-concat BEFORE-COLLECT:54.3232421875
Allocated POST-concat AFTER-COLLECT:51.8515625
Allocated POST-concat BEFORE-COLLECT:54.5576171875
Allocated POST-concat AFTER-COLLECT:51.8515625
Allocated POST-concat BEFORE-COLLECT:54.5576171875
Allocated POST-concat AFTER-COLLECT:51.8515625
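After program initialization and before the call to CONTAINER.PopulateState(), the program already uses 48.7 KB.
On the first call to CONTAINER.PopulateState(), there is a small memory increase of about 3 KB that appears to be persistent: it does not seem to be released while the program runs. This could be due to bytecode compilation, caching, or internal use.
Subsequent executions of CONTAINER.PopulateState(), however, consistently use about 2.7 KB of memory, and that memory is released every time. The program's behavior looks very consistent: executing CONTAINER.PopulateState() does not make the program use more memory over time. In fact, the temporary memory used by CONTAINER.PopulateState() (2.7 KB) is negligible compared with the rest of the program (48 KB).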
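The before/after pattern used above can be wrapped in a small helper. This is only a sketch: transient_kb and workload are hypothetical names, and the workload is a stand-in for CONTAINER.PopulateState. It stops the collector during the call so that the "grown" figure reflects everything the call allocated, then runs a full collection to see how much of that was actually retained.

```lua
-- Measure the memory growth caused by one call of `fn`, in KB.
local function transient_kb(fn)
  collectgarbage("collect")          -- start from a clean heap
  collectgarbage("stop")             -- keep the collector from running mid-measurement
  local before = collectgarbage("count")
  fn()
  local peak = collectgarbage("count")
  collectgarbage("restart")
  collectgarbage("collect")          -- release everything the call left unreferenced
  local after = collectgarbage("count")
  return peak - before, after - before   -- growth during the call, growth that survived
end

-- Hypothetical workload: builds 100 short-lived strings.
local function workload()
  for i = 1, 100 do
    local s = "item " .. i
  end
end

local grown, retained = transient_kb(workload)
print(string.format("grown: %.2f KB, retained: %.2f KB", grown, retained))
```

Calling the helper a second time on the same function usually gives a cleaner picture, since one-off costs from the first call no longer show up in the retained figure.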
If you want finer control over what is happening, you could implement this part in C and provide an interface to Lua.
Full code:
CONTAINER =
{
    Ver = "0.3",
    --- integer lookup for the DateTime
    timeLUT = {[0]="00",[1]="01",[2]="02",[3]="03",[4]="04",[5]="05",[6]="06",[7]="07",[8]="08",[9]="09"},
    strCACHE = { [100] = ""},
    SubStrA = "Unknown",
    SubAPrst = "ASjdasda",
}
for i = 10,99,1 do
    CONTAINER.timeLUT[i] = tostring(i)
end
DataBlob = {
    vAng = { x = 1.0, y = 2.0, z = 3.0},
    vPos = { x = 2131.0, y = 42.0, z = -433.0},
    Composite =
    {
        VARIANT1 = { isFirst = true, isMiddle = false, isLast = true },
        VARIANT2 = { isIgnored = true},
        VARIANT3 = { isAccurate = false },
        VARIANT4 = { bEnabled = false },
        VARIANT5 = { isLocked = false, ImpactV = 1.8 },
        VARIANT6 = { troCoWal = true },
        VARIANT7 = { isBroCal = false }
    }
}
Global = {
    isLocked = function(x)return false end,
    GetTimeStamp = function(x)return math.random() + math.random(1, 99) end,
    GetLocalTimeStamp = function(x)return math.random() + math.random(1, 99) end,
    GetTotalPTime = function(x)return math.random() + math.random(1, 99) end,
    GetDataBlob = function(x)return DataBlob end,
    GetName = function(x)return "AThing" end
}
function CONTAINER.PopulateState()
    local gcInit = 0
    local gcLast = 0
    -- Caching globals
    local floor, mod, tostring = math.floor, math.mod, tostring
    local G = Global
    local intCache = CONTAINER.timeLUT
    local strBuilder = CONTAINER.strCACHE
    -- Fetching & caching data
    local locDB, Name = G.GetDataBlob(), G.GetName()
    local ts = G.GetTimeStamp()
    local lag = math.random() + math.random(1, 2)
    -- Local helpers
    local function sBool(bool)
        return bool and "1" or "0"
    end
    local t = 0
    -- Appends every vararg to cTbl (arg is the Lua 5.1 vararg table; its entries start at index 1)
    function cAppend(cTbl, ...)
        for i=1, arg.n do
            cTbl[#cTbl+1] = arg[i]
            t = t +1
        end
    end
    -- Empties cTbl from the end so the same table can be reused
    function cClear(cTbl)
        for _=0, #cTbl do
            cTbl[#cTbl] = nil
        end
    end
    -- Populating table
    cClear(strBuilder)
    if locDB ~= nil then
        locDB = G.GetDataBlob()
        local PC = locDB.Composite
        local tp = G.GetTotalPTime()
        local d, h, m, s = floor(tp/86400), floor(mod(tp, 86400)/3600), floor(mod(tp,3600)/60), floor(mod(tp,60))
        cAppend(strBuilder, "[", Name, "]:\n",
            "Ang :", "(", tostring(locDB.vAng.x),",",tostring(locDB.vAng.y),",",tostring(locDB.vAng.z), ")\n",
            "Pos :", "(", tostring(locDB.vPos.x),",",tostring(locDB.vPos.y),",",tostring(locDB.vPos.z), ")\n",
            "isLocked: ", sBool(G.isLocked()), "\n")
        if (locDB.Composite["VARIANT1"] ~= nil) then
            cAppend(strBuilder, "isFirst / isLast: ", sBool(PC.VARIANT1.isFirst)," / ",sBool(PC.VARIANT1.isLast), "\n",
                "isMiddle: ", sBool(PC.VARIANT1.isMiddle), "\n")
        end
        if (locDB.Composite["VARIANT2"] ~= nil) then
            cAppend(strBuilder, "isIgnored: ", sBool(PC.VARIANT2.isIgnored), "\n")
        end
        if (locDB.Composite["VARIANT4"] ~= nil) then
            cAppend(strBuilder, "bEnabled: ", sBool(PC.VARIANT4.bEnabled), "\n")
        end
        if (locDB.Composite["VARIANT3"] ~= nil) then
            cAppend(strBuilder, "isAccurate: ", sBool(PC.VARIANT3.isAccurate), "\n")
        end
        if (locDB.Composite["VARIANT5"] ~= nil) then
            cAppend(strBuilder, "isLocked: ", sBool(PC.VARIANT5.isLocked), "\n",
                "ImpactV: ", tostring(PC.VARIANT5.ImpactV), "\n")
        end
        if (locDB.Composite["VARIANT6"]) then
            cAppend(strBuilder, "troCoWal: ", sBool(PC.VARIANT6.troCoWal), "\n")
        end
        if (locDB.Composite["VARIANT7"]) then
            cAppend(strBuilder, "isBroCal: ", sBool(PC.VARIANT7.isBroCal), "\n")
        end
        cAppend(strBuilder, "Time taken: ",intCache[d],":",intCache[h],":",intCache[m],":",intCache[s], "\n",
            "TS: ", tostring(ts), "\n",
            "local TS: ", tostring(G.GetLocalTimeStamp()),"\n",
            "Lag: ", string.format("%.5f", lag) , " ms\n",
            "Heap: ", tostring(gcLast), "KB\n")
        cAppend(strBuilder, "Alloc: ", tostring(gcLast-gcInit),"KB"," (v", CONTAINER.Ver, ")","\n",
            "Extra: ", CONTAINER.SubStrA, "_", CONTAINER.SubAPrst, "\n")
    end
end
print("Allocated PRE-concat: " .. tostring(collectgarbage("count")))
-- First time
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" .. tostring(collectgarbage("count")))
collectgarbage("collect")
print("Allocated POST-concat AFTER-COLLECT:" .. tostring(collectgarbage("count")))
-- One more try
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" .. tostring(collectgarbage("count")))
collectgarbage("collect")
print("Allocated POST-concat AFTER-COLLECT:" .. tostring(collectgarbage("count")))
-- One more try
CONTAINER.PopulateState()
print("Allocated POST-concat BEFORE-COLLECT:" .. tostring(collectgarbage("count")))
collectgarbage("collect")
print("Allocated POST-concat AFTER-COLLECT:" .. tostring(collectgarbage("count")))
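You want to minimize the number of intermediate string allocations in order to reduce GC pressure and make collections less frequent. In that case, I suggest limiting yourself to a single call to string.format with the string you want to format:
- The format string can be declared globally, so it is interned only once.
- The implementation of string.format can be read in the Lua source (str_format in lstrlib.c). From that code we can see that the intermediate string conversions are done on the C stack, in a buffer of LUAL_BUFFERSIZE bytes. This size is declared in luaconf.h and can be customized to your needs.
This approach should be the most efficient for your use case, because you simply remove all the intermediate steps (table insertions, table.concat, etc.).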
local MY_STRING_FORMAT = [[My Very Big String
param-string-1 %d
param-string-2 %x
param-string-3 %f
param-string-4 %d
param-string-5 %d
]]
-- Param1..Param5 stand for your own values, one per placeholder above
local result = string.format(MY_STRING_FORMAT,
    Param1,
    Param2,
    Param3,
    Param4,
    Param5)
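Applied to a state dump like the one in the question, the single-call idea could look something like the sketch below. STATE_FORMAT and format_state are hypothetical names, and the parameters stand in for the real data sources. Only a few of the fields are shown. Zero-padding with %02d is used here as one possible replacement for the integer lookup table.

```lua
-- Format string declared once at load time, so the literal is interned a single time.
local STATE_FORMAT = "[%s]:\nPos: (%.1f,%.1f,%.1f)\nisLocked: %s\nTime taken: %02d:%02d:%02d:%02d\n"

-- Hypothetical helper: `name`, `pos`, `locked`, `seconds` stand in for the real data.
local function format_state(name, pos, locked, seconds)
  local floor = math.floor
  local d = floor(seconds / 86400)
  local h = floor(seconds % 86400 / 3600)
  local m = floor(seconds % 3600 / 60)
  local s = floor(seconds % 60)
  -- One call builds the whole text; no per-field tostring() or table inserts.
  return string.format(STATE_FORMAT, name, pos.x, pos.y, pos.z,
                       locked and "1" or "0", d, h, m, s)
end

local text = format_state("AThing", { x = 2131.0, y = 42.0, z = -433.0 }, false, 3725)
print(text)
```

Optional sections such as the VARIANT blocks would need either separate format strings or a fixed layout with placeholder values, which is a trade-off to weigh against the table-based builder.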