Efficiently splitting strings

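As a result of log parsing, I have a field that contains hostnames and, occasionally, IP addresses. I need to process the data in that field further to extract the domain from the hostname. I.e., if the hostname is googleanalytics.google.com, I want to extract google.com from it as efficiently as possible, since the system processes thousands of log messages per second.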

What I have now is:

-- Save hostname into a temporary variable
local tempMetaValue = hostname

local count = 0
local byte_char = string.byte(".")
for i = 1, #tempMetaValue do
    if string.byte(tempMetaValue, i) == byte_char then
        count = count + 1
    end
end

local dotCount = count

-- If there was only one dot do nothing
if dotCount == 1 then
    return 0

-- If there are exactly two dots
elseif dotCount == 2 then
    -- Get the index of the first dot
    local beginIndex = string.find(tempMetaValue,".",1,true)
    -- Get the substring starting after the first dot
    local domainMeta = string.sub(tempMetaValue,beginIndex+1)
    -- Double check that the substring exists
    if domainMeta ~= nil then
        -- Populate the domain meta field
    end
-- If there are more than two dots..
elseif dotCount > 2 then
    -- Test to see if the hostname is actually an IP address
    if tempMetaValue:match("%d%d?%d?%.%d%d?%d?%.%d%d?%d?%.%d%d?%d?") then
        -- Skip the rest if an IP address was found
        return 0
    end
    -- Get the index of the second to last dot
    -- (Lua patterns escape a literal dot with %, not with a backslash)
    local beginIndex = string.find(tempMetaValue,"%.[^.]*%.[^.]*$")
    -- Get the substring starting after the second to last dot
    local domainMeta = string.sub(tempMetaValue,beginIndex+1)
    -- Double check that the substring exists
    if domainMeta ~= nil then
        -- Populate the domain meta field
    end
end

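I have a feeling that this solution may not be the fastest one possible. "A feeling" because my experience with Lua before this was zero, but it seems too long for such a simple task.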

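I tried to create a solution where a split-like operation, as in e.g. Java, would be performed and would leave the last token "unsplit", thus leaving me with the part I actually want (the domain), but those attempts went nowhere. So basically, for that solution I would want to split the hostname value at the dots so that the domain stays in one piece, i.e. googleanalytics.google.com would be split into "googleanalytics" and "google.com".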

Does something like this do what you want?

function getdomain(str)
    -- Grab just the last two dotted parts of the string.
    local domain = str:match("%.?([^.]+%.[^.]+)$")
    -- If we have dotted parts and they are all numbers then this is an IP address.
    if domain and tonumber((domain:gsub("%.", ""))) then
        return nil
    end
    return domain
end

print(getdomain("googleanalytics.google.com"))
print(getdomain("foobar.com"))
print(getdomain("1.2.3.4"))
print(getdomain("something.else.longer.than.that.com"))
print(getdomain("foobar"))
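
If the part in front of the domain is needed as well, as the "googleanalytics" / "google.com" split in the question suggests, the same pattern can carry a second capture. This is only a minimal sketch: the name splitdomain is hypothetical, and it keeps the same naive assumption that the domain is always the last two dot-separated labels.

```lua
-- Hypothetical helper: split a hostname into (prefix, domain), where
-- "domain" is naively taken to be the last two dot-separated labels.
function splitdomain(str)
    -- (.-) lazily captures everything before the optional separating dot;
    -- the second capture is anchored to the end of the string.
    local prefix, domain = str:match("^(.-)%.?([^.]+%.[^.]+)$")
    return prefix, domain
end

print(splitdomain("googleanalytics.google.com"))
print(splitdomain("foobar.com"))
print(splitdomain("foobar"))
```

When the hostname has no extra labels the prefix is an empty string, and when there is no dot at all both results are nil.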

The "is it an IP address" test in getdomain is very naive and should most likely be replaced with a more robust one, but it serves for a quick demo.
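
One possible way to tighten that test is to check the whole string for a dotted quad before extracting anything, rather than looking only at the last two parts. The sketch below is an assumption about what "more robust" could mean here: the names isipv4 and getdomain_strict are hypothetical, only IPv4 is handled, and each part is simply required to be a decimal number from 0 to 255.

```lua
-- Hypothetical helper: true only for dotted-quad strings whose four
-- parts are each a decimal number in the range 0..255.
function isipv4(str)
    local a, b, c, d = str:match("^(%d%d?%d?)%.(%d%d?%d?)%.(%d%d?%d?)%.(%d%d?%d?)$")
    if not a then
        return false
    end
    for _, part in ipairs({a, b, c, d}) do
        if tonumber(part) > 255 then
            return false
        end
    end
    return true
end

-- Variant of getdomain that rejects IPv4 addresses up front and
-- otherwise returns the last two dotted parts (or nil if there is no dot).
function getdomain_strict(str)
    if isipv4(str) then
        return nil
    end
    return str:match("%.?([^.]+%.[^.]+)$")
end

print(getdomain_strict("googleanalytics.google.com"))
print(getdomain_strict("1.2.3.4"))
```

Because the check looks at the full string, a hostname that merely ends in two numeric labels is no longer mistaken for an IP address.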