What kind of hash function does Aerospike use for UDF modules?

Aerospike lets you list the registered UDF modules. Here is an aql example (taken from the help portal):

SHOW MODULES

The result contains a hash field:

aql> show modules
+---------------------------+-------+------------------------+
| module                    | type  | hash                   |
+---------------------------+-------+------------------------+
| "example1.lua"            | "lua" | "033671e05067888fce09" |
| "example2.lua"            | "lua" | "07b42082cca8e73a96b2" |
+---------------------------+-------+------------------------+
2 rows in set (0.000 secs)

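My question (1): I have the example1.lua file (the source of the UDF module). How can I compute (and verify) its hash?

My assumptions:

  1. The hash depends on the source file - TRUE
  2. The hash depends on the load timestamp - FALSE
  3. The hash depends on the file name - TRUE
  4. md5sum of the source file - FALSE (obviously, see assumption 3)
  5. sha1sum of the plain source file - FALSE (see assumption 3)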

What I actually want is to check the version of a registered module and upgrade it if necessary. As the documentation says:

the hash value of the file. Most users will not find the hash value useful, but some may use it to verify the version or instance of a UDF on the server.

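So the alternative question (2) is: how can I check the version of a registered UDF module?

It is the 20-byte SHA-1 hash of the following 3 fields, represented as a JSON document. The JSON document is hashed as a string, without any trailing newline or EOF character.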

"content64" : base64 encoding of whole file
"type" : LUA
"name" : filename

Example:

{"content64": "ZnVuY3Rpb24gcHJpbnRfbWV0YShyZWMpCgoJaWYgbm90IGFlcm9zcGlrZTpleGlzdHMocmVjKSB0aGVuCgkJcmV0dXJuICJub3JlYyIKCWVuZAoKCWluZm8ocmVjb3JkLnR0bChyZWMpKQoJaW5mbyhyZWNvcmQubGFzdF91cGRhdGVfdGltZShyZWMpKQoJcmV0dXJuIHJlY29yZC5sYXN0X3VwZGF0ZV90aW1lKHJlYykKZW5kCg==", "type": "LUA", "name": "lut.lua"}
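To see what the content64 field holds, the value from the example can be decoded back into the Lua source. This is only a small sketch using the Python standard library; the variable names are made up for illustration.

```python
import base64

# content64 value copied verbatim from the example document above
CONTENT64 = (
    "ZnVuY3Rpb24gcHJpbnRfbWV0YShyZWMpCgoJaWYgbm90IGFlcm9zcGlrZTpleGlzdHMocmVj"
    "KSB0aGVuCgkJcmV0dXJuICJub3JlYyIKCWVuZAoKCWluZm8ocmVjb3JkLnR0bChyZWMpKQoJ"
    "aW5mbyhyZWNvcmQubGFzdF91cGRhdGVfdGltZShyZWMpKQoJcmV0dXJuIHJlY29yZC5sYXN0"
    "X3VwZGF0ZV90aW1lKHJlYykKZW5kCg=="
)

# Decode the base64 text back into the bytes of lut.lua
lua_source = base64.b64decode(CONTENT64).decode("ascii")
print(lua_source)

# Encoding the decoded bytes again gives back the same content64 string
round_trip = base64.b64encode(lua_source.encode("ascii")).decode("ascii")
print(round_trip == CONTENT64)
```

The decoded text is the whole file, including its final newline, which is why the hash changes with any edit to the source.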

Cross-check (ignore the '\' escape characters added for the echo command):

$ echo -n {\"content64\": \"ZnVuY3Rpb24gcHJpbnRfbWV0YShyZWMpCgoJaWYgbm90IGFlcm9zcGlrZTpleGlzdHMocmVjKSB0aGVuCgkJcmV0dXJuICJub3JlYyIKCWVuZAoKCWluZm8ocmVjb3JkLnR0bChyZWMpKQoJaW5mbyhyZWNvcmQubGFzdF91cGRhdGVfdGltZShyZWMpKQoJcmV0dXJuIHJlY29yZC5sYXN0X3VwZGF0ZV90aW1lKHJlYykKZW5kCg==\", \"type\": \"LUA\", \"name\": \"lut.lua\"} | sha1sum 
998354a59337b229e2dd777a3288e8e8f33568a5  -

$ asinfo -v "udf-list"
filename=lut.lua,hash=998354a59337b229e2dd777a3288e8e8f33568a5,type=LUA;
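If you want to read the hashes programmatically, the udf-list response shown above can be split into a mapping from file name to hash. The sketch below assumes the response always has the `key=value,...;` shape seen in this output and that file names contain no `,` or `;`; the helper name `parse_udf_list` is hypothetical.

```python
def parse_udf_list(response):
    """Turn 'filename=...,hash=...,type=...;' entries into {filename: hash}."""
    modules = {}
    for entry in response.strip().split(";"):
        if not entry:
            continue  # skip the empty piece after the trailing ';'
        fields = dict(pair.split("=", 1) for pair in entry.split(","))
        modules[fields["filename"]] = fields["hash"]
    return modules


# Response copied from the asinfo output above
response = "filename=lut.lua,hash=998354a59337b229e2dd777a3288e8e8f33568a5,type=LUA;"
hashes = parse_udf_list(response)
print(hashes)
```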

In addition to @sunil's answer

Python example

This routine computes the hash of a Lua UDF module the same way Aerospike does. It is written in Python 3.

import base64
import hashlib

# You could pass a single argument (the full path) and derive the
# name from it. This is deliberately a naive implementation.

def udf_module_get_hash(module_path, module_name):
    """Return the SHA-1 hex digest of a Lua UDF module.

    module_name is the file name without directory, with extension.
    """
    with open(module_path, 'rb') as f:
        content = f.read()
    b64 = base64.b64encode(content)

    # The hashed string must match this layout exactly, with no
    # trailing newline
    meta = '{{"content64": "{0}", "type": "LUA", "name": "{1}"}}'.format(
        b64.decode('ascii'),
        module_name
    )
    sha1 = hashlib.sha1()
    sha1.update(meta.encode())

    h = sha1.hexdigest()
    # h is something like '052ac7359e46d1c6c97a5bf1a9854739cd9e481a'
    return h
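For question (2), one possible way to decide whether a module needs re-registering is to compare the locally computed hash with the one the server reports. The sketch below is only an illustration: `local_udf_hash`, `needs_upgrade` and the `server_hashes` mapping (file name to hash, for example built from the udf-list output) are hypothetical names, and the demo uses a throwaway file rather than a real server.

```python
import base64
import hashlib
import os
import tempfile


def local_udf_hash(path):
    """SHA-1 of the JSON-style string described above, for a local Lua file."""
    with open(path, "rb") as f:
        content64 = base64.b64encode(f.read()).decode("ascii")
    name = os.path.basename(path)
    doc = '{{"content64": "{0}", "type": "LUA", "name": "{1}"}}'.format(content64, name)
    return hashlib.sha1(doc.encode()).hexdigest()


def needs_upgrade(path, server_hashes):
    """True if the module is missing on the server or its hash differs."""
    return server_hashes.get(os.path.basename(path)) != local_udf_hash(path)


# Demo: server_hashes here is made-up data, not a real server response
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example1.lua")
    with open(path, "wb") as f:
        f.write(b"function hello(rec)\n\treturn 1\nend\n")
    current = local_udf_hash(path)
    up_to_date = needs_upgrade(path, {"example1.lua": current})
    stale = needs_upgrade(path, {"example1.lua": "0" * 40})
    missing = needs_upgrade(path, {})

print(up_to_date, stale, missing)
```

Because the file name is part of the hashed string, the comparison only makes sense if the local file has the same name it was registered under.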