What kind of hash function does Aerospike use for UDF modules?
Aerospike allows listing the loaded UDF modules. Here is an aql example (taken from the help portal):
SHOW MODULES
The result contains a hash field:
aql> show modules
+----------------+-------+------------------------+
| module         | type  | hash                   |
+----------------+-------+------------------------+
| "example1.lua" | "lua" | "033671e05067888fce09" |
| "example2.lua" | "lua" | "07b42082cca8e73a96b2" |
+----------------+-------+------------------------+
2 rows in set (0.000 secs)
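My question (1): I have the example1.lua file (the source of the UDF module). How can I compute (and check) its hash?
My assumptions:
1. The hash depends on the source file: TRUE
2. The hash depends on the load timestamp: FALSE
3. The hash depends on the file name: TRUE
What I tried:
- md5sum on the source file: FAIL (obviously, see assumption 3)
- sha1sum on the plain source file: FAIL (see assumption 3)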
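What I actually want is to check the version of a registered module and upgrade the module if necessary. As stated in the documentation: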
the hash value of the file. Most users will not find the hash value useful, but some may use it to verify the version or instance of a UDF on the server.
So the alternative question (2) is: how can I check the version of a registered UDF module?
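It is the 20-byte SHA-1 hash (40 hexadecimal characters) of a JSON document made of the following three fields. The JSON document is hashed as a single string, with no trailing newline or EOF character.
"content64" : base64 encoding of whole file
"type" : LUA
"name" : filename
Example: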
{"content64": "ZnVuY3Rpb24gcHJpbnRfbWV0YShyZWMpCgoJaWYgbm90IGFlcm9zcGlrZTpleGlzdHMocmVjKSB0aGVuCgkJcmV0dXJuICJub3JlYyIKCWVuZAoKCWluZm8ocmVjb3JkLnR0bChyZWMpKQoJaW5mbyhyZWNvcmQubGFzdF91cGRhdGVfdGltZShyZWMpKQoJcmV0dXJuIHJlY29yZC5sYXN0X3VwZGF0ZV90aW1lKHJlYykKZW5kCg==", "type": "LUA", "name": "lut.lua"}
Cross-check (ignore the '\' escape characters added for the echo command):
$ echo -n {\"content64\": \"ZnVuY3Rpb24gcHJpbnRfbWV0YShyZWMpCgoJaWYgbm90IGFlcm9zcGlrZTpleGlzdHMocmVjKSB0aGVuCgkJcmV0dXJuICJub3JlYyIKCWVuZAoKCWluZm8ocmVjb3JkLnR0bChyZWMpKQoJaW5mbyhyZWNvcmQubGFzdF91cGRhdGVfdGltZShyZWMpKQoJcmV0dXJuIHJlY29yZC5sYXN0X3VwZGF0ZV90aW1lKHJlYykKZW5kCg==\", \"type\": \"LUA\", \"name\": \"lut.lua\"} | sha1sum
998354a59337b229e2dd777a3288e8e8f33568a5 -
$ asinfo -v "udf-list"
filename=lut.lua,hash=998354a59337b229e2dd777a3288e8e8f33568a5,type=LUA;
In addition to @sunil's answer
Python example
This routine computes the hash of an Aerospike Lua UDF module. It is written in Python 3.
import base64
import hashlib


# You could pass one argument - full path, and then extract
# name. But here we have only this naive implementation
def udf_module_get_hash(MODULE_PATH, MODULE_NAME_WITHOUT_DIR_WITH_EXT):
    with open(MODULE_PATH, 'rb') as f:
        content = f.read()
    b64 = base64.b64encode(content)
    meta = '{{"content64": "{0}", "type": "LUA", "name": "{1}"}}'.format(
        b64.decode('ascii'),
        MODULE_NAME_WITHOUT_DIR_WITH_EXT
    )
    sha1 = hashlib.sha1()
    sha1.update(meta.encode())
    h = sha1.hexdigest()
    # h is something like '052ac7359e46d1c6c97a5bf1a9854739cd9e481a'
    return h
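For question (2), one possible way to use this is to compare the locally computed hash with what the server reports. The sketch below assumes the `udf-list` info output has the shape shown earlier (`filename=...,hash=...,type=LUA;`). The function names `local_udf_hash`, `parse_udf_list` and `needs_upgrade` are hypothetical helpers, not part of Aerospike. The aql table in the question shows shorter hash strings than asinfo does, so the comparison is done by prefix; that choice is an assumption on my side.

```python
import base64
import hashlib


def local_udf_hash(source: bytes, name: str) -> str:
    """SHA-1 over the JSON-like string described above, as a hex digest."""
    b64 = base64.b64encode(source).decode("ascii")
    meta = '{{"content64": "{0}", "type": "LUA", "name": "{1}"}}'.format(b64, name)
    return hashlib.sha1(meta.encode("utf-8")).hexdigest()


def parse_udf_list(info: str) -> dict:
    """Map filename -> hash from a 'udf-list' style info string.

    Expects entries shaped like 'filename=lut.lua,hash=...,type=LUA;'.
    """
    modules = {}
    for entry in info.strip().split(";"):
        if not entry:
            continue
        fields = dict(pair.split("=", 1) for pair in entry.split(",") if "=" in pair)
        if "filename" in fields and "hash" in fields:
            modules[fields["filename"]] = fields["hash"]
    return modules


def needs_upgrade(source: bytes, name: str, info: str) -> bool:
    """True if the module is not registered or its reported hash differs."""
    server_hash = parse_udf_list(info).get(name)
    if server_hash is None:
        return True
    # Assumption: compare by prefix, to tolerate a shortened hash display.
    return not local_udf_hash(source, name).startswith(server_hash.lower())


if __name__ == "__main__":
    # Made-up module source and a simulated server reply, for illustration only.
    src = b"function hello(rec)\n\treturn 1\nend\n"
    info = "filename=hello.lua,hash={0},type=LUA;".format(
        local_udf_hash(src, "hello.lua")
    )
    print(needs_upgrade(src, "hello.lua", info))               # hashes match
    print(needs_upgrade(src + b"-- v2\n", "hello.lua", info))  # source changed
    print(needs_upgrade(src, "other.lua", info))               # not registered
```

In practice the `info` string would come from running `asinfo -v "udf-list"` (or the equivalent info call in a client) and the source bytes from reading the .lua file in binary mode.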