Determining the validity of a multi-hot encoding using bit manipulation
Suppose I have N items and a binary number that represents which of those items are included in a result:
N = 4
# items 1 and 3 will be included in the result
vector = 0b0101
# item 2 will be included in the result
vector = 0b0010
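To make the encoding concrete, here is a minimal sketch of how such a vector could be built from item numbers and read back. It assumes items are numbered from 1 starting at the least significant bit, which is what the two examples above imply; the helper names `encode_items` and `decode_items` are hypothetical and not part of the original question.

```python
def encode_items(items):
    """Build a multi-hot integer from 1-based item numbers (item 1 = lowest bit)."""
    vector = 0
    for item in items:
        vector |= 1 << (item - 1)  # set the bit that belongs to this item
    return vector


def decode_items(vector):
    """Return the 1-based item numbers whose bits are set, in ascending order."""
    items = []
    position = 1
    while vector:
        if vector & 1:  # lowest bit set means the item at this position is included
            items.append(position)
        vector >>= 1
        position += 1
    return items


print(bin(encode_items([1, 3])))  # 0b101
print(decode_items(0b0010))       # [2]
```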
I am also given a list, conflicts, that indicates which items cannot be included in the result at the same time:
conflicts = [
    0b0110,  # any result that contains items 2 AND 3 is invalid
    0b0111,  # any result that contains AT LEAST 2 items from {1, 2, 3} is invalid
]
Given this list of conflicts, we can determine the validity of the earlier vectors:
# invalid as it triggers conflict 1: [0, 1, 1, 1]
vector = 0b0101
# valid as it triggers no conflicts
vector = 0b0010
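In this context, how can bit manipulation be used to determine the validity of a single vector, or of a large list of vectors, against a list of conflicts?
The provided solution already gets us most of the way there, but I am not sure how to adapt it to the integer use case (to avoid numpy arrays and numba entirely).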
N = 4
# items 1 and 3 will be included in the result
vector = 0b0101
# item 2 will be included in the result
vector = 0b0010
conflicts = [
    0b0110,  # any result that contains items 2 AND 3 is invalid
    0b0111,  # any result that contains AT LEAST 2 items from {1, 2, 3} is invalid
]
def find_conflict(vector, conflicts):
    found_conflict = False
    for v in conflicts:
        result = vector & v  # bitwise AND keeps only the items present in both
        if result != 0:  # there are common elements
            number_of_bits_set = bin(result).count("1")  # count number of common elements
            if number_of_bits_set >= 2:  # two or more shared items make the vector invalid
                found_conflict = True
                print(f"..Conflict between {bin(vector)} and {bin(v)}: {bin(result)}")
    if found_conflict:
        print(f"Conflict found for {bin(vector)}.")
    else:
        print(f"No conflict found for {bin(vector)}.")
# invalid as it triggers conflict 1: [0, 1, 1, 1]
vector = 0b0101
find_conflict(vector, conflicts)
# valid as it triggers no conflicts
vector = 0b0010
find_conflict(vector, conflicts)
$ python3 pythontest.py
..Conflict between 0b101 and 0b111: 0b101
Conflict found for 0b101.
No conflict found for 0b10.
$
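If a boolean result is more useful than printed output, for example when filtering a large list of vectors, the same rule (a conflict is triggered when the vector shares at least two bits with a conflict mask) can be sketched with plain integers only. This is a variant under stated assumptions rather than the code above: `is_valid` and `filter_valid` are hypothetical names, and the string-based bit count is swapped for the `x & (x - 1)` trick, which clears the lowest set bit and is therefore non-zero exactly when two or more bits are set.

```python
def is_valid(vector, conflicts):
    """Return True if vector shares fewer than two bits with every conflict mask."""
    for mask in conflicts:
        common = vector & mask  # items present in both the vector and the mask
        # common & (common - 1) clears the lowest set bit of common;
        # a non-zero result means at least two bits were set.
        if common & (common - 1):
            return False
    return True


def filter_valid(vectors, conflicts):
    """Keep only the vectors that trigger no conflict."""
    return [v for v in vectors if is_valid(v, conflicts)]


conflicts = [0b0110, 0b0111]
print(is_valid(0b0101, conflicts))  # False
print(is_valid(0b0010, conflicts))  # True
print([bin(v) for v in filter_valid(range(16), conflicts)])
```

On Python 3.10 or later, `common.bit_count() >= 2` expresses the same test more directly. Returning as soon as the first conflict is found also avoids scanning the remaining masks, which may matter when the list of vectors is large.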