Is there a function to normalize strings and convert them to integers/floats?
I have several lists of features, all of which are strings that I want to analyze.
For example:
[["0.5", "0.4", "disabled", "0.7", "disabled"], ["feature1", "feature2", "feature4", "feature1", "feature3"]]
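I know how to convert a string like "0.5" to a float, but is there a way to "normalize" such lists into integer or float values (in my case each list is independent of the others)? I would like something like this: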
[[2, 1, 0, 3, 0], [0, 1, 3, 0, 2]]
Does anyone know how to do this? Unfortunately, I have not been able to find anything related to this problem.
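It's a bit messy, but this should do what you want: use a dictionary to keep track of the items you have already seen in each list. IDs are assigned in order of first appearance. You could replace the for loops with a generator to reduce verbosity.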
def track_items_in_list(test_list):
    outer_list = []
    # iterate through the outer list
    for _list in test_list:
        # unique_count is the next integer ID to assign within this list
        unique_count = 0
        # used_tracker maps each item already seen to its ID
        used_tracker = {}
        inner_list = []
        # iterate through the inner list
        for _item in _list:
            # if the item has been seen before, reuse its ID
            if _item in used_tracker:
                inner_list.append(used_tracker[_item])
            else:
                # otherwise assign the next ID and record it in the tracker
                inner_list.append(unique_count)
                used_tracker[_item] = unique_count
                unique_count += 1
        outer_list.append(inner_list)
    return outer_list

print(track_items_in_list([["0.5", "0.4", "disabled", "0.7", "disabled"], ["feature1", "feature2", "feature4", "feature1", "feature3"]]))
# [[0, 1, 2, 3, 2], [0, 1, 2, 0, 3]]
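The shorter form hinted at above could look roughly like the sketch below. It is only one possible way to condense the loop, not the answerer's own code: the name `track_items` is made up for illustration, and it relies on `dict.setdefault` returning the existing ID when the item is already known and storing `len(ids)` as a new ID otherwise.

```python
def track_items(test_list):
    """Map each item to an integer ID in order of first appearance, per inner list."""
    result = []
    for items in test_list:
        ids = {}  # item -> ID, reset for every inner list
        # len(ids) is evaluated before setdefault inserts, so it is the next free ID
        result.append([ids.setdefault(item, len(ids)) for item in items])
    return result


print(track_items([["0.5", "0.4", "disabled", "0.7", "disabled"],
                   ["feature1", "feature2", "feature4", "feature1", "feature3"]]))
# [[0, 1, 2, 3, 2], [0, 1, 2, 0, 3]]
```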
Use a dictionary and a counter to give each new value an ID and remember the IDs already assigned:
import collections
import itertools

def norm(lst):
    # every unseen key gets the next integer from the counter
    d = collections.defaultdict(itertools.count().__next__)
    return [d[s] for s in lst]

lst = [["0.5", "0.4", "disabled", "0.7", "disabled"],
       ["feature1", "feature2", "feature4", "feature1", "feature3"]]
print(list(map(norm, lst)))
# [[0, 1, 2, 3, 2], [0, 1, 2, 0, 3]]
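Or enumerate the sorted unique values. Note, however, that the strings are compared lexicographically, so "disabled" sorts after the numeric values: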
def norm_sort(lst):
    # ID of each value is its position among the sorted unique values
    d = {x: i for i, x in enumerate(sorted(set(lst)))}
    return [d[s] for s in lst]

print(list(map(norm_sort, lst)))
# [[1, 0, 3, 2, 3], [0, 1, 3, 0, 2]]
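If the exact output from the question is wanted, with "disabled" mapped to 0 and the numbers ranked by value, one option is to sort with a custom key. The sketch below is an assumption about the intended ordering, not something the question states explicitly: non-numeric strings come first (alphabetically), then numeric strings ordered by their float value. The names `sort_key` and `norm_numeric_aware` are hypothetical.

```python
def sort_key(s):
    """Non-numeric strings sort first (alphabetically), then numbers by value."""
    try:
        return (1, float(s), s)
    except ValueError:
        # not parseable as a float, e.g. "disabled" or "feature1"
        return (0, 0.0, s)


def norm_numeric_aware(lst):
    # ID of each value is its position among the unique values sorted by sort_key
    ids = {x: i for i, x in enumerate(sorted(set(lst), key=sort_key))}
    return [ids[s] for s in lst]


data = [["0.5", "0.4", "disabled", "0.7", "disabled"],
        ["feature1", "feature2", "feature4", "feature1", "feature3"]]
print([norm_numeric_aware(lst) for lst in data])
# [[2, 1, 0, 3, 0], [0, 1, 3, 0, 2]]
```

Unlike plain `sorted`, this also ranks "10" after "9", because the numeric strings are compared as floats rather than character by character.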