Is there a function to normalize strings and convert them to integers/floats?

Question

I have several lists of features that I want to analyze, and the features are strings. For example:

[["0.5", "0.4", "disabled", "0.7", "disabled"], ["feature1", "feature2", "feature4", "feature1", "feature3"]]

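I know how to convert a string like "0.5" to a float, but is there a way to "normalize" such lists to integer or float values (in my case each list is independent)? I would like to get something like this: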

[[2, 1, 0, 3, 0], [0, 1, 3, 0, 2]]

Does anyone know how to achieve this? Unfortunately, I haven't been able to find anything related to this problem yet.

Answer 1

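It's a bit messy, but it should do what you want: use a dictionary to keep track of the items in each list that you have already seen, so IDs are assigned in order of first appearance. You could replace the for loops with generators to make it less verbose.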

def track_items_in_list(test_list):
    outer_list = []
    # iterate through outer list
    for _list in test_list:
        # unique_count is an integer that corresponds to an item in your list
        unique_count = 0
        # used_tracker matches the unique_count with an item in your list
        used_tracker = {}
        inner_list = []
        # iterate through inner list
        for _item in _list:
            # check the used_tracker to see if the item has been used - if so, replace with the corresponding 'unique count'
            if _item in used_tracker:
                inner_list.append(used_tracker[_item])
            else:
                # if not, add the count to the tracker
                inner_list.append(unique_count)
                used_tracker[_item] = unique_count
                unique_count += 1
        outer_list.append(inner_list)
    return outer_list

track_items_in_list([["0.5", "0.4", "disabled", "0.7", "disabled"], ["feature1", "feature2", "feature4", "feature1", "feature3"]])
# [[0, 1, 2, 3, 2], [0, 1, 2, 0, 3]]
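
As for making it less verbose: one way the inner loop could be condensed is with `dict.setdefault`. This is only a sketch, and the name `encode_first_seen` is made up for illustration. It assigns IDs in order of first appearance, the same as the function above, and handles one inner list at a time.

```python
def encode_first_seen(values):
    """Map each distinct value to an integer ID in order of first appearance."""
    ids = {}
    # len(ids) is evaluated before setdefault runs, so a new value gets the
    # next free ID (0, 1, 2, ...); a value seen before keeps its stored ID.
    return [ids.setdefault(value, len(ids)) for value in values]


features = [["0.5", "0.4", "disabled", "0.7", "disabled"],
            ["feature1", "feature2", "feature4", "feature1", "feature3"]]
print([encode_first_seen(inner) for inner in features])
# [[0, 1, 2, 3, 2], [0, 1, 2, 0, 3]]
```

A fresh dictionary is created on every call, so each inner list is encoded independently of the others.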

Answer 2

Use a dictionary and a counter to give new values an ID and remember the IDs already handed out:

import itertools, collections

def norm(lst):
    d = collections.defaultdict(itertools.count().__next__)
    return [d[s] for s in lst]

lst = [["0.5", "0.4", "disabled", "0.7", "disabled"],
       ["feature1", "feature2", "feature4", "feature1", "feature3"]]
print(list(map(norm, lst)))
# [[0, 1, 2, 3, 2], [0, 1, 2, 0, 3]]

Or enumerate the sorted unique values; note, however, that "disabled" sorts after the numeric strings:

def norm_sort(lst):
    d = {x: i for i, x in enumerate(sorted(set(lst)))}
    return [d[s] for s in lst]

print(list(map(norm_sort, lst)))
# [[1, 0, 3, 2, 3], [0, 1, 3, 0, 2]]
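
If you want exactly the output from the question, with "disabled" mapped to 0 and the numbers ranked by value, a custom sort key might do it. This is a rough sketch under my own assumptions: `numeric_last_key` and `norm_sort_numeric` are made-up names, any string that `float()` cannot parse counts as non-numeric and sorts first, and the data contains no strings such as "nan", which `float()` would accept but which do not order sensibly.

```python
def numeric_last_key(s):
    """Sort non-numeric strings first (alphabetically), then numbers by value."""
    try:
        return (1, float(s), s)
    except ValueError:
        # Not parseable as a float: group 0 sorts ahead of every number.
        return (0, 0.0, s)


def norm_sort_numeric(lst):
    ordered = sorted(set(lst), key=numeric_last_key)
    d = {x: i for i, x in enumerate(ordered)}
    return [d[s] for s in lst]


lst = [["0.5", "0.4", "disabled", "0.7", "disabled"],
       ["feature1", "feature2", "feature4", "feature1", "feature3"]]
print(list(map(norm_sort_numeric, lst)))
# [[2, 1, 0, 3, 0], [0, 1, 3, 0, 2]]
```

Unlike plain string sorting, this compares numeric strings by value, so "10" would rank after "9".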

Tags: python, normalization, feature-scaling