Is it possible to apply sklearn.preprocessing.LabelEncoder() to a 2D list?
Suppose I have the following list:
l = [
['PER', 'O', 'O', 'GEO'],
['ORG', 'O', 'O', 'O'],
['O', 'O', 'O', 'GEO'],
['O', 'O', 'PER', 'O']
]
I want to encode this 2D list with LabelEncoder().
The result should look like this:
l = [
[1, 0, 0, 2],
[3, 0, 0, 0],
[0, 0, 0, 2],
[0, 0, 1, 0]
]
Is this possible?
If not, is there a workaround?
Thanks in advance!
You can flatten the list, fit the encoder on all potential values, and then use the encoder to transform each sublist, as shown below:
from sklearn.preprocessing import LabelEncoder
l = [
['PER', 'O', 'O', 'GEO'],
['ORG', 'O', 'O', 'O'],
['O', 'O', 'O', 'GEO'],
['O', 'O', 'PER', 'O']
]
flattened_l = [e for sublist in l for e in sublist]
# flattened_l is ['PER', 'O', 'O', 'GEO', 'ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'GEO', 'O', 'O', 'PER', 'O']
le = LabelEncoder().fit(flattened_l)
# See the mapping generated by the encoder:
list(enumerate(le.classes_))
# [(0, 'GEO'), (1, 'O'), (2, 'ORG'), (3, 'PER')]
# And, finally, transform each sublist:
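res = [le.transform(sublist).tolist() for sublist in l]
res
# The codes follow the alphabetical order of le.classes_, so they differ from
# the numbering in the question, but each label still maps to one fixed integer:
# [[3, 1, 1, 0], [2, 1, 1, 1], [1, 1, 1, 0], [1, 1, 3, 1]]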
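If every sublist has the same length, the per-sublist loop can probably be avoided by encoding the flattened data once and restoring the original shape with NumPy. The sketch below also shows one possible way to get the exact numbering from the question. LabelEncoder assigns codes in sorted order, so this uses a hand-written dictionary instead. The name tag_to_id and its contents are assumptions made for illustration, not part of scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

l = [
    ['PER', 'O', 'O', 'GEO'],
    ['ORG', 'O', 'O', 'O'],
    ['O', 'O', 'O', 'GEO'],
    ['O', 'O', 'PER', 'O'],
]

# Option 1 (assumes a rectangular list): flatten, encode once, reshape back.
arr = np.array(l)
le = LabelEncoder()
encoded = le.fit_transform(arr.ravel()).reshape(arr.shape).tolist()
# Codes follow the sorted order of le.classes_: GEO=0, O=1, ORG=2, PER=3

# Option 2: an explicit mapping (hypothetical, written by hand) that
# reproduces the numbering requested in the question.
tag_to_id = {'O': 0, 'PER': 1, 'GEO': 2, 'ORG': 3}
custom = [[tag_to_id[tag] for tag in row] for row in l]

print(encoded)
print(custom)
```

Option 1 keeps le available for le.inverse_transform later. Option 2 drops the encoder entirely, which may be simpler when the label set is small and fixed.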