Iterating over select cells in a pandas DataFrame and replacing a value

Question

I have a pandas DataFrame similar to the following example:

      tags      tag1      tag2      tag3
0     [a,b,c]     0         0         0
1     [a,b]       0         0         0
2     [b,d]       0         0         0
...
n     [a,b,d]     0         0         0

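I want to encode the tags as 1s in the tag1, tag2, tag3 columns if they are present in the tags array for that row index.

However, I can't quite figure out how to iterate correctly; my current idea is as follows: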

for i, row in dataset.iterrows():
    for tag in row[0]:
        for column in range (1,4):
            if dataset.iloc[:,column].index == tag:
                dataset.set_value(i, column, 1)

However, when the dataset is returned from this method, the columns still contain only 0 values.

Thanks!

Answer 1

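It seems you need:

astype to convert the column to strings if it contains lists
str.strip to remove the []
str.get_dummies to build one indicator column per tag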

df1 = df['tags'].astype(str).str.strip('[]').str.get_dummies(', ')
print (df1)
   'a'  'b'  'c'  'd'
0    1    1    1    0
1    1    1    0    0
2    0    1    0    1
3    1    1    0    1
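
The quoted column names above ('a', 'b', ...) come from converting each list to its string representation. If the tags column holds actual Python lists of strings, one possible alternative is to join each list with a separator before calling str.get_dummies, which gives unquoted column names. This is only a minimal sketch: the sample frame and the name dummies are assumptions made for illustration, and it assumes no tag contains the '|' character.

```python
import pandas as pd

# Hypothetical sample data mirroring the question (tags stored as real lists)
df = pd.DataFrame(
    {"tags": [["a", "b", "c"], ["a", "b"], ["b", "d"], ["a", "b", "d"]]}
)

# Join each list into a single '|'-separated string, then split that
# string into one 0/1 indicator column per distinct tag
dummies = df["tags"].str.join("|").str.get_dummies(sep="|")
print(dummies)
```

The result can be concatenated to the original frame in the same way as df1.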

Finally, add df1 to the original DataFrame with concat:

df = pd.concat([df,df1], axis=1)
print (df)
        tags  tag1  tag2  tag3  'a'  'b'  'c'  'd'
0  [a, b, c]     0     0     0    1    1    1    0
1     [a, b]     0     0     0    1    1    0    0
2     [b, d]     0     0     0    0    1    0    1
3  [a, b, d]     0     0     0    1    1    0    1
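
For comparison with the loop in the question: the condition there compares a column's row index labels to a tag, so it does not test whether the column belongs to that tag, and set_value was deprecated in pandas 0.21 and removed in pandas 1.0. If an explicit loop is still preferred, a minimal sketch could look like the following. It assumes the indicator columns are named after the tags themselves, since the question does not say how tag1, tag2, tag3 map to tags, and the sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample data mirroring the question
df = pd.DataFrame(
    {"tags": [["a", "b", "c"], ["a", "b"], ["b", "d"], ["a", "b", "d"]]}
)

# Assumption: one indicator column per distinct tag, initialised to 0
all_tags = sorted({tag for tags in df["tags"] for tag in tags})
for tag in all_tags:
    df[tag] = 0

# Set the cell to 1 for every tag present in the row's list;
# .at is the label-based scalar setter that replaces set_value
for i, tags in df["tags"].items():
    for tag in tags:
        df.at[i, tag] = 1

print(df)
```

The vectorised get_dummies approach is usually shorter and faster; the loop is shown only to make the per-cell assignment explicit.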

Tags: python, iteration, indices, pandas