Duplicating rows by splitting comma-separated values in another column in pandas

Question

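I found code that should work, but I get the error message "name 'Series' is not defined". It runs fine in the example, but other users have hit the same error. Does anyone know how to get it working?

Any help would be much appreciated!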

original_df = DataFrame([{'country': 'a', 'title': 'title1'},
               {'country': 'a,b,c', 'title': 'title2'},
               {'country': 'd,e,f', 'title': 'title3'},
               {'country': 'e', 'title': 'title4'}])

desired_df = DataFrame([{'country': 'a', 'title': 'title1'},
               {'country': 'a', 'title': 'title2'},
               {'country': 'b', 'title': 'title2'},
               {'country': 'c', 'title': 'title2'},
               {'country': 'd', 'title': 'title3'},
               {'country': 'e', 'title': 'title3'},
               {'country': 'f', 'title': 'title3'},
               {'country': 'e', 'title': 'title4'}])

#Code I used:
desired_df = pd.concat(
    [
        Series(row["title"], row["country"].split(","))
        for _, row in original_df.iterrows()
    ]
).reset_index()

Answer 1

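First split the column on commas to get lists, then explode that Series of lists. Move 'title' to the index so that it is repeated for each element of 'country'. The last two steps just clean up the name and move title back out of the index.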

(df.set_index('title')['country']
   .str.split(',')
   .explode()
   .rename('country')
   .reset_index())

    title country
0  title1       a
1  title2       a
2  title2       b
3  title2       c
4  title3       d
5  title3       e
6  title3       f
7  title4       e
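
For anyone who wants to run the chain above end to end, here is a minimal self-contained sketch. It assumes pandas 0.25 or later (where Series.explode is available) and rebuilds the frame from the question under the name original_df; the variable name result is just for illustration.

```python
import pandas as pd

# Sample frame from the question
original_df = pd.DataFrame([
    {'country': 'a', 'title': 'title1'},
    {'country': 'a,b,c', 'title': 'title2'},
    {'country': 'd,e,f', 'title': 'title3'},
    {'country': 'e', 'title': 'title4'},
])

result = (original_df.set_index('title')['country']  # title becomes the index
          .str.split(',')                            # each cell -> list of countries
          .explode()                                 # one row per list element, index repeated
          .rename('country')                         # keep the Series name explicit
          .reset_index())                            # move title back to a column

print(result)
```

The rename step is arguably optional here, since the Series already carries the name country from the column it was selected from.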

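Also, your original code is logically fine, but you need to construct the object correctly. I would suggest importing the module rather than individual classes/methods, so that you create a Series with pd.Series instead of Series.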

import pandas as pd

desired_df = pd.concat([pd.Series(row['title'], row['country'].split(','))
                        for _, row in original_df.iterrows()]).reset_index()
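
One thing to be aware of with the concat approach: because the Series objects are unnamed, a plain reset_index() leaves the columns labelled index and 0. A possible way to get the column names from the question, sketched here with rename_axis and the name argument of Series.reset_index (the intermediate name pieces is only for illustration):

```python
import pandas as pd

original_df = pd.DataFrame([
    {'country': 'a', 'title': 'title1'},
    {'country': 'a,b,c', 'title': 'title2'},
    {'country': 'd,e,f', 'title': 'title3'},
    {'country': 'e', 'title': 'title4'},
])

# One Series per row: the title is broadcast across the split countries,
# which serve as the index.
pieces = [pd.Series(row['title'], index=row['country'].split(','))
          for _, row in original_df.iterrows()]

desired_df = (pd.concat(pieces)
              .rename_axis('country')        # name the index before resetting it
              .reset_index(name='title'))    # name the value column

print(desired_df)
```

This should behave the same as the version above apart from the column labels.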

Answer 2

You can use pd.Series.str.split with df.explode here.

df['country'] = df['country'].str.split(',')
df.explode('country').reset_index(drop=True)

  country   title
0       a  title1
1       a  title2
2       b  title2
3       c  title2
4       d  title3
5       e  title3
6       f  title3
7       e  title4
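
Note that the first line above overwrites df['country'] in place. If you would rather leave the original frame untouched, a variant using assign could look like the sketch below. It assumes pandas 0.25 or later for DataFrame.explode; the name exploded is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['a', 'a,b,c', 'd,e,f', 'e'],
    'title': ['title1', 'title2', 'title3', 'title4'],
})

# assign returns a new frame, so df itself keeps its comma-separated strings
exploded = (df.assign(country=df['country'].str.split(','))
              .explode('country')
              .reset_index(drop=True))  # replace the repeated index with 0..n-1

print(exploded)
```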

For the NameError, you can import the names like this.

from pandas import DataFrame, Series

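Note: this import statement brings only the DataFrame and Series classes into scope; calls such as pd.concat still need import pandas as pd.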
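
As a quick illustration of how the two import styles relate (a small sketch, not the only way to organise the imports): both can be used together, and the bare name Series then refers to the same class as pd.Series.

```python
import pandas as pd                    # binds the module name `pd`
from pandas import DataFrame, Series   # binds the bare class names

# Both spellings point at the same class object
same_class = Series is pd.Series

# Scalar data is broadcast across the given index
s = Series('title2', index='a,b,c'.split(','))
print(same_class)
print(s)
```

The code in the question uses both pd.concat and the bare names DataFrame and Series, so it would need both import lines (or pd.DataFrame / pd.Series throughout).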
