Python: pivoting a column of comma- and bracket-delimited values

Question

I am working with a large DataFrame that currently looks like this:

ID	Region	Noun
1	North America	[('Everything', 'NP'), ('the product','NP'), ('it','NP')]
2	North America	[('beautiful product','NP')]

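The third column contains noun phrases, each with a tag (two uppercase letters). Each phrase and its tag are quoted, wrapped in parentheses, and then placed inside square brackets, which makes them hard for me to split and pivot.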

I only want to pivot the last column, so the final output would look like this:

ID	Region	Noun	Type
1	North America	everything	NP
1	North America	the product	NP
1	North America	it	NP
2	North America	beautiful product	NP

The annoying part is that some rows contain more parenthesized pairs than others. Is there a way to do this in Python?

Answer 1

I assume the Noun column contains strings. You can apply ast.literal_eval to them to convert them into Python lists:

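from ast import literal_eval

import pandas as pd

# apply only if the Noun values are strings; skip if they are already lists
df["Noun"] = df["Noun"].apply(literal_eval)

# one row per (phrase, tag) tuple, then split each tuple into two columns
df = df.explode("Noun")
df[["Noun", "Type"]] = df["Noun"].apply(pd.Series)
print(df)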

Prints:

   ID         Region               Noun Type
0   1  North America         Everything   NP
0   1  North America        the product   NP
0   1  North America                 it   NP
1   2  North America  beautiful product   NP
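
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the two sample rows from the question and runs the same steps. It assumes the Noun values are strings, as above; the names long_df is illustrative, and ignore_index=True is my addition to get a fresh 0..n-1 index instead of the repeated labels shown in the printout.

```python
from ast import literal_eval

import pandas as pd

# Sample data copied from the question; Noun is stored as text here (assumption).
df = pd.DataFrame({
    "ID": [1, 2],
    "Region": ["North America", "North America"],
    "Noun": [
        "[('Everything', 'NP'), ('the product','NP'), ('it','NP')]",
        "[('beautiful product','NP')]",
    ],
})

# Parse each string into a real Python list of (phrase, tag) tuples.
df["Noun"] = df["Noun"].apply(literal_eval)

# One row per tuple; ignore_index=True renumbers the rows 0..n-1.
long_df = df.explode("Noun", ignore_index=True)

# Split each tuple into two columns, overwriting Noun and adding Type.
long_df[["Noun", "Type"]] = long_df["Noun"].apply(pd.Series)
print(long_df)
```

Rows with any number of tuples are handled the same way, since explode emits one row per list element.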

Answer 2

Here is another way:

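# assumes the Noun values are already lists of tuples, not strings
df2 = df.explode('Noun').reset_index(drop=True)
# build the Noun/Type columns from the tuples and join them back on the index
df2[['ID','Region']].join(pd.DataFrame(df2['Noun'].tolist(),columns = ['Noun','Type']))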

Output:

   ID         Region               Noun Type
0   1  North America         Everything   NP
1   1  North America        the product   NP
2   1  North America                 it   NP
3   2  North America  beautiful product   NP
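
A short sketch of why the reset_index(drop=True) step matters in this approach, under the assumption that Noun already holds lists of tuples. join aligns rows by index label, and explode repeats the original labels, so the index has to be renumbered before joining against a freshly built frame. The names exploded, pairs, and result are illustrative.

```python
import pandas as pd

# Sample data from the question, with Noun already parsed into lists (assumption).
df = pd.DataFrame({
    "ID": [1, 2],
    "Region": ["North America", "North America"],
    "Noun": [
        [("Everything", "NP"), ("the product", "NP"), ("it", "NP")],
        [("beautiful product", "NP")],
    ],
})

exploded = df.explode("Noun")
print(exploded.index.tolist())  # [0, 0, 0, 1]: labels repeat after explode

# Renumber so each row has a unique label matching the new frame's default index.
df2 = exploded.reset_index(drop=True)
pairs = pd.DataFrame(df2["Noun"].tolist(), columns=["Noun", "Type"])

# join matches rows by index label, which now lines up one to one.
result = df2[["ID", "Region"]].join(pairs)
print(result)
```

Building the new columns with tolist() avoids constructing one Series per row, which may be noticeably faster than apply(pd.Series) on a large frame.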
