[python]: Convert pandas column of dictionary items to individual rows in a DataFrame

Question

I have a pandas DataFrame that looks like this:

date_time    country  src_type  edges
2021-05-01   DE       home      {"home": 10, "nav": 3}
2021-05-03   IN       nav       {"support": 1}
2021-05-04   AE       cart      {"chat": 1, "about": 4, "home": 5}
2021-05-07   US       about     {}
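
For anyone who wants to reproduce the frame above, here is a minimal sketch. It assumes that date_time is stored as plain strings and that edges holds real Python dicts rather than JSON strings; the post does not state either explicitly.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: date_time values are strings and edges values are dicts.
df = pd.DataFrame(
    {
        "date_time": ["2021-05-01", "2021-05-03", "2021-05-04", "2021-05-07"],
        "country": ["DE", "IN", "AE", "US"],
        "src_type": ["home", "nav", "cart", "about"],
        "edges": [
            {"home": 10, "nav": 3},
            {"support": 1},
            {"chat": 1, "about": 4, "home": 5},
            {},  # empty dict: this row should vanish from the output
        ],
    }
)

print(df)
```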

The edges column holds a dictionary that maps each edge's dst_type to its edge_count. I want every item in the dictionary to become its own row in the DataFrame.

This is clearer when looking at the expected output:

date_time    country  src_type  dst_type  edge_count
2021-05-01   DE       home      home      10
2021-05-01   DE       home      nav       3
2021-05-03   IN       nav       support   1
2021-05-04   AE       cart      chat      1
2021-05-04   AE       cart      about     4
2021-05-04   AE       cart      home      5

The last row of the original DataFrame is dropped because its edges dictionary is empty.

date_time    country  src_type  edges
. . .
2021-05-07   US       about     {}

Currently, I am doing the following:

records = []

for _, row in df.iterrows():
    for dst_type, edge_count in sorted(row["edges"].items()):
        records.append(
            (row["date_time"], row["country"], row["src_type"], dst_type, edge_count)
        )

df = pd.DataFrame.from_records(
    records, columns=["date_time", "country", "src_type", "dst_type", "edge_count"]
)

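However, this is very slow because iterating over the DataFrame row by row takes time. I would like to vectorize this operation to make it faster. Any pointers or suggestions?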

I would appreciate any help with this, as it would speed up our processing. Thanks!

Answer 1

You can use pd.DataFrame() to convert the dictionaries to new columns with the dict keys as column labels. Then use .melt() to convert the new columns to individual rows. Sort by the date_time column as required using .sort_values(). Finally, remove the rows that have no value (NaN) in the resulting edge_count column using .dropna(), as follows:

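# Expand each dict into one column per key; keys missing from a row become NaN.
# Passing index=df.index keeps the join aligned even if df has a non-default index.
df2 = df.drop('edges', axis=1).join(pd.DataFrame(df['edges'].tolist(), index=df.index))

# Unpivot the key columns into (dst_type, edge_count) rows, then drop the NaN fillers.
(df2.melt(id_vars=['date_time', 'country', 'src_type'], var_name='dst_type', value_name='edge_count')
    .sort_values('date_time')
    .dropna(subset=['edge_count'])
)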

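Result (edge_count comes out as float because the NaN placeholders force a float column before they are dropped):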

     date_time country src_type dst_type  edge_count
0   2021-05-01      DE     home     home        10.0
4   2021-05-01      DE     home      nav         3.0
9   2021-05-03      IN      nav  support         1.0
18  2021-05-04      AE     cart    about         4.0
14  2021-05-04      AE     cart     chat         1.0
2   2021-05-04      AE     cart     home         5.0
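
As a possible alternative to the melt approach above (not part of the answer itself), the sketch below repeats each row once per dictionary item and then attaches the flattened keys and values. It skips the wide intermediate frame, so edge_count stays an integer. The sample frame repeats the question's data, the names counts, base, items and out are illustrative, and sorted() mirrors the question's own loop. It still walks the dictionaries in Python, so whether it is faster on real data would need measuring.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: edges holds real dicts).
df = pd.DataFrame(
    {
        "date_time": ["2021-05-01", "2021-05-03", "2021-05-04", "2021-05-07"],
        "country": ["DE", "IN", "AE", "US"],
        "src_type": ["home", "nav", "cart", "about"],
        "edges": [{"home": 10, "nav": 3}, {"support": 1},
                  {"chat": 1, "about": 4, "home": 5}, {}],
    }
)

# Number of items in each dict; rows with an empty dict get a count of 0.
counts = df["edges"].map(len).to_numpy()

# Repeat every non-edge row once per dict item (zero times for empty dicts).
base = df.drop(columns="edges")
out = base.iloc[np.repeat(np.arange(len(df)), counts)].reset_index(drop=True)

# Flatten all (key, value) pairs in the same row order as the repeats above.
items = [kv for d in df["edges"] for kv in sorted(d.items())]
out["dst_type"] = [key for key, _ in items]
out["edge_count"] = [value for _, value in items]

print(out)
```

Dropping the sorted() call would keep each dictionary's insertion order instead, which matches the ordering shown in the question's expected output.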

python

vectorization

dataframe

python-3.x

pandas