Explode array [(str), (int)] in column dataframe pandas

I have a dataframe:

    import pandas as pd

    df = pd.DataFrame({
        'day': ['11', '12'],
        'City': ['[(Mumbai, 1),(Bangalore, 2)]', '[(Pune, 3),(Mumbai, 4),(Delh, 5)]']
    })

   day                               City
0  11       [(Mumbai, 1),(Bangalore, 2)]
1  12  [(Pune, 3),(Mumbai, 4),(Delh, 5)]

I want to explode the City column, but when I do this, nothing changes:

df2 = df.explode('City')

What I want to get as output:

  day            City
0  11     (Mumbai, 1)
1  11  (Bangalore, 2)
2  12       (Pune, 3)
3  12     (Mumbai, 4)
4  12       (Delh, 5)

You can't explode strings. You need to find a way to convert them to lists first.
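A quick way to see why nothing changes: DataFrame.explode only expands list-like values (lists, tuples, sets, Series, NumPy arrays) and returns scalars, such as a string, unchanged. The sketch below compares both cases on a toy column named x (an illustrative name, not from the question).

```python
import pandas as pd

# A column holding the *string* form of a list, as in the question
as_str = pd.DataFrame({'x': ['[1, 2]']})
# The same data stored as an actual Python list
as_list = pd.DataFrame({'x': [[1, 2]]})

# A string is a scalar to explode(), so the single row comes back unchanged
print(as_str.explode('x'))
# A list is expanded into one row per element (the original index is repeated)
print(as_list.explode('x'))
```

So the task reduces to turning each string into a real list before calling explode.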

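Assuming your city names contain only letters (or spaces), you can use a regex to add quotes around them and then convert each string to a list with ast.literal_eval: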
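from ast import literal_eval

df['City'] = (df['City']
              # wrap each city name in double quotes: (Mumbai, 1) -> ("Mumbai", 1)
              .str.replace(r'([a-zA-Z ]+),', r'"\1",', regex=True)
              # parse the now-valid Python literal into a list of (str, int) tuples
              .apply(literal_eval)
              )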

df2 = df.explode('City', ignore_index=True)

Output:

  day            City
0  11     (Mumbai, 1)
1  11  (Bangalore, 2)
2  12       (Pune, 3)
3  12     (Mumbai, 4)
4  12       (Delh, 5)
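If the city names can contain other characters (hyphens, dots, digits), the letters-only regex above will not quote them correctly and literal_eval will fail. A sketch of a swapped-in alternative that skips literal_eval and parses each (name, number) pair directly with re.findall; it assumes names never contain commas or parentheses and that the second element is always an integer. PAIR and parse_pairs are hypothetical names chosen for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'day': ['11', '12'],
    'City': ['[(Mumbai, 1),(Bangalore, 2)]', '[(Pune, 3),(Mumbai, 4),(Delh, 5)]']
})

# One "(name, number)" group: the name is anything except commas and parentheses,
# trimmed of surrounding spaces; the number is an optionally negative integer
PAIR = re.compile(r'\(\s*([^,()]+?)\s*,\s*(-?\d+)\s*\)')

def parse_pairs(s):
    """Turn '[(Mumbai, 1),(Bangalore, 2)]' into [('Mumbai', 1), ('Bangalore', 2)]."""
    return [(name, int(num)) for name, num in PAIR.findall(s)]

df['City'] = df['City'].apply(parse_pairs)
df2 = df.explode('City', ignore_index=True)
print(df2)
```

The result has the same shape as the output above, with real (str, int) tuples in City.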
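
import re
import pandas as pd

df = pd.DataFrame({
    'day': ['11', '12'],
    'City': ['[(Mumbai, 1),(Bangalore, 2)]', '[(Pune, 3),(Mumbai, 4),(Delh, 5)]']
})

# mark the boundary between tuples with "-": "),(" -> ")-("
df['City'] = [re.sub(r"\),\(", ")-(", x) for x in df['City']]
# strip all square brackets and parentheses
df['City'] = [re.sub(r"\[|\]|\(|\)", "", x) for x in df['City']]
# split on the "-" markers to get a list of "City, n" strings per row
df['City'] = [x.split("-") for x in df['City']]
df2 = df.explode('City').reset_index(drop=True)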

You have to process each string and convert it to a list before exploding. Result:

  day          City
0  11     Mumbai, 1
1  11  Bangalore, 2
2  12       Pune, 3
3  12     Mumbai, 4
4  12       Delh, 5
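Using "-" as a temporary separator breaks if a city name itself contains a hyphen (for example a hypothetical "Navi-Mumbai" would be split in two). A minimal variant of the same string-processing idea, assuming the outer "[(" and ")]" are the only brackets at the ends of each string: strip them and split directly on the "),(" boundary with re.split, so no sentinel character is needed. split_pairs is an illustrative name, and the Navi-Mumbai row is made-up test data.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'day': ['11', '12'],
    'City': ['[(Navi-Mumbai, 1),(Bangalore, 2)]', '[(Pune, 3),(Mumbai, 4),(Delh, 5)]']
})

def split_pairs(s):
    # strip() removes any of the characters "[]()" from both ends,
    # then the "),(" boundaries (optionally with spaces) become split points
    return re.split(r'\)\s*,\s*\(', s.strip('[]()'))

df['City'] = df['City'].apply(split_pairs)
df2 = df.explode('City', ignore_index=True)
print(df2)
```

Like the answer above, this yields "City, n" strings rather than tuples.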