Explode an array of (str, int) pairs in a pandas DataFrame column
I have a DataFrame:
import pandas as pd

df = pd.DataFrame({
    'day': ['11', '12'],
    'City': ['[(Mumbai, 1),(Bangalore, 2)]', '[(Pune, 3),(Mumbai, 4),(Delh, 5)]']
})
  day                               City
0  11       [(Mumbai, 1),(Bangalore, 2)]
1  12  [(Pune, 3),(Mumbai, 4),(Delh, 5)]
I want to explode the City column, but when I do this, nothing changes:
df2 = df.explode('City')
This is the output I want:
  day            City
0  11     (Mumbai, 1)
1  11  (Bangalore, 2)
2  12       (Pune, 3)
3  12     (Mumbai, 4)
4  12       (Delh, 5)
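You can't explode strings: explode only expands list-like values, and each cell in City is a single string. You need to find a way to convert the strings to lists first.
Assuming your city names contain only letters (or spaces), you can use a regex to add quotes around them and then convert each string to a list with ast.literal_eval: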
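from ast import literal_eval

df['City'] = (df['City']
    # wrap each city name in double quotes so the string is a valid Python literal
    .str.replace(r'([a-zA-Z ]+),', r'"\1",', regex=True)
    # parse the string into a list of (str, int) tuples
    .apply(literal_eval)
)
df2 = df.explode('City', ignore_index=True)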
Output:
  day            City
0  11     (Mumbai, 1)
1  11  (Bangalore, 2)
2  12       (Pune, 3)
3  12     (Mumbai, 4)
4  12       (Delh, 5)
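To check the diagnosis above, the following self-contained sketch confirms that explode leaves string cells untouched and that the quoted-and-parsed column really holds lists. The variable names (unchanged, quoted, parsed, exploded) are only illustrative, and it makes the same assumption as the answer: city names consist of letters and spaces only.

```python
from ast import literal_eval

import pandas as pd

df = pd.DataFrame({
    'day': ['11', '12'],
    'City': ['[(Mumbai, 1),(Bangalore, 2)]', '[(Pune, 3),(Mumbai, 4),(Delh, 5)]'],
})

# Each cell is a plain str, so explode has nothing to expand.
unchanged = df.explode('City')
same_shape = unchanged.shape == df.shape

# Quote the city names, then parse each string into a list of tuples.
quoted = df['City'].str.replace(r'([a-zA-Z ]+),', r'"\1",', regex=True)
parsed = quoted.apply(literal_eval)
all_lists = all(isinstance(cell, list) for cell in parsed)

# With real lists in the column, explode yields one row per tuple.
exploded = df.assign(City=parsed).explode('City', ignore_index=True)
print(exploded)
```

Because literal_eval only evaluates Python literals, it is generally a safer choice than eval for this kind of input, but it will raise an error on any cell the regex fails to quote correctly.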
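You have to process the strings and convert them to lists before exploding:
import re
import pandas as pd

df = pd.DataFrame({
    'day': ['11', '12'],
    'City': ['[(Mumbai, 1),(Bangalore, 2)]', '[(Pune, 3),(Mumbai, 4),(Delh, 5)]']
})
# mark the boundary between tuples with "-"
df['City'] = [re.sub(r"\),\(", ")-(", x) for x in df['City']]
# strip all brackets and parentheses
df['City'] = [re.sub(r"\[|\]|\(|\)", "", x) for x in df['City']]
# split on the marker to get one list per row
df['City'] = [x.split("-") for x in df['City']]
df['City']
df2 = df.explode('City').reset_index(drop=True)
Output (each value is a plain string such as 'Mumbai, 1', not a tuple):
  day          City
0  11     Mumbai, 1
1  11  Bangalore, 2
2  12       Pune, 3
3  12     Mumbai, 4
4  12       Delh, 5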
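A possible variation on the string-processing approach is to extract the pairs directly with re.findall, which avoids the "-" marker (a problem if a city name contains a hyphen) and keeps the number as an int. This is only a sketch: parse_pairs and PAIR are hypothetical names, and it assumes every entry has the form (name, integer) with no commas or parentheses inside the name.

```python
import re

import pandas as pd

# One "(name, number)" entry: capture the name and the integer separately.
PAIR = re.compile(r'\(\s*([^,()]+?)\s*,\s*(\d+)\s*\)')


def parse_pairs(text):
    """Turn '[(Mumbai, 1),(Pune, 3)]' into [('Mumbai', 1), ('Pune', 3)]."""
    return [(name, int(num)) for name, num in PAIR.findall(text)]


df = pd.DataFrame({
    'day': ['11', '12'],
    'City': ['[(Mumbai, 1),(Bangalore, 2)]', '[(Pune, 3),(Mumbai, 4),(Delh, 5)]'],
})

# Convert each string to a list of tuples, then explode to one row per tuple.
df['City'] = df['City'].apply(parse_pairs)
df2 = df.explode('City', ignore_index=True)
print(df2)
```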