Re-indexing a dataframe based on matching values of two columns
I'm having trouble grouping a dataframe based on matching values. For example:
print(crosstabsdf1)
Index Area Area_2
0 188 181
1 190 188
2 192 190
3 115 110
4 138 121
... ... ...
2510 173 174
2511 177 178
2512 174 175
2513 176 177
2604 181 182
[361 rows x 2 columns]
When I look up the matches for a single value, for example:
crosstabsdf1[crosstabsdf1['Area']==181]
Index Area Area_2
9 181 175
260 181 182
crosstabsdf1[crosstabsdf1['Area_2']==181]
Index Area Area_2
0 188 181
157 180 181
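So I want to group all the matches between each pair (by a match I mean that when I have rows like:
Area Area_2
181 175
181 182
188 181
180 181
it means that areas 181 and 175, 181 and 182, and so on are adjacent).
Is there a pandas method (or perhaps a user-defined function) that groups each area and shows it as multiple rows, one for every area it is adjacent to, like this: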
Index Area Area_2
0 181 175
1 181 180
2 181 182
3 181 188
Thanks
Based on the example you provided, you could try this:
import pandas as pd


def match(df, col, other_col, value):
    """Find rows matching a given value.

    Args:
        df (pd.DataFrame): target dataframe
        col (str): label of the first column
        other_col (str): label of the second column
        value (int): target value

    Returns:
        pd.DataFrame: one row per match, with `value` in `col` and the
        adjacent area in `other_col`
    """
    # Find value in both columns and keep the value from the other column
    area = df.loc[df[col] == value, other_col]
    area_2 = df.loc[df[other_col] == value, col]
    # Concat rows, add new column and return new df with sorted columns
    new_df = pd.concat([area, area_2]).to_frame(name=other_col)
    new_df[col] = value
    return new_df.reindex(sorted(new_df.columns), axis=1)


df = pd.DataFrame(
    {
        "Area": [181, 181, 188, 180, 173, 176, 138],
        "Area_2": [175, 182, 181, 181, 174, 177, 121],
    }
)
print(match(df, "Area", "Area_2", 181))
# Outputs
Area Area_2
0 181 175
1 181 182
2 181 188
3 181 180
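As a minimal sketch, assuming the column names are always "Area" and "Area_2", the same single-value lookup can also be written with one combined boolean mask and `Series.where`, which picks the "other" column row by row instead of running two lookups and concatenating them. `neighbours` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def neighbours(df: pd.DataFrame, value: int) -> pd.DataFrame:
    """Hypothetical helper: one row per area adjacent to `value`."""
    # Rows where `value` appears in either column
    rows = df[(df["Area"] == value) | (df["Area_2"] == value)]
    # In each such row, keep the column that does NOT hold `value`:
    # Area_2 where Area == value, otherwise Area
    other = rows["Area_2"].where(rows["Area"] == value, rows["Area"])
    # Broadcast the scalar `value` into the "Area" column
    return pd.DataFrame({"Area": value, "Area_2": other}).reset_index(drop=True)


df = pd.DataFrame(
    {
        "Area": [181, 181, 188, 180, 173, 176, 138],
        "Area_2": [175, 182, 181, 181, 174, 177, 121],
    }
)
result = neighbours(df, 181)
print(result)
```

For 181 this yields the same four neighbours as `match` (175, 182, 188, 180), listed in the order the rows appear in the original frame.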
Now, to apply it to the whole dataframe, you can proceed like this:
# Put all intermediate dataframes in a list by applying "match"
# to every unique value of the "Area" column
dfs = [match(df, "Area", "Area_2", e) for e in df["Area"].unique()]
# Concatenate them in a single call
new_df = pd.concat(dfs)
# Clean up: sort and renumber the index
new_df = new_df.sort_values(by=["Area", "Area_2"]).reset_index(drop=True)
print(new_df)
# Outputs
Area Area_2
0 138 121
1 173 174
2 176 177
3 180 181
4 181 175
5 181 180
6 181 182
7 181 188
8 188 181
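Iterating over `df["Area"].unique()` only creates groups for areas that appear in the "Area" column, so in the sample, areas such as 175 or 182 never get rows of their own. If every area should be listed, one alternative (a different technique from the loop above, sketched here under the assumption that adjacency is symmetric) is to stack the frame with a column-swapped copy of itself and drop duplicate pairs. A `groupby` then gives each area's neighbours as a list. The names `edges` and `adjacency` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Area": [181, 181, 188, 180, 173, 176, 138],
        "Area_2": [175, 182, 181, 181, 174, 177, 121],
    }
)

# Each adjacency pair in its original orientation ...
forward = df[["Area", "Area_2"]]
# ... and with the two columns swapped, so every area also appears in "Area"
backward = df.rename(columns={"Area": "Area_2", "Area_2": "Area"})[["Area", "Area_2"]]

edges = (
    pd.concat([forward, backward], ignore_index=True)
    .drop_duplicates()  # a pair stored both ways in the source is kept once
    .sort_values(by=["Area", "Area_2"])
    .reset_index(drop=True)
)
print(edges[edges["Area"] == 181])

# Optional: one entry per area with a sorted list of its neighbours
adjacency = edges.groupby("Area")["Area_2"].agg(list)
print(adjacency.loc[181])
```

On the sample this produces 14 rows (each of the 7 pairs in both directions), and the rows for 181 have Area_2 values 175, 180, 182 and 188, which is the layout asked for in the question.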