Re-indexing a dataframe based on matching values of two columns

Question

I'm having trouble grouping a dataframe based on matches between its values. Say I have:

print(crosstabsdf1)
Index  Area     Area_2
0      188        181
1      190        188
2      192        190
3      115        110
4      138        121
...    ...        ...
2510   173        174
2511   177        178
2512   174        175
2513   176        177
2604   181        182

[361 rows x 2 columns]

When I look for matches of a single value, for example:

crosstabsdf1[crosstabsdf1['Area']==181]

Index  Area  Area_2
9     181       175
260   181       182

crosstabsdf1[crosstabsdf1['Area_2']==181]

Index   Area   Area_2
0       188     181
157     180     181

So I would like to group all the matches for each pair. By a match I mean that when I have rows such as:

Area Area_2
181  175
181  182
188  181
180  181

it means that areas 181 and 175, 181 and 182, and so on are adjacent.

Is there a pandas method (or perhaps a custom function) that groups each area and shows it as multiple rows, one for every other area it is adjacent to, like this:

Index    Area       Area_2
0        181          175
1        181          180
2        181          182
3        181          188
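The adjacency described above is symmetric: a row 188/181 makes 188 a neighbor of 181 just as a row 181/188 would. A minimal plain-Python sketch of that definition, assuming the pairs are available as a list of tuples (the `pairs` list is hypothetical and simply copies the four example rows), collects each area's neighbors in a set:

```python
from collections import defaultdict

# Hypothetical input: the four example rows above as (Area, Area_2) tuples
pairs = [(181, 175), (181, 182), (188, 181), (180, 181)]

# Adjacency is symmetric, so record every pair in both directions
neighbors = defaultdict(set)
for a, b in pairs:
    neighbors[a].add(b)
    neighbors[b].add(a)

print(sorted(neighbors[181]))  # [175, 180, 182, 188]
```

The sorted neighbors of 181 are exactly the Area_2 values of the desired output.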

Thanks

Answer 1

Based on the example you provided, you could try this:

import pandas as pd


def match(df, col, other_col, value):
    """Find rows matching a given value.

    Args:
        df (pd.DataFrame): target dataframe
        col (str): label of the first column
        other_col (str): label of the second column
        value (int): target value

    Returns:
        pd.DataFrame: one row per neighbor of value, with value in col
        and the neighboring area in other_col
    """
    # Neighbors of value when it appears in col...
    area = df.loc[df[col] == value, other_col]
    # ...and when it appears in other_col
    area_2 = df.loc[df[other_col] == value, col]

    # Stack both Series under the other_col label, then prepend col holding value
    new_df = pd.concat([area, area_2]).to_frame(name=other_col)
    new_df.insert(0, col, value)

    return new_df


df = pd.DataFrame(
    {
        "Area": [181, 181, 188, 180, 173, 176, 138],
        "Area_2": [175, 182, 181, 181, 174, 177, 121],
    }
)

print(match(df, "Area", "Area_2", 181))
# Outputs
#    Area  Area_2
# 0   181     175
# 1   181     182
# 2   181     188
# 3   181     180
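A variant of `match` can use a single boolean mask and `numpy.where` to pick, for each matching row, whichever column does not hold the target value. This is a sketch rather than the answer's original code; `neighbors_of` is a hypothetical name, and it sorts the neighbors so the result follows the order requested in the question:

```python
import numpy as np
import pandas as pd


def neighbors_of(df, col, other_col, value):
    """Return one row per neighbor of `value`, whichever column it appears in."""
    # Rows where the value appears in either column
    hits = df[(df[col] == value) | (df[other_col] == value)]
    # For each hit, the neighbor is the column that does NOT hold `value`
    others = np.where(hits[col] == value, hits[other_col], hits[col])
    # The scalar `value` is broadcast to every row
    out = pd.DataFrame({col: value, other_col: others})
    # Sort neighbors so the output matches the order shown in the question
    return out.sort_values(other_col).reset_index(drop=True)


df = pd.DataFrame(
    {
        "Area": [181, 181, 188, 180, 173, 176, 138],
        "Area_2": [175, 182, 181, 181, 174, 177, 121],
    }
)

print(neighbors_of(df, "Area", "Area_2", 181))
```

One mask covers both directions, so there is no second lookup and no concatenation per value.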

Now, to apply this to the whole dataframe, you can proceed as follows:

# Put all intermediate dataframes in a list by applying "match"
# to existing and unique values of "Area" column
dfs = [match(df, "Area", "Area_2", e) for e in df["Area"].unique()]

# Concatenate all intermediate dataframes in a single call
new_df = pd.concat(dfs)

# Clean up
new_df = new_df.sort_values(by=["Area", "Area_2"]).reset_index(drop=True)

print(new_df)
# Outputs
#    Area  Area_2
# 0   138     121
# 1   173     174
# 2   176     177
# 3   180     181
# 4   181     175
# 5   181     180
# 6   181     182
# 7   181     188
# 8   188     181
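The loop above only visits the unique values of "Area", so an area that appears only in "Area_2" (175, 182, 174, 177 and 121 in the example) never gets a group of its own. A vectorized sketch, not part of the original answer, adds every pair in both directions before sorting; `swapped`, `edges` and `groups` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Area": [181, 181, 188, 180, 173, 176, 138],
        "Area_2": [175, 182, 181, 181, 174, 177, 121],
    }
)

# Same pairs with the two columns swapped, so adjacency is recorded both ways
swapped = pd.DataFrame({"Area": df["Area_2"], "Area_2": df["Area"]})

# Stack both directions, drop pairs that now appear twice, then sort
edges = (
    pd.concat([df, swapped], ignore_index=True)
    .drop_duplicates()
    .sort_values(by=["Area", "Area_2"])
    .reset_index(drop=True)
)
print(edges)

# Optional: one entry per area holding the list of its neighbors
groups = edges.groupby("Area")["Area_2"].apply(list)
print(groups.loc[181])
```

Here 181 still gets the rows 175, 180, 182 and 188, and areas such as 175 now get their own row (175, 181) as well.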
