如何使用 Python 基于另一个 DataFrame 中的行 select DataFrame 中的行

Question

我有两个数据框，df1 如下所示：

id  year    CalendarWeek    DayName interval    counts
1   2014    1   sun 10:30   3
1   2014    1   sun 11:30   4
1   2014    2   wed 12:00   5
1   2014    2   fri 9:00    2
2   2014    1   sun 13:00   3
2   2014    1   sun 14:30   1
2   2014    1   mon 10:30   2
2   2014    2   wed 14:00   3
2   2014    2   fri 15:00   5
3   2014    1   thu 16:30   2
3   2014    1   thu 17:00   1
3   2014    2   sat 12:00   2
3   2014    2   sat 13:30   3

df2 如下所示：

id  year    CalendarWeek    DayName interval    NewCounts
1   2014    1   sun 10:00   2
1   2014    1   sun 10:30   4
1   2014    1   sun 11:30   5
1   2014    2   wed 10:30   6
1   2014    2   wed 12:00   3
1   2014    2   fri 8:30    1
1   2014    2   fri 9:00    2
2   2014    1   sun 12:30   3
2   2014    1   sun 13:00   4
2   2014    1   sun 14:30   4
2   2014    1   mon 9:00    35
2   2014    1   mon 10:30   1
2   2014    2   wed 12:30   23
2   2014    2   wed 14:00   4
2   2014    2   fri 15:00   3
3   2014    1   thu 14:30   1
3   2014    1   thu 15:00   3
3   2014    1   thu 16:30   34
3   2014    1   thu 17:00   5
3   2014    2   sat 12:00   3
3   2014    2   sat 13:30   4
3   2014    2   sat 14:00   2

我想获取 df2 中与 df1 中的列 id、year、CalendarWeek、DayName 和 interval 匹配的所有行。我想要的结果应该如下所示：

id  year    CalendarWeek    DayName interval    NewCounts
1   2014    1   sun 10:30   4
1   2014    1   sun 11:30   5
1   2014    2   wed 12:00   3
1   2014    2   fri 9:00    2
2   2014    1   sun 13:00   4
2   2014    1   sun 14:30   4
2   2014    1   mon 10:30   1
2   2014    2   wed 14:00   4
2   2014    2   fri 15:00   3
3   2014    1   thu 16:30   34
3   2014    1   thu 17:00   5
3   2014    2   sat 12:00   3
3   2014    2   sat 13:30   4

在Python中，如何select一个数据框中的这些特定行基于另一个数据框中的列？

谢谢！

Answer 1

执行 merge and pass the list of columns to param on, the default type of merge is 'inner' 仅匹配两个 dfs 中存在的值：

In [2]:

df.merge(df1, on=['id','year','CalendarWeek','DayName','interval'])
Out[2]:
    id  year  CalendarWeek DayName interval  counts  NewCounts
0    1  2014             1     sun    10:30       3          4
1    1  2014             1     sun    11:30       4          5
2    1  2014             2     wed    12:00       5          3
3    1  2014             2     fri     9:00       2          2
4    2  2014             1     sun    13:00       3          4
5    2  2014             1     sun    14:30       1          4
6    2  2014             1     mon    10:30       2          1
7    2  2014             2     wed    14:00       3          4
8    2  2014             2     fri    15:00       5          3
9    3  2014             1     thu    16:30       2         34
10   3  2014             1     thu    17:00       1          5
11   3  2014             2     sat    12:00       2          3
12   3  2014             2     sat    13:30       3          4

如果您的 'id' 列是您的索引，您必须重置两个 df 上的索引，以便它们成为 df 中的列，这是因为如果您指定 on 列列表并指定 left_index=True 和 right_index=True:

In [4]:

df.merge(df1, on=['year','CalendarWeek','DayName','interval'], left_index=True, right_index=True)
Out[4]:
    year  CalendarWeek DayName interval  counts  NewCounts
id                                                        
1   2014             1     sun    10:30       3          2
1   2014             1     sun    10:30       3          4
1   2014             1     sun    10:30       3          5
1   2014             1     sun    10:30       3          6
1   2014             1     sun    10:30       3          3
1   2014             1     sun    10:30       3          1
1   2014             1     sun    10:30       3          2
1   2014             1     sun    11:30       4          2
1   2014             1     sun    11:30       4          4
1   2014             1     sun    11:30       4          5
1   2014             1     sun    11:30       4          6
1   2014             1     sun    11:30       4          3
1   2014             1     sun    11:30       4          1
1   2014             1     sun    11:30       4          2
1   2014             2     wed    12:00       5          2
1   2014             2     wed    12:00       5          4
1   2014             2     wed    12:00       5          5
1   2014             2     wed    12:00       5          6
1   2014             2     wed    12:00       5          3
1   2014             2     wed    12:00       5          1
1   2014             2     wed    12:00       5          2
1   2014             2     fri     9:00       2          2
1   2014             2     fri     9:00       2          4
1   2014             2     fri     9:00       2          5
1   2014             2     fri     9:00       2          6
1   2014             2     fri     9:00       2          3
1   2014             2     fri     9:00       2          1
1   2014             2     fri     9:00       2          2
2   2014             1     sun    13:00       3          3
2   2014             1     sun    13:00       3          4
..   ...           ...     ...      ...     ...        ...
2   2014             2     fri    15:00       5          4
2   2014             2     fri    15:00       5          3
3   2014             1     thu    16:30       2          1
3   2014             1     thu    16:30       2          3
3   2014             1     thu    16:30       2         34
3   2014             1     thu    16:30       2          5
3   2014             1     thu    16:30       2          3
3   2014             1     thu    16:30       2          4
3   2014             1     thu    16:30       2          2
3   2014             1     thu    17:00       1          1
3   2014             1     thu    17:00       1          3
3   2014             1     thu    17:00       1         34
3   2014             1     thu    17:00       1          5
3   2014             1     thu    17:00       1          3
3   2014             1     thu    17:00       1          4
3   2014             1     thu    17:00       1          2
3   2014             2     sat    12:00       2          1
3   2014             2     sat    12:00       2          3
3   2014             2     sat    12:00       2         34
3   2014             2     sat    12:00       2          5
3   2014             2     sat    12:00       2          3
3   2014             2     sat    12:00       2          4
3   2014             2     sat    12:00       2          2
3   2014             2     sat    13:30       3          1
3   2014             2     sat    13:30       3          3
3   2014             2     sat    13:30       3         34
3   2014             2     sat    13:30       3          5
3   2014             2     sat    13:30       3          3
3   2014             2     sat    13:30       3          4
3   2014             2     sat    13:30       3          2

[96 rows x 6 columns]

所以要重置索引只需执行 df = df.reset_index(0) 并且对于其他 df 也是如此，合并后您可以将索引设置回 id so:

merged = df.merge(df1, on=['id','year','CalendarWeek','DayName','interval'])
merged = merged.reset_index()

如何使用 Python 基于另一个 DataFrame 中的行 select DataFrame 中的行

How to select rows in a DataFrame based on rows in another DataFrame using Python

python

rows

dataframe

pandas