Selecting rows in a DataFrame that are not in a Series

I have a DataFrame named trips that contains the following:

  route_id     service_id  shape_id                      trip_id
0     BX12  GH_B6-Weekday  BX120805  GH_B6-Weekday-004000_BX12_1
1     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-009000_BX12_1
2     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-013000_BX12_1
3     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-017000_BX12_1
4     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-021000_BX12_1
...

I also have a Series named invalidTrips that contains the following:

trip_id
11760139-BPPB6-BP_B6-Weekday-10         16
11760139-BPPB6-BP_B6-Weekday-10-SDon    16
11760140-BPPB6-BP_B6-Weekday-10         19
11760140-BPPB6-BP_B6-Weekday-10-SDon    19
11760141-BPPB6-BP_B6-Weekday-10         16
...

How can I select all rows in trips whose trip_id does not match any trip_id in invalidTrips?

Edit: I now have this code:

# Grab the number of trips made outside min and max hour.
tooEarly = stopTimes['arrival_time'] < base_mintime
tooLate = stopTimes['departure_time'] > base_maxtime
invalidTrips = stopTimes[(tooEarly | tooLate)].groupby('trip_id').size()

# Filter out the invalid trips.
print(invalidTrips.size)
print(trips.size)
in_validTrips = ~trips.trip_id.isin(invalidTrips)
validTrips = trips[in_validTrips][['route_id', 'service_id', 'shape_id']]
print(validTrips.size)

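For some reason, invalidTrips.size changes depending on base_mintime and base_maxtime, but validTrips.size stays the same, even though I would expect it to shrink as invalidTrips.size grows. Why is that?

(For more context, all of this is extracted from GTFS data.)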

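Update:

Try the isin() function together with the ~ operator.

Following @EdChum's correction in the comments, if invalidTrips is a Series (with the trip IDs in its index):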

trips[~trips.trip_id.isin(invalidTrips.index)]
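
The reason `.index` matters: `groupby('trip_id').size()` returns a Series whose index holds the trip IDs and whose values are row counts. Passing the Series itself to `isin()` tests the trip IDs against those counts, so nothing matches and the negated mask keeps every row. Here is a minimal sketch of the difference. The data and the snake_case names (`stop_times`, `invalid_trips`, `mask_wrong`, `mask_right`) are made up for illustration and are not from the question's dataset.

```python
import pandas as pd

# Hypothetical miniature data, invented for this illustration only.
trips = pd.DataFrame({
    "route_id": ["BX12", "BX12", "BX12"],
    "trip_id": ["trip_a", "trip_b", "trip_c"],
})
stop_times = pd.DataFrame({"trip_id": ["trip_b", "trip_b", "trip_x"]})

# groupby(...).size() yields a Series: trip_id labels in the index,
# row counts (here 2 and 1) as the values.
invalid_trips = stop_times.groupby("trip_id").size()

# Passing the Series compares trip_id strings against the integer counts,
# so no row is flagged and the negated mask is True everywhere.
mask_wrong = ~trips["trip_id"].isin(invalid_trips)

# Passing the index compares trip_id strings against trip_id labels.
mask_right = ~trips["trip_id"].isin(invalid_trips.index)

print(mask_wrong.tolist())
print(mask_right.tolist())
print(trips[mask_right])
```

With the first mask every row survives regardless of how many invalid trips there are, which matches the symptom described in the question.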

Test:

In [39]: invalidTrips
Out[39]:
trip_id
11760139-BPPB6-BP_B6-Weekday-10         16
11760139-BPPB6-BP_B6-Weekday-10-SDon    16
11760140-BPPB6-BP_B6-Weekday-10         19
11760140-BPPB6-BP_B6-Weekday-10-SDon    19
11760141-BPPB6-BP_B6-Weekday-10         16
GH_B6-Weekday-017000_BX12_1             11         # <-- I've added it intentionally
Name: val, dtype: int64

In [40]: trips
Out[40]:
  route_id     service_id  shape_id                      trip_id
0     BX12  GH_B6-Weekday  BX120805  GH_B6-Weekday-004000_BX12_1
1     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-009000_BX12_1
2     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-013000_BX12_1
3     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-017000_BX12_1  # <-- exclude this row 
4     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-021000_BX12_1

In [41]: trips[~trips.trip_id.isin(invalidTrips.index)]
Out[41]:
  route_id     service_id  shape_id                      trip_id
0     BX12  GH_B6-Weekday  BX120805  GH_B6-Weekday-004000_BX12_1
1     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-009000_BX12_1
2     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-013000_BX12_1
4     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-021000_BX12_1
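
A side note on the `print(... .size)` checks in the question: for a DataFrame, `.size` is the number of cells (rows times columns), whereas for a Series it is the number of elements. The printed numbers are therefore not directly comparable, and `len()` or `.shape[0]` is probably the better way to count rows. A small sketch, again with invented data and hypothetical names (`invalid_ids`, `valid_trips`):

```python
import pandas as pd

# Hypothetical data, invented for this illustration only.
trips = pd.DataFrame({
    "route_id": ["BX12"] * 4,
    "service_id": ["GH_B6-Weekday"] * 4,
    "shape_id": ["s1", "s2", "s3", "s4"],
    "trip_id": ["t1", "t2", "t3", "t4"],
})
invalid_ids = pd.Index(["t2", "t4"], name="trip_id")

# Keep rows whose trip_id is not in invalid_ids, and only three of the columns.
valid_trips = trips.loc[
    ~trips["trip_id"].isin(invalid_ids),
    ["route_id", "service_id", "shape_id"],
]

print(valid_trips.size)      # number of cells: rows * columns
print(len(valid_trips))      # number of rows
print(valid_trips.shape[0])  # also the number of rows
```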