如何 return 条件 lambda 语句中系列的最大值
How to return the max of a series in a conditional lambda statement
我正在使用下面的代码return df1 和 df2 中的 NID 和 Fdat 相同并且 df2 中的 Tdate 小于 df1 中的 RxDate 的行数。
def breedings(row):
row['TLSCC'] = len(df2[((df2['NID']==row['NID']) & (df2['Fdat']==row['Fdat']) & (df2['Tdate'] < row['RxDate']) )] )
return row
df1 = df1.apply(lambda x:breedings(x),axis=1)
结果放在 df1 中的新列 TLSCC 中
我想搜索 df2 中的记录以使用用于确定满足条件的行数的相同条件来查找最大 SCC 值。即我想 return 这些行中的 max.scc 值到 df1 中的新列。
用 max 替换 len 不起作用,因为它对 SCC 没有 link。我也试过将 max 放入 apply lambda 语句中。没有成功。
感谢任何指点。
下面提供了 df1 和 df2 的数据。
df1
{'NID': {1: '212017463',
2: '212017463',
3: '212017463',
4: '213015177',
5: '213015177',
6: '213015177',
7: '213015177',
8: '213015177',
9: '213015177',
10: '213015177'},
'Fdat': {1: Timestamp('2020-08-06 00:00:00'),
2: Timestamp('2020-08-06 00:00:00'),
3: Timestamp('2020-08-06 00:00:00'),
4: Timestamp('2018-08-30 00:00:00'),
5: Timestamp('2018-08-30 00:00:00'),
6: Timestamp('2018-08-30 00:00:00'),
7: Timestamp('2018-08-30 00:00:00'),
8: Timestamp('2018-08-30 00:00:00'),
9: Timestamp('2018-08-30 00:00:00'),
10: Timestamp('2018-08-30 00:00:00')},
'RxDate': {1: Timestamp('2020-09-30 00:00:00'),
2: Timestamp('2020-10-12 00:00:00'),
3: Timestamp('2020-10-15 00:00:00'),
4: Timestamp('2018-12-06 00:00:00'),
5: Timestamp('2018-12-07 00:00:00'),
6: Timestamp('2018-12-09 00:00:00'),
7: Timestamp('2018-12-11 00:00:00'),
8: Timestamp('2018-12-17 00:00:00'),
9: Timestamp('2018-12-18 00:00:00'),
10: Timestamp('2018-12-19 00:00:00')}}
df2
{'NID': {36: '212017463',
37: '212017463',
38: '212017463',
39: '212017463',
40: '212017463',
41: '212017463',
42: '212017463',
43: '212017463',
44: '212017463',
45: '212017463',
46: '212017463',
47: '212017463',
48: '212017463',
49: '212017463',
50: '212017463',
51: '212017463',
52: '212017463',
53: '212017463',
54: '212017463',
55: '212017463',
56: '212017463',
57: '212017463',
58: '212017463',
59: '212017463',
60: '212017463',
61: '212017463',
62: '212017463',
63: '212017463',
64: '212017463',
65: '212017463',
66: '212017463',
67: '212017463',
68: '212017463',
69: '212017463',
70: '213015177',
71: '213015177',
72: '213015177',
73: '213015177',
74: '213015177',
75: '213015177',
76: '213015177',
77: '213015177',
78: '213015177',
79: '213015177',
80: '213015177',
81: '213015177',
82: '213015177',
83: '213015177',
84: '213015177',
85: '213015177',
86: '213015177',
87: '213015177',
88: '213015177',
89: '213015177',
90: '213015177',
91: '213015177',
92: '213015177',
93: '213015177',
94: '213015177',
95: '213015177',
96: '213015177',
97: '213015177',
98: '213015177',
99: '213015177',
100: '213015177',
101: '213015177'},
'Fdat': {36: Timestamp('2014-08-17 00:00:00'),
37: Timestamp('2014-08-17 00:00:00'),
38: Timestamp('2014-08-17 00:00:00'),
39: Timestamp('2014-08-17 00:00:00'),
40: Timestamp('2014-08-17 00:00:00'),
41: Timestamp('2015-09-22 00:00:00'),
42: Timestamp('2015-09-22 00:00:00'),
43: Timestamp('2015-09-22 00:00:00'),
44: Timestamp('2015-09-22 00:00:00'),
45: Timestamp('2015-09-22 00:00:00'),
46: Timestamp('2015-09-22 00:00:00'),
47: Timestamp('2016-09-01 00:00:00'),
48: Timestamp('2016-09-01 00:00:00'),
49: Timestamp('2016-09-01 00:00:00'),
50: Timestamp('2016-09-01 00:00:00'),
51: Timestamp('2016-09-01 00:00:00'),
52: Timestamp('2017-08-31 00:00:00'),
53: Timestamp('2017-08-31 00:00:00'),
54: Timestamp('2017-08-31 00:00:00'),
55: Timestamp('2017-08-31 00:00:00'),
56: Timestamp('2017-08-31 00:00:00'),
57: Timestamp('2017-08-31 00:00:00'),
58: Timestamp('2018-09-17 00:00:00'),
59: Timestamp('2018-09-17 00:00:00'),
60: Timestamp('2018-09-17 00:00:00'),
61: Timestamp('2018-09-17 00:00:00'),
62: Timestamp('2018-09-17 00:00:00'),
63: Timestamp('2019-08-25 00:00:00'),
64: Timestamp('2019-08-25 00:00:00'),
65: Timestamp('2019-08-25 00:00:00'),
66: Timestamp('2019-08-25 00:00:00'),
67: Timestamp('2019-08-25 00:00:00'),
68: Timestamp('2020-08-06 00:00:00'),
69: Timestamp('2020-08-06 00:00:00'),
70: Timestamp('2015-09-01 00:00:00'),
71: Timestamp('2015-09-01 00:00:00'),
72: Timestamp('2015-09-01 00:00:00'),
73: Timestamp('2015-09-01 00:00:00'),
74: Timestamp('2015-09-01 00:00:00'),
75: Timestamp('2015-09-01 00:00:00'),
76: Timestamp('2016-08-18 00:00:00'),
77: Timestamp('2016-08-18 00:00:00'),
78: Timestamp('2016-08-18 00:00:00'),
79: Timestamp('2016-08-18 00:00:00'),
80: Timestamp('2016-08-18 00:00:00'),
81: Timestamp('2017-09-01 00:00:00'),
82: Timestamp('2017-09-01 00:00:00'),
83: Timestamp('2017-09-01 00:00:00'),
84: Timestamp('2017-09-01 00:00:00'),
85: Timestamp('2017-09-01 00:00:00'),
86: Timestamp('2017-09-01 00:00:00'),
87: Timestamp('2018-08-30 00:00:00'),
88: Timestamp('2018-08-30 00:00:00'),
89: Timestamp('2018-08-30 00:00:00'),
90: Timestamp('2018-08-30 00:00:00'),
91: Timestamp('2018-08-30 00:00:00'),
92: Timestamp('2019-08-09 00:00:00'),
93: Timestamp('2019-08-09 00:00:00'),
94: Timestamp('2019-08-09 00:00:00'),
95: Timestamp('2019-08-09 00:00:00'),
96: Timestamp('2019-08-09 00:00:00'),
97: Timestamp('2020-08-06 00:00:00'),
98: Timestamp('2020-08-06 00:00:00'),
99: Timestamp('2020-08-06 00:00:00'),
100: Timestamp('2020-08-06 00:00:00'),
101: Timestamp('2020-08-06 00:00:00')},
'Tdate': {36: Timestamp('2015-04-16 00:00:00'),
37: Timestamp('2015-02-13 00:00:00'),
38: Timestamp('2014-10-10 00:00:00'),
39: Timestamp('2014-12-17 00:00:00'),
40: Timestamp('2014-11-07 00:00:00'),
41: Timestamp('2015-09-29 00:00:00'),
42: Timestamp('2016-05-30 00:00:00'),
43: Timestamp('2016-02-10 00:00:00'),
44: Timestamp('2016-04-12 00:00:00'),
45: Timestamp('2015-11-06 00:00:00'),
46: Timestamp('2015-12-18 00:00:00'),
47: Timestamp('2017-05-31 00:00:00'),
48: Timestamp('2016-11-28 00:00:00'),
49: Timestamp('2017-04-04 00:00:00'),
50: Timestamp('2017-02-15 00:00:00'),
51: Timestamp('2016-10-03 00:00:00'),
52: Timestamp('2018-05-29 00:00:00'),
53: Timestamp('2018-03-20 00:00:00'),
54: Timestamp('2018-04-24 00:00:00'),
55: Timestamp('2018-02-05 00:00:00'),
56: Timestamp('2017-10-03 00:00:00'),
57: Timestamp('2017-11-28 00:00:00'),
58: Timestamp('2019-06-03 00:00:00'),
59: Timestamp('2019-04-10 00:00:00'),
60: Timestamp('2019-01-14 00:00:00'),
61: Timestamp('2018-10-02 00:00:00'),
62: Timestamp('2018-11-14 00:00:00'),
63: Timestamp('2020-03-24 00:00:00'),
64: Timestamp('2020-05-21 00:00:00'),
65: Timestamp('2019-11-14 00:00:00'),
66: Timestamp('2020-01-21 00:00:00'),
67: Timestamp('2019-09-27 00:00:00'),
68: Timestamp('2020-09-30 00:00:00'),
69: Timestamp('2020-12-16 00:00:00'),
70: Timestamp('2015-09-29 00:00:00'),
71: Timestamp('2016-04-12 00:00:00'),
72: Timestamp('2016-05-30 00:00:00'),
73: Timestamp('2016-02-10 00:00:00'),
74: Timestamp('2015-11-06 00:00:00'),
75: Timestamp('2015-12-18 00:00:00'),
76: Timestamp('2017-04-04 00:00:00'),
77: Timestamp('2017-05-31 00:00:00'),
78: Timestamp('2017-02-15 00:00:00'),
79: Timestamp('2016-11-28 00:00:00'),
80: Timestamp('2016-10-03 00:00:00'),
81: Timestamp('2018-05-29 00:00:00'),
82: Timestamp('2018-03-20 00:00:00'),
83: Timestamp('2018-04-24 00:00:00'),
84: Timestamp('2017-10-03 00:00:00'),
85: Timestamp('2018-02-05 00:00:00'),
86: Timestamp('2017-11-28 00:00:00'),
87: Timestamp('2019-04-10 00:00:00'),
88: Timestamp('2019-06-03 00:00:00'),
89: Timestamp('2019-01-14 00:00:00'),
90: Timestamp('2018-11-14 00:00:00'),
91: Timestamp('2018-10-02 00:00:00'),
92: Timestamp('2020-05-21 00:00:00'),
93: Timestamp('2020-03-24 00:00:00'),
94: Timestamp('2019-11-14 00:00:00'),
95: Timestamp('2020-01-21 00:00:00'),
96: Timestamp('2019-09-27 00:00:00'),
97: Timestamp('2021-05-13 00:00:00'),
98: Timestamp('2021-01-27 00:00:00'),
99: Timestamp('2021-03-18 00:00:00'),
100: Timestamp('2020-09-30 00:00:00'),
101: Timestamp('2020-12-16 00:00:00')},
'SCC': {36: 26,
37: 853,
38: 66,
39: 39,
40: 18,
41: 1704,
42: 16,
43: 13,
44: 23,
45: 51,
46: 15,
47: 95,
48: 21,
49: 51,
50: 57,
51: 65,
52: 55,
53: 77,
54: 53,
55: 65,
56: 17,
57: 51,
58: 113,
59: 57,
60: 58,
61: 520,
62: 735,
63: 50,
64: 98,
65: 12,
66: 109,
67: 19,
68: 6939,
69: 1891,
70: 166,
71: 15,
72: 25,
73: 14,
74: 34,
75: 24,
76: 45,
77: 74,
78: 17,
79: 3,
80: 7,
81: 32,
82: 27,
83: 122,
84: 4,
85: 62,
86: 7,
87: 35,
88: 74,
89: 117,
90: 110,
91: 2,
92: 55,
93: 316,
94: 17,
95: 19,
96: 8,
97: 77,
98: 12,
99: 50,
100: 16,
101: 19}}
我会先加入,然后对日期应用条件以生成符合您条件的加入 table:
>>> join = df1.merge(df2, on=['NID', 'Fdat'])
>>> join = join[join['Tdate'] < join['RxDate']]
>>> join
NID Fdat RxDate Tdate SCC
2 212017463 2020-08-06 2020-10-12 2020-09-30 6939
4 212017463 2020-08-06 2020-10-15 2020-09-30 6939
9 213015177 2018-08-30 2018-12-06 2018-11-14 110
10 213015177 2018-08-30 2018-12-06 2018-10-02 2
14 213015177 2018-08-30 2018-12-07 2018-11-14 110
15 213015177 2018-08-30 2018-12-07 2018-10-02 2
19 213015177 2018-08-30 2018-12-09 2018-11-14 110
20 213015177 2018-08-30 2018-12-09 2018-10-02 2
24 213015177 2018-08-30 2018-12-11 2018-11-14 110
25 213015177 2018-08-30 2018-12-11 2018-10-02 2
29 213015177 2018-08-30 2018-12-17 2018-11-14 110
30 213015177 2018-08-30 2018-12-17 2018-10-02 2
34 213015177 2018-08-30 2018-12-18 2018-11-14 110
35 213015177 2018-08-30 2018-12-18 2018-10-02 2
39 213015177 2018-08-30 2018-12-19 2018-11-14 110
40 213015177 2018-08-30 2018-12-19 2018-10-02 2
然后 table 上的 groupby 给你所有你想要的,.size()
的记录数和 ['SCC'].max()
的最大值:
>>> join.groupby(df1.columns.to_list()).size().rename('TLSCC')
NID Fdat RxDate
212017463 2020-08-06 2020-10-12 1
2020-10-15 1
213015177 2018-08-30 2018-12-06 2
2018-12-07 2
2018-12-09 2
2018-12-11 2
2018-12-17 2
2018-12-18 2
2018-12-19 2
Name: TLSCC, dtype: int64
>>> join.groupby(df1.columns.to_list())['SCC'].max()
NID Fdat RxDate
212017463 2020-08-06 2020-10-12 6939
2020-10-15 6939
213015177 2018-08-30 2018-12-06 110
2018-12-07 110
2018-12-09 110
2018-12-11 110
2018-12-17 110
2018-12-18 110
2018-12-19 110
然后您可以使用 .merge
:
将它们全部放回 df1
>>> group = join.groupby(df1.columns.to_list())
>>> df1 = df1.merge(group.size().rename('TLSCC').reset_index())\
... .merge(group['SCC'].max().rename('max.scc').reset_index())
...
>>> df1
NID Fdat RxDate TLSCC max.scc
0 212017463 2020-08-06 2020-10-12 1 6939
1 212017463 2020-08-06 2020-10-15 1 6939
2 213015177 2018-08-30 2018-12-06 2 110
3 213015177 2018-08-30 2018-12-07 2 110
4 213015177 2018-08-30 2018-12-09 2 110
5 213015177 2018-08-30 2018-12-11 2 110
6 213015177 2018-08-30 2018-12-17 2 110
7 213015177 2018-08-30 2018-12-18 2 110
8 213015177 2018-08-30 2018-12-19 2 110
我正在使用下面的代码return df1 和 df2 中的 NID 和 Fdat 相同并且 df2 中的 Tdate 小于 df1 中的 RxDate 的行数。
def breedings(row):
row['TLSCC'] = len(df2[((df2['NID']==row['NID']) & (df2['Fdat']==row['Fdat']) & (df2['Tdate'] < row['RxDate']) )] )
return row
df1 = df1.apply(lambda x:breedings(x),axis=1)
结果放在 df1 中的新列 TLSCC 中
我想搜索 df2 中的记录以使用用于确定满足条件的行数的相同条件来查找最大 SCC 值。即我想 return 这些行中的 max.scc 值到 df1 中的新列。
用 max 替换 len 不起作用,因为它对 SCC 没有 link。我也试过将 max 放入 apply lambda 语句中。没有成功。
感谢任何指点。
下面提供了 df1 和 df2 的数据。
df1
{'NID': {1: '212017463',
2: '212017463',
3: '212017463',
4: '213015177',
5: '213015177',
6: '213015177',
7: '213015177',
8: '213015177',
9: '213015177',
10: '213015177'},
'Fdat': {1: Timestamp('2020-08-06 00:00:00'),
2: Timestamp('2020-08-06 00:00:00'),
3: Timestamp('2020-08-06 00:00:00'),
4: Timestamp('2018-08-30 00:00:00'),
5: Timestamp('2018-08-30 00:00:00'),
6: Timestamp('2018-08-30 00:00:00'),
7: Timestamp('2018-08-30 00:00:00'),
8: Timestamp('2018-08-30 00:00:00'),
9: Timestamp('2018-08-30 00:00:00'),
10: Timestamp('2018-08-30 00:00:00')},
'RxDate': {1: Timestamp('2020-09-30 00:00:00'),
2: Timestamp('2020-10-12 00:00:00'),
3: Timestamp('2020-10-15 00:00:00'),
4: Timestamp('2018-12-06 00:00:00'),
5: Timestamp('2018-12-07 00:00:00'),
6: Timestamp('2018-12-09 00:00:00'),
7: Timestamp('2018-12-11 00:00:00'),
8: Timestamp('2018-12-17 00:00:00'),
9: Timestamp('2018-12-18 00:00:00'),
10: Timestamp('2018-12-19 00:00:00')}}
df2
{'NID': {36: '212017463',
37: '212017463',
38: '212017463',
39: '212017463',
40: '212017463',
41: '212017463',
42: '212017463',
43: '212017463',
44: '212017463',
45: '212017463',
46: '212017463',
47: '212017463',
48: '212017463',
49: '212017463',
50: '212017463',
51: '212017463',
52: '212017463',
53: '212017463',
54: '212017463',
55: '212017463',
56: '212017463',
57: '212017463',
58: '212017463',
59: '212017463',
60: '212017463',
61: '212017463',
62: '212017463',
63: '212017463',
64: '212017463',
65: '212017463',
66: '212017463',
67: '212017463',
68: '212017463',
69: '212017463',
70: '213015177',
71: '213015177',
72: '213015177',
73: '213015177',
74: '213015177',
75: '213015177',
76: '213015177',
77: '213015177',
78: '213015177',
79: '213015177',
80: '213015177',
81: '213015177',
82: '213015177',
83: '213015177',
84: '213015177',
85: '213015177',
86: '213015177',
87: '213015177',
88: '213015177',
89: '213015177',
90: '213015177',
91: '213015177',
92: '213015177',
93: '213015177',
94: '213015177',
95: '213015177',
96: '213015177',
97: '213015177',
98: '213015177',
99: '213015177',
100: '213015177',
101: '213015177'},
'Fdat': {36: Timestamp('2014-08-17 00:00:00'),
37: Timestamp('2014-08-17 00:00:00'),
38: Timestamp('2014-08-17 00:00:00'),
39: Timestamp('2014-08-17 00:00:00'),
40: Timestamp('2014-08-17 00:00:00'),
41: Timestamp('2015-09-22 00:00:00'),
42: Timestamp('2015-09-22 00:00:00'),
43: Timestamp('2015-09-22 00:00:00'),
44: Timestamp('2015-09-22 00:00:00'),
45: Timestamp('2015-09-22 00:00:00'),
46: Timestamp('2015-09-22 00:00:00'),
47: Timestamp('2016-09-01 00:00:00'),
48: Timestamp('2016-09-01 00:00:00'),
49: Timestamp('2016-09-01 00:00:00'),
50: Timestamp('2016-09-01 00:00:00'),
51: Timestamp('2016-09-01 00:00:00'),
52: Timestamp('2017-08-31 00:00:00'),
53: Timestamp('2017-08-31 00:00:00'),
54: Timestamp('2017-08-31 00:00:00'),
55: Timestamp('2017-08-31 00:00:00'),
56: Timestamp('2017-08-31 00:00:00'),
57: Timestamp('2017-08-31 00:00:00'),
58: Timestamp('2018-09-17 00:00:00'),
59: Timestamp('2018-09-17 00:00:00'),
60: Timestamp('2018-09-17 00:00:00'),
61: Timestamp('2018-09-17 00:00:00'),
62: Timestamp('2018-09-17 00:00:00'),
63: Timestamp('2019-08-25 00:00:00'),
64: Timestamp('2019-08-25 00:00:00'),
65: Timestamp('2019-08-25 00:00:00'),
66: Timestamp('2019-08-25 00:00:00'),
67: Timestamp('2019-08-25 00:00:00'),
68: Timestamp('2020-08-06 00:00:00'),
69: Timestamp('2020-08-06 00:00:00'),
70: Timestamp('2015-09-01 00:00:00'),
71: Timestamp('2015-09-01 00:00:00'),
72: Timestamp('2015-09-01 00:00:00'),
73: Timestamp('2015-09-01 00:00:00'),
74: Timestamp('2015-09-01 00:00:00'),
75: Timestamp('2015-09-01 00:00:00'),
76: Timestamp('2016-08-18 00:00:00'),
77: Timestamp('2016-08-18 00:00:00'),
78: Timestamp('2016-08-18 00:00:00'),
79: Timestamp('2016-08-18 00:00:00'),
80: Timestamp('2016-08-18 00:00:00'),
81: Timestamp('2017-09-01 00:00:00'),
82: Timestamp('2017-09-01 00:00:00'),
83: Timestamp('2017-09-01 00:00:00'),
84: Timestamp('2017-09-01 00:00:00'),
85: Timestamp('2017-09-01 00:00:00'),
86: Timestamp('2017-09-01 00:00:00'),
87: Timestamp('2018-08-30 00:00:00'),
88: Timestamp('2018-08-30 00:00:00'),
89: Timestamp('2018-08-30 00:00:00'),
90: Timestamp('2018-08-30 00:00:00'),
91: Timestamp('2018-08-30 00:00:00'),
92: Timestamp('2019-08-09 00:00:00'),
93: Timestamp('2019-08-09 00:00:00'),
94: Timestamp('2019-08-09 00:00:00'),
95: Timestamp('2019-08-09 00:00:00'),
96: Timestamp('2019-08-09 00:00:00'),
97: Timestamp('2020-08-06 00:00:00'),
98: Timestamp('2020-08-06 00:00:00'),
99: Timestamp('2020-08-06 00:00:00'),
100: Timestamp('2020-08-06 00:00:00'),
101: Timestamp('2020-08-06 00:00:00')},
'Tdate': {36: Timestamp('2015-04-16 00:00:00'),
37: Timestamp('2015-02-13 00:00:00'),
38: Timestamp('2014-10-10 00:00:00'),
39: Timestamp('2014-12-17 00:00:00'),
40: Timestamp('2014-11-07 00:00:00'),
41: Timestamp('2015-09-29 00:00:00'),
42: Timestamp('2016-05-30 00:00:00'),
43: Timestamp('2016-02-10 00:00:00'),
44: Timestamp('2016-04-12 00:00:00'),
45: Timestamp('2015-11-06 00:00:00'),
46: Timestamp('2015-12-18 00:00:00'),
47: Timestamp('2017-05-31 00:00:00'),
48: Timestamp('2016-11-28 00:00:00'),
49: Timestamp('2017-04-04 00:00:00'),
50: Timestamp('2017-02-15 00:00:00'),
51: Timestamp('2016-10-03 00:00:00'),
52: Timestamp('2018-05-29 00:00:00'),
53: Timestamp('2018-03-20 00:00:00'),
54: Timestamp('2018-04-24 00:00:00'),
55: Timestamp('2018-02-05 00:00:00'),
56: Timestamp('2017-10-03 00:00:00'),
57: Timestamp('2017-11-28 00:00:00'),
58: Timestamp('2019-06-03 00:00:00'),
59: Timestamp('2019-04-10 00:00:00'),
60: Timestamp('2019-01-14 00:00:00'),
61: Timestamp('2018-10-02 00:00:00'),
62: Timestamp('2018-11-14 00:00:00'),
63: Timestamp('2020-03-24 00:00:00'),
64: Timestamp('2020-05-21 00:00:00'),
65: Timestamp('2019-11-14 00:00:00'),
66: Timestamp('2020-01-21 00:00:00'),
67: Timestamp('2019-09-27 00:00:00'),
68: Timestamp('2020-09-30 00:00:00'),
69: Timestamp('2020-12-16 00:00:00'),
70: Timestamp('2015-09-29 00:00:00'),
71: Timestamp('2016-04-12 00:00:00'),
72: Timestamp('2016-05-30 00:00:00'),
73: Timestamp('2016-02-10 00:00:00'),
74: Timestamp('2015-11-06 00:00:00'),
75: Timestamp('2015-12-18 00:00:00'),
76: Timestamp('2017-04-04 00:00:00'),
77: Timestamp('2017-05-31 00:00:00'),
78: Timestamp('2017-02-15 00:00:00'),
79: Timestamp('2016-11-28 00:00:00'),
80: Timestamp('2016-10-03 00:00:00'),
81: Timestamp('2018-05-29 00:00:00'),
82: Timestamp('2018-03-20 00:00:00'),
83: Timestamp('2018-04-24 00:00:00'),
84: Timestamp('2017-10-03 00:00:00'),
85: Timestamp('2018-02-05 00:00:00'),
86: Timestamp('2017-11-28 00:00:00'),
87: Timestamp('2019-04-10 00:00:00'),
88: Timestamp('2019-06-03 00:00:00'),
89: Timestamp('2019-01-14 00:00:00'),
90: Timestamp('2018-11-14 00:00:00'),
91: Timestamp('2018-10-02 00:00:00'),
92: Timestamp('2020-05-21 00:00:00'),
93: Timestamp('2020-03-24 00:00:00'),
94: Timestamp('2019-11-14 00:00:00'),
95: Timestamp('2020-01-21 00:00:00'),
96: Timestamp('2019-09-27 00:00:00'),
97: Timestamp('2021-05-13 00:00:00'),
98: Timestamp('2021-01-27 00:00:00'),
99: Timestamp('2021-03-18 00:00:00'),
100: Timestamp('2020-09-30 00:00:00'),
101: Timestamp('2020-12-16 00:00:00')},
'SCC': {36: 26,
37: 853,
38: 66,
39: 39,
40: 18,
41: 1704,
42: 16,
43: 13,
44: 23,
45: 51,
46: 15,
47: 95,
48: 21,
49: 51,
50: 57,
51: 65,
52: 55,
53: 77,
54: 53,
55: 65,
56: 17,
57: 51,
58: 113,
59: 57,
60: 58,
61: 520,
62: 735,
63: 50,
64: 98,
65: 12,
66: 109,
67: 19,
68: 6939,
69: 1891,
70: 166,
71: 15,
72: 25,
73: 14,
74: 34,
75: 24,
76: 45,
77: 74,
78: 17,
79: 3,
80: 7,
81: 32,
82: 27,
83: 122,
84: 4,
85: 62,
86: 7,
87: 35,
88: 74,
89: 117,
90: 110,
91: 2,
92: 55,
93: 316,
94: 17,
95: 19,
96: 8,
97: 77,
98: 12,
99: 50,
100: 16,
101: 19}}
我会先加入,然后对日期应用条件以生成符合您条件的加入 table:
>>> join = df1.merge(df2, on=['NID', 'Fdat'])
>>> join = join[join['Tdate'] < join['RxDate']]
>>> join
NID Fdat RxDate Tdate SCC
2 212017463 2020-08-06 2020-10-12 2020-09-30 6939
4 212017463 2020-08-06 2020-10-15 2020-09-30 6939
9 213015177 2018-08-30 2018-12-06 2018-11-14 110
10 213015177 2018-08-30 2018-12-06 2018-10-02 2
14 213015177 2018-08-30 2018-12-07 2018-11-14 110
15 213015177 2018-08-30 2018-12-07 2018-10-02 2
19 213015177 2018-08-30 2018-12-09 2018-11-14 110
20 213015177 2018-08-30 2018-12-09 2018-10-02 2
24 213015177 2018-08-30 2018-12-11 2018-11-14 110
25 213015177 2018-08-30 2018-12-11 2018-10-02 2
29 213015177 2018-08-30 2018-12-17 2018-11-14 110
30 213015177 2018-08-30 2018-12-17 2018-10-02 2
34 213015177 2018-08-30 2018-12-18 2018-11-14 110
35 213015177 2018-08-30 2018-12-18 2018-10-02 2
39 213015177 2018-08-30 2018-12-19 2018-11-14 110
40 213015177 2018-08-30 2018-12-19 2018-10-02 2
然后 table 上的 groupby 给你所有你想要的,.size()
的记录数和 ['SCC'].max()
的最大值:
>>> join.groupby(df1.columns.to_list()).size().rename('TLSCC')
NID Fdat RxDate
212017463 2020-08-06 2020-10-12 1
2020-10-15 1
213015177 2018-08-30 2018-12-06 2
2018-12-07 2
2018-12-09 2
2018-12-11 2
2018-12-17 2
2018-12-18 2
2018-12-19 2
Name: TLSCC, dtype: int64
>>> join.groupby(df1.columns.to_list())['SCC'].max()
NID Fdat RxDate
212017463 2020-08-06 2020-10-12 6939
2020-10-15 6939
213015177 2018-08-30 2018-12-06 110
2018-12-07 110
2018-12-09 110
2018-12-11 110
2018-12-17 110
2018-12-18 110
2018-12-19 110
然后您可以使用 .merge
:
>>> group = join.groupby(df1.columns.to_list())
>>> df1 = df1.merge(group.size().rename('TLSCC').reset_index())\
... .merge(group['SCC'].max().rename('max.scc').reset_index())
...
>>> df1
NID Fdat RxDate TLSCC max.scc
0 212017463 2020-08-06 2020-10-12 1 6939
1 212017463 2020-08-06 2020-10-15 1 6939
2 213015177 2018-08-30 2018-12-06 2 110
3 213015177 2018-08-30 2018-12-07 2 110
4 213015177 2018-08-30 2018-12-09 2 110
5 213015177 2018-08-30 2018-12-11 2 110
6 213015177 2018-08-30 2018-12-17 2 110
7 213015177 2018-08-30 2018-12-18 2 110
8 213015177 2018-08-30 2018-12-19 2 110