Python 中的 Groupby 和 ffill 指定列

Groupby and ffill specified columns in Python

我想按 id_CodeTimestamp 对值进行排序(因为时间顺序很重要),然后使用 id_ 和 [=d1 分组 Code,然后使用 ffill for NaN for each group, only on columns V1 and V2 only , while保持其他列不变,return 完整的 table.

d1:


    Type_x  id_             Timestamp               V1   Code Type_y    V2
0   abcd    39-38-30-34     2012-09-20 23:46:05.870 35.5    2    NaN    0
1   abcd    39-38-30-34     2012-09-20 23:46:23.870 44.5    0    NaN    1
2   abcd    39-38-30-34     2012-09-20 23:48:07.870 43.5    0    NaN    1
3   abcd    39-38-30-34     2012-09-20 23:49:48.870 42.5    0    NaN    NaN
4   abcd    39-38-30-34     2012-09-20 23:50:44.870 34.5    2    NaN    NaN

尝试过:

d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code'])['V1', 'V2'].ffill()

只 return 编辑了两列:

        V1      V2
69659   21.5    NaN
300886  21.5    1.0
300887  21.5    0.0
70086   23.0    0.0
300955  23.0    1.0

我该如何正确操作?

您需要退回什么?

d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code']).ffill()

    

             Type_x     Timestamp    V1  Type_y   V2
1 abcd   39-38-30-34  23:46:23.870  44.5     NaN  1.0
2 abcd   39-38-30-34  23:48:07.870  43.5     NaN  1.0
3 abcd   39-38-30-34  23:49:48.870  42.5     NaN  1.0
0 abcd   39-38-30-34  23:46:05.870  35.5     NaN  0.0
4 abcd  39-38-30-34-  23:50:44.870  34.5     NaN  0.0

d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code']).ffill().dropna(1)
print(d2)

 

             Type_x     Timestamp    V1   V2
1 abcd   39-38-30-34  23:46:23.870  44.5  1.0
2 abcd   39-38-30-34  23:48:07.870  43.5  1.0
3 abcd   39-38-30-34  23:49:48.870  42.5  1.0
0 abcd   39-38-30-34  23:46:05.870  35.5  0.0
4 abcd  39-38-30-34-  23:50:44.870  34.5  0.0

如果您的实际数据框中除了您想要 groupby 的列和您想要 [=13= 的列之外还有其他列,您可以使用 transform 并逐列进行]:

d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp'])
d2['V1'] = d2.groupby(['id_', 'Code'])['V1'].transform(lambda x: x.ffill())
d2['V2'] = d2.groupby(['id_', 'Code'])['V2'].transform(lambda x: x.ffill())
d2
Out[1]: 
  Type_x          id_                Timestamp    V1  Code  Type_y   V2
1  abcd   39-38-30-34  2012-09-20 23:46:23.870  44.5  0    NaN      1.0
2  abcd   39-38-30-34  2012-09-20 23:48:07.870  43.5  0    NaN      1.0
3  abcd   39-38-30-34  2012-09-20 23:49:48.870  42.5  0    NaN      1.0
0  abcd   39-38-30-34  2012-09-20 23:46:05.870  35.5  2    NaN      0.0
4  abcd   39-38-30-34  2012-09-20 23:50:44.870  34.5  2    NaN      0.0