Condensing an array where some rows differ only by one column (to one with unique rows but more columns)

Question

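I have a long array (pandas or numpy, whichever is more convenient) in which some rows share the same first two columns (the x-y position), while the third column (time) is unique, e.g.: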

x     y     t
0.    0.    10.
0.    0.    11.
0.    0.    12.
0.    1.    13.
0.    1.    14.
1.    1.    15.
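For reproducibility, the sample above could be built roughly as follows. This is only a sketch: `initial_array` is the name used later in the question, while `df` and its column labels are assumed here for the pandas form.

```python
import numpy as np
import pandas as pd

# Sample data from the table above: columns are x, y, t.
initial_array = np.array([
    [0., 0., 10.],
    [0., 0., 11.],
    [0., 0., 12.],
    [0., 1., 13.],
    [0., 1., 14.],
    [1., 1., 15.],
])

# Assumed pandas equivalent with labelled columns.
df = pd.DataFrame(initial_array, columns=["x", "y", "t"])
```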

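The positions are grouped, but each position may have 1, 2 or 3 time values listed, meaning there may be 1, 2 or 3 rows with identical x and y. The array needs to be reshaped/condensed so that each position gets its own row, holding the minimum and maximum time values; that is, the target is: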

x     y     t1    t2
0.    0.    10.   12.
0.    1.    13.   14.
1.    1.    15.   inf

Is there a simple/elegant way to do this in pandas or numpy? I have tried loops, but they get messy and are terribly inefficient. I have also tried np.unique:

target_array = np.unique(initial_array[:, 0:2], axis=0)

which yields

x     y 
0.    0.
0.    1.
1.    1.

This is a good start, but I then get stuck on generating the last two columns.

Answer 1

IIUC, you can use

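# Assumes `import numpy as np` and a DataFrame `df` with columns x, y, t.
# Per (x, y) position: earliest time, latest time, and number of rows.
out = (df.groupby(['x', 'y'])['t']
       .agg(t1='min', t2='max', c='count')
       .reset_index()
       # A position with a single time value has no distinct maximum: use inf.
       .pipe(lambda d: d.assign(t2=d['t2'].mask(d['c'].eq(1), np.inf)))
       .drop(columns='c')
       )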

print(out)

     x    y    t1    t2
0  0.0  0.0  10.0  12.0
1  0.0  1.0  13.0  14.0
2  1.0  1.0  15.0   inf
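If a NumPy-only route is preferred, the `np.unique` attempt from the question can probably be extended along these lines. This is a sketch rather than part of the answer above: it assumes the data is a float array named `initial_array` with columns x, y, t, and it uses `np.minimum.at` / `np.maximum.at` for the per-group reduction.

```python
import numpy as np

# Sample data from the question: columns are x, y, t.
initial_array = np.array([
    [0., 0., 10.],
    [0., 0., 11.],
    [0., 0., 12.],
    [0., 1., 13.],
    [0., 1., 14.],
    [1., 1., 15.],
])

# Unique (x, y) pairs, the group index of every row, and the group sizes.
xy, inverse, counts = np.unique(
    initial_array[:, :2], axis=0, return_inverse=True, return_counts=True
)
inverse = inverse.ravel()  # guard against a 2-D inverse in some NumPy versions

t = initial_array[:, 2]

# Per-group minimum: start from +inf and take an unbuffered element-wise minimum.
t1 = np.full(len(xy), np.inf)
np.minimum.at(t1, inverse, t)

# Per-group maximum: start from -inf and take an unbuffered element-wise maximum.
t2 = np.full(len(xy), -np.inf)
np.maximum.at(t2, inverse, t)

# Positions with a single time value get inf in the second time column.
t2[counts == 1] = np.inf

target_array = np.column_stack([xy, t1, t2])
print(target_array)
```

Note that `np.unique` sorts the (x, y) pairs, so the output rows come back in sorted order rather than in order of first appearance; for the sample data the two orders coincide.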

python

numpy

pandas