Pandas groupby: spotting a pattern between columns

Question

I have a large DataFrame df:

Col1    Col2    Col3    Val1    Val2
A1      B1      c1      0.2    -0.3
A1      B1      c2     -0.3     0.3
A1      B1      c3      0.5     0.2
A2      B2      c1     -0.3     0.1
A2      B2      c2      0.7    -0.3
A3      B3      c1     -0.3     0.3
A3      B3      c2     -0.2     0.3
A3      B3      c3      0.5     0.2
A3      B3      c4      0.8     0.7

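Within each Col1-Col2 group, the signs of Val1 and Val2 may alternate: one row where Val1 is positive and Val2 is negative, and another row where the reverse holds. I want to achieve the following: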

Col1    Col2    Col3    Val1    Val2  Pattern
A1      B1      c1      0.2    -0.3   Y
A1      B1      c2     -0.3     0.3   Y
A1      B1      c3      0.5     0.2   Y
A2      B2      c1     -0.3     0.1   Y
A2      B2      c2      0.7    -0.3   Y
A3      B3      c1     -0.3     0.3   N
A3      B3      c2     -0.2     0.3   N
A3      B3      c3      0.5     0.2   N
A3      B3      c4      0.8     0.7   N

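A1-B1 and A2-B2 each contain a pair of rows whose Val1 and Val2 signs are opposite and mirrored, but A3-B3 has none.

Since the DataFrame is large, I am not sure how to go about this.

Edit:

A1-B1 is 'Y' because it has (0.2, -0.3) and (-0.3, 0.3).

A2-B2 has (-0.3, 0.1) and (0.7, -0.3).

A3-B3 does not have both kinds of row. It only has rows like (-0.3, 0.3), and none where (Val1, Val2) is (positive, negative).

That is, to count as a pattern, a group must contain both a (positive, negative) row and a (negative, positive) row.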

Answer 1

Use np.sign, compare the signs with DataFrame.eq, and broadcast the per-group result back to every row with GroupBy.transform:

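import numpy as np
import pandas as pd

signs = np.sign(df[['Val1', 'Val2']])        # -1, 0 or 1 for each value
m1 = signs.eq([1, -1]).all(axis=1)           # rows that are (positive, negative)
m2 = signs.eq([-1, 1]).all(axis=1)           # rows that are (negative, positive)
df['Pattern'] = (pd.concat([m1, m2], axis=1)
                   .groupby([df['Col1'], df['Col2']])
                   .transform('any')         # does the group contain each kind of row?
                   .all(axis=1)              # True only if it contains both kinds
                   .map({True: 'Y', False: 'N'}))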
print(df)
  Col1 Col2 Col3  Val1  Val2 Pattern
0   A1   B1   c1   0.2  -0.3       Y
1   A1   B1   c2  -0.3   0.3       Y
2   A1   B1   c3   0.5   0.2       Y
3   A2   B2   c1  -0.3   0.1       Y
4   A2   B2   c2   0.7  -0.3       Y
5   A3   B3   c1  -0.3   0.3       N
6   A3   B3   c2  -0.2   0.3       N
7   A3   B3   c3   0.5   0.2       N
8   A3   B3   c4   0.8   0.7       N
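
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question; the helper name flag_sign_pattern and its keys/vals parameters are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Col1": ["A1", "A1", "A1", "A2", "A2", "A3", "A3", "A3", "A3"],
    "Col2": ["B1", "B1", "B1", "B2", "B2", "B3", "B3", "B3", "B3"],
    "Col3": ["c1", "c2", "c3", "c1", "c2", "c1", "c2", "c3", "c4"],
    "Val1": [0.2, -0.3, 0.5, -0.3, 0.7, -0.3, -0.2, 0.5, 0.8],
    "Val2": [-0.3, 0.3, 0.2, 0.1, -0.3, 0.3, 0.3, 0.2, 0.7],
})


def flag_sign_pattern(frame, keys=("Col1", "Col2"), vals=("Val1", "Val2")):
    """Hypothetical helper: 'Y' for rows whose group has both sign pairs."""
    signs = np.sign(frame[list(vals)])
    pos_neg = signs.eq([1, -1]).all(axis=1)   # (positive, negative) rows
    neg_pos = signs.eq([-1, 1]).all(axis=1)   # (negative, positive) rows
    by = [frame[k] for k in keys]
    # transform('any') repeats the per-group answer on every row of the group.
    has_both = (pos_neg.groupby(by).transform("any")
                & neg_pos.groupby(by).transform("any"))
    return has_both.map({True: "Y", False: "N"})


df["Pattern"] = flag_sign_pattern(df)
print(df)
```

Because everything is done with vectorised comparisons and a single pass of groupby/transform per mask, there is no Python-level loop over groups, which is probably what you want on a large frame.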

Answer 2

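You can group by the Col1 and Col2 columns, use np.sign to get the sign of each value in the Series, and then subtract the sign of Val2 from the sign of Val1. If the two numbers have opposite signs, the result is 2 or -2: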

out = (df.groupby(['Col1', 'Col2'])
       .apply(lambda g: 'Y'
              if {2, -2}.issubset(set(np.sign(g['Val1']).sub(np.sign(g['Val2'])).unique()))
              else 'N')
       .to_frame('Pattern').reset_index())

print(out)

  Col1 Col2 Pattern
0   A1   B1       Y
1   A2   B2       Y
2   A3   B3       N

Finally, merge the result back into the original DataFrame:

df['Pattern'] = df.merge(out, on=['Col1', 'Col2'], how='left')['Pattern']

print(df)

  Col1 Col2 Col3  Val1  Val2 Pattern
0   A1   B1   c1   0.2  -0.3       Y
1   A1   B1   c2  -0.3   0.3       Y
2   A1   B1   c3   0.5   0.2       Y
3   A2   B2   c1  -0.3   0.1       Y
4   A2   B2   c2   0.7  -0.3       Y
5   A3   B3   c1  -0.3   0.3       N
6   A3   B3   c2  -0.2   0.3       N
7   A3   B3   c3   0.5   0.2       N
8   A3   B3   c4   0.8   0.7       N
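
One thing to watch with the merge step: merging on columns returns a frame with a fresh 0..n-1 index, so assigning its 'Pattern' column back to df only lines up when df also has a default index. Below is a sketch of a variant that should keep df's own index by using DataFrame.join on the key columns. The small frame and its index values are made up for the demonstration.

```python
import numpy as np
import pandas as pd

# Made-up frame with a deliberately non-default index.
df = pd.DataFrame(
    {
        "Col1": ["A1", "A1", "A3", "A3"],
        "Col2": ["B1", "B1", "B3", "B3"],
        "Val1": [0.2, -0.3, -0.3, 0.8],
        "Val2": [-0.3, 0.3, 0.3, 0.7],
    },
    index=[10, 11, 12, 13],
)

keys = ["Col1", "Col2"]

# 2 means (positive, negative), -2 means (negative, positive).
diff = np.sign(df["Val1"]) - np.sign(df["Val2"])

# One value per group: are both 2 and -2 present among the differences?
per_group = (diff.groupby([df[k] for k in keys])
                 .agg(lambda s: {2, -2}.issubset(set(s))))
out = per_group.map({True: "Y", False: "N"}).rename("Pattern")

# join looks up each row's (Col1, Col2) in out's MultiIndex and keeps df's index.
result = df.join(out, on=keys)
print(result)
```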

Answer 3

You can compute boolean masks, combine them per group, and then combine the results row-wise:

m1 = df['Val1'].lt(0)  # Val1 negative
m2 = df['Val2'].lt(0)  # Val2 negative

mask = (pd.concat([m1&~m2,  # Val1 negative and Val2 positive
                   ~m1&m2], # Val1 positive and Val2 negative
                  axis=1)
          .groupby([df['Col1'], df['Col2']])
          .transform('any')  # is there at least one match per group?
          .all(axis=1)       # are both kinds of match present in the group?
        )

df['Pattern'] = np.where(mask, 'Y', 'N')

Output:

  Col1 Col2 Col3  Val1  Val2 Pattern
0   A1   B1   c1   0.2  -0.3       Y
1   A1   B1   c2  -0.3   0.3       Y
2   A1   B1   c3   0.5   0.2       Y
3   A2   B2   c1  -0.3   0.1       Y
4   A2   B2   c2   0.7  -0.3       Y
5   A3   B3   c1  -0.3   0.3       N
6   A3   B3   c2  -0.2   0.3       N
7   A3   B3   c3   0.5   0.2       N
8   A3   B3   c4   0.8   0.7       N

Intermediates:

  Col1 Col2 Col3  Val1  Val2 Pattern  m1&~m2  ~m1&m2  any(m1&~m2)  any(~m1&m2)   mask
0   A1   B1   c1   0.2  -0.3       Y   False    True         True         True   True
1   A1   B1   c2  -0.3   0.3       Y    True   False         True         True   True
2   A1   B1   c3   0.5   0.2       Y   False   False         True         True   True
3   A2   B2   c1  -0.3   0.1       Y    True   False         True         True   True
4   A2   B2   c2   0.7  -0.3       Y   False    True         True         True   True
5   A3   B3   c1  -0.3   0.3       N    True   False         True        False  False
6   A3   B3   c2  -0.2   0.3       N    True   False         True        False  False
7   A3   B3   c3   0.5   0.2       N   False   False         True        False  False
8   A3   B3   c4   0.8   0.7       N   False   False         True        False  False
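
Note that ~m1 and ~m2 mean "not negative", so a value of exactly 0 is treated like a positive one here. The question does not say how zeros should be handled. If they should count as neither sign, a stricter variant might look like the sketch below; the small frame containing a zero is made up to show the difference.

```python
import numpy as np
import pandas as pd

# Made-up data: group A2-B2 contains a row with Val2 exactly 0.
df = pd.DataFrame({
    "Col1": ["A1", "A1", "A2", "A2"],
    "Col2": ["B1", "B1", "B2", "B2"],
    "Val1": [0.2, -0.3, -0.3, 0.7],
    "Val2": [-0.3, 0.3, 0.0, -0.3],
})
keys = [df["Col1"], df["Col2"]]


def both_present(a, b):
    """True on rows whose group has at least one True in a and one in b."""
    return (pd.concat([a, b], axis=1)
              .groupby(keys)
              .transform("any")
              .all(axis=1))


# Lenient masks, as in the answer: zero counts as "not negative".
m1 = df["Val1"].lt(0)
m2 = df["Val2"].lt(0)
lenient = both_present(m1 & ~m2, ~m1 & m2)

# Strict masks (assumption): zero is neither positive nor negative.
pos_neg = df["Val1"].gt(0) & df["Val2"].lt(0)
neg_pos = df["Val1"].lt(0) & df["Val2"].gt(0)
strict = both_present(pos_neg, neg_pos)

df["Pattern_lenient"] = np.where(lenient, "Y", "N")
df["Pattern_strict"] = np.where(strict, "Y", "N")
print(df)
```

With this data the two versions disagree only on A2-B2, where the (-0.3, 0.0) row is the only candidate for a (negative, positive) row.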
