Pandas melt with n columns and order control (counter)

Question

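I have a wide-format dataset in which the x and y coordinates for each row sit in separate columns. The example below has only 4 coordinate pairs, but the real dataset has dozens. One column (n in the example below) holds the number of coordinate pairs in the row.

How can I melt this dataframe with so many xn, yn columns? Can I do it without explicitly listing every column ('x1', 'y1', 'x2', 'y2', 'x3', 'y3', 'x4', 'y4' ... 'xn', 'yn')? I need to keep track of the order, so that the (x1, y1) pair gets counter 1, (x2, y2) gets counter 2, and so on.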

import numpy as np
import pandas as pd

idx = [1, 2, 3]
colA = [10, 5, 12]
n = [3, 2, 4]
x1 = [0, 1, 7]
y1 = [4, 0, 4]
x2 = [3, 2, 8]
y2 = [5, 1, 5]
x3 = [4, np.nan, 10]
y3 = [3, np.nan, 3]
x4 = [np.nan, np.nan, 11]
y4 = [np.nan, np.nan, 3]

df = pd.DataFrame(list(zip(idx, colA, n, 
                           x1, y1, x2, y2, x3, y3, x4, y4
                          )), 
                  columns =['idx', 'colA', 'n', 
                            'x1', 'y1', 'x2', 'y2', 
                            'x3', 'y3', 'x4', 'y4'
                           ])
display(df)

idx	colA	n	x1	y1	x2	y2	x3	y3	x4	y4
1	10	3	0	4	3	5	4.0	3.0	NaN	NaN
2	5	2	1	0	2	1	NaN	NaN	NaN	NaN
3	12	4	7	4	8	5	10.0	3.0	11.0	3.0

Expected output

idx	colA	counter	x	y
1	10	1	0	4
1	10	2	3	5
1	10	3	4	3
2	5	1	1	0
2	5	2	2	1
3	12	1	7	4
3	12	2	8	5
3	12	3	10	3
3	12	4	11	3

Answer 1

Let's try wide_to_long:

out = pd.wide_to_long(df, ['x', 'y'], i=['idx', 'colA', 'n'], j='cnt').dropna().reset_index()
out
Out[8]: 
   idx  colA  n  cnt     x    y
0   1    10  3     1   0.0  4.0
1   1    10  3     2   3.0  5.0
2   1    10  3     3   4.0  3.0
3   2     5  2     1   1.0  0.0
4   2     5  2     2   2.0  1.0
5   3    12  4     1   7.0  4.0
6   3    12  4     2   8.0  5.0
7   3    12  4     3  10.0  3.0
8   3    12  4     4  11.0  3.0
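
To match the expected output from the question exactly, the result can be tidied a little further. The following is a minimal sketch, assuming the same sample frame as in the question. Renaming cnt to counter, dropping n, the integer casts, and the check against n are additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    'idx': [1, 2, 3],
    'colA': [10, 5, 12],
    'n': [3, 2, 4],
    'x1': [0, 1, 7], 'y1': [4, 0, 4],
    'x2': [3, 2, 8], 'y2': [5, 1, 5],
    'x3': [4, np.nan, 10], 'y3': [3, np.nan, 3],
    'x4': [np.nan, np.nan, 11], 'y4': [np.nan, np.nan, 3],
})

out = (
    pd.wide_to_long(df, ['x', 'y'], i=['idx', 'colA', 'n'], j='cnt')
    .dropna()                       # drop the padding pairs that are NaN
    .reset_index()
    .sort_values(['idx', 'cnt'])    # make the row order explicit
    .reset_index(drop=True)
)

# Sanity check (an added assumption): each idx keeps exactly n pairs.
assert (out.groupby('idx')['cnt'].count() == df.set_index('idx')['n']).all()

out = (
    out.drop(columns='n')
    .rename(columns={'cnt': 'counter'})
    .astype({'x': int, 'y': int})   # safe once the NaN rows are gone
)
print(out)
```

The stub names 'x' and 'y' are the only column information passed in, so the same call should cover dozens of pairs without listing them, because wide_to_long matches numeric suffixes by default.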

Answer 2

wide_to_long solves this easily; pivot_longer from pyjanitor is another option:

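# pip install pyjanitor
import janitor
import pandas as pd
# (\d+) rather than (.) for the second group, so that two-digit
# suffixes such as x10 are captured whole
df.pivot_longer(index = ['idx', 'colA', 'n'], 
                names_to = (".value", "counter"), 
                names_pattern=r"(.)(\d+)", 
                sort_by_appearance = True).dropna()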
 
    idx  colA  n counter     x    y
0     1    10  3       1   0.0  4.0
1     1    10  3       2   3.0  5.0
2     1    10  3       3   4.0  3.0
4     2     5  2       1   1.0  0.0
5     2     5  2       2   2.0  1.0
8     3    12  4       1   7.0  4.0
9     3    12  4       2   8.0  5.0
10    3    12  4       3  10.0  3.0
11    3    12  4       4  11.0  3.0

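The .value entry in names_to means that the matching part of each column name is kept as a column header (here x and y), while the remaining part of the name goes into the counter column. names_pattern is a regular expression with capture groups; each group says which part of a column name goes to which entry in names_to.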
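
How a two-group pattern splits the column names can be checked with Python's re module alone. This sketch is only an illustration and does not call pyjanitor. The names x10 and y10 are hypothetical, added to show what happens beyond nine pairs: a second group that matches a single character, as in (.)(.), captures only the first digit, while (\d+) captures the whole suffix.

```python
import re

# x10 and y10 are hypothetical columns, not part of the sample data.
columns = ['x1', 'y1', 'x2', 'y2', 'x10', 'y10']

single = re.compile(r"(.)(.)")    # one character, then one character
multi = re.compile(r"(.)(\d+)")   # one character, then all following digits

for name in columns:
    # First group -> the ".value" part, second group -> the counter part.
    print(name, single.match(name).groups(), multi.match(name).groups())
```

With a single-character second group, x1 and x10 would both yield the counter '1', which is why the digit-run form is the safer choice when the real data has dozens of pairs.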
