pandas DataFrame如何使用groupby()拆分合并数据

how does pandas DataFrame use groupby() to split and combine data

我有这样的数据:(实际 DataFrame 结构的较小版本)

week day val 
1   0   8      
1   1   9
1   2   6
1   3   3
1   4   4
1   5   2
1   6   6
1   7   9
2   0   3
2   1   1
2   2   2
2   3   6
2   4   8
2   5   9
2   6   6
2   7   3
3   0   4
3   1   2
3   2   6
3   3   7
3   4   4
3   5   2
3   6   5
3   7   7
1   0   1
1   1   2
1   2   6
1   3   8
1   4   9
1   5   1
1   6   7
1   7   4
2   0   2
2   1   1
2   2   2
2   3   6
2   4   8
2   5   9
2   6   1
2   7   7
3   0   4
3   1   2
3   2   8
3   3   9
3   4   7
3   5   9
3   6   3
3   7   7

而且,我想使用 "week" 和 "day" 作为组键。就像我在下面所做的一样:

data.loc[:,wd_val] = data.groupby([data['week'],data['day']]).mean()

我收到一个错误:

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long 
long'

所以,(1) "longlong" 是什么意思?

其次,我添加参数as_index:

data.loc[:,'wd_val']= 
data[['val']].groupby([data['week'],data['day']],as_index=False).mean()
data

但是,"wd_val" 的值是 NaN:

week    day val wd_val
0   1   0   8   NaN
1   1   1   9   NaN
2   1   2   6   NaN
3   1   3   3   NaN

(2)为什么我写错了?

第三,我使用下面的代码得到了一个数据框:

temp = data[['val']].groupby([data['week'],data['day']]).mean()
temp

            val
week  day   
 1     1    5.5
       2    6.0
       3    5.5
       4    6.5
       5    1.5
       6    6.5
       7    6.5
  2    0    2.5
       1    1.0
       2    2.0

and ,我想将索引("week" 和 "day")切换到 DataFrame 的列中。我该怎么做?

输入:

data = pd.DataFrame([
    [1,0,0],
    [1,0,1],
    [1,1,0],
    [1,1,1],
    [1,2,0],
    [2,2,1],
    [2,2,2],
    [2,2,2]], columns=['week','day','val'])

尝试:

pd.merge(data, data.groupby(['week','day']).mean(),
    on=['week', 'day'], 
    suffixes=('_orig', '_wk_mean'))

输出:

   week  day  val_orig  val_wk_mean
0     1    0         0     0.500000
1     1    0         1     0.500000
2     1    1         0     0.500000
3     1    1         1     0.500000
4     1    2         0     0.000000
5     2    2         1     1.666667
6     2    2         2     1.666667
7     2    2         2     1.666667

long long是数据类型

  • Python type long vs C 'long long'

类似的 sql 语句可能如下所示:

select A.week
    , A.day
    , A.val as val_orig
    , B.val_wk_mean from data as A
join (
    SELECT avg(val) as val_wk_mean
        , week
        , day
    from data
    group by week, day
) as B
on A.week=B.week
and A.day=B.day

另见:

IIUC,你觉得你需要用transform

df['wd_val'] = df.groupby(['week','day'])['val'].transform('mean')

输出:

    week  day  val  wd_val
0      1    0    8     4.5
1      1    1    9     5.5
2      1    2    6     6.0
3      1    3    3     5.5
4      1    4    4     6.5
5      1    5    2     1.5
6      1    6    6     6.5
7      1    7    9     6.5
8      2    0    3     2.5
9      2    1    1     1.0
10     2    2    2     2.0
11     2    3    6     6.0
12     2    4    8     8.0
13     2    5    9     9.0
14     2    6    6     3.5
15     2    7    3     5.0
16     3    0    4     4.0
17     3    1    2     2.0
18     3    2    6     7.0
19     3    3    7     8.0
20     3    4    4     5.5
21     3    5    2     5.5
22     3    6    5     4.0
23     3    7    7     7.0
24     1    0    1     4.5
25     1    1    2     5.5
26     1    2    6     6.0
27     1    3    8     5.5
28     1    4    9     6.5
29     1    5    1     1.5
30     1    6    7     6.5
31     1    7    4     6.5
32     2    0    2     2.5
33     2    1    1     1.0
34     2    2    2     2.0
35     2    3    6     6.0
36     2    4    8     8.0
37     2    5    9     9.0
38     2    6    1     3.5
39     2    7    7     5.0
40     3    0    4     4.0
41     3    1    2     2.0
42     3    2    8     7.0
43     3    3    9     8.0
44     3    4    7     5.5
45     3    5    9     5.5
46     3    6    3     4.0
47     3    7    7     7.0