How to use groupby() on a pandas DataFrame to split and combine data
I have data like this (a smaller version of the actual DataFrame structure):
week day val
1 0 8
1 1 9
1 2 6
1 3 3
1 4 4
1 5 2
1 6 6
1 7 9
2 0 3
2 1 1
2 2 2
2 3 6
2 4 8
2 5 9
2 6 6
2 7 3
3 0 4
3 1 2
3 2 6
3 3 7
3 4 4
3 5 2
3 6 5
3 7 7
1 0 1
1 1 2
1 2 6
1 3 8
1 4 9
1 5 1
1 6 7
1 7 4
2 0 2
2 1 1
2 2 2
2 3 6
2 4 8
2 5 9
2 6 1
2 7 7
3 0 4
3 1 2
3 2 8
3 3 9
3 4 7
3 5 9
3 6 3
3 7 7
I want to use "week" and "day" as the group keys, as in the code below:
data.loc[:,'wd_val'] = data.groupby([data['week'],data['day']]).mean()
I get an error:
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
So, (1) what does "long long" mean?
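A minimal sketch of why the first assignment cannot line up, assuming the 48-row sample above is rebuilt by hand (the names vals, weeks, days and grouped are only for illustration): the grouped mean has one row per (week, day) pair and a two-level MultiIndex, so it has neither the length nor the index of data.

```python
import pandas as pd

# Rebuild the 48-row sample from the question: weeks 1-3, days 0-7, repeated twice.
vals = [8, 9, 6, 3, 4, 2, 6, 9, 3, 1, 2, 6, 8, 9, 6, 3, 4, 2, 6, 7, 4, 2, 5, 7,
        1, 2, 6, 8, 9, 1, 7, 4, 2, 1, 2, 6, 8, 9, 1, 7, 4, 2, 8, 9, 7, 9, 3, 7]
weeks = [w for w in (1, 2, 3) for _ in range(8)] * 2
days = list(range(8)) * 6
data = pd.DataFrame({'week': weeks, 'day': days, 'val': vals})

# One row per (week, day) pair, indexed by a two-level MultiIndex.
grouped = data.groupby(['week', 'day'])['val'].mean()

print(len(data), len(grouped))        # 48 24
print(type(data.index).__name__)      # RangeIndex
print(type(grouped.index).__name__)   # MultiIndex
```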
Second, I added the as_index argument:
data.loc[:,'wd_val'] = data[['val']].groupby([data['week'],data['day']],as_index=False).mean()
data
但是,"wd_val" 的值是 NaN:
week day val wd_val
0 1 0 8 NaN
1 1 1 9 NaN
2 1 2 6 NaN
3 1 3 3 NaN
(2) What did I do wrong?
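A hedged reading of the NaN result: when a Series is assigned to a column, pandas aligns it on index labels rather than on the week/day values, so labels that do not match the frame's index become NaN. The toy frames below are hypothetical and only illustrate that alignment rule; they do not reproduce the exact as_index=False call from the question.

```python
import pandas as pd

df = pd.DataFrame({'val': [8, 9, 6]})            # index labels 0, 1, 2
means = pd.Series([4.5, 5.5], index=[10, 11])   # labels 10, 11: no overlap with df

# Assignment reindexes `means` to df.index; labels with no match become NaN.
df['wd_val'] = means
print(df['wd_val'].isna().all())                # True

# When the labels do match, the values land on the intended rows.
df['wd_val'] = pd.Series([4.5, 5.5, 6.0], index=[0, 1, 2])
print(df['wd_val'].tolist())                    # [4.5, 5.5, 6.0]
```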
Third, I got a DataFrame with the code below:
temp = data[['val']].groupby([data['week'],data['day']]).mean()
temp
          val
week day
1    1    5.5
     2    6.0
     3    5.5
     4    6.5
     5    1.5
     6    6.5
     7    6.5
2    0    2.5
     1    1.0
     2    2.0
I want to turn the index levels ("week" and "day") back into columns of the DataFrame. How can I do that?
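A minimal sketch of two common ways to get "week" and "day" as ordinary columns, using a small hypothetical four-row frame: reset_index() moves the index levels of an existing result back into columns, while as_index=False keeps them as columns from the start.

```python
import pandas as pd

data = pd.DataFrame({'week': [1, 1, 2, 2], 'day': [0, 0, 0, 1], 'val': [8, 1, 3, 2]})

temp = data[['val']].groupby([data['week'], data['day']]).mean()

# Option 1: move the (week, day) index levels back into ordinary columns.
flat = temp.reset_index()
print(flat.columns.tolist())    # ['week', 'day', 'val']

# Option 2: ask groupby not to put the keys into the index at all.
flat2 = data.groupby(['week', 'day'], as_index=False)['val'].mean()
print(flat2.columns.tolist())   # ['week', 'day', 'val']
```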
Input:
import pandas as pd
data = pd.DataFrame([
[1,0,0],
[1,0,1],
[1,1,0],
[1,1,1],
[1,2,0],
[2,2,1],
[2,2,2],
[2,2,2]], columns=['week','day','val'])
Try:
pd.merge(data, data.groupby(['week','day']).mean(),
on=['week', 'day'],
suffixes=('_orig', '_wk_mean'))
Output:
week day val_orig val_wk_mean
0 1 0 0 0.500000
1 1 0 1 0.500000
2 1 1 0 0.500000
3 1 1 1 0.500000
4 1 2 0 0.000000
5 2 2 1 1.666667
6 2 2 2 1.666667
7 2 2 2 1.666667
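long long is a data type: C's signed integer type, at least 64 bits wide. The error says a buffer of such integers was passed where an array of Python objects was expected.
See also: Python type long vs C 'long long'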
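A small check of that reading, assuming a NumPy/pandas install on a mainstream 64-bit platform: integer columns are stored as fixed-width integers rather than Python objects, and NumPy's longlong (the C long long) is 64 bits wide there.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'week': [1, 2], 'day': [0, 1], 'val': [8, 3]})

# Integer columns hold fixed-width machine integers, not Python objects.
print(data['week'].dtype)                    # int64 on typical 64-bit installs
print(data['week'].dtype == object)          # False

# NumPy's longlong maps to C's "long long".
print(np.dtype(np.longlong).itemsize * 8)    # 64 on mainstream platforms
```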
A similar SQL statement might look like this:
select A.week
, A.day
, A.val as val_orig
, B.val_wk_mean from data as A
join (
SELECT avg(val) as val_wk_mean
, week
, day
from data
group by week, day
) as B
on A.week=B.week
and A.day=B.day
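The SQL maps fairly directly onto pandas. A sketch, assuming the 8-row data frame from this answer: the derived table B becomes a grouped mean that is renamed and flattened with reset_index(), and the join becomes merge(..., on=['week', 'day']). The names b and result are illustrative, and how='left' is a choice made here (the SQL uses an inner join); since every group appears in both tables, the rows come out the same.

```python
import pandas as pd

data = pd.DataFrame([
    [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
    [1, 2, 0], [2, 2, 1], [2, 2, 2], [2, 2, 2]], columns=['week', 'day', 'val'])

# Derived table B: SELECT avg(val) AS val_wk_mean, week, day ... GROUP BY week, day
b = (data.groupby(['week', 'day'])['val'].mean()
         .rename('val_wk_mean')
         .reset_index())

# SELECT A.week, A.day, A.val AS val_orig, B.val_wk_mean FROM data AS A JOIN B ON ...
result = (data.rename(columns={'val': 'val_orig'})
              .merge(b, on=['week', 'day'], how='left'))
print(result)
```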
See also:
IIUC, I think you need transform:
df['wd_val'] = df.groupby(['week','day'])['val'].transform('mean')
Output:
week day val wd_val
0 1 0 8 4.5
1 1 1 9 5.5
2 1 2 6 6.0
3 1 3 3 5.5
4 1 4 4 6.5
5 1 5 2 1.5
6 1 6 6 6.5
7 1 7 9 6.5
8 2 0 3 2.5
9 2 1 1 1.0
10 2 2 2 2.0
11 2 3 6 6.0
12 2 4 8 8.0
13 2 5 9 9.0
14 2 6 6 3.5
15 2 7 3 5.0
16 3 0 4 4.0
17 3 1 2 2.0
18 3 2 6 7.0
19 3 3 7 8.0
20 3 4 4 5.5
21 3 5 2 5.5
22 3 6 5 4.0
23 3 7 7 7.0
24 1 0 1 4.5
25 1 1 2 5.5
26 1 2 6 6.0
27 1 3 8 5.5
28 1 4 9 6.5
29 1 5 1 1.5
30 1 6 7 6.5
31 1 7 4 6.5
32 2 0 2 2.5
33 2 1 1 1.0
34 2 2 2 2.0
35 2 3 6 6.0
36 2 4 8 8.0
37 2 5 9 9.0
38 2 6 1 3.5
39 2 7 7 5.0
40 3 0 4 4.0
41 3 1 2 2.0
42 3 2 8 7.0
43 3 3 9 8.0
44 3 4 7 5.5
45 3 5 9 5.5
46 3 6 3 4.0
47 3 7 7 7.0
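transform('mean') returns a result aligned to the original rows (same index, same length), which is why it can be assigned straight into a new column. A sketch on the 8-row sample from the merge answer (named df here to match this answer), with a merge-based version alongside for comparison; the names means and merged are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
    [1, 2, 0], [2, 2, 1], [2, 2, 2], [2, 2, 2]], columns=['week', 'day', 'val'])

# Each row receives the mean of its own (week, day) group; df's index is kept.
df['wd_val'] = df.groupby(['week', 'day'])['val'].transform('mean')

# The same numbers via an explicit aggregate-then-merge, for comparison.
means = df.groupby(['week', 'day'])['val'].mean().rename('m').reset_index()
merged = df.merge(means, on=['week', 'day'], how='left')
print(np.allclose(merged['m'], merged['wd_val']))   # True
print(df)
```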