Syntax issue with Python groupby using a variable mixed with a list or string
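I am trying to run a groupby that uses a combination of a variable and a string as the grouping fields/columns. Can someone help me with the syntax? This is one of those things that could take me a day to figure out.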
Mix ='business_unit','isoname','planning_channel','is_tracked','planning_partner','week'
So the following works:
dfJoinsP2 = dfJoinsP2.groupby(Mix)['joined_subs_cmap', 'initial_billed_subs', 'billed_d1', 'churn_d1' , 'churn_24h'].sum().reset_index()
But when I try to add an extra field called 'Period_Number', I get an error:
dfJoinsP2 = dfJoinsP2.groupby(Mix,'Period_Number')['joined_subs_cmap', 'initial_billed_subs', 'billed_d1', 'churn_d1' , 'churn_24h'].sum().reset_index()
Just to reproduce and illustrate your problem:
In [22]:
import numpy as np
import pandas as pd
# define our cols, create a dummy df
cols = ['business_unit','isoname','planning_channel','is_tracked','planning_partner','week','joined_subs_cmap', 'initial_billed_subs', 'billed_d1', 'churn_d1' , 'churn_24h', 'Period Number']
df = pd.DataFrame(columns=cols, data =np.random.randn(5, len(cols)))
df
Out[22]:
business_unit isoname planning_channel is_tracked planning_partner \
0 -0.818644 1.150678 -0.860677 -0.333496 -0.292689
1 0.476575 -0.018507 -1.917119 0.360656 0.381106
2 1.187570 1.105363 1.955066 0.154020 1.996389
3 0.318762 0.962469 0.565538 0.671002 -0.675688
4 -0.070671 -1.717793 -0.085815 0.089589 0.892412
week joined_subs_cmap initial_billed_subs billed_d1 churn_d1 \
0 -0.681875 1.138119 -1.071672 0.409712 -1.066456
1 -0.235040 0.559950 0.082890 -0.372671 0.804438
2 1.707340 0.893437 0.316266 1.852508 -2.554488
3 -2.055322 1.848388 -1.695563 -0.826089 -0.588229
4 -0.325098 0.827455 0.535827 -0.930963 0.211628
churn_24h Period Number
0 1.067530 0.377579
1 0.097042 -1.947681
2 -0.327243 -1.137146
3 0.230110 1.470183
4 1.191042 2.167251
In [23]:
# what you are trying to do
Mix ='business_unit','isoname','planning_channel','is_tracked','planning_partner','week'
df.groupby(Mix, 'Period Number')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-23-dc75b3902303> in <module>()
1 Mix ='business_unit','isoname','planning_channel','is_tracked','planning_partner','week'
----> 2 df.groupby(Mix, 'Period Number')
C:\WinPython-64bit-3.3.3.2\python-3.3.3.amd64\lib\site-packages\pandas\core\generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze)
2894 if level is None and by is None:
2895 raise TypeError("You have to supply one of 'by' and 'level'")
-> 2896 axis = self._get_axis_number(axis)
2897 return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
2898 sort=sort, group_keys=group_keys, squeeze=squeeze)
C:\WinPython-64bit-3.3.3.2\python-3.3.3.amd64\lib\site-packages\pandas\core\generic.py in _get_axis_number(self, axis)
294 pass
295 raise ValueError('No axis named {0} for object type {1}'
--> 296 .format(axis, type(self)))
297
298 def _get_axis_name(self, axis):
ValueError: No axis named Period Number for object type <class 'pandas.core.frame.DataFrame'>
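So you get a ValueError because 'Period Number' is passed as the second positional argument and is interpreted as the axis value, which is of course invalid and not what you want.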
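To see why the second positional argument is not treated as another grouping key, you can inspect the method's parameters. This is only a sketch: the exact parameter list depends on the installed pandas version (in the version from the traceback above, the parameter right after by is axis), so no output is shown here.

```python
import inspect

import pandas as pd

# List the parameter names of DataFrame.groupby in the installed pandas.
# Only the first parameter after self, `by`, holds the grouping keys;
# whatever comes next positionally is a different option, not a column.
params = list(inspect.signature(pd.DataFrame.groupby).parameters)
print(params[:3])
```

Whatever the version, every grouping column has to go into the single by argument, as one list.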
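The other point here is that the way you define Mix produces a tuple. If it were a list instead, we could append the additional column of interest and everything would work: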
In [24]:
Mix = ['business_unit','isoname','planning_channel','is_tracked','planning_partner','week']
Mix.append('Period Number')
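# select the value columns with a list (double brackets); recent pandas versions reject a bare tuple of names here
df.groupby(Mix)[['joined_subs_cmap', 'initial_billed_subs', 'billed_d1', 'churn_d1' , 'churn_24h']].sum().reset_index()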
Out[24]:
business_unit isoname planning_channel is_tracked planning_partner \
0 -0.818644 1.150678 -0.860677 -0.333496 -0.292689
1 -0.070671 -1.717793 -0.085815 0.089589 0.892412
2 0.318762 0.962469 0.565538 0.671002 -0.675688
3 0.476575 -0.018507 -1.917119 0.360656 0.381106
4 1.187570 1.105363 1.955066 0.154020 1.996389
week Period Number joined_subs_cmap initial_billed_subs billed_d1 \
0 -0.681875 0.377579 1.138119 -1.071672 0.409712
1 -0.325098 2.167251 0.827455 0.535827 -0.930963
2 -2.055322 1.470183 1.848388 -1.695563 -0.826089
3 -0.235040 -1.947681 0.559950 0.082890 -0.372671
4 1.707340 -1.137146 0.893437 0.316266 1.852508
churn_d1 churn_24h
0 -1.066456 1.067530
1 0.211628 1.191042
2 -0.588229 0.230110
3 0.804438 0.097042
4 -2.554488 -0.327243
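If you would rather keep Mix as a tuple, one alternative is to build a new list from it at the call site instead of appending in place. The following is a minimal sketch on a small made-up frame; the names mix, group_cols and value_cols and the sample data are assumptions for illustration, not part of the original question.

```python
import pandas as pd

# Small hypothetical frame with repeated keys so the grouping is visible.
df = pd.DataFrame({
    "business_unit": ["a", "a", "b", "b"],
    "week": [1, 1, 1, 2],
    "Period Number": [10, 10, 20, 20],
    "billed_d1": [1.0, 2.0, 3.0, 4.0],
    "churn_d1": [0.5, 0.5, 1.0, 2.0],
})

# Defined without brackets, so this is a tuple, as in the question.
mix = "business_unit", "week"

# Build a new list of grouping columns; the original tuple is left unchanged.
group_cols = list(mix) + ["Period Number"]
value_cols = ["billed_d1", "churn_d1"]

# Pass all grouping keys as one list and select the value columns with a list.
result = df.groupby(group_cols)[value_cols].sum().reset_index()
print(result)
```

Because list(mix) creates a copy, the same tuple can be reused for other groupings with different extra columns.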