Syntax issue with python groupby using a variable mixed with a list or string

Question

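I'm trying to run a groupby that uses a combination of a variable and a string as the grouping fields/columns. Can someone help me with the syntax? This is one of those things that could take me a day to figure out on my own.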

Mix ='business_unit','isoname','planning_channel','is_tracked','planning_partner','week'

So the following works:

dfJoinsP2 = dfJoinsP2.groupby(Mix)['joined_subs_cmap', 'initial_billed_subs', 'billed_d1', 'churn_d1' , 'churn_24h'].sum().reset_index()

But when I try to add an extra field called 'Period_Number', I get an error:

dfJoinsP2 = dfJoinsP2.groupby(Mix,'Period_Number')['joined_subs_cmap', 'initial_billed_subs', 'billed_d1', 'churn_d1' , 'churn_24h'].sum().reset_index()

Answer 1

Just to reproduce and illustrate your problem:

In [22]:
# imports, then define our cols and create a dummy df
import numpy as np
import pandas as pd
cols = ['business_unit','isoname','planning_channel','is_tracked','planning_partner','week','joined_subs_cmap', 'initial_billed_subs', 'billed_d1', 'churn_d1' , 'churn_24h', 'Period Number']
df = pd.DataFrame(columns=cols, data=np.random.randn(5, len(cols)))
df
Out[22]:
   business_unit   isoname  planning_channel  is_tracked  planning_partner  \
0      -0.818644  1.150678         -0.860677   -0.333496         -0.292689   
1       0.476575 -0.018507         -1.917119    0.360656          0.381106   
2       1.187570  1.105363          1.955066    0.154020          1.996389   
3       0.318762  0.962469          0.565538    0.671002         -0.675688   
4      -0.070671 -1.717793         -0.085815    0.089589          0.892412   

       week  joined_subs_cmap  initial_billed_subs  billed_d1  churn_d1  \
0 -0.681875          1.138119            -1.071672   0.409712 -1.066456   
1 -0.235040          0.559950             0.082890  -0.372671  0.804438   
2  1.707340          0.893437             0.316266   1.852508 -2.554488   
3 -2.055322          1.848388            -1.695563  -0.826089 -0.588229   
4 -0.325098          0.827455             0.535827  -0.930963  0.211628   

   churn_24h  Period Number  
0   1.067530       0.377579  
1   0.097042      -1.947681  
2  -0.327243      -1.137146  
3   0.230110       1.470183  
4   1.191042       2.167251  
In [23]:
# what you are trying to do
Mix ='business_unit','isoname','planning_channel','is_tracked','planning_partner','week'
df.groupby(Mix, 'Period Number')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-dc75b3902303> in <module>()
      1 Mix ='business_unit','isoname','planning_channel','is_tracked','planning_partner','week'
----> 2 df.groupby(Mix, 'Period Number')

C:\WinPython-64bit-3.3.3.2\python-3.3.3.amd64\lib\site-packages\pandas\core\generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze)
   2894         if level is None and by is None:
   2895             raise TypeError("You have to supply one of 'by' and 'level'")
-> 2896         axis = self._get_axis_number(axis)
   2897         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   2898                        sort=sort, group_keys=group_keys, squeeze=squeeze)

C:\WinPython-64bit-3.3.3.2\python-3.3.3.amd64\lib\site-packages\pandas\core\generic.py in _get_axis_number(self, axis)
    294                 pass
    295         raise ValueError('No axis named {0} for object type {1}'
--> 296                          .format(axis, type(self)))
    297 
    298     def _get_axis_name(self, axis):

ValueError: No axis named Period Number for object type <class 'pandas.core.frame.DataFrame'>

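So you get a ValueError because 'Period Number', passed as the second positional argument, is interpreted as the axis value, which is of course invalid and not what you intended.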
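To make the argument binding concrete, here is a minimal sketch in plain Python. The function `show_args` is a hypothetical stand-in that only mirrors the first two parameters (`by`, `axis`) of the `groupby` signature shown in the traceback; it is not pandas code.

```python
def show_args(by=None, axis=0):
    """Hypothetical stand-in: returns its arguments so the binding is visible."""
    return by, axis

# Defined without brackets, so Python builds a tuple
Mix = 'business_unit', 'isoname', 'planning_channel'

# Same call shape as groupby(Mix, 'Period Number')
by, axis = show_args(Mix, 'Period Number')

print(type(by).__name__)  # the whole tuple was bound to `by`
print(axis)               # the extra column name was bound to `axis`
```

The second positional argument never joins the grouping keys; it fills whichever parameter comes second in the signature.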

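The other point here is that the way you defined Mix produces a tuple. If it were a list instead, we could append the additional column of interest and everything would work: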

In [24]:

Mix = ['business_unit','isoname','planning_channel','is_tracked','planning_partner','week']
Mix.append('Period Number')
df.groupby(Mix)[['joined_subs_cmap', 'initial_billed_subs', 'billed_d1', 'churn_d1' , 'churn_24h']].sum().reset_index()
Out[24]:
   business_unit   isoname  planning_channel  is_tracked  planning_partner  \
0      -0.818644  1.150678         -0.860677   -0.333496         -0.292689   
1      -0.070671 -1.717793         -0.085815    0.089589          0.892412   
2       0.318762  0.962469          0.565538    0.671002         -0.675688   
3       0.476575 -0.018507         -1.917119    0.360656          0.381106   
4       1.187570  1.105363          1.955066    0.154020          1.996389   

       week  Period Number  joined_subs_cmap  initial_billed_subs  billed_d1  \
0 -0.681875       0.377579          1.138119            -1.071672   0.409712   
1 -0.325098       2.167251          0.827455             0.535827  -0.930963   
2 -2.055322       1.470183          1.848388            -1.695563  -0.826089   
3 -0.235040      -1.947681          0.559950             0.082890  -0.372671   
4  1.707340      -1.137146          0.893437             0.316266   1.852508   

   churn_d1  churn_24h  
0 -1.066456   1.067530  
1  0.211628   1.191042  
2 -0.588229   0.230110  
3  0.804438   0.097042  
4 -2.554488  -0.327243
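If you would rather keep the original tuple untouched, one possible variation is to build a new list of keys from it instead of appending in place. This is only a sketch under stated assumptions: the small frame, its values, and the names `group_keys` and `value_cols` are made up for illustration, and only a subset of your columns is used.

```python
import pandas as pd

# Made-up data for illustration only; column names follow the question
df = pd.DataFrame({
    'business_unit': ['A', 'A', 'B', 'B'],
    'week': [1, 1, 1, 2],
    'Period Number': [10, 10, 10, 20],
    'joined_subs_cmap': [1, 2, 3, 4],
    'billed_d1': [5, 6, 7, 8],
})

Mix = 'business_unit', 'week'  # a tuple, as in the question

# Convert the tuple to a list and add the extra column; Mix itself is unchanged
group_keys = list(Mix) + ['Period Number']

# Select the columns to aggregate with a list (double brackets)
value_cols = ['joined_subs_cmap', 'billed_d1']
result = df.groupby(group_keys)[value_cols].sum().reset_index()
print(result)
```

Passing a single list as the first argument keeps all grouping columns in `by`, so nothing spills into the second positional parameter.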

