Find mean of the grouped rows of pandas dataframe

Question

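I am at a very basic level in Python and I'm stuck on a problem; can someone help me? I have a large pandas dataframe, and I want to find rows and take their mean when the first column of those rows holds similar values (for example, an integer separated from another integer by '_').

I tried using .split to match the first number of the resulting list. It works for a single row, but when I iterate over the rows it throws an error. My dataframe looks like: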

import pandas as pd

d = {'ID' : pd.Series(['1_1', '2_1', '1_2', '2_2' ], index=['0','1','2', '3']),
     'one' : pd.Series([2.5, 2, 3.5, 2.5], index=['0','1', '2', '3']),
     'two' : pd.Series([1, 2, 3, 4], index=['0', '1', '2', '3'])}
df2 = pd.DataFrame(d)

Requirement:

The mean of the rows whose IDs share the same first part after splitting, e.g. the mean of 1_1 and 1_2, and the mean of 2_1 and 2_2.

Output:

   ID   one  two
0   1  3.00    2
1   2  2.25    3

Here is my code. Working version: ((df2.ix[0,0]).split('_'))[0]

Version that raises an error:

 for i in df2.iterrows():
                   df2[df2.columns[((df2.ix[0,0]).split('_'))[0] == ((df2.ix[0,0]).split('_'))[0]]]

Looking forward to an early reply. Thanks in advance.

Answer 1

You can create a new column containing only the first number of the ID column using [str methods](http://pandas.pydata.org/pandas-docs/stable/text.html#splitting-and-replacing-strings), and then use the `groupby` method:

df['groupedID'] = df.ID.str.split('_').str.get(0)

In [347]: df
Out[347]:
     ID  one  two groupedID
0  10_1  2.5    1        10
1   2_1  2.0    2         2
2  10_2  3.5    3        10
3   2_2  2.5    4         2
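
To see what the split step does on its own, here is a minimal sketch on a standalone Series holding the same IDs as the output above (the names `ids`, `parts` and `first` are only for illustration). `.str.split('_')` turns each string into a list of pieces, and `.str.get(0)` picks the first piece. Note that the result is still made of strings, not integers.

```python
import pandas as pd

# Same ID values as in the dataframe shown above
ids = pd.Series(['10_1', '2_1', '10_2', '2_2'])

# Each element becomes a list of the pieces around '_'
parts = ids.str.split('_')
print(parts.tolist())   # [['10', '1'], ['2', '1'], ['10', '2'], ['2', '2']]

# Take the first piece of every list; the values stay strings
first = parts.str.get(0)
print(first.tolist())   # ['10', '2', '10', '2']
```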

# Average only the numeric columns; the string ID column cannot be averaged
df1 = df.groupby('groupedID')[['one', 'two']].mean()

In [349]: df1
Out[349]:
            one  two
groupedID
10         3.00    2
2          2.25    3
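
For reference, the two steps can be put together as a self-contained sketch on the data from the question (IDs 1_1, 2_1, 1_2, 2_2 rather than the 10_x values shown above). One assumption here: the numeric columns `one` and `two` are selected explicitly, because recent pandas versions raise an error when asked to average the string `ID` column, whereas older versions silently dropped it.

```python
import pandas as pd

# Data from the question, built with a default integer index
df = pd.DataFrame({
    'ID': ['1_1', '2_1', '1_2', '2_2'],
    'one': [2.5, 2.0, 3.5, 2.5],
    'two': [1, 2, 3, 4],
})

# Helper column holding the part of ID before the underscore
df['groupedID'] = df['ID'].str.split('_').str.get(0)

# Mean of the numeric columns within each group
df1 = df.groupby('groupedID')[['one', 'two']].mean()
print(df1)
```

Group '1' averages rows 1_1 and 1_2, and group '2' averages rows 2_1 and 2_2, which is the grouping asked for in the question.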

If you need to change the index name back to 'ID':

df1.index.name = 'ID'

In [351]: df1
Out[351]:
     one  two
ID
10  3.00    2
2   2.25    3
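
If you would rather get `ID` back as a regular column with a fresh 0, 1, ... index, as in the output requested in the question, one possible variant is sketched below. It is an alternative, not part of the approach above: it groups by the derived key directly instead of adding a helper column, and it casts the key to `int` so that the groups sort numerically (2 before 10) rather than as strings. The names `key` and `result` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['10_1', '2_1', '10_2', '2_2'],
    'one': [2.5, 2.0, 3.5, 2.5],
    'two': [1, 2, 3, 4],
})

# Grouping key computed on the fly; the derived Series keeps the name 'ID',
# and the int cast makes the groups sort as numbers instead of strings
key = df['ID'].str.split('_').str.get(0).astype(int)

# Average the numeric columns per group, then turn the index into a column
result = df.groupby(key)[['one', 'two']].mean().reset_index()
print(result)
```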
