How to process data with a MultiIndex using pandas

Question

I have a Series like this:

I want to process this Series to get the maximum value p for each ip.

Expected result:

ip
192.168.1.1    22
192.168.1.2     4
192.168.1.3     3
192.168.1.4     4

Is there an easy way to do this?

Answer 1

What you're looking for is the pandas groupby method: s.groupby(level=0).max()

Example:

import numpy as np
import pandas as pd

iterables = [['192.168.1.1', '192.168.1.2', '192.168.1.3', '192.168.1.4'],
             ['123455', '123456', '123457']]
index = pd.MultiIndex.from_product(iterables, names=['ip', 'p'])
# 12 random integers in [0, 30), one per (ip, p) pair
s = pd.Series(np.random.randint(30, size=12), index=index)
s

Output:

ip           p
192.168.1.1  123455    18
             123456    20
             123457    12
192.168.1.2  123455    25
             123456     1
             123457     4
192.168.1.3  123455    28
             123456    19
             123457    22
192.168.1.4  123455    20
             123456    10
             123457    12

Then get the maximum value for each IP:

s.groupby(level=0).max()

Output:

ip
192.168.1.1    20
192.168.1.2    25
192.168.1.3    28
192.168.1.4    20

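Edit: changed s.groupby['ip'].max() to s.groupby(level=0).max(), because the former did not work in some of my tests (groupby is a method, so it has to be called with parentheses rather than indexed with square brackets).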
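Because the example above draws random numbers, its output changes on every run. The sketch below uses fixed, made-up values (not the asker's data) so the result can be checked. It also shows that the level can be selected either by position (level=0) or by its name (level="ip").

```python
import pandas as pd

# Hypothetical fixed values (not from the question) so the result is reproducible.
index = pd.MultiIndex.from_product(
    [["192.168.1.1", "192.168.1.2"], ["123455", "123456", "123457"]],
    names=["ip", "p"],
)
s = pd.Series([18, 20, 12, 25, 1, 4], index=index)

# Group on the first index level by position...
by_position = s.groupby(level=0).max()
# ...or by the level's name; both produce the same Series.
by_name = s.groupby(level="ip").max()

print(by_position)
```

Selecting the level by name keeps the code correct even if the order of the index levels changes later.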

Answer 2

If ip is in the index (as the first level), you should use this syntax:

s.groupby(level=0).max()

# ip
# 192.168.1.1    22
# 192.168.1.2     4
# 192.168.1.3     3
# 192.168.1.4     4
# Name: p, dtype: int64
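The syntax above assumes ip is an index level. If ip is instead an ordinary column (an assumption, since the question's data isn't shown), a minimal sketch would group by the column name and take the maximum of the p column. The column layout and values here are hypothetical.

```python
import pandas as pd

# Hypothetical data where ip and p are ordinary columns rather than index levels.
df = pd.DataFrame(
    {
        "ip": ["192.168.1.1", "192.168.1.1", "192.168.1.2", "192.168.1.2"],
        "p": [22, 7, 4, 1],
    }
)

# Group by the ip column and take the maximum p within each group.
result = df.groupby("ip")["p"].max()
print(result)
```

The result is a Series indexed by ip and named p, the same shape as the output shown above.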

Tags: python, machine-learning, feature-extraction, pandas