Comparing lists and getting indices in Python

Question

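I have a DataFrame A with columns ['name', 'frequency'] and a list B of 'name' values; both are very long. B is the smaller one, which I receive every day. I have to check whether each element of B exists in A['name']. If it does, I have to increase the frequency of that 'name' in the DataFrame once for every occurrence in B; if B contains new elements, I have to add them to DataFrame A as new rows with a frequency of 1. I have to do this in Python 2.7. Thanks. A is my mac_list, which looks like this: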

mac_list.iloc[0:6]
Out[59]: 
    mac_address  frequency
0  20c9d0892feb          2
1  28e34789c4c2          1
2  3480b3d51d5f          1
3  4480ebb4e28c          1
4  4c60de5dad72          1
5  4ca56dab4550          1

B is my new_mac_list, which looks like this:

['20c9d0892feb' '3480b3d51d5f' '20c9d0892feb' '249cji39fj4g']

I want mac_list to end up like this:

    mac_address  frequency
0  20c9d0892feb          4
1  28e34789c4c2          1
2  3480b3d51d5f          2
3  4480ebb4e28c          1
4  4c60de5dad72          1
5  4ca56dab4550          1
6  249cji39fj4g          1

I tried

b = mac_list['mac_address'].isin(new_mac_list)
b=list(b)
for i in range(len(b)):
    if b[i]==True:
        mac_list['frequency'].iloc[i]+=1

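to update the frequencies, but the problem is that the frequency is increased by only 1 even when the address appears more than once in new_mac_list.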

I used this to insert the new elements:

c = new_mac_list.isin(mac_list['mac_address'])
c=list(c)
for i in range(len(c)):
    if c[i]==False:
        mac_list.append(new_mac_list[i],1)

But this is a very inefficient approach, and I think it could be done with a single comparison.

Answer 1

Here is the initial DataFrame:

mac_list

    mac_address  frequency
0  20c9d0892feb          2
1  28e34789c4c2          1
2  3480b3d51d5f          1
3  4480ebb4e28c          1
4  4c60de5dad72          1
5  4ca56dab4550          1

And the new list:

new_mac_list = ['20c9d0892feb', '3480b3d51d5f', '20c9d0892feb', '249cji39fj4g']

I first set the index of mac_list to mac_address:

mac_list = mac_list.set_index("mac_address")

Then compute the frequencies in the new list:

import pandas as pd

new_freq = pd.Series(new_mac_list).value_counts()
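For reference, here is a small sketch of what `value_counts` yields for the sample list above. The order in which tied counts are listed is not something to rely on, so the result is converted to a dict before it is inspected.

```python
import pandas as pd

new_mac_list = ['20c9d0892feb', '3480b3d51d5f', '20c9d0892feb', '249cji39fj4g']

# value_counts returns a Series indexed by the distinct values,
# holding how many times each one occurs in the list.
new_freq = pd.Series(new_mac_list).value_counts()

# Convert to a plain dict so the result does not depend on row order.
counts = new_freq.to_dict()
print(counts)
```

The repeated address gets a count of 2, which is exactly the information that a plain `isin` check discards.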

Then you can use the add method on the Series; fill_value=0 treats an address that is missing from either side as 0:

res = mac_list["frequency"].add(new_freq, fill_value=0)

20c9d0892feb    4.0
249cji39fj4g    1.0
28e34789c4c2    1.0
3480b3d51d5f    2.0
4480ebb4e28c    1.0
4c60de5dad72    1.0
4ca56dab4550    1.0
dtype: float64
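The role of `fill_value=0` can be seen on a toy example. This is only an illustrative sketch; the labels `a`, `b`, `c` are made up and stand in for MAC addresses.

```python
import pandas as pd

old = pd.Series({"a": 2, "b": 1})   # existing frequencies
new = pd.Series({"a": 2, "c": 1})   # counts from the new list

# Plain addition aligns on the index and yields NaN wherever
# a label is present on only one side.
plain = old + new

# With fill_value=0 the missing side is treated as 0 instead.
filled = old.add(new, fill_value=0)

print(plain)
print(filled)
```

Because alignment introduces missing values internally, the result comes back as float64, which is why the frequencies above print as 4.0, 1.0, and so on.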

Back to a DataFrame (note that the frequencies are now floats and mac_address is the index):

mac_list = pd.DataFrame(res, columns = ["frequency"])
print(mac_list)

              frequency
20c9d0892feb        4.0
249cji39fj4g        1.0
28e34789c4c2        1.0
3480b3d51d5f        2.0
4480ebb4e28c        1.0
4c60de5dad72        1.0
4ca56dab4550        1.0
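If the exact layout from the question is wanted (integer counts, `mac_address` as an ordinary column, new addresses appended at the end), one possible way to finish is sketched below. It rebuilds the sample data so that it runs on its own; the names `order` and `updated` are introduced here for illustration, and it assumes the MAC addresses in `mac_list` are unique.

```python
import pandas as pd

mac_list = pd.DataFrame({
    "mac_address": ["20c9d0892feb", "28e34789c4c2", "3480b3d51d5f",
                    "4480ebb4e28c", "4c60de5dad72", "4ca56dab4550"],
    "frequency": [2, 1, 1, 1, 1, 1],
})
new_mac_list = ["20c9d0892feb", "3480b3d51d5f", "20c9d0892feb", "249cji39fj4g"]

# Same steps as above: count the new list, then add with alignment.
new_freq = pd.Series(new_mac_list).value_counts()
res = mac_list.set_index("mac_address")["frequency"].add(new_freq, fill_value=0)

# Original addresses first, then unseen ones in order of first appearance.
order = pd.Index(list(mac_list["mac_address"]) + new_mac_list).unique()

updated = (
    res.reindex(order)            # restore the desired row order
       .astype(int)               # counts back to integers
       .rename("frequency")       # name the value column
       .rename_axis("mac_address")
       .reset_index()             # mac_address becomes a column again
)
print(updated)
```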

Answer 2

Create an index

When efficiency matters, the first thing to think of is an index. I assume the MAC addresses are unique.

A = A.set_index("mac_address")

and access items with

A.loc[i]

Iterating over B is then of secondary importance.
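This answer does not spell out the loop, so the following is only one possible reading of it, assuming unique MAC addresses and using the sample data from the question. Each element of B is looked up through the index; known addresses are incremented and unknown ones are added with a frequency of 1.

```python
import pandas as pd

A = pd.DataFrame({
    "mac_address": ["20c9d0892feb", "28e34789c4c2", "3480b3d51d5f",
                    "4480ebb4e28c", "4c60de5dad72", "4ca56dab4550"],
    "frequency": [2, 1, 1, 1, 1, 1],
}).set_index("mac_address")
B = ["20c9d0892feb", "3480b3d51d5f", "20c9d0892feb", "249cji39fj4g"]

for mac in B:
    if mac in A.index:
        # Known address: bump its counter once per occurrence in B.
        A.loc[mac, "frequency"] += 1
    else:
        # Unknown address: .loc with a new label appends a row.
        A.loc[mac, "frequency"] = 1

# Appending rows may upcast the column to float, so cast back.
A["frequency"] = A["frequency"].astype(int)
print(A)
```

For a long B, the vectorized `value_counts` plus `add` approach from the first answer will usually be faster than a Python-level loop, since appending rows one at a time is comparatively expensive.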
