Error while trying to define a function to create price buckets for apps

I have a CSV dataset that I imported in Jupyter and stored as inp0. I am trying to create price buckets for it using the pandas .loc indexer, but I get the error below.

My code:

inp0.loc[inp0.price==0.00, 'Price_Bucket'] = 'Free App'
inp0.loc[[inp0.price>0.00 and inp0.price<3.00],'Price_Bucket'] = 'Apps that cost <3'
inp0.loc[[inp0.price>=3.00 and inp0.price<5.00],'Price_Bucket'] = 'Apps that cost <5'
inp0.loc[inp0.price>=5.00,'Price_Bucket'] = 'Apps that cost >=5'
inp0.price_bucket.value_counts()

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I solve this?

Try np.where, which works like an if/else applied element-wise to columns/vectors:

import numpy as np
inp0['Price_Bucket'] = np.where(inp0['price']==0.00, 'Free App', np.where(inp0['price']<3.00, 'Apps that cost <3', np.where(inp0['price']<5.00, 'Apps that cost <5', 'Apps that cost >=5')))
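For context, the ValueError in the question comes from Python's `and`, which tries to reduce each boolean Series to a single True/False. Below is a minimal sketch of the original .loc approach with the masks combined using `&` and parentheses instead, checked against the nested np.where. The sample prices are made up for illustration and are not from the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's CSV
inp0 = pd.DataFrame({"price": [0.00, 0.99, 2.99, 3.00, 4.99, 5.00, 19.99]})

# Element-wise AND is `&`; each comparison needs its own parentheses
# because `&` binds more tightly than `>` or `<`.
inp0.loc[inp0.price == 0.00, "Price_Bucket"] = "Free App"
inp0.loc[(inp0.price > 0.00) & (inp0.price < 3.00), "Price_Bucket"] = "Apps that cost <3"
inp0.loc[(inp0.price >= 3.00) & (inp0.price < 5.00), "Price_Bucket"] = "Apps that cost <5"
inp0.loc[inp0.price >= 5.00, "Price_Bucket"] = "Apps that cost >=5"

# The same buckets computed with nested np.where, for comparison
via_where = np.where(
    inp0["price"] == 0.00, "Free App",
    np.where(inp0["price"] < 3.00, "Apps that cost <3",
             np.where(inp0["price"] < 5.00, "Apps that cost <5",
                      "Apps that cost >=5")))

# Column names are case-sensitive: use Price_Bucket, not price_bucket
print(inp0["Price_Bucket"].value_counts())
```

Both approaches should give the same labels as long as no price is negative or missing; the nested np.where would label a negative price as 'Apps that cost <3', while the .loc version would leave it as NaN.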

Instead of writing multiple if/else or np.where conditions, you can use the pandas cut function:

import pandas as pd
import numpy as np
import math

bins_defined = [0, 0.000001, 3, 5, math.inf]  # the tiny first bin [0, 0.000001) captures price == 0 as 'Free App'
labels_defined = ['Free App', 'Apps that cost <3', 'Apps that cost <5', 'Apps that cost >=5']

inp0['Price_Bucket'] = pd.cut(inp0['price'], bins=bins_defined, labels=labels_defined, right=False)

# `right` indicates whether each bin includes its right edge.
# If right=True (the default), the bins [1, 2, 3, 4] mean (1,2], (2,3], (3,4].
# With right=False, as used here, they mean [1,2), [2,3), [3,4).

For a better understanding, see the pandas.cut documentation.
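As a quick check of how these bin edges behave, here is a small sketch that runs the same pd.cut call on a handful of made-up prices (the sample values are an assumption for illustration, not the asker's data):

```python
import math
import pandas as pd

# Hypothetical prices, chosen to sit on and around the bin edges
prices = pd.Series([0.00, 0.99, 2.99, 3.00, 4.99, 5.00, 19.99], name="price")

bins_defined = [0, 0.000001, 3, 5, math.inf]
labels_defined = ['Free App', 'Apps that cost <3', 'Apps that cost <5', 'Apps that cost >=5']

# right=False makes every bin left-closed: [0, 1e-6), [1e-6, 3), [3, 5), [5, inf)
buckets = pd.cut(prices, bins=bins_defined, labels=labels_defined, right=False)

print(pd.DataFrame({"price": prices, "Price_Bucket": buckets}))
```

With these edges, a price of exactly 3.00 falls in 'Apps that cost <5' and exactly 5.00 in 'Apps that cost >=5', matching the conditions in the question. Values outside the bins, such as negative or missing prices, come back as NaN, and the result is a categorical column rather than plain strings.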