Python performance problems while classifying column values

Question

This question is closely related to my previous one.
Sorry for asking again!

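The code below runs and delivers the correct result, but it is again rather slow (4 minutes for 80K rows). I am having trouble using the pandas Series class to work with the concrete values. Can someone recommend how I can classify these columns instead?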

I could not find relevant information in the documentation:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html

Running code:

# p_test_SOLL_test_D10

for x in range (0,len(tableContent[6])):
    var = tableContent[6].loc[x, ('p_test_LAENGE')]
    if float(tableContent[6].loc[x, ('p_test_LAENGE')])>=100.0:
        tableContent[6].loc[x, ('p_test_LAENGE')]='yes'
    elif (float(tableContent[6].loc[x, ('p_test_LAENGE')]) <30.0 and float(tableContent[6].loc[x, ('p_test_LAENGE')]) >= 10):
        tableContent[6].loc[x, ('p_test_LAENGE')]='yes2'
    elif (float(tableContent[6].loc[x, ('p_test_LAENGE')]) <10.0 and float(tableContent[6].loc[x, ('p_test_LAENGE')]) >= 5):
        tableContent[6].loc[x, ('p_test_LAENGE')]='yes3'
    else:
        tableContent[6].loc[x, ('p_test_LAENGE')]='no'

print (tableContent[6]['p_test_LAENGE'])

Attempt with Series:

if tableContent[6]['p_test_LAENGE'].astype(float) >=100.0:
    tableContent[6]['p_test_LAENGE']='yes'
elif (tableContent[6]['p_test_LAENGE'].astype(float) <30.0 and tableContent[6]['p_test_LAENGE'].astype(float) >= 10):
    tableContent[6]['p_test_LAENGE']='yes1'
elif (tableContent[6]['p_test_LAENGE'].astype(float) <10.0 and tableContent[6]['p_test_LAENGE'].astype(float) >= 5):
    tableContent[6]['p_test_LAENGE']='yes2'
else:
    tableContent[6]['p_test_LAENGE']='no'


print (tableContent[6]['p_test_LAENGE'])

Answer 1

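I don't have your df to test with, so you will need to adapt the code below. It assumes that the minimum value in the column is greater than 10e-7 and the maximum is less than 10e7; values outside that range would come back as NaN.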

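import pandas as pd

bins = [10e-7, 5, 10, 30, 100, 10e7]  # bin edges; right=False below makes each bin [lower, upper)
labels = ['no', 'yes2', 'yes1', 'no', 'yes']
# ordered=False (pandas >= 1.1) is needed because the label 'no' appears twice
df['p_test_LAENGE_class'] = pd.cut(df['p_test_LAENGE'].astype(float), bins=bins, labels=labels, right=False, ordered=False)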
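
If pd.cut does not fit (for example because the duplicate 'no' label is awkward), the same thresholds can also be written with numpy.select, which is a different technique from the one above. This is only a minimal sketch: the small df is made-up sample data standing in for tableContent[6], the values are assumed to be stored as strings, and the labels are the ones from the loop in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for tableContent[6]
df = pd.DataFrame(
    {"p_test_LAENGE": ["120", "15", "7.5", "50", "2", "100", "30", "10", "5"]}
)

# Convert the whole column once instead of calling float() per row
length = df["p_test_LAENGE"].astype(float)

# Conditions are checked in order and the first match wins, like if/elif.
# Note the element-wise & operator: Python's `and` does not work on a Series.
conditions = [
    length >= 100.0,
    (length >= 10.0) & (length < 30.0),
    (length >= 5.0) & (length < 10.0),
]
choices = ["yes", "yes2", "yes3"]  # labels taken from the question's loop

# Rows matching none of the conditions get the default label
df["p_test_LAENGE_class"] = np.select(conditions, choices, default="no")
print(df)
```

Because every comparison runs on the whole column at once, there is no Python-level loop over the rows, and that loop is the likely cause of the 4 minute runtime.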

Hope this helps.

python

pandas

data-science-experience
