Is there a less computationally expensive method to replace names in a dataframe?

Question

I want to append "-PD" to the subset of names that meet a certain condition in a large dataframe (about 6 million rows).

I made a bool list of the names I want to change:

bool_list = df_geo_clean_tag['index'].isin(lessorequalto0_ids)

Then I loop over the dataframe and, wherever bool_list is true, replace the name in that row:

for row in range(len(df_geo_clean_tag)):
    if bool_list[row] == True:
        name = df_geo_clean_tag.iloc[row, 0]
        df_geo_clean_tag.iloc[row, 0] = name + '-PD'

But this takes a very long time (I have already waited more than an hour). Is there a less computationally expensive way to do this?

Answer 1

This should be much faster than the loop:

df_geo_clean_tag.loc[bool_list, df_geo_clean_tag.columns[0]] += '-PD'

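This code filters the rows of df_geo_clean_tag by bool_list, selects the first column, and appends '-PD' to the selected strings in a single vectorized assignment instead of one row at a time.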
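
To see the masked assignment on something small, here is a minimal sketch. The four-row frame, the column name "name", and the id values are made up for illustration; only df_geo_clean_tag, lessorequalto0_ids, bool_list and the 'index' column come from the question.

```python
import pandas as pd

# Hypothetical miniature stand-in for the 6-million-row frame.
# The "name" column and all values below are assumptions for illustration.
df_geo_clean_tag = pd.DataFrame({
    "name": ["alpha", "beta", "gamma", "delta"],
    "index": [10, 11, 12, 13],
})
lessorequalto0_ids = [11, 13]  # hypothetical ids whose names should be tagged

# Boolean Series: True for the rows whose 'index' value is in the id list.
bool_list = df_geo_clean_tag["index"].isin(lessorequalto0_ids)

# Append the suffix to the first column, only in the rows where the mask is True.
first_col = df_geo_clean_tag.columns[0]
df_geo_clean_tag.loc[bool_list, first_col] += "-PD"

print(df_geo_clean_tag)
```

Rows where bool_list is False are left untouched, so the result matches what the loop was meant to produce, without writing one cell at a time through .iloc.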

Tags: python, string, indexing, performance, list-comprehension