Python, Pandas - Issue applying function to a column in a dataframe to replace only certain items

Question

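I have a dictionary of city-name abbreviations that our system (for some reason) applies to the data (e.g. 'Kansas City' is abbreviated to 'Kansas CY', while Oklahoma City is spelled correctly).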

I'm having trouble applying my function to a column of the dataframe, although it works fine when I pass it a single string. Code sample below:

import re

def multiple_replace(text, mapping):
    # Build one regular expression that matches any of the dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, mapping.keys())))

    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda mo: mapping[mo.group(0)], text)

testDict = {"Kansas CY": "Kansas City"}

dfData['PREV_CITY'] = dfData['PREV_CITY'].apply(multiple_replace, mapping=testDict)

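When I add 'axis=1' to the last line, it errors out saying I supplied too many arguments. Without it, the code runs without error, but it makes no changes even to values that match the dictionary.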

Thanks in advance! -里斯
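A minimal, self-contained sketch of what the code above does, assuming a made-up three-row `dfData` (the real data is not shown in the question). It illustrates two points: `Series.apply` forwards extra keyword arguments such as `mapping=` straight to the function, and because `Series.apply` has no `axis` parameter (that belongs to `DataFrame.apply`), `axis=1` is forwarded to `multiple_replace` as well, which raises a `TypeError`. The match is also exact, case- and whitespace-sensitive, so a value like `"kansas cy "` is left unchanged; whether that is the cause in the real data is an assumption. The parameter name `mapping` is used instead of `dict` to avoid shadowing the built-in.

```python
import re

import pandas as pd


def multiple_replace(text, mapping):
    # One alternation regex over all (escaped) dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, mapping.keys())))
    # Replace every occurrence of a key with its dictionary value
    return regex.sub(lambda mo: mapping[mo.group(0)], text)


testDict = {"Kansas CY": "Kansas City"}

# Hypothetical sample data; the last value differs in case and has a trailing space
dfData = pd.DataFrame({"PREV_CITY": ["Kansas CY", "Oklahoma City", "kansas cy "]})

# Series.apply passes mapping=testDict through to multiple_replace
dfData["PREV_CITY"] = dfData["PREV_CITY"].apply(multiple_replace, mapping=testDict)
print(dfData["PREV_CITY"].tolist())  # ['Kansas City', 'Oklahoma City', 'kansas cy ']

# axis is not a Series.apply parameter, so it is forwarded to multiple_replace too
try:
    dfData["PREV_CITY"].apply(multiple_replace, mapping=testDict, axis=1)
except TypeError as exc:
    print("TypeError:", exc)
```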

Answer 1

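You can use map and pass a dict to replace exact matches against the dict keys with the dict values. As you may have case-sensitivity mismatches, I'd lowercase all the strings before matching. Note that map turns every value that is not a dict key into NaN, so use fillna to put the original value back for rows with no match:

dfData['PREV_CITY'] = dfData['PREV_CITY'].str.lower().map(testDict, na_action='ignore').fillna(dfData['PREV_CITY'])

This assumes the keys in your dict are also lowercase.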
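A small sketch of the map approach on hypothetical sample data, assuming a lowercase-keyed dictionary as the answer requires. It shows why `map` alone is not enough (unmatched values become NaN), how `fillna` restores them, and, for comparison, `Series.replace` with a dict, which also does exact whole-value replacement but leaves non-matching values untouched (it does no lowercasing, so case must match).

```python
import pandas as pd

# Hypothetical data; keys are lowercase because the column is lowercased before matching
testDict = {"kansas cy": "Kansas City"}
dfData = pd.DataFrame({"PREV_CITY": ["Kansas CY", "Oklahoma City", "KANSAS CY"]})

lowered = dfData["PREV_CITY"].str.lower()

# map() alone: any value that is not a dict key becomes NaN
mapped_only = lowered.map(testDict)

# fillna() puts the original value back wherever there was no match
fixed = lowered.map(testDict).fillna(dfData["PREV_CITY"])
print(fixed.tolist())  # ['Kansas City', 'Oklahoma City', 'Kansas City']

# Series.replace() with a dict: exact, case-sensitive whole-value replacement
replaced = dfData["PREV_CITY"].replace({"Kansas CY": "Kansas City"})
print(replaced.tolist())  # ['Kansas City', 'Oklahoma City', 'KANSAS CY']
```

The `fillna` variant handles case differences; `replace` is simpler when the abbreviations appear with consistent casing.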

python

pandas

data-munging