Mapping substrings from a DataFrame to return values as a new column

Question

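If I have a column of postal codes, I would like to be able to map a substring of each row to a particular region. I have considered using a dictionary: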

dict = { 'SW1': 'London','NE':'London','W1A':'Other','CT': 'Other'}

Postal Code  
SW1E 5Z
NE99 1AR
SW1
W1A 1ER
CT21 4JF

Desired table:

Postal Code   Region
SW1E 5Z       London
NE99 1AR      London
SW1           London
W1A 1ER       Other
CT21 4JF      Other

However, I don't know how to parse a substring of the column to create the Region column using Python (pandas). Please advise on the syntax.

Answer 1

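Use Series.str.extract with a pattern built from the dictionary keys, then map the extracted keys back through the dictionary to create the new column.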

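# Build an alternation pattern from the dictionary keys, e.g. '(SW1|NE|W1A|CT)'
pattern = '(' + '|'.join(mydict.keys()) + ')'
# Extract the first matching key from each postal code, then look up its region
df['Region'] = df['Postal Code'].str.extract(pattern, expand=False).map(mydict)
print(df)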

  Postal Code  Region
0     SW1E 5Z  London
1    NE99 1AR  London
2         SW1  London
3     W1A 1ER   Other
4    CT21 4JF   Other

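Note that I have renamed dict to mydict: dict is a Python built-in, and assigning to that name shadows it, so later calls to dict(...) in the same scope would no longer work as expected.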
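The snippet above assumes that df and mydict already exist. Below is a self-contained sketch of the same approach with three optional tweaks that are not part of the original answer: the keys are passed through re.escape in case one ever contains a regex metacharacter, the pattern is anchored with '^' so that a key only matches at the start of the postal code, and longer keys are tried first so that a hypothetical overlap such as 'SW' and 'SW1' would resolve to the longer one. The extra row 'XX1 1XX' is made up to show what happens when nothing matches.

```python
import re
import pandas as pd

mydict = {'SW1': 'London', 'NE': 'London', 'W1A': 'Other', 'CT': 'Other'}
df = pd.DataFrame(
    {'Postal Code': ['SW1E 5Z', 'NE99 1AR', 'SW1', 'W1A 1ER', 'CT21 4JF', 'XX1 1XX']}
)

# Try longer keys first: regex alternation takes the first alternative that matches.
keys = sorted(mydict, key=len, reverse=True)

# Escape each key and anchor the pattern at the start of the string.
pattern = '^(' + '|'.join(re.escape(k) for k in keys) + ')'

# Rows with no matching prefix get a missing value in the Region column.
df['Region'] = df['Postal Code'].str.extract(pattern, expand=False).map(mydict)
print(df)
```

Whether anchoring is wanted depends on the data. Without it, a key such as 'NE' would also match in the middle or at the end of a code.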

Answer 2

I think you can easily find the city with a lambda function:

dict_ = {'SW1': 'London','NE':'London','W1A':'Other','CT':'Other'}

firstpostal = 'SW1E'
secondpostal = 'abc'

findcountry = lambda postal: [dict_[i] for i in dict_.keys() if i in postal]


print(findcountry(firstpostal))
print(findcountry(secondpostal))

And the output:

['London']
[]

You can check whether the output list is empty to see if a city was found.
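The lambda above returns a list of every match, while the question asks for a single region per row. A possible variant, sketched here with a hypothetical helper named find_region, returns only the first match and falls back to a default when no key is found. Like the lambda, it uses a plain `in` test, so a key matches anywhere in the string, not only at the start.

```python
dict_ = {'SW1': 'London', 'NE': 'London', 'W1A': 'Other', 'CT': 'Other'}


def find_region(postal, mapping=dict_, default=None):
    """Return the region of the first key contained in `postal`, else `default`."""
    # next() stops at the first hit instead of building the full list of matches.
    return next(
        (region for key, region in mapping.items() if key in postal),
        default,
    )


codes = ['SW1E 5Z', 'NE99 1AR', 'SW1', 'W1A 1ER', 'CT21 4JF', 'abc']
regions = [find_region(code) for code in codes]
print(regions)
# ['London', 'London', 'London', 'Other', 'Other', None]
```

With a DataFrame, the same function can be applied to each row, for example df['Region'] = df['Postal Code'].map(find_region). This runs a Python-level loop, so it will likely be slower than the str.extract approach on large frames.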
