
pandas: replacing the last word in string column with values from list


import pandas as pd

df = pd.DataFrame({'data1': ['the weather is nice today', 'This is interesting', 'the weather is good'],
                   'data2': ['It is raining', 'The plant is greenery', 'the weather is sunnyday']})

my_list = ['sunny','green']

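I want to replace the last word in the data2 column with the matching word from my_list whenever that last word starts with a word in the list. Here is what I tried: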

for k in my_list:
    for val in df.data2:
        if val.split()[-1].startswith(k):
            print(val.replace(val.split()[-1], k))

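But when I print the results, their order follows the order of the list rather than the rows, and I don't know how to assign them back. The desired output is: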

     data1                      data2
0  the weather is nice today    It is raining
1  This is interesting          The plant is green
2  the weather is good          the weather is sunny


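import re

# Raw f-string so that \b is a regex word boundary, not a backspace character.
pat = re.compile(rf"\b({'|'.join(my_list)})\S+$")
# \1 keeps the matched prefix and drops the rest of the last word.
dfnew = df.assign(data2=df['data2'].str.replace(pat, r'\1', regex=True))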

>>> dfnew
                       data1                 data2
0  the weather is nice today         It is raining
1        This is interesting    The plant is green
2        the weather is good  the weather is sunny
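
One caveat with building the pattern by joining the list directly: a prefix containing regex metacharacters would be interpreted as regex syntax. The sketch below passes each prefix through re.escape first. The 'c++' prefix and the sample strings are made up for illustration and are not part of the question.

```python
import re
import pandas as pd

# 'c++' is a hypothetical prefix added only to show why escaping matters.
prefixes = ['sunny', 'green', 'c++']

# re.escape turns 'c++' into a literal match instead of invalid regex syntax.
pat = re.compile(rf"\b({'|'.join(map(re.escape, prefixes))})\S+$")

s = pd.Series(['I like c++17', 'the weather is sunnyday', 'It is raining'])
result = s.str.replace(pat, r'\1', regex=True)
print(result.tolist())
```

For plain alphabetic prefixes such as 'sunny' and 'green', escaping changes nothing, so it is safe to apply unconditionally.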

Pierre D's answer is very good. A regular expression with .str.replace looks like the right tool here.

If you want to replace a column of df with new values, you can use assign or simply =.
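
As a rough illustration of the difference between the two, the sketch below uses an upper-cased column as a stand-in for any transformed values. The names new_values and df_inplace are only for this example.

```python
import pandas as pd

df = pd.DataFrame({'data2': ['It is raining', 'the weather is sunnyday']})
new_values = df['data2'].str.upper()  # stand-in for any transformed column

# assign returns a new DataFrame and leaves df unchanged.
dfnew = df.assign(data2=new_values)

# Plain assignment overwrites the column on the frame it is applied to.
df_inplace = df.copy()
df_inplace['data2'] = new_values

print(df['data2'].tolist())
print(dfnew['data2'].tolist())
```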

You can use apply to run any function on each value of a column.


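def replace_last_word(val, prefixes=('sunny', 'green')):
    # Assumes val contains at least one space; rsplit separates the last word.
    rest, last_word = val.rsplit(' ', 1)
    # Return on the first matching prefix, so list order sets the priority.
    for prefix in prefixes:
        if last_word.startswith(prefix):
            return f"{rest} {prefix}"
    return val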

df['data2'] = df['data2'].apply(replace_last_word)

# or df = df.assign(data2=df['data2'].apply(replace_last_word))

Note that you have to decide how to handle prefixes that contain one another, such as ["sun", "sunny"]. This solution picks the first match.
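
If you would rather have the longest matching prefix win regardless of list order, one possible variant is to sort the prefixes by length before checking them. This is only a sketch: the function name replace_last_word_longest and the sample strings are made up for illustration, and it uses rpartition so that single-word strings are handled too.

```python
import pandas as pd

def replace_last_word_longest(val, prefixes=("sun", "sunny")):
    # Split off the last word; for a single-word string, rest and sep are empty.
    rest, sep, last_word = val.rpartition(" ")
    # Try longer prefixes first so that "sunny" wins over "sun".
    for prefix in sorted(prefixes, key=len, reverse=True):
        if last_word.startswith(prefix):
            return f"{rest}{sep}{prefix}"
    return val

s = pd.Series(["the weather is sunnyday", "It is raining", "sunset"])
result = s.apply(replace_last_word_longest)
print(result.tolist())
```

With the prefixes in the original order and no sorting, "sunnyday" would be shortened to "sun" instead of "sunny".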


df['data2'].str.rsplit(' ', n=1, expand=True) gives you:

                0         1
0           It is   raining
1    The plant is  greenery
2  the weather is  sunnyday
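
From that split, the replacement can be finished without apply. The sketch below is one way to do it, assuming every row contains at least two words (otherwise column 1 holds missing values). Unlike the first-match function, a later prefix in the list overrides an earlier one here if both match.

```python
import pandas as pd

df = pd.DataFrame({'data2': ['It is raining', 'The plant is greenery', 'the weather is sunnyday']})
my_list = ['sunny', 'green']

# Column 0 holds everything before the last word, column 1 the last word.
parts = df['data2'].str.rsplit(' ', n=1, expand=True)
rest, last = parts[0], parts[1]

# Overwrite the last word wherever it starts with one of the prefixes.
new_last = last.copy()
for prefix in my_list:
    mask = last.str.startswith(prefix)
    new_last[mask] = prefix

# Join the two pieces back into a single string column.
df['data2'] = rest + ' ' + new_last
print(df)
```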