Change the first part of a value in a DataFrame with another value

I am trying to replace the first part of a value in a DataFrame using a lambda function.

Objective:

This is the code I have written so far. I get the error "TypeError: string indices must be integers", which I think comes from the .loc call.

df['Resource_Name'] = df['Resource_Name'].apply(
                                                lambda x: x.split()[2] + ", " + x['Preferred_Name']
                                                if x['Preferred_Name'] == pd.isnull(df.loc[x, 'Preferred_Name'])
                                                else x['Resource_Name']
                                            )
df = df.drop(['Preferred_Name'], axis = 1)

Visual illustration:

Current display:

Desired display:

Thanks for your help!

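You can use .str.split() to split on the comma, take the first part, and then append the value of Preferred_Name wherever Preferred_Name is not an empty string:

# Keep the text before the first comma (the last name) and append the preferred name.
new_data = df['Resource_Name'].str.split(r',\s+').str[0] + ', ' + df['Preferred_Name']
# Overwrite only the rows that have a non-empty preferred name.
df.loc[df['Preferred_Name'] != '', 'Resource_Name'] = new_data
df = df.drop('Preferred_Name', axis=1)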

Output:

>>> df
      Resource_Name
0     Lewis, Calvin
1     Lewis, Calvin
2  Lewis, William C
3     Lewis, Calvin
4     Lewis, Calvin
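
As for the TypeError in the question: Series.apply passes each cell of the column to the lambda, so x is a plain string there, and x['Preferred_Name'] tries to index a str with a str. The sketch below reproduces this on made-up data (the two sample rows are hypothetical, not the asker's actual frame):

```python
import pandas as pd

# Hypothetical sample data, assumed only for illustration.
df = pd.DataFrame({
    "Resource_Name": ["Lewis, Calvin C", "Lewis, William C"],
    "Preferred_Name": ["Calvin", None],
})

# Series.apply hands the lambda one cell at a time, so x is a str, not a row.
seen_types = df["Resource_Name"].apply(lambda x: type(x).__name__).tolist()

try:
    # Indexing a str with a str key raises TypeError.
    df["Resource_Name"].apply(lambda x: x["Preferred_Name"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(seen_types)
print(error_message)
```

To read several columns inside the lambda, apply has to be called on the DataFrame with axis=1, or the logic can be expressed with vectorized string methods as in the answer above.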

Here is how I approached it:

import pandas as pd
import numpy as np

data = {
    'Resource_Name' : ['Will, Turner C', 'Will, William', 'Will, Williamson', 'Will, Will iam son'],
    'Prefered_Name' : [None, 'Bill', None, 'Billy']
}

df = pd.DataFrame(data)
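# Rows that actually have a preferred name (notna() handles both None and NaN).
condition_list = [df['Prefered_Name'].notna().to_numpy()]
# Last name (text before the comma) followed by the preferred name.
choice_list = [df['Resource_Name'].str.split(',').str[0] + ', ' + df['Prefered_Name']]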
df['Resource_Name'] = np.select(condition_list, choice_list, df['Resource_Name'])
df = df[['Resource_Name']]
df

Here is one solution:

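df['Resource_Name'] = df[['Resource_Name', 'Preferred_Name']].apply(
    # Access columns by label; NaN is truthy, so check for missing values explicitly.
    lambda x: f'{x["Resource_Name"].split(",")[0]}, {x["Preferred_Name"]}'
    if pd.notna(x['Preferred_Name']) and x['Preferred_Name'] else x['Resource_Name'],
    axis=1,
)
df.drop('Preferred_Name', axis=1, inplace=True)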

Output:

>>> df
      Resource_Name
0     Lewis, Calvin
1     Lewis, Calvin
2  Lewis, William C
3     Lewis, Calvin
4     Lewis, Calvin
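
Another option, using .str.extract():

has_name = df['Preferred_Name'].notna()  # rows that have a preferred name
# Capture everything up to and including the comma, then append the preferred name.
df.loc[has_name, 'Resource_Name'] = \
    df.loc[has_name, 'Resource_Name'].str.extract('(.*,)', expand=False) + ' ' + df['Preferred_Name']

df.drop('Preferred_Name', axis=1)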

      Resource_Name
0     Lewis, Calvin
1     Lewis, Calvin
2  Lewis, William C
3     Lewis, Calvin
4     Lewis, Calvin
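
The answers above differ in whether a missing preferred name is an empty string or NaN/None. A minimal self-contained sketch that treats both as missing might look like this; the sample rows and the has_preferred and last_name helper names are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical sample data covering a real name, a missing value, and an empty string.
df = pd.DataFrame({
    "Resource_Name": ["Lewis, Calvin C", "Lewis, William C", "Lewis, Cal"],
    "Preferred_Name": ["Calvin", None, ""],
})

preferred = df["Preferred_Name"]
# True only where a preferred name is present and non-empty.
has_preferred = preferred.notna() & (preferred != "")

# Text before the first comma, i.e. the last name.
last_name = df["Resource_Name"].str.split(",").str[0]

# Rewrite only the rows that have a usable preferred name.
df.loc[has_preferred, "Resource_Name"] = (
    last_name[has_preferred] + ", " + preferred[has_preferred]
)
df = df.drop(columns="Preferred_Name")
print(df)
```

Building the mask once and reusing it on both sides of the assignment keeps rows without a preferred name untouched, so no NaN can leak into Resource_Name.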