pandas: create a new column from existing column values

Question

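I want to create a new column based on the values in an existing column. If the existing column starts with 'abc' or 'def', set the new column to 'x'; otherwise, set it to 'y'.

The check should be case-insensitive.

I have something that looks like this: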

import pandas as pd

df = pd.DataFrame({'command': ['abc123', 'abcdef', 'hold',
                               'release', 'hold', 'abcxyz',
                               'kill', 'def123', 'hold'],
                   'name': ['fred', 'wilma', 'barney',
                            'fred', 'barney', 'betty',
                            'pebbles', 'dino', 'wilma'],
                   'date': ['2020-05', '2020-05', '2020-05',
                            '2020-06', '2020-06', '2020-06',
                            '2020-07', '2020-07', '2020-07']})

Which prints:

   command     date     name
0   abc123  2020-05     fred
1   abcdef  2020-05    wilma
2     hold  2020-05   barney
3  release  2020-06     fred
4     hold  2020-06   barney
5   abcxyz  2020-06    betty
6     kill  2020-07  pebbles
7   def123  2020-07     dino
8     hold  2020-07    wilma

I'd like something like this:

   command     date     name status
0   abc123  2020-05     fred      x
1   abcdef  2020-05    wilma      x
2     hold  2020-05   barney      y
3  release  2020-06     fred      y
4     hold  2020-06   barney      y
5   abcxyz  2020-06    betty      x
6     kill  2020-07  pebbles      y
7   def123  2020-07     dino      x
8     hold  2020-07    wilma      y

I can get it working with the following, but only when the value is an exact match:

def source(row):
    if row['command'] == 'abcdef':
        return 'x'
    else:
        return 'y'


# Apply the function above to each row (axis=1)
df['status'] = df.apply(source, axis=1)

However, the command value can be anything, so I can't hard-code a search for every possibility.

I can't figure out how to make this work with startswith.
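A minimal sketch that keeps the row-wise apply shape of the attempt above but uses Python's built-in str.startswith, which accepts a tuple of prefixes and returns True if any of them matches. Lower-casing first makes the check case-insensitive. The function name status_for, the PREFIXES constant, and the smaller sample frame are hypothetical, chosen only for illustration. Because apply with axis=1 runs Python code once per row, it is usually slower than vectorized string methods on large frames.

```python
import pandas as pd

# Hypothetical sample data, including a mixed-case command
df = pd.DataFrame({'command': ['abc123', 'ABCdef', 'hold', 'release', 'def123'],
                   'name': ['fred', 'wilma', 'barney', 'fred', 'dino']})

PREFIXES = ('abc', 'def')  # prefixes to look for, written in lower case


def status_for(row):
    # str() guards against non-string values such as NaN;
    # lower() makes the comparison case-insensitive;
    # startswith(tuple) is True if any of the prefixes matches
    if str(row['command']).lower().startswith(PREFIXES):
        return 'x'
    return 'y'


df['status'] = df.apply(status_for, axis=1)
print(df)
```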

Answer 1

Use Series.str.startswith together with np.where for the conditional column:

import numpy as np

m = df['command'].str.startswith(('abc', 'def'))
df['status'] = np.where(m, 'x', 'y')
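The question also asks for a case-insensitive check, which startswith on its own does not do. One way, sketched here, is to lower-case the column with Series.str.lower before calling str.startswith; passing na=False makes missing values count as non-matches instead of propagating NaN into the mask. The mixed-case sample data below, including the None entry, is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with mixed case and a missing value
df = pd.DataFrame({'command': ['abc123', 'ABCdef', 'hold', 'Def123', None, 'kill']})

# Lower-case first so 'ABCdef' and 'Def123' also match;
# na=False treats a missing command as "does not start with" -> 'y'
m = df['command'].str.lower().str.startswith(('abc', 'def'), na=False)
df['status'] = np.where(m, 'x', 'y')
print(df)
```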

Or slice the first 3 characters of each string and use Series.isin:

m = df['command'].str[:3]
m = m.isin(['abc', 'def'])

df['status'] = np.where(m, 'x', 'y')
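A third option, not part of the answer above, is a regular expression: Series.str.match only succeeds when the pattern matches at the start of each string, and case=False makes it case-insensitive, which covers both requirements in one call. The prefix list, the sample data, and the use of a non-capturing group are choices made for this sketch; re.escape is applied so that prefixes containing regex metacharacters are still matched literally.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data; 'xabc' contains 'abc' but does not start with it
df = pd.DataFrame({'command': ['abc123', 'ABCdef', 'hold', 'Def123', 'kill', 'xabc']})

prefixes = ['abc', 'def']
# Build '(?:abc|def)'; re.escape keeps any regex metacharacters literal
pattern = '(?:' + '|'.join(map(re.escape, prefixes)) + ')'

# str.match anchors at the start of the string; case=False ignores case
m = df['command'].str.match(pattern, case=False, na=False)
df['status'] = np.where(m, 'x', 'y')
print(df)
```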

   command     name     date status
0   abc123     fred  2020-05      x
1   abcdef    wilma  2020-05      x
2     hold   barney  2020-05      y
3  release     fred  2020-06      y
4     hold   barney  2020-06      y
5   abcxyz    betty  2020-06      x
6     kill  pebbles  2020-07      y
7   def123     dino  2020-07      x
8     hold    wilma  2020-07      y
