pandas: create a new column based on the values of multiple other columns

Question

I have the following pandas DataFrame my_df:

col_A       col_B  
-------------------
blue        medium 
red         small
yellow      big

I want to add a new column col_C based on the following conditions:

if col_A == 'blue', col_C = 'A_blue'
if col_B == 'big', col_C = 'B_big'

For all other cases, col_C = ''

To do this, I wrote the following:

def my_bad_data(row):
    if row['col_A'] == 'blue':
        return 'A_blue'
    elif row['col_B'] == 'big':
        return 'B_big'
    else:
        return ''

my_df['col_C'] = my_df.apply(lambda row: my_bad_data(row))

But I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8125)()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-20-3898742c4378> in <module>()
----> 1 my_df['col_C'] = my_df.apply(lambda row: my_bad_data(row))
      2 asset_df

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4161                     if reduce is None:
   4162                         reduce = True
-> 4163                     return self._apply_standard(f, axis, reduce=reduce)
   4164             else:
   4165                 return self._apply_broadcast(f, axis)

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4257             try:
   4258                 for i, v in enumerate(series_gen):
-> 4259                     results[i] = func(v)
   4260                     keys.append(v.name)
   4261             except Exception as e:

<ipython-input-20-3898742c4378> in <lambda>(row)
----> 1 asset_df['quality_flag'] = my_df.apply(lambda row: my_bad_data(row))
      2 my_df

<ipython-input-19-2a09810e2dd4> in my_bad_data(row)
      1 def bug_function(row):
----> 2     if row['col_A'] == 'blue':
      3         return 'A_blue'
      4     elif row['col_B'] == 'big':
      5         return 'B_big'

/usr/local/lib/python3.4/dist-packages/pandas/core/series.py in __getitem__(self, key)
    599         key = com._apply_if_callable(key, self)
    600         try:
--> 601             result = self.index.get_value(self, key)
    602 
    603             if not is_scalar(result):

/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_value(self, series, key)
   2167         try:
   2168             return self._engine.get_value(s, k,
-> 2169                                           tz=getattr(series.dtype, 'tz', None))
   2170         except KeyError as e1:
   2171             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3342)()

pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3045)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4094)()

KeyError: ('col_A', 'occurred at index id')

Any idea what I'm doing wrong here? Thanks!

Answer 1

Yes, I run into this fairly often. You want dataframe.apply(func, axis=1). See the documentation for the axis parameter:

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
    0 or ‘index’: apply function to each column
    1 or ‘columns’: apply function to each row
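
To spell out why the original call fails: with the default axis=0, apply passes each column to the function as a Series indexed by the row labels, so row['col_A'] looks for a row label named 'col_A' and raises the KeyError. Below is a minimal sketch of the corrected call. The DataFrame construction and the function name label_row are my own, for illustration only; the second half uses numpy.select, a vectorized alternative that is not part of the answer above but may be worth considering on larger frames.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's my_df.
my_df = pd.DataFrame({
    "col_A": ["blue", "red", "yellow"],
    "col_B": ["medium", "small", "big"],
})


def label_row(row):
    """Return the col_C label for one row (hypothetical helper name)."""
    if row["col_A"] == "blue":
        return "A_blue"
    elif row["col_B"] == "big":
        return "B_big"
    return ""


# axis=1 makes apply pass each row (indexed by column name) to the function.
my_df["col_C"] = my_df.apply(label_row, axis=1)

# Vectorized alternative: conditions are checked in order and the first
# match wins, mirroring the if/elif chain above.
conditions = [my_df["col_A"] == "blue", my_df["col_B"] == "big"]
choices = ["A_blue", "B_big"]
my_df["col_C_vectorized"] = np.select(conditions, choices, default="")

print(my_df["col_C"].tolist())  # ['A_blue', '', 'B_big']
```

Note that the lambda wrapper in the question is not needed; the function can be passed to apply directly.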

