Pandas: error when creating a new column using a function that takes one argument from another column
I have the following DataFrame df:
import pandas as pd

df = pd.DataFrame({'result': ['s17h10e7', 's5e3h2S105h90e15',
                              's17H10e7S5e3H2s105h90e15'],
                   'status': [102, 117, 205]})
result status
s17h10e7 102
s5e3h2S105h90e15 117
s17H10e7S5e3H2s105h90e15 205
I have a function called get_number_after_code that reads a string and returns the SUM of all numbers that immediately follow a user-defined code (e.g. a letter):
def get_number_after_code(string_to_read, code):
    # Positions of every occurrence of the code character (case-sensitive)
    code_indices = [i for i, char in enumerate(string_to_read) if char == code]
    list_of_int_values = []
    for idx in code_indices:
        # Collect the digits that immediately follow this occurrence
        temp_number = []
        for character in string_to_read[idx + 1:]:
            if not character.isdigit():
                break
            temp_number.append(character)
        # Skip codes not followed by a number (int('') would raise ValueError)
        if temp_number:
            list_of_int_values.append(int(''.join(temp_number)))
    return sum(list_of_int_values)
Examples:
get_number_after_code('s5e3h2s105h90e15', 'h')
>> 92
get_number_after_code('s5e3h2s105h90e15', 's')
>> 110
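As an aside, the same sum can be sketched with a regular expression. This is a minimal sketch, not the asker's code: the helper name sum_after_code_regex is hypothetical, and it assumes single-character, non-digit codes such as 'h', 's' or 'e'. A code with no digits after it simply contributes nothing.

```python
import re


def sum_after_code_regex(string_to_read: str, code: str) -> int:
    """Hypothetical regex-based variant: sum every run of digits that
    immediately follows `code`. Matching is case-sensitive."""
    # re.escape guards against codes that are regex metacharacters (e.g. '.')
    pattern = re.escape(code) + r"(\d+)"
    # findall returns only the captured digit runs, one per occurrence
    return sum(int(digits) for digits in re.findall(pattern, string_to_read))


print(sum_after_code_regex('s5e3h2s105h90e15', 'h'))  # 92  (2 + 90)
print(sum_after_code_regex('s5e3h2s105h90e15', 's'))  # 110 (5 + 105)
```

Because it takes a single string, it can be used with Series.apply exactly like get_number_after_code.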
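I want to add a column named col_NEW to the df DataFrame. The col_NEW column should hold the output of get_number_after_code() applied to each element of the result column. For example, suppose we use the code 'h' (it could just as well be 's' or 'e'). The output would be: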
result status col_NEW
s17h10e7 102 10
s5e3h2s105h90e15 117 92
s17h10e7s5e3h2s105h807e15 205 819
To do this, I use:
df['col_NEW'] = df.apply(get_number_after_code(df['result'], 'h'), axis=1)
I get this not very helpful AssertionError:
AssertionError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_21060/915445793.py in <module>
----> 1 df['col_NEW'] = df.apply(count_tests_new(df['result'], 's'), axis=1)
~\anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwargs)
8738 kwargs=kwargs,
8739 )
-> 8740 return op.apply()
8741
8742 def applymap(
~\anaconda3\lib\site-packages\pandas\core\apply.py in apply(self)
686 return self.apply_raw()
687
--> 688 return self.apply_standard()
689
690 def agg(self):
~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
810
811 def apply_standard(self):
--> 812 results, res_index = self.apply_series_generator()
813
814 # wrap results
~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
816
817 def apply_series_generator(self) -> tuple[ResType, Index]:
--> 818 assert callable(self.f)
819
820 series_gen = self.series_generator
AssertionError:
Am I using the .apply() syntax correctly to add col_NEW? If so, does anyone know what is causing the AssertionError?
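The expression get_number_after_code(df['result'], 'h') is evaluated once, before df.apply ever runs: the function receives the whole Series instead of a single string, and apply is then handed the function's return value (an integer) rather than a function. That is exactly what the failing assert callable(self.f) line in the traceback checks. Since you only need the "result" column, call apply on that column and pass the function itself; the letter (e.g. 'h') can be passed as a positional argument through args. See the docs: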
df['col_NEW'] = df['result'].apply(get_number_after_code, args=('h',))
Or by keyword:
df['col_NEW'] = df['result'].apply(get_number_after_code, code='h')
Output:
result status col_NEW
0 s17h10e7 102 10
1 s5e3h2S105h90e15 117 92
2 s17H10e7S5e3H2s105h90e15 205 90
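To see the failure directly, and for a row-wise version in case the function ever needs more than one column, here is a minimal sketch. It rebuilds the question's DataFrame and a condensed copy of get_number_after_code; the guard that skips codes not followed by digits is an assumption added here, and the lambda wrapper is one way of passing a callable to df.apply, not the only one.

```python
import pandas as pd


def get_number_after_code(string_to_read, code):
    """Sum every run of digits that immediately follows `code` (case-sensitive)."""
    total = 0
    for idx, char in enumerate(string_to_read):
        if char != code:
            continue
        digits = []
        for character in string_to_read[idx + 1:]:
            if not character.isdigit():
                break
            digits.append(character)
        if digits:  # assumption: a code with no digits after it adds 0
            total += int(''.join(digits))
    return total


df = pd.DataFrame({'result': ['s17h10e7', 's5e3h2S105h90e15',
                              's17H10e7S5e3H2s105h90e15'],
                   'status': [102, 117, 205]})

# What the original call handed to df.apply: the function already ran once
# on the whole Series (no element equals 'h'), so apply received an int.
eager = get_number_after_code(df['result'], 'h')
print(eager, callable(eager))  # 0 False

# Row-wise alternative: pass a callable; each row arrives as a Series.
df['col_NEW'] = df.apply(lambda row: get_number_after_code(row['result'], 'h'), axis=1)
print(df['col_NEW'].tolist())  # [10, 92, 90]
```

The column-wise df['result'].apply(...) form is simpler for this case; the axis=1 form only pays off when the function needs several fields of the same row.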