TypeError: got an unexpected keyword argument
TypeError: got an unexpected keyword argument
下面看似简单的代码抛出如下错误:
Traceback (most recent call last):
File "/home/nirmal/process.py", line 165, in <module>
'time_diff': f.last(adf['time_diff']).over(window_device_rows)
TypeError: __call__() got an unexpected keyword argument 'this_campaign'
代码:
# Function to flag network timeouts
def flag_network_timeout(**kwargs):
if kwargs['this_network'] != kwargs['last_network'] \
or kwargs['this_campaign'] != kwargs['last_campaign'] \
or kwargs['this_adgroup'] != kwargs['last_adgroup'] \
or kwargs['this_creative'] != kwargs['last_creative'] \
or kwargs['time_diff'] > network_timeout:
return 1
else:
return 0
flag_network_timeout = f.udf(flag_network_timeout, IntegerType())
# Column spec to go over the device events and flag network resets
network_timeout_flag = flag_network_timeout(**{
'last_network': f.first(adf['network']).over(window_device_rows),
'last_campaign': f.first(adf['campaign']).over(window_device_rows),
'last_adgroup': f.first(adf['adgroup']).over(window_device_rows),
'last_creative': f.first(adf['creative']).over(window_device_rows),
'this_network': f.last(adf['network']).over(window_device_rows),
'this_campaign': f.last(adf['campaign']).over(window_device_rows),
'this_adgroup': f.last(adf['adgroup']).over(window_device_rows),
'this_creative': f.last(adf['creative']).over(window_device_rows),
'time_diff': f.last(adf['time_diff']).over(window_device_rows)
})
# Update dataframe with the new columns
adf = adf.select('*', network_timeout_flag.alias('network_timeout'))
请问我做错了什么?谢谢。
你得到一个异常,因为 UserDefinedFunction.__call__
只支持可变参数而不支持关键字参数。
def __call__(self, *cols):
sc = SparkContext._active_spark_context
jc = self._judf.apply(_to_seq(sc, cols, _to_java_column))
return Column(jc)
在更基本的层面上,UDF 只能接收 Column
个参数,这些参数将在运行时扩展为相应的值,而不是标准的 Python 个对象。
就我个人而言,我根本不会为此使用 **kwargs
,但忽略了您可以通过编写 SQL 表达式来实现您想要的:
def flag_network_timeout_(**kwargs):
cond = (
(kwargs['this_network'] != kwargs['last_network']) |
(kwargs['this_campaign'] != kwargs['last_campaign']) |
(kwargs['this_adgroup'] != kwargs['last_adgroup']) |
(kwargs['this_creative'] != kwargs['last_creative']) |
(kwargs['time_diff'] > network_timeout))
return f.when(cond, 1).otherwise(0)
下面看似简单的代码抛出如下错误:
Traceback (most recent call last):
File "/home/nirmal/process.py", line 165, in <module>
'time_diff': f.last(adf['time_diff']).over(window_device_rows)
TypeError: __call__() got an unexpected keyword argument 'this_campaign'
代码:
# Function to flag network timeouts
def flag_network_timeout(**kwargs):
if kwargs['this_network'] != kwargs['last_network'] \
or kwargs['this_campaign'] != kwargs['last_campaign'] \
or kwargs['this_adgroup'] != kwargs['last_adgroup'] \
or kwargs['this_creative'] != kwargs['last_creative'] \
or kwargs['time_diff'] > network_timeout:
return 1
else:
return 0
flag_network_timeout = f.udf(flag_network_timeout, IntegerType())
# Column spec to go over the device events and flag network resets
network_timeout_flag = flag_network_timeout(**{
'last_network': f.first(adf['network']).over(window_device_rows),
'last_campaign': f.first(adf['campaign']).over(window_device_rows),
'last_adgroup': f.first(adf['adgroup']).over(window_device_rows),
'last_creative': f.first(adf['creative']).over(window_device_rows),
'this_network': f.last(adf['network']).over(window_device_rows),
'this_campaign': f.last(adf['campaign']).over(window_device_rows),
'this_adgroup': f.last(adf['adgroup']).over(window_device_rows),
'this_creative': f.last(adf['creative']).over(window_device_rows),
'time_diff': f.last(adf['time_diff']).over(window_device_rows)
})
# Update dataframe with the new columns
adf = adf.select('*', network_timeout_flag.alias('network_timeout'))
请问我做错了什么?谢谢。
你得到一个异常,因为 UserDefinedFunction.__call__
只支持可变参数而不支持关键字参数。
def __call__(self, *cols):
sc = SparkContext._active_spark_context
jc = self._judf.apply(_to_seq(sc, cols, _to_java_column))
return Column(jc)
在更基本的层面上,UDF 只能接收 Column
个参数,这些参数将在运行时扩展为相应的值,而不是标准的 Python 个对象。
就我个人而言,我根本不会为此使用 **kwargs
,但忽略了您可以通过编写 SQL 表达式来实现您想要的:
def flag_network_timeout_(**kwargs):
cond = (
(kwargs['this_network'] != kwargs['last_network']) |
(kwargs['this_campaign'] != kwargs['last_campaign']) |
(kwargs['this_adgroup'] != kwargs['last_adgroup']) |
(kwargs['this_creative'] != kwargs['last_creative']) |
(kwargs['time_diff'] > network_timeout))
return f.when(cond, 1).otherwise(0)