Applying function objects with lambda functions. ValueError: Columns must be same length as key
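Context: I'm trying to apply several functions from a list of function objects to specific DataFrame columns, but I keep getting this error: "ValueError: Columns must be same length as key"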
import os

import pandas as pd
from tqdm import tqdm

possible_message_names = ['x', 'y', 'z']
path_of_the_directory = r'{}'.format(path_of_the_directory)
processing_list = [remove_whitespace, convert_to_unicode]
for root, dirs, files in os.walk(path_of_the_directory):
    print("Normalizing the files in the directory: {}".format(root))
    for individual_file in tqdm(files):
        dataframe = pd.DataFrame(pd.read_excel(os.path.join(root, individual_file)))
        for possible_column_name in possible_message_names:
            if possible_column_name in dataframe.columns:
                dataframe[possible_column_name] = dataframe[possible_column_name].apply(lambda text: method(text) if type(text) == str else text for method in processing_list)
        # Save next to the source file with a "normalized_" prefix.
        dataframe.to_excel(os.path.join(root, 'normalized_{}'.format(individual_file)), index=False)
Any help is very welcome.
P.S. I'm trying to normalize Unicode (hence the convert_to_unicode function in the list).
Edit: I noticed that doing
dataframe[possible_column_name].apply(lambda text: method(text) if type(text) == str else text for method in processing_list)
instead of
dataframe[possible_column_name] = dataframe[possible_column_name].apply(lambda text: method(text) if type(text) == str else text for method in processing_list)
gets rid of the error, but then the functions are not applied...
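Why the original call misbehaves: the argument passed to `apply` is parsed as a generator expression. The `for` clause cannot be part of the lambda body, so Python builds a generator that yields one lambda per method, not one function that runs every method. The pure-Python sketch below reproduces only that expression. `remove_whitespace` and `convert_to_unicode` are hypothetical stand-ins (their real bodies aren't shown), and NFC normalization is just an assumption about what "convert to unicode" means. pandas treats a list-like `func` as several functions and, as far as I can tell, returns one column per function, which would explain why assigning that result to a single column key raises "Columns must be same length as key". Without the assignment the result is simply discarded, so nothing changes in the DataFrame.

```python
import unicodedata

# Hypothetical stand-ins for the asker's helpers; their real bodies are not shown.
def remove_whitespace(text):
    # Collapse runs of whitespace and strip both ends.
    return " ".join(text.split())

def convert_to_unicode(text):
    # Assumed here to mean Unicode NFC normalization.
    return unicodedata.normalize("NFC", text)

processing_list = [remove_whitespace, convert_to_unicode]

# Same expression as the argument in the question, wrapped in parentheses.
# The `for` clause cannot belong to the lambda body, so Python builds a
# generator expression that yields one lambda per method.
arg = (lambda text: method(text) if type(text) == str else text
       for method in processing_list)

print(callable(arg))  # False: this is a generator, not a single function

funcs = list(arg)     # materialize the generator to inspect what it yields
print(len(funcs))     # 2: one lambda per method

# Each lambda looks up `method` only when it is called. Once the generator is
# exhausted, `method` is the last item of processing_list, so neither lambda
# removes whitespace here.
print(repr(funcs[0]("  a   b  ")))  # '  a   b  '
```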
Something like this seems to work:
for method in processing_list:  # iterate over the methods the user added to the pipeline and apply each one to the column being cleaned
    if callable(method):  # only apply entries that are function objects
        dataframe[possible_column_name] = dataframe[possible_column_name].apply(method)
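Instead of iterating over the methods inside the apply argument, iterate over them in your script and apply each method to the whole column in turn.
That way each method works on the output of the previous one and all the modifications accumulate, instead of apply receiving a generator of calls on the original text.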
for possible_column_name in possible_message_names:
    if possible_column_name in dataframe.columns:
        for method in processing_list:
            # Apply one method to every row; the next method then works on this result.
            dataframe[possible_column_name] = dataframe[possible_column_name].apply(lambda text: method(text) if isinstance(text, str) else text)
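A variation on the same idea is to compose the methods into one callable up front, so each column needs only a single `apply`. This is a sketch, not the answer's code: `make_pipeline` is a hypothetical helper name (not a pandas API), and the two processing functions are again hypothetical stand-ins for the asker's helpers. It is shown in pure Python so it runs without pandas.

```python
from functools import reduce
import unicodedata

# Hypothetical stand-ins for the asker's helpers; their real bodies are not shown.
def remove_whitespace(text):
    return " ".join(text.split())

def convert_to_unicode(text):
    # Assumed to mean NFC normalization.
    return unicodedata.normalize("NFC", text)

processing_list = [remove_whitespace, convert_to_unicode]

def make_pipeline(methods):
    """Return ONE callable that runs every method, in order, on a str value.

    Non-str values (for example NaN from empty Excel cells) pass through unchanged.
    """
    methods = list(methods)  # copy the order once; also accepts generators

    def pipeline(value):
        if not isinstance(value, str):
            return value
        # Feed the output of each method into the next one.
        return reduce(lambda acc, method: method(acc), methods, value)

    return pipeline

normalize = make_pipeline(processing_list)

# "cafe" followed by U+0301 COMBINING ACUTE ACCENT, padded with spaces;
# NFC composes "e" + accent into the single code point "é".
samples = ["  cafe\u0301  ", 3.5, None]
print([normalize(v) for v in samples])  # ['café', 3.5, None]
```

In the answer's loop this would become `dataframe[possible_column_name] = dataframe[possible_column_name].apply(normalize)`. Because `apply` then receives a single function returning one value per row, it gives back a Series, which matches the one column key on the left of the assignment.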