Applying function objects with lambda functions. ValueError: Columns must be same length as key

Question

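Context: I am trying to apply several functions from a list of function objects to specific DataFrame columns, but I keep getting this error: "ValueError: Columns must be same length as key"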

possible_message_names = ['x','y','z']
path_of_the_directory= r'{}'.format(path_of_the_directory)
processing_list = [remove_whitespace,convert_to_unicode]

for root, dirs, files in os.walk(path_of_the_directory):
    print("Normalizing the files in the directory: {}".format(root))
    for individual_file in tqdm(files):
        dataframe = pd.DataFrame(pd.read_excel(os.path.join(root, individual_file)))
        for possible_column_name in possible_message_names:
            if possible_column_name in dataframe.columns:
                
                dataframe[possible_column_name] = dataframe[possible_column_name].apply(lambda text: method(text) if type(text) == str else text for method in processing_list )
        # os.path.join avoids the '\n' in '{}\normalized_{}' being read as a newline escape
        dataframe.to_excel(os.path.join(root, 'normalized_{}'.format(individual_file)), index=False)

Any help is very welcome.

P.S. I am trying to normalize Unicode (hence the convert_to_unicode function in the list).

Edit: I noticed that doing this

dataframe[possible_column_name].apply(lambda text: method(text) if type(text) == str else next for method in processing_list )

instead of

dataframe[possible_column_name] = dataframe[possible_column_name].apply(lambda text: method(text) if type(text) == str else next for method in processing_list )

gets rid of the error, but the functions are not applied this way...

Something like this seems to work:

for method in processing_list:  # iterate over the methods the user added to the pipeline and apply each to the column being cleaned
    if callable(method):  # only apply function objects
        dataframe[possible_column_name] = dataframe[possible_column_name].apply(method)

Answer 1

Don't iterate over the methods inside the argument to apply. Instead, loop over them in your script and apply each method to all rows.

This accumulates all the modifications, rather than returning a generator of calls on the original text.
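
To see what the original call actually passes to apply, here is a minimal sketch in plain Python. It uses str.strip and str.upper as stand-ins for the functions in processing_list (the asker's real helpers are not shown, so these are assumptions for illustration only). The argument is a generator expression that yields one lambda per method, not a single function:

```python
import inspect

# Stand-in functions for illustration only; the asker's helpers are not shown.
processing_list = [str.strip, str.upper]

# Same shape as the argument passed to apply in the question.
argument = (lambda text: method(text) if type(text) == str else text
            for method in processing_list)

print(inspect.isgenerator(argument))  # the argument is a generator object
print(callable(argument))             # it is not a function that takes one cell

# Iterating it yields one lambda per method, each working on the raw input,
# so the second result does not build on the first.
results = [func("  abc ") for func in argument]
print(results)
```

Each lambda receives the original text, so the results are independent rather than chained. That is presumably why assigning the outcome to a single column fails: there is one result per method instead of one result per cell. Looping in the script instead passes a single callable to each apply call: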

        for possible_column_name in possible_message_names:
            if possible_column_name in dataframe.columns:
                for method in processing_list:
                    dataframe[possible_column_name] = dataframe[possible_column_name].apply(lambda text: method(text) if type(text) == str else text)
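
As an alternative sketch, the functions can be chained into one callable so that apply only needs to run once per column. The bodies of remove_whitespace and convert_to_unicode below are hypothetical stand-ins (the question does not show them), and run_pipeline is a name introduced here for illustration:

```python
import unicodedata
from functools import reduce

# Hypothetical stand-ins for the asker's helpers; the originals are not shown.
def remove_whitespace(text):
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())

def convert_to_unicode(text):
    # Assumed meaning of "normalize unicode": NFKC normalization.
    return unicodedata.normalize("NFKC", text)

processing_list = [remove_whitespace, convert_to_unicode]

def run_pipeline(value, methods=processing_list):
    """Apply each callable in order to strings; return other values unchanged."""
    if not isinstance(value, str):
        return value
    # Feed the output of each method into the next one.
    return reduce(lambda acc, method: method(acc), methods, value)

# Plain-Python check on a few sample cells, including non-string values.
cells = ["  hello   world ", "ｆｕｌｌ", 42, None]
cleaned = [run_pipeline(cell) for cell in cells]
print(cleaned)
```

With pandas, the same callable could be passed once per column, for example dataframe[possible_column_name].apply(run_pipeline), which replaces the inner loop over processing_list.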

python

pandas