Applying function objects with lambda functions. ValueError: Columns must be same length as key

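Context: I am trying to apply several functions from a list of function objects to specific dataframe columns, but I keep getting this error: "ValueError: Columns must be same length as key".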

import os

import pandas as pd
from tqdm import tqdm

possible_message_names = ['x', 'y', 'z']
path_of_the_directory = r'{}'.format(path_of_the_directory)
processing_list = [remove_whitespace, convert_to_unicode]

for root, dirs, files in os.walk(path_of_the_directory):
    print("Normalizing the files in the directory: {}".format(root))
    for individual_file in tqdm(files):
        dataframe = pd.read_excel(os.path.join(root, individual_file))
        for possible_column_name in possible_message_names:
            if possible_column_name in dataframe.columns:
                
                dataframe[possible_column_name] = dataframe[possible_column_name].apply(lambda text: method(text) if type(text) == str else text for method in processing_list )
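        # Build the output path with os.path.join so that no backslash is read as an escape sequence.
        dataframe.to_excel(os.path.join(root, 'normalized_{}'.format(individual_file)), index=False)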

Any help is very welcome.

P.S. I am trying to normalize Unicode (hence the convert_to_unicode function in the list).

Edit: I noticed that doing this

dataframe[possible_column_name].apply(lambda text: method(text) if type(text) == str else next for method in processing_list )

instead of

dataframe[possible_column_name] = dataframe[possible_column_name].apply(lambda text: method(text) if type(text) == str else next for method in processing_list )

makes the error go away, but the functions are not applied this way...

Something like this seems to work:

for method in processing_list:  # iterate over the methods the user added to the pipeline and apply each to the column being cleaned
    if callable(method):  # only apply entries that are function objects
        dataframe[possible_column_name] = dataframe[possible_column_name].apply(method)

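Don't iterate over the methods inside the argument to apply. Iterate over them in your script instead, and apply each method to all rows.

This accumulates all of the modifications, because each pass works on the output of the previous one, instead of returning a generator of calls on the original text.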

        for possible_column_name in possible_message_names:
            if possible_column_name in dataframe.columns:
                for method in processing_list:
                    dataframe[possible_column_name] = dataframe[possible_column_name].apply(lambda text: method(text) if type(text) == str else text)
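
To make the difference concrete, here is a minimal sketch in plain Python (no pandas needed) of what the argument in the question actually is, next to a helper that chains the methods. The bodies of remove_whitespace and convert_to_unicode are hypothetical stand-ins, since the question does not show them, and apply_pipeline is a name introduced only for this illustration.

```python
import inspect
import unicodedata


def remove_whitespace(text):
    # Hypothetical stand-in: collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


def convert_to_unicode(text):
    # Hypothetical stand-in: normalize to the NFKC Unicode form.
    return unicodedata.normalize("NFKC", text)


processing_list = [remove_whitespace, convert_to_unicode]

# The argument from the question is a generator expression that yields one
# lambda per method. It is not a single function of `text`.
argument = (
    lambda text: method(text) if type(text) == str else text
    for method in processing_list
)
is_generator = inspect.isgenerator(argument)
is_callable = callable(argument)


def apply_pipeline(value, methods):
    # Leave non-string cells (numbers, None, NaN) untouched.
    if not isinstance(value, str):
        return value
    # Feed the output of each method into the next one.
    for method in methods:
        value = method(value)
    return value


# "\uff46\uff55\uff4c\uff4c" is the word "full" written in fullwidth letters.
cells = ["  hello   world ", "\uff46\uff55\uff4c\uff4c  width", 42, None]
cleaned = [apply_pipeline(cell, processing_list) for cell in cells]

print(is_generator, is_callable)
print(cleaned)
```

With pandas, a helper like this could be used per column as dataframe[possible_column_name].apply(lambda text: apply_pipeline(text, processing_list)), which would make one pass over the column rather than one pass per method.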