Data type assigned inside nested for loop isn't as expected

Question

I get this error:

AttributeError: 'float' object has no attribute 'lower'

when trying to run this triple-nested for loop:

for row_data in df_row_list:
    for row_item_data in row_data:
        for param in search_params:
            if row_item_data.lower() == param.lower():
                row_index = df_row_list.index(row_data)

df_row_list is a list of 18 Series. I am trying to iterate over it and comb through the data. How can I make row_item_data a str so that I can call the .lower() method on it?
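
The error most likely comes from the NaN cells: pandas stores missing values as float NaN, and a float has no .lower() method. Below is a minimal sketch, assuming df_row_list holds one pandas Series per row. The sample data and the helper name find_matching_rows are made up for illustration. It skips non-string cells with isinstance and uses enumerate to get the position, since list.index on a list of Series compares Series with == and can raise a ValueError.

```python
import numpy as np
import pandas as pd

def find_matching_rows(df_row_list, search_params):
    """Return positions in df_row_list whose Series contains a search term."""
    wanted = {param.lower() for param in search_params}
    matches = []
    for row_index, row_data in enumerate(df_row_list):
        for row_item_data in row_data:
            # NaN is a float, so only call .lower() on real strings
            if isinstance(row_item_data, str) and row_item_data.lower() in wanted:
                matches.append(row_index)
                break  # one hit is enough for this row
    return matches

# Hypothetical sample data shaped like the rows in the question
df_row_list = [
    pd.Series([np.nan, np.nan, np.nan]),
    pd.Series(["REV. :", "NC", np.nan]),
    pd.Series(["WI ASM #", "HOLDER #", "TOOL STICK OUT"]),
    pd.Series([np.nan, np.nan, 0.55]),
]
search_params = ["holder #", "CUTTER #"]
print(find_matching_rows(df_row_list, search_params))  # prints [2]
```

Converting every cell with str() would also avoid the error, but it turns NaN into the string 'nan', so the isinstance check is the more conservative choice here.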

This is what the data I am working with looks like:

0         NaN        NaN  ...             NaN              NaN
1      REV. :         NC  ...             NaN              NaN
2      OP.# :  0200-00-0  ...             NaN              NaN
3         NaN        NaN  ...             NaN              NaN
4    WI ASM #   HOLDER #  ...  TOOL STICK OUT  TOOL LIFE (MIN)
5         NaN        NaN  ...            0.55              120
6         NaN        NaN  ...            0.55              120
7         NaN        NaN  ...            0.55              120
8         NaN        NaN  ...            0.55              240
9         NaN        NaN  ...            0.55              300

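The search parameters look for a Series containing any of these terms: HOLDER DESCRIPTION, CUTTER #, Operation, TOOL DESCRIPTION. I created a spreadsheet that stores hundreds of options to compare against.

I want it to return the index of the matching Series within df_row_list (a list of Series), so that I know where the "header row" for the data I want to use is.

Or is this not even the best way to comb a list of Series for specific keywords? I am new to Python and open to any help.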
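
One possible alternative, sketched under the assumption that the rows still live in a single DataFrame, is to build a boolean mask over the whole frame and ask which rows contain a match. The function name header_row_labels and the sample frame are hypothetical.

```python
import numpy as np
import pandas as pd

def header_row_labels(df, search_params):
    """Return index labels of rows with at least one cell matching a search term."""
    wanted = {param.lower() for param in search_params}

    def is_match(value):
        # Non-string cells (NaN, numbers) can never match
        return isinstance(value, str) and value.lower() in wanted

    # Apply the check cell by cell, one column at a time
    mask = df.apply(lambda column: column.map(is_match)).astype(bool)
    # Keep the labels of rows where any cell matched
    return df.index[mask.any(axis=1).to_numpy()].tolist()

# Hypothetical sample frame shaped like the data in the question
df = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan],
        ["REV. :", "NC", np.nan],
        ["WI ASM #", "HOLDER #", "TOOL STICK OUT"],
        [np.nan, np.nan, 0.55],
    ]
)
print(header_row_labels(df, ["holder #", "Tool Description"]))  # prints [2]
```

This keeps the case-insensitive comparison from the original loop but avoids splitting the frame into a list of Series first.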

Answer 1

Just posting in case anyone runs into a similar problem and is looking for a different solution.

This is how I solved it:

import os
import pandas as pd

#the file path I want to pull from
in_path = r'W:\R1_Manufacturing\Parts List Project\Tool_scraping\Excel'
#the file path where row search items are stored
search_parameters = r'W:\R1_Manufacturing\Parts List Project\search_params.xlsx'
#the file I will write the dataframes to
outfile_path = r'W:\R1_Manufacturing\Parts List Project\xlsx_reader.xlsx'

#establishing my list that I will store looped data into
file_list = []
main_header_list = []

#open the file path to store the directory in files
files = os.listdir(in_path)

#database with terms that I want to track
search = pd.read_excel(search_parameters)
length_search = search.index 

#turn search dataframe into string to do case-insensitive compare
search_string = search.to_string(header = False, index = False)

#function for case-insensitive string compare
def insensitive_compare(x1, y1):
    return x1.lower() == y1.lower()

#function to iterate through current_file for strings and compare to 
#search_parameters to grab data column headers
def filter_data(columns, rows): #I need to fix this to stop getting that A
    for name in columns:
        for number in rows:
            cell = df.at[number, name]
            if cell == '':
                continue
            for place in length_search:
                #main compare, if str and matches search params, then do...
                if isinstance(cell, str) and insensitive_compare(search.at[place, 'Parameters'], cell):
                    #this is to prevent repeats in the header list
                    if cell in header_list:
                        continue
                    else:
                        header_list.append(cell) #store data headers
                        row_list.append(number)  #store row number where it is in that data frame
                        column_list.append(name) #store column number where it is in that data frame
                else:
                    continue

#searching only for files that end with .xlsx
for file in files:
    if file.endswith('.xlsx'):
        file_list.append(os.path.join(in_path, file))
        
        
#read in the files to a dataframe, main loop the files will be manipulated in
for current_file in file_list:
    df = pd.read_excel(current_file)
    
    header_list = []
    
    #get columns headers and a range for total rows
    columns = df.columns
    total_rows = df.index
    
    #adding to store where headers are stored in DF
    row_list = []
    column_list = []
    storage_list = []
    
    #add the file name to the header file so it can be separated by file
    #header_list.append(current_file)
    main_header_list.append(header_list)
    
    #running function to grab header names
    filter_data(columns, total_rows)

So now when I run it and output the data, I get:

WI ASM #
HOLDER #
HOLDER DESCRIPTION
A.63.140.1/8z
A.63.140.1/8z
A.63.140.1/8z
A.63.140.1/8z
A.63.140.1/8z
CUTTER #
Harvey 980215
Harvey 980215
Harvey 28178
Harvey 28178
Harvey 74362-C3
OPERATION
GROOVE
ROUGHING
SEMI-FINISH
FINISH
DEBURR & BLEND
TOOL DESCRIPTION
CREM_.125_.015R_1
CREM_.125_.015R_2
CREM_.0781_.015R_1
CREM_.0781_.015R_2
BEM_.0625
Starting Radial Wear
-
-
-
-0.0002
-
TOOL STICK OUT
0.55
0.55
0.55
0.55
0.55
TOOL LIFE (MIN)
120
120
120
240
300

It is cleaned up in the order I searched for.
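
For anyone adapting this, here is a hedged sketch of the same header search written without module-level state: the frame and the search sheet are passed in, and the three lists are returned. It is a variant for illustration, not the code above. The name collect_headers and the sample frames are hypothetical; the only thing taken from the answer is the 'Parameters' column name.

```python
import pandas as pd

def collect_headers(df, search):
    """Collect cells of df that match search['Parameters'], ignoring case."""
    # Keep only string parameters so blank cells in the sheet cannot break .lower()
    wanted = {p.lower() for p in search["Parameters"] if isinstance(p, str)}
    header_list, row_list, column_list = [], [], []
    for name in df.columns:
        for number in df.index:
            cell = df.at[number, name]
            if not isinstance(cell, str) or cell.lower() not in wanted:
                continue
            if cell in header_list:
                continue  # skip repeated headers
            header_list.append(cell)   # matching header text
            row_list.append(number)    # row label where it was found
            column_list.append(name)   # column label where it was found
    return header_list, row_list, column_list

# Hypothetical stand-ins for the search sheet and one input file
search = pd.DataFrame({"Parameters": ["holder #", "tool life (min)"]})
df = pd.DataFrame(
    [
        ["REV. :", "NC", None],
        ["WI ASM #", "HOLDER #", "TOOL LIFE (MIN)"],
        [None, None, 120],
    ]
)
headers, rows, columns = collect_headers(df, search)
print(headers, rows, columns)
```

Building the lowercase set once also replaces the inner loop over every search row with a single membership test per cell.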

