"Series[Dtype]" 类型的参数不能分配给 "DataFrame" 类型的参数

Question

I have defined the following helper methods:

import pandas as pd

def load_excel(file_path: str, sheet_name: str = ''):
    if sheet_name == '':
        df = pd.read_excel(file_path).fillna('').apply(lambda x: x.astype(str).str.lower())
    else:
        df = pd.read_excel(file_path, sheet_name).fillna('').apply(lambda x: x.astype(str).str.lower())
        
    return df

def build_score_dict(keywords_df: pd.DataFrame, tokens: list):
    """
    Returns a tuple of two dictionaries, i.e. tuple[dict, dict]
    """
    matched_keywords_by_cat_dict={}
    score_dict={}

    cnt_cols = keywords_df.shape[1]
    
    for col_idx in range(0, cnt_cols):
        keyword_list=list(keywords_df.iloc[:,col_idx])
        matched_keywords=[]
        parent_cat=0
        for j in range(0,len(tokens)):
            token = tokens[j]
            if token in keyword_list:
                parent_cat= parent_cat + 1
                matched_keywords.append(token)
                parent_cat_name = keywords_df.columns[col_idx]
                matched_keywords_by_cat_dict[parent_cat_name]=matched_keywords
                score_dict[parent_cat_name]=parent_cat
    
    return matched_keywords_by_cat_dict, score_dict

I call build_score_dict as follows:

third_level_closing=load_excel(input_dir+'third_level_keywords.xlsx',sheet_name='closing')     
_, level3_score_dict = build_score_dict(third_level_closing, tokens)

Pylance in VSCode gives me the following warning/error. What is happening here, and how can I fix it?

Argument of type "Series[Dtype]" cannot be assigned to parameter "keywords_df" of type "DataFrame" in function "build_score_dict"
  "Series[Dtype]" is incompatible with "DataFrame"Pylance (reportGeneralTypeIssues)

Answer 1

Solution

If you pass a value for axis when calling apply, that should resolve the problem:

def load_excel(file_path: str, sheet_name: str = ''):
    if sheet_name == '':
        df = pd.read_excel(file_path).fillna('').apply(lambda x: x.astype(str).str.lower(), axis='index')
    else:
        df = pd.read_excel(file_path, sheet_name).fillna('').apply(lambda x: x.astype(str).str.lower(), axis='index')
        
    return df
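
As a quick runtime sanity check, the sketch below applies the same transformation to a small in-memory DataFrame instead of an Excel file. The sample data and the helper name lower are made up for illustration. It only suggests that passing axis='index' does not change what apply returns at runtime; the Pylance warning itself comes from static analysis, not from the running code.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet.
raw = pd.DataFrame({"Greeting": ["Hello", None], "Closing": ["Regards", "BYE"]})


def lower(col: pd.Series) -> pd.Series:
    # Same per-column transformation as the lambda in load_excel.
    return col.astype(str).str.lower()


cleaned = raw.fillna("")

# With an explicit axis (the suggested fix) and without it (the original call).
with_axis = cleaned.apply(lower, axis="index")
without_axis = cleaned.apply(lower)

print(type(with_axis).__name__)
print(with_axis.equals(without_axis))
```

Since axis='index' is the same as the default axis=0, the fix is expected to affect only what the type checker infers.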

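Explanation

If you add a return type annotation to the load_excel function, you will see that the type checker treats df as a Series rather than a DataFrame.

If we write the function as shown below, we can quickly see that the apply method is the root of the problem: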

def load_excel(file_path: str, sheet_name: str = "") -> pd.DataFrame:
    if sheet_name == "":
        df: pd.DataFrame = pd.read_excel(file_path)
    else:
        df: pd.DataFrame = pd.read_excel(file_path, sheet_name)

    df = df.fillna("")
    df = df.apply(lambda x: x.astype(str).str.lower())

    return df

If we Ctrl+click on apply in VSCode (on Windows), we can see that the type stubs define more than one version of the method.

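This shows that when the only argument apply receives is f, the type checker cannot tell which version of apply you intend. It seems that the Pylance implementation takes the first definition it finds, which is why the return value of apply ends up being assumed to be a Series. When you add the axis argument, the type checker can match the second definition, whose return type is DataFrame.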
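
The mechanism described above can be sketched with a toy example. ToySeries and ToyFrame are hypothetical stand-ins, and the two overloads are a simplified illustration rather than the actual pandas type stubs. A type checker that matches the first overload infers a Series-like return type, while supplying axis selects the second one.

```python
from typing import Callable, Optional, overload


class ToySeries:
    """Stand-in for a Series; hypothetical, for illustration only."""


class ToyFrame:
    """Stand-in for a DataFrame; hypothetical, for illustration only."""

    # First overload: only `f` is given. A checker that matches this
    # signature infers ToySeries as the return type.
    @overload
    def apply(self, f: Callable) -> ToySeries: ...

    # Second overload: `axis` is also given, so the inferred type is ToyFrame.
    @overload
    def apply(self, f: Callable, axis: str) -> "ToyFrame": ...

    def apply(self, f: Callable, axis: Optional[str] = None):
        # The overloads only affect static analysis. This toy implementation
        # always returns a ToyFrame, so static and runtime types can disagree.
        return ToyFrame()


frame = ToyFrame()
without_axis = frame.apply(str.lower)             # statically inferred: ToySeries
with_axis = frame.apply(str.lower, axis="index")  # statically inferred: ToyFrame
```

At runtime both calls return the same kind of object. Only the inferred static type differs, which is consistent with the code working even though the warning appears.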

"Series[Dtype]" 类型的参数不能分配给 "DataFrame" 类型的参数

Argument of type "Series[Dtype]" cannot be assigned to parameter of type "DataFrame"
