如何在使用 pandas 读取多个文件时重命名列

How to rename columns while reading multiple files using pandas

我有两个数据框(到 excel 个文件),其中包含以下几列

文件 1- 列

person_ID   Test_CODE   REGISTRATION_DATE   subject_CD   subject_DESCRIPTION    subject_TYPE

文件 2- 列

person_ID   Test_CODE   REGISTRATION_DATE   subject_Code subject_DESCRIPTION    subject_Indicator

但是,subject_CDsubject_Code 列的含义相同。同样,subject_TYPEsubject_Indicator 意思相同。所以,我想在阅读 excel 文件时重命名它们

我尝试了下面的方法,但它不起作用

dfs = []       
for f in files:
    df = pd.read_excel(f, sep=",",low_memory=False)
    print(df.columns)
    df1 = df[df.columns.intersection(['person_ID','Test_CODE','REGISTRATION_DATE','subject_CD','subject_DESCRIPTION','subject_TYPE'])].rename(columns={'subject_TYPE':'subject_Indicator','subject_CD':'subject_Code'})
    dfs.append(df1)

因为我想 append/merge 这两个文件,我希望最终数据框中的列名称如下所示

person_ID   Test_CODE   REGISTRATION_DATE   subject_Code subject_DESCRIPTION subject_Indicator

可以帮我解决这个问题吗?

重命名来自特定 df 的 2 列:

 df.rename({"subject_CD": "subject_Code", "subject_TYPE": "subject_Indicator"}, axis='columns', inplace =True) 

您还可以在同一列上连接 df1 和 df2:

frames = [df1, df2]
result = pd.concat(frames)

如果您想保留读取的第一个文件的列,您可以这样做,它存储第一次迭代的列并将该列分配给其余文件:

dfs = []       
for e,f in enumerate(files):
    df = pd.read_excel(f)
    print(df.columns)
    if e == 0:
        col = df.columns
    df.columns=col
    dfs.append(df)


Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
       'subject_DESCRIPTION', 'subject_TYPE'],
      dtype='object')
Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_Code',
       'subject_DESCRIPTION', 'subject_Indicator'],
      dtype='object')

[df.columns for df in dfs] #pd.concat(dfs)

[Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
        'subject_DESCRIPTION', 'subject_TYPE'],
       dtype='object'),
 Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
        'subject_DESCRIPTION', 'subject_TYPE'],
       dtype='object')]