How to rename columns while reading multiple files using pandas
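I have two dataframes (read from two Excel files) with the following columns:
File 1 columns:
person_ID Test_CODE REGISTRATION_DATE subject_CD subject_DESCRIPTION subject_TYPE
File 2 columns:
person_ID Test_CODE REGISTRATION_DATE subject_Code subject_DESCRIPTION subject_Indicator
However, the subject_CD and subject_Code columns mean the same thing. Likewise, subject_TYPE and subject_Indicator mean the same thing. So I would like to rename them while reading the Excel files.
I tried the following, but it does not work: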
dfs = []
for f in files:
    df = pd.read_excel(f, sep=",", low_memory=False)
    print(df.columns)
    df1 = df[df.columns.intersection(['person_ID','Test_CODE','REGISTRATION_DATE','subject_CD','subject_DESCRIPTION','subject_TYPE'])].rename(columns={'subject_TYPE':'subject_Indicator','subject_CD':'subject_Code'})
    dfs.append(df1)
Because I want to append/merge the two files, I expect the column names in the final dataframe to look like this:
person_ID Test_CODE REGISTRATION_DATE subject_Code subject_DESCRIPTION subject_Indicator
Can you help me with this?
To rename the two columns of a specific df:
df.rename({"subject_CD": "subject_Code", "subject_TYPE": "subject_Indicator"}, axis='columns', inplace=True)
You can also concatenate df1 and df2 once they share the same columns:
frames = [df1, df2]
result = pd.concat(frames)
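Putting the rename and the concatenation together, a minimal sketch might look like the following. It assumes the mapping from the question (subject_CD to subject_Code, subject_TYPE to subject_Indicator); the names RENAME_MAP and harmonize are hypothetical, and the two small in-memory frames with made-up values stand in for what pd.read_excel(f) would return for each file.

```python
import pandas as pd

# Hypothetical mapping: file 1 names -> file 2 names (taken from the question).
RENAME_MAP = {"subject_CD": "subject_Code", "subject_TYPE": "subject_Indicator"}


def harmonize(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the file 1 column names mapped to the file 2 names."""
    # rename() skips keys that are not present (errors="ignore" is the default),
    # so the same call is safe for both file layouts.
    return df.rename(columns=RENAME_MAP)


# Stand-ins for pd.read_excel(f); the values are made up for illustration.
df1 = pd.DataFrame({
    "person_ID": [1], "Test_CODE": ["T1"], "REGISTRATION_DATE": ["2020-01-01"],
    "subject_CD": ["S1"], "subject_DESCRIPTION": ["Math"], "subject_TYPE": ["A"],
})
df2 = pd.DataFrame({
    "person_ID": [2], "Test_CODE": ["T2"], "REGISTRATION_DATE": ["2020-02-01"],
    "subject_Code": ["S2"], "subject_DESCRIPTION": ["Physics"], "subject_Indicator": ["B"],
})

# Rename each frame first, then stack them row-wise.
result = pd.concat([harmonize(df) for df in (df1, df2)], ignore_index=True)
print(list(result.columns))
```

In the reading loop, the same idea would be dfs.append(harmonize(pd.read_excel(f))) followed by pd.concat(dfs). Because the mapping is by name rather than by position, it should not matter which file is read first.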
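If you want to keep the column names of the first file read, you can do the following. It stores the columns from the first iteration and assigns them, by position, to the remaining files: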
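import pandas as pd

dfs = []
for e, f in enumerate(files):  # files: list of Excel paths, defined elsewhere
    df = pd.read_excel(f)
    print(df.columns)
    if e == 0:
        # remember the column labels of the first file
        col = df.columns
    # overwrite the labels of every file with those of the first one
    df.columns = col
    dfs.append(df)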
Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
       'subject_DESCRIPTION', 'subject_TYPE'],
      dtype='object')
Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_Code',
       'subject_DESCRIPTION', 'subject_Indicator'],
      dtype='object')
[df.columns for df in dfs] #pd.concat(dfs)
[Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
        'subject_DESCRIPTION', 'subject_TYPE'],
       dtype='object'),
 Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
        'subject_DESCRIPTION', 'subject_TYPE'],
       dtype='object')]
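Note that positional assignment only lines up correctly when every file has the same number of columns in the same order, and that the final names are those of whichever file is read first (here subject_CD and subject_TYPE). A hedged sketch of the same idea with a length check follows; the function name align_to_first is hypothetical, and the in-memory frames with made-up values stand in for the results of pd.read_excel.

```python
import pandas as pd


def align_to_first(frames):
    """Give every frame the column labels of the first frame, by position."""
    reference = None
    aligned = []
    for df in frames:
        if reference is None:
            # the first frame defines the labels
            reference = df.columns
        elif len(df.columns) != len(reference):
            # positional relabelling would silently mislabel data otherwise
            raise ValueError(
                f"expected {len(reference)} columns, got {len(df.columns)}"
            )
        out = df.copy()  # leave the caller's frame untouched
        out.columns = reference
        aligned.append(out)
    return aligned


# Stand-ins for the two Excel files; the values are made up for illustration.
first = pd.DataFrame({
    "person_ID": [1], "Test_CODE": ["T1"], "REGISTRATION_DATE": ["2020-01-01"],
    "subject_CD": ["S1"], "subject_DESCRIPTION": ["Math"], "subject_TYPE": ["A"],
})
second = pd.DataFrame({
    "person_ID": [2], "Test_CODE": ["T2"], "REGISTRATION_DATE": ["2020-02-01"],
    "subject_Code": ["S2"], "subject_DESCRIPTION": ["Physics"], "subject_Indicator": ["B"],
})

aligned = align_to_first([first, second])
combined = pd.concat(aligned, ignore_index=True)
print(list(combined.columns))
```

If the files might differ in column order, renaming by name with an explicit mapping is probably the safer choice.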