How do I drop a specific column by index from a sheet in an xlsx file in Python?

Question

My xlsx file has multiple sheets. The particular sheet I am working with is named Cancelled Members.

The dataframe for this sheet (Cancelled Members) looks like

Address State Zip Status Status Date Partner
xx      NY    110 G      O      1    V

I want to drop the first Status column from this particular sheet.

I tried

import pandas as pd
from openpyxl import load_workbook 
temp = pd.read_excel(file, sheet_name=None,skiprows=5)
temp = if ws.startswith("Cancelled"): temp.drop(temp.columns[[3]], axis=1)

I tried to drop it by index position with [[3]], but I get an invalid syntax error. How can I drop the column from this sheet?

Answer 1

You can do what you are asking like this:

import pandas as pd
temp = pd.DataFrame(columns=['Address', 'State', 'Zip', 'Status', 'Status', 'Date', 'Partner'],
    data=[['xx','NY',110,'G','O',1,'V']])
print(temp)
temp.columns = [col + str(i) if col == "Status" else col for i, col in enumerate(temp.columns)]
temp = temp.drop(temp.columns[3], axis=1).rename(columns={col:"Status" for col in temp.columns if col.startswith("Status")})
print(temp)

Output:

  Address State  Zip Status Status  Date Partner
0      xx    NY  110      G      O     1       V
  Address State  Zip Status  Date Partner
0      xx    NY  110      O     1       V

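The assignment to temp.columns appends a unique number to each column named Status, which ensures that no two columns share that name. We then drop the column at the position you want, and finally rename every remaining column whose name starts with Status back to Status.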
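If the rename, drop, rename sequence feels roundabout, a purely positional selection is one possible alternative. The sketch below is not part of the answer above: it keeps every column except position 3 by using iloc, which selects by position and so is not affected by duplicate names. The names keep and result are only illustrative.

```python
import pandas as pd

temp = pd.DataFrame(
    columns=['Address', 'State', 'Zip', 'Status', 'Status', 'Date', 'Partner'],
    data=[['xx', 'NY', 110, 'G', 'O', 1, 'V']],
)

# Positions of every column except index 3 (the first Status column).
keep = [i for i in range(temp.shape[1]) if i != 3]

# iloc selects by position, so the duplicated name does not matter.
result = temp.iloc[:, keep]
print(result)
```

This leaves the remaining Status column with its original name, so no renaming step is needed afterwards.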

The full test code looks like this:

import pandas as pd

file = "PSI 001.xlsx"
# sheet_name=None returns a dict of {sheet name: DataFrame}
dfs = pd.read_excel(file, sheet_name=None, skiprows=5)
output = dict()
for ws, df in dfs.items():
    if ws.startswith("Cancelled"):
        temp = df
        # Give the Status column a unique name so that one column can be dropped
        temp.columns = [col + str(i) if col == "Status" else col for i, col in enumerate(temp.columns)]
        # Drop the column at position 3, then restore the Status name
        temp = temp.drop(temp.columns[3], axis=1).rename(columns={col:"Status" for col in temp.columns if col.startswith("Status")})
        output[ws] = temp
# The with block saves and closes the file; writer.save() was removed in pandas 2.0
with pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx') as writer:
    for ws, df in output.items():
        df.to_excel(writer, index=False, sheet_name=ws)

I tested this with a file named PSI 001.xlsx that has a sheet named Cancelled with the following contents:

skip                        
skip                        
skip                        
skip                        
skip                        
Address State   Zip Status  Status  Date    Partner
xx  NY  110 G   O   1   V

... and it produced a file named PSI 001 (updated headers).xlsx with a sheet named Cancelled that contains the following:

Address State   Zip Status  Date    Partner
xx  NY  110 O   1   V
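Note that the full code above writes only the sheets whose names start with Cancelled. If the other sheets should be carried over unchanged, one option is to process the whole dict first and write it afterwards. The sketch below works on an in-memory dict so that it runs without a file; the function name drop_position_in_sheets, the sheet named Active, and the sample data are assumptions made for illustration. The Status.1 header mimics how pandas typically renames a duplicated header when parsing.

```python
import pandas as pd

def drop_position_in_sheets(dfs, prefix, position):
    """Return a new dict in which sheets whose name starts with `prefix`
    lose the column at `position`; all other sheets pass through unchanged."""
    result = {}
    for name, df in dfs.items():
        if name.startswith(prefix):
            keep = [i for i in range(df.shape[1]) if i != position]
            df = df.iloc[:, keep]
        result[name] = df
    return result

# Stand-in for pd.read_excel(file, sheet_name=None, skiprows=5)
dfs = {
    "Cancelled": pd.DataFrame(
        [['xx', 'NY', 110, 'G', 'O', 1, 'V']],
        columns=['Address', 'State', 'Zip', 'Status', 'Status.1', 'Date', 'Partner'],
    ),
    "Active": pd.DataFrame({"Address": ["yy"], "State": ["CA"]}),
}

cleaned = drop_position_in_sheets(dfs, "Cancelled", 3)
for name, df in cleaned.items():
    print(name, list(df.columns))
```

The surviving column is still called Status.1 here; a rename(columns={"Status.1": "Status"}) on the result would restore the plain name if that matters.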

Answer 2

You can try this:

# loop through each in dictionary of dataframes
for sheet in temp:
    # check if sheet starts with 'Cancelled'
    if sheet.startswith('Cancelled'):
        # for each column in dataframe
        for column in list(temp[sheet]):
            # check if column is named 'Status'
            if column == 'Status':
                # drop column if True
                temp[sheet] = temp[sheet].drop(column, axis=1)
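One thing worth checking with a drop by name is how many columns actually carry that name. The sketch below is an illustration, not part of the answer: it uses read_csv on an in-memory string as a stand-in for read_excel, on the assumption that both rename a duplicated header to Status.1 when parsing. A frame built by hand with two columns literally named Status behaves differently.

```python
import io
import pandas as pd

# Built by hand: both columns are literally named 'Status'.
dup = pd.DataFrame([['G', 'O', 1]], columns=['Status', 'Status', 'Date'])
after_dup = dup.drop('Status', axis=1)
print(list(after_dup.columns))  # dropping by name removes every match

# Parsed input: the second header is renamed by the parser.
parsed = pd.read_csv(io.StringIO("Status,Status,Date\nG,O,1\n"))
print(list(parsed.columns))

after_parsed = parsed.drop('Status', axis=1)
print(list(after_parsed.columns))  # only the first Status column is gone
```

So on data parsed from a file the loop above would be expected to remove only the first Status column, while on a frame with true duplicate names it would remove both.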
