Add a month column to each Excel file and then merge all files into a .csv

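I'm new to Python and am asking for your help here for work purposes.

I have 12 monthly Excel files in the same folder, each with columns such as Product_Name, Quantity, and Total_Value.

I would like to:

  1. Add a Month column to each file, filled with the date that appears in its file name
  2. Merge those Excel files into a single file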


January-21.xls:

Product_Name (type:string)  Quantity (type:float)  Total_Value (type:float)  Month (type:Date)
Product A                   10                     250                       "File Name" (January-21)
Product B                   20                     500                       "File Name" (January-21)
Product C                   15                     400                       "File Name" (January-21)


February-21.xls:

Product_Name (type:string)  Quantity (type:float)  Total_Value (type:float)  Month (type:Date)
Product A                   40                     800                       "File Name" (February-21)
Product B                   25                     700                       "File Name" (February-21)
Product C                   30                     500                       "File Name" (February-21)


Expected merged file:

Product_Name (type:string)  Quantity (type:float)  Total_Value (type:float)  Month (type:Date)
Product A                   10                     250                       "File Name" (January-21)
Product B                   20                     500                       "File Name" (January-21)
Product C                   15                     400                       "File Name" (January-21)
Product A                   40                     800                       "File Name" (February-21)
Product B                   25                     700                       "File Name" (February-21)
Product C                   30                     500                       "File Name" (February-21)




This is how I currently merge the files, create the csv file, and build the data frame using pandas:

import pandas as pd
import os

path = "/content/drive/MyDrive/Colab_Notebooks/sq_datas"
files = [file for file in os.listdir(path) if not file.startswith('.')] # Ignore hidden files

all_months_data = pd.DataFrame()

for file in files:
    current_data = pd.read_excel(path+"/"+file)
    all_months_data = pd.concat([all_months_data, current_data])
all_months_data.to_csv("/content/drive/MyDrive/Colab_Notebooks/sq_datas/all_months.csv", index=False)


At a basic level, you first need to read the Excel files, for example with pandas.read_excel:

import pandas as pd

jan21_df = pd.read_excel('January-21.xls')
feb21_df = pd.read_excel('February-21.xls')

You put type:Date for your Month column. To add a date column to each data frame:

jan21_df['Month'] = pd.to_datetime('2021-01-01')
feb21_df['Month'] = pd.to_datetime('2021-02-01')


Or, if you literally want the string shown in your example instead of a date:

jan21_df['Month'] = "File Name (January-21)"
feb21_df['Month'] = "File Name (February-21)"


Then combine the data frames with pandas.concat:

combined = pd.concat([jan21_df, feb21_df])
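
To see the read, add-column, and concat steps end to end without any files on disk, here is a minimal sketch. The two data frames are hypothetical in-memory stand-ins for the Excel files, built from the values in the question, and `ignore_index=True` is an optional addition that is not part of the code above.

```python
import pandas as pd

# Hypothetical stand-ins for pd.read_excel('January-21.xls') and
# pd.read_excel('February-21.xls'), using the values from the question.
jan21_df = pd.DataFrame({
    "Product_Name": ["Product A", "Product B", "Product C"],
    "Quantity": [10.0, 20.0, 15.0],
    "Total_Value": [250.0, 500.0, 400.0],
})
feb21_df = pd.DataFrame({
    "Product_Name": ["Product A", "Product B", "Product C"],
    "Quantity": [40.0, 25.0, 30.0],
    "Total_Value": [800.0, 700.0, 500.0],
})

# A scalar assignment is broadcast to every row of the frame.
jan21_df["Month"] = pd.to_datetime("2021-01-01")
feb21_df["Month"] = pd.to_datetime("2021-02-01")

# ignore_index=True renumbers the rows 0..5 instead of repeating 0..2
# from each input frame.
combined = pd.concat([jan21_df, feb21_df], ignore_index=True)
print(combined)
```

Without `ignore_index=True` the result keeps each frame's original row labels, which is harmless when writing a csv with `index=False` but can be confusing when selecting rows by label.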


EDIT: Based on the edit to the OP, a small addition to your loop:

for file in files:
    current_data = pd.read_excel(path+"/"+file)
    current_data['Month'] = file
    all_months_data = pd.concat([all_months_data, current_data])
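
Note that the loop above stores the full file name, extension included (for example "January-21.xls"), in the Month column. If a real date is wanted instead, one possible approach is to parse the name with the standard library. This is only a sketch: `month_from_filename` is a hypothetical helper, and it assumes every file is named like "January-21.xls" (full English month name, hyphen, two-digit year).

```python
from datetime import datetime
from pathlib import Path


def month_from_filename(filename: str) -> datetime:
    """Return the first day of the month encoded in a name like 'January-21.xls'.

    Hypothetical helper; assumes the '<MonthName>-<YY>' naming from the question.
    """
    stem = Path(filename).stem  # strips the extension: 'January-21'
    # %B matches the full month name, %y the two-digit year.
    # The day is not in the name, so strptime defaults it to 1.
    return datetime.strptime(stem, "%B-%y")


first = month_from_filename("January-21.xls")
second = month_from_filename("February-21.xls")
print(first, second)
```

Inside the loop this would replace the assignment with something like `current_data['Month'] = month_from_filename(file)`. A file whose name does not follow the pattern raises a ValueError, which is probably preferable to silently writing a wrong month.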


from pathlib import Path

import pandas as pd

path = Path("/content/drive/MyDrive/Colab_Notebooks/sq_datas")
all_data = []

for file in path.glob("*.xls"):
    # Parse the month from the file's name
    # month will be something like "January" and "February"
    # year will be something like "20" and "21"
    # date will be something like pd.Timestamp("2021-01-01")
    month, year = file.stem.split("-")
    date = pd.Timestamp(f"{month} 1, 20{year}")
    # Read data from the current file
    current_data = pd.read_excel(file).assign(Month=date)

    # Append the data to the list
    all_data.append(current_data)

# Combine all data from the list into a single DataFrame
all_data = pd.concat(all_data)
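
The combined frame still has to be written out as a csv. The following is a minimal sketch of that last step, using a small hypothetical `all_data` frame in place of the real one. Sorting by Month is an optional addition, since `glob` does not return files in chronological order, and the `date_format` value is an assumption chosen to mirror the "January-21" style of the file names.

```python
import pandas as pd

# Hypothetical stand-in for the combined frame, deliberately out of order.
all_data = pd.DataFrame({
    "Product_Name": ["Product A", "Product A"],
    "Quantity": [40.0, 10.0],
    "Total_Value": [800.0, 250.0],
    "Month": [pd.Timestamp("2021-02-01"), pd.Timestamp("2021-01-01")],
})

# Sort chronologically; a stable sort keeps the original row order
# within each month.
all_data = all_data.sort_values("Month", kind="stable", ignore_index=True)

# With no path argument, to_csv returns the csv text as a string.
# Pass a file path as the first argument to write to disk instead.
csv_text = all_data.to_csv(index=False, date_format="%B-%y")
print(csv_text)
```

If the csv is written into the same folder as the Excel files, the `*.xls` pattern in the loop already keeps it from being picked up as input on the next run.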