Using Python 3 to import multiple Excel workbooks and sheets into a single data frame
I'm still learning Python. I'm trying to import multiple workbooks, and all of their sheets, into a single data frame.
Here is what I have so far:
import glob
import os

import numpy as np
import pandas as pd

print(os.getcwd())  # check the working directory

all_data = pd.DataFrame()  # start with an empty data frame
for file in glob.glob("*.xls"):  # import every file that ends in .xls
    df = pd.read_excel(file)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    all_data = pd.concat([all_data, df], ignore_index=True)

all_data.shape  # 12796 rows with 19 columns; we still need a way to check whether this is accurate
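I'm having trouble finding any documentation that confirms or explains whether this code imports every sheet in each workbook. Some of these files have 15-20 sheets.
Here is the link where I found the glob explanation: http://pbpython.com/excel-file-combine.html
Any and all advice is greatly appreciated. I'm still very new to R and Python, so I'd be grateful if you could explain this in as much detail as possible!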
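What you are missing is importing all of the sheets in each workbook. By default, pd.read_excel(file) reads only the first sheet.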
import glob
import os

import numpy as np
import pandas as pd

print(os.getcwd())  # check the working directory

all_data = pd.DataFrame()  # start with an empty data frame
rows = 0  # running total of rows read, used as a sanity check
for file in glob.glob("*.xls"):  # import every file that ends in .xls
    # df = pd.read_excel(file) would import only the first sheet
    xls = pd.ExcelFile(file)
    sheets = xls.sheet_names  # names of all the sheets in this workbook
    for sheet_name in sheets:
        # the keyword is sheet_name; the old spelling sheetname is no longer accepted
        df = pd.read_excel(xls, sheet_name=sheet_name)
        rows += df.shape[0]
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        all_data = pd.concat([all_data, df], ignore_index=True)

print(all_data.shape[0])  # total rows in the combined frame; should equal rows
print(rows)
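Since you also asked how to check that the result is accurate, here is a minimal sketch of a variation on the loop above. It relies on pd.read_excel(file, sheet_name=None), which returns a dict mapping each sheet name to its DataFrame, and it tags every row with the file and sheet it came from. The function names combine_frames and load_all_sheets and the column names source_file and source_sheet are my own choices for illustration, not anything from pandas. Reading .xls files also assumes the xlrd package is installed.

```python
import glob

import pandas as pd


def combine_frames(frames_by_source):
    """Concatenate frames keyed by (file, sheet), tagging each row with its origin.

    combine_frames, source_file and source_sheet are illustrative names only.
    """
    tagged = []
    for (file, sheet), df in frames_by_source.items():
        # assign() returns a copy with two extra columns recording the origin
        tagged.append(df.assign(source_file=file, source_sheet=sheet))
    if not tagged:
        return pd.DataFrame()
    # One concat at the end avoids re-copying the growing frame on every sheet
    return pd.concat(tagged, ignore_index=True)


def load_all_sheets(pattern="*.xls"):
    """Read every sheet of every workbook matching the glob pattern."""
    frames_by_source = {}
    for file in sorted(glob.glob(pattern)):
        # sheet_name=None makes read_excel return {sheet name: DataFrame}
        for sheet, df in pd.read_excel(file, sheet_name=None).items():
            frames_by_source[(file, sheet)] = df
    return combine_frames(frames_by_source)


if __name__ == "__main__":
    all_data = load_all_sheets("*.xls")
    if all_data.empty:
        print("No rows were loaded")
    else:
        # Rows per sheet; compare these against the row counts you see in Excel
        print(all_data.groupby(["source_file", "source_sheet"]).size())
```

The per-sheet counts give you something concrete to compare against the workbooks themselves, which is usually easier than judging a single grand total. If the sheets do not all share the same columns, the combined frame will contain NaN wherever a sheet lacked a column.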