How to merge all CSV files into one file and have the data stacked under the original headers?
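I'm new to Python and trying to learn data manipulation.
I have a folder with many files, some of which are CSVs. I want to merge all the CSVs (about 400 of them) into a single CSV, with all the data stacked.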
For example, if the first CSV contains this DataFrame:
transcript  confidence  from  to    speaker  Negative  Neutral  Positive  compound
thank you   0.85        1.39  1.65  0        0         0.754    0.246     0.7351
and the second contains this one:
transcript  confidence  from  to    speaker  Negative  Neutral  Positive  compound
welcome     0.95        1.39  1.65  0        0         0.754    0.201     0.8351
I want the final DataFrame to look like this:
transcript  confidence  from  to    speaker  Negative  Neutral  Positive  compound
thank you   0.85        1.39  1.65  0        0         0.754    0.246     0.7351
welcome     0.95        1.39  1.65  0        0         0.754    0.201     0.8351
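In pandas terms, the desired result is a row-wise concatenation of the per-file DataFrames. A minimal sketch that rebuilds the two example rows in memory (the DataFrames `first` and `second` are stand-ins for the two CSV files) and stacks them with `pd.concat`: columns are matched by name, so every row lands under the one shared header, and `ignore_index=True` renumbers the rows 0, 1, 2 and so on instead of keeping each file's own index (which would be 0 for both rows here).

```python
import pandas as pd

# Column names shared by every CSV in the question
columns = ["transcript", "confidence", "from", "to", "speaker",
           "Negative", "Neutral", "Positive", "compound"]

# Each DataFrame stands in for one CSV file with a single data row
first = pd.DataFrame([["thank you", 0.85, 1.39, 1.65, 0, 0, 0.754, 0.246, 0.7351]],
                     columns=columns)
second = pd.DataFrame([["welcome", 0.95, 1.39, 1.65, 0, 0, 0.754, 0.201, 0.8351]],
                      columns=columns)

# Stack the rows vertically; columns are aligned by name, rows renumbered
merged = pd.concat([first, second], ignore_index=True)
print(merged)
```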
Here is what I tried:
import glob
import pandas as pd
# Folder containing the .csv files to merge
file_path = "C:\Users\Desktop"
# This pattern \* selects all files in a directory
pattern = file_path + "\*"
files = glob.glob(pattern)
# Import first file to initiate the dataframe
df = pd.read_csv(files[0],encoding = "utf-8", delimiter = ",")
# Append all the files as dataframes to the first one
for file in files[1:len(file_list)]:
    df_csv = pd.read_csv(file,encoding = "utf-8", delimiter = ",")
    df = df.append(df_csv)
But it didn't work. How can I fix this?
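Several things in this attempt stop it from working: `file_list` is never defined (the list is named `files`), so the loop raises `NameError`; the pattern `\*` matches every file in the folder, not just the CSVs; `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; and in Python 3 the literal `"C:\Users\Desktop"` does not even compile, because `\U` starts a `\UXXXXXXXX` Unicode escape. The sketch below checks that last point with the built-in `compile()` and shows two path literals that do work. It uses `ntpath` (which is what `os.path` is on Windows) only so the sketch behaves the same on any OS.

```python
import ntpath

# The question's path literal, written out as Python source text
bad_source = r'file_path = "C:\Users\Desktop"'

try:
    compile(bad_source, "<question>", "exec")
    compiles = True
except SyntaxError:
    # "\U" must be followed by 8 hex digits, and "sers" is not hex
    compiles = False

# Two literals that do work: a raw string, or forward slashes
raw_path = r"C:\Users\Desktop"
slash_path = "C:/Users/Desktop"

# A pattern that matches only CSV files, joined with Windows path rules
pattern = ntpath.join(raw_path, "*.csv")
print(compiles, pattern)
```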
This should help:
import pandas as pd
import glob
import os.path
file_path = "C:/Users/Desktop"
data = []
for csvfile in glob.glob(os.path.join(file_path, "*.csv")):
    df = pd.read_csv(csvfile, encoding="utf-8", delimiter=",")
    data.append(df)
data = pd.concat(data, ignore_index=True)
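The code above leaves the merged result in memory as `data`; to get the single merged file the question asks for, it still has to be written out. A minimal sketch, using a hypothetical helper name `merge_csv_folder`: it sorts the file names so the row order is reproducible, skips the output file if it sits in the same folder (otherwise a second run would merge the previous result back in), raises a clearer error when no CSVs are found (`pd.concat` on an empty list fails with `ValueError: No objects to concatenate`), and writes with `index=False` so the row numbers don't become an extra unnamed column.

```python
import glob
import os.path

import pandas as pd


def merge_csv_folder(folder, out_path):
    """Hypothetical helper: stack every *.csv in `folder` and write it to `out_path`."""
    # sorted() makes the row order independent of the filesystem's listing order
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # Skip the output file itself, in case it lives in the same folder
    out_abs = os.path.abspath(out_path)
    paths = [p for p in paths if os.path.abspath(p) != out_abs]
    if not paths:
        # pd.concat([]) would otherwise fail with "No objects to concatenate"
        raise FileNotFoundError(f"no .csv files found in {folder!r}")
    frames = [pd.read_csv(p, encoding="utf-8") for p in paths]
    merged = pd.concat(frames, ignore_index=True)
    # index=False: don't write the 0..n-1 row labels as an extra column
    merged.to_csv(out_path, index=False)
    return merged
```

For example, `merge_csv_folder("C:/Users/Desktop/transcripts", "C:/Users/Desktop/merged.csv")`, where both paths are placeholders for your own folder and output file.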
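Note: rather than picking up all the CSV files from your Desktop, I'd suggest saving them into a dedicated folder. That also helps if you want to analyze that particular dataset again in the future.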
Prerequisite for this solution: all the CSV files you want to merge must be in the same directory.
# Import the required libraries
# 'os' provides portable access to operating-system functionality; here os.path.join builds the search pattern
import os
# 'glob' finds all pathnames matching a pattern, following Unix shell rules; '*.csv' matches every CSV file
import glob
# 'pandas' is a fast, powerful, flexible, and easy to use open-source data analysis and manipulation tool
import pandas as pd
# Folder that contains the CSV files to merge
path = "C:/Users/Desktop"
# Collect the paths of all CSV files in that folder; os.path.join combines the folder and the '*.csv' pattern
all_files = glob.glob(os.path.join(path, "*.csv"))
# Generator expression that reads each CSV file into a DataFrame as pd.concat iterates over it
df_from_each_file = (pd.read_csv(csvfiles) for csvfiles in all_files)
# If the files use a different delimiter, pass it with 'sep', e.g. pd.read_csv(csvfiles, sep=';') or sep='\t'
# Concatenate all the DataFrames with pd.concat(); ignore_index=True renumbers the rows 0..n-1
df_merged = pd.concat(df_from_each_file, ignore_index=True)
# Write the merged data to 'merged.csv' in the current working directory; index=False keeps the row numbers out of the file
df_merged.to_csv("merged.csv", index=False)
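`pd.concat` lines columns up by name, so the rows only stack under one shared header if every file has the same column names. If one file spells a column differently (say `Compound` instead of `compound`), `pd.concat`'s default outer join keeps both columns and fills the missing cells with NaN rather than raising an error. A minimal sketch of a pre-check, using a hypothetical helper `find_header_mismatches`, that reads only the header of each file and reports the ones whose column names differ from the first file's. Column order is ignored, since `pd.concat` aligns by name anyway.

```python
import pandas as pd


def find_header_mismatches(paths):
    """Hypothetical helper: return {path: columns} for every file whose set of
    column names differs from that of the first file in `paths`."""
    if not paths:
        return {}
    # nrows=0 parses only the header line, so this stays cheap for hundreds of files
    headers = {p: list(pd.read_csv(p, nrows=0).columns) for p in paths}
    reference = set(headers[paths[0]])
    return {p: cols for p, cols in headers.items() if set(cols) != reference}
```

Running it on `all_files` before the `pd.concat` call and getting an empty dict back means every file shares the first file's columns.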