How to merge some CSV files into one DataFrame?
I have several CSV files of stock quotes, all with exactly the same structure (the time frame is one day):
date,open,high,low,close
2001-10-15 00:00:00 UTC,56.11,59.8,55.0,57.9
2001-10-22 00:00:00 UTC,57.9,63.63,56.88,62.18
I want to merge them all into one DataFrame, keeping only a close-price column for each stock. The problem is that the files have different historical depths (they start on different dates in different years), and I want to align them by date in a single DataFrame.

I'm trying to run the following code, but I get garbage in the resulting df:
files = ['FB', 'MSFT', 'GM', 'IBM']
stock_d = {}
for file in files:  # reading all files into one dictionary
    stock_d[file] = pd.read_csv(file + '.csv', parse_dates=['date'])
date_column = pd.Series()  # the column with all dates from all CSVs
for stock in stock_d:
    date_column = date_column.append(stock_d[stock]['date'])
date_column = date_column.drop_duplicates().sort_values(ignore_index=True)  # keep only unique values, then sort by date
df = pd.DataFrame(date_column, columns=['date'])  # creating the final DataFrame
for stock in stock_d:
    stock_df = stock_d[stock]  # one of the CSV files, for example FB.csv
    df[stock] = [stock_df.iloc[stock_df.index[stock_df['date'] == date]]['close'] for date in date_column]  # for each date in date_column add the close price, or None if the date is not found
print(df.tail())  # something strange here - Series objects in every column
The idea was to first collect all the dates from every file, and then assign the close prices by column and date. But clearly I'm doing something wrong.
Can you help me?
If I understand you correctly, what you're looking for is a pivot operation:
files = ['FB', 'MSFT', 'GM', 'IBM']
df = []  # this is a list, not a dictionary
for file in files:
    # You only care about the date and the closing price,
    # so keep only those 2 columns to save memory
    tmp = pd.read_csv(file + '.csv', parse_dates=['date'], usecols=['date', 'close']).assign(symbol=file)
    df.append(tmp)
# A single `concat` is faster than sequential `append`s
df = pd.concat(df).pivot(index='date', columns='symbol')
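A minimal, self-contained sketch of the same approach with made-up in-memory quotes instead of the CSV files (the symbols and prices here are invented for illustration). Passing `values='close'` to `pivot` gives flat column labels instead of a `('close', symbol)` MultiIndex, and dates that a symbol lacks automatically become NaN, which handles the different historical depths:

```python
import pandas as pd

# Hypothetical quotes standing in for the per-symbol CSV files;
# note the two symbols start on different dates.
data = {
    'FB':   pd.DataFrame({'date': pd.to_datetime(['2001-10-15', '2001-10-22']),
                          'close': [57.9, 62.18]}),
    'MSFT': pd.DataFrame({'date': pd.to_datetime(['2001-10-22']),
                          'close': [30.0]}),
}

# Tag each frame with its symbol, stack them, then pivot wide:
# one row per date, one close-price column per symbol.
frames = [df.assign(symbol=sym) for sym, df in data.items()]
wide = pd.concat(frames).pivot(index='date', columns='symbol', values='close')
print(wide)
```

The 2001-10-15 row holds FB's close and NaN for MSFT, since MSFT's history starts later.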