How to read in multiple files as separate dataframes and perform calculations on a column?

Question

I am calculating the return of a single stock as follows:

data = pd.read_csv(r'**file**.csv')
data.index = data.Date
data['Return %'] = data['AAPL'].pct_change(-1)*100
data

Output:

    Date    AAPL    Return %
Date
2020-09-11  2020-09-11  56.00   0.000000
2020-09-10  2020-09-10  56.00   -3.879162
2020-09-09  2020-09-09  58.26   2.138850
2020-09-08  2020-09-08  57.04   -2.211555
2020-09-04  2020-09-04  58.33   0.882048
2020-09-03  2020-09-03  57.82   -3.585126
2020-09-02  2020-09-02  59.97   -0.133222

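Now, I have many other csv files saved under their stock symbols, and I would like to use each of these symbols to perform the same calculation as above. On top of that, I would like to print a report of the best day for each symbol's returns.

Please let me know if more details are needed.

Thanks in advance!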

Answer 1

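I think the best option for your data is to read the files into a dictionary of dataframes.
- Use pathlib and .glob to collect all the files.
- Use a dict comprehension to create the dict of dataframes.
The dictionary can be iterated over in the standard way for dictionaries, with dict.items().
df_dict[k] addresses each dataframe, where k is the dict key, which is the file name without its extension (f.stem).
Based on your last question, I expect the .csv file to be read in with one Date column, not two.
After Date is set as the index, the numeric data for each file should be in the column at index 0.
- Since the column name is different for each file, it is better to use .iloc to address the column.
- : means all rows, and 0 is the column index of the numeric data.
df_dict.keys() returns a view of all the keys.
Access an individual dataframe with df_dict[key].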

import pandas as pd
from pathlib import Path

# create the path to the files
p = Path('c:/Users/<<user_name>>/Documents/stock_files')

# get all the files
files = p.glob('*.csv')

# create the dict of dataframes, keyed by file name without extension
df_dict = {f.stem: pd.read_csv(f, parse_dates=['Date'], index_col='Date') for f in files}

# apply calculations to each dataframe and update the dataframe
# since the stock data is in column 0 of each dataframe, use .iloc
for k, df in df_dict.items():
    df_dict[k]['Return %'] = df.iloc[:, 0].pct_change(-1)*100
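
The question also asks for a report of the best day for each symbol's returns, which the loop above does not produce. One possible way to build it is sketched below, assuming each dataframe already has the 'Return %' column and a Date index. The helper names best_day_report and make_frame, the symbol XYZ, and its prices are hypothetical and only stand in for real files; the AAPL prices are the first rows from the question.

```python
import pandas as pd


def best_day_report(df_dict):
    """Build one row per symbol: the date and value of its highest 'Return %'."""
    rows = []
    for symbol, df in df_dict.items():
        returns = df['Return %'].dropna()
        if returns.empty:
            continue  # no usable returns for this symbol
        rows.append({
            'Symbol': symbol,
            'Best Date': returns.idxmax(),  # index label (date) of the maximum
            'Return %': returns.max(),
        })
    report = pd.DataFrame(rows, columns=['Symbol', 'Best Date', 'Return %'])
    return report.set_index('Symbol')


def make_frame(symbol, dates, prices):
    """Hypothetical helper: mimic one csv file after read_csv and the return calculation."""
    df = pd.DataFrame({symbol: prices}, index=pd.to_datetime(dates))
    df.index.name = 'Date'
    df['Return %'] = df.iloc[:, 0].pct_change(-1) * 100
    return df


# toy data in place of the csv files (XYZ and its prices are made up)
demo = {
    'AAPL': make_frame('AAPL',
                       ['2020-09-11', '2020-09-10', '2020-09-09', '2020-09-08'],
                       [56.00, 56.00, 58.26, 57.04]),
    'XYZ': make_frame('XYZ',
                      ['2020-09-11', '2020-09-10', '2020-09-09'],
                      [10.0, 9.0, 9.5]),
}

report = best_day_report(demo)
print(report)
```

With real files, passing df_dict from above in place of demo should give the same kind of table. If several days tie for the highest return, idxmax reports only the first one it encounters.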
