How to resolve IndexError and how to save 3 values computed in a for loop to an array/.csv?

Question

I imported a file with pandas. The data looks like this:

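I wrote code that takes the 'open' value on the first day of each year as start_open and the 'open' value on the last day of each year as end_open, for every year from 1993 through 2021. My code is below: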

import pandas as pd
df = pd.read_csv(r'C:\Users\Shivank Chadda\Desktop\Data Analysis\BATS_SPY, 1D.csv')
df['time'] = pd.to_datetime(df['time'],unit='s').dt.normalize()
df['year'] = pd.DatetimeIndex(df['time']).year
sub_df=df[['year','open']]
n=1993
for i in sub_df['year']:
      sub_93 = sub_df[(sub_df['year']==n) & (sub_df['year']<2022)]
      start_open=sub_93.iloc[0]['open']
      end_open=sub_93.iloc[-1]['open']
      per= ((end_open-start_open)/start_open)*100
      print('The value at the start of the year',n,'is:',start_open,'\nThe value at the end of year',n,' is:',end_open)
      n+=1
      i+=1

The code prints the following:

The value at the start of the year 1993 is: 43.9688 

The value at the end of year 1993  is: 46.9375

The value at the start of the year 1994 is: 46.59375 

The value at the end of year 1994  is: 46.20312

The value at the start of the year 1995 is: 45.70312 

The value at the end of year 1995  is: 61.46875

The value at the start of the year 1996 is: 61.40625 

The value at the end of year 1996  is: 75.28125

The value at the start of the year 1997 is: 74.375 

The value at the end of year 1997  is: 96.875

(and so on through 2021)

After that, the following error occurs:

  File "C:\Users\Shivank Chadda\Desktop\Data Analysis\untitled7.py", line 16, in <module>
    start_open=sub_93.iloc[0]['open']

  File "C:\Users\Shivank Chadda\anaconda3\lib\site-packages\pandas\core\indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)

  File "C:\Users\Shivank Chadda\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1496, in _getitem_axis
    self._validate_integer(key, axis)

  File "C:\Users\Shivank Chadda\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1437, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")

IndexError: single positional indexer is out-of-bounds

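I have two questions:

(1) How do I fix this error?

(2) Instead of printing sentences, I want an array containing the year, start_open, end_open and the percentage. If possible, I would also like to save the collected data as a .csv file.

Please tell me what I should do next.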

Answer 1

I can't test it, but the error shows that the IndexError is raised in the line

start_open = sub_93.iloc[0]['open']

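so you probably get an empty sub_93, which has no [0] (and no [-1]). Most likely this happens because for i in sub_df['year'] runs once per row (once per trading day), not once per year, so n keeps growing past 2021 and the filter then selects nothing.

You should check for that and skip the calculation: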
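A minimal sketch of that failure mode, using a small hypothetical sub_df instead of the asker's file: the loop runs once per row, so n moves past the last year in the data, and .iloc[0] on the empty selection raises IndexError.

```python
import pandas as pd

# Hypothetical data: three trading days in each of two years
sub_df = pd.DataFrame({
    "year": [1993, 1993, 1993, 1994, 1994, 1994],
    "open": [43.97, 45.10, 46.94, 46.59, 45.80, 46.20],
})

n = 1993
empty_years = []
for i in sub_df["year"]:  # one iteration per ROW (6 here), not per year (2)
    sub_93 = sub_df[(sub_df["year"] == n) & (sub_df["year"] < 2022)]
    try:
        start_open = sub_93.iloc[0]["open"]
    except IndexError:
        # n is past the last year in the data, so sub_93 has no rows
        empty_years.append(n)
    n += 1

print("Years with an empty selection:", empty_years)
```

With 6 rows but only 2 years, the last 4 iterations select nothing.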

# - inside the `for`-loop -

sub_93 = sub_df[(sub_df['year'] == n) & (sub_df['year'] < 2022)]

# an empty selection has no row 0 or -1, so iloc would raise IndexError
if len(sub_93) == 0:
    print('No data for year', n)
else:
    start_open = sub_93.iloc[0]['open']
    end_open = sub_93.iloc[-1]['open']
    per = ((end_open - start_open) / start_open) * 100
    print('The value at the start of the year', n, 'is:', start_open, '\nThe value at the end of year', n, 'is:', end_open)

n += 1
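Another option, which uses a different technique from the length check (it is not part of this answer), is to loop over the distinct years with Series.unique(). Every selection then has at least one row, and the manual n += 1 is no longer needed. The data below is a hypothetical stand-in for the asker's sub_df:

```python
import pandas as pd

# Hypothetical stand-in for the asker's sub_df (columns 'year' and 'open')
sub_df = pd.DataFrame({
    "year": [1993, 1993, 1994, 1994, 2022],
    "open": [43.9688, 46.9375, 46.59375, 46.20312, 477.0],
})

# Keep the original cut-off: only years before 2022
years = [y for y in sorted(sub_df["year"].unique()) if y < 2022]

for n in years:
    sub = sub_df[sub_df["year"] == n]  # never empty: n comes from the data itself
    start_open = sub.iloc[0]["open"]
    end_open = sub.iloc[-1]["open"]
    per = ((end_open - start_open) / start_open) * 100
    print("Year", n, "start:", start_open, "end:", end_open, "change %:", round(per, 2))
```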

EDIT:

The second question (creating a list) looked so simple that I didn't even think about it.

Before the for-loop, create a list: results = []
Inside the for-loop, append the values: results.append([n, start_open, end_open, per]) (in your code the year is n and the percentage is per)
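To illustrate, here is the same pattern applied to the 1993 and 1994 values printed in the question. They are hard-coded in a hypothetical dict named yearly, since the original file is not available:

```python
# Values copied from the question's printed output for 1993 and 1994
yearly = {1993: (43.9688, 46.9375), 1994: (46.59375, 46.20312)}

results = []  # created before the loop
for n, (start_open, end_open) in yearly.items():
    per = ((end_open - start_open) / start_open) * 100
    results.append([n, start_open, end_open, per])  # one sub-list per year

print(results)
```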

You will get a list of sub-lists.

You can convert it to a pandas.DataFrame and save it as a CSV file:

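# - before `for`-loop -

results = []

# - `for`-loop -

for i in sub_df['year']:
    # ... code ...

    # inside the `else` branch, after computing per
    results.append([n, start_open, end_open, per])

# - after `for`-loop -

# pd.DataFrame takes the column names through `columns` (it has no `header` parameter)
df_results = pd.DataFrame(results, columns=["Year", "Start", "End", "Percentage"])

# index=False leaves out the 0, 1, 2, ... row index column
#df_results.to_csv("output.csv", index=False)
df_results.to_csv("output.csv")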
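If a loop is not required, a groupby with the "first" and "last" aggregations (a swapped-in technique, not the approach used above) produces the same table in one step. This sketch assumes the rows are already in date order, as they appear to be in the question, and uses hypothetical data. It writes to an in-memory buffer so the CSV text can be printed; summary.to_csv("output.csv", index=False) would write a file instead.

```python
import io

import pandas as pd

# Hypothetical daily data, already sorted by date
sub_df = pd.DataFrame({
    "year": [1993, 1993, 1993, 1994, 1994],
    "open": [43.9688, 45.0, 46.9375, 46.59375, 46.20312],
})

# First and last 'open' of each year (relies on the rows being in date order)
summary = (
    sub_df[sub_df["year"] < 2022]
    .groupby("year")["open"]
    .agg(["first", "last"])
    .rename(columns={"first": "Start", "last": "End"})
)
summary["Percentage"] = (summary["End"] - summary["Start"]) / summary["Start"] * 100
summary = summary.reset_index().rename(columns={"year": "Year"})

# Write CSV text to a buffer instead of a file
buffer = io.StringIO()
summary.to_csv(buffer, index=False)
print(buffer.getvalue())
```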


Tags: python, arrays, numpy, data-analysis, pandas