Pandas SetIndex with DatetimeIndex

Question

I have a CSV file with the following contents:

Symbol, Date, Unix_Tick, OpenPrice, HighPrice, LowPrice, ClosePrice, volume,
AAPL, 2021-01-04 09:00:00, 1609750800, 133.31, 133.49, 133.02, 133.49, 25000
AAPL, 2021-01-04 09:01:00, 1609750860, 133.49, 133.49, 133.49, 133.49, 700
AAPL, 2021-01-04 09:02:00, 1609750920, 133.6, 133.6, 133.5, 133.5, 500

So I tried to create a pandas index from the Date column like this:

import pandas as pd
import numpy as np

df = pd.read_csv(csvFile)
df = df.set_index(pd.DatetimeIndex(df["Date"]))

I get KeyError: 'Date'

Answer 1

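The problem is most likely the space after each comma: the column is read as " Date" (with a leading space), not "Date". You can try loading the data with a custom sep= parameter: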

df = pd.read_csv("a1.txt", sep=r",\s+", engine="python")
df = df.set_index(pd.DatetimeIndex(df["Date"]))
print(df)

Prints:

                    Symbol                 Date   Unix_Tick  OpenPrice  HighPrice  LowPrice  ClosePrice  volume,
Date                                                                                                            
2021-01-04 09:00:00   AAPL  2021-01-04 09:00:00  1609750800     133.31     133.49    133.02      133.49    25000
2021-01-04 09:01:00   AAPL  2021-01-04 09:01:00  1609750860     133.49     133.49    133.49      133.49      700
2021-01-04 09:02:00   AAPL  2021-01-04 09:02:00  1609750920     133.60     133.60    133.50      133.50      500
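
To confirm this diagnosis, it can help to look at the column names that a plain read_csv call produces. The following is a minimal sketch, not part of the original answer: the `raw` string is an in-memory copy of the sample rows from the question, and `io.StringIO` stands in for the real file.

```python
import io

import pandas as pd

# In-memory copy of the sample data from the question (stand-in for the real file).
raw = (
    "Symbol, Date, Unix_Tick, OpenPrice, HighPrice, LowPrice, ClosePrice, volume,\n"
    "AAPL, 2021-01-04 09:00:00, 1609750800, 133.31, 133.49, 133.02, 133.49, 25000\n"
    "AAPL, 2021-01-04 09:01:00, 1609750860, 133.49, 133.49, 133.49, 133.49, 700\n"
    "AAPL, 2021-01-04 09:02:00, 1609750920, 133.6, 133.6, 133.5, 133.5, 500\n"
)

# Default parsing splits on "," only, so the space after each comma
# stays attached to the following column name.
df = pd.read_csv(io.StringIO(raw))
columns = df.columns.tolist()

# repr() makes the leading whitespace visible.
print([repr(name) for name in columns])

print("Date" in columns)   # the key the question used
print(" Date" in columns)  # the key that actually exists
```

If the second check is true and the first is false, the KeyError comes from the leading space rather than from a missing column.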

Answer 2

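That's because the file is not strictly comma-separated; its fields are separated by a comma followed by a space.

You can strip the spaces from the column names: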

df = pd.read_csv(csvFile)

df.columns = df.columns.str.strip()

df = df.set_index(pd.DatetimeIndex(df["Date"]))

Or read the CSV file with the separator ", ":

df = pd.read_csv(csvFile, sep=", ", engine="python")  # multi-character separators need the python engine

df = df.set_index(pd.DatetimeIndex(df["Date"]))
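
A third option, not mentioned in the answers above, is to let read_csv drop the space after each delimiter with skipinitialspace=True and build the index while reading. This is a hedged sketch: the `raw` string is an in-memory copy of the question's sample data standing in for the real file, and the dropna step assumes the header's trailing comma yields one extra, entirely empty column.

```python
import io

import pandas as pd

# In-memory copy of the sample data from the question (stand-in for the real file).
raw = (
    "Symbol, Date, Unix_Tick, OpenPrice, HighPrice, LowPrice, ClosePrice, volume,\n"
    "AAPL, 2021-01-04 09:00:00, 1609750800, 133.31, 133.49, 133.02, 133.49, 25000\n"
    "AAPL, 2021-01-04 09:01:00, 1609750860, 133.49, 133.49, 133.49, 133.49, 700\n"
    "AAPL, 2021-01-04 09:02:00, 1609750920, 133.6, 133.6, 133.5, 133.5, 500\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    skipinitialspace=True,  # ignore the space that follows each comma
    parse_dates=["Date"],   # convert the Date column to datetime64 while reading
    index_col="Date",       # use it as the index, giving a DatetimeIndex
)

# Assumption: the trailing comma in the header creates an all-NaN column; drop it.
df = df.dropna(axis="columns", how="all")

print(type(df.index).__name__)
print(df.columns.tolist())
```

Unlike set_index(pd.DatetimeIndex(df["Date"])), this does not keep a second copy of Date as a regular column.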
