Python: pandas Data Frame and meaning of {Series}0 in debugger

Question

I am using pandas in Python 2.7 and reading a csv file like this:

import pandas as pd

df = pd.read_csv("test_file.csv")

df has a column titled 'rating' and a column titled 'review'. I do some manipulations on df, for example:

df3 = df[df['rating'] != 3]
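A minimal sketch of what this filter does to the index, using a small in-memory DataFrame in place of test_file.csv (the sample rows are hypothetical; only the column names come from the question): boolean filtering keeps the surviving rows' original index labels rather than renumbering them from 0.

```python
import pandas as pd

# Hypothetical stand-in for test_file.csv; only the column names come from the question.
df = pd.DataFrame({
    "rating": [3, 5, 1, 3],
    "review": ["meh", "great", "awful", "ok"],
})

# Keep only rows whose rating is not 3 (here rows 0 and 3 are dropped).
df3 = df[df["rating"] != 3]

# The filtered frame keeps the original labels of the surviving rows.
print(list(df.index))
print(list(df3.index))
```

Because row 0 had rating 3 in this sample, df3 starts at label 1, which matches the behaviour described next.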

Now, if I look at df['review'] and df3['review'] in the debugger, I see the following:

df['review'] = {Series}0
df3['review'] = {Series}1
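A sketch of one plausible reading of this display, assuming the debugger (not named in the question) shows the type in braces followed by the beginning of the object's text repr: a Series repr starts with its first index label, so the number after {Series} would just be that label. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "rating": [3, 5, 1, 3],
    "review": ["meh", "great", "awful", "ok"],
})
df3 = df[df["rating"] != 3]

# The first line of a Series repr begins with the first index label.
first_line_df = repr(df["review"]).splitlines()[0]
first_line_df3 = repr(df3["review"]).splitlines()[0]

# Under the assumption above, these leading labels are what the debugger shows.
print(first_line_df.split()[0])
print(first_line_df3.split()[0])
```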

Also, if I want to look at the first element of df['review'], I use:

df['review'][0]

That works, but if I do the same thing with df3, I get this error:

df3['review'][0]
{KeyError}0L

However, it seems I can do this:

df3['review'][1]

Can someone explain the difference?

Answer 1

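Indexing a Series with an integer is not the same as indexing a list. In particular, df['review'][0] does not get the first element of the "review" column; it gets the element whose index label is 0: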

In [4]: s = pd.Series(['a', 'b', 'c', 'd'], index=[1, 0, 2, 3])

In [5]: s
Out[5]:
1    a
0    b
2    c
3    d
dtype: object

In [6]: s[0]
Out[6]: 'b'

Presumably, when you created df3, you removed the row whose index label is 0. If you actually want the first element regardless of the index, use iloc:

In [7]: s.iloc[0]
Out[7]: 'a'
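A sketch of the usual options, again on hypothetical data shaped like the question's: .loc makes the label lookup explicit, .iloc looks up by position, and reset_index(drop=True) renumbers df3 from 0 so that position and label agree again.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "rating": [3, 5, 1, 3],
    "review": ["meh", "great", "awful", "ok"],
})
df3 = df[df["rating"] != 3]

# Explicit label-based lookup: label 1 is the first surviving row here.
by_label = df3["review"].loc[1]

# Explicit position-based lookup: the first row, whatever its label.
by_position = df3["review"].iloc[0]

# Renumber the index as 0..n-1; drop=True discards the old labels
# instead of keeping them as a new "index" column.
df3_renumbered = df3.reset_index(drop=True)
first_after_reset = df3_renumbered["review"].loc[0]

print(by_label, by_position, first_after_reset)
```

Using .loc or .iloc explicitly avoids relying on how plain [] interprets an integer, which depends on the index type.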

