How are iloc and loc different?

Question

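Can someone explain how these two methods of slicing are different?
I've seen the docs, and I've seen previous answers, but I still find myself unable to understand how the two differ. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.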

For example, say we want to get the first five rows of a DataFrame. How is it that these two work?

df.loc[:5]
df.iloc[:5]

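Can someone present cases where the distinction in uses is clearer?

Once upon a time, I also wanted to know how these two functions differ from df.ix[:5], but ix has been removed from pandas 1.0, so I don't care anymore.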

Answer 1

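Label vs. Location

The main distinction between the two methods is:

loc gets rows (and/or columns) with particular labels.
iloc gets rows (and/or columns) at integer locations.

To demonstrate, consider a Series s of characters with a non-monotonic integer index: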

>>> import pandas as pd
>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
>>> s
49    a
48    b
47    c
0     d
1     e
2     f

>>> s.loc[0]    # value at index label 0
'd'

>>> s.iloc[0]   # value at index location 0
'a'

>>> s.loc[0:1]  # rows at index labels between 0 and 1 (inclusive)
0    d
1    e

>>> s.iloc[0:1] # rows at index location between 0 and 1 (exclusive)
49    a

Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:

| `<object>` | Description | `s.loc[<object>]` | `s.iloc[<object>]` |
|---|---|---|---|
| `0` | single item | Value at index label `0` (the string `'d'`) | Value at index location 0 (the string `'a'`) |
| `0:1` | slice | Two rows (labels `0` and `1`) | One row (first row at location 0) |
| `1:47` | slice with out-of-bounds end | Zero rows (empty Series) | Five rows (location 1 onwards) |
| `1:47:-1` | slice with negative step | Three rows (labels `1` back to `47`) | Zero rows (empty Series) |
| `[2, 0]` | integer list | Two rows with given labels | Two rows with given locations |
| `s > 'e'` | Bool series (indicating which values have the property) | One row (containing `'f'`) | `NotImplementedError` |
| `(s>'e').values` | Bool array | One row (containing `'f'`) | Same as `loc` |
| `999` | int object not in index | `KeyError` | `IndexError` (out of bounds) |
| `-1` | int object not in index | `KeyError` | Returns last value in `s` |
| `lambda x: x.index[3]` | callable applied to the Series (here returning the label at position 3 of the index) | `s.loc[s.index[3]]` | `s.iloc[s.index[3]]` |
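
A few rows of the table can be checked directly. The sketch below rebuilds the same s and exercises the negative-step slice and the -1 lookup; the names by_label, by_position and loc_result are only illustrative.

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])

# loc treats 1 and 47 as labels and walks backwards from label 1 to label 47.
by_label = s.loc[1:47:-1]
# iloc treats 1 and 47 as positions; stepping backwards from 1 never reaches 47.
by_position = s.iloc[1:47:-1]

print(by_label.tolist())   # ['e', 'd', 'c']
print(len(by_position))    # 0

# -1 is not a label in this index, but it is a valid position (the last one).
try:
    s.loc[-1]
    loc_result = "found"
except KeyError:
    loc_result = "KeyError"

print(loc_result)          # KeyError
print(s.iloc[-1])          # f
```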

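loc's label-querying capabilities extend well beyond integer indexes, and it's worth highlighting a couple of additional examples.

Here's a Series where the index contains string objects: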

>>> s2 = pd.Series(s.index, index=s.values)
>>> s2
a    49
b    48
c    47
d     0
e     1
f     2

Since loc is label-based, it can fetch the first value in the Series using s2.loc['a']. It can also slice with non-integer objects:

>>> s2.loc['c':'e']  # all rows lying between 'c' and 'e' (inclusive)
c    47
d     0
e     1

For DateTime indexes, we don't need to pass the exact date/time to fetch by label. For example:

>>> s3 = pd.Series(list('abcde'), pd.date_range('now', periods=5, freq='M')) 
>>> s3
2021-01-31 16:41:31.879768    a
2021-02-28 16:41:31.879768    b
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
2021-05-31 16:41:31.879768    e

Then to fetch the row(s) for March/April 2021 we only need:

>>> s3.loc['2021-03':'2021-04']
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
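
Because 'now' makes the index above depend on when the code is run, here is a reproducible sketch of the same idea. The fixed month-end dates are stand-ins I chose to mirror the output shown above.

```python
import pandas as pd

# Fixed month-end timestamps standing in for the 'now'-based index above.
idx = pd.to_datetime(
    ["2021-01-31", "2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"]
)
s3 = pd.Series(list("abcde"), index=idx)

# Partial date strings act as label bounds; all of April is included.
spring = s3.loc["2021-03":"2021-04"]
print(spring.tolist())  # ['c', 'd']
```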

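Rows and Columns

loc and iloc work the same way with DataFrames as they do with Series. It's useful to note that both methods can address columns and rows together.

When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.

Consider the DataFrame defined below: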

>>> import numpy as np 
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),  
                      index=list('abcde'), 
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

Then for example:

>>> df.loc['c': , :'z']  # rows 'c' and onwards AND columns up to 'z'
    x   y   z
c  10  11  12
d  15  16  17
e  20  21  22

>>> df.iloc[:, 3]        # all rows, but only the column at index location 3
a     3
b     8
c    13
d    18
e    23

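Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of loc and iloc.

For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?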

>>> import numpy as np 
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),  
                      index=list('abcde'), 
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

We can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13

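get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.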
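
An alternative sketch goes the other way and turns the column positions into labels, so that loc can do the whole job. It assumes the same df as above and should produce the same result as the iloc version; the names via_iloc and via_loc are mine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

# Position-based: translate the row label 'c' into a position, then add 1
# because iloc excludes the end of the slice.
via_iloc = df.iloc[:df.index.get_loc('c') + 1, :4]

# Label-based: translate the first four column positions into labels;
# the row slice :'c' already includes 'c'.
via_loc = df.loc[:'c', df.columns[:4]]

print(via_iloc.equals(via_loc))  # True
```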

Answer 2

iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing

df.iloc[0]

or the last five rows by doing

df.iloc[-5:]

You can also use it on the columns. This retrieves the 3rd column:

df.iloc[:, 2]    # the : in the first position indicates all rows

You can combine them to get intersections of rows and columns:

df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)

On the other hand, .loc uses named indices. Let's set up a data frame with strings as row and column labels:

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

Then we can get the first row by

df.loc['a']     # equivalent to df.iloc[0]

and the last two rows of the 'date' column by

df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]

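and so on. Now, it's probably worth pointing out that the default row and column indices for a DataFrame are integers from 0, and in this case iloc and loc look almost interchangeable. This is why your examples seem equivalent. They are not quite the same, though: a loc slice includes its end label, so on a default index df.loc[:5] returns six rows (labels 0 through 5) while df.iloc[:5] returns five. If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.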
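
A minimal sketch of the question's two calls, first on a made-up ten-row frame with the default index and then on the same data with string labels. The frames df and df_str are hypothetical, and the exact exception type may vary between pandas versions, so the code only records its name.

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..9.
df = pd.DataFrame({"value": range(10)})

print(len(df.loc[:5]))   # 6 -> labels 0 through 5, end label included
print(len(df.iloc[:5]))  # 5 -> positions 0 through 4, end position excluded

# The same data with string labels: an integer is no longer a usable label.
df_str = pd.DataFrame({"value": range(10)}, index=list("abcdefghij"))
try:
    df_str.loc[:5]
    outcome = "no error"
except (TypeError, KeyError) as exc:
    outcome = type(exc).__name__

print(outcome)               # name of the exception that was raised
print(len(df_str.iloc[:5]))  # 5 -> iloc does not care about the labels
```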

Also, you can do column retrieval just by using the data frame's __getitem__:

df['time']    # equivalent to df.loc[:, 'time']

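Now suppose you want to mix positional and named indexing, for instance positions on the rows and names on the columns (to clarify, I mean selecting from our data frame, not creating a data frame with strings in the row index and integers in the column index). This used to be the job of .ix, but .ix was deprecated in pandas 0.20 and removed in pandas 1.0. The same selection can be written by converting the positions to labels first:

df.loc[df.index[:2], 'time']    # the first two rows of the 'time' column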

I think it's also worth mentioning that you can pass boolean vectors to the loc method as well. For example:

 b = [True, False, True]
 df.loc[b]

will return the 1st and 3rd rows of df. This is equivalent to df[b] for selection, but it can also be used for assigning via boolean vectors:

df.loc[b, 'name'] = 'Mary', 'John'
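
To make the assignment concrete, here is a small runnable sketch with the same empty frame. I pass the new values as a list rather than a bare tuple; that is my choice for the illustration.

```python
import pandas as pd

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])
b = [True, False, True]

# Selection: keeps the rows where the mask is True (rows 'a' and 'c').
print(df.loc[b].index.tolist())  # ['a', 'c']

# Assignment: one value per selected row, written into the 'name' column.
df.loc[b, 'name'] = ['Mary', 'John']
print(df['name'].tolist())       # row 'b' is left as a missing value
```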

Answer 3

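In my opinion, the accepted answer is confusing, since it uses a DataFrame with only missing values. I also do not like the term position-based for .iloc and instead prefer integer location, as it is much more descriptive and exactly what .iloc stands for. The key word is INTEGER: .iloc needs INTEGERS.

For more detail on subset selection, see my blog series.

.ix is deprecated and ambiguous and should never be used

Because .ix is deprecated, we will only focus on the differences between .loc and .iloc.

Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each row. Let's take a look at a sample DataFrame: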

df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                   'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height':[165, 70, 120, 80, 180, 172, 150],
                   'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

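In the displayed DataFrame, the labels are the column names and the index values. The labels age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina and Cornelia, are used for the index.

The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns, but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follow its name to make its selections.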

.loc selects data only by labels

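We will first talk about the .loc indexer, which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead default to just the integers from 0 to n-1, where n is the length of the DataFrame.

There are three different inputs you can use for .loc

A string
A list of strings
Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following .loc.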

df.loc['Penelope']

This returns the row of data as a Series

age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

Selecting multiple rows with .loc with a list of strings

df.loc[['Cornelia', 'Jane', 'Dean']]

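This returns a DataFrame with the rows in the order specified in the list.

Selecting multiple rows with .loc with slice notation

Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the return. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined but defaults to 1.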

df.loc['Aaron':'Dean']

Complex slices can be taken in the same manner as Python lists.
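
Since the outputs are not shown above, here is a sketch that prints just the index of each .loc selection. To keep it short it uses a trimmed copy of the sample DataFrame with only the age column; the stepped slice at the end is my own example of a more complex slice.

```python
import pandas as pd

# Trimmed copy of the sample frame: same index, only the 'age' column.
df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69]},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'],
)

# A list of labels returns the rows in the order the list asks for.
picked = df.loc[['Cornelia', 'Jane', 'Dean']]
print(picked.index.tolist())   # ['Cornelia', 'Jane', 'Dean']

# A label slice includes the stop label 'Dean'.
sliced = df.loc['Aaron':'Dean']
print(sliced.index.tolist())   # ['Aaron', 'Penelope', 'Dean']

# A step works as it does for lists: every third row from Jane to Cornelia.
stepped = df.loc['Jane':'Cornelia':3]
print(stepped.index.tolist())  # ['Jane', 'Penelope', 'Cornelia']
```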

.iloc selects data only by integer location

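Let's now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left, beginning at 0.

There are three different inputs you can use for .iloc

An integer
A list of integers
Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer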

df.iloc[4]

This returns the 5th row (integer location 4) as a Series

age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

Selecting multiple rows with .iloc with a list of integers

df.iloc[[2, -2]]

This returns a DataFrame of the third and second-to-last rows.

Selecting multiple rows with .iloc with slice notation

df.iloc[:5:3]
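
The two .iloc selections above can be checked with the same kind of sketch. It again uses a trimmed copy of the sample DataFrame (only the age column) and prints the index labels of the rows that come back.

```python
import pandas as pd

# Trimmed copy of the sample frame: same index, only the 'age' column.
df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69]},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'],
)

# Position 2 is the third row; position -2 is the second-to-last row.
pair = df.iloc[[2, -2]]
print(pair.index.tolist())     # ['Aaron', 'Christina']

# Positions 0 up to (not including) 5, taking every third one: 0 and 3.
stepped = df.iloc[:5:3]
print(stepped.index.tolist())  # ['Jane', 'Penelope']
```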

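Simultaneous selection of rows and columns with .loc and .iloc

One excellent ability of both .loc/.iloc is their ability to select both rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.

For example, we can select rows Jane and Dean with just the columns height, score and state like this: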

df.loc[['Jane', 'Dean'], 'height':]

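This uses a list of labels for the rows and slice notation for the columns.

We can naturally do similar operations with .iloc using only integers.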

df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object

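Simultaneous selection with labels and integer location

.ix was used to make selections simultaneously with labels and integer location, which was useful but confusing and ambiguous at times, and thankfully it has been deprecated. In the event that you need to make a selection with a mix of labels and integer locations, you will have to make both your selections labels or both integer locations.

For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following: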

col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names]

Or alternatively, convert the index labels to integers with the get_loc index method.

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]
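
Both routes should give the same frame, which the sketch below checks on the sample data. It also uses Index.get_indexer, which converts a whole list of labels to positions in one call; that substitution is mine, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
     'height': [165, 70, 120, 80, 180, 172, 150],
     'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'],
)
labels = ['Nick', 'Cornelia']

# Everything as labels: column positions 2 and 4 become 'food' and 'score'.
via_loc = df.loc[labels, df.columns[[2, 4]]]

# Everything as positions: row labels become positions 1 and 6.
via_iloc = df.iloc[df.index.get_indexer(labels), [2, 4]]

print(via_loc.columns.tolist())  # ['food', 'score']
print(via_loc.equals(via_iloc))  # True
```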

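Boolean Selection

The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the food and score columns, we can do the following: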

df.loc[df['age'] > 30, ['food', 'score']]

You can replicate this with .iloc, but you cannot pass it a boolean Series. You must convert the boolean Series into a numpy array like this:

df.iloc[(df['age'] > 30).values, [2, 4]]
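
As a quick check that the two spellings agree, the sketch below compares them on the sample data. It uses Series.to_numpy() for the conversion, which should behave like .values here; the variable names are mine.

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
     'height': [165, 70, 120, 80, 180, 172, 150],
     'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'],
)

mask = df['age'] > 30  # boolean Series aligned on the row labels

# loc accepts the boolean Series directly.
by_label = df.loc[mask, ['food', 'score']]
# iloc needs a plain boolean array plus integer column positions.
by_position = df.iloc[mask.to_numpy(), [2, 4]]

print(by_label.index.tolist())       # ['Dean', 'Christina', 'Cornelia']
print(by_label.equals(by_position))  # True
```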

Selecting all rows

It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

df.loc[:, 'color':'score':2]

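The indexing operator, `[]`, can select rows and columns too but not simultaneously.

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.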

df['food']

Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

Using a list selects multiple columns

df[['food', 'score']]

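What people are less familiar with is that, when slice notation is used, selection happens by row labels or by integer location. This is very confusing and something that I almost never use, but it does work.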

df['Penelope':'Christina'] # slice rows by label

df[2:6:2] # slice rows by integer location

The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

df[3:5, 'color']
TypeError: unhashable type: 'slice'
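
When rows by position and a column by label are needed together, one of the explicit indexers can be used instead. A sketch on the sample data; the two spellings below are my own suggestions for what the failing expression was probably after.

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
     'height': [165, 70, 120, 80, 180, 172, 150],
     'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'],
)

# All positions: look up where the 'color' column sits, then use iloc.
by_position = df.iloc[3:5, df.columns.get_loc('color')]

# All labels: turn row positions 3 and 4 into their labels, then use loc.
by_label = df.loc[df.index[3:5], 'color']

print(by_position.tolist())          # ['white', 'gray']
print(by_position.equals(by_label))  # True
```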

Answer 4

DataFrame.loc[] : select rows by index label
DataFrame.iloc[] : select rows by integer position

Example:

Select the first 5 rows of a table, where df1 is your DataFrame

df1.iloc[:5]

Select the rows labelled A and B of a table, where df1 is your DataFrame

df1.loc[['A', 'B']]
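
The brackets matter here: a comma-separated pair inside .loc means (row, column), while a list means several rows. A sketch with a made-up df1 whose index and columns both happen to contain 'A' and 'B':

```python
import pandas as pd

# Hypothetical frame: row labels A..C and column labels A..B.
df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                   index=['A', 'B', 'C'], columns=['A', 'B'])

cell = df1.loc['A', 'B']      # one value: row 'A', column 'B'
rows = df1.loc[['A', 'B']]    # two rows: 'A' and 'B', all columns

print(cell)                   # 2
print(rows.index.tolist())    # ['A', 'B']
```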

Answer 5

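.loc and .iloc are used for indexing, i.e., to pull out portions of data. In essence, the difference is that .loc allows label-based indexing, while .iloc allows position-based indexing.

If you get confused by .loc and .iloc, keep in mind that .iloc is based on the index (starting with i) position, while .loc is based on the label (starting with l).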

`.loc`

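.loc is supposed to be based on the index labels and not the positions, so it is analogous to Python dictionary-based indexing. However, it can accept boolean arrays, slices, and a list of labels (none of which work with a Python dictionary).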

`iloc`

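.iloc does the lookup based on index position, i.e., pandas behaves similarly to a Python list. pandas will raise an IndexError if there is no index at that location.

Examples

The following examples are presented to illustrate the differences between .iloc and .loc. Let's consider the following series: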

>>> s = pd.Series([11, 9], index=["1990", "1993"], name="Magic Numbers")
>>> s
1990    11
1993     9
Name: Magic Numbers, dtype: int64

.iloc Examples

>>> s.iloc[0]
11
>>> s.iloc[-1]
9
>>> s.iloc[4]
Traceback (most recent call last):
    ...
IndexError: single positional indexer is out-of-bounds
>>> s.iloc[0:3] # slice
1990    11
1993     9
Name: Magic Numbers, dtype: int64
>>> s.iloc[[0,1]] # list
1990    11
1993     9
Name: Magic Numbers, dtype: int64

.loc Examples

>>> s.loc['1990']
11
>>> s.loc['1970']
Traceback (most recent call last):
    ...
KeyError: '1970'
>>> mask = s > 9
>>> s.loc[mask]
1990    11
Name: Magic Numbers, dtype: int64
>>> s.loc['1990':] # slice
1990    11
1993     9
Name: Magic Numbers, dtype: int64

Since s has string index values, .loc will fail when indexing with an integer:

>>> s.loc[0]
Traceback (most recent call last):
    ...
KeyError: 0

Answer 6

This example will illustrate the difference:

df = pd.DataFrame({'col1': [1,2,3,4,5], 'col2': ["foo", "bar", "baz", "foobar", "foobaz"]})
   col1    col2
0     1     foo
1     2     bar
2     3     baz
3     4  foobar
4     5  foobaz

df = df.sort_values('col1', ascending = False)
   col1    col2
4     5  foobaz
3     4  foobar
2     3     baz
1     2     bar
0     1     foo

Position-based access (iloc):

df.iloc[0, 0:2]
col1         5
col2    foobaz
Name: 4, dtype: object

We get the first row of the sorted dataframe. (This is not the row with index 0, but the row with index 4.)

Label-based access (loc):

df.loc[0, 'col1':'col2']
col1      1
col2    foo
Name: 0, dtype: object

We get the row with index 0, even though the df has been sorted.
