Guidelines on using the pandas inplace keyword argument

Question

What are the guidelines on using inplace?

For example,

df = df.reset_index()

or

df.reset_index(inplace=True)

Are these the same, or do they differ in some way?

Answer 1

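In terms of the resulting DataFrame df, the two approaches are the same. The difference lies in the (peak) memory usage, since the in-place version does not create a copy of the DataFrame.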
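The equivalence of the results can be checked on a small frame. This is only a minimal sketch: the frame and the names `reassigned`, `modified`, and `result` are made up for illustration and are not part of the question.

```python
import pandas as pd

# Small hypothetical frame with a named index, for illustration only.
df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.Index(["a", "b", "c"], name="key"),
)

# Reassignment: reset_index() returns a new DataFrame.
reassigned = df.reset_index()

# In place: the call modifies its target and returns None.
modified = df.copy()
result = modified.reset_index(inplace=True)

print(result is None)               # True
print(reassigned.equals(modified))  # True
```

Because the in-place call returns None, combining both styles as `df = df.reset_index(inplace=True)` would leave `df` bound to None.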

Consider this setup:

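import numpy as np
import pandas as pd

def make_data():
    # 1,000,000 rows x 100 float64 columns, about 763 MiB of data
    return pd.DataFrame(np.random.rand(1000000, 100))

def func_copy():
    df = make_data()
    # Returns a new DataFrame, which is then rebound to df
    df = df.reset_index()

def func_inplace():
    df = make_data()
    # Modifies df directly and returns None
    df.reset_index(inplace=True)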

We can benchmark the memory usage with the memory_profiler library:

%load_ext memory_profiler

%memit func_copy()
# peak memory: 1602.66 MiB, increment: 1548.66 MiB

%memit func_inplace()
# peak memory: 817.02 MiB, increment: 762.94 MiB

As expected, the in-place version is more memory efficient.
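The %memit magic only works inside IPython. As a rough alternative for a plain Python 3 script, peak allocations can be sampled with the standard-library tracemalloc module. This is a sketch under assumptions: the `peak_mib` helper is hypothetical, the frame is deliberately smaller than the one above so it runs quickly, and tracemalloc only counts allocations reported to Python, so its numbers will not match memit's process-level figures. The gap between the two functions may also vary with the pandas version.

```python
import tracemalloc

import numpy as np
import pandas as pd

def make_data(rows=100_000, cols=20):
    # Assumed smaller size than the 1,000,000 x 100 frame in the answer.
    return pd.DataFrame(np.random.rand(rows, cols))

def func_copy():
    df = make_data()
    df = df.reset_index()

def func_inplace():
    df = make_data()
    df.reset_index(inplace=True)

def peak_mib(func):
    """Return the peak memory traced while func() runs, in MiB."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 2**20

for func in (func_copy, func_inplace):
    print(f"{func.__name__}: peak {peak_mib(func):.1f} MiB")
```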

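There also seems to be a non-trivial difference in run time between the two approaches when the data is sufficiently large (as in the example above):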

%timeit func_copy()
# 1 loops, best of 3: 2.56 s per loop

%timeit func_inplace()
# 1 loops, best of 3: 1.35 s per loop
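Outside IPython, a similar timing comparison can be sketched with the standard-library timeit module. The smaller frame size is an assumption made so the example finishes quickly, and absolute timings will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

def make_data(rows=100_000, cols=20):
    # Assumed smaller size than the 1,000,000 x 100 frame in the answer.
    return pd.DataFrame(np.random.rand(rows, cols))

def func_copy():
    df = make_data()
    df = df.reset_index()

def func_inplace():
    df = make_data()
    df.reset_index(inplace=True)

for func in (func_copy, func_inplace):
    # repeat() returns one total per run; report the fastest of three runs.
    best = min(timeit.repeat(func, number=1, repeat=3))
    print(f"{func.__name__}: best of 3: {best:.3f} s per loop")
```

Note that both functions include the cost of make_data(), as in the original benchmark, so the measured gap reflects only part of each total.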

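Depending on the use case (e.g. ad-hoc exploratory analysis vs. production code), the size of the data, and the hardware resources available, these differences may or may not be significant. In general, it may be a good idea to use the in-place version whenever possible for better memory and run-time efficiency.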
