The masking (filtering) of a pandas dataframe is too slow

Question

I have a dataframe with around 19000 rows and 3 columns (X, Y, Z), and I am trying to mask it so that I keep only the rows with X_max > X >= X_min, Y_max > Y >= Y_min, and Z_max > Z >= Z_min.

In this example,

df['X'] is 0.0, 0.1, 0.2, 0.3, ..., 5.0
df['Y'] is -3.0, -2.9, -2.8, ..., 3.0
df['Z'] is -2.0, -1.9, ..., -1.5

So the number of rows is 51 * 61 * 6 = 18666.
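For anyone who wants to reproduce the setup, a frame of this shape could be built roughly as follows. This is only a sketch: the use of np.linspace and itertools.product is an assumption about how such a grid might be generated, not necessarily how the original data was created.

```python
import itertools

import numpy as np
import pandas as pd

# Grid values matching the ranges described above (step 0.1 in each column).
x_values = np.linspace(0.0, 5.0, 51).round(1)
y_values = np.linspace(-3.0, 3.0, 61).round(1)
z_values = np.linspace(-2.0, -1.5, 6).round(1)

# One row per (X, Y, Z) combination.
df = pd.DataFrame(
    list(itertools.product(x_values, y_values, z_values)),
    columns=["X", "Y", "Z"],
)

print(len(df))    # number of rows: 51 * 61 * 6
print(df.dtypes)  # dtype of each column
```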

When I create a single mask condition, it takes about 1 second.

cond1 = df['X']>=X_min

I have 6 conditions as shown below, and creating all 6 takes about 3-3.5 seconds.

from time import time

start1 = time()
cond1 = df['X']>=X_min
cond2 = df['X']<X_max
cond3 = df['Y']>=Y_min
cond4 = df['Y']<Y_max
cond5 = df['Z']>=Z_min
cond6 = df['Z']<Z_max
finish1 = time()
print(finish1 - start1)  # this is about 3-3.5 sec

start2 = time()
df2 = df[cond1&cond2&cond3&cond4&cond5&cond6]  # this step does not take long
finish2 = time()
print(finish2 - start2)  # this is about 0.002 sec

By the way, the code below takes a similar amount of time (3-3.5 seconds).

df2 = df[(df['X']>=X_min)&(df['X']<X_max)&(df['Y']>=Y_min)&(df['Y']<Y_max)&(df['Z']>=Z_min)&(df['Z']<Z_max)]

How can I improve the speed? Can I make it faster while still keeping a pandas dataframe?

Answer 1

The pandas DataFrame.query method is often faster than ordinary boolean indexing.
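As a rough sketch of what that could look like for the filter in the question: the data and the bound values below are made up for illustration, and whether query is actually faster on a frame of this size is something to measure rather than assume.

```python
import numpy as np
import pandas as pd

# Hypothetical data and bounds, for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(-3.0, 5.0, size=(1000, 3)), columns=["X", "Y", "Z"])
X_min, X_max = 1.0, 2.0
Y_min, Y_max = -1.0, 1.0
Z_min, Z_max = -2.0, -1.5

# Inside the query string, '@name' refers to a Python variable in scope.
df2 = df.query(
    "X >= @X_min and X < @X_max and "
    "Y >= @Y_min and Y < @Y_max and "
    "Z >= @Z_min and Z < @Z_max"
)

# The same filter with ordinary boolean indexing, for comparison.
expected = df[
    (df["X"] >= X_min) & (df["X"] < X_max)
    & (df["Y"] >= Y_min) & (df["Y"] < Y_max)
    & (df["Z"] >= Z_min) & (df["Z"] < Z_max)
]
print(df2.equals(expected))
```

Timing both versions on the real frame (for example with timeit) would show whether the difference matters here.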

Answer 2

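You may want to run df.info() to double-check the dtypes of the columns. Comparisons on numeric columns should be much faster. If the columns hold strings, they will be much slower.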
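A minimal sketch of that check, assuming the values were accidentally stored as strings. The small frame below is hypothetical, and pd.to_numeric is one possible way to convert it, not necessarily what the real data needs.

```python
import pandas as pd

# Hypothetical frame whose numbers were read in as strings.
df = pd.DataFrame(
    {
        "X": ["0.0", "0.1", "0.2"],
        "Y": ["-3.0", "-2.9", "-2.8"],
        "Z": ["-2.0", "-1.9", "-1.8"],
    }
)

df.info()  # a non-numeric dtype (object or string) here is the warning sign

# Convert every column to a numeric dtype; unparseable values raise an error.
df = df.apply(pd.to_numeric)

print(df.dtypes)

# Comparisons now run on numeric data instead of Python string objects.
mask = df["X"] >= 0.1
```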
