Subset pandas df using concatenation of column indices slices

Question

I have a large dataframe that I am trying to subset using only column indices. I am using the following code:

df = df.ix[:, [3,21:28,30:34,36:57,61:64,67:]]

The code is fairly self-explanatory: I am trying to subset df by keeping column 3, columns 21 to 28, and so on. However, I get the following error:

  File "<ipython-input-44-3108b602b220>", line 1
  df = df.ix[:, [3,21:28,30:34,36:57,61:64,67:]]
                     ^
  SyntaxError: invalid syntax

What am I missing?

Answer 1

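Slice notation such as 21:28 is only valid directly inside an indexing bracket, not inside a list literal, which is why Python raises a SyntaxError. Also note that .ix has been deprecated and later removed from pandas, so use .iloc for positional indexing.

Use numpy.r_[...] (with numpy imported as np) to concatenate the slices into a single integer array: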

df = df.iloc[:, np.r_[3,21:28,30:34,36:57,61:64,67:df.shape[1]]]

Demo:

In [39]: df = pd.DataFrame(np.random.randint(5, size=(2, 100)))

In [40]: df
Out[40]:
   0   1   2   3   4   5   6   7   8   9  ...  90  91  92  93  94  95  96  97  98  99
0   3   1   0   3   2   4   1   2   1   3 ...   2   1   4   2   1   2   1   3   3   4
1   0   2   4   1   1   1   0   0   3   4 ...   4   4   0   3   2   3   0   2   0   1

[2 rows x 100 columns]

In [41]: df.iloc[:, np.r_[3,21:28,30:34,36:57,61:64,67:df.shape[1]]]
Out[41]:
   3   21  22  23  24  25  26  27  30  31 ...  90  91  92  93  94  95  96  97  98  99
0   3   4   1   2   0   3   0   3   2   2 ...   2   1   4   2   1   2   1   3   3   4
1   1   1   0   2   1   4   4   4   1   3 ...   4   4   0   3   2   3   0   2   0   1

[2 rows x 69 columns]
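
As a self-contained sketch of the same idea, the snippet below builds the index array once and compares it with a version that uses only plain Python ranges. The 2 x 100 frame and the names ncols, cols, cols_plain and subset are illustrative assumptions, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 100 frame, mirroring the demo above.
df = pd.DataFrame(np.random.randint(5, size=(2, 100)))
ncols = df.shape[1]

# np.r_ turns each slice into explicit integers and concatenates them.
# It does not know the frame's width, so the last stop is spelled out.
cols = np.r_[3, 21:28, 30:34, 36:57, 61:64, 67:ncols]

# The same positions without NumPy: unpack ranges into a list.
cols_plain = [3, *range(21, 28), *range(30, 34), *range(36, 57),
              *range(61, 64), *range(67, ncols)]

# Positional selection of the chosen columns.
subset = df.iloc[:, cols]
print(subset.shape)
```

Either cols or cols_plain can be passed to .iloc; np.r_ is simply the more compact way to write it when NumPy is already imported.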
