Need to extract or remove columns in Python (pandas)

I have a list like the following:

    categorical_features = \
    ['FireplaceQu', 'BsmtQual', 'BsmtCond', 'GarageQual', 'GarageCond', 
     'ExterQual', 'ExterCond','HeatingQC', 'PoolQC', 'KitchenQual', 'BsmtFinType1', 
     'BsmtFinType2', 'Functional', 'Fence', 'BsmtExposure', 'GarageFinish', 'LandSlope',
     'LotShape', 'PavedDrive', 'Street', 'Alley', 'CentralAir', 'MSSubClass', 'OverallQual',
     'OverallCond', 'YrSold', 'MoSold']

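I need to remove these columns from the dataset, and I tried doing it like this:

    all_data = all_data.loc[:, categorical_features]

Unfortunately, this only selects these columns. How can I reverse it so that these columns are excluded instead?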

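I suggest computing the columns you want to keep instead; that is easier:

    categorical_features = \
        ['FireplaceQu', 'BsmtQual', 'BsmtCond', 'GarageQual', 'GarageCond',
         'ExterQual', 'ExterCond', 'HeatingQC', 'PoolQC', 'KitchenQual', 'BsmtFinType1',
         'BsmtFinType2', 'Functional', 'Fence', 'BsmtExposure', 'GarageFinish', 'LandSlope',
         'LotShape', 'PavedDrive', 'Street', 'Alley', 'CentralAir', 'MSSubClass', 'OverallQual',
         'OverallCond', 'YrSold', 'MoSold']

    # Columns of all_data that are NOT in categorical_features.
    # Index.difference sorts the names by default. It returns an Index rather
    # than a set, because recent pandas versions reject a set as a .loc indexer.
    cols = all_data.columns.difference(categorical_features)

    all_data = all_data.loc[:, cols]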
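A minimal runnable check of the keep-the-rest idea, on a toy frame whose column names and values are made up for illustration (they stand in for the real `all_data`). It also shows an order-preserving alternative, since `Index.difference` sorts the remaining names by default:

```python
import pandas as pd

# Hypothetical stand-in for all_data; column names and values are illustrative only
all_data = pd.DataFrame({
    "LotArea": [8450, 9600],
    "Street": ["Pave", "Pave"],
    "YrSold": [2008, 2007],
    "GrLivArea": [1710, 1262],
})
categorical_features = ["Street", "YrSold"]

# Option 1: Index.difference -> remaining column names, sorted by default
cols = all_data.columns.difference(categorical_features)
print(list(all_data.loc[:, cols].columns))  # ['GrLivArea', 'LotArea']

# Option 2: list comprehension -> remaining names in their original order
excluded = set(categorical_features)  # a set only for fast membership tests
cols_ordered = [c for c in all_data.columns if c not in excluded]
print(list(all_data.loc[:, cols_ordered].columns))  # ['LotArea', 'GrLivArea']
```

Both results are valid `.loc` indexers; pick the list comprehension if the original column order matters to later steps.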

You can exclude these columns with `DataFrame.drop`:

    # axis=1 drops columns rather than rows
    all_data = all_data.drop(categorical_features, axis=1)
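One caveat worth checking: by default `drop` raises a `KeyError` if any listed name is not a column of the frame (for example, if a column was already removed earlier in the pipeline). The sketch below uses a made-up toy frame in which `PoolQC` is deliberately absent, and shows the `errors="ignore"` option together with the equivalent `columns=` keyword:

```python
import pandas as pd

# Toy frame (illustrative values); "PoolQC" is deliberately missing
all_data = pd.DataFrame({
    "LotArea": [8450, 9600],
    "Street": ["Pave", "Pave"],
    "YrSold": [2008, 2007],
})
categorical_features = ["Street", "YrSold", "PoolQC"]

# With the default errors="raise", a missing label causes a KeyError
try:
    all_data.drop(categorical_features, axis=1)
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# errors="ignore" skips labels that are not present;
# columns=... is equivalent to passing the labels with axis=1
trimmed = all_data.drop(columns=categorical_features, errors="ignore")
print(list(trimmed.columns))  # ['LotArea']
```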

Here is a small example to test it:

    import pandas as pd
    import numpy as np

    dates = pd.date_range('20130101', periods=6)

    df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

    print(df)

    # Drop columns B and D
    features = ['B', 'D']
    df = df.drop(features, axis=1)

    print(df)

Output (the values are random, so yours will differ):

                       A         B         C         D
    2013-01-01  1.365473 -0.445448  0.244377  0.416889
    2013-01-02 -0.307532  0.095569  1.356229 -0.306618
    2013-01-03  0.971216  1.100189  0.932189  0.808151
    2013-01-04 -0.030160 -0.796742 -0.383336 -0.409233
    2013-01-05  0.006601  0.093678 -1.013768  1.439921
    2013-01-06  0.560771 -0.452491  1.050500 -1.545958
                       A         C
    2013-01-01  1.365473  0.244377
    2013-01-02 -0.307532  1.356229
    2013-01-03  0.971216  0.932189
    2013-01-04 -0.030160 -0.383336
    2013-01-05  0.006601 -1.013768
    2013-01-06  0.560771  1.050500
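Another way to express "everything except these columns", sketched on the same kind of random frame: build a boolean mask with `Index.isin` and invert it with `~`. Unlike `drop` with its default settings, this does not raise for names that are missing from the frame; it simply keeps every column that is not listed.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

features = ["B", "D"]

# Boolean mask: True for every column NOT listed in features
mask = ~df.columns.isin(features)
kept = df.loc[:, mask]

print(kept.columns.tolist())  # ['A', 'C']
```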