Need to extract or remove columns in Python
I have a list that looks like this:
categorical_features = \
['FireplaceQu', 'BsmtQual', 'BsmtCond', 'GarageQual', 'GarageCond',
'ExterQual', 'ExterCond','HeatingQC', 'PoolQC', 'KitchenQual', 'BsmtFinType1',
'BsmtFinType2', 'Functional', 'Fence', 'BsmtExposure', 'GarageFinish', 'LandSlope',
'LotShape', 'PavedDrive', 'Street', 'Alley', 'CentralAir', 'MSSubClass', 'OverallQual',
'OverallCond', 'YrSold', 'MoSold']
I need to remove these columns from my dataset, and I tried doing it like this:
all_data = all_data.loc[:,categorical_features]
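Unfortunately, this step selects only those columns. How can I reverse the process so that they are excluded instead?
I suggest you compute the columns you want to keep instead; that is easier: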
categorical_features = \
['FireplaceQu', 'BsmtQual', 'BsmtCond', 'GarageQual', 'GarageCond',
'ExterQual', 'ExterCond','HeatingQC', 'PoolQC', 'KitchenQual', 'BsmtFinType1',
'BsmtFinType2', 'Functional', 'Fence', 'BsmtExposure', 'GarageFinish', 'LandSlope',
'LotShape', 'PavedDrive', 'Street', 'Alley', 'CentralAir', 'MSSubClass', 'OverallQual',
'OverallCond', 'YrSold', 'MoSold']
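# Build the list from all_data itself; a list keeps the original column order,
# whereas a set is unordered and recent pandas versions do not accept it as an indexer
cols = [c for c in all_data.columns if c not in categorical_features]
all_data = all_data.loc[:, cols]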
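The same "keep everything except these" selection can also be written with a boolean mask over the column labels. This is only a minimal sketch: the small `all_data` frame, its values, and the shortened `categorical_features` list are made up for illustration and stand in for the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for all_data, with made-up values
all_data = pd.DataFrame({
    "LotArea": [8450, 9600],
    "Street": ["Pave", "Pave"],
    "SalePrice": [208500, 181500],
    "Alley": [None, "Grvl"],
})

# Shortened list for the example; "PoolQC" is absent from the frame on purpose
categorical_features = ["Street", "Alley", "PoolQC"]

# True for every column whose label is NOT in the list
keep = ~all_data.columns.isin(categorical_features)

# Boolean selection preserves the original column order
remaining = all_data.loc[:, keep]
print(list(remaining.columns))  # ['LotArea', 'SalePrice']
```

Because `isin` only tests membership, labels in the list that do not exist in the frame are simply ignored rather than raising an error.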
You can use DataFrame.drop to exclude these columns:
all_data = all_data.drop(categorical_features, axis = 1)
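If some of the listed names might not be present in the frame, `drop` raises a `KeyError` by default. A hedged sketch of one way around that, using the `columns=` keyword together with `errors="ignore"`; the tiny frame and the shortened list here are invented for the example.

```python
import pandas as pd

# Hypothetical stand-in for all_data, with made-up values
all_data = pd.DataFrame({
    "LotArea": [8450, 9600],
    "Street": ["Pave", "Pave"],
    "Alley": [None, "Grvl"],
})

# "PoolQC" is not a column of this frame
categorical_features = ["Street", "Alley", "PoolQC"]

# columns= is equivalent to passing the labels with axis=1;
# errors="ignore" skips labels that are not found instead of raising KeyError
trimmed = all_data.drop(columns=categorical_features, errors="ignore")
print(list(trimmed.columns))  # ['LotArea']
```

Note that `drop` returns a new DataFrame and leaves `all_data` untouched, which is why the result has to be assigned.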
See the example below as a test:
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index = dates, columns = list('ABCD'))
print(df)
features = ['B', 'D']
df = df.drop(features, axis = 1)
print(df)
Output:
                   A         B         C         D
2013-01-01  1.365473 -0.445448  0.244377  0.416889
2013-01-02 -0.307532  0.095569  1.356229 -0.306618
2013-01-03  0.971216  1.100189  0.932189  0.808151
2013-01-04 -0.030160 -0.796742 -0.383336 -0.409233
2013-01-05  0.006601  0.093678 -1.013768  1.439921
2013-01-06  0.560771 -0.452491  1.050500 -1.545958

                   A         C
2013-01-01  1.365473  0.244377
2013-01-02 -0.307532  1.356229
2013-01-03  0.971216  0.932189
2013-01-04 -0.030160 -0.383336
2013-01-05  0.006601 -1.013768
2013-01-06  0.560771  1.050500