"ValueError: labels ['timestamp'] not contained in axis" error

Question

I have this code, and I want to remove the column 'timestamp' from the file u.data, but I can't. It shows the error
"ValueError: labels ['timestamp'] not contained in axis". How can I correct it?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
plt.rc("font", size=14)
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.cross_validation import KFold
from sklearn.cross_validation import train_test_split



data = pd.read_table('u.data')
data.columns=['userID', 'itemID','rating', 'timestamp']
data.drop('timestamp', axis=1)


N = len(data)
print data.shape
print list(data.columns)
print data.head(10)

Answer 1

"ValueError: labels ['timestamp'] not contained in axis"

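The file has no header, so the way you load it produces a DataFrame whose column names are the first row of data. You are trying to access a column, timestamp, that does not exist.

Your u.data has no header: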

$head u.data                   
196 242 3   881250949
186 302 3   891717742
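
The effect can be reproduced without the real file. The following is a minimal sketch that assumes the file is tab-separated and uses a two-row in-memory sample copied from the output above; it calls pd.read_csv with sep="\t" in place of pd.read_table, and the variable names are illustrative only.

```python
import io
import pandas as pd

# Two tab-separated rows copied from the head output; there is no header line.
sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

# Default behaviour: the first row is consumed as the header.
inferred = pd.read_csv(io.StringIO(sample), sep="\t")
print(list(inferred.columns))  # column names are strings taken from row one
print(len(inferred))           # only one data row is left

# header=None keeps every row as data and numbers the columns 0..3.
raw = pd.read_csv(io.StringIO(sample), sep="\t", header=None)
print(list(raw.columns))
print(len(raw))
```

In the first frame there is no column called timestamp, which is the situation the error message describes.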

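So, unless you add a header, you cannot use the column name. You can add a header to the file u.data; for example, I opened it in a text editor and added the line a b c timestamp at the top. (This appears to be a tab-separated file, so be careful not to use spaces when adding the header, or it will break the format.)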

$head u.data                   
a   b   c   timestamp
196 242 3   881250949
186 302 3   891717742

Now your code works, and data.columns returns

Index([u'a', u'b', u'c', u'timestamp'], dtype='object')

The rest of your working code now prints

(100000, 4) # the shape
['a', 'b', 'c', 'timestamp'] # the columns
     a    b  c  timestamp # the df
0  196  242  3  881250949
1  186  302  3  891717742
2   22  377  1  878887116
3  244   51  2  880606923
4  166  346  1  886397596
5  298  474  4  884182806
6  115  265  2  881171488
7  253  465  5  891628467
8  305  451  3  886324817
9    6   86  3  883603013

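If you don't want to add headers

Alternatively, you can drop the column 'timestamp' by its position (presumably 3). We can use df.iloc to select all rows and the columns at positions 0 through 2, which leaves out the column at position 3. (The older df.ix indexer was removed in pandas 1.0; the end of an .iloc slice is exclusive, hence 0:3.)

data.iloc[:, 0:3]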
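
As a runnable illustration of position-based removal, here is a small sketch on an in-memory sample. It assumes the file is tab-separated and is read with header=None, so the columns are numbered 0 to 3; the sample rows and variable names are for illustration only.

```python
import io
import pandas as pd

sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"
data = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

# Keep positions 0, 1 and 2; the end of an .iloc slice is exclusive.
kept = data.iloc[:, 0:3]

# Alternative: look up the label at position 3 and drop that column.
dropped = data.drop(columns=data.columns[3])

print(kept.shape)
print(list(dropped.columns))
```

Both variants return a new DataFrame and leave data itself unchanged.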

Answer 2

I would do it this way:

data = pd.read_table('u.data', header=None,
                     names=['userID', 'itemID','rating', 'timestamp'],
                     usecols=['userID', 'itemID','rating']
)

Check:

In [589]: data.head()
Out[589]:
   userID  itemID  rating
0     196     242       3
1     186     302       3
2      22     377       1
3     244      51       2
4     166     346       1
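
If the timestamp column is still needed for part of the analysis, a possible variation is to load all four columns and drop it later. This is only a sketch on an in-memory, tab-separated sample (an assumption about the file format). Note that drop returns a new DataFrame, so the result has to be assigned to a name.

```python
import io
import pandas as pd

sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"
names = ["userID", "itemID", "rating", "timestamp"]

# header=None treats every line as data; names supplies the column labels.
full = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=names)

# drop() does not modify full; keep the returned frame.
trimmed = full.drop(columns=["timestamp"])

print(list(full.columns))
print(list(trimmed.columns))
```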

Answer 3

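One of the biggest problems people run into, often without noticing, is that when inserting headers into the u.data file, the separator must be exactly the same as the separator between the fields of a data row. For example, if tabs separate the fields, you should not use spaces.

Add the headers to your u.data file and separate them with exactly the same whitespace that is used between the items in a row. PS: Use Sublime Text; Notepad/Notepad++ sometimes does not work.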
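
One way to catch a mismatched header before loading the file is to compare field counts on the first two lines. The helper below, header_matches_rows, is a hypothetical name written for this illustration and is not part of pandas; it assumes a tab separator by default.

```python
def header_matches_rows(text, sep="\t"):
    """Return True if the first line has as many sep-delimited fields as the second."""
    lines = text.splitlines()
    if len(lines) < 2:
        return False
    return len(lines[0].split(sep)) == len(lines[1].split(sep))

# Header typed with tabs, like the data rows.
good = "a\tb\tc\ttimestamp\n196\t242\t3\t881250949\n"
# Header typed with spaces while the data rows use tabs.
bad = "a b c timestamp\n196\t242\t3\t881250949\n"

print(header_matches_rows(good))  # True
print(header_matches_rows(bad))   # False
```

For a real file, the same check could be applied to the text of its first two lines.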

python

recommendation-engine

machine-learning

pandas

data-science