Error on partitioning the data set
I want to partition a data set into a training set and a cross-validation set. This is how I do it. train
is a pandas DataFrame.
import numpy as np
import pandas as pd
#...
features = ['season', 'holiday', 'workingday', 'weather',
'temp', 'atemp', 'humidity', 'windspeed', 'year',
'month', 'weekday', 'hour']
train = pd.read_csv('data/train.csv', parse_dates=[0])
np.random.shuffle(train)
training, crossvalidation = train[:0.8*len(train),features], train[0.8*len(train):,features]
This code gives the following error:
Traceback (most recent call last):
File "D:/Web/PyCharm/linear_regression.py", line 47, in <module>
np.random.shuffle(train)
File "mtrand.pyx", line 4607, in mtrand.RandomState.shuffle (numpy\random\mtrand\mtrand.c:25420)
File "mtrand.pyx", line 4610, in mtrand.RandomState.shuffle (numpy\random\mtrand\mtrand.c:25361)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1791, in __getitem__
return self._getitem_column(key)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1798, in _getitem_column
return self._get_item_cache(key)
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 1084, in _get_item_cache
values = self._data.get(item)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 2851, in get
loc = self.items.get_loc(item)
File "C:\Python27\lib\site-packages\pandas\core\index.py", line 1578, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3811)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3691)
File "pandas\hashtable.pyx", line 697, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12336)
File "pandas\hashtable.pyx", line 705, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12287)
KeyError: 8953
This is the output of train.head():
datetime season holiday workingday weather temp atemp \
0 2011-01-01 00:00:00 1 0 0 1 9.84 14.395
1 2011-01-01 01:00:00 1 0 0 1 9.02 13.635
2 2011-01-01 02:00:00 1 0 0 1 9.02 13.635
3 2011-01-01 03:00:00 1 0 0 1 9.84 14.395
4 2011-01-01 04:00:00 1 0 0 1 9.84 14.395
humidity windspeed casual registered count year month hour weekday \
0 81 0 3 13 16 2011 1 0 5
1 80 0 8 32 40 2011 1 1 5
2 80 0 5 27 32 2011 1 2 5
3 75 0 3 10 13 2011 1 3 5
4 75 0 0 1 1 2011 1 4 5
log-casual log-registered log-count
0 1.386294 2.639057 2.833213
1 2.197225 3.496508 3.713572
2 1.791759 3.332205 3.496508
3 1.386294 2.397895 2.639057
4 0.000000 0.693147 0.693147
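Your problem comes from np.random.shuffle(train). The shuffle indexes its argument with integers, and a DataFrame treats train[8953] as a column lookup, hence the KeyError: 8953 in the traceback.
You need np.random.shuffle(train.values)
instead.
On the other hand, you cannot slice with floats. You need an int, for example int(0.8*len(train)).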
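Both points can be put together in a minimal sketch. This is one possible approach, not necessarily what the asker ended up using: it shuffles row positions with np.random.permutation and selects with iloc instead of shuffling train.values, because for a frame with mixed column types train.values is generally a copy, so shuffling it would leave train itself unchanged. The small DataFrame, the shortened features list, and the fixed seed are assumptions made only so the example runs without data/train.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data/train.csv: a small mixed-dtype frame.
train = pd.DataFrame({
    "datetime": pd.date_range("2011-01-01", periods=10),
    "season": [1] * 10,
    "temp": np.linspace(9.0, 10.0, 10),
    "count": np.arange(10),
})
features = ["season", "temp"]  # shortened list, for illustration only

# Shuffle the row positions rather than the DataFrame itself.
rng = np.random.RandomState(0)  # fixed seed only to make the sketch repeatable
order = rng.permutation(len(train))
shuffled = train.iloc[order]

# Slice bounds must be integers, so convert the 80% mark explicitly.
split = int(0.8 * len(train))

# Select rows by position first, then the feature columns by label.
training = shuffled.iloc[:split][features]
crossvalidation = shuffled.iloc[split:][features]
```

With the real data, replacing the stand-in frame with the pd.read_csv call from the question and restoring the full features list should be enough; the two resulting frames keep the original index labels, so rows can still be traced back to train.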