ValueError: array length doesn't match index length. How do I debug this?
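I've just started using Kaggle, and I'm working through the guided tutorial on predicting which passengers survived the Titanic disaster and which did not.
I did everything as instructed.
My final code cell looks like this: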
from sklearn.ensemble import RandomForestClassifier
y = train_data['Survived']
features = ["Pclass","Sex","SibSp","Parch"]
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(train_data[features])
model = RandomForestClassifier(n_estimators=1,max_depth=5,random_state=1)
model.fit(X,y)
predictions = model.predict(X_test)
output = pd.DataFrame({'PassengerId': test_data.PassengerId, 'Survived': predictions})
output.to_csv('my_submission.csv', index=False)
print("Your submission was successfully saved!")
When I run it, I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-24-7d2fc2ea2973> in <module>
11
12
---> 13 output = pd.DataFrame({'PassengerId': test_data.PassengerId, 'Survived': predictions})
14 output.to_csv('my_submission.csv', index=False)
15 print("Your submission was successfully saved!")
/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
433 )
434 elif isinstance(data, dict):
--> 435 mgr = init_dict(data, index, columns, dtype=dtype)
436 elif isinstance(data, ma.MaskedArray):
437 import numpy.ma.mrecords as mrecords
/opt/conda/lib/python3.7/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
252 arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
253 ]
--> 254 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
255
256
/opt/conda/lib/python3.7/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
62 # figure out the index, if necessary
63 if index is None:
---> 64 index = extract_index(arrays)
65 else:
66 index = ensure_index(index)
/opt/conda/lib/python3.7/site-packages/pandas/core/internals/construction.py in extract_index(data)
376 f"length {len(index)}"
377 )
--> 378 raise ValueError(msg)
379 else:
380 index = ibase.default_index(lengths[0])
ValueError: array length 891 does not match index length 418
However, I can't figure out what exactly is wrong. Can anyone help? Thanks.
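The way you build the X_test DataFrame is incorrect: you encode train_data instead of test_data. As a result, predictions has one value per training row (891), while test_data.PassengerId has one value per test row (418), and pandas refuses to put columns of different lengths into the same output DataFrame.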
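To see where the two numbers in the error come from, here is a minimal reproduction. The frames are hypothetical stand-ins that only mimic the row counts of the Titanic files (891 training rows, 418 test rows); the values are made up. Printing the lengths of the two columns right before building the DataFrame is usually the quickest way to debug this kind of error.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins with the same row counts as the Kaggle Titanic files:
# 891 training rows and 418 test rows (PassengerId values are illustrative).
train_data = pd.DataFrame({"PassengerId": np.arange(1, 892)})
test_data = pd.DataFrame({"PassengerId": np.arange(892, 1310)})

# Predicting on features built from train_data yields one value per TRAINING row.
predictions = np.zeros(len(train_data), dtype=int)

# Debugging step: compare the lengths of the columns you are about to combine.
print(len(test_data.PassengerId), len(predictions))  # 418 891

error_message = ""
try:
    pd.DataFrame({"PassengerId": test_data.PassengerId, "Survived": predictions})
except ValueError as err:
    # pandas aligns on the Series index (418 rows) and rejects the 891-long array.
    error_message = str(err)
    print(error_message)
```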
Correct the following line and it will work:
X_test = pd.get_dummies(test_data[features])
In short: fix the X_test assignment so that it encodes test_data rather than train_data.
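A sketch of the corrected cell, run on tiny made-up stand-ins for train_data and test_data (the real Kaggle files are assumed to have the same column names). Two things here go beyond the answer above and are assumptions on my part: the `reindex(columns=X.columns, fill_value=0)` call, which guards against `get_dummies` producing different dummy columns for the two files, and the length `assert`, which fails with a clear message before pandas raises the less obvious ValueError. Writing the CSV is left out so the sketch has no side effects.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical miniature versions of the Kaggle files (same column names,
# made-up values) so the sketch runs without downloading anything.
train_data = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "SibSp": [1, 1, 0, 1, 0, 0],
    "Parch": [0, 0, 0, 0, 1, 0],
})
test_data = pd.DataFrame({
    "PassengerId": [7, 8, 9],
    "Pclass": [3, 2, 1],
    "Sex": ["male", "female", "male"],
    "SibSp": [0, 1, 0],
    "Parch": [0, 2, 1],
})

features = ["Pclass", "Sex", "SibSp", "Parch"]
y = train_data["Survived"]
X = pd.get_dummies(train_data[features])

# The fix: encode the TEST rows. reindex (an added assumption) makes the test
# matrix have exactly the training columns, filling any missing dummy with 0.
X_test = pd.get_dummies(test_data[features]).reindex(columns=X.columns, fill_value=0)

# Same hyperparameters as in the question.
model = RandomForestClassifier(n_estimators=1, max_depth=5, random_state=1)
model.fit(X, y)
predictions = model.predict(X_test)

# Sanity check that would have caught the original bug immediately.
assert len(predictions) == len(test_data), (len(predictions), len(test_data))

output = pd.DataFrame({"PassengerId": test_data.PassengerId, "Survived": predictions})
print(output.shape)  # (3, 2)
```

With the real data, `output` would have 418 rows, one per test passenger, which is what the submission file expects.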