How to fix an error when implementing cross-validation
I am trying to evaluate a model (MNIST) using cross-validation:
from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone
skfolds = StratifiedKFold(n_splits=5,random_state=42)
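When running the third line, I get the following warning:
C:\Users\nextg\Desktop\sample_project\env\lib\site-packages\sklearn\model_selection\_split.py:293: FutureWarning: Setting a random_state has no effect since shuffle is False. This will raise an error in 0.24. You should leave random_state to its default (None), or set shuffle=True.
  warnings.warn(
Ignoring the warning, I wrote this code: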
for train_index,test_index in skfolds.split(X_train,y_test_5):
    clone_clf = clone(sgd_clf)
    X_train_folds = X_train[train_index]
    y_train_folds = y_train[train_index]
    X_test_fold = X_test[test_index]
    y_test_fold = y_test_5[test_index]
    clone_clf.fit(X_train_folds,y_train_folds)
    y_pred = clone_clf.predict(X_test_fold)
    n_correct = sum(y_pred == y_test_fold)
    print(n_correct / len(y_pred))
After running this code, the error is:
ValueError                                Traceback (most recent call last)
<ipython-input-66-7e786591c439> in <module>
----> 1 for train_index,test_index in skfolds.split(X_train,y_test_5):
      2     clone_clf = clone(sgd_clf)
      3     X_train_folds = X_train[train_index]
      4     y_train_folds = y_train[train_index]
      5     X_test_fold = X_test[test_index]

~\Desktop\sample_project\env\lib\site-packages\sklearn\model_selection\_split.py in split(self,X,y,groups)
    326             The testing set indices for that split.
    327         """
--> 328         X,y,groups = indexable(X,y,groups)
    329         n_samples = _num_samples(X)
    330         if self.n_splits > n_samples:

~\Desktop\sample_project\env\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
    291     """
    292     result = [_make_indexable(X) for X in iterables]
--> 293     check_consistent_length(*result)
    294     return result
    295

~\Desktop\sample_project\env\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    254     uniques = np.unique(lengths)
    255     if len(uniques) > 1:
--> 256         raise ValueError("Found input variables with inconsistent numbers of"
    257                          " samples: %r" % [int(l) for l in lengths])
    258

ValueError: Found input variables with inconsistent numbers of samples: [60000,10000]
Can someone help me resolve this error?
Solution
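This expression does not make sense: skfolds.split(X_train,y_test_5).
From the doc, skfolds.split(X,y) requires X and y to have the same number of samples: X.shape[0] == y.shape[0]. Here X_train has 60000 rows while y_test_5 has 10000, which is exactly what the ValueError reports.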
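The length requirement can be reproduced on small arrays. The sketch below uses hypothetical stand-ins for the MNIST data (60 and 10 samples instead of 60000 and 10000) and a helper named try_split that is not part of scikit-learn; it only illustrates when split() fails.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-ins for the MNIST arrays, shrunk to keep the example fast.
X_train = np.zeros((60, 4))
y_train_5 = np.array([True, False] * 30)  # same length as X_train
y_test_5 = np.array([True, False] * 5)    # shorter, like the test-set labels

skfolds = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)


def try_split(X, y):
    """Return the number of folds, or the error message if split() fails."""
    try:
        # split() is lazy, so list() forces the length check to run.
        return len(list(skfolds.split(X, y)))
    except ValueError as exc:
        return str(exc)


mismatched = try_split(X_train, y_test_5)  # X and y lengths differ
matched = try_split(X_train, y_train_5)    # X and y lengths agree

print(mismatched)
print(matched)
```

With mismatched lengths the call reports inconsistent numbers of samples; with matching lengths it yields one (train, test) index pair per fold.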
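It should be skfolds.split(X_train,y_train_5) instead of skfolds.split(X_train,y_test_5).
Inside the for loop, train_index and test_index both index into the training set, so every fold must be taken from the training data: y_train_folds = y_train_5[train_index] rather than y_train_folds = y_train[train_index], and y_test_fold = y_train_5[test_index], with X_test_fold indexed from X_train rather than X_test.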
The whole problem started from using the Tab key.
This works:
from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone

# shuffle=True is needed for random_state to have any effect
skfolds = StratifiedKFold(n_splits=3,random_state=42,shuffle=True)

# Split the training data only; X and y must have the same length
for train_index,test_index in skfolds.split(X_train,y_train_5):
    clone_clf = clone(sgd_clf)
    # .values assumes X_train is a pandas DataFrame; drop it for a NumPy array
    X_train_folds = X_train.values[train_index]
    y_train_folds = y_train_5[train_index]
    X_test_fold = X_train.values[test_index]
    y_test_fold = y_train_5[test_index]
    clone_clf.fit(X_train_folds,y_train_folds)
    y_pred = clone_clf.predict(X_test_fold)
    n_correct = sum(y_pred == y_test_fold)
    print(n_correct / len(y_pred))  # accuracy on the held-out fold
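For anyone who wants to run the loop without loading MNIST, here is a minimal self-contained sketch. It assumes synthetic data from make_classification in place of X_train and y_train_5, and a default SGDClassifier in place of sgd_clf; neither comes from the question. It also compares the manual loop with cross_val_score, which should give the same per-fold accuracy when handed the same splitter.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary data standing in for X_train / y_train_5 (an assumption).
X_train, y_train_5 = make_classification(
    n_samples=300, n_features=10, random_state=42
)

sgd_clf = SGDClassifier(random_state=42)
skfolds = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Manual cross-validation: every fold is drawn from the training set.
manual_scores = []
for train_index, test_index in skfolds.split(X_train, y_train_5):
    clone_clf = clone(sgd_clf)
    clone_clf.fit(X_train[train_index], y_train_5[train_index])
    y_pred = clone_clf.predict(X_train[test_index])
    manual_scores.append(np.mean(y_pred == y_train_5[test_index]))

# The same evaluation through the built-in helper, reusing the splitter.
auto_scores = cross_val_score(
    sgd_clf, X_train, y_train_5, cv=skfolds, scoring="accuracy"
)

print(manual_scores)
print(auto_scores)
```

The manual loop is mainly useful when more control over each fold is needed; otherwise cross_val_score is shorter and avoids the indexing mistakes described above.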