Different sample sizes in kfold between PyCharm and Spyder

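I am trying to classify text. I have written code that does this, but the kfold sample size differs between Spyder and PyCharm, even though the code is exactly the same.

Here is the code: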

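# Imports were not shown in the original post; the tensorflow.keras paths
# below are an assumption based on the TF 1.x log format further down.
from sklearn.model_selection import KFold, cross_val_score
from tensorflow.keras.layers import Dense, Dropout, Embedding, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

def baseline_model():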

    model = Sequential()
    embedding_size = 100

    model.add(Embedding(input_dim=num_words,
                        output_dim=embedding_size,
                        input_length=max_tokens,
                        name='embedding_layer'))

    model.add(LSTM(units=150, activation='relu', return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(units=150, activation='relu'))
    model.add(Dense(output_dim, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

estimator = KerasClassifier(build_fn=baseline_model, epochs=15, batch_size=128, verbose=1)
kfold = KFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X_train_pad, y_train, cv=kfold)
print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

The total data size is:

>>> X_train_pad.shape
Out[12]: (3320, 56)

This works fine in Spyder, where each fold trains on 90% of the data (2988 of the 3320 samples) and holds out the remaining 10% for testing:

Epoch 1/15
2988/2988 [==============================] - 19s 7ms/sample - loss: 3.6781 - acc: 0.0971
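For reference, the expected fold sizes can be checked without running Keras at all. The sketch below is plain Python and mirrors the documented `KFold` sizing rule (the first `n_samples % n_splits` test folds get one extra sample); the helper name `kfold_sizes` is made up for illustration and is not part of scikit-learn.

```python
def kfold_sizes(n_samples, n_splits):
    """Return a (train_size, test_size) pair for every fold.

    Hypothetical helper: it only reproduces the sizing rule that
    sklearn.model_selection.KFold documents, it does not split any data.
    """
    base, extra = divmod(n_samples, n_splits)
    sizes = []
    for fold in range(n_splits):
        # The first `extra` folds hold out one more sample than the rest.
        test_size = base + 1 if fold < extra else base
        sizes.append((n_samples - test_size, test_size))
    return sizes


# 3320 samples and 10 splits, as in the question.
print(kfold_sizes(3320, 10)[0])  # (2988, 332)
```

So 2988 training samples per fold is what a 10-fold split of 3320 rows should produce.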

However, the same code reports only 24 samples in PyCharm:

Epoch 1/15
24/24 [==============================] - 19s 7ms/sample - loss: 3.6781 - acc: 0.0971

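I considered the libraries I have installed, but they should not cause a problem like this. Any idea why this happens?

Edit 1: Google Colab shows the same sample size as PyCharm:

Epoch 1/15
24/24 [==============================] - 19s 7ms/sample - loss: 3.6781 - acc: 0.0971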

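Edit 2: If I create a new environment in Anaconda and install the latest packages, the sample size is small. If I create an environment and install the following packages, the sample size is large.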

> absl-py==0.7.1 alabaster==0.7.12 anaconda-client==1.7.2
> anaconda-navigator==1.9.7 anaconda-project==0.8.2 asn1crypto==0.24.0
> astor==0.8.0 astroid==2.1.0 astropy==3.1 atomicwrites==1.2.1
> attrs==18.2.0 Babel==2.6.0 backcall==0.1.0
> backports.functools-lru-cache==1.5 backports.os==0.1.1
> backports.shutil-get-terminal-size==1.0.0 backports.tempfile==1.0
> backports.weakref==1.0.post1 beautifulsoup4==4.6.3 bitarray==0.8.3
> bkcharts==0.2 blaze==0.11.3 bleach==3.0.2 bokeh==1.0.2 boto==2.49.0
> Bottleneck==1.2.1 certifi==2018.11.29 cffi==1.11.5 chardet==3.0.4
> chart-studio==1.0.0 Click==7.0 cloudpickle==0.6.1 clyent==1.2.2
> colorama==0.4.1 comtypes==1.1.7 conda==4.7.12 conda-build==3.18.9
> conda-package-handling==1.3.11 conda-verify==3.4.2 contextlib2==0.5.5
> cryptography==2.4.2 cycler==0.10.0 Cython==0.29.2 cytoolz==0.9.0.1
> dask==1.0.0 datashape==0.5.4 decorator==4.3.0 defusedxml==0.5.0
> distributed==1.25.1 docutils==0.14 entrypoints==0.2.3
> et-xmlfile==1.0.1 fastcache==1.0.2 filelock==3.0.10 Flask==1.0.2
> Flask-Cors==3.0.7 fsspec==0.4.0 future==0.17.1 gast==0.2.2
> gevent==1.3.7 glob2==0.6 graphviz==0.10.1 greenlet==0.4.15
> grpcio==1.16.1 h5py==2.8.0 heapdict==1.0.0 html5lib==1.0.1 idna==2.8
> imageio==2.4.1 imagesize==1.1.0 imbalanced-learn==0.4.3 imblearn==0.0
> importlib-metadata==0.6 inflection==0.3.1 ipykernel==5.1.0
> ipython==7.2.0 ipython-genutils==0.2.0 ipywidgets==7.4.2 isort==4.3.4
> itsdangerous==1.1.0 jdcal==1.4 jedi==0.13.2 Jinja2==2.10
> joblib==0.13.2 json5==0.8.5 jsonpickle==1.2 jsonschema==2.6.0
> jupyter==1.0.0 jupyter-client==5.2.4 jupyter-console==6.0.0
> jupyter-core==4.4.0 jupyterlab==0.35.3 jupyterlab-server==0.2.0
> Keras==2.2.4 Keras-Applications==1.0.8 Keras-Preprocessing==1.1.0
> keyring==17.0.0 kiwisolver==1.0.1 lazy-object-proxy==1.3.1
> libarchive-c==2.8 llvmlite==0.26.0 locket==0.2.0 lxml==4.2.5
> Markdown==3.1.1 MarkupSafe==1.1.0 matplotlib==3.1.1 mccabe==0.6.1
> menuinst==1.4.14 mistune==0.8.4 mkl-fft==1.0.6 mkl-random==1.0.2
> mock==3.0.5 more-itertools==4.3.0 mpl-finance==0.10.0 mpld3==0.3
> mpmath==1.1.0 msgpack==0.5.6 multipledispatch==0.6.0
> navigator-updater==0.2.1 nbconvert==5.4.0 nbformat==4.4.0
> networkx==2.2 nltk==3.4 nose==1.3.7 notebook==5.7.4 numba==0.41.0
> numexpr==2.6.8 numpy==1.15.4 numpydoc==0.8.0 odo==0.5.1 olefile==0.46
> openpyxl==2.5.12 packaging==18.0 pandas==0.25.1 pandocfilters==1.4.2
> parso==0.3.1 partd==0.3.9 path.py==11.5.0 pathlib2==2.3.3 patsy==0.5.1
> pep8==1.7.1 pickleshare==0.7.5 Pillow==5.3.0 pkginfo==1.4.2
> plotly==4.0.0 pluggy==0.8.0 ply==3.11 prometheus-client==0.5.0
> prompt-toolkit==2.0.7 protobuf==3.8.0 psutil==5.4.8 py==1.7.0
> pycodestyle==2.4.0 pycosat==0.6.3 pycparser==2.19 pycrypto==2.6.1
> pycurl==7.43.0.2 pydotplus==2.0.2 pyflakes==2.0.0 Pygments==2.3.1
> pylint==2.2.2 Pympler==0.7 pyodbc==4.0.25 pyOpenSSL==18.0.0
> pyparsing==2.3.0 pyreadline==2.1 pyrsistent==0.14.11 PySocks==1.6.8
> pytest==4.0.2 pytest-arraydiff==0.3 pytest-astropy==0.5.0
> pytest-doctestplus==0.2.0 pytest-openfiles==0.3.1
> pytest-remotedata==0.3.1 python-dateutil==2.7.5 pytz==2018.7
> PyWavelets==1.0.1 pywin32==223 pywinpty==0.5.5 PyYAML==3.13
> pyzmq==17.1.2 QtAwesome==0.5.3 qtconsole==4.4.3 QtPy==1.5.2
> Quandl==3.4.5 requests==2.21.0 retrying==1.3.3 rope==0.11.0
> ruamel-yaml==0.15.46 scikit-image==0.14.1 scikit-learn==0.20.1
> scipy==1.1.0 seaborn==0.9.0 Send2Trash==1.5.0 simplegeneric==0.8.1
> singledispatch==3.4.0.3 six==1.12.0 snowballstemmer==1.2.1
> sortedcollections==1.0.1 sortedcontainers==2.1.0 soupsieve==1.9.2
> Sphinx==1.8.2 sphinxcontrib-applehelp==1.0.1
> sphinxcontrib-devhelp==1.0.1 sphinxcontrib-htmlhelp==1.0.2
> sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.2
> sphinxcontrib-serializinghtml==1.1.3 sphinxcontrib-websupport==1.1.0
> spyder==3.3.2 spyder-kernels==0.3.0 SQLAlchemy==1.2.15
> statsmodels==0.9.0 SwarmPackagePy==1.0.0a5 sympy==1.3 TA-Lib==0.4.17
> tables==3.4.4 tblib==1.3.2 tensorboard==1.13.1 tensorflow==1.13.1
> tensorflow-estimator==1.13.0 termcolor==1.1.0 terminado==0.8.1
> testpath==0.4.2 toolz==0.9.0 tornado==5.1.1 tqdm==4.28.1
> traitlets==4.3.2 unicodecsv==0.14.1 urllib3==1.24.1 wcwidth==0.1.7
> webencodings==0.5.1 Werkzeug==0.14.1 widgetsnbextension==3.4.2
> win-inet-pton==1.0.1 win-unicode-console==0.5 wincertstore==0.2
> wrapt==1.10.11 xlrd==1.2.0 XlsxWriter==1.1.2 xlwings==0.15.1
> xlwt==1.3.0 zict==0.1.3 zipp==0.5.2
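One way to compare two environments like these is to print the versions of the packages most likely to matter in each of them. The following is only a sketch: it assumes Python 3.8 or newer (for `importlib.metadata`), the helper name `installed_versions` is hypothetical, and the package names in the loop are simply the ones relevant to this question.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


# Run this in each environment (Spyder, PyCharm, Colab) and compare the output.
for name, ver in installed_versions(["tensorflow", "Keras", "scikit-learn"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

On older interpreters, `pip freeze` gives the same information.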

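OK, I found the problem. As I mentioned in the post, the apparent problem depends on the library versions. It seems that Keras now displays the number of batches instead of the number of samples. With 2988 training samples and batch_size=128, an epoch has ceil(2988 / 128) = 24 batches, which matches the 24/24 in the log, so the folds themselves are the same size. Here are similar posts: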