安装gensim时分块警告
Chunkize warning while installing gensim
我已经在 Python 中安装了 gensim(通过 pip)。安装结束后,我收到以下警告:
C:\Python27\lib\site-packages\gensim\utils.py:855: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
我该如何纠正这个问题?
由于这个警告,我无法从 gensim.models 导入 word2vec。
我有以下配置:Python 2.7, gensim-0.13.4.1, numpy-1.11.3, scipy-0.18.1, pattern-2.6.
我觉得问题不大。 Gensim 只是让您知道它将分块化为不同的函数,因为您使用特定的 os.
查看此代码
if os.name == 'nt':
logger.info("detected Windows; aliasing chunkize to chunkize_serial")
def chunkize(corpus, chunksize, maxsize=0, as_numpy=False):
for chunk in chunkize_serial(corpus, chunksize, as_numpy=as_numpy):
yield chunk
else:
def chunkize(corpus, chunksize, maxsize=0, as_numpy=False):
"""
Split a stream of values into smaller chunks.
Each chunk is of length `chunksize`, except the last one which may be smaller.
A once-only input stream (`corpus` from a generator) is ok, chunking is done
efficiently via itertools.
If `maxsize > 1`, don't wait idly in between successive chunk `yields`, but
rather keep filling a short queue (of size at most `maxsize`) with forthcoming
chunks in advance. This is realized by starting a separate process, and is
meant to reduce I/O delays, which can be significant when `corpus` comes
from a slow medium (like harddisk).
If `maxsize==0`, don't fool around with parallelism and simply yield the chunksize
via `chunkize_serial()` (no I/O optimizations).
>>> for chunk in chunkize(range(10), 4): print(chunk)
[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9]
"""
assert chunksize > 0
if maxsize > 0:
q = multiprocessing.Queue(maxsize=maxsize)
worker = InputQueue(q, corpus, chunksize, maxsize=maxsize, as_numpy=as_numpy)
worker.daemon = True
worker.start()
while True:
chunk = [q.get(block=True)]
if chunk[0] is None:
break
yield chunk.pop()
else:
for chunk in chunkize_serial(corpus, chunksize, as_numpy=as_numpy):
yield chunk
您可以使用此代码在导入gensim之前抑制消息:
import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
import gensim
我已经在 Python 中安装了 gensim(通过 pip)。安装结束后,我收到以下警告:
C:\Python27\lib\site-packages\gensim\utils.py:855: UserWarning: detected Windows; aliasing chunkize to chunkize_serial warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
我该如何纠正这个问题?
由于这个警告,我无法从 gensim.models 导入 word2vec。
我有以下配置:Python 2.7, gensim-0.13.4.1, numpy-1.11.3, scipy-0.18.1, pattern-2.6.
我觉得问题不大。 Gensim 只是让您知道它将分块化为不同的函数,因为您使用特定的 os.
查看此代码if os.name == 'nt':
logger.info("detected Windows; aliasing chunkize to chunkize_serial")
def chunkize(corpus, chunksize, maxsize=0, as_numpy=False):
for chunk in chunkize_serial(corpus, chunksize, as_numpy=as_numpy):
yield chunk
else:
def chunkize(corpus, chunksize, maxsize=0, as_numpy=False):
"""
Split a stream of values into smaller chunks.
Each chunk is of length `chunksize`, except the last one which may be smaller.
A once-only input stream (`corpus` from a generator) is ok, chunking is done
efficiently via itertools.
If `maxsize > 1`, don't wait idly in between successive chunk `yields`, but
rather keep filling a short queue (of size at most `maxsize`) with forthcoming
chunks in advance. This is realized by starting a separate process, and is
meant to reduce I/O delays, which can be significant when `corpus` comes
from a slow medium (like harddisk).
If `maxsize==0`, don't fool around with parallelism and simply yield the chunksize
via `chunkize_serial()` (no I/O optimizations).
>>> for chunk in chunkize(range(10), 4): print(chunk)
[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9]
"""
assert chunksize > 0
if maxsize > 0:
q = multiprocessing.Queue(maxsize=maxsize)
worker = InputQueue(q, corpus, chunksize, maxsize=maxsize, as_numpy=as_numpy)
worker.daemon = True
worker.start()
while True:
chunk = [q.get(block=True)]
if chunk[0] is None:
break
yield chunk.pop()
else:
for chunk in chunkize_serial(corpus, chunksize, as_numpy=as_numpy):
yield chunk
您可以使用此代码在导入gensim之前抑制消息:
import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
import gensim