Error when nltk.gaac.demo() is run

When I run nltk.gaac.demo(), I get the following error. Can you help me figure out whether I am missing something?

I am using nltk 3.0.1.

Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.gaac.demo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python34\lib\site-packages\nltk\cluster\gaac.py", line 150, in demo
    clusters = clusterer.cluster(vectors, True)
  File "C:\Python34\lib\site-packages\nltk\cluster\gaac.py", line 41, in cluster
    return VectorSpaceClusterer.cluster(self, vectors, assign_clusters, trace)
  File "C:\Python34\lib\site-packages\nltk\cluster\util.py", line 57, in cluster
    self.cluster_vectorspace(vectors, trace)
  File "C:\Python34\lib\site-packages\nltk\cluster\gaac.py", line 79, in cluster_vectorspace
    self.update_clusters(self._num_clusters)
  File "C:\Python34\lib\site-packages\nltk\cluster\gaac.py", line 99, in update_clusters
    clusters = self._dendrogram.groups(num_clusters)
  File "C:\Python34\lib\site-packages\nltk\cluster\util.py", line 213, in groups
    return root.groups(n)
  File "C:\Python34\lib\site-packages\nltk\cluster\util.py", line 161, in groups
    queue.sort()
TypeError: unorderable types: _DendrogramNode() < _DendrogramNode()

This appears to be a Python 2.x vs. 3.x compatibility problem in the nltk module. I explain it below; you can hack in the fix given in the last section.

Explanation

On my machine, under Python 2.7, nltk.gaac.demo() produces:

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.gaac.demo()
None [array([ 0.70710678,  0.70710678]), array([ 0.4472136 ,  0.89442719]), array([ 0.89442719,  0.4472136 ]), array([ 1.,  0.]), array([ 0.5547002 ,  0.83205029]), array([ 0.9486833 ,  0.31622777])]
Clusterer: <GroupAverageAgglomerative Clusterer n=4>
Clustered: [array([3, 3]), array([1, 2]), array([4, 2]), array([4, 0]), array([2, 3]), array([3, 1])]
As: [0, 2, 3, 1, 2, 3]

     +---------+---------+---------+
     |         |         |         |
     |         |         +-----------------------------+
     |         |         |         |                   |
     |         +-----------------------------+         |
     |         |         |         |         |         |
[ 3.  3.] [ 1.  2.] [ 4.  2.] [ 4.  0.] [ 2.  3.] [ 3.  1.]
classify([3 3]): 0

Whereas in Python 3.3, I see exactly the behavior the OP reports for Python 3.4.1.

I have filed a bug report with the nltk developers here.

This blog on porting Python 2 code to Python 3 notes:

Unorderable types, __cmp__ and cmp

Under Python 2 the most common way of making types sortable is to implement a __cmp__() method that in turn uses the builtin cmp() function

...

Since having both __cmp__() and rich comparison methods violates the principle of there being only one obvious way of doing something, Python 3 ignores the __cmp__() method. In addition to this, the cmp() function is gone! This typically results in your converted code raising a TypeError: unorderable types error. So you need to replace the __cmp__() method with rich comparison methods instead. To support sorting you only need to implement __lt__(), the method used for the "less than" operator, <.
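The quoted point is easy to check directly. The sketch below is a minimal illustration, not nltk code: Py2OnlyNode, Py3Node and sort_queue are hypothetical names. It assumes, as a simplification of what queue.sort() in the traceback is doing, a list of (priority, node) tuples whose priorities tie, so Python falls back to comparing the node objects themselves.

```python
class Py2OnlyNode(object):
    """Hypothetical node that is orderable only through __cmp__."""

    def __init__(self, value):
        self.value = value

    def __cmp__(self, other):  # Python 2 only; Python 3 never calls this
        return (self.value > other.value) - (self.value < other.value)


class Py3Node(object):
    """Hypothetical node that defines the rich comparison __lt__."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return self.value < other.value


def sort_queue(node_cls, values):
    # Every entry gets the same priority (0), so tuple comparison falls
    # through to comparing the node objects with <.
    queue = [(0, node_cls(v)) for v in values]
    queue.sort()
    return [node.value for _, node in queue]


if __name__ == "__main__":
    try:
        sort_queue(Py2OnlyNode, [3, 1, 2])
    except TypeError as exc:
        print("Py2OnlyNode:", exc)
    print("Py3Node:", sort_queue(Py3Node, [3, 1, 2]))  # Py3Node: [1, 2, 3]
```

Under Python 3 the first call raises TypeError (3.4 reports "unorderable types", newer releases say "'<' not supported between instances"), while the second sorts normally. If a class needs all of <, <=, > and >=, functools.total_ordering can derive the others from __lt__ (ideally together with __eq__).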

Solution

To get yourself going, add a __lt__() method to the _DendrogramNode class:

  • Open C:\Python34\Lib\site-packages\nltk\cluster\util.py in the editor of your choice
  • Find the line class _DendrogramNode(object) (line 129 in my installation)
  • Add a less-than method, so that your code looks like:

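    class _DendrogramNode(object):
        """ Tree node of a dendrogram. """

        def __lt__(self, comparator):
            # Python 3 needs __lt__ for queue.sort(); this crude hack just compares
            # whether each node's _value has any non-zero entry
            return self._value.any() < comparator._value.any()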

  • Final step (to account for the new division rules in Python 3):

  • Find the line return '%s%s%s' % (lhalf*left, centre, right*rhalf) (line 247 in mine, counting the lines added above)

  • Replace it with return '%s%s%s' % (int(lhalf)*left, centre, right*int(rhalf))
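The division fix can be illustrated the same way. In Python 3, / is true division and always returns a float, and multiplying a string by a float raises TypeError, which is why the int() casts are needed. The helpers below are hypothetical and only mimic the '%s%s%s' padding pattern; they assume lhalf and rhalf come from a / division, which the util.py line above does not show.

```python
def pad_py2_style(label, width, fill=' '):
    """Hypothetical helper: centre label in width columns, Python 2 style."""
    spare = width - len(label)
    lhalf = spare / 2          # true division: a float in Python 3
    rhalf = spare - lhalf      # also a float
    # str * float raises TypeError in Python 3
    return '%s%s%s' % (lhalf * fill, label, fill * rhalf)


def pad_int_fix(label, width, fill=' '):
    """Same helper with int() casts, as in the util.py fix."""
    spare = width - len(label)
    lhalf = spare / 2
    rhalf = spare - lhalf
    return '%s%s%s' % (int(lhalf) * fill, label, fill * int(rhalf))


def pad_floor_div(label, width, fill=' '):
    """Alternative: floor division keeps both halves ints from the start."""
    spare = width - len(label)
    lhalf = spare // 2
    rhalf = spare - lhalf
    return '%s%s%s' % (lhalf * fill, label, fill * rhalf)


if __name__ == "__main__":
    try:
        pad_py2_style('ab', 6)
    except TypeError as exc:
        print('Python 2 style:', exc)
    print(repr(pad_int_fix('ab', 6)))    # '  ab  '
    print(repr(pad_floor_div('ab', 5)))  # ' ab  '
```

With an odd amount of spare space the int() casts truncate both halves, so pad_int_fix('ab', 5) returns ' ab ', one column short, whereas floor division (//) keeps the full width. Which variant suits util.py depends on how it actually computes lhalf and rhalf.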

Then you get the output you want:

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.gaac.demo()
None [array([ 0.70710678,  0.70710678]), array([ 0.4472136 ,  0.89442719]), array([ 0.89442719,  0.4472136 ]), array([ 1.,  0.]), array([ 0.5547002 ,  0.83205029]), array([ 0.9486833 ,  0.31622777])]
Clusterer: <GroupAverageAgglomerative Clusterer n=4>
Clustered: [array([3, 3]), array([1, 2]), array([4, 2]), array([4, 0]), array([2, 3]), array([3, 1])]
As: [0, 2, 3, 1, 2, 3]

     +---------+---------+---------+
     |         |         |         |
     |         |         +-----------------------------+
     |         |         |         |                   |
     |         +-----------------------------+         |
     |         |         |         |         |         |
[ 3.  3.] [ 1.  2.] [ 4.  2.] [ 4.  0.] [ 2.  3.] [ 3.  1.]
classify([3 3]): 0

My hacked version of the util.py file is available as a github gist.