Word2Vec - How to rid of "TypeError: unhashable type: 'list'" and "AttributeError: dlsym(0x7fa8c57be020, AttachDebuggerTracing): symbol not found"?

Word2Vec - How to rid of "TypeError: unhashable type: 'list'" and "AttributeError: dlsym(0x7fa8c57be020, AttachDebuggerTracing): symbol not found"?

当我基于 gensim 模块的 Word2Vec 实现创建模型时出现 TypeError: unhashable type: 'list'AttributeError: dlsym(0x7fa8c57be020, AttachDebuggerTracing): symbol not found 错误。

每个条目都包含三个部分,它们显示在一个列表中。并且,为了演示,模型包含三个条目

这是我尝试过的:

model = Word2Vec(sentences=features, size=100, sg=1, window=3, min_count=1, iter=10, workers=Pool()._processes)

model.build_vocab(features)

model.train(features)

features 的值为:

  [
    [
      ['permission.ACCESS_WIFI_STATE', 'permission.ACCESS_NETWORK_STATE', 'permission.READ_PHONE_STATE', 'permission.INTERNET', 'permission.CHANGE_WIFI_STATE'],
       ['intent.action.MAIN', 'intent.action.BATTERY_CHANGED_ACTION', 'intent.action.SIG_STR', 'intent.action.BOOT_COMPLETED'],
       []
    ],
    [
      ['permission.WRITE_EXTERNAL_STORAGE', 'permission.ACCESS_NETWORK_STATE', 'permission.READ_PHONE_STATE', 'permission.INTERNET', 'permission.INSTALL_PACKAGES', 'permission.SEND_SMS', 'permission.DELETE_PACKAGES'],
      ['intent.action.BOOT_COMPLETED', 'intent.action.USER_PRESENT', 'intent.action.PHONE_STATE', 'intent.action.MAIN'],
      []
    ], 
    [
      ['permission.WRITE_EXTERNAL_STORAGE', 'permission.ACCESS_FINE_LOCATION', 'permission.INTERNET', 'permission.READ_PHONE_STATE', 'permission.ACCESS_COARSE_LOCATION', 'permission.CALL_PHONE', 'permission.READ_CONTACTS', 'permission.READ_SMS'], 
      ['intent.action.PHONE_STATE', 'intent.action.MAIN'], 
      []
    ]
  ]

编辑:根据@gojomo的评论修正特征向量形式后的错误栈跟踪

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1631, in settrace
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1631, in settrace
2019-12-13 12:24:34,519:gensim.models.base_any2vec:INFO - worker thread finished; awaiting finish of 3 more threads
2019-12-13 12:24:34,519:gensim.models.base_any2vec:INFO - worker thread finished; awaiting finish of 2 more threads
2019-12-13 12:24:34,519:gensim.models.base_any2vec:INFO - worker thread finished; awaiting finish of 1 more threads
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1631, in settrace
2019-12-13 12:24:34,519:gensim.models.base_any2vec:INFO - worker thread finished; awaiting finish of 0 more threads
2019-12-13 12:24:34,520:gensim.models.base_any2vec:INFO - EPOCH - 10 : training on 6 raw words (0 effective words) took 0.0s, 0 effective words/s
    stop_at_frame,
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1711, in _locked_settrace
2019-12-13 12:24:34,520:gensim.models.base_any2vec:INFO - training on a 60 raw words (2 effective words) took 0.1s, 21 effective words/s
    stop_at_frame,
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1711, in _locked_settrace
2019-12-13 12:24:34,520:gensim.models.base_any2vec:WARNING - under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
    stop_at_frame,
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1711, in _locked_settrace
    debugger.enable_tracing(apply_to_all_threads=True)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 482, in enable_tracing
    debugger.enable_tracing(apply_to_all_threads=True)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 482, in enable_tracing
    pydevd_tracing.set_trace_to_threads(self.dummy_trace_dispatch)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd_tracing.py", line 241, in set_trace_to_threads
2019-12-13 12:24:34,521:gensim.utils:INFO - saving Word2Vec object under model/word2vec_model, separately None
    pydevd_tracing.set_trace_to_threads(self.dummy_trace_dispatch)
    debugger.enable_tracing(apply_to_all_threads=True)  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd_tracing.py", line 241, in set_trace_to_threads

  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 482, in enable_tracing
    pydevd_tracing.set_trace_to_threads(self.dummy_trace_dispatch)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd_tracing.py", line 241, in set_trace_to_threads
    result = lib.AttachDebuggerTracing(
      File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 361, in __getattr__
result = lib.AttachDebuggerTracing(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 361, in __getattr__
    result = lib.AttachDebuggerTracing(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 361, in __getattr__
2019-12-13 12:24:34,521:gensim.utils:INFO - not storing attribute vectors_norm
        func = self.__getitem__(name)func = self.__getitem__(name)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 366, in __getitem__
    func = self.__getitem__(name)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 366, in __getitem__

2019-12-13 12:24:34,522:gensim.utils:INFO - not storing attribute cum_table
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
    func = self._FuncPtr((name_or_ordinal, self))AttributeError: dlsym(0x7fed18f247e0, AttachDebuggerTracing): symbol not found

AttributeError: dlsym(0x7fed18f247e0, AttachDebuggerTracing): symbol not found
func = self._FuncPtr((name_or_ordinal, self))
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1631, in settrace
AttributeError: dlsym(0x7fed18d2ff70, AttachDebuggerTracing): symbol not found
    stop_at_frame,
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1711, in _locked_settrace
2019-12-13 12:24:34,524:gensim.utils:INFO - saved model/word2vec_model
    debugger.enable_tracing(apply_to_all_threads=True)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 482, in enable_tracing
    pydevd_tracing.set_trace_to_threads(self.dummy_trace_dispatch)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd_tracing.py", line 241, in set_trace_to_threads
    result = lib.AttachDebuggerTracing(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(0x7fed18a163b0, AttachDebuggerTracing): symbol not found
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1631, in settrace
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1631, in settrace
    stop_at_frame,
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1711, in _locked_settrace
    stop_at_frame,
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1711, in _locked_settrace
    debugger.enable_tracing(apply_to_all_threads=True)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 482, in enable_tracing
    debugger.enable_tracing(apply_to_all_threads=True)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 482, in enable_tracing
    pydevd_tracing.set_trace_to_threads(self.dummy_trace_dispatch)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd_tracing.py", line 241, in set_trace_to_threads
        pydevd_tracing.set_trace_to_threads(self.dummy_trace_dispatch)
result = lib.AttachDebuggerTracing(  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd_tracing.py", line 241, in set_trace_to_threads

  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 361, in __getattr__
    result = lib.AttachDebuggerTracing(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 366, in __getitem__
    func = self.__getitem__(name)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(0x7fed15fd5850, AttachDebuggerTracing): symbol not found
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(0x7fed18a163b0, AttachDebuggerTracing): symbol not found
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1631, in settrace
    stop_at_frame,
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1711, in _locked_settrace
    debugger.enable_tracing(apply_to_all_threads=True)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 482, in enable_tracing
    pydevd_tracing.set_trace_to_threads(self.dummy_trace_dispatch)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd_tracing.py", line 241, in set_trace_to_threads
    result = lib.AttachDebuggerTracing(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(0x7fed18b09320, AttachDebuggerTracing): symbol not found
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1631, in settrace
    stop_at_frame,
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1711, in _locked_settrace
    debugger.enable_tracing(apply_to_all_threads=True)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 482, in enable_tracing
    pydevd_tracing.set_trace_to_threads(self.dummy_trace_dispatch)
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd_tracing.py", line 241, in set_trace_to_threads
    result = lib.AttachDebuggerTracing(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(0x7fed15fd5850, AttachDebuggerTracing): symbol not found

Gensim 的 Word2Vec 期望其语料库 sentences 是一个 序列 ,其中每个单独的项目都是一个 字符串标记列表 。 (也就是说,那些字符串标记是单词。)

相反,你有一个列表(作为一个序列是可以接受的),其中它的每个项目都是一个列表(这也是可以接受的),但是这些列表中的每一个都有另一个 list – 当进行 Word2Vec 训练时,每个项目都应该是一个字符串标记(单词)。

(我已将您的示例数据编辑为结构缩进,以使嵌套层次更加清晰。)

如果那些最内层的字符串列表是您的真实个体 "sentences",您需要确保它们是您最外层列表中的项目。

(另一方面,如果您真的希望像 ['intent.action.PHONE_STATE', 'intent.action.MAIN'] 这样的集群在您的模型中成为单个 "word",您需要将该列表更改为单个字符串标记,所以它看起来像一个词——因此是可散列的键——到 Word2Vec 和 Python。)