ValueError: Shape of passed values is, indices imply

Reposting this because my first post did not get a reply.

My data looks like this:

desc = pd.DataFrame(description, columns =['new_desc'])

                                             new_desc
257623  the public safety report is compiled from crim...
161135  police say a sea isle city man ordered two pou...
156561  two people are behind bars this morning, after...
41690   pumpkin soup is a beloved breakfast soup in ja...
70092   right now, 15 states are grappling with how be...
...                                                   ...
207258  operation legend results in 59 more arrests, i...
222170                                      see story, 3a
204064  st. louis — missouri secretary of state jason ...
151443  tony lavell jones, 54, of sunset view terrace,...
97367   walgreens, on the other hand, is still going t...

[9863 rows x 1 columns]

I am trying to find the dominant topic of each document. I run the following code:

best_lda_model = lda_desc
data_vectorized = tfidf
lda_output = best_lda_model.transform(data_vectorized)
topicnames = ["Topic " + str(i) for i in range(best_lda_model.n_components)]
docnames = ["Doc " + str(i) for i in range(len(dataset))]
df_document_topic = pd.DataFrame(np.round(lda_output, 2), columns = topicnames, index = docnames)
dominant_topic = np.argmax(df_document_topic.values, axis = 1)
df_document_topic['dominant_topic'] = dominant_topic

I have tried adjusting the code, but whatever I change, I get the following traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\python36\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
   1673 
-> 1674         mgr = BlockManager(blocks, axes)
   1675         mgr._consolidate_inplace()

c:\python36\lib\site-packages\pandas\core\internals\managers.py in __init__(self, blocks, axes, do_integrity_check)
    148         if do_integrity_check:
--> 149             self._verify_integrity()
    150 

c:\python36\lib\site-packages\pandas\core\internals\managers.py in _verify_integrity(self)
    328             if block.shape[1:] != mgr_shape[1:]:
--> 329                 raise construction_error(tot_items, block.shape[1:], self.axes)
    330         if len(self.items) != tot_items:

ValueError: Shape of passed values is (9863, 8), indices imply (0, 8)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-41-bd470d69b181> in <module>
      4 topicnames = ["Topic " + str(i) for i in range(best_lda_model.n_components)]
      5 docnames = ["Doc " + str(i) for i in range(len(dataset))]
----> 6 df_document_topic = pd.DataFrame(np.round(lda_output, 2), columns = topicnames, index = docnames)
      7 dominant_topic = np.argmax(df_document_topic.values, axis = 1)
      8 df_document_topic['dominant_topic'] = dominant_topic

c:\python36\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    495                 mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
    496             else:
--> 497                 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
    498 
    499         # For data is list-like, or Iterable (will consume into list)

c:\python36\lib\site-packages\pandas\core\internals\construction.py in init_ndarray(values, index, columns, dtype, copy)
    232         block_values = [values]
    233 
--> 234     return create_block_manager_from_blocks(block_values, [columns, index])
    235 
    236 

c:\python36\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
   1679         blocks = [getattr(b, "values", b) for b in blocks]
   1680         tot_items = sum(b.shape[0] for b in blocks)
-> 1681         raise construction_error(tot_items, blocks[0].shape[1:], axes, e)
   1682 
   1683 

ValueError: Shape of passed values is (9863, 8), indices imply (0, 8)

The desired result is a list of the documents that belong to a given topic. Below are the sample code and the output I am after.

df_document_topic(df_document_topic['dominant_topic'] == 2).head(10)

When I run this code, I get the following traceback:

TypeError                                 Traceback (most recent call last)
<ipython-input-55-8cf9694464e6> in <module>
----> 1 df_document_topic(df_document_topic['dominant_topic'] == 2).head(10)

TypeError: 'DataFrame' object is not callable

Below is the desired output.

Any help would be greatly appreciated.

The index you pass as docnames is empty. It is built from dataset, like this:

docnames = ["Doc " + str(i) for i in range(len(dataset))]

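That means dataset is empty as well: the error message reports 9863 rows of data while the index implies 0 rows. As a workaround, you can build the Doc labels from the length of lda_output instead: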

docnames = ["Doc " + str(i) for i in range(len(lda_output))]
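To see the mismatch and the workaround side by side, here is a minimal sketch. It does not use your data or model: lda_output is a small synthetic array standing in for the output of transform, and dataset is assumed to be an empty list, which is only what the "indices imply (0, 8)" message suggests.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for lda_output: 5 documents x 3 topics (not the asker's data).
rng = np.random.default_rng(0)
lda_output = rng.dirichlet(np.ones(3), size=5)

# Assumption: dataset is empty, as the "(0, 8)" in the error message suggests.
dataset = []

topicnames = ["Topic " + str(i) for i in range(lda_output.shape[1])]

# Reproduce the failure: zero index labels for five rows of data.
bad_docnames = ["Doc " + str(i) for i in range(len(dataset))]
try:
    pd.DataFrame(np.round(lda_output, 2), columns=topicnames, index=bad_docnames)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print("Construction failed with:", error_message)

# Workaround: derive the labels from the array that is being wrapped,
# so the index length always matches the number of rows.
docnames = ["Doc " + str(i) for i in range(lda_output.shape[0])]
df_document_topic = pd.DataFrame(
    np.round(lda_output, 2), columns=topicnames, index=docnames
)

# Column position of the largest topic weight in each row.
dominant_topic = np.argmax(df_document_topic.values, axis=1)
df_document_topic["dominant_topic"] = dominant_topic
print(df_document_topic)
```

It is probably still worth checking why dataset is empty in your session, since the labels are only meaningful if the rows of lda_output line up with the documents you expect.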

Let me know if this works.
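The second traceback in the question, TypeError: 'DataFrame' object is not callable, is a separate problem: the filter is written with parentheses, which tries to call the DataFrame like a function. Boolean-mask selection uses square brackets. A small sketch with a hand-made frame (the values are illustrative only, not taken from your model):

```python
import pandas as pd

# Hand-made document-topic frame; the numbers are purely illustrative.
df_document_topic = pd.DataFrame(
    {
        "Topic 0": [0.7, 0.1, 0.2, 0.1],
        "Topic 1": [0.2, 0.1, 0.1, 0.8],
        "Topic 2": [0.1, 0.8, 0.7, 0.1],
        "dominant_topic": [0, 2, 2, 1],
    },
    index=["Doc 0", "Doc 1", "Doc 2", "Doc 3"],
)

# Parentheses attempt to call the DataFrame, which raises TypeError.
try:
    df_document_topic(df_document_topic["dominant_topic"] == 2)
    call_error = None
except TypeError as exc:
    call_error = str(exc)
print("Calling the frame failed with:", call_error)

# Square brackets apply the boolean mask and keep only the matching rows.
topic_2_docs = df_document_topic[df_document_topic["dominant_topic"] == 2].head(10)
print(topic_2_docs)
```

With the brackets in place, the expression should return the rows whose dominant_topic equals 2, which appears to be the listing you are after.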