Transform an entire column into a corpus
My df has two columns containing text. I want to convert each of them into a separate corpus.
df
id | Description 1               | Description 2     |
------------------------------------------------------
1  | that book is good           | better than book2 |
2  | book 2 is not better than 1 | not good          |
.  | .                           | .                 |
.  | .                           | .                 |
.  | .                           | .                 |
Consider Description 1 as the documents and Description 2 as the queries.
Expected output
Corpus 1: that book is good book 2 is not better than 1..................
Corpus 2: better than book2 not good.....................
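You need to use the join function to concatenate every row in a column into one string, then append that string to a list. The output is a list with one corpus per column.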
corpus = []
# Assumes id is the index, so every column of df holds text
for i in range(len(df.columns)):
    # Join all rows of column i into one space-separated string
    corpus.append(' '.join(df.iloc[:, i].astype(str)))
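If id is a regular column rather than the index, looping over every column would also try to join the numeric ids. A minimal self-contained sketch of the same idea, assuming the column names shown in the question and selecting the text columns by name (the sample DataFrame and the names `text_columns` and `corpora` are illustrative, not from the original post):

```python
import pandas as pd

# Sample data mirroring the two rows shown in the question
df = pd.DataFrame({
    "id": [1, 2],
    "Description 1": ["that book is good", "book 2 is not better than 1"],
    "Description 2": ["better than book2", "not good"],
})

# Select only the text columns so the numeric id column is not joined
text_columns = ["Description 1", "Description 2"]

# Series.str.cat with no other Series concatenates all values using sep
corpora = {col: df[col].astype(str).str.cat(sep=" ") for col in text_columns}

print(corpora["Description 1"])  # that book is good book 2 is not better than 1
print(corpora["Description 2"])  # better than book2 not good
```

Keying the result by column name instead of list position makes it harder to mix up the document corpus and the query corpus later on.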