使用 Python 从句子池中匹配一个词
To match a word from a pool of sentences using Python
我有两个不同的文件文件 "Sentence" 包含一个句子池,请在下面找到快照。
Sentence Snapshot
文件"Word" contians词库,请在下面找到快照。
如果有任何单词与句子匹配,我想将单词从单词文件映射到句子文件,我想要句子和匹配单词形式的结果
例如:
句子匹配词
Linux 开栈很棒 Linux 开栈
请在下面找到我的代码,当我尝试将结果提取到 csv 中时,它显示错误。
import pandas as pd
import csv
sentence_xlsx = pd.ExcelFile('C:\Python\Seema\Sentence.xlsx')
sentence_all = sentence_xlsx.parse('Sheet1')
#print(sentence_all)
word_xlsx = pd.ExcelFile('C:\Python\Seema\Word.xlsx')
word_all = word_xlsx.parse('Sheet1')
for sentence in sentence_all['Article']:
sentences = sentence.lower()
for word in sentences.split():
if word in ('linux','openstack'):
result = word,sentence
results = open('C:\Python\Seema\result.csv', 'wb')
writer = csv.writer(results, dialect='excel')
writer.writerows(result)
results.close()
Traceback (most recent call last):
File "Word_Finder2.py", line 25, in <module>
results = open('C:\Python\Seema\result.csv', 'wb')
IOError: [Errno 22] invalid mode ('wb') or filename: 'C:\Python\Seema\result.c
sv'
路径的 '\result.csv'
部分的 '\r'
被读取为回车 return 字符。要解决此问题,请将前导 r
附加到路径以使其成为原始字符串文字(credit @georg)。
然后使用writerows
,所有迭代的结果应该累积到一个列表中,而不仅仅是最后一个结果。
result = [] # create list to hold result from each iteration
for sentence in sentence_all['Article']:
sentences = sentence.lower()
for word in sentences.split():
if word in ('linux','openstack'):
# append result of iteration to result
result.append([sentence, word])
#|<- creates list of rows suitable for 'writerows'
results = open(r'C:\Python\Seema\result.csv', 'wb')
# ^ prevents \r... from being read as a special character
writer = csv.writer(results, dialect='excel')
writer.writerows(result)
results.close()
我有两个不同的文件文件 "Sentence" 包含一个句子池,请在下面找到快照。 Sentence Snapshot
文件"Word" contians词库,请在下面找到快照。
如果有任何单词与句子匹配,我想将单词从单词文件映射到句子文件,我想要句子和匹配单词形式的结果
例如: 句子匹配词 Linux 开栈很棒 Linux 开栈
请在下面找到我的代码,当我尝试将结果提取到 csv 中时,它显示错误。
import pandas as pd
import csv
sentence_xlsx = pd.ExcelFile('C:\Python\Seema\Sentence.xlsx')
sentence_all = sentence_xlsx.parse('Sheet1')
#print(sentence_all)
word_xlsx = pd.ExcelFile('C:\Python\Seema\Word.xlsx')
word_all = word_xlsx.parse('Sheet1')
for sentence in sentence_all['Article']:
sentences = sentence.lower()
for word in sentences.split():
if word in ('linux','openstack'):
result = word,sentence
results = open('C:\Python\Seema\result.csv', 'wb')
writer = csv.writer(results, dialect='excel')
writer.writerows(result)
results.close()
Traceback (most recent call last):
File "Word_Finder2.py", line 25, in <module>
results = open('C:\Python\Seema\result.csv', 'wb')
IOError: [Errno 22] invalid mode ('wb') or filename: 'C:\Python\Seema\result.c
sv'
路径的 '\result.csv'
部分的 '\r'
被读取为回车 return 字符。要解决此问题,请将前导 r
附加到路径以使其成为原始字符串文字(credit @georg)。
然后使用writerows
,所有迭代的结果应该累积到一个列表中,而不仅仅是最后一个结果。
result = [] # create list to hold result from each iteration
for sentence in sentence_all['Article']:
sentences = sentence.lower()
for word in sentences.split():
if word in ('linux','openstack'):
# append result of iteration to result
result.append([sentence, word])
#|<- creates list of rows suitable for 'writerows'
results = open(r'C:\Python\Seema\result.csv', 'wb')
# ^ prevents \r... from being read as a special character
writer = csv.writer(results, dialect='excel')
writer.writerows(result)
results.close()