VB.Net: Searching Word Document By Line
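I'm trying to read through a Word document (800+ pages) line by line and, if a line contains certain text (in this case Section), simply print that line to the console.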
Public Sub doIt()
    SearchFile("theFilePath", "Section")
    Console.WriteLine("SHit")
End Sub

Public Sub SearchFile(ByVal strFilePath As String, ByVal strSearchTerm As String)
    Dim sr As StreamReader = New StreamReader(strFilePath)
    Dim strLine As String = String.Empty
    For Each line As String In sr.ReadLine
        If line.Contains(strSearchTerm) = True Then
            Console.WriteLine(line)
        End If
    Next
End Sub
It runs, but it doesn't print anything. I know the word "Section" appears in there many times.
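As already mentioned in the comments, you can't search a Word document the way you are currently doing it. You need to create a Word.Application object and then load the document so that you can search it.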
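Two things probably explain the empty output. A .docx file is a ZIP package of XML parts rather than plain text, so a StreamReader sees compressed bytes and not the words on the page. In addition, For Each line As String In sr.ReadLine calls ReadLine once and then enumerates the characters of that one string, so each "line" holds a single character. For comparison, here is a minimal sketch of a line search that does work on plain text. The helper name FindLines and the sample data are made up for illustration and are not part of the question's code.

```vbnet
Imports System
Imports System.Collections.Generic

Module PlainTextSearch
    ' Hypothetical helper: returns every line that contains the search term.
    Function FindLines(lines As IEnumerable(Of String), term As String) As List(Of String)
        Dim hits As New List(Of String)
        For Each line As String In lines
            ' Each element here is a whole line, not a single character.
            If line.Contains(term) Then hits.Add(line)
        Next
        Return hits
    End Function

    Sub Main()
        ' Sample data standing in for the lines of a plain-text file.
        Dim sample As String() = {"Section 1: Scope", "Some body text", "See Section 2"}
        For Each hit As String In FindLines(sample, "Section")
            Console.WriteLine(hit)
        Next
    End Sub
End Module
```

For a real plain-text file you could pass System.IO.File.ReadLines(path) as the first argument. This still will not work on a Word document, which is why the Word object model is needed.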
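Here is a short example I wrote for you. Please note that you need to add a reference to Microsoft.Office.Interop.Word and then add the Imports statement to your class, for example Imports Microsoft.Office.Interop. This grabs each paragraph and then uses its range to find the word you are searching for; if the word is found, the paragraph is added to a list.
Note: tried and tested. I had this in a button event, but put it wherever you need it.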
Try
    Dim objWordApp As Word.Application = Nothing
    Dim objDoc As Word.Document = Nothing
    Dim TextToFind As String = "YOUR TEXT TO FIND"
    Dim TextRange As Word.Range = Nothing
    Dim StringLines As New List(Of String)

    objWordApp = CreateObject("Word.Application")

    If objWordApp IsNot Nothing Then
        objWordApp.Visible = False
        'FileName is the full path to your document
        objDoc = objWordApp.Documents.Open(FileName)
    End If

    If objDoc IsNot Nothing Then
        'loop through each paragraph in the document and get the range
        For Each p As Word.Paragraph In objDoc.Paragraphs
            TextRange = p.Range
            TextRange.Find.ClearFormatting()

            If TextRange.Find.Execute(TextToFind) Then
                'Trim drops the trailing paragraph mark
                StringLines.Add(p.Range.Text.Trim())
            End If
        Next

        If StringLines.Count > 0 Then
            MessageBox.Show(String.Join(Environment.NewLine, StringLines.ToArray()))
        End If

        objDoc.Close()
        objWordApp.Quit()
    End If
Catch ex As Exception
    'publish your exception?
End Try
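Updated to use sentences: this goes through each paragraph and grabs each sentence, so we can then check whether the word exists. The benefit of this is that it's quicker, because we get each paragraph and then search its sentences. We have to get the paragraph in order to get the sentences.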
Try
    Dim objWordApp As Word.Application = Nothing
    Dim objDoc As Word.Document = Nothing
    Dim TextToFind As String = "YOUR TEXT TO FIND"
    Dim TextRange As Word.Range = Nothing
    Dim StringLines As New List(Of String)
    Dim SentenceCount As Integer = 0

    objWordApp = CreateObject("Word.Application")

    If objWordApp IsNot Nothing Then
        objWordApp.Visible = False
        'FileName is the full path to your document
        objDoc = objWordApp.Documents.Open(FileName)
    End If

    If objDoc IsNot Nothing Then
        For Each p As Word.Paragraph In objDoc.Paragraphs
            TextRange = p.Range
            SentenceCount = TextRange.Sentences.Count

            'Word collections are 1-based; go front to back so matches keep document order
            For i As Integer = 1 To SentenceCount
                Dim sentence As String = TextRange.Sentences.Item(i).Text
                If sentence.Contains(TextToFind) Then
                    StringLines.Add(sentence.Trim())
                End If
            Next
        Next

        If StringLines.Count > 0 Then
            MessageBox.Show(String.Join(Environment.NewLine, StringLines.ToArray()))
        End If

        objDoc.Close()
        objWordApp.Quit()
    End If
Catch ex As Exception
    'publish your exception?
End Try
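Here is a sub that prints each line in which the search string is found, rather than each paragraph. It mimics the behavior of using the StreamReader in your example to read and check each line: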
'Add a reference to Microsoft.Office.Interop.Word and add: Imports Microsoft.Office.Interop
Public Sub SearchFile(ByVal strFilePath As String, ByVal strSearchTerm As String)
    Dim wordObject As Word.Application = New Word.Application
    wordObject.Visible = False
    Dim objWord As Word.Document = wordObject.Documents.Open(strFilePath)

    'Start with the selection on the first character of the document
    objWord.Characters(1).Select()

    Dim bolEOF As Boolean = False
    Do Until bolEOF
        'Extend the selection to the end of the current displayed line
        wordObject.Selection.MoveEnd(Word.WdUnits.wdLine, 1)

        'Case-insensitive check of the selected line
        If wordObject.Selection.Text.ToUpper.Contains(strSearchTerm.ToUpper) Then
            'Strip carriage returns and line feeds before printing
            Console.WriteLine(wordObject.Selection.Text.Replace(vbCr, "").Replace(vbLf, ""))
        End If

        'Move the insertion point to the start of the next line
        wordObject.Selection.Collapse(Word.WdCollapseDirection.wdCollapseEnd)

        If wordObject.Selection.Bookmarks.Exists("\EndOfDoc") Then
            bolEOF = True
        End If
    Loop

    objWord.Close()
    wordObject.Quit()
    objWord = Nothing
    wordObject = Nothing

    'Closes the host form; remove this line if the sub does not live in a Form
    Me.Close()
End Sub
It is a slightly modified VB.NET implementation of nawfal's solution to parsing word document lines.
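With a document of 800+ pages, asking Word for one paragraph or one line at a time means a very large number of interop calls. A possible alternative, sketched here as an assumption and not something taken from the answers above, is to read the whole document text once through Document.Content.Text and split it on the carriage return that Word uses as its paragraph mark. The names ParagraphSearch and SearchParagraphs are hypothetical. Note that this matches whole paragraphs, not displayed lines, and that text taken from tables may still carry Word's cell-marker control characters.

```vbnet
Imports System
Imports Microsoft.Office.Interop

Module ParagraphSearch
    ' Hypothetical helper: prints every paragraph that contains the search term.
    ' Requires a reference to Microsoft.Office.Interop.Word and an installed copy of Word.
    Sub SearchParagraphs(ByVal strFilePath As String, ByVal strSearchTerm As String)
        Dim wordObject As New Word.Application()
        Dim objDoc As Word.Document = Nothing
        Try
            wordObject.Visible = False
            objDoc = wordObject.Documents.Open(strFilePath)

            ' One interop call fetches the text of the whole document body.
            Dim allText As String = objDoc.Content.Text

            ' Word ends each paragraph with a carriage return.
            For Each para As String In allText.Split(ControlChars.Cr)
                ' Case-insensitive match, like the ToUpper comparison above.
                If para.IndexOf(strSearchTerm, StringComparison.OrdinalIgnoreCase) >= 0 Then
                    Console.WriteLine(para.Trim())
                End If
            Next
        Finally
            ' Close without saving and quit Word even if something failed.
            If objDoc IsNot Nothing Then objDoc.Close(False)
            wordObject.Quit()
        End Try
    End Sub
End Module
```

It would be called the same way as the sub above, for example SearchParagraphs("theFilePath", "Section"). The Try/Finally is there so that a hidden Word process is not left running if an exception is thrown partway through.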