How to extract data from specific tags or HTML classes using VBScript?
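Can someone help me? I need to use VBScript to extract the text between tags or HTML classes and save it to a single text file. Only the tags or classes I define should be saved, each on its own line.
I found a lot of code on the Internet, but none of it works as expected.
For example, I have the code below, but I can't extract classes with it, and I can't extract more than one tag. In many cases the code doesn't work at all.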
myURL = "http://rss.cnn.com/rss/edition.rss"
Set oXMLHttp = CreateObject("MSXML2.XMLHTTP")
Set ohtmlFile = CreateObject("htmlfile")
Set objFSO = CreateObject("Scripting.FileSystemObject")
Const ForReading = 1, ForWriting = 2, ForAppending = 8
oXMLHttp.Open "GET", myURL, False
oXMLHttp.send
If oXMLHttp.Status = 200 Then
ohtmlFile.Write oXMLHttp.responseText
ohtmlFile.Close
Set oTable = ohtmlFile.getElementsByTagName("description")
sFileName = "c:\users\user\desktop\News.txt"
Set objFile = objFSO.OpenTextFile(sFileName, ForAppending, True)
For Each oTab In oTable
objFile.Write oTab.Innertext & vbCrLf
Next
objFile.Close
End If
WScript.Quit
Thanks!
You are on the right track using the getElementsByTagName method with your ohtmlFile object. You can specify the type of tag you want. For example:
Set objAnchors = ohtmlFile.getElementsByTagName("a")
returns all the <a> tags in the HTML document.
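To make the tag and class cases concrete, here is a minimal sketch that collects text by tag name and by class name from an htmlfile document. The helper names TextByTag and TextByClass and the sample markup are hypothetical, not part of the original code. It assumes the htmlfile object may run in an older document mode where getElementsByClassName is not available, so it walks document.all and compares className itself.

```vbscript
Option Explicit

' Hypothetical sample markup; in practice this would be oXMLHttp.responseText
Dim html, doc
html = "<html><body><h1 class=""headline"">First</h1>" & _
       "<p class=""summary"">Second</p><p>Third</p></body></html>"

Set doc = CreateObject("htmlfile")
doc.write html
doc.close

' Text of the tags and classes chosen by the caller, one element per line
WScript.Echo TextByTag(doc, "p")
WScript.Echo TextByClass(doc, "headline")

' Collect the text of every element with the given tag name
Function TextByTag(d, tagName)
    Dim el, result
    For Each el In d.getElementsByTagName(tagName)
        result = result & el.innerText & vbCrLf
    Next
    TextByTag = result
End Function

' Collect the text of every element whose class list contains className
Function TextByClass(d, className)
    Dim el, result
    For Each el In d.all
        ' Pad with spaces so that "head" does not match "headline"
        If InStr(" " & el.className & " ", " " & className & " ") > 0 Then
            result = result & el.innerText & vbCrLf
        End If
    Next
    TextByClass = result
End Function
```

Calling the helpers once per tag or class and appending each result to the output file would give one block of lines per selection. Note that an RSS feed is XML rather than HTML, so the HTML parser may not expose elements such as description the way it does real HTML tags.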
However, you can use a regular expression instead:
Option Explicit
Dim myURL, oXMLHttp, objFSO, Description, write2File, ws
myURL = "http://rss.cnn.com/rss/edition.rss"
Set ws = CreateObject("wscript.shell")
Set oXMLHttp = CreateObject("MSXML2.XMLHTTP")
Set objFSO = CreateObject("Scripting.FileSystemObject")
' Synchronous GET request (third argument False means not asynchronous)
oXMLHttp.Open "GET", myURL, False
oXMLHttp.send
If oXMLHttp.Status = 200 Then
    Description = Extract(oXMLHttp.responseText)
    ' Create (or overwrite) News.txt in the current directory and open it afterwards
    Set write2File = objFSO.CreateTextFile(".\News.txt", True)
    write2File.WriteLine Description
    write2File.Close
    ws.run ".\News.txt"
End If
'-------------------------------------------------------------------------
Function Extract(Data)
    Dim re, Match, Matches, result
    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True
    ' Capture the text inside each <description><![CDATA[ ... ]]></description> element
    re.Pattern = "<description><!\[CDATA\[([\s\S]*?)\]\]><\/description>"
    Set Matches = re.Execute(Data)
    For Each Match In Matches
        ' A local accumulator keeps the function from modifying the global Description
        result = result & Match.SubMatches(0) & vbCrLf & vbCrLf
    Next
    Extract = result
End Function
'-------------------------------------------------------------------------
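Since the question asks for more than one tag, the same regular expression idea can be parameterized by tag name. The following is only a sketch under assumptions: the function name ExtractTag and the sample string are hypothetical, and the pattern assumes simple, non-nested elements with an optional CDATA wrapper.

```vbscript
Option Explicit

Dim sample, t
' Hypothetical sample data standing in for oXMLHttp.responseText
sample = "<item><title><![CDATA[Hello]]></title>" & _
         "<description>World</description></item>"

' Extract each tag chosen by the caller, one tag name at a time
For Each t In Array("title", "description")
    WScript.Echo t & ": " & ExtractTag(sample, t)
Next

' Return the text of every <tagName>...</tagName> element, one per line
Function ExtractTag(data, tagName)
    Dim re, m, result
    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True
    ' Optional attributes, optional CDATA wrapper, lazily matched body
    re.Pattern = "<" & tagName & "(?:\s[^>]*)?>(?:<!\[CDATA\[)?" & _
                 "([\s\S]*?)(?:\]\]>)?<\/" & tagName & ">"
    For Each m In re.Execute(data)
        result = result & m.SubMatches(0) & vbCrLf
    Next
    ExtractTag = result
End Function
```

Regular expressions are fragile against nested or malformed markup, so this approach suits predictable feeds better than arbitrary web pages.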
Edit:
Regarding the second request, how to get news from Google:
Option Explicit
Dim myURL, oXMLHttp, objFSO, GoogleNews, write2File, ws
myURL = "https://news.google.com/?hl=en-US&gl=US&ceid=US:en"
Set ws = CreateObject("wscript.shell")
Set oXMLHttp = CreateObject("MSXML2.XMLHTTP")
Set objFSO = CreateObject("Scripting.FileSystemObject")
oXMLHttp.Open "GET", myURL, False
oXMLHttp.send
If oXMLHttp.Status = 200 Then
    GoogleNews = Extract(oXMLHttp.responseText)
    ' Third argument -1 (True) creates the file as Unicode
    Set write2File = objFSO.CreateTextFile(".\GoogleNews.txt", True, -1)
    write2File.WriteLine GoogleNews
    write2File.Close
    ws.run ".\GoogleNews.txt"
End If
'-------------------------------------------------------------------------
Function Extract(Data)
    Dim re, Match, Matches, result
    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True
    re.MultiLine = True
    ' The class name DY5T1d comes from the page markup and may change over time
    re.Pattern = "(\bclass=""DY5T1d"" >)(.+?)<\/a>"
    Set Matches = re.Execute(Data)
    For Each Match In Matches
        ' SubMatches(1) is the link text; SubMatches(0) is the class attribute itself
        result = result & Match.SubMatches(1) & vbCrLf & vbCrLf
    Next
    ' Decode the HTML entities for the apostrophe and the double quote
    result = Replace(result, "&#39;", "'")
    result = Replace(result, "&quot;", Chr(34))
    Extract = result
End Function
'-------------------------------------------------------------------------