How to extract data from a tag or HTML class using VBScript? Only the tags or classes I choose

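Can someone help me? I need to use VBScript to extract the text between tags or HTML classes and save it to a single text file. Each tag or class I define should be saved on its own line.

I have found plenty of code on the Internet, but none of it works as expected.

For example, I have the code below, but it cannot extract by class, and it cannot handle more than one tag. In many cases the code does not work at all.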

myURL = "http://rss.cnn.com/rss/edition.rss"

Set oXMLHttp = CreateObject("MSXML2.XMLHTTP")
Set ohtmlFile = CreateObject("htmlfile")
Set objFSO = CreateObject("Scripting.FileSystemObject")
Const ForReading = 1, ForWriting = 2, ForAppending = 8

oXMLHttp.Open "GET", myURL, False
oXMLHttp.send

If oXMLHttp.Status = 200 Then

ohtmlFile.Write oXMLHttp.responseText
ohtmlFile.Close

Set oTable = ohtmlFile.getElementsByTagName("description")
sFileName = "c:\users\user\desktop\News.txt"
Set objFile = objFSO.OpenTextFile(sFileName, ForAppending, True)
For Each oTab In oTable
    objFile.Write oTab.Innertext & vbCrLf
Next
objFile.Close
End If

WScript.Quit

Thanks!

You are on the right track using the getElementsByTagName method with your ohtmlFile object. You can specify the type of tag you want. For example:

Set objAnchors = ohtmlFile.getElementsByTagName("a")

This returns every <a> tag in the HTML document.
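To cover both tags and classes with the same approach, one option is to loop over a list of tag names and keep only the elements whose className contains the class you want. The following is a minimal sketch, not a definitive implementation: the sample markup, the class name "headline", and the helper names CollectByClass and HasClass are all assumptions made for illustration. It filters on className by hand because the htmlfile document may not expose getElementsByClassName, depending on its document mode.

```vbscript
Option Explicit

' Sample markup stands in for oXMLHttp.responseText.
' "headline" is a hypothetical class name used only for this sketch.
Dim sampleHtml
sampleHtml = "<html><body>" & _
    "<p class=""headline big"">First story</p>" & _
    "<p class=""teaser"">Skipped</p>" & _
    "<a class=""headline"" href=""#"">Second story</a>" & _
    "</body></html>"

WScript.Echo CollectByClass(sampleHtml, Array("p", "a"), "headline")

' Returns the inner text of every listed tag that carries wantedClass,
' one entry per line.
Function CollectByClass(html, tagNames, wantedClass)
    Dim doc, tagName, el, result
    Set doc = CreateObject("htmlfile")
    doc.write html
    doc.close
    result = ""
    For Each tagName In tagNames
        For Each el In doc.getElementsByTagName(tagName)
            If HasClass(el.className, wantedClass) Then
                result = result & el.innerText & vbCrLf
            End If
        Next
    Next
    CollectByClass = result
End Function

' True when classAttr (a space-separated class list) contains name
' as a whole word, so "headline" does not match "headline-small".
Function HasClass(classAttr, name)
    HasClass = (InStr(" " & classAttr & " ", " " & name & " ") > 0)
End Function
```

Results are grouped by tag name in the order given in the array, not in document order. The returned string can be written to a file with objFSO.OpenTextFile in the same way as in the question.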

However, you can use a regular expression instead:

Option Explicit
Dim myURL,oXMLHttp,objFSO,Description,write2File,ws
myURL = "http://rss.cnn.com/rss/edition.rss"
set ws = CreateObject("wscript.shell")
Set oXMLHttp = CreateObject("MSXML2.XMLHTTP")
Set objFSO = CreateObject("Scripting.FileSystemObject")
oXMLHttp.Open "GET", myURL, False
oXMLHttp.send

If oXMLHttp.Status = 200 Then
    Description = Extract(oXMLHttp.responseText)
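    ' Third argument True writes the file as Unicode, so characters outside the ANSI code page do not raise an error
    Set write2File = objFSO.CreateTextFile(".\News.txt",True,True)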
    write2File.WriteLine(Description)
    write2File.Close
    ws.run ".\News.txt"
End If
'-------------------------------------------------------------------------
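Function Extract(Data)
    ' Collects the CDATA body of every <description> element, separated by blank lines.
    ' A local accumulator is used so the function does not depend on a global variable.
    Dim re,Match,Matches,result
    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True
    re.Pattern = "<description><!\[CDATA\[([\s\S]*?)\]\]><\/description>"
    Set Matches = re.Execute(Data)
    result = ""
    For Each Match in Matches
        result = result & Match.SubMatches(0) & vbCrlf & vbCrlf
    Next
    Extract = result
End Function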
'-------------------------------------------------------------------------
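The pattern above is tied to a single tag. Since the question asks for several chosen tags, the same idea can be parameterized by tag name. The following is a rough sketch under stated assumptions: ExtractTag is a hypothetical helper, the sample feed text is invented, and the tag name is assumed to contain no regular-expression metacharacters. The CDATA wrapper is treated as optional so plain-text elements are matched too.

```vbscript
Option Explicit

' Invented sample data standing in for oXMLHttp.responseText.
Dim sampleFeed, tagName
sampleFeed = "<item><title><![CDATA[One]]></title>" & _
             "<description>Alpha</description></item>" & _
             "<item><title>Two</title>" & _
             "<description><![CDATA[Beta]]></description></item>"

' Extract each chosen tag in turn; every match ends up on its own line.
For Each tagName In Array("title", "description")
    WScript.Echo ExtractTag(sampleFeed, tagName)
Next

' Returns the text of every <tagName> element in data, one per line.
' tagName must be a plain name such as "title" (assumption).
Function ExtractTag(data, tagName)
    Dim re, m, matches, result
    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True
    ' Optional attributes on the opening tag, optional CDATA wrapper around the body.
    re.Pattern = "<" & tagName & "(?:\s[^>]*)?>(?:<!\[CDATA\[)?([\s\S]*?)(?:\]\]>)?</" & tagName & ">"
    Set matches = re.Execute(data)
    result = ""
    For Each m In matches
        result = result & m.SubMatches(0) & vbCrLf
    Next
    ExtractTag = result
End Function
```

Regular expressions do not handle nested elements of the same name, so this is only suitable for flat structures such as RSS items.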

Edit:

Regarding the second request, how to get news from Google:

Option Explicit
Dim myURL,oXMLHttp,objFSO,GoogleNews,write2File,ws
myURL = "https://news.google.com/?hl=en-US&gl=US&ceid=US:en"
set ws = CreateObject("wscript.shell")
Set oXMLHttp = CreateObject("MSXML2.XMLHTTP")
Set objFSO = CreateObject("Scripting.FileSystemObject")
oXMLHttp.Open "GET", myURL, False
oXMLHttp.send

If oXMLHttp.Status = 200 Then
    GoogleNews = Extract(oXMLHttp.responseText)
    Set write2File = objFSO.CreateTextFile(".\GoogleNews.txt",True,-1)
    write2File.WriteLine(GoogleNews)
    write2File.Close
    ws.run ".\GoogleNews.txt"
End If
'-------------------------------------------------------------------------
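Function Extract(Data)
    ' Collects the link text of every anchor carrying the class DY5T1d, separated by blank lines.
    ' A local accumulator is used so the function does not depend on a global variable.
    Dim re,Match,Matches,result
    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True
    re.MultiLine = True
    re.Pattern = "(\bclass=""DY5T1d"" >)(.+?)<\/a>"
    Set Matches = re.Execute(Data)
    result = ""
    For Each Match in Matches
        result = result & Match.SubMatches(1) & vbCrlf & vbCrlf
    Next
    ' Decode the two HTML entities that appear in the headlines
    result = Replace(result,"&#39;","'")
    result = Replace(result,"&quot;",chr(34))
    Extract = result
End Function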
'-------------------------------------------------------------------------