How to extract HTML table contents to a DataTable
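I have this HTML page, and its contents look like the following.
I am trying to extract the contents of the page into a DataTable and display it in a grid.
For example: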
<a href='/exodus-5.1/bacon/exodus-5.1-20150612-NIGHTLY-bacon.zip'>exodus-5.1-20150612-NIGHTLY-bacon.zip</a>
I also need to get the link's name as well as its URI:
Name: exodus-5.1-20150612-NIGHTLY-bacon.zip
URI: /exodus-5.1/bacon/exodus-5.1-20150612-NIGHTLY-bacon.zip
Here is what I have so far:
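' Requires: Imports System.IO and Imports System.Net
Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
request.Method = WebRequestMethods.Http.Get
Dim webpageContents As String
' Using blocks dispose the response and reader even if reading fails
Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
    Using reader As New StreamReader(response.GetResponseStream())
        webpageContents = reader.ReadToEnd()
    End Using
End Using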
Although it is not VB.NET, this is a very easy task in another .NET language, F#, using the HTML Type Provider, which is part of the FSharp.Data project and available via NuGet.
The HTML type provider gives you typed access to HTML documents inside Visual Studio, i.e.
// Reference the FSharp.Data Nuget package
#r @".\packages\FSharp.Data.2.2.3\lib\net40\FSharp.Data.dll"
// Type provider over your HTML document specified in yourUrl
type html = FSharp.Data.HtmlProvider<yourUrl>
// Get the rows from the HTML table in the page
let allRows = html.GetSample().Tables.Table1.Rows |> Seq.skip 1
// Skip empty rows
let validRows = allRows |> Seq.where (fun row -> row.Name <> "")
Then load the valid rows into a DataTable:
// Reference the System.Data assembly
#r "System.Data.dll"
// Create a DataTable
let table = new System.Data.DataTable()
// Add column names to the table
for name in ["Parent";"Name";"Last modified";"Size"] do table.Columns.Add(name) |> ignore
// Add row values to the table
for row in validRows do
    table.Rows.Add(row.Column1, row.Name, row.``Last modified``, row.Size) |> ignore
Finally, show the DataTable on a form:
// Reference the Windows.Forms assembly
#r "System.Windows.Forms.dll"
open System.Windows.Forms
// Create a form
let form = new Form(Width=480,Height=320)
// Initialise a grid
let grid = new DataGridView(Dock=DockStyle.Fill)
form.Controls.Add(grid)
// Set the grid data source with the table
form.Load.Add(fun _ -> grid.DataSource <- table)
form.Show()
This shows the form with the populated DataGrid.
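The typed rows above give each cell's text, but the question also asks for each link's URI. A minimal sketch of one way to get it, assuming the `HtmlDocument` parser that ships in the same FSharp.Data package: it pairs every anchor's text with its `href` and loads the pairs into a second table. The names `sampleHtml`, `links` and `linkTable` are hypothetical, and the inline markup merely stands in for the real page.

```fsharp
// Reference FSharp.Data (same package path as above) and System.Data
#r @".\packages\FSharp.Data.2.2.3\lib\net40\FSharp.Data.dll"
#r "System.Data.dll"
open FSharp.Data

// Hypothetical inline sample standing in for the downloaded page
let sampleHtml =
    "<table><tr><td><a href='/exodus-5.1/bacon/exodus-5.1-20150612-NIGHTLY-bacon.zip'>exodus-5.1-20150612-NIGHTLY-bacon.zip</a></td></tr></table>"

// Parse the markup; HtmlDocument.Load(url) would fetch a live page instead
let doc = HtmlDocument.Parse(sampleHtml)

// Pair each anchor's text with its href, skipping anchors that have none
let links =
    doc.Descendants ["a"]
    |> Seq.choose (fun a ->
        a.TryGetAttribute("href")
        |> Option.map (fun href -> a.InnerText(), href.Value()))
    |> Seq.toList

// Load the (name, uri) pairs into a two-column DataTable
let linkTable = new System.Data.DataTable()
for column in ["Name"; "Uri"] do linkTable.Columns.Add(column) |> ignore
for (name, uri) in links do
    linkTable.Rows.Add(box name, box uri) |> ignore
```

Setting `grid.DataSource <- linkTable` in place of `table` should show the link names next to their URIs. The hrefs are relative, so they would still need to be combined with the site's base address before downloading.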