How can I parse a link from a line in an HTML source?

Question

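using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Windows.Forms;

public partial class Form1 : Form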
{
    string downloaddifrectory;
    string mainurl = "http://www.usgodae.org/ftp/outgoing/fnmoc/models/navgem_0.5/latest_data/";
    List<string> parsedlinks = new List<string>();
    string path_exe = Path.GetDirectoryName(Application.LocalUserAppDataPath);

    public Form1()
    {
        InitializeComponent();

        Parseanddownloadfiles();
    }

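    private void Parseanddownloadfiles()
    {
        // Save the directory listing locally so it can be scanned line by line.
        string pagePath = Path.Combine(path_exe, "page.html");
        using (WebClient client = new WebClient())
        {
            client.DownloadFile(mainurl, pagePath);
        }

        string firsttag = "href=\"";
        string lasttag = "\"";
        string[] lines = File.ReadAllLines(pagePath);
        for (int i = 0; i < lines.Length; i++)
        {
            int first = lines[i].IndexOf(firsttag);
            if (first < 0) continue; // no link on this line

            // The href value runs from just after href=" to the next quote.
            int start = first + firsttag.Length;
            int end = lines[i].IndexOf(lasttag, start);
            if (end < 0) continue; // unterminated attribute, skip the line

            parsedlinks.Add(lines[i].Substring(start, end - start));
        }
    }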

    private void Form1_Load(object sender, EventArgs e)
    {

    }
}

In this case I tried using IndexOf and Substring. This is a link to the source view of the HTML page:

Source View

For example, one line in the source view is:

<img src="/icons/unknown.gif" alt="[   ]"> <a href="US058GCOM-GR1mdl.0018_0056_00000F0RL2015110900_0001_000000-000000grnd_sea_temp">US058GCOM-GR1mdl.0018_0056_00000F0RL2015110900_0001_000000-000000grnd_sea_temp</a>       09-Nov-2015 04:23  444K

If I right-click on this part:

US058GCOM-GR1mdl.0018_0056_00000F0RL2015110900_0001_000000-000000grnd_sea_temp

I can copy the link address, which gives me:

http://www.usgodae.org/ftp/outgoing/fnmoc/models/navgem_0.5/latest_data/US058GCOM-GR1mdl.0018_0056_00000F0RL2015110900_0001_000000-000000grnd_sea_temp

If I now paste this FTP link into my browser, it downloads the file.

My main goal is to download all the files that have links like this, one on each line.

Answer 1

To parse an HTML page, use an HTML parser such as HtmlAgilityPack.

Here is working code (it needs using System.Linq for Select and ToList):

var web = new HtmlAgilityPack.HtmlWeb();
var doc = web.Load("http://www.usgodae.org/ftp/outgoing/fnmoc/models/navgem_0.5/latest_data/");

var links = doc.DocumentNode.SelectNodes("//a[@href]")
            .Select(x => x.Attributes["href"].Value)
            .ToList();
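
The href values collected above are relative, as in the sample line from the question, so they still have to be combined with the page URL before they can be downloaded. A minimal sketch, assuming the page is an Apache-style directory listing whose sort links start with "?" and whose parent-directory link starts with "/" (LinkResolver and ToFileUris are hypothetical names, not part of HtmlAgilityPack):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinkResolver
{
    // Turn the href values from a directory listing into absolute file URLs.
    public static List<Uri> ToFileUris(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        return hrefs
            // Assumption: sort links look like "?C=N;O=D" and the
            // parent-directory link is an absolute path starting with "/".
            .Where(h => !string.IsNullOrEmpty(h) && !h.StartsWith("?") && !h.StartsWith("/"))
            // Assumption: subdirectories end in a slash and are not files.
            .Where(h => !h.EndsWith("/"))
            // new Uri(base, relative) resolves the href the way a browser does.
            .Select(h => new Uri(baseUri, h))
            .ToList();
    }
}
```

Calling LinkResolver.ToFileUris with the page URL and the links list should leave only the data files, each as a full address like the one copied from the browser in the question.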

Now you can download the files using HttpClient, HttpWebRequest, or WebClient.
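
As one possible way to do the download step, here is a sketch using WebClient, since the question already uses it. It assumes the links have already been turned into absolute URLs; ListingDownloader, LocalPathFor, and DownloadAll are hypothetical names chosen for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

static class ListingDownloader
{
    // Build the local path for a URL from its last path segment.
    public static string LocalPathFor(Uri fileUri, string targetDirectory)
    {
        string fileName = Path.GetFileName(fileUri.LocalPath);
        return Path.Combine(targetDirectory, fileName);
    }

    // Download every file in turn into targetDirectory.
    public static void DownloadAll(IEnumerable<Uri> fileUris, string targetDirectory)
    {
        Directory.CreateDirectory(targetDirectory); // no-op if it already exists
        using (var client = new WebClient())
        {
            foreach (Uri fileUri in fileUris)
            {
                client.DownloadFile(fileUri, LocalPathFor(fileUri, targetDirectory));
            }
        }
    }
}
```

DownloadFile is synchronous, so calling DownloadAll from the form's constructor would block the UI until every file has arrived. Running it on a background thread is probably the better fit for a WinForms application.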

.net

c#

winforms