HTML Template parsing: extract header after the last occurrence of a pattern in Golang

I have some HTML code as a Go string, and I want to extract a specific header from it after the last occurrence of a pattern. For example:

package main

func main() {
    h := `
<html>
 <body>
  <a name="0"> text </a>
  <a name="1"> abc </a>
  <a name="2"> def ghi jkl </a>
  <a name="3"> abc </a>
  <a name="4"> Some text </a>
 </body>
</html>`

    pattern := "abc"

    // Now I want <a name="3"> to be printed. I mean, when someone
    // searches for the pattern abc, the last occurrence is the <a>
    // section with the name "3". If the pattern is "def" then "2"
    // should be printed; if the pattern is "text" then "4" should
    // be printed.

    _, _ = h, pattern // placeholder so this stub compiles
}

Any idea how I can do this? I tried the template and scanner packages, but couldn't get it to work.

It depends on what the HTML input looks like. You may be able to get away with using regexp, but if you're working with arbitrary HTML, you're going to have to use a full HTML parser, such as https://godoc.org/golang.org/x/net/html.
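As a rough illustration of the regexp route, here is a standard-library-only sketch. It assumes the markup looks exactly like the question's example: each link is written as `<a name="...">text</a>` with a double-quoted name attribute and no nested tags inside the link. Anything more varied (other attributes, single quotes, nested elements, entities) will break it, which is why a real parser is the safer choice. The names `lastNameByRegexp` and `sample` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sample is the HTML from the question.
const sample = `
<html>
 <body>
  <a name="0"> text </a>
  <a name="1"> abc </a>
  <a name="2"> def ghi jkl </a>
  <a name="3"> abc </a>
  <a name="4"> Some text </a>
 </body>
</html>`

// anchorRe captures the name attribute (group 1) and the link text
// (group 2). Assumption: double-quoted name, no tags inside the link.
var anchorRe = regexp.MustCompile(`<a name="([^"]*)">([^<]*)</a>`)

// lastNameByRegexp returns the name of the last <a> whose text contains
// pattern, or "" if no link matches.
func lastNameByRegexp(doc, pattern string) string {
	matches := anchorRe.FindAllStringSubmatch(doc, -1)
	// Scan from the end so the first hit is the last occurrence.
	for i := len(matches) - 1; i >= 0; i-- {
		if strings.Contains(matches[i][2], pattern) {
			return matches[i][1]
		}
	}
	return ""
}

func main() {
	fmt.Println(lastNameByRegexp(sample, "abc")) // prints 3
}
```

Scanning backwards and returning on the first hit avoids collecting every match just to keep the last one; `strings.Contains` gives the substring behaviour the question asks for ("def" matches " def ghi jkl ").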

For example, using goquery (which is built on x/net/html):

package main

import (
        "fmt"
        "strings"

        "github.com/PuerkitoBio/goquery"
)

func main() {
        h := `
<html>
 <body>
  <a name="0"> text </a>
  <a name="1"> abc </a>
  <a name="2"> def ghi jkl </a>
  <a name="3"> abc </a>
  <a name="4"> Some text </a>
 </body>
</html>`

        pattern := "abc"

        doc, err := goquery.NewDocumentFromReader(strings.NewReader(h))
        if err != nil {
                panic(err)
        }

        // Walk every <a> in document order and remember the name of the
        // last one whose text contains the pattern; later matches
        // overwrite earlier ones.
        last := ""
        doc.Find("a").Each(func(i int, s *goquery.Selection) {
                if strings.Contains(s.Text(), pattern) {
                        if name, ok := s.Attr("name"); ok {
                                last = name
                        }
                }
        })

        fmt.Println(last) // prints 3 for pattern "abc"
}

Edit: Alternatively, depending on your actual input, you can use the :contains selector instead of the doc.Find part:

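// Don't do this if pattern is arbitrary user input: it is spliced
// directly into the CSS selector. %q quotes it, so patterns that
// contain spaces (e.g. "Some text") also work.
name, ok := doc.Find(fmt.Sprintf("a:contains(%q)", pattern)).Last().Attr("name")
if ok {
        fmt.Println(name)
}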

You can use XPath via xquery, which can simplify your code.

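package main

import (
    "fmt"
    "strings"

    htmlquery "github.com/antchfx/xquery/html"
    "golang.org/x/net/html"
)

func main() {
    htmlstr := `<html>
    <body>
    <a name="0"> text </a>
    <a name="1"> abc </a>
    <a name="2"> def ghi jkl </a>
    <a name="3"> abc </a>
    <a name="4"> Some text </a>
    </body>
    </html>`
    pattern := "abc"

    root, err := html.Parse(strings.NewReader(htmlstr))
    if err != nil {
        panic(err)
    }
    // Select every <a> whose text contains the pattern (the pattern must
    // not contain a single quote); the last node is the last occurrence.
    nodes := htmlquery.Find(root, fmt.Sprintf("//a[contains(text(), '%s')]", pattern))
    if len(nodes) == 0 {
        return
    }
    for _, attr := range nodes[len(nodes)-1].Attr {
        if attr.Key == "name" {
            fmt.Println(attr.Val) // prints 3 for pattern "abc"
        }
    }
}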
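If the input is guaranteed to be well-formed XML (XHTML), as the question's example happens to be, the standard library's encoding/xml can also do the job without third-party packages. This is a sketch under that assumption only: typical real-world HTML (unclosed tags, `&nbsp;`, unquoted attributes) will make `xml.Unmarshal` fail, so the parsers above remain the general solution. The type and function names here (`anchor`, `page`, `lastNameByXML`, `sample`) are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// sample is the HTML from the question; it is also well-formed XML.
const sample = `
<html>
 <body>
  <a name="0"> text </a>
  <a name="1"> abc </a>
  <a name="2"> def ghi jkl </a>
  <a name="3"> abc </a>
  <a name="4"> Some text </a>
 </body>
</html>`

// anchor mirrors one <a name="..."> element and its character data.
type anchor struct {
	Name string `xml:"name,attr"`
	Text string `xml:",chardata"`
}

// page collects every <a> that is a direct child of <body>.
type page struct {
	Anchors []anchor `xml:"body>a"`
}

// lastNameByXML returns the name of the last <a> whose text contains
// pattern, or "" if none matches. It assumes doc is well-formed XML.
func lastNameByXML(doc, pattern string) (string, error) {
	var p page
	if err := xml.Unmarshal([]byte(doc), &p); err != nil {
		return "", err
	}
	// Walk backwards so the first hit is the last occurrence.
	for i := len(p.Anchors) - 1; i >= 0; i-- {
		if strings.Contains(p.Anchors[i].Text, pattern) {
			return p.Anchors[i].Name, nil
		}
	}
	return "", nil
}

func main() {
	name, err := lastNameByXML(sample, "abc")
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // prints 3
}
```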