HTML template parsing: extract a header after the last occurrence of a pattern in Golang
I have some HTML in a Go string, and I want to extract a specific header from it: the one in which a pattern last occurs. For example:
func main() {
	h := `
<html>
<body>
<a name="0"> text </a>
<a name="1"> abc </a>
<a name="2"> def ghi jkl </a>
<a name="3"> abc </a>
<a name="4"> Some text </a>
</body>
</html>`
	pattern := "abc"
	// Now I want <a name="3"> to be printed. I mean, when someone
	// searches for the pattern abc, the last occurrence is the <a>
	// section with the name "3". If the pattern is "def" then "2"
	// should be printed, if the pattern is "text" then 4 should
	// be printed
}
Any idea how I can do this? I tried the template and scanner packages, but couldn't get them to work.
It depends on what the HTML input looks like. You may be able to get away with using regexp, but if you're working with arbitrary HTML, you're going to have to use a full HTML parser, such as https://godoc.org/golang.org/x/net/html.
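For input as regular as the sample in the question, the regexp route might look like the sketch below. It assumes every anchor is written as <a name="...">text</a> with a double-quoted name attribute and no nested tags. The names lastAnchorName, anchorRe and sampleHTML are made up for illustration. Anything less regular than that calls for a real parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// anchorRe matches flat anchors of the form <a name="...">text</a>.
// Assumption: double-quoted name attribute, no other attributes, no nested tags.
var anchorRe = regexp.MustCompile(`(?s)<a\s+name="([^"]*)"\s*>(.*?)</a>`)

// sampleHTML is the input from the question.
const sampleHTML = `
<html>
<body>
<a name="0"> text </a>
<a name="1"> abc </a>
<a name="2"> def ghi jkl </a>
<a name="3"> abc </a>
<a name="4"> Some text </a>
</body>
</html>`

// lastAnchorName (hypothetical helper) returns the name attribute of the
// last <a> element whose text contains pattern.
func lastAnchorName(h, pattern string) (string, bool) {
	name, found := "", false
	// Matches come back in document order, so the final hit wins.
	for _, m := range anchorRe.FindAllStringSubmatch(h, -1) {
		if strings.Contains(m[2], pattern) {
			name, found = m[1], true
		}
	}
	return name, found
}

func main() {
	for _, pattern := range []string{"abc", "def", "text"} {
		if name, ok := lastAnchorName(sampleHTML, pattern); ok {
			fmt.Println(pattern, "->", name)
		}
	}
}
```

This uses substring matching because the question expects "text" to yield 4, and the anchor named 4 contains "Some text".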
With a full parser, for example goquery (which is built on x/net/html):
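package main

import (
	"fmt"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	h := `
<html>
<body>
<a name="0"> text </a>
<a name="1"> abc </a>
<a name="2"> def ghi jkl </a>
<a name="3"> abc </a>
<a name="4"> Some text </a>
</body>
</html>`
	pattern := "abc"

	doc, err := goquery.NewDocumentFromReader(strings.NewReader(h))
	if err != nil {
		panic(err)
	}

	// Each visits the <a> elements in document order, so whatever is left
	// in name after the loop belongs to the last match.
	var name string
	var found bool
	doc.Find("a").Each(func(i int, s *goquery.Selection) {
		// Exact match on the trimmed text; use strings.Contains instead
		// if the pattern may be only part of the text.
		if strings.TrimSpace(s.Text()) != pattern {
			return
		}
		if n, ok := s.Attr("name"); ok {
			name, found = n, true
		}
	})
	if found {
		fmt.Println(name)
	}
}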
Edit: alternatively, depending on your actual input, you can replace the doc.Find part with the contains selector:
// Don't do this if pattern is arbitrary user input
name, ok := doc.Find(fmt.Sprintf("a:contains(%s)", pattern)).Last().Attr("name")
if ok {
	fmt.Println(name)
}
You can also use XPath through xquery, which can simplify your code.
package main

import (
	"fmt"
	"strings"

	htmlquery "github.com/antchfx/xquery/html"
	"golang.org/x/net/html"
)

func main() {
	htmlstr := `<html>
<body>
<a name="0"> text </a>
<a name="1"> abc </a>
<a name="2"> def ghi jkl </a>
<a name="3"> abc </a>
<a name="4"> Some text </a>
</body>
</html>`
	root, err := html.Parse(strings.NewReader(htmlstr))
	if err != nil {
		panic(err)
	}

	// Find returns every match in document order; the last element is
	// the last occurrence.
	nodes := htmlquery.Find(root, "//a[normalize-space(text())='abc']")
	if len(nodes) == 0 {
		fmt.Println("no match")
		return
	}
	// Print the name attribute rather than the inner text.
	fmt.Println(htmlquery.SelectAttr(nodes[len(nodes)-1], "name"))
}