VB.net Html convertor error giving no change?
VB.NET language.
I want to replace the HTML tags while keeping the text structure. I tried the code below, taken from https://beansoftware.com/ASP.NET-Tutorials/Convert-HTML-To-Plain-Text.aspx:
' Assumes Imports System.Text and Imports System.Text.RegularExpressions
Dim html As String = "<div class='WordSection1'><p class='MsoNormal'>"
Dim final_result As String
Dim sbhtml As StringBuilder = New StringBuilder(html)
' HTML entities and the characters they stand for
Dim OldWords() As String = {"&nbsp;", "&amp;", "&quot;", "&lt;", "&gt;", "&reg;", "&copy;", "&bull;", "&trade;"}
Dim NewWords() As String = {" ", "&", """", "<", ">", "®", "©", "•", "™"}
For i As Integer = 0 To i < OldWords.Length
    sbhtml.Replace(OldWords(i), NewWords(i))
Next i
Console.WriteLine($"result after loop : {sbhtml}")
sbhtml.Replace("<br>", "\n<br>")
sbhtml.Replace("<br ", "\n<br ")
sbhtml.Replace("<p ", "\n<p ")
final_result = Regex.Replace(sbhtml.ToString(), "<[^>]*>", "")
Console.WriteLine(final_result)
But the output returned is the same as the input string.
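The For statement is wrong. It should be
For i As Integer = 0 To OldWords.Length - 1
Some C# syntax probably leaked in. In VB the expression after To is an upper bound, not a loop condition. With Option Strict Off, the comparison i < OldWords.Length yields a Boolean, and True converts to -1, so the loop would run from 0 To -1 and never execute, which matches the unchanged output.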
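To see what VB does with that bound, here is a minimal sketch, assuming a plain console project. The module and function names are illustrative and not from the original post; the explicit CInt stands in for the implicit conversion that Option Strict Off would perform.

```vbnet
Imports System

Module LoopBoundDemo
    ' Hypothetical helper: counts how many times "For i = 0 To upperBound" runs.
    Function CountIterations(upperBound As Integer) As Integer
        Dim count As Integer = 0
        For i As Integer = 0 To upperBound
            count += 1
        Next
        Return count
    End Function

    Sub Main()
        Dim words() As String = {"a", "b", "c"}
        Dim i As Integer = 0

        ' The comparison is a Boolean; in VB, True converts to -1.
        Dim boundFromComparison As Integer = CInt(i < words.Length)

        Console.WriteLine(boundFromComparison)                    ' -1
        Console.WriteLine(CountIterations(boundFromComparison))   ' 0: the body never runs
        Console.WriteLine(CountIterations(words.Length - 1))      ' 3: one pass per element
    End Sub
End Module
```

With Option Strict On the original line should not compile at all, which would have surfaced the mistake earlier.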
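Why not append sbhtml.Replace("<br>", "\n<br>") and the lines that follow it to OldWords and NewWords? They are technically no different.
By using tuples, you can put the old and new words into the same array and use a For Each loop.
I suggest the following approach: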
' Assumes Imports System.Text and Imports System.Text.RegularExpressions
Dim html As String =
    "<div class='WordSection1'>aaa<br>bbb<p class='MsoNormal'>"
Dim final_result As String
Dim sbhtml As StringBuilder = New StringBuilder(html)
' Each tuple pairs the text to find with its replacement.
' VB string literals do not interpret "\n", so vbLf supplies the line break.
Dim Substitutions() As (old As String, repl As String) = {
    ("&nbsp;", " "), ("&amp;", "&"), ("&quot;", """"), ("&lt;", "<"),
    ("&gt;", ">"), ("&reg;", "®"), ("&copy;", "©"), ("&bull;", "•"),
    ("&trade;", "™"), ("<br>", vbLf & "<br>"), ("<br ", vbLf & "<br "), ("<p ", vbLf & "<p ")}
For Each subst In Substitutions
    sbhtml.Replace(subst.old, subst.repl)
Next
Console.WriteLine($"result after loop : {sbhtml}")
' Remove all remaining tags
final_result = Regex.Replace(sbhtml.ToString(), "<[^>]*>", "")
Console.WriteLine(final_result)
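As a variation on the substitution table, the entity decoding could be delegated to System.Net.WebUtility.HtmlDecode from the base class library. The following is only a sketch under that assumption: the module name PlainTextSketch and the function ToPlainText are made up for illustration, and the order of steps (strip tags first, decode last) is a design choice of this sketch, not something the answers above prescribe.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module PlainTextSketch
    ' Hypothetical helper, not from the original post.
    Function ToPlainText(html As String) As String
        ' Put a line feed in front of every <br> or <p> tag, with or without attributes.
        Dim withBreaks As String =
            Regex.Replace(html, "<(br|p)\b", vbLf & "$0", RegexOptions.IgnoreCase)

        ' Strip all tags.
        Dim noTags As String = Regex.Replace(withBreaks, "<[^>]*>", "")

        ' Decode entities last, so a decoded "<" is not mistaken for the start of a tag.
        Return WebUtility.HtmlDecode(noTags)
    End Function

    Sub Main()
        Dim html As String =
            "<div class='WordSection1'>aaa<br>bbb<p class='MsoNormal'>1 &lt; 2 &amp; 3</p></div>"
        ' Prints three lines: aaa / bbb / 1 < 2 & 3
        Console.WriteLine(ToPlainText(html))
    End Sub
End Module
```

Decoding after the tag-stripping step avoids a pitfall of the table approach, where "&lt;" is turned into "<" before the regex runs and the text that follows it can be removed as if it were a tag.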
HtmlAgilityPack does a very good job at HTML manipulation and is very reliable. You would do something like
Dim plainText As String = HtmlUtilities.ConvertToPlainText(html)
For an easy way to install HtmlAgilityPack, see "Install and manage packages in Visual Studio using the NuGet Package Manager".
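HtmlUtilities.ConvertToPlainText may be a helper class layered on top of HtmlAgilityPack rather than a type shipped in the package, so it might need to be added to the project separately. As a rough sketch that relies only on HtmlDocument, DocumentNode.InnerText and HtmlEntity.DeEntitize, and assuming the HtmlAgilityPack NuGet package is referenced (the module and function names are illustrative):

```vbnet
Imports System
Imports HtmlAgilityPack

Module AgilityPackSketch
    ' Hypothetical helper, not from the original post.
    Function ToPlainText(html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' InnerText concatenates the text nodes of the parsed document;
        ' DeEntitize then turns entities such as &amp; back into characters.
        Return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText)
    End Function

    Sub Main()
        Console.WriteLine(ToPlainText("<div class='WordSection1'><p class='MsoNormal'>a &amp; b</p></div>"))
    End Sub
End Module
```

This sketch does not insert line breaks for <br> or <p>, so keeping the text structure would still need extra handling, for example by walking the nodes and emitting a line break for those elements.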