VB.NET HTML converter error: output is unchanged?

I'm using VB.NET. I want to replace the HTML tags while keeping the text structure. I took the code from https://beansoftware.com/ASP.NET-Tutorials/Convert-HTML-To-Plain-Text.aspx and tried the following:
Dim html As String = "&lt;div class='WordSection1'&gt;&lt;p class='MsoNormal'&gt;"
Dim final_result As String

Dim sbhtml As StringBuilder = New StringBuilder(html)
Dim OldWords() As String = {"&nbsp;", "&amp;", "&quot;", "&lt;", "&gt;", "&reg;", "&copy;", "&bull;", "&trade;"}
Dim NewWords() As String = {" ", "&", """", "<", ">", "®", "©", "•", "™"}
For i As Integer = 0 To i < OldWords.Length
    sbhtml.Replace(OldWords(i), NewWords(i))
Next i
Console.WriteLine($"result after loop : {sbhtml}")

sbhtml.Replace("<br>", "\n<br>")
sbhtml.Replace("<br ", "\n<br ")
sbhtml.Replace("<p ", "\n<p ")

final_result = Regex.Replace(sbhtml.ToString(), "<[^>]*>", "")

Console.WriteLine(final_result)

But the returned output is the same as the input string.

The For statement is wrong. It should be

For i As Integer = 0 To OldWords.Length - 1

Some C# syntax probably leaked in.
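
To see why the original loop body never ran, here is a minimal sketch. It assumes the question's project compiles with Option Strict Off, so the Boolean result of `i < OldWords.Length` is silently converted to an Integer; the module and function names are made up for illustration.

```vbnet
Imports System

Module ForLoopBoundsDemo
    ' Counts how many times a loop from 0 To upperBound executes.
    Function CountIterations(upperBound As Integer) As Integer
        Dim count As Integer = 0
        For i As Integer = 0 To upperBound
            count += 1
        Next
        Return count
    End Function

    Sub Main()
        Dim oldWords() As String = {"&lt;", "&gt;", "&amp;"}

        ' A comparison yields a Boolean; converting True to Integer gives -1 in VB.
        Dim wrongBound As Integer = CInt(0 < oldWords.Length)

        Console.WriteLine(wrongBound)                            ' -1
        Console.WriteLine(CountIterations(wrongBound))           ' 0: "0 To -1" never runs
        Console.WriteLine(CountIterations(oldWords.Length - 1))  ' 3: one pass per element
    End Sub
End Module
```

With an upper bound of -1 no replacement is ever applied, which matches the unchanged output reported in the question.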

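Why not add sbhtml.Replace("<br>", "\n<br>") and the following lines to OldWords/NewWords as well? Technically they are no different.

By using tuples, you can put the old and new words into the same array and use a For Each loop.

I suggest the following approach: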

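' Needs: Imports System.Text and Imports System.Text.RegularExpressions
Dim html As String =
    "&lt;div class='WordSection1'&gt;aaa<br>bbb&lt;p class='MsoNormal'&gt;"
Dim final_result As String

Dim sbhtml As StringBuilder = New StringBuilder(html)
' Each tuple pairs the text to find with its replacement. Order matters:
' the entities are decoded first, then the line-break rules see real tags.
' vbLf is used because VB.NET does not treat "\n" as an escape sequence.
Dim Substitutions() As (old As String, repl As String) = {
    ("&nbsp;", " "), ("&amp;", "&"), ("&quot;", """"), ("&lt;", "<"),
    ("&gt;", ">"), ("&reg;", "®"), ("&copy;", "©"), ("&bull;", "•"),
    ("&trade;", "™"), ("<br>", vbLf & "<br>"), ("<br ", vbLf & "<br "), ("<p ", vbLf & "<p ")}
For Each subst In Substitutions
    sbhtml.Replace(subst.old, subst.repl)
Next
Console.WriteLine($"result after loop : {sbhtml}")

' Strip every remaining tag
final_result = Regex.Replace(sbhtml.ToString(), "<[^>]*>", "")

Console.WriteLine(final_result)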
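
A note on the line breaks: VB.NET string literals do not process backslash escapes, so "\n" is a backslash followed by the letter n, not a line feed. The sketch below illustrates the difference and uses vbLf instead; StripTags is a hypothetical helper written for this example, not part of the original code.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module NewlineLiteralDemo
    ' Hypothetical helper: puts a real line feed before each <br> tag,
    ' then removes every tag with the same regex used above.
    Function StripTags(html As String) As String
        Dim sb As New StringBuilder(html)
        sb.Replace("<br>", vbLf & "<br>")
        Return Regex.Replace(sb.ToString(), "<[^>]*>", "")
    End Function

    Sub Main()
        Console.WriteLine("\n".Length)   ' 2: a backslash and the letter n
        Console.WriteLine(vbLf.Length)   ' 1: a single line-feed character
        Console.WriteLine(StripTags("aaa<br>bbb") = "aaa" & vbLf & "bbb")   ' True
    End Sub
End Module
```

Environment.NewLine would work the same way if a platform-specific line ending is preferred.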

HtmlAgilityPack does a very good job at HTML manipulation and is very reliable. You would do something like
Dim plainText As String = HtmlUtilities.ConvertToPlainText(html)

For an easy way to install HtmlAgilityPack, see "Install and manage packages in Visual Studio using the NuGet Package Manager".