How to convert HTML to plain text while retaining tabs and other valid plain-text layout
With respect to this solution, how can we adapt it to retain tabs and other valid plain-text layout?
Reference solution:
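// Requires: using System.Text.RegularExpressions; using System.Web;
public static string StripHTML(string HTMLText, bool decode = true)
{
    // Delete anything shaped like a tag: '<', one or more non-'>' characters, '>'
    Regex reg = new Regex("<[^>]+>", RegexOptions.IgnoreCase);
    var stripped = reg.Replace(HTMLText, "");
    // Optionally turn entities such as &amp; back into the characters they encode
    return decode ? HttpUtility.HtmlDecode(stripped) : stripped;
}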
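A self-contained console version of the reference method, offered as a sketch that assumes a .NET runtime where `System.Web.HttpUtility` is available (.NET Framework, or modern .NET). It illustrates that the regex only deletes tags, so literal tabs and newlines in the source pass through unchanged, and that `decode` controls whether entities such as `&#9;` (an encoded tab) and `&amp;` become characters. The class name `StripHtmlDemo` is illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web;

public static class StripHtmlDemo
{
    // Same logic as the reference StripHTML above.
    public static string StripHTML(string HTMLText, bool decode = true)
    {
        Regex reg = new Regex("<[^>]+>", RegexOptions.IgnoreCase);
        var stripped = reg.Replace(HTMLText, "");
        return decode ? HttpUtility.HtmlDecode(stripped) : stripped;
    }

    public static void Main()
    {
        // A literal tab, an encoded tab (&#9;) and an encoded ampersand.
        string html = "<p>a\tb&#9;c &amp; d</p>";

        // Decoded: a<TAB>b<TAB>c & d
        Console.WriteLine(StripHTML(html));

        // Not decoded: a<TAB>b&#9;c &amp; d
        Console.WriteLine(StripHTML(html, false));

        // &nbsp; decodes to U+00A0 (non-breaking space), not an ordinary space.
        Console.WriteLine(StripHTML("x&nbsp;y") == "x\u00A0y");
    }
}
```

Note the last case: if downstream code expects plain spaces, `\u00A0` may need to be replaced after decoding.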
I'm not sure what you mean; it does retain tabs and newlines:
void Main()
{
    var html = "<html>\n\t<body>\n\t\tBody text!\n\t</body>\n</html>";
    StripHTML(html).Dump(); // Prints "\n\t\n\t\tBody text!\n\t\n" (Dump() is LINQPad's output method)
}
public static string StripHTML(string HTMLText, bool decode = true)
{
    Regex reg = new Regex("<[^>]+>", RegexOptions.IgnoreCase);
    var stripped = reg.Replace(HTMLText, "");
    return decode ? HttpUtility.HtmlDecode(stripped) : stripped;
}
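The whitespace preserved here is whitespace already present in the HTML source. If "valid plain-text layout" instead means the layout a browser would render (line breaks from `<br>` and block elements, columns from table cells), stripping alone cannot recover it, because `<p>a</p><p>b</p>` simply becomes `ab`. One hedged way to adapt the method, assuming a small fixed set of tags is enough, is to turn those tags into `\n` or `\t` before the generic strip. `HtmlToPlainText` is a hypothetical name and the tag list is an assumption; regex handling stays fragile for arbitrary HTML (comments, `<script>`, `>` inside attribute values), so an HTML parser is the safer choice for untrusted input.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web;

public static class HtmlToText
{
    // Hypothetical helper: map a few layout tags to plain-text whitespace,
    // then strip remaining tags the same way StripHTML does.
    public static string HtmlToPlainText(string html, bool decode = true)
    {
        // <br>, <br/>, <br /> -> newline
        string text = Regex.Replace(html, @"<br\s*/?>", "\n", RegexOptions.IgnoreCase);

        // Closing tag of an assumed set of block elements -> newline
        text = Regex.Replace(text, @"</(p|div|tr|li|h[1-6])\s*>", "\n", RegexOptions.IgnoreCase);

        // Closing table cell -> tab, so columns stay separated
        text = Regex.Replace(text, @"</t[dh]\s*>", "\t", RegexOptions.IgnoreCase);

        // Any other tag is deleted; existing tabs/newlines are left alone
        text = Regex.Replace(text, "<[^>]+>", "");

        return decode ? HttpUtility.HtmlDecode(text) : text;
    }

    public static void Main()
    {
        string html = "<table><tr><td>Name</td><td>Qty</td></tr>"
                    + "<tr><td>Apples</td><td>3</td></tr></table>"
                    + "<p>Line one<br/>Line two</p>";
        Console.Write(HtmlToPlainText(html));
    }
}
```

For the sample input this yields `Name\tQty\t\nApples\t3\t\nLine one\nLine two\n`. The tab before each row's newline comes from the last cell and can be trimmed if unwanted. Source whitespace is untouched, so the `<html>`/`<body>` example above still produces the same output as `StripHTML`.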