Python: How do I remove text between two delimiters without importing a module

Question

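I have searched many threads, but they all require an import (BeautifulSoup, regular expressions). The input is one large string that contains the delimiters ('<', '>') many times. I have heard that matching paired tags is a good technique, but I am not sure how to go about it.

Example (very small) input; the actual input is an entire HTML document.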

<!DOCTYPE html>
<html>
example
<head>
hello
<meta charset="utf-8">
example2
<meta/>

Expected output:

example hello example2

Answer 1

Here is a simple, easy-to-follow approach that uses a plain loop:

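text = '<!DOCTYPE html><html>example<head>hello<meta charset="utf-8">'  # renamed from str to avoid shadowing the built-in
words = []
temp = ""
in_tag = False   # True while we are between '<' and '>'
for ch in text:
    if ch == "<":
        in_tag = True
        if temp:
            # a tag is starting, so the text collected so far is complete
            words.append(temp)
            temp = ""
    elif ch == ">":
        in_tag = False
    elif not in_tag:
        temp += ch
if temp:
    # keep any text that follows the last tag
    words.append(temp)
print(words)   # prints ['example', 'hello']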
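
To connect this back to the multi-line sample in the question, the same flag-based loop can be wrapped in a function. This is only a sketch: the name `extract_text` and the choice to strip surrounding whitespace from each piece are assumptions made for illustration, not part of the original answer.

```python
def extract_text(html):
    """Collect the runs of text found outside <...> pairs."""
    pieces = []
    current = ""
    inside_tag = False
    for ch in html:
        if ch == "<":
            inside_tag = True
            # A tag starts here, so the current run of text is finished.
            if current.strip():
                pieces.append(current.strip())
            current = ""
        elif ch == ">":
            inside_tag = False
        elif not inside_tag:
            current += ch
    # Keep text that follows the last tag, if there is any.
    if current.strip():
        pieces.append(current.strip())
    return pieces


sample = """<!DOCTYPE html>
<html>
example
<head>
hello
<meta charset="utf-8">
example2
<meta/>
"""

print(" ".join(extract_text(sample)))  # prints: example hello example2
```

Stripping each piece drops the newlines that sit between tags in real HTML, which is what turns the raw runs into the space-separated output the question asks for.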

Answer 2

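Initialize a tag_depth variable to zero.

Walk through the string one character at a time. When you see a < character, increment tag_depth; when you see a > character, decrement it. If you see any other character while tag_depth is zero, output that character.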

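mystring = '<html>example<head>hello'  # sample input (assumed); substitute your own string
tag_depth = 0
for c in mystring:
    if c == '<':
        tag_depth += 1      # entering a tag
    elif c == '>':
        tag_depth -= 1      # leaving a tag
    elif tag_depth == 0:
        # end must be a string; the original end=0 raises a TypeError
        print(c, end="")
print()  # finish the output line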
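
If the result is needed as a value rather than printed, the depth counter can be wrapped in a function along these lines. This is a rough sketch under a few assumptions: the function name `strip_delimited`, the configurable delimiter parameters, the clamping of the depth at zero, and the whitespace normalization are all additions for illustration and do not come from the answer itself.

```python
def strip_delimited(text, open_char="<", close_char=">"):
    """Return text with everything between the delimiters removed."""
    depth = 0
    kept = []
    for ch in text:
        if ch == open_char:
            depth += 1
        elif ch == close_char:
            # Clamp at zero so a stray closing delimiter cannot
            # push the depth negative and hide all later text.
            depth = max(depth - 1, 0)
        elif depth == 0:
            kept.append(ch)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return " ".join("".join(kept).split())


print(strip_delimited("<html>\nexample\n<head>\nhello\n"))          # prints: example hello
print(strip_delimited("keep (drop (nested) drop) keep", "(", ")"))  # prints: keep keep
```

Counting depth instead of using a simple flag is what lets nested delimiters work, as in the second call. For HTML, where `<` and `>` do not nest, the two approaches behave the same on well-formed input.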

