Removing newlines between tags in HTML in Python 3

Question

I want to trim all the spaces and newlines so that the result goes from

<title>

     Asian Case Research Journal (World Scientific)

</title>

to this:

<title>Asian Case Research Journal (World Scientific)</title>

My code:

import requests
from bs4 import BeautifulSoup

# url_list: the list of page URLs to fetch
for link in url_list:
    try:
        r = requests.get(link)
        soup = BeautifulSoup(r.content, "html.parser")
        print(soup.title)
    except Exception:
        print("No Title Found")

Answer 1

Try this and adapt it to your use case.

# splitlines() splits on \n, \r\n and \r alike; each line is then trimmed
desired_string = ''.join([x.strip() for x in str(soup.title).splitlines()])
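A quick check of why the choice of line splitting matters: splitting only on '\r\n' finds nothing to split in HTML that uses plain '\n' line endings, so the string comes back unchanged, whereas str.splitlines() recognizes \n, \r\n and \r. The sketch below assumes str(soup.title) produces exactly the string shown for the question's HTML (LF line endings), so it runs without fetching or parsing anything.

```python
# Assumed value of str(soup.title) for the question's HTML, with LF ("\n") line endings
tag_html = "<title>\n\n     Asian Case Research Journal (World Scientific)\n\n</title>"

# split("\r\n") finds no CRLF pairs, so the whole string is one piece and
# strip() has nothing to remove (it starts with "<" and ends with ">")
crlf_only = "".join(x.strip() for x in tag_html.split("\r\n"))

# splitlines() breaks on any line ending, so every line is trimmed
any_newline = "".join(x.strip() for x in tag_html.splitlines())

print(repr(crlf_only))
print(any_newline)
```

Joining with '' also glues together words that were wrapped onto separate lines (for example "Case\n   Research" becomes "CaseResearch"), so this approach suits titles that sit on a single line.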

Answer 2

soup.title.text.strip() should do it. Note that it returns only the title text, without the <title> tags.
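A minimal sketch of what this returns, assuming the question's HTML is parsed with the built-in "html.parser" backend: the result is a plain Python str holding just the trimmed text.

```python
from bs4 import BeautifulSoup

html = """<title>

     Asian Case Research Journal (World Scientific)

</title>"""
soup = BeautifulSoup(html, "html.parser")

# .text is the tag's text content as a plain str; strip() trims the
# surrounding newlines and spaces. The <title> markup is not included.
title_text = soup.title.text.strip()
print(title_text)
```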

Answer 3

import bs4

html = '''<title>

     Asian Case Research Journal (World Scientific)

</title>'''
soup = bs4.BeautifulSoup(html, 'lxml')
title = soup.title
title.string = title.get_text(strip=True)
print(str(title))

Output:

<title>Asian Case Research Journal (World Scientific)</title>

In bs4, a Tag object has a .string attribute that can be read or modified with dot notation; str() converts the Tag object into a Python str.

Documentation: modifying-string
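Answer 3 rewrites a single tag. If every tag in a document needs the same cleanup, one possible extension (a sketch, not part of the original answer) walks all text nodes, writes back the trimmed text, and removes nodes that contain only whitespace. The helper name strip_tag_whitespace is hypothetical, and the sketch assumes the "html.parser" backend. Be aware that it also removes meaningful spaces next to inline elements ("<p>Hello <b>world</b></p>" becomes "<p>Hello<b>world</b></p>") and would alter the content of <pre> blocks.

```python
from bs4 import BeautifulSoup, NavigableString


def strip_tag_whitespace(html: str) -> str:
    """Hypothetical helper: trim whitespace around every plain text node and
    drop text nodes that are whitespace only, so no newlines remain between
    or inside tags. Comments, doctypes and other NavigableString subclasses
    are left untouched."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so nodes can be replaced while looping over it
    for node in soup.find_all(string=True):
        if type(node) is not NavigableString:
            continue  # skip Comment, Doctype and other special strings
        stripped = node.strip()
        if stripped:
            node.replace_with(stripped)  # write the trimmed text back
        else:
            node.extract()  # whitespace-only node between tags
    return str(soup)


html = """<title>

     Asian Case Research Journal (World Scientific)

</title>"""
print(strip_tag_whitespace(html))
```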
