Removing newlines between tags in HTML in Python 3
I want to trim all the whitespace and newlines, so that the result goes from this
<title>
Asian Case Research Journal (World Scientific)
</title>
to this
<title>Asian Case Research Journal (World Scientific)</title>
My code:
import requests
from bs4 import BeautifulSoup

for link in url_list:
    try:
        r = requests.get(link)
        soup = BeautifulSoup(r.content, "html.parser")
        print(soup.title)
    except:
        print("No Title Found ")
        continue
Try this and adapt it to your use case.
# splitlines() handles '\n' as well as '\r\n'; split('\r\n') misses Unix line endings
desired_string = ''.join([x.strip() for x in str(soup.title).splitlines()])
soup.title.text.strip()
should do it.
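Note that this expression returns only the text inside the tag, not the tag itself. A minimal sketch of that behaviour, assuming the markup is hard-coded rather than fetched (the sample string is for illustration only):

```python
from bs4 import BeautifulSoup

# Sample markup hard-coded for illustration; a real page would come from requests.
html = "<title>\n  Asian Case Research Journal (World Scientific)\n</title>"
soup = BeautifulSoup(html, "html.parser")

# .text returns the tag's text content; .strip() removes the surrounding whitespace.
text_only = soup.title.text.strip()
print(text_only)
```

If the one-line tag is needed rather than just its text, the stripped text still has to be wrapped in the tag again.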
import bs4
html = '''<title>
Asian Case Research Journal (World Scientific)
</title>'''
soup = bs4.BeautifulSoup(html, 'lxml')
title = soup.title
title.string = title.get_text(strip=True)
print(str(title))
Output:
<title>Asian Case Research Journal (World Scientific)</title>
In bs4, a tag is an object with a string attribute, which you can read or modify using dot notation. Use str() to convert the tag object to a Python str object.
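One way this could be wrapped up for the loop in the question is sketched below. The helper name compact_title is hypothetical (it is not part of bs4), and the sketch assumes html.parser so that lxml does not have to be installed. It also checks for a missing title explicitly, because soup.title is None in that case and printing it does not raise an exception.

```python
from bs4 import BeautifulSoup

def compact_title(html):
    """Return the <title> tag as a one-line string, or None if there is no title.

    compact_title is a hypothetical helper name, not a bs4 function.
    """
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title
    if title is None:  # soup.title is None when the page has no <title>
        return None
    # Replace the tag's contents with its whitespace-stripped text.
    title.string = title.get_text(strip=True)
    return str(title)

print(compact_title("<title>\n Asian Case Research Journal (World Scientific)\n</title>"))
print(compact_title("<p>no title here</p>"))
```

In the question's loop, r.content would be passed as the html argument, and a None result would take the place of the "No Title Found" branch.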