Get individual strings between brackets
Suppose I have this string:
[LEVEL]
[NAME]The Girder Guide! [/NAME]
[AUTHOR]draworigami[/AUTHOR]
[AUTHORLEVEL]11[/AUTHORLEVEL]
[COUNTRY]CA[/COUNTRY]
[ID]62784[/ID]
[RATING]4[/RATING]
[DATE]2021-05-11 23:08:35[/DATE]
[PLAYCOUNT]33[/PLAYCOUNT]
[WINCOUNT]28[/WINCOUNT]
[STARS]0[/STARS]
[COMMENTS]1[/COMMENTS]
[/LEVEL]
Is there a way to get each individual string between a [TAG] and its matching [/TAG]? I have been trying code I found on the internet, but to no avail.
Try this:
st = "[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR][AUTHORLEVEL]11[/AUTHORLEVEL][COUNTRY]CA[/COUNTRY][ID]62784[/ID][RATING]4[/RATING][DATE]2021-05-11 23:08:35[/DATE][PLAYCOUNT]33[/PLAYCOUNT][WINCOUNT]28[/WINCOUNT][STARS]0[/STARS][COMMENTS]1[/COMMENTS][/LEVEL]"
# Split on "]" so that no piece contains a closing bracket.
st = st.split("]")
for i in range(len(st)):
    # Strip the opening bracket and the slash of closing tags.
    st[i] = st[i].replace("[", "")
    st[i] = st[i].replace("/", "")
# split() leaves an empty string after the final "]"; drop it.
st = st[:-1]
print(st)
st becomes:
['LEVEL', 'NAME', 'The Girder Guide!NAME', 'AUTHOR', 'draworigamiAUTHOR', 'AUTHORLEVEL', '11AUTHORLEVEL', 'COUNTRY', 'CACOUNTRY', 'ID', '62784ID', 'RATING', '4RATING', 'DATE', '2021-05-11 23:08:35DATE', 'PLAYCOUNT', '33PLAYCOUNT', 'WINCOUNT', '28WINCOUNT', 'STARS', '0STARS', 'COMMENTS', '1COMMENTS', 'LEVEL']
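What I did:
- Split the string on ] to get a list of strings that no longer contain the character "]".
- Removed the characters [ and / from each string in the resulting list.
- Dropped the last element, because it is an empty string produced by split.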
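Note that in the list above each value still has its closing tag name attached (for example 'The Girder Guide!NAME'). The following is a minimal sketch of a variation on the same split idea that keeps only the values. It assumes that the values themselves never contain "[" or "]"; the shortened input string and the names `values`, `chunk` and `text` are illustrative only.

```python
st = "[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR][/LEVEL]"

values = []
# Splitting on "[/" makes every piece except the last end right before a closing tag.
for chunk in st.split("[/")[:-1]:
    # Whatever follows the last "]" in the piece is the text in front of that closing tag.
    text = chunk.rsplit("]", 1)[-1]
    if text:  # empty when the closing tag directly follows another tag, e.g. [/LEVEL]
        values.append(text)

print(values)
```

Because the tag names are discarded, this only helps when the order of the fields is known in advance.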
This returns all the text between [] and [/]:
from bs4 import BeautifulSoup
rml = """
[LEVEL]
[NAME]The Girder Guide! [/NAME]
[AUTHOR]draworigami[/AUTHOR]
[AUTHORLEVEL]11[/AUTHORLEVEL]
[COUNTRY]CA[/COUNTRY]
[ID]62784[/ID]
[RATING]4[/RATING]
[DATE]2021-05-11 23:08:35[/DATE]
[PLAYCOUNT]33[/PLAYCOUNT]
[WINCOUNT]28[/WINCOUNT]
[STARS]0[/STARS]
[COMMENTS]1[/COMMENTS]
[/LEVEL]
"""
html = rml.replace('[', '<').replace(']', '>')
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('level').text)
Output:
The Girder Guide!
draworigami
11
CA
62784
4
2021-05-11 23:08:35
33
28
0
1
Edit #1: The original string has no line breaks, so to print each value on its own line:
rml = "[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR][AUTHORLEVEL]11[/AUTHORLEVEL][COUNTRY]CA[/COUNTRY][ID]62784[/ID][RATING]4[/RATING][DATE]2021-05-11 23:08:35[/DATE][PLAYCOUNT]33[/PLAYCOUNT][WINCOUNT]28[/WINCOUNT][STARS]0[/STARS][COMMENTS]1[/COMMENTS][/LEVEL]"
html = rml.replace('[', '<').replace(']', '>')
soup = BeautifulSoup(html, 'html.parser')
elements = soup.find('level').contents
for e in elements:
    print(e.text)
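If installing bs4 is not an option, the same trick of turning brackets into angle brackets can be sketched with the standard library's html.parser module. This is only an illustration under the assumption that each field holds plain text and no nested tags; the class name `FieldCollector` and its attributes are made up for this example. Note that HTMLParser lowercases tag names.

```python
from html.parser import HTMLParser


class FieldCollector(HTMLParser):
    """Hypothetical helper: maps each (lowercased) tag name to the text inside it."""

    def __init__(self):
        super().__init__()
        self.current = None  # tag that was opened most recently and not yet closed
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        self.current = tag

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        # Ignore whitespace-only text between tags.
        if self.current is not None and data.strip():
            self.fields[self.current] = data.strip()


rml = "[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR][DATE]2021-05-11 23:08:35[/DATE][/LEVEL]"
collector = FieldCollector()
collector.feed(rml.replace('[', '<').replace(']', '>'))
collector.close()

for name, value in collector.fields.items():
    print(name, value)
```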
How about using a regular expression?
import re
s = '[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR][AUTHORLEVEL]11[/AUTHORLEVEL][COUNTRY]CA[/COUNTRY][ID]62784[/ID][RATING]4[/RATING][DATE]2021-05-11 23:08:35[/DATE][PLAYCOUNT]33[/PLAYCOUNT][WINCOUNT]28[/WINCOUNT][STARS]0[/STARS][COMMENTS]1[/COMMENTS][/LEVEL]'
s = s.replace('/', '')
result = []
for e in re.findall(r"\][A-Za-z0-9 _.:,!'/$\-]+\[", s):
    result.append(e.replace('[', '').replace(']', ''))
Result:
['The Girder Guide!',
'draworigami',
'11',
'CA',
'62784',
'4',
'2021-05-11 23:08:35',
'33',
'28',
'0',
'1']
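The snippet above removes every "/" before matching, so a value that contains a slash would be altered. As a hedged alternative, here is a sketch that uses a backreference to pair each opening tag with its closing tag and keeps the tag names. It assumes tag names consist of word characters and values contain no brackets; the names `pairs` and `fields` are illustrative.

```python
import re

s = '[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR][AUTHORLEVEL]11[/AUTHORLEVEL][COUNTRY]CA[/COUNTRY][ID]62784[/ID][RATING]4[/RATING][DATE]2021-05-11 23:08:35[/DATE][PLAYCOUNT]33[/PLAYCOUNT][WINCOUNT]28[/WINCOUNT][STARS]0[/STARS][COMMENTS]1[/COMMENTS][/LEVEL]'

# \[(\w+)\]   captures the tag name,
# ([^\[\]]*)  captures the text up to the next bracket,
# \[/\1\]     requires a closing tag with the same name as the opening one.
pairs = re.findall(r"\[(\w+)\]([^\[\]]*)\[/\1\]", s)
fields = dict(pairs)

print(fields["NAME"])
print(list(fields.values()))
```

The outer [LEVEL] tag is skipped because its content starts with another bracket, so only the leaf fields end up in `fields`.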