In Python, how do I remove unwanted items from a list of data when I don't know where the unwanted data will appear or what the specific string will be?
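This is made-up input, but it has the same structure as the data I'm working with. I need to remove 'some stuff I dont want', but I don't know where in the data it will appear. I also need to put the remaining data into sublists of 7 items. The data is extracted character by character from the text layout of a PDF and placed in the 'Input' list. What I want is to look at the first item in the list and check whether it is an integer with fewer than 3 digits. If it is, put that item and the next 6 items into a sublist. If it is not, ignore that item and check the next one. This should continue until there is no data left to check and put into sublists.
Input: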
['1','1','2', '11" Some Words symbols and numbers mixed 3-4-2#', '4', '3.00', '43.00 NC', '1','1','2', '11" Some Words symbols and numbers mixed 3-4-2#', '3.00', '3','43.00 NC', 'some stuff I dont want', '1','1','2', '11" Some Words symbols and numbers mixed 3-4-2#', '3.00', '3', '43.00 NC']
The output should look like this:
Output:
[['1','1','2', '11" Some Words symbols and numbers mixed 3-4-2#', '4', '3.00', '43.00 NC'], ['1','1','2', '11" Some Words symbols and numbers mixed 3-4-2#', '3.00', '3', '43.00 NC'], ['1','1','2', '11" Some Words symbols and numbers mixed 3-4-2#', '3.00', '3', '43.00 NC']]
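I've tried using for loops and while loops, but I can't seem to get the syntax right to put only the data I want into sublists while ignoring the data I don't need. Is there a way to do this that I may have missed?
Something like this might get you started: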
data = ['1','1','2', '11" Some Words symbols and numbers mixed 3-4-2#', '4', '3.00', '43.00 NC', '1','1','2', '11" Some Words symbols and numbers mixed 3-4-2#', '3.00', '3','43.00 NC', 'some stuff I dont want', '1','1','2', '11" Some Words symbols and numbers mixed 3-4-2#', '3.00', '3', '43.00 NC']
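all_sublists = []
i = 0
while i < len(data):
    try:
        if int(data[i]) < 100:
            # Integer with fewer than 3 digits: take it plus the next 6 items.
            all_sublists.append(data[i:i+7])
            i += 7
        else:
            # Integer with 3 or more digits: skip it (without this branch the loop never advances).
            i += 1
    except ValueError:
        # Not an integer at all, so it is unwanted data: skip it.
        i += 1
all_sublists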
returns
[['1',
  '1',
  '2',
  '11" Some Words symbols and numbers mixed 3-4-2#',
  '4',
  '3.00',
  '43.00 NC'],
 ['1',
  '1',
  '2',
  '11" Some Words symbols and numbers mixed 3-4-2#',
  '3.00',
  '3',
  '43.00 NC'],
 ['1',
  '1',
  '2',
  '11" Some Words symbols and numbers mixed 3-4-2#',
  '3.00',
  '3',
  '43.00 NC']]
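If you need the check to be a bit stricter, here is a rough sketch of a variant. Note that `int()` also accepts strings such as '-5' or ' 7 ', so a test based on `int(...) < 100` is looser than "an integer with fewer than 3 digits". The version below uses `str.isdigit()` plus a length check instead, and it drops a trailing group that has fewer than 7 items. The names `is_record_start` and `group_records`, the `max_digits` and `size` parameters, and the `sample` list are made up for illustration; it assumes every record is exactly 7 items long and that unwanted items never appear inside a record.

```python
def is_record_start(item, max_digits=2):
    """Return True if item is a string of 1 to max_digits digits (no sign, no spaces)."""
    return item.isdigit() and len(item) <= max_digits


def group_records(items, size=7):
    """Collect runs of `size` items that begin with a short integer; skip everything else."""
    records = []
    i = 0
    while i < len(items):
        chunk = items[i:i + size]
        if is_record_start(items[i]) and len(chunk) == size:
            records.append(chunk)
            i += size  # jump past the whole record
        else:
            i += 1     # unwanted item or incomplete tail: skip it
    return records


# Hypothetical sample: two records separated by two unwanted items.
sample = ['1', '2', 'a', 'b', 'c', 'd', 'e', 'junk', '100',
          '3', '4', 'f', 'g', 'h', 'i', 'j']
print(group_records(sample))
# [['1', '2', 'a', 'b', 'c', 'd', 'e'], ['3', '4', 'f', 'g', 'h', 'i', 'j']]
```

Calling `group_records(data)` on the list from the question should give the same three sublists as the loop above. Whether an incomplete final group should be dropped or kept depends on your data, so adjust the `len(chunk) == size` check if you want to keep it.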