Python: split a list of strings into sub-lists of strings using text in the list

I have a list called 'exemptions' that contains a number of fields (string variables).

exemptions = ['S-1', '20090820', '\t\t\t\tDOLLAR GENERAL CORP', '\t\t0000029534', 'S-1/A', '20021114', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806', '\t\t\t\tCONSTAR FOREIGN HOLDINGS INC', '\t\t0001178543', '\t\t\t\tCONSTAR PLASTICS LLC', '\t\t0001178541', '\t\t\t\tDT INC', '\t\t0001178539', '\t\t\t\tBFF INC', '\t\t0001178538', '\t\t\t\tCONSTAR INC', '\t\t0001178537', 'S-1', '20020523', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806', 'S-1', '20051123', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300', 'S-1', '20061221', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300', 'S-1/A', '20140327', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729', 'S-1', '20110331', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729', 'S-1', '20040319', '\t\t\t\tDIGIRAD CORP', '\t\t0000707388', 'S-1', '20040408', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761', 'S-1', '20041027', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761', 'S-1', '20050630', '\t\t\t\tSEALY CORP', '\t\t0000748015', 'S-1', '20140512', '\t\t\t\tCITIZENS FINANCIAL GROUP INC/RI', '\t\t0000759944']

I want to start a new sub-list at each 'S-1' or 'S-1/A'. The desired output would be:

exemptions = [['S-1', '20090820', '\t\t\t\tDOLLAR GENERAL CORP', '\t\t0000029534'], ['S-1/A', '20021114', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806', '\t\t\t\tCONSTAR FOREIGN HOLDINGS INC', '\t\t0001178543', '\t\t\t\tCONSTAR PLASTICS LLC', '\t\t0001178541', '\t\t\t\tDT INC', '\t\t0001178539', '\t\t\t\tBFF INC', '\t\t0001178538', '\t\t\t\tCONSTAR INC', '\t\t0001178537'], ['S-1', '20020523', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806'], ['S-1', '20051123', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300'], ['S-1', '20061221', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300'], ['S-1/A', '20140327', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729'], ['S-1', '20110331', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729'], ['S-1', '20040319', '\t\t\t\tDIGIRAD CORP', '\t\t0000707388'], ['S-1', '20040408', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'], ['S-1', '20041027', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'], ['S-1', '20050630', '\t\t\t\tSEALY CORP', '\t\t0000748015'], ['S-1', '20140512', '\t\t\t\tCITIZENS FINANCIAL GROUP INC/RI', '\t\t0000759944']]

I tried _list = [i.split('S-1') for i in exemptions], but it does not give me what I need...
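For reference, here is a small sketch of what that attempt produces on a shortened, four-item sample (the sample is an assumption made for brevity, not the full list). str.split works inside each string, so the elements are never regrouped:

```python
# Shortened sample; the real list is much longer.
sample = ['S-1', '20090820', 'S-1/A', '20021114']

# str.split cuts each individual string on the substring 'S-1';
# it does not regroup the elements of the outer list.
attempt = [item.split('S-1') for item in sample]
print(attempt)
# [['', ''], ['20090820'], ['', '/A'], ['20021114']]
```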

Any suggestions? Many thanks.

Does this work for you?

exemptions = ['S-1', '20090820', .... , '\t\t0000759944']
result = []
for e in exemptions:
    if e in ("S-1", "S-1/A"):
        result.append([])
    result[-1].append(e)

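Note that this relies on your input list beginning with an 'S-1' or 'S-1/A' value. Each time the loop meets one of them, it appends a new, empty sub-list to the end of result; every value, the marker included, is then appended to the last sub-list. If the list started with anything else, result[-1] would raise an IndexError because result would still be empty.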
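If the input might not start with a marker, the same loop can be wrapped in a function that opens a first group on demand. This is only a sketch: the name group_by_markers and its markers parameter are made up for illustration, and putting leading stray items into their own group is one possible choice among several.

```python
def group_by_markers(items, markers=("S-1", "S-1/A")):
    """Start a new sub-list at every item that is one of `markers`."""
    groups = []
    for item in items:
        # Open a new group at each marker, and also for the very first
        # item, so that groups[-1] always exists.
        if item in markers or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups


print(group_by_markers(['S-1', '20090820', 'S-1/A', '20021114']))
# [['S-1', '20090820'], ['S-1/A', '20021114']]
```

Calling group_by_markers(exemptions) on the list from the question should give the nested list shown there.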

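Join the list into a single string with a custom delimiter such as |, use re.split to split it before each occurrence of S-1, then split each resulting piece back into a list on the | delimiter. Splitting on a zero-width lookahead like this needs Python 3.7 or later, and the approach assumes that no field contains | and that the text S-1 appears only as a form type.

>>> import re
>>> from pprint import pprint
>>> res = [s.strip('|').split('|') for s in re.split(r'(?=S-1)', '|'.join(exemptions)) if s]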
>>>
>>> pprint(res)
[['S-1', '20090820', '\t\t\t\tDOLLAR GENERAL CORP', '\t\t0000029534'],
 ['S-1/A',
  '20021114',
  '\t\t\t\tCONSTAR INTERNATIONAL INC',
  '\t\t0000029806',
  '\t\t\t\tCONSTAR FOREIGN HOLDINGS INC',
  '\t\t0001178543',
  '\t\t\t\tCONSTAR PLASTICS LLC',
  '\t\t0001178541',
  '\t\t\t\tDT INC',
  '\t\t0001178539',
  '\t\t\t\tBFF INC',
  '\t\t0001178538',
  '\t\t\t\tCONSTAR INC',
  '\t\t0001178537'],
 ['S-1', '20020523', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806'],
 ['S-1', '20051123', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300'],
 ['S-1', '20061221', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300'],
 ['S-1/A', '20140327', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729'],
 ['S-1', '20110331', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729'],
 ['S-1', '20040319', '\t\t\t\tDIGIRAD CORP', '\t\t0000707388'],
 ['S-1', '20040408', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'],
 ['S-1', '20041027', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'],
 ['S-1', '20050630', '\t\t\t\tSEALY CORP', '\t\t0000748015'],
 ['S-1',
  '20140512',
  '\t\t\t\tCITIZENS FINANCIAL GROUP INC/RI',
  '\t\t0000759944']]
>>> 
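
The join-and-split idea above cuts wherever the text S-1 appears, including inside a company name. A stricter variant, sketched here under assumptions, splits only where a whole field equals S-1 or S-1/A. The function name split_on_form_types is hypothetical, and the company name and ten-digit number marked below are invented to show the edge case.

```python
import re


def split_on_form_types(items, delimiter="|"):
    """Join, then split only where a whole field is 'S-1' or 'S-1/A'."""
    joined = delimiter.join(items)
    d = re.escape(delimiter)
    # Consume the delimiter that precedes a field that is exactly
    # 'S-1' or 'S-1/A' (followed by another delimiter or end of string).
    pattern = rf"{d}(?=S-1(?:/A)?(?:{d}|$))"
    return [chunk.split(delimiter) for chunk in re.split(pattern, joined) if chunk]


sample = [
    'S-1', '20090820',
    '\t\t\t\tS-1 HOLDINGS', '\t\t0000000001',  # invented values
    'S-1/A', '20021114', '\t\t\t\tDT INC', '\t\t0001178539',
]
for group in split_on_form_types(sample):
    print(group)
```

The delimiter still must not occur inside any field. Because the pattern consumes a real character instead of matching an empty string, it does not depend on the zero-width splitting behaviour of newer Python versions.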
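# exemptions is the input list
finalList = []
temporaryList = []
for eachItem in exemptions:
    if eachItem in ('S-1', 'S-1/A'):
        # A form type starts a new group; store the previous group first.
        if temporaryList:
            finalList.append(temporaryList)
        temporaryList = [eachItem]
    else:
        temporaryList.append(eachItem)
# Store the last group, which no later marker closes.
if temporaryList:
    finalList.append(temporaryList)

print(finalList)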
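
As an alternative to tracking a temporary list, the positions of the markers can be found first and the list sliced between them. This is a sketch only: split_at_markers is a made-up name, and unlike the loop versions it silently drops any items that come before the first marker.

```python
def split_at_markers(items, markers=("S-1", "S-1/A")):
    """Slice `items` into sub-lists that each begin at a marker."""
    # Index of every marker; each one begins a group.
    starts = [i for i, item in enumerate(items) if item in markers]
    # Each group ends where the next one begins; the last runs to the end.
    ends = starts[1:] + [len(items)]
    return [items[start:end] for start, end in zip(starts, ends)]


print(split_at_markers(['S-1', '20090820', 'S-1/A', '20021114', 'S-1']))
# [['S-1', '20090820'], ['S-1/A', '20021114'], ['S-1']]
```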