Add to list if string not present in cell value, break and start new list if present?
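I'm trying to iterate over a column in Excel and check whether a string is present. If the string is present, I want to reset the list to []
and repeat the process. I've spent too much time on this and can't figure out what I'm doing wrong.
Sample data: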
Open Ended Schemes(Balanced)
Aditya Birla Sun Life Mutual Fund
120518 Aditya Birla Sun Life Equity Hybrid'95 Fund - Direct Plan-Dividend
120517 Aditya Birla Sun Life Equity Hybrid'95 Fund - Direct Plan-Growth
Open Ended Schemes(Debt Scheme - Banking and PSU Fund)
Axis Mutual Fund
128953 Axis Banking & PSU Debt Fund - Bonus Option
117447 Axis Banking & PSU Debt Fund - Daily Dividend Option
Code:
from openpyxl import load_workbook
import os

wb = load_workbook('m.xlsx')
ws = wb.active
keys = ['1', '2']
m_dict = {}
scheme_codes = []
for g in groups[0:2]:
    for row in ws.iter_rows('A{}:A{}'.format(ws.min_row +1, ws.max_row)):
        # scheme_codes = []
        for cell in row:
            if cell.value != None:
                if 'Schemes' in cell.value:
                    print('Found Schemes' + str(cell.value))
                    scheme_codes=[]
                    break
                else:
                    scheme = cell.value
                    scheme_codes.append(scheme)
    m_dict[g] = scheme_codes
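I only get 1 item per scheme. I've tried several approaches, and otherwise the loop just runs through all the rows.
The file has 18,000 rows.
Expected output:
{1: [all items in column 'A' before the first repetition of 'schemes'], 2: [all items in column 'A' before the second repetition of 'schemes']}
Right now, when I run the code, I get len(scheme_codes) = 8069, which as far as I can tell is wrong. The first list should have close to 80 items.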
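This isn't exactly what you asked for; it actually gives you some extra information...
It gives you a dictionary containing sets of (scheme_code, scheme_name) tuples, like:
{scheme: {sub_scheme : {(code, name), (code, name), ...}}}
If you really only need the top-level scheme and its codes, you should be able to simplify it.
Just remove one level of defaultdict and use scheme_codes[scheme].add(cell.value) instead.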
...
from openpyxl import load_workbook
import os
from collections import defaultdict

wb = load_workbook("mfcodes.xlsx")
ws = wb.active
# scheme -> sub_scheme -> set of (code, name) tuples
scheme_codes = defaultdict(lambda: defaultdict(set))
scheme = 'N/A'
sub_scheme = 'N/A'
for row in ws[f'A{ws.min_row}:B{ws.max_row}']:
    cell = row[0]
    if not cell.value:
        continue  # skip empty rows
    if 'Schemes' in cell.value:
        # a header row starts a new top-level scheme
        scheme = cell.value
    else:
        if not cell.value.isdigit():
            # non-numeric text is a fund house (sub-scheme) name
            sub_scheme = cell.value
        else:
            # numeric text in column A is a scheme code; column B holds its name
            scheme_codes[scheme][sub_scheme].add((cell.value, row[1].value))
print(repr(next(iter(scheme_codes.items()))))
Output:
{'Open Ended Schemes(Balanced)' :
{'Aditya Birla Sun Life Mutual Fund': {('120518', "Aditya Birla Sun Life Equity Hybrid'95 Fund - Direct Plan-Dividend"),
('120517', "Aditya Birla Sun Life Equity Hybrid'95 Fund - Direct Plan-Growth"),
('103154', "Aditya Birla Sun Life Equity Hybrid'95 Fund - Regular Plan-Dividend"),
('103155', "Aditya Birla Sun Life Equity Hybrid'95 Fund - Regular Plan-Growth"),
('131671', 'Aditya Birla Sun Life Balanced Advantage Fund - Direct Plan - Dividend Option'),
('131670', 'Aditya Birla Sun Life Balanced Advantage Fund - Direct Plan - Growth Option'),
('131665', 'Aditya Birla Sun Life Balanced Advantage Fund - Regular Plan - Dividend Option'),
('131666', 'Aditya Birla Sun Life Balanced Advantage Fund - Regular Plan - Growth Option')},
'Baroda Pioneer Mutual Fund': {('125112', 'Baroda Pioneer Balance Fund - Plan A - Bonus Option'),
('101913', 'BARODA PIONEER BALANCE FUND - Plan A - Dividend Option'),
('101912', 'BARODA PIONEER BALANCE FUND - Plan A - Growth Option'),
('119325', 'BARODA PIONEER BALANCE FUND - Plan B (Direct) - Dividend Option'),
('119326', 'BARODA PIONEER BALANCE FUND - Plan B (Direct) - Growth Option')},
# et cetera ...
}
}
By the way: the first scheme has 67 codes...
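For the simplified variant mentioned above (one level of defaultdict, codes only), a minimal sketch might look like the following. It is not the code from the answer: the function name codes_by_scheme and the in-memory sample_rows list are made up for illustration so that the snippet runs without a workbook, and it assumes column A holds the header, fund house name, or code while column B holds the fund name. With a real sheet, the rows could probably be fed from ws.iter_rows(min_col=1, max_col=2, values_only=True).

```python
from collections import defaultdict


def codes_by_scheme(rows):
    """Group scheme codes under the most recent 'Schemes' header row.

    `rows` is any iterable of (column_a, column_b) value pairs
    (hypothetical helper, not part of openpyxl).
    """
    scheme_codes = defaultdict(set)
    scheme = 'N/A'  # used if codes appear before any header row
    for col_a, _col_b in rows:
        if not col_a:
            continue  # skip empty rows
        value = str(col_a)  # assumption: codes may be stored as numbers
        if 'Schemes' in value:
            scheme = value  # a header row starts a new group
        elif value.isdigit():
            scheme_codes[scheme].add(value)
        # anything else is a fund house name and is ignored here
    return dict(scheme_codes)


# Sample rows modelled on the data in the question.
sample_rows = [
    ('Open Ended Schemes(Balanced)', None),
    ('Aditya Birla Sun Life Mutual Fund', None),
    ('120518', "Aditya Birla Sun Life Equity Hybrid'95 Fund - Direct Plan-Dividend"),
    ('120517', "Aditya Birla Sun Life Equity Hybrid'95 Fund - Direct Plan-Growth"),
    (None, None),
    ('Open Ended Schemes(Debt Scheme - Banking and PSU Fund)', None),
    ('Axis Mutual Fund', None),
    ('128953', 'Axis Banking & PSU Debt Fund - Bonus Option'),
    ('117447', 'Axis Banking & PSU Debt Fund - Daily Dividend Option'),
]

result = codes_by_scheme(sample_rows)
for name, codes in result.items():
    print(name, sorted(codes))
```

The key difference from the code in the question is that the current scheme is tracked in a variable and each code is stored under it, rather than resetting one shared list. If you need the codes in sheet order, swapping set/add for list/append should work the same way.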