How to merge broken text from a list and append it to a dictionary?

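Following the "Python module for converting PDF to text" post, I scraped a PDF file and extracted its data. During scraping, some of the text got split into two separate list items. How can I merge these pieces and extract the data as a list of dictionaries?
For example: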

content = ['Sample Questions Set 1 ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '01  Which function among the following can’t be accessed outside ', 'the class in java in same package? ', 'A. public void show()。 ', 'B. void show()。 ', 'C. protected show()。 ', 'D. static void show()。 ', '02  How many private member functions are allowed in a class ? ', 'A. Only 1 ', 'B. Only 7 ', 'C. Only 255 ', 'D. As many as required ', '03  Can main() function be made private? ', 'A. Yes, always。 ', 'B. Yes, if program doesn’t contain any classes。 ', 'C. No, because main function is user defined。 ', 'D. No, never。 ', '04  If private member functions are to be declared in C++ then_________。 ', 'A. private:  ', 'B. private ', 'C. private(private member list) ', 'D. private :- <private members> ', '05  If a function in java is declared private then it _________。 ', 'A. Can’t access the standard output ', 'B. Can access the standard output。 ', 'C. Can’t access any output stream。 ', 'D. Can access only the output streams。 ']

Expected output:

questions = [{'Qid':01,'Qtext':'Which function among the following can’t be accessed outside the class in java in same package?','A.':'public void show()。','B.':' void show()。','C.':'protected show()。','D.':'static void show()'},{'Qid':02,....},{...},{...},{...}]

You can do the following:

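questions = []
for s in content:
    s = s.lstrip()
    if not s:
        continue  # skip items that contain only whitespace
    if s[0].isdigit():
        # '01  Question text' starts a new question; drop the leading number
        questions.append({'Qid': len(questions) + 1, 'Qtext': s.split(maxsplit=1)[1]})
    elif s[0].isalpha() and s[1:2] == '.':
        # 'A. option text' belongs to the most recent question
        questions[-1][s[:2]] = s.split(maxsplit=1)[1]
    elif questions:
        # anything else continues a question that was split across two items
        questions[-1]['Qtext'] += s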

questions then becomes:

[{'Qid': 1, 'Qtext': 'Which function among the following can’t be accessed outside the class in java in same package? ', 'A.': 'public void show()。 ', 'B.': 'void show()。 ', 'C.': 'protected show()。 ', 'D.': 'static void show()。 '}, {'Qid': 2, 'Qtext': 'How many private member functions are allowed in a class ? ', 'A.': 'Only 1 ', 'B.': 'Only 7 ', 'C.': 'Only 255 ', 'D.': 'As many as required '}, {'Qid': 3, 'Qtext': 'Can main() function be made private? ', 'A.': 'Yes, always。 ', 'B.': 'Yes, if program doesn’t contain any classes。 ', 'C.': 'No, because main function is user defined。 ', 'D.': 'No, never。 '}, {'Qid': 4, 'Qtext': 'If private member functions are to be declared in C++ then_________。 ', 'A.': 'private:  ', 'B.': 'private ', 'C.': 'private(private member list) ', 'D.': 'private :- <private members> '}, {'Qid': 5, 'Qtext': 'If a function in java is declared private then it _________。 ', 'A.': 'Can’t access the standard output ', 'B.': 'Can access the standard output。 ', 'C.': 'Can’t access any output stream。 ', 'D.': 'Can access only the output streams。 '}]
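The values above keep the trailing spaces from the scraped lines, while the expected output in the question has none. A minimal sketch of a clean-up pass, assuming every value except Qid is a string; `strip_values` is a hypothetical helper name, and the sample holds only one question to keep it short:

```python
# One question from the parsed result above, used as sample input.
questions = [
    {'Qid': 2, 'Qtext': 'How many private member functions are allowed in a class ? ',
     'A.': 'Only 1 ', 'B.': 'Only 7 ', 'C.': 'Only 255 ', 'D.': 'As many as required '},
]

def strip_values(question):
    """Return a copy of one question dict with whitespace trimmed from string values."""
    return {key: value.strip() if isinstance(value, str) else value
            for key, value in question.items()}

cleaned = [strip_values(q) for q in questions]
print(cleaned[0]['Qtext'])  # How many private member functions are allowed in a class ?
```

Building a new list leaves the original questions untouched, which makes it easy to compare the two while debugging the scrape.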

This merges them into the questions list:

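import re

questions = []
loc = 0  # index of the question currently being filled in

for res in content:
    prefix = res[:1]  # slicing avoids an IndexError on an empty string
    if prefix.isalpha() and res[1:2] == '.':
        # option line such as 'A. ...': strip only the leading label
        questions[loc][prefix + "."] = re.sub(r"^[ABCD]\.\s*", '', res)
        if prefix == "D":  # 'D' is the last option, so move on to the next question
            loc += 1
    elif prefix.isdigit():
        # question line such as '01  ...': strip only the leading number
        questions.append({'Qid': loc + 1, 'Qtext': re.sub(r"^\d+\s+", '', res)})
    elif loc < len(questions):
        questions[loc]['Qtext'] += res  # continuation of a question that was cut in two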

Result:

[{'Qid': 1, 'Qtext': 'Which function among the following can’t be accessed outside the class in java in same package? ', 'A.': 'public void show()。 ', 'B.': 'void show()。 ', 'C.': 'protected show()。 ', 'D.': 'static void show()。 '}, {'Qid': 2, 'Qtext': 'How many private member functions are allowed in a class ? ', 'A.': 'Only 1 ', 'B.': 'Only 7 ', 'C.': 'Only 255 ', 'D.': 'As many as required '}, {'Qid': 3, 'Qtext': 'Can main() function be made private? ', 'A.': 'Yes, always。 ', 'B.': 'Yes, if program doesn’t contain any classes。 ', 'C.': 'No, because main function is user defined。 ', 'D.': 'No, never。 '}, {'Qid': 4, 'Qtext': 'If private member functions are to be declared in C++ then_________。 ', 'A.': 'private:  ', 'B.': 'private ', 'C.': 'private(private member list) ', 'D.': 'private :- <private members> '}, {'Qid': 5, 'Qtext': 'If a function in java is declared private then it _________。 ', 'A.': 'Can’t access the standard output ', 'B.': 'Can access the standard output。 ', 'C.': 'Can’t access any output stream。 ', 'D.': 'Can access only the output streams。 '}]
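Both loops above classify a line by its first one or two characters. As an alternative sketch, not taken from either answer, the same grouping can be written with two anchored regular expressions. The names `QUESTION_RE`, `OPTION_RE` and `parse_questions` are hypothetical, and the sample is a shortened version of the list in the question. It assumes that a question starts with digits and an option with a single capital letter followed by a dot, so a wrapped line that itself starts with a number would be misread as a new question.

```python
import re

# Assumed line shapes: '01  question text' and 'A. option text'.
QUESTION_RE = re.compile(r'^(\d+)\s+(.*)$')
OPTION_RE = re.compile(r'^([A-Z])\.\s*(.*)$')

def parse_questions(content):
    """Group scraped lines into one dict per question (illustrative helper)."""
    questions = []
    last_key = None  # field written most recently, used for wrapped lines
    for raw in content:
        line = raw.strip()
        if not line:
            continue  # skip whitespace-only items
        question = QUESTION_RE.match(line)
        option = OPTION_RE.match(line)
        if question:
            # Take Qid from the number printed in the text instead of counting.
            questions.append({'Qid': int(question.group(1)), 'Qtext': question.group(2)})
            last_key = 'Qtext'
        elif option and questions:
            last_key = option.group(1) + '.'
            questions[-1][last_key] = option.group(2)
        elif questions:
            # A wrapped line continues whichever field came last.
            questions[-1][last_key] += ' ' + line
    return questions

sample = [
    'Sample Questions Set 1 ', ' ',
    '01  Which function among the following can’t be accessed outside ',
    'the class in java in same package? ',
    'A. public void show() ', 'B. void show() ',
    'C. protected show() ', 'D. static void show() ',
    '02  Can main() function be made private? ',
    'A. Yes, always ', 'B. No, never ',
]
parsed = parse_questions(sample)
print(parsed[0]['Qtext'])
```

Tracking `last_key` means a wrapped option is merged into that option rather than into the question text, and the parser does not depend on 'D' being the last option of every question.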