Items not being appended properly

I am writing the following code. It takes scrap as input, a list containing several phrases:

scrap= ['Mutagenesis screens define conserved functions of metabolism and longevity', 'EK Bharath Shrestha Bharat(EBSB) - 100 commonly used sentences and their translations in 22 languages - P & D', 'OEB Special Seminar: “Phylogenetics and phylogenomics of Lentinula and the origin of cultivated shiitake mushrooms”', 'Student Exchange programme (Autumn Semester 2022) in University of Skovde, Sweden - CIR - Last Date: 04.03.2022', 'Ontario Institute for Studies in Education', 'Q Quest 2022 - AU TVS CQM - Last Date: 01.03.2022', 'National Conference on "Present Innovation Approaches and Paradigm in Physical Education"', 'Mahatma Gandhi University Newsletter ‘Insider’-Published.', 'STAGE Seminar', 'BOSM', 'Faculty of Law', 'UNIVERSITY UNION ELECTION 2019-20', 'Keynote Lecture: Sustainability for Africa: ...', 'Hillary Chute, "Maus Now: Spiegelman’s...', 'Conference on ‘Sustainable agriculture and farmers empowerment’ during 16th and 17th March 2021.', 'MIT Probability Seminar', 'Name of Programme', '49th All India Conference of Dravidian Linguists', 'Grad College Social Hour (GC common lounge)', 'MIT Symphony Orchestra: Márquez, Sarasate, and...', 'SCSB Colloquium Series: Etiology and impact of...', 'Celebration of National Science Day on 28th February 2022 - Dept. of Physics', 'PICASSO Tie-dye Event', 'Lunch & Learn with Muslim Life Program', '2022 Koch Institute Image Awards', 'Ideas & Images: The Power of Visual...', '30 Minutes Towards Better Bibliographies and Footnotes! (online)', 'Virtual Workshop on "Flight to a Bright Career-Enhance your Personality"', '4th Disaster Risk and Vulnerability Conference organised by SES scheduled on Oct 9-10 & 16-17.', 'French Education Fair 2022 organized by Campus France - CIR']

Now I want to append to TRUE_PROG every phrase in scrap that uses a word from prog_list:

prog_list=['writing', 'cryptography', 'recoding', 'decoding', 'program', 'code', 'planning', 'programming', 'encoding', 'gull', 'scheduling', 'tease', 'program', 'code']
TRUE_PROG =[] 

I wrote some simple code with loops, but it produces output I did not expect:

Code:

import string

TRUE_PROG=[]
MIS_PROG=[]
c_list = []
p = string.punctuation
punc = list(p)

for i in scrap:
    # print(i)
    words_in_scrap = i.split() 
    for j in words_in_scrap:
      words = j.lower()
      for k in words: 
        # print(k)
        if k in punc:
          words = words.replace(k ," ")
      #CLEANSED DATA
      clean = words  
      # print("clean=",clean)
      c_list.append(clean)
    
    # print("c_list=:",c_list)
    for c in c_list:
      if c ==" ":
        c_list.remove(c)
    # print("c_list cleaned of spaces=",c_list)    
    for t in c_list:
      if t in prog_list:
        TRUE_PROG.append(i)
        #print("\ni=",i,"due to t=",t)
      else:
        MIS_PROG.append(i)
# print("\n\nPROG=",set(TRUE_PROG),"\n\n\n MIS_PROG=", set(MIS_PROG),"\n")

If you uncomment #print("\ni=",i,"due to t=",t) you will see that some phrases that do not contain any of these words are appended as well. It gives me this:

i= Lunch & Learn with Muslim Life Program due to t= program

i= 2022 Koch Institute Image Awards due to t= program

i= Ideas & Images: The Power of Visual... due to t= program

i= 30 Minutes Towards Better Bibliographies and Footnotes! (online) due to t= program

i= Virtual Workshop on "Flight to a Bright Career-Enhance your Personality" due to t= program

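And so on. Apart from the first one, the rest do not contain the word "program", yet they were still added. Any correction would be highly appreciated. Thanks!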

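# Deduplicate the keywords so each one is checked only once.
ps = list(set(prog_list))

for p in ps:
    for s in scrap:
        # Compare each whitespace-separated word of the phrase, lowercased.
        words = s.split()
        for w in words:
            if p == w.lower():
                r = s + f" - due to the word {p}"
                TRUE_PROG.append(r)

print(TRUE_PROG)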

Output:

['Lunch & Learn with Muslim Life Program - due to the word program']
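As far as I can tell from the posted loop, the extra matches come from c_list being created once, outside the for i in scrap loop. It keeps growing, so once "program" has been added to it, every later phrase finds that word again and is appended too. Removing items from c_list while iterating over it can also skip elements. Here is a minimal sketch of one way around both issues. It uses a shortened copy of the question's data, and the names PUNCT_TO_SPACE, words_of and split_by_keywords are hypothetical helpers introduced only for illustration:

```python
import string

# Shortened sample data based on the question (not the full lists).
scrap = [
    'Faculty of Law',
    'Lunch & Learn with Muslim Life Program',
    '2022 Koch Institute Image Awards',
    'MIT Probability Seminar',
]
prog_list = ['writing', 'program', 'code', 'scheduling']

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))


def words_of(phrase):
    """Return the set of lowercase words in phrase, ASCII punctuation ignored."""
    return set(phrase.lower().translate(PUNCT_TO_SPACE).split())


def split_by_keywords(phrases, keywords):
    """Split phrases into (matching, non_matching) by whole-word keyword hits."""
    keyword_set = set(keywords)
    matching, non_matching = [], []
    for phrase in phrases:
        # A fresh word set is built for each phrase, so words from earlier
        # phrases cannot leak into the check for later ones.
        if words_of(phrase) & keyword_set:
            matching.append(phrase)
        else:
            non_matching.append(phrase)
    return matching, non_matching


TRUE_PROG, MIS_PROG = split_by_keywords(scrap, prog_list)
print(TRUE_PROG)
print(MIS_PROG)
```

Each phrase lands in exactly one of the two lists, so no set() call is needed to remove duplicates afterwards. Note that string.punctuation only covers ASCII characters, so the curly quotes in some of the original phrases would stay attached to their words. Handling them would need an extra step.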