Using Python's parse with criteria

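First off, I have little experience with coding of any kind, so I don't fully know what I'm doing, but I'll do my best!

I have been working on this code, which fetches the HTML of a website and then gives me a .CSV file of the named elements (?) (the ones you can see in the site's inspect panel).

So my question is: how can I use a condition in my current code so that it only returns words that, for example, contain the letter g?

I'm happy to elaborate! Thanks in advance!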

    import urllib.request
    from bs4 import BeautifulSoup
    import csv

    url = 'https://kouluruoka.fi/menu/kouvola_koulujenruokalista'

    request = urllib.request.Request(url)
    content = urllib.request.urlopen(request)
    parse = BeautifulSoup(content, 'html.parser')

    # These texts get words in <h2> and <span> named elements
    text1 = parse.find_all('h2')
    text2 = parse.find_all('span')

    # This code uses the texts above to create the .CSV file
    with open('index.csv', 'a') as csv_file:
        writer = csv.writer(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
        for col1, col2 in zip(text1, text2):
            writer.writerow([col1.get_text().strip(), col2.get_text().strip()])

You can check whether an element's text contains a given string or letter like this:

    h2_elements = parse.find_all('h2')
    span_elements = parse.find_all('span')

    # This code uses the texts above to create the .CSV file
    with open('index.csv', 'a') as csv_file:
        writer = csv.writer(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
        for h2_element, span_element in zip(h2_elements, span_elements):
            h2_element_str = h2_element.get_text().strip()
            span_element_str = span_element.get_text().strip()

            if 'a' in h2_element_str and 'a' in span_element_str:
                writer.writerow([h2_element_str, span_element_str])
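
If it helps, here is a minimal standalone sketch of the same check using the letter g from the question. It uses only the standard library, so it does not touch the website. The function names `contains_letter` and `filter_rows`, the `require_both` flag, and the sample rows are all made up for illustration; the sample rows stand in for the stripped `get_text()` results. It also assumes you want the match to ignore case, which the snippet above does not do.

```python
import csv
import io


def contains_letter(text, letter="g"):
    """Return True if `letter` occurs in `text`, ignoring case (an assumption)."""
    return letter.lower() in text.lower()


def filter_rows(rows, letter="g", require_both=False):
    """Keep rows where the letter appears in any cell,
    or in every cell when require_both is True."""
    combine = all if require_both else any
    return [
        row for row in rows
        if combine(contains_letter(cell, letter) for cell in row)
    ]


# Hypothetical sample data standing in for the (h2 text, span text) pairs.
rows = [
    ("Maanantai", "Kasvisgratiini"),
    ("Tiistai", "Kalakeitto"),
    ("Gluteeniton", "Jogurtti"),
]

kept = filter_rows(rows, letter="g")

# Write to an in-memory buffer here; with a real file you would use
# open('index.csv', 'a', newline='') instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL)
writer.writerows(kept)
csv_text = buffer.getvalue()
```

With `require_both=False` a row is kept if either column contains the letter; passing `require_both=True` matches the `and` condition used in the snippet above. In your own script you would build `rows` from the stripped texts of the `h2` and `span` elements before filtering.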