While loop data not appending to list outside of while loop

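I'm trying to scrape data, write it to a pd Series, and then enter a while loop that appends the site's remaining pages to the original Series (defined outside the while loop) after each iteration. I'm not sure why this isn't working. This is where I'm stuck: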

import pandas as pd
import requests
from bs4 import BeautifulSoup

current_url = 'https://www.yellowpages.com/search?search_terms=hvac&geo_location_terms=97080'

def get_data_run(current_url):
    company_names1 = get_company_name(current_url)
    print(company_names1) #1
    page = 1
    max_page = 3
    company_names1 = paginate(current_url, page, max_page, company_names1)
    print(company_names1) #2



def paginate(current_url, page, max_page, company_names1):
    while (page <= max_page):
            new_url = current_url + f"&page={page}"
            print(new_url)
            company_names = get_company_name(new_url)
            company_names1.append(company_names)
            print(company_names) #3
            print(company_names1) #4
            
            page +=1
            if page == max_page:
                return company_names1

def get_company_name(url):
    company_names = []
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'lxml')
    box = list(soup.findAll("div", {"class": "result"}))
    for i in range(len(box)):
        try:
            company_names.append(box[i].find("a", {"class": "business-name"}).text.strip())
        except Exception:
            company_names.append("null")
        else: 
            continue
    company_names = pd.Series(company_names, dtype='string')
    return company_names


get_data_run(current_url)

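I've labeled the different prints of company_names1 and company_names. Every print of company_names1 shows the same series of companies, even after company_names is appended to it inside the while loop. What I can't understand is that when I print company_names (#3), it does print the next page of company names. I don't understand why the append has no effect inside the while loop, or why the combined Series isn't returned from the function and printed at #2. Thanks!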

Update: here is some sample output:

When I print #3:

(pyfinance) justinbenfit@MacBook-Pro-3 yellowpages_scrape % /usr/local/anaconda3/envs/pyfinance/bin/python /Users/justinbenfit/Desktop/yellowpages_scrape/test.py
0             Honke Heating & Air Conditioning
1                   Climate Kings Heating & Ac
2                  Mike's Truck & Auto Service
3          One Hour Heating & Air Conditioning
4                 Morgan Heating & Cooling Inc
5       Rnr Heating Venting & Air Conditioning
6                           Universal HVAC Inc
7                                   Mr Furnace
8                Affordable Excellence Heating
9                           Green Air Products
10                        David Eugene Neketin
11                  Century Heating & Air Cond
12                            Appliance Wizard
13             Precision Energy Solutions Inc.
14      Portland Heating & Air Conditioning Co
15                                         Mhc
16     American Pride Heating and Cooling, LLC
17                            Tri Star Western
18                 Comfort Zone Heat & Air Inc
19                          Don's Air-Care Inc
20                   Chuck's Heating & Cooling
21    Mt. Hood Heating Cooling & Refrigeration
22                   Chuck's Heating & Cooling
23                                 Mr. Furnace
24                  America's Same Day Service
25         Arctic Commercial Refrigeration LLC
26                          Apex Refrigeration
27        Ben's Heating & Air Conditioning LLC
28                       David's Appliance Inc
29                   Wolcott Heating & Cooling
dtype: string
0                                              Air-Trix
1                                      Johnstone Supply
2                            Buss Heating & Cooling Inc
3                                     The Heat Exchange
4                   Hoodview Heating & Air Conditioning
5                Loomis Heating Cooling & Refrigeration
6                       All About Air Heating & Cooling
7                                        Hanson Heating
8                              Sparks Heating & Cooling
9                              Interior Comfort Systems
10                              P D X Heating & Cooling
11                                      Apcom Power Inc
12                                     Area Heating Inc
13    Four Seasons Heating Air Conditioning & Servic...
14                                  Perfect Climate Inc
15                           Combustion Consultants Inc
16                            Classic Heat Source, Inc.
17                               Multnomah Heating, Inc
18     Apollo Plumbing, Heating & Air Conditioning - OR
19                             Art's Furnace & Air Cond
20                                      Kurchel Heating
21                               P & O Construction Inc
22                                Systems Management NW
23                                   Bridgetown Heating
24             Amana Heating & Air Conditioning Systems
25                                         QualitySmith
26                                   Wilbert Jr, Wilson
27                 Faith Heating & Air Conditioning Inc
28    Northwest Commercial Heating & Air Conditionin...
29                                     Heat Master Corp
dtype: string

When I print #1, #2, and #4:

0             Honke Heating & Air Conditioning
1                   Climate Kings Heating & Ac
2                  Mike's Truck & Auto Service
3          One Hour Heating & Air Conditioning
4                 Morgan Heating & Cooling Inc
5       Rnr Heating Venting & Air Conditioning
6                           Universal HVAC Inc
7                                   Mr Furnace
8                Affordable Excellence Heating
9                           Green Air Products
10                        David Eugene Neketin
11                  Century Heating & Air Cond
12                            Appliance Wizard
13             Precision Energy Solutions Inc.
14      Portland Heating & Air Conditioning Co
15                                         Mhc
16     American Pride Heating and Cooling, LLC
17                            Tri Star Western
18                 Comfort Zone Heat & Air Inc
19                          Don's Air-Care Inc
20                   Chuck's Heating & Cooling
21                   Chuck's Heating & Cooling
22                                 Mr. Furnace
23    Mt. Hood Heating Cooling & Refrigeration
24                  America's Same Day Service
25         Arctic Commercial Refrigeration LLC
26                          Apex Refrigeration
27        Ben's Heating & Air Conditioning LLC
28                       David's Appliance Inc
29                   Wolcott Heating & Cooling
dtype: string

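The problem is that you're treating a pd.Series like a list. list.append modifies the list in place, but Series.append does not: it returns a new Series and leaves the original untouched. Appending to a list works like this: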

lst = [1,2,3]
lst.append(4)
print(lst)
# [1, 2, 3, 4]

The object changes without any explicit assignment. If you do the same thing with a Series, this is what happens:

series = pd.Series([1,2,3])
series.append(pd.Series([4]))
print(series)

Output:

0    1
1    2
2    3
dtype: int64

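So to update the Series, you have to assign the result back to the original name or to a new one. Without the assignment, the returned Series is simply discarded: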

series = pd.Series([1,2,3])
series = series.append(pd.Series([4]))
print(series)

Output:

0    1
1    2
2    3
0    4
dtype: int64
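
One caveat for readers on newer pandas: `Series.append` was deprecated in pandas 1.4 and removed in 2.0, so the calls above raise `AttributeError` there. Here is a minimal sketch of the same idea with `pd.concat`, which likewise returns a new object that has to be assigned. The `ignore_index=True` argument is my own addition, used to avoid the duplicate `0` label seen in the output above:

```python
import pandas as pd

series = pd.Series([1, 2, 3])
extra = pd.Series([4])

# pd.concat returns a new Series; the inputs are left untouched.
combined = pd.concat([series, extra], ignore_index=True)

print(len(series))           # 3, the original is unchanged
print(combined.tolist())     # [1, 2, 3, 4]
print(list(combined.index))  # [0, 1, 2, 3]
```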

In your case the problem is in the paginate function. You should change this line:

company_names1.append(company_names)

to:

company_names1 = company_names1.append(company_names)

and the pages will be appended as expected.
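
Two other things in the question's loop look worth checking, independent of the append issue: `page` starts at 1, so the first page is fetched a second time, and the `if page == max_page` check returns before the last page is requested. One possible sketch of the loop collects each page in a plain list and concatenates once at the end. Here `fetch` is a hypothetical parameter standing in for `get_company_name`, and `fake_fetch` is a made-up stub so the example runs without hitting the site:

```python
import pandas as pd


def paginate(current_url, first_page, max_page, fetch):
    """Fetch pages first_page..max_page and return one combined Series."""
    pages = []
    for page in range(first_page, max_page + 1):
        new_url = f"{current_url}&page={page}"
        pages.append(fetch(new_url))  # a plain list, so append works in place
    # Concatenate once; ignore_index renumbers the rows 0..n-1.
    return pd.concat(pages, ignore_index=True)


def fake_fetch(url):
    # Hypothetical stand-in for get_company_name: two made-up names per page.
    page = url.rsplit("=", 1)[-1]
    return pd.Series(
        [f"Company A (page {page})", f"Company B (page {page})"],
        dtype="string",
    )


names = paginate("https://example.com/search?q=hvac", 1, 3, fake_fetch)
print(len(names))      # 6
print(names.iloc[-1])  # Company B (page 3)
```

With the question's code, this would presumably be called as `paginate(current_url, 1, max_page, get_company_name)`, replacing the separate first call to `get_company_name`.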