无法从输出 url 列表中找到 Url

Unable to findout an Url from an output url list

我下面的代码列出了 urls.If 我想要任何具体的 url 如何解决这个问题: 我的代码如下:

import bs4, requests
index_pages = ('https://www.tripadvisor.in/Hotels-g60763-oa{}-New_York_City_New_York-Hotels.html#ACCOM_OVERVIEW'.format(i) for i in range(0, 180, 30))
urls = []
with requests.session() as s:
    for index in index_pages:
        r = s.get(index)
        soup = bs4.BeautifulSoup(r.text, 'lxml')
        url_list = [i.get('href') for i in soup.select('.property_title')]
        urls.append(url_list)
        print(url_list) 

现在我得到的 Urls.Output 列表的输出如下:

New_York_City_New_York.html', '/Hotel_Review-g60763-d93543-Reviews-Shelburne_NYC_an_Affinia_hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d1485603-Reviews-Comfort_Inn_Times_Square_West-New_York_City_New_York.html', '/Hotel_Review-g60763-d93340-Reviews-Hotel_Elysee_by_Library_Hotel_Collection-New_York_City_New_York.html', '/Hotel_Review-g60763-d1641016-Reviews-The_Chatwal_A_Luxury_Collection_Hotel_New_York-New_York_City_New_York.html', '/Hotel_Review-g60763-d93585-Reviews-Lowell_Hotel-New_York_City_New_York.html']
D:\anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
['/Hotel_Review-g60763-d277882-Reviews-Hampton_Inn_Manhattan_Seaport_Financial_District-New_York_City_New_York.html', '/Hotel_Review-g60763-d3529145-Reviews-Holiday_Inn_Express_Manhattan_Times_Square_South-New_York_City_New_York.html', '/Hotel_Review-g60763-d208453-Reviews-Hilton_Times_Square-New_York_City_New_York.html', '/Hotel_Review-g60763-d249711-Reviews-The_Hotel_at_Times_Square-New_York_City_New_York.html', '/Hotel_Review-g60763-d1158753-Reviews-Kimpton_Ink48_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d1186070-Reviews-Marriott_Vacation_Club_Pulse_New_York_City-New_York_City_New_York.html', '/Hotel_Review-g60763-d1938661-Reviews-Row_NYC_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d93345-Reviews-Skyline_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d217616-Reviews-Kimpton_Muse_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d1888977-Reviews-The_Pearl_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d223021-Reviews-Club_Quarters_Hotel_Midtown-New_York_City_New_York.html', '/Hotel_Review-g60763-d611947-Reviews-New_York_Hilton_Midtown-New_York_City_New_York.html', '/Hotel_Review-g60763-d4274398-Reviews-Courtyard_New_York_Manhattan_Times_Square_West-New_York_City_New_York.html', '/Hotel_Review-g60763-d1456416-Reviews-The_Dominick_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d122014-Reviews-Gild_Hall_a_Thompson_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d2622936-Reviews-Wyndham_Garden_Chinatown-New_York_City_New_York.html', '/Hotel_Review-g60763-d1456560-Reviews-Kimpton_Hotel_Eventi-New_York_City_New_York.html', '/Hotel_Review-g60763-d249710-Reviews-Morningside_Inn-New_York_City_New_York.html', '/Hotel_Review-g60763-d2079052-Reviews-YOTEL_New_York-New_York_City_New_York.html', '/Hotel_Review-g60763-d224214-Reviews-The_Bryant_Park_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d1785018-Reviews-The_James_New_York_SoHo-New_York_City_New_York.html', '/Hotel_Review-g60763-d247814-Reviews-The_Gatsby_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d112039-Reviews-Hotel_Newton-New_York_City_New_York.html', '/Hotel_Review-g60763-d612263-Reviews-Hotel_Mela-New_York_City_New_York.html', '/Hotel_Review-g60763-d99392-Reviews-Hotel_Metro-New_York_City_New_York.html', '/Hotel_Review-g60763-d4446427-Reviews-Hotel_Boutique_At_Grand_Central-New_York_City_New_York.html', '/Hotel_Review-g60763-d1503474-Reviews-Distrikt_Hotel_New_York_City-New_York_City_New_York.html', '/Hotel_Review-g60763-d93467-Reviews-Gardens_NYC_an_Affinia_hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d93603-Reviews-The_Pierre_A_Taj_Hotel_New_York-New_York_City_New_York.html', '/Hotel_Review-g60763-d113311-Reviews-The_Peninsula_New_York-New_York_City_New_York.html']

现在,如果我要从上面的列表中寻找任何特定的 url,该怎么做? 例如:对于 Hilton_Times_Square 如何从上面的列表中找出 url?

寻找准确的 url:

def findExactUrl(urlList, searched):
    for url in urllist:
       if url == searched:
           reurn url

在空闲时你可以调用

>>findExactUrl(url_list, "http://maritonhotel.com/123")
## if such url is in your list
>>"mariton Hotel" 
## or if such url is not there, nothing should show, just:
>>

或者,从您的 .py 文件调用:

myUrl = findExactUrl(url_list, "http://maritonhotel.com/123")
print(myUrl)
>>"http://maritonhotel.com/123"

您可以将函数编辑为 return Truei 以查找它的索引。

更模糊的搜索

def findOccurence(urlList, searched):
    foundUrls = []
    for url in urllist:
       if url.contains(searched):
           foundUrls.append(url)
    return foundUrls

如果你想从字符串中删除一些子字符串,只需调用 .replace() 方法。

for i in range(len(url_list)):
    if "[" in url_list[i]:
        url_list[i] = url_list[i].replace("[", "")

如果您有不明白的地方,请告诉我。另外,请认真考虑为初学者购买一些 python 书籍,或者参加一些 on-line 课程。

这就是我要做的:

keyword = "Hilton_Times_Square"
target_urls = [ e for e in url_list if keyword in e ]