Trying to update code to return HTML value in Selenium (Python)
I am using Selenium to scrape some Facebook group information:
with open("groups.txt") as file:
    lines = file.readlines()
total = len(lines)
count = 1
for line in lines:
    group_id = line.strip().split(".com/")[1]
    if "groups" not in line:
        new_line = "https://www.facebook.com/groups/" + str(group_id) + "/about"
    else:
        new_line = line.strip() + '/about'
    sleep(2)
    driver.get(new_line)
    page_source = driver.page_source
    page_id = page_source.split('"groupID":"')[1].split('","')[0]
    page_followers = page_source.split('<!-- --> total members')[0][-15:]
    page_followers = str(page_followers.split('>')[1]).replace(',', '')
    page_name = page_source.split("</title>")[0].split("<title>")[1]
    df1.loc[len(df1)] = [line.strip(), 'https://www.facebook.com/' + str(page_id), page_followers, page_name]
    print(f"{count}/{total}", line.strip(), 'https://www.facebook.com/' + str(page_id), page_followers)
    count += 1
df1.to_csv("groups.csv", encoding='utf-8', index=False, header=False)
Facebook recently changed something, so this code no longer returns the number of group members.
These are the relevant lines:
page_followers = page_source.split('<!-- --> total members')[0][-15:]
page_followers = str(page_followers.split('>')[1]).replace(',', '')
Taking view-source:https://www.facebook.com/groups/764385144252353/about as an example, I found two instances of "total members". Could I get some advice on what I should change to obtain this number?
New
This code extracts the exact number of members and converts it from a string to an integer:
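from selenium.webdriver.common.by import By
driver.get('https://www.facebook.com/groups/410943193806268/about')
# Find the span whose text contains 'total members'
members = driver.find_element(By.XPATH, "//span[contains(text(), 'total members')]").text
# Keep only the digits, then convert to an integer
members = int(''.join(i for i in members if i.isdigit()))
print(members)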
Output
15589
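The digit-joining step above works when the element text contains a single plain count with thousands separators. As a rough sketch, the parsing can be moved into a small helper that also handles abbreviated counts. The name `parse_member_count` and the K/M suffix handling are assumptions made for illustration; nothing in the question confirms that Facebook renders counts that way.

```python
import re
from decimal import Decimal

# Hypothetical helper (not part of Selenium): turns text such as
# "15,589 total members" into an integer.
_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000}
_COUNT_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*([KkMm]?)\b")


def parse_member_count(text):
    """Return the first member count found in text, or None if there is none."""
    match = _COUNT_RE.search(text)
    if match is None:
        return None
    number = match.group(1).replace(",", "")  # drop thousands separators
    suffix = match.group(2).upper()           # "", "K" or "M"
    # Decimal avoids float rounding when scaling values like "15.5"
    return int(Decimal(number) * _MULTIPLIERS[suffix])
```

It could then be applied to the text returned by `find_element(...)`, for example `parse_member_count(members_text)`, where `members_text` is the string read from the span.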
Old
I suggest not using page_source to extract this kind of data; use find_element instead, like this:
driver.find_element(By.CSS_SELECTOR, "a[href*='members']").text.split()[0]
Output
'186'
Explanation: a[href*='members'] selects a elements (for example <a class='test'>...</a>) whose href attribute contains the string members (for example <a href="something-members-test">...</a>).
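To make the substring match concrete without a browser, here is a minimal sketch that imitates the selector using only Python's standard `html.parser`. The `MemberLinkFinder` class and the sample markup are invented for illustration; real Facebook HTML is different, and Selenium evaluates the selector in the browser rather than this way.

```python
from html.parser import HTMLParser


class MemberLinkFinder(HTMLParser):
    """Collects the text of <a> elements whose href contains a substring,
    imitating what the CSS selector a[href*='members'] matches."""

    def __init__(self, needle="members"):
        super().__init__()
        self.needle = needle
        self.matches = []     # text of each matching link
        self._inside = False  # True while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and self.needle in href:
            self._inside = True
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.matches[-1] += data


# Made-up markup for illustration only.
sample = (
    '<a class="test" href="/about">About</a>'
    '<a href="/groups/123/members/">186 members</a>'
)
finder = MemberLinkFinder()
finder.feed(sample)
first_word = finder.matches[0].split()[0]  # same .split()[0] step as above
print(first_word)
```

Only the second link is collected, because the first link's href does not contain `members`; the class attribute plays no part in the match.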