Regex issue in Python printing entire URL
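I'm trying to extract all URLs that contain "https://play.google.com/store/" and print the whole string. When I run my current code, it only prints "https://play.google.com/store/", but I'm looking for the entire URL. Can someone point me in the right direction? Here is my code: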
import pandas as pd
import os
import requests
from bs4 import BeautifulSoup
import re
URL = "https://www.pocketgamer.com/android/best-tycoon-games-android/?page=3"
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")
links = []
for link in soup.findAll("a", target="_blank"):
    links.append(link.get('href'))
x = re.findall("https://play.google.com/store/", str(links))
print(x)
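re.findall only returns the part of the text that matches the regex, so all you get back is the https://play.google.com/store/ that the regex itself contains. You could modify the regex, but since you are searching a list of links, it is easier to just check whether each one starts with https://play.google.com/store/. For example: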
x = [link for link in links if link.startswith('https://play.google.com/store/')]
Output (for your query):
[
'https://play.google.com/store/apps/details?id=com.auxbrain.egginc',
'https://play.google.com/store/apps/details?id=net.kairosoft.android.gamedev3en',
'https://play.google.com/store/apps/details?id=com.pixodust.games.idle.museum.tycoon.empire.art.history',
'https://play.google.com/store/apps/details?id=com.AdrianZarzycki.idle.incremental.car.industry.tycoon',
'https://play.google.com/store/apps/details?id=com.veloxia.spacecolonyidle',
'https://play.google.com/store/apps/details?id=com.uplayonline.esportslifetycoon',
'https://play.google.com/store/apps/details?id=com.codigames.hotel.empire.tycoon.idle.game',
'https://play.google.com/store/apps/details?id=com.mafgames.idle.cat.neko.manager.tycoon',
'https://play.google.com/store/apps/details?id=com.atari.mobile.rctempire',
'https://play.google.com/store/apps/details?id=com.pixodust.games.rocket.star.inc.idle.space.factory.tycoon',
'https://play.google.com/store/apps/details?id=com.idlezoo.game',
'https://play.google.com/store/apps/details?id=com.fluffyfairygames.idleminertycoon',
'https://play.google.com/store/apps/details?id=com.boomdrag.devtycoon2',
'https://play.google.com/store/apps/details?id=com.TomJarStudio.GamingShop2D',
'https://play.google.com/store/apps/details?id=com.roasterygames.smartphonetycoon2'
]
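For comparison, here is a minimal sketch of both routes mentioned above: the prefix check and a regex extended to match the rest of the URL. The `links` list is a small hypothetical sample standing in for the scraped hrefs (the `None` entry mimics an `<a>` tag with no `href`), and `PREFIX`, `by_prefix`, `by_regex`, and `prefix_only` are illustrative names, not part of the original code.

```python
import re

# Hypothetical sample in place of the scraped hrefs; None mimics a tag without href.
links = [
    "https://play.google.com/store/apps/details?id=com.auxbrain.egginc",
    "https://www.pocketgamer.com/android/",
    None,
    "https://play.google.com/store/apps/details?id=com.idlezoo.game",
]

PREFIX = "https://play.google.com/store/"

# Original pattern: findall returns only the text the pattern itself matched.
prefix_only = re.findall(re.escape(PREFIX), str(links))

# Route 1: prefix check on each link, skipping missing hrefs.
by_prefix = [link for link in links if link and link.startswith(PREFIX)]

# Route 2: extend the regex so it also consumes the rest of the URL,
# stopping at a quote or whitespace in the stringified list.
by_regex = re.findall(re.escape(PREFIX) + r"[^'\"\s]*", str(links))

print(prefix_only)
print(by_prefix)
print(by_regex)
```

On this sample both routes give the same two Play Store URLs. The prefix check is probably the simpler choice here, since it works on the list directly rather than on its string form.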