Why isn't my Newspaper3k code working with Newsweek?
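I'm working in Jupyter Notebook, and Newspaper can't extract anything from Newsweek. I can get it running with Goose, but I'd like a backup in case Goose fails.
I've tried other sites such as Fox, Yahoo and CNN, and they all work fine, so the problem is isolated to Newsweek.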
from newspaper import Article

url = 'https://www.newsweek.com/mike-huckabee-blasts-cnns-axelrod-calling-daughter-trump-press-secretary-sarah-sanders-1444184'
article = Article(url)
article.download()
article.parse()  # raises the error below when the download was refused
print(article.text)
Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/mike-huckabee-blasts-cnns-axelrod-calling-daughter-trump-press-secretary-sarah-sanders-1444184 on URL https://www.newsweek.com/mike-huckabee-blasts-cnns-axelrod-calling-daughter-trump-press-secretary-sarah-sanders-1444184
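A quick way to check that the 403 depends on the request headers rather than on the URL is to request the page with two different `User-Agent` values and compare the status codes. This is a minimal standard-library sketch, not part of Newspaper3k: `build_request`, `fetch_status`, `BROWSER_UA` and `SCRIPT_UA` are hypothetical names, and `SCRIPT_UA` is only an assumed approximation of Newspaper3k's default user agent.

```python
import urllib.error
import urllib.request

# Browser-style User-Agent, copied from the answer's code.
BROWSER_UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36')
# Assumption: roughly what Newspaper3k sends by default; check
# Config().browser_user_agent in your installed version.
SCRIPT_UA = 'newspaper/0.2.8'


def build_request(url, user_agent):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server answers with for this User-Agent."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; .code is the status (e.g. 403).
        return err.code
```

For example, compare `fetch_status(url, SCRIPT_UA)` with `fetch_status(url, BROWSER_UA)` for the Newsweek URL above; a 403 for the first and a 200 for the second would point at the user agent.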
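You've probably solved this by now, but the 403 is directly related to not passing a browser user agent when Newspaper requests the article. Set one through `Config`: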
from newspaper import Article
from newspaper import Config

# Browser-like User-Agent string sent with the download request
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'

config = Config()
config.browser_user_agent = user_agent

url = 'https://www.newsweek.com/mike-huckabee-blasts-cnns-axelrod-calling-daughter-trump-press-secretary-sarah-sanders-1444184'
article = Article(url, config=config)
article.download()
article.parse()
print(article.text)
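Since the goal is a backup for Goose, another option is to download the page yourself with a browser-like `User-Agent` and hand the HTML to Newspaper for parsing only. This sketch assumes your Newspaper3k version's `Article.download()` accepts an `input_html` argument (as in newspaper3k 0.2.8; check yours); `build_headers`, `fetch_html` and `extract_text` are hypothetical helper names.

```python
import urllib.request

# Browser-style User-Agent, copied from the answer's code.
BROWSER_UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36')


def build_headers(user_agent=BROWSER_UA):
    """Request headers; per the answer, the User-Agent is what matters for the 403."""
    return {'User-Agent': user_agent}


def fetch_html(url, user_agent=BROWSER_UA, timeout=10):
    """Download a page with a browser-like User-Agent and return it as text."""
    req = urllib.request.Request(url, headers=build_headers(user_agent))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')


def extract_text(url, html):
    """Parse already-downloaded HTML with Newspaper3k (no second HTTP request)."""
    # Imported here so the fetch helpers above work without Newspaper3k installed.
    from newspaper import Article
    article = Article(url)
    article.download(input_html=html)  # assumption: supported by your version
    article.parse()
    return article.text
```

Usage: `html = fetch_html(url)` followed by `extract_text(url, html)`. The fetched `html` string could also be reused for another extractor, so the page only has to be downloaded once.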