Scraping quarantined subreddits
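I am working on a misinformation project and want to scrape a few quarantined subreddits (specifically r/russsia).
When I follow the guide in the PRAW docs, I get a prawcore.exceptions.Forbidden: received 403 HTTP response error.
I found some solutions from over 3 years ago that involve manually opting in to the subreddit in a browser and calling quaran.opt_in(), but they did not work. Here is the code snippet: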
import praw

reddit = praw.Reddit(user_agent='Comment Extraction (by /u/guy_asking_on_Whosebug)',
                     client_id=sec.reddit_client_id, client_secret=sec.reddit_client_secret)
subred = reddit.subreddit(subreddit)
subred.quaran.opt_in()  # error happens here
# for post in subred.top(limit=10):  ERROR HAPPENS BEFORE, KEPT FOR POST HISTORY
#     pass  # error happens here
subred is of type praw.models.reddit.subreddit.Subreddit, but it does not return any submissions.
Is there a solution?
Full error:
Traceback (most recent call last):
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3361, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-9de81e112c74>", line 1, in <cell line: 1>
    for post in subred.top(limit=10):
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/models/listing/generator.py", line 63, in __next__
    self._next_batch()
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/models/listing/generator.py", line 73, in _next_batch
    self._listing = self._reddit.get(self.url, params=self.params)
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/reddit.py", line 595, in get
    return self._objectify_request(method="GET", params=params, path=path)
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/reddit.py", line 696, in _objectify_request
    self.request(
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/reddit.py", line 885, in request
    return self._core.request(
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/prawcore/sessions.py", line 330, in request
    return self._request_with_retries(
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/prawcore/sessions.py", line 266, in _request_with_retries
    raise self.STATUS_EXCEPTIONS[response.status_code](response)
prawcore.exceptions.Forbidden: received 403 HTTP response
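To scrape quarantined subreddits, your client must not be read-only.
You can make your client fully authorized by also providing the account username and password: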
reddit = praw.Reddit(user_agent='Comment Extraction (by /u/guy_asking_on_Whosebug)',
                     client_id=sec.reddit_client_id, client_secret=sec.reddit_client_secret,
                     password=sec.reddit_password, username=sec.reddit_username)
https://praw.readthedocs.io/en/stable/getting_started/authentication.html#password-flow
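For what it's worth, here is a minimal sketch of how the whole flow might be wired together once the client is authorized. It is not taken from the PRAW docs: the helper names password_flow_kwargs and top_titles and the REDDIT_* environment variable names are made up for illustration, and the sketch assumes the account behind the credentials is allowed to opt in to the quarantined subreddit. The only PRAW members it relies on are reddit.read_only, reddit.subreddit(), subred.quaran.opt_in() and subred.top().

```python
import os


def password_flow_kwargs(env=os.environ):
    """Collect the settings for a password-flow praw.Reddit(...) call.

    The REDDIT_* variable names are hypothetical; use whatever your own
    secrets module or config provides.
    """
    names = {
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
        "username": "REDDIT_USERNAME",
        "password": "REDDIT_PASSWORD",
        "user_agent": "REDDIT_USER_AGENT",
    }
    # Fail early with a readable message instead of a 403 later on.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


def top_titles(reddit, name, limit=10):
    """Opt in to a quarantined subreddit and return its top post titles.

    `reddit` is expected to be a praw.Reddit instance created with
    username and password, so that reddit.read_only is False.
    """
    if reddit.read_only:
        raise RuntimeError("client is read-only; pass username and password")
    subred = reddit.subreddit(name)
    subred.quaran.opt_in()  # needs an authorized (non read-only) client
    return [post.title for post in subred.top(limit=limit)]
```

With PRAW installed, usage would look roughly like reddit = praw.Reddit(**password_flow_kwargs()) followed by top_titles(reddit, "russsia"). Checking reddit.read_only up front is just a convenience: it turns the opaque 403 into an error that says what is missing.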