What would be the best way to collect the titles (in bulk) of a subreddit
I want to collect the titles of all the posts on a subreddit, and I was wondering what would be the best way to go about this?
I've looked around and found some material on Python and bots. I've also had a brief look at the API, but I'm not sure which direction to go in.
As I don't want to commit to an approach only to find out 90% of the way through that it won't work, I'm asking if someone can point me towards the right language, plus anything extra I'd need, such as required software (e.g. pip for Python).
My own experience is with web languages such as PHP, so I initially thought a web app would do the trick, but I'm unsure whether that's the best way and how to go about it.
So my question is:
What would be the best way to collect the titles (in bulk) of a subreddit?
Or, if that's too subjective:
How do I retrieve and store all the post titles of a subreddit?
Ideally it needs to:
- Handle more than 1 page of (25) results
- Save to a .txt file
Thanks in advance.
PHP; in 25 lines:
$subreddit = 'pokemon';
$max_pages = 10;

// Set variables with default data
$page   = 0;
$after  = '';
$titles = '';

do {
    // Set URL you want to fetch (reddit's JSON listing for the subreddit)
    $url = 'https://www.reddit.com/r/' . $subreddit . '/new.json?limit=25&after=' . $after;
    $ch = curl_init($url);
    // Set curl option of header to false (don't need them)
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Set curl option of nobody to false as we need the body
    curl_setopt($ch, CURLOPT_NOBODY, 0);
    // Set curl timeout of 5 seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    // Set curl to return output as string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Identify the script; reddit rejects requests without a User-Agent
    curl_setopt($ch, CURLOPT_USERAGENT, 'title-collector/0.1');
    // Execute curl
    $output = curl_exec($ch);
    // Get HTTP code of request
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // Close curl
    curl_close($ch);
    // If http code is 200 (success)
    if ($status == 200) {
        // Decode JSON into PHP object
        $json = json_decode($output);
        // Set after for next curl iteration (reddit's pagination)
        $after = $json->data->after;
        // Loop through each post and append its title
        foreach ($json->data->children as $k => $v) {
            $titles .= $v->data->title . "\n";
        }
        // Stop early once reddit reports no further pages
        if (empty($after)) {
            break;
        }
    }
    // Increment page number
    $page++;
    // Loop whilst current page number is less than maximum pages
} while ($page < $max_pages);

// Save titles to text file
file_put_contents(dirname(__FILE__) . '/' . $subreddit . '.txt', $titles);
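Since you mentioned Python: the same approach translates directly, and it only needs the standard library (no pip install required). This is a minimal sketch, not a definitive implementation; the function and file names are my own, and the User-Agent string is a placeholder you should replace with something identifying your script.

```python
import json
import time
from urllib.request import Request, urlopen

def fetch_page(subreddit, after="", limit=25):
    """Fetch one page of a subreddit's /new listing as a parsed JSON dict."""
    url = f"https://www.reddit.com/r/{subreddit}/new.json?limit={limit}&after={after}"
    # reddit rejects the default urllib User-Agent, so send a descriptive one
    req = Request(url, headers={"User-Agent": "title-collector/0.1"})
    with urlopen(req, timeout=5) as resp:
        return json.load(resp)

def extract_titles(listing):
    """Pull the post titles out of one parsed listing page."""
    return [child["data"]["title"] for child in listing["data"]["children"]]

def collect_titles(subreddit, max_pages=10):
    """Walk reddit's pagination via the 'after' token, up to max_pages."""
    titles = []
    after = ""
    for _ in range(max_pages):
        listing = fetch_page(subreddit, after)
        titles.extend(extract_titles(listing))
        after = listing["data"]["after"]
        if not after:  # reached the end of the listing
            break
        time.sleep(2)  # be polite to reddit's rate limits
    return titles
```

Calling `collect_titles("pokemon")` and writing the result joined with `"\n"` to a file reproduces what the PHP script saves to its `.txt` file.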