What would be the best way to collect the titles (in bulk) of a subreddit
I want to collect the titles of all the posts on a subreddit, and I was wondering what would be the best way to go about this?
I've looked around and found some material on Python and bots. I've also had a brief look at the API, but I'm not sure which direction to go in.
As I don't want to commit to an approach only to find out 90% of the way through that it won't work, I'm asking if someone can point me towards the right language, plus anything extra I'd need, such as required software (e.g. pip for Python).
My own experience is with web languages such as PHP, so I initially thought a web app would do the trick, but I'm unsure whether that's the best way and how to go about it.
So my question is:
What would be the best way to collect the titles (in bulk) of a subreddit?
Or, if that's too subjective:
How do I retrieve and store all the post titles of a subreddit?
Ideally it needs to:
- Handle more than 1 page of (25) results
- Save to a .txt file
Thanks in advance.
PHP; in 25 lines:
$subreddit = 'pokemon';
$max_pages = 10;

// Set variables with default data
$page   = 0;
$after  = '';
$titles = '';

do {
    // Set URL you want to fetch (reddit's JSON listing for the subreddit)
    $url = 'https://www.reddit.com/r/' . $subreddit . '/new.json?limit=25&after=' . $after;
    $ch = curl_init($url);
    // Set curl option of header to false (don't need them)
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Set curl option of nobody to false as we need the body
    curl_setopt($ch, CURLOPT_NOBODY, 0);
    // Set curl timeout of 5 seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    // Set curl to return output as string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Identify the script; reddit rejects requests without a User-Agent
    curl_setopt($ch, CURLOPT_USERAGENT, 'title-collector/0.1');
    // Execute curl
    $output = curl_exec($ch);
    // Get HTTP code of request
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // Close curl
    curl_close($ch);
    // If http code is 200 (success)
    if ($status == 200) {
        // Decode JSON into PHP object
        $json = json_decode($output);
        // Set after for next curl iteration (reddit's pagination)
        $after = $json->data->after;
        // Loop through each post and append its title
        foreach ($json->data->children as $k => $v) {
            $titles .= $v->data->title . "\n";
        }
        // Stop early once reddit reports no further pages
        if (empty($after)) {
            break;
        }
    }
    // Increment page number
    $page++;
    // Loop whilst current page number is less than maximum pages
} while ($page < $max_pages);

// Save titles to text file
file_put_contents(dirname(__FILE__) . '/' . $subreddit . '.txt', $titles);
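Since you mentioned Python: the same approach translates directly, and it only needs the standard library (no pip install required). This is a minimal sketch, not a definitive implementation; the function and file names are my own, and the User-Agent string is a placeholder you should replace with something identifying your script.

```python
import json
import time
from urllib.request import Request, urlopen

def fetch_page(subreddit, after="", limit=25):
    """Fetch one page of a subreddit's /new listing as a parsed JSON dict."""
    url = f"https://www.reddit.com/r/{subreddit}/new.json?limit={limit}&after={after}"
    # reddit rejects the default urllib User-Agent, so send a descriptive one
    req = Request(url, headers={"User-Agent": "title-collector/0.1"})
    with urlopen(req, timeout=5) as resp:
        return json.load(resp)

def extract_titles(listing):
    """Pull the post titles out of one parsed listing page."""
    return [child["data"]["title"] for child in listing["data"]["children"]]

def collect_titles(subreddit, max_pages=10):
    """Walk reddit's pagination via the 'after' token, up to max_pages."""
    titles = []
    after = ""
    for _ in range(max_pages):
        listing = fetch_page(subreddit, after)
        titles.extend(extract_titles(listing))
        after = listing["data"]["after"]
        if not after:  # reached the end of the listing
            break
        time.sleep(2)  # be polite to reddit's rate limits
    return titles
```

Calling `collect_titles("pokemon")` and writing the result joined with `"\n"` to a file reproduces what the PHP script saves to its `.txt` file.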