Only OpenURI succeeds at Reddit API request
I'm making requests to the Reddit API. First, I set up a subreddit top URL:
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
All of these fetch the content correctly:
Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')
Open3.capture2('/usr/bin/curl', '--user-agent', 'My agent', reddit_url.to_s)[0]
URI.open(reddit_url, 'User-Agent' => 'My agent').read
But then I tried the URL of a specific post:
reddit_url = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')
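Both Net::HTTP and Open3/curl fail and return only an empty string. URI.open keeps working, as does opening the URL in a web browser.
Why doesn't the second request work with two of these approaches? Why does it work with URI.open, which is supposed to be "an easy-to-use wrapper for Net::HTTP"? What does it do differently, and how can I make the request work with Net::HTTP and curl?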
Reproducing it
Using your examples, and focusing on Net::HTTP for simplicity, the first example doesn't work as written:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')
# => TypeError: no implicit conversion of URI::HTTPS into String
# (Ruby versions before 3.0 do not accept a headers hash as the second argument here)
Instead, I used this as a starting point:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
http = Net::HTTP.new(reddit_url.host, reddit_url.port)
http.use_ssl = true
result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
puts result
# => #<Net::HTTPOK:0x00007fc3ea8e7320>
puts result.body.size
# => 167,394
With that working, we can try the second URL. Interestingly, I get different results depending on whether I reuse the initial connection or open a new one:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
reddit_url_two = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')
http = Net::HTTP.new(reddit_url.host, reddit_url.port)
http.use_ssl = true
result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
puts result
# => #<Net::HTTPOK:0x00007f931a143390>
puts result.body.size
# => 174,615
http_two = Net::HTTP.new(reddit_url_two.host, reddit_url_two.port)
http_two.use_ssl = true
result_two = http_two.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
puts result_two
# => #<Net::HTTPMovedPermanently:0x00007f931a148818>
puts result_two.body.size
# => 0
result_reusing_connection = http.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
puts result_reusing_connection
# => #<Net::HTTPOK:0x00007f931a0fb3b0>
puts result_reusing_connection.body.size
# => 141,575
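So I suspect you are sometimes getting a 301 redirect, and that this is the cause of the confusion. Note that the second URL uses the host reddit.com rather than www.reddit.com, which would explain why a fresh connection to reddit.com is redirected while the reused connection to www.reddit.com returns the content directly. URI.open follows redirects automatically, whereas Net::HTTP and curl (without --location) return the empty 301 response as-is. There is another question and answer here about how to follow redirects.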
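As a rough illustration of following the redirect by hand with Net::HTTP, here is a minimal sketch. The helper names `get_once` and `fetch_following_redirects`, the `limit` of 5, and the injectable `requester` argument are assumptions made for this example, not part of Net::HTTP itself; it is one possible approach rather than the definitive one.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: performs a single GET and returns the raw
# Net::HTTPResponse without following any redirect.
def get_once(uri, headers)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri, headers)
  end
end

# Hypothetical helper: repeats the GET while the server answers with a
# 3xx response, making at most `limit` requests in total.
# `requester` is injectable so the loop can be exercised without network access.
def fetch_following_redirects(url, headers: {}, limit: 5, requester: method(:get_once))
  uri = URI.parse(url.to_s)
  limit.times do
    response = requester.call(uri, headers)
    return response unless response.is_a?(Net::HTTPRedirection)

    location = response['location']
    raise "redirect without a Location header from #{uri}" if location.nil?

    # Location may be relative, so resolve it against the current URI.
    uri = URI.join(uri.to_s, location)
  end
  raise "gave up after #{limit} requests"
end

# Example call (performs real network requests, so it is left commented out):
# response = fetch_following_redirects(reddit_url_two, headers: { 'User-Agent' => 'My agent' })
# puts response.body.size
```

For curl, the equivalent is to add the `--location` (`-L`) flag to the argument list passed to Open3.capture2, which tells curl to follow the redirect itself.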