Graph API POST request to scrape fails intermittently
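My organization operates a news site, and our publishing process asks Facebook to scrape our content. The request is sent to the Graph API as a batch. The batch contains the URL of every published article, with the HTTP method set to POST.
The request always seems to return 200 OK, but Facebook does not always scrape the content correctly. We see this when a user clicks the Share button for an article on our application's front end: the Facebook share dialog shows stale or default values. The correct values should reflect our Open Graph meta tags.
The only reliable fix so far is to re-scrape the content with the Facebook Sharing Debugger, which appears to work every time.
A developer at Facebook has suggested that the problem lies in our implementation, which is as follows: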
private static async Task RefreshFacebookCacheAsync(IList<string> newsReleaseUris, CancellationToken token)
{
    string appId = "our-app-id";
    string appSecret = "our-app-secret";
    bool newsSiteIsNotAccessibleFromFacebook = string.IsNullOrEmpty(appId) || string.IsNullOrEmpty(appSecret);
    if (newsSiteIsNotAccessibleFromFacebook)
    {
        // removed for brevity...
    }
    // app access token has the form {app-id}|{app-secret}
    string graphUrl = "https://graph.facebook.com/?access_token=" + $"{appId}|{appSecret}";
    HttpResponseMessage response = null;
    try
    {
        // set the accept type and create the batch operations
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        var requests = newsReleaseUris.Select(u => new { method = "POST", relative_url = $"?id={u}&scrape=true" }).ToList();
        var serializedRequests = JsonConvert.SerializeObject(requests);
        // encode the batch operations
        var queryParams = new Dictionary<string, string>
        {
            { "batch", serializedRequests }
        };
        // send the request and wait for the response
        response = await client.PostAsync(graphUrl, new FormUrlEncodedContent(queryParams), token);
        var reply = await response.Content.ReadAsStringAsync();
    }
    catch (TaskCanceledException tce)
    {
        // logging code goes here
    }
}
// Calling code for this method:
await Task.Run(async () =>
{
    cts = new CancellationTokenSource();
    cts.CancelAfter(60000);
    await RefreshFacebookCacheAsync(releaseUris, cts.Token);
    cts = null;
});
Edit:
Here is the response from Facebook, with details removed from the response body:
[
    {
        "code": 200,
        "headers": [
            {
                "name": "Access-Control-Allow-Origin",
                "value": "*"
            },
            {
                "name": "Strict-Transport-Security",
                "value": "max-age=15552000; preload"
            },
            {
                "name": "Expires",
                "value": "Sat, 01 Jan 2000 00:00:00 GMT"
            },
            {
                "name": "Content-Type",
                "value": "text/javascript; charset=UTF-8"
            },
            {
                "name": "Facebook-API-Version",
                "value": "v2.12"
            },
            {
                "name": "Cache-Control",
                "value": "private, no-cache, no-store, must-revalidate"
            },
            {
                "name": "Vary",
                "value": "Accept-Encoding"
            },
            {
                "name": "Pragma",
                "value": "no-cache"
            }
        ],
        "body": "{\"url\":\"https:\/\/news.gov.bc.ca\/releases\/2017PREM0002-000050\",\"type\":\"article\",\"title\":\"Five conditions secure coastal protection and economic benefits for all British Columbians\",\"image\":[{\"url\":\"https:\/\/farm1.staticflickr.com\/376\/31446050403_570a0f0cac_b.jpg\"}],\"description\":\"Following the Trudeau government\u2019s approval of Kinder Morgan\u2019s Trans Mountain Pipeline Project, the Province\u2019s clear, consistent and principled position on its five conditions has resulted in tangible and significant investments that will protect British Columbia\u2019s environmental and economic interests.\",\"updated_time\":\"2018-02-19T17:08:49+0000\",\"pages\":[{\"name\":\"Government of British Columbia\",\"url\":\"https:\/\/www.facebook.com\/BCProvincialGovernment\/\"}]}"
    }
]
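A reply in this shape can be checked in code rather than by eye, since a 200 on the batch item says nothing about whether the scraped data is current. The following is a minimal sketch, not our production code. It assumes Json.NET (the same library behind JsonConvert in the method shown earlier) and assumes that updated_time reflects when Facebook last scraped the page. The names BatchReply, FindStaleUrls, and publishedAt are hypothetical and used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class BatchReply
{
    // Hypothetical helper: returns the URLs whose updated_time is earlier than
    // publishedAt. Items that did not report code 200 are skipped here and
    // would need separate handling.
    public static List<string> FindStaleUrls(string reply, DateTimeOffset publishedAt)
    {
        var stale = new List<string>();
        foreach (JToken item in ParseRaw(reply))
        {
            if ((int?)item["code"] != 200) continue;

            // "body" is itself a JSON document encoded as a string.
            JToken body = ParseRaw((string)item["body"]);
            string updatedText = (string)body["updated_time"];
            if (updatedText == null) continue;

            if (ParseGraphTime(updatedText) < publishedAt)
                stale.Add((string)body["url"]);
        }
        return stale;
    }

    // Parse without Json.NET's automatic date conversion so that
    // updated_time stays a plain string.
    private static JToken ParseRaw(string json)
    {
        using (var reader = new JsonTextReader(new StringReader(json)) { DateParseHandling = DateParseHandling.None })
        {
            return JToken.ReadFrom(reader);
        }
    }

    // The sample reply uses an offset without a colon (+0000); insert one so
    // the timestamp matches the standard +00:00 form before parsing.
    private static DateTimeOffset ParseGraphTime(string text)
    {
        if (text.Length >= 5 && text[text.Length - 3] != ':')
            text = text.Insert(text.Length - 2, ":");
        return DateTimeOffset.ParseExact(text, "yyyy-MM-dd'T'HH:mm:sszzz", CultureInfo.InvariantCulture);
    }
}
```

Calling BatchReply.FindStaleUrls(reply, publishedAt) with the reply string read in RefreshFacebookCacheAsync would give the list of URLs that may need another scrape request.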
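This came down to a timing issue in our software. Because of a slight delay between our publishing process and the moment we sent the API request, Facebook was scraping stale information.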
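One possible way to guard against this kind of timing gap is to wait before requesting the scrape and to repeat the request for any URL that still looks stale. The following is only a sketch under assumptions, not the exact change we made. ScrapeRetry, RetryStaleAsync, and the scrapeAsync delegate are hypothetical names; the delegate is assumed to send the batch request for the given URLs and return the ones that still appear stale.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ScrapeRetry
{
    // Hypothetical helper: waits, asks for a scrape, and repeats for the URLs
    // that scrapeAsync reports as still stale, up to maxAttempts times.
    public static async Task<IList<string>> RetryStaleAsync(
        IList<string> urls,
        Func<IList<string>, CancellationToken, Task<IList<string>>> scrapeAsync,
        TimeSpan delay,
        int maxAttempts,
        CancellationToken token)
    {
        IList<string> pending = urls;
        for (int attempt = 0; attempt < maxAttempts && pending.Count > 0; attempt++)
        {
            // Give the publishing process time to finish before Facebook fetches the page.
            await Task.Delay(delay, token);
            pending = await scrapeAsync(pending, token);
        }
        // Whatever is left was still stale after the last attempt.
        return pending;
    }
}
```

The delay and the attempt count are tuning values that depend on how long the publishing process takes, so they are left as parameters here rather than fixed numbers.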