使用 httr 对 github 个私有存储库进行身份验证
authentication to github private repositories with httr
我正在尝试使用 httr
访问 Github 上的私有存储库。如果我添加 github 令牌(作为环境变量存储在 GITHUB_TOKEN
中),我可以毫无问题地这样做:
httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN"))))
但是,如果我尝试指定另一个 header,则会出现错误。在这种情况下,我想下载与版本关联的二进制文件("asset",在 github 术语中):
httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN"))),
httr::add_headers(Accept = "application/octet-stream"))
?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidArgument</Code><Message>Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified</Message>
这只是消息的一部分(其余部分包括我的令牌)。
显然我的授权被发送了两次!我怎样才能防止这种情况发生?跟httr::handle_pool()
有关系吗
编辑 -- 连接信息
原来的请求似乎收到了包含签名的回复。然后将此签名连同我的令牌发回,从而导致错误。 these people
也发生了类似的事情
-> GET /repos/aammd/miniature-meme/releases/assets/2859674 HTTP/1.1
-> Host: api.github.com
-> User-Agent: libcurl/7.43.0 r-curl/2.3 httr/1.2.1.9000
-> Accept-Encoding: gzip, deflate
-> Authorization: token tttttttt
-> Accept: application/octet-stream
->
<- HTTP/1.1 302 Found
<- Server: GitHub.com
<- Date: Tue, 17 Jan 2017 13:28:12 GMT
<- Content-Type: text/html;charset=utf-8
<- Content-Length: 0
<- Status: 302 Found
<- X-RateLimit-Limit: 5000
<- X-RateLimit-Remaining: 4984
<- X-RateLimit-Reset: 1484662101
<- location: https://github-cloud.s3.amazonaws.com/releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170117%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170117T132812Z&X-Amz-Expires=300&X-Amz-Signature=ssssssssss&X-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream
<- Access-Control-Expose-Headers: ETag, Link, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval
<- Access-Control-Allow-Origin: *
<- Content-Security-Policy: default-src 'none'
<- Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
<- X-Content-Type-Options: nosniff
<- X-Frame-Options: deny
<- X-XSS-Protection: 1; mode=block
<- Vary: Accept-Encoding
<- X-Served-By: 3e3b9690823fb031da84658eb58aa83b
<- X-GitHub-Request-Id: 82782802:6E1B:E9F0BE:587E1BEC
<-
-> GET /releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170117%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170117T132812Z&X-Amz-Expires=300&X-Amz-Signature=sssssssssssssss&X-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream HTTP/1.1
-> Host: github-cloud.s3.amazonaws.com
-> User-Agent: libcurl/7.43.0 r-curl/2.3 httr/1.2.1.9000
-> Accept-Encoding: gzip, deflate
-> Authorization: token ttttttttttttt
-> Accept: application/octet-stream
->
<- HTTP/1.1 400 Bad Request
<- x-amz-request-id: FA56B3D23B468704
<- x-amz-id-2: 49X1mT5j5BrZ4HApeR/+wb7iVOWA8yn1obrgMoeOy44RH414bo/Ov8AAWSx2baEXO0H/WHX5jK0=
<- Content-Type: application/xml
<- Transfer-Encoding: chunked
<- Date: Tue, 17 Jan 2017 13:28:12 GMT
<- Connection: close
<- Server: AmazonS3
<-
gh 也不行
我创建了一个 public 存储库来测试这个想法。 JSON 可以从 API 返回,但不是二进制文件:
# this works fine
gh::gh("https://api.github.com/repos/aammd/test_idea/releases/assets/2998763")
# this does not
gh::gh("https://api.github.com/repos/aammd/test_idea/releases/assets/2998763", .send_headers = c("Accept" = "application/octet-stream"))
wget
可能有效,但是
我找到了 shows how to do this with wget 的要点。关键组件似乎是:
wget -q --auth-no-challenge --header='Accept:application/octet-stream' \
https://$TOKEN:@api.github.com/repos/$REPO/releases/assets/$asset_id \
-O
但是,如果我尝试在 httr::GET
中复制它,我不会成功:
auth_url <- sprintf("https://%s:@api.github.com/repos/aammd/miniature-meme/releases/assets/2859674", Sys.getenv("GITHUB_TOKEN"))
httr::GET(auth_url,
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Accept = "application/octet-stream"))
从 R 调用 wget
确实 可行,但这个解决方案并不完全令人满意,因为我不能保证我的所有用户都安装了 wget
(除非有办法做到这一点?)。
system(sprintf("wget --auth-no-challenge --header='Accept:application/octet-stream' %s -O testwget.rds", auth_url))
此处包含 wget 的输出(注意上面 -q
的缺失)(同样,希望令牌和签名经过编辑):
--2017-01-18 13:21:55-- https://ttttt:*password*@api.github.com/repos/aammd/miniature-meme/releases/assets/2859674
Resolving api.github.com... 192.30.253.117, 192.30.253.116
Connecting to api.github.com|192.30.253.117|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-cloud.s3.amazonaws.com/releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170118T122156Z&X-Amz-Expires=300&X-Amz-Signature=SSSSSSSS-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream [following]
--2017-01-18 13:21:55-- https://github-cloud.s3.amazonaws.com/releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170118T122156Z&X-Amz-Expires=300&X-Amz-Signature=SSSSSSSSSSSS-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream
Resolving github-cloud.s3.amazonaws.com... 52.216.226.120
Connecting to github-cloud.s3.amazonaws.com|52.216.226.120|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 682 [application/octet-stream]
Saving to: ‘testwget.rds’
0K 100% 15.5M=0s
2017-01-18 13:21:56 (15.5 MB/s) - ‘testwget.rds’ saved [682/682]
尝试发送授权令牌作为查询参数而不是授权 header。这样,当 GitHub 的 Oauth 重定向您时,它将去除原始令牌,并且 X-Amz-Algorithm 参数将被留下来完成它的工作。
httr::GET(paste("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674?access_token=", Sys.getenv("GITHUB_TOKEN")),
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"))
事实证明,这个问题有两种可能的解决方案!
方案一:token作为参数
按照@user7433058的建议,我们确实可以将令牌作为参数传递! 但是请注意 我们必须使用 paste0
。这是 Github 自己建议的方法 on their API documentation
## pass oauth in the url
httr::GET(paste0("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674?access_token=", Sys.getenv("GITHUB_TOKEN")),
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Accept = "application/octet-stream"))
tt <- readRDS("test.rds")
方案二:再问
另一种解决方案是第一次发出请求,然后提取 URL 并使用它发出第二次请求。由于问题是由两次发送授权信息引起的——一次在 URL 中,一次在 header 中——我们可以通过仅使用 URL.[=17= 来避免该问题]
## alternatively, get the query url (containing signature) from the (failed) html request made the first time
firsttry <- httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN")),
Accept = "application/octet-stream"))
httr::GET(firsttry$url, httr::write_disk("test.rds", overwrite = TRUE),
httr::write_disk("test2.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Accept = "application/octet-stream"))
tt2 <- readRDS("test2.rds")
我想这效率有点低(总共发出 3 个请求,而不是 2 个)。但是,由于只有第一个请求是针对实际的 github API,因此它只对您的 rate-limiting 步骤计为 1。
一个小的改进:没有来自 httr 的重定向
如果您告诉 httr
不要遵循重定向,我们只能发出 2 个而不是 3 个 http 请求。为此,请在两个请求中的第一个请求中使用 httr::config(followlocation = FALSE)
(即获取 firsttry
)
我正在尝试使用 httr
访问 Github 上的私有存储库。如果我添加 github 令牌(作为环境变量存储在 GITHUB_TOKEN
中),我可以毫无问题地这样做:
httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN"))))
但是,如果我尝试指定另一个 header,则会出现错误。在这种情况下,我想下载与版本关联的二进制文件("asset",在 github 术语中):
httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN"))),
httr::add_headers(Accept = "application/octet-stream"))
?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidArgument</Code><Message>Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified</Message>
这只是消息的一部分(其余部分包括我的令牌)。
显然我的授权被发送了两次!我怎样才能防止这种情况发生?跟httr::handle_pool()
编辑 -- 连接信息
原来的请求似乎收到了包含签名的回复。然后将此签名连同我的令牌发回,从而导致错误。 these people
也发生了类似的事情-> GET /repos/aammd/miniature-meme/releases/assets/2859674 HTTP/1.1
-> Host: api.github.com
-> User-Agent: libcurl/7.43.0 r-curl/2.3 httr/1.2.1.9000
-> Accept-Encoding: gzip, deflate
-> Authorization: token tttttttt
-> Accept: application/octet-stream
->
<- HTTP/1.1 302 Found
<- Server: GitHub.com
<- Date: Tue, 17 Jan 2017 13:28:12 GMT
<- Content-Type: text/html;charset=utf-8
<- Content-Length: 0
<- Status: 302 Found
<- X-RateLimit-Limit: 5000
<- X-RateLimit-Remaining: 4984
<- X-RateLimit-Reset: 1484662101
<- location: https://github-cloud.s3.amazonaws.com/releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170117%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170117T132812Z&X-Amz-Expires=300&X-Amz-Signature=ssssssssss&X-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream
<- Access-Control-Expose-Headers: ETag, Link, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval
<- Access-Control-Allow-Origin: *
<- Content-Security-Policy: default-src 'none'
<- Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
<- X-Content-Type-Options: nosniff
<- X-Frame-Options: deny
<- X-XSS-Protection: 1; mode=block
<- Vary: Accept-Encoding
<- X-Served-By: 3e3b9690823fb031da84658eb58aa83b
<- X-GitHub-Request-Id: 82782802:6E1B:E9F0BE:587E1BEC
<-
-> GET /releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170117%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170117T132812Z&X-Amz-Expires=300&X-Amz-Signature=sssssssssssssss&X-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream HTTP/1.1
-> Host: github-cloud.s3.amazonaws.com
-> User-Agent: libcurl/7.43.0 r-curl/2.3 httr/1.2.1.9000
-> Accept-Encoding: gzip, deflate
-> Authorization: token ttttttttttttt
-> Accept: application/octet-stream
->
<- HTTP/1.1 400 Bad Request
<- x-amz-request-id: FA56B3D23B468704
<- x-amz-id-2: 49X1mT5j5BrZ4HApeR/+wb7iVOWA8yn1obrgMoeOy44RH414bo/Ov8AAWSx2baEXO0H/WHX5jK0=
<- Content-Type: application/xml
<- Transfer-Encoding: chunked
<- Date: Tue, 17 Jan 2017 13:28:12 GMT
<- Connection: close
<- Server: AmazonS3
<-
gh 也不行
我创建了一个 public 存储库来测试这个想法。 JSON 可以从 API 返回,但不是二进制文件:
# this works fine
gh::gh("https://api.github.com/repos/aammd/test_idea/releases/assets/2998763")
# this does not
gh::gh("https://api.github.com/repos/aammd/test_idea/releases/assets/2998763", .send_headers = c("Accept" = "application/octet-stream"))
wget
可能有效,但是
我找到了 shows how to do this with wget 的要点。关键组件似乎是:
wget -q --auth-no-challenge --header='Accept:application/octet-stream' \
https://$TOKEN:@api.github.com/repos/$REPO/releases/assets/$asset_id \
-O
但是,如果我尝试在 httr::GET
中复制它,我不会成功:
auth_url <- sprintf("https://%s:@api.github.com/repos/aammd/miniature-meme/releases/assets/2859674", Sys.getenv("GITHUB_TOKEN"))
httr::GET(auth_url,
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Accept = "application/octet-stream"))
从 R 调用 wget
确实 可行,但这个解决方案并不完全令人满意,因为我不能保证我的所有用户都安装了 wget
(除非有办法做到这一点?)。
system(sprintf("wget --auth-no-challenge --header='Accept:application/octet-stream' %s -O testwget.rds", auth_url))
此处包含 wget 的输出(注意上面 -q
的缺失)(同样,希望令牌和签名经过编辑):
--2017-01-18 13:21:55-- https://ttttt:*password*@api.github.com/repos/aammd/miniature-meme/releases/assets/2859674
Resolving api.github.com... 192.30.253.117, 192.30.253.116
Connecting to api.github.com|192.30.253.117|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-cloud.s3.amazonaws.com/releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170118T122156Z&X-Amz-Expires=300&X-Amz-Signature=SSSSSSSS-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream [following]
--2017-01-18 13:21:55-- https://github-cloud.s3.amazonaws.com/releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170118T122156Z&X-Amz-Expires=300&X-Amz-Signature=SSSSSSSSSSSS-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream
Resolving github-cloud.s3.amazonaws.com... 52.216.226.120
Connecting to github-cloud.s3.amazonaws.com|52.216.226.120|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 682 [application/octet-stream]
Saving to: ‘testwget.rds’
0K 100% 15.5M=0s
2017-01-18 13:21:56 (15.5 MB/s) - ‘testwget.rds’ saved [682/682]
尝试发送授权令牌作为查询参数而不是授权 header。这样,当 GitHub 的 Oauth 重定向您时,它将去除原始令牌,并且 X-Amz-Algorithm 参数将被留下来完成它的工作。
httr::GET(paste("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674?access_token=", Sys.getenv("GITHUB_TOKEN")),
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"))
事实证明,这个问题有两种可能的解决方案!
方案一:token作为参数
按照@user7433058的建议,我们确实可以将令牌作为参数传递! 但是请注意 我们必须使用 paste0
。这是 Github 自己建议的方法 on their API documentation
## pass oauth in the url
httr::GET(paste0("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674?access_token=", Sys.getenv("GITHUB_TOKEN")),
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Accept = "application/octet-stream"))
tt <- readRDS("test.rds")
方案二:再问
另一种解决方案是第一次发出请求,然后提取 URL 并使用它发出第二次请求。由于问题是由两次发送授权信息引起的——一次在 URL 中,一次在 header 中——我们可以通过仅使用 URL.[=17= 来避免该问题]
## alternatively, get the query url (containing signature) from the (failed) html request made the first time
firsttry <- httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN")),
Accept = "application/octet-stream"))
httr::GET(firsttry$url, httr::write_disk("test.rds", overwrite = TRUE),
httr::write_disk("test2.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Accept = "application/octet-stream"))
tt2 <- readRDS("test2.rds")
我想这效率有点低(总共发出 3 个请求,而不是 2 个)。但是,由于只有第一个请求是针对实际的 github API,因此它只对您的 rate-limiting 步骤计为 1。
一个小的改进:没有来自 httr 的重定向
如果您告诉 httr
不要遵循重定向,我们只能发出 2 个而不是 3 个 http 请求。为此,请在两个请求中的第一个请求中使用 httr::config(followlocation = FALSE)
(即获取 firsttry
)