Programmatically scraping a response header within R
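I am trying to use only R and its curl-based web-scraping libraries to access the Location response header highlighted in the screenshot below. That point is easy to reach in any web browser by visiting http://www.worldvaluessurvey.org/WVSDocumentationWVL.jsp, clicking the download for any data file, and then filling out the agreement form. The download then starts automatically in the browser.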
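I think the only way to obtain valid cookies is with library(curlconverter), but the answer that suggests this does not seem to be enough to programmatically determine the http url of the file; it only downloads the zipped file once the url is already known.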
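I have pasted some code below with the different httr and curlconverter calls I have played with, but I am missing something. Again, the only goal is to determine the highlighted text programmatically, entirely within R (cross-platform).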
library(curlconverter)
library(httr)
browserPOST <-
"curl 'http://www.worldvaluessurvey.org/AJDownload.jsp'
-H 'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
-H 'Accept-Encoding:gzip, deflate'
-H 'Accept-Language:en-US,en;q=0.8'
-H 'Cache-Control:max-age=0'
--compressed -H 'Connection:keep-alive'
-H 'Content-Length:188'
-H 'Content-Type:application/x-www-form-urlencoded'
-H 'Cookie:ASPSESSIONIDCASQAACD=IBLGBFOAEHFILMMJJCFEOEMI; JSESSIONID=50DABDEDD0B2FC370C415B4BD1855260; __atuvc=13%7C45; __atuvs=58224f37d312c42400c'
-H 'Host:www.worldvaluessurvey.org'
-H 'Origin:http://www.worldvaluessurvey.org'
-H 'Referer:http://www.worldvaluessurvey.org/AJDownloadLicense.jsp'
-H 'Upgrade-Insecure-Requests:1'
-H 'User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'"
form_data <-
list(
ulthost = "WVS" ,
CMSID = "" ,
LITITLE = "" ,
LINOMBRE = "fas" ,
LIEMPRESA = "asf" ,
LIEMAIL = "asdf" ,
LIPROJECT = "asfd" ,
LIUSE = "1" ,
LIPURPOSE = "asdf" ,
LIAGREE = "1" ,
DOID = "3996" ,
CndWAVE = "-1" ,
SAID = "-1" ,
AJArchive = "WVS Data Archive" ,
EdFunction = "" ,
DOP = ""
)
getDATA <- (straighten(browserPOST) %>% make_req)[[1]]()
a <- VERB(verb = "POST", url = "http://www.worldvaluessurvey.org/AJDownload.jsp",
httr::add_headers(Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
`Accept-Encoding` = "gzip, deflate", `Accept-Language` = "en-US,en;q=0.8",
`Cache-Control` = "max-age=0", Connection = "keep-alive",
`Content-Length` = "188", Host = "www.worldvaluessurvey.org",
Origin = "http://www.worldvaluessurvey.org", Referer = "http://www.worldvaluessurvey.org/AJDownloadLicense.jsp",
`Upgrade-Insecure-Requests` = "1", `User-Agent` = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"),
httr::set_cookies(`Cookie:ASPSESSIONIDCASQAACD` = "IBLGBFOAEHFILMMJJCFEOEMI",
JSESSIONID = "50DABDEDD0B2FC370C415B4BD1855260", `__atuvc` = "13%7C45",
`__atuvs` = "58224f37d312c42400c"), encode = "form",body=form_data)
According to the source of the underlying httr::request_perform, the object you get back from VERB() looks like this:
res <- response(
url = resp$url,
status_code = resp$status_code,
headers = headers,
all_headers = all_headers,
cookies = curl::handle_cookies(handle),
content = resp$content,
date = date,
times = resp$times,
request = req,
handle = handle
)
So you are interested in its headers or all_headers (response is but a structure). If a redirect was involved, all_headers will have multiple sets of headers as returned by curl::parse_headers(); headers is always the last set.
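As a rough illustration of the point about all_headers, the sketch below walks a list shaped like a response's all_headers and returns the first Location it finds. The helper name find_redirect_location and the mock data are made up for this example, and the element layout (status, version, headers) is an assumption about how httr stores the parsed headers; check it against your installed version.

```r
# Hypothetical helper: scan a list shaped like httr's `all_headers`
# (one element per response in the redirect chain) for a Location header.
find_redirect_location <- function(all_headers) {
  for (hop in all_headers) {
    hdrs <- hop$headers
    # Header names are case-insensitive, so compare in lower case.
    idx <- which(tolower(names(hdrs)) == "location")
    if (length(idx) > 0) {
      return(hdrs[[idx[1]]])
    }
  }
  NULL  # no hop carried a Location header
}

# Mock data standing in for `res$all_headers` after one followed redirect.
# The URL below is a placeholder, not a real WVS link.
mock_all_headers <- list(
  list(status = 302L, version = "HTTP/1.1",
       headers = list(Location = "http://example.org/file.zip",
                      `Content-Length` = "0")),
  list(status = 200L, version = "HTTP/1.1",
       headers = list(`Content-Type` = "application/zip"))
)

redirect_target <- find_redirect_location(mock_all_headers)
print(redirect_target)
```

With a real response you would pass res$all_headers instead of the mock list.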
This is a nice challenge!
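The problem has nothing to do with the R language. If we simply try to post some data to the download script, we get the same result in any language. What we are dealing with here is a kind of security "pattern". The site prevents users from retrieving the file urls directly and asks them to fill in a form with data in order to provide those links. If a browser can retrieve these links, then so can we, by writing the appropriate HTTP calls. The catch is that we need to know exactly which calls to make. To find out, we need to look at the individual calls the site makes whenever someone clicks to download. Here are a few calls I found before the successful 302 AJDownload.jsp POST call: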
As we can clearly see if we look at the AJDocumentation.jsp source code, it makes these calls using jQuery $.get:
$.get("http://ipinfo.io?token=xxxxxxxxxxxxxx", function (response) {
var geodatos=encodeURIComponent(response.ip+"\t"+response.country+"\t"+response.postal+"\t"+
response.loc+"\t"+response.region+"\t"+response.city+"\t"+
response.org);
$.get("jdsStatJD.jsp?ID="+geodatos+
"&url=http%3A%2F%2Fwww.worldvaluessurvey.org%2FAJDocumentation.jsp&referer=null&cms=Documentation",
function (resp2) {
});
}, "jsonp");
Then, a few calls further down, we can see the successful POST /AJDownload.jsp with status 302 Moved Temporarily and the wanted Location in its response headers:
HTTP/1.1 302 Moved Temporarily
Content-Length: 0
Content-Type: text/html
Location: http://www.worldvaluessurvey.org/wvsdc/CO00001/F00003724-WVS_Longitudinal_1981-2014_stata_dta_v_2015_04_18.zip
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 01 Dec 2016 16:24:37 GMT
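So that is the site's security mechanism. It uses ipinfo.io to store information about the visitor (IP, location and even ISP organization) right before the user starts the download by clicking the link. The script that receives this data is /jdsStatJD.jsp. I did not use ipinfo.io, nor their API key for this service (it is hidden in my screenshot); instead I created a dummy but valid data sequence, just to validate the request. The form post data for the "protected" files is not required at all. It is possible to download the files without posting that data.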
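A dummy ID string like the one described above can be assembled in base R. This is only a sketch: the field values are the placeholder ones that appear in the jdsStatJD.jsp URL of the final script, build_geodata is a made-up helper name, and utils::URLencode(reserved = TRUE) is used as a stand-in for JavaScript's encodeURIComponent (the two agree for these values but are not identical in general).

```r
# Hypothetical helper: join the fields with tabs, as the page's JavaScript
# does, then percent-encode the result so it can go into a query string.
build_geodata <- function(ip, country, postal, loc, region, city, org) {
  raw_fields <- paste(c(ip, country, postal, loc, region, city, org),
                      collapse = "\t")
  utils::URLencode(raw_fields, reserved = TRUE)
}

# Dummy values (not obtained from ipinfo.io).
geodatos <- build_geodata(
  ip      = "2.72.48.149",
  country = "IT",
  postal  = "undefined",
  loc     = "41.8902,12.4923",
  region  = "Lazio",
  city    = "Roma",
  org     = "Orange SA Telecommunications Corporation"
)

# Same query-string layout as the $.get call in AJDocumentation.jsp.
stat_url <- paste0(
  "http://www.worldvaluessurvey.org/jdsStatJD.jsp?ID=", geodatos,
  "&url=http%3A%2F%2Fwww.worldvaluessurvey.org%2FAJDocumentation.jsp",
  "&referer=null&cms=Documentation"
)
print(stat_url)
```

Building the string this way makes it easier to swap in other dummy values than editing the long hard-coded URL by hand.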
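Also, the curlconverter library is not required. All we have to do is make simple GET and POST requests using the httr library. One important point I want to stress: to prevent the httr POST function from following the Location header that comes back with the 302 status on our last call, we need to use the configuration setting config(followlocation = FALSE). This stops it from following the Location and lets us read the Location from the headers instead.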
Output
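My R script can be run from the command line, and it accepts a numeric value for the DOID parameter to get the needed file. For example, if we want the link for the file WVS_Longitudinal_1981-2014_stata_dta_v_2015_04_18, we have to append its DOID (3724) to the end of the command when calling the script with Rscript: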
Rscript wvs_fetch_downloads.r 3724
[1] "http://www.worldvaluessurvey.org/wvsdc/CO00001/F00003724-WVS_Longitudinal_1981-2014_stata_dta_v_2015_04_18.zip"
I have created an R function that gets the location of each file you want by just passing the DOID:
getFileById <- function(fileId)
You can remove the command-line argument parsing and use the function by passing the DOID directly:
#args <- commandArgs(TRUE)
#if(length(args) == 0) {
# print("No file id specified. Use './script.r ####'.")
# quit("no")
#}
#fileId <- args[1]
fileId <- "3724"
# DOID=3843 : WVS_EVS_Integrated_Dictionary_Codebook v_2014_09_22 (Excel)
# DOID=3844 : WVS_Values Surveys Integrated Dictionary_TimeSeries_v_2014-04-25 (Excel)
# DOID=3725 : WVS_Longitudinal_1981-2014_rdata_v_2015_04_18
# DOID=3996 : WVS_Longitudinal_1981-2014_sas_v_2015_04_18
# DOID=3723 : WVS_Longitudinal_1981-2014_spss_v_2015_04_18
# DOID=3724 : WVS_Longitudinal_1981-2014_stata_dta_v_2015_04_18
getFileById(fileId)
Final working R script
library(httr)

# Return the download url for a WVS file, given its DOID.
getFileById <- function(fileId) {
  # Step 1: load the documentation page to obtain a session cookie.
  response <- GET(
    url = "http://www.worldvaluessurvey.org/AJDocumentation.jsp?CndWAVE=-1",
    add_headers(
      `Accept` = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
      `Accept-Encoding` = "gzip, deflate",
      `Accept-Language` = "en-US,en;q=0.8",
      `Cache-Control` = "max-age=0",
      `Connection` = "keep-alive",
      `Host` = "www.worldvaluessurvey.org",
      `User-Agent` = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0",
      `Content-type` = "application/x-www-form-urlencoded",
      `Referer` = "http://www.worldvaluessurvey.org/AJDownloadLicense.jsp",
      `Upgrade-Insecure-Requests` = "1"))

  # Keep only the name=value part of the Set-Cookie header.
  set_cookie <- headers(response)$`set-cookie`
  cookies <- strsplit(set_cookie, ';')
  cookie <- cookies[[1]][1]

  # Step 2: call the tracking script with a dummy but well-formed ID string.
  response <- GET(
    url = "http://www.worldvaluessurvey.org/jdsStatJD.jsp?ID=2.72.48.149%09IT%09undefined%0941.8902%2C12.4923%09Lazio%09Roma%09Orange%20SA%20Telecommunications%20Corporation&url=http%3A%2F%2Fwww.worldvaluessurvey.org%2FAJDocumentation.jsp&referer=null&cms=Documentation",
    add_headers(
      `Accept` = "*/*",
      `Accept-Encoding` = "gzip, deflate",
      `Accept-Language` = "en-US,en;q=0.8",
      `Cache-Control` = "max-age=0",
      `Connection` = "keep-alive",
      `X-Requested-With` = "XMLHttpRequest",
      `Host` = "www.worldvaluessurvey.org",
      `User-Agent` = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0",
      `Content-type` = "application/x-www-form-urlencoded",
      `Referer` = "http://www.worldvaluessurvey.org/AJDocumentation.jsp?CndWAVE=-1",
      `Cookie` = cookie))

  # Step 3: post the download request; only DOID varies between files.
  post_data <- list(
    ulthost = "WVS",
    CMSID = "",
    CndWAVE = "-1",
    SAID = "-1",
    DOID = fileId,
    AJArchive = "WVS Data Archive",
    EdFunction = "",
    DOP = "",
    PUB = "")

  response <- POST(
    url = "http://www.worldvaluessurvey.org/AJDownload.jsp",
    # Do not follow the 302, so the Location header stays readable.
    config(followlocation = FALSE),
    add_headers(
      `Accept` = "*/*",
      `Accept-Encoding` = "gzip, deflate",
      `Accept-Language` = "en-US,en;q=0.8",
      `Cache-Control` = "max-age=0",
      `Connection` = "keep-alive",
      `Host` = "www.worldvaluessurvey.org",
      `User-Agent` = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0",
      `Content-type` = "application/x-www-form-urlencoded",
      `Referer` = "http://www.worldvaluessurvey.org/AJDocumentation.jsp?CndWAVE=-1",
      `Cookie` = cookie),
    body = post_data,
    encode = "form")

  # The Location header of the 302 response is the file url.
  location <- headers(response)$location
  location
}

# Read the DOID from the command line.
args <- commandArgs(TRUE)
if (length(args) == 0) {
  print("No file id specified. Use './script.r ####'.")
  quit("no")
}
fileId <- args[1]

# DOID=3843 : WVS_EVS_Integrated_Dictionary_Codebook v_2014_09_22 (Excel)
# DOID=3844 : WVS_Values Surveys Integrated Dictionary_TimeSeries_v_2014-04-25 (Excel)
# DOID=3725 : WVS_Longitudinal_1981-2014_rdata_v_2015_04_18
# DOID=3996 : WVS_Longitudinal_1981-2014_sas_v_2015_04_18
# DOID=3723 : WVS_Longitudinal_1981-2014_spss_v_2015_04_18
# DOID=3724 : WVS_Longitudinal_1981-2014_stata_dta_v_2015_04_18
getFileById(fileId)
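The cookie handling inside getFileById can be looked at in isolation. The sketch below repeats the same idea on a sample Set-Cookie value: keep the part before the first semicolon (the name=value pair) and drop the attributes after it. The helper name first_cookie_pair is made up, and the sample header reuses the JSESSIONID from the question with assumed Path and HttpOnly attributes, purely for illustration.

```r
# Hypothetical helper: return the leading name=value pair of a
# Set-Cookie header value, discarding attributes such as Path.
first_cookie_pair <- function(set_cookie) {
  parts <- strsplit(set_cookie, ";", fixed = TRUE)[[1]]
  trimws(parts[1])
}

# Sample header value; the attributes after the first ";" are assumptions.
sample_header <- "JSESSIONID=50DABDEDD0B2FC370C415B4BD1855260; Path=/; HttpOnly"

cookie <- first_cookie_pair(sample_header)
print(cookie)
```

The resulting string is what the script sends back in the Cookie request header on the later calls.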