Is there a way to download an Excel file (with multiple sheets) from Google Drive in Julia?
I tried using google_download("URL", "local_path") from the "GoogleDrive" library, but it seems to only get the first sheet, in CSV format.
Any clues?
It looks like the guts of GoogleDrive.jl are just doing some URL manipulation:
https://github.com/tejasvaidhyadev/GoogleDrive.jl/blob/master/src/GoogleDrive.jl#L26
isg_sheet(url) = occursin("docs.google.com/spreadsheets", url)
isg_drive(url) = occursin("drive.google.com", url)

function sheet_handler(url; format=:csv)
    link, expo = splitdir(url)
    if startswith(expo, "edit") || expo == ""
        url = link * "/export?format=$format"
    elseif startswith(expo, "export")
        url = replace(url, r"format=([a-zA-Z]*)(.*)"=>SubstitutionString("format=$format\2"))
    end
    url
end

function google_download(url, localdir)
    long_url = unshortlink(url)
    if isg_sheet(long_url)
        long_url = sheet_handler(long_url)
    end
    if isg_drive(long_url)
        drive_download(long_url, localdir)
    else
        DataDeps.fetch_http(long_url, localdir)
    end
end
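To see why only the first sheet comes back, here is a minimal standalone mimic of the "edit" branch of the URL rewrite above. The name `export_url_like_library` is hypothetical and is not part of GoogleDrive.jl. The sharing URL becomes an `/export?format=csv` URL that carries no `gid`, and, as observed in the question, Google then serves only the first sheet.

```julia
# Hypothetical mimic of the "edit" branch of GoogleDrive.sheet_handler,
# written only to show which URL actually gets downloaded.
function export_url_like_library(url::AbstractString; format=:csv)
    link, expo = splitdir(url)          # split off the last path component
    if startswith(expo, "edit") || expo == ""
        # No gid is appended, so the export does not select a specific sheet.
        return link * "/export?format=$format"
    end
    return String(url)                  # other URL shapes are left unchanged here
end

share = "https://docs.google.com/spreadsheets/d/13-LtgMi8evaxGxUTwlZZ_lqmr8Epcqt1ZSPUszqWhW4/edit?usp=sharing"
println(export_url_like_library(share))
# prints https://docs.google.com/spreadsheets/d/13-LtgMi8evaxGxUTwlZZ_lqmr8Epcqt1ZSPUszqWhW4/export?format=csv
```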
Google Sheets API
If you want to do much more than this, you'll need to actually use the Google Sheets API.
https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/get
GET https://sheets.googleapis.com/v4/spreadsheets/{spreadsheetId}
https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets#Spreadsheet
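Doing a GET on the spreadsheet ID (found in the spreadsheet's URL) returns a Spreadsheet resource whose sheets array lists every sheet in the spreadsheet, each with its ID in properties.sheetId: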
{
  "spreadsheetId": string,
  "properties": {
    object (SpreadsheetProperties)
  },
  "sheets": [
    {
      object (Sheet)
    }
  ],
  "namedRanges": [
    {
      object (NamedRange)
    }
  ],
  "spreadsheetUrl": string,
  "developerMetadata": [
    {
      object (DeveloperMetadata)
    }
  ]
}
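A minimal sketch of that GET request, assuming the spreadsheet is readable with a plain API key (`api_key` is a placeholder you must supply; private files would need an OAuth token instead). The `fields=sheets.properties` field mask trims the response down to each sheet's properties, which is what makes the quick regex extraction here workable; for real use, parse the JSON with a package such as JSON3.jl. Only Base and the `Downloads` standard library are used, and all function names are hypothetical.

```julia
using Downloads

# Build the spreadsheets.get URL, limited to sheet properties via a field mask.
# Assumption: a public spreadsheet and an API key passed as `key`.
sheets_api_url(spreadsheet_id::AbstractString, api_key::AbstractString) =
    "https://sheets.googleapis.com/v4/spreadsheets/$spreadsheet_id" *
    "?fields=sheets.properties&key=$api_key"

# Pull every sheetId out of the field-masked JSON text.
# Regex shortcut: only reasonable because the mask leaves one sheetId per sheet.
extract_sheet_ids(json::AbstractString) =
    [parse(Int, m.captures[1]) for m in eachmatch(r"\"sheetId\"\s*:\s*(\d+)", json)]

# Perform the request and return the sheet IDs (needs network access and a valid key).
function fetch_sheet_ids(spreadsheet_id, api_key)
    buf = Downloads.download(sheets_api_url(spreadsheet_id, api_key), IOBuffer())
    return extract_sheet_ids(String(take!(buf)))
end

# Usage (hypothetical key): fetch_sheet_ids("13-LtgMi8evaxGxUTwlZZ_lqmr8Epcqt1ZSPUszqWhW4", "YOUR_API_KEY")
```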
You can then extract those sheet IDs and do something with them, such as making an export request in CSV format for each one.
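For example, each sheet ID can be plugged into the same `/export` endpoint as its `gid` parameter. A minimal sketch using the `Downloads` standard library; the helper names and the `sheet_<gid>.csv` file naming are hypothetical choices, made so that each sheet lands in a predictable file.

```julia
using Downloads

# Export URL for one sheet (tab) of a spreadsheet, selected by its gid.
export_url(spreadsheet_id::AbstractString, gid::Integer; format="csv") =
    "https://docs.google.com/spreadsheets/d/$spreadsheet_id/export?format=$format&gid=$gid"

# Download every sheet as its own CSV file into `dir` and return the file paths.
# Assumption: the spreadsheet is shared so that the export URL needs no login.
function download_sheets(spreadsheet_id, gids; dir=".")
    paths = String[]
    for gid in gids
        dest = joinpath(dir, "sheet_$gid.csv")
        Downloads.download(export_url(spreadsheet_id, gid), dest)
        push!(paths, dest)
    end
    return paths
end
```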
Manually grab the gid for each sheet
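If you don't want to use the API, you can simply copy the URL from your browser as you click on each sheet, and turn those URLs into download links.
To make them work correctly with the export link, you need to pass the gid into the export URL like this: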
(This is a slightly modified version of GoogleDrive.sheet_handler that includes the sheet gid.)
function sheet_handler(url; format=:csv, sheet_gid=0)
    link, expo = splitdir(url)
    if startswith(expo, "edit") || expo == ""
        # Sharing/edit URL: build an export URL that selects one sheet by gid.
        url = link * "/export?format=$format&gid=$sheet_gid"
    elseif startswith(expo, "export")
        # "\\2" keeps a literal \2 backreference; a plain "\2" would be an octal escape.
        url = replace(url, r"format=([a-zA-Z]*)(.*)"=>SubstitutionString("format=$format&gid=$sheet_gid\\2"))
    end
    url
end
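When you copy the URL from the browser while a sheet is selected, it typically contains the sheet's gid, e.g. `.../edit#gid=972467363`. A small sketch (the helper name `gid_from_url` is hypothetical) that pulls the number out so it can be passed as `sheet_gid`; it returns `nothing` rather than guessing 0, since the first sheet's gid is not guaranteed to be 0.

```julia
# Return the gid in a copied sheet URL ("#gid=", "?gid=" or "&gid=" form),
# or `nothing` when the URL carries no gid. Hypothetical helper.
function gid_from_url(url::AbstractString)
    m = match(r"[#?&]gid=(\d+)", url)
    return m === nothing ? nothing : parse(Int, m.captures[1])
end

# Usage with the modified sheet_handler above:
# sheet_handler(url; format=:csv, sheet_gid=gid_from_url(url))
```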
So for my sample test spreadsheet, I have three sheets with the following gids:
- Sheet1, gid=0
- Sheet2, gid=972467363
- Sheet3, gid=1251741166
So to grab the third one, I did this:
DataDeps.fetch_http(sheet_handler(url; format=:csv, sheet_gid=1251741166), ".")
Here is the example run:
julia> using GoogleDrive
julia> using GoogleDrive.DataDeps
julia> url = read("link.txt", String)
"https://docs.google.com/spreadsheets/d/13-LtgMi8evaxGxUTwlZZ_lqmr8Epcqt1ZSPUszqWhW4/edit?usp=sharing"
julia> DataDeps.fetch_http(sheet_handler(url; format=:csv, sheet_gid=1251741166), ".")
┌ Info: Downloading
│ source = "https://docs.google.com/spreadsheets/d/13-LtgMi8evaxGxUTwlZZ_lqmr8Epcqt1ZSPUszqWhW4/export?format=csv&gid=1251741166"
│ dest = "./export?format=csv&gid=1251741166"
│ progress = NaN
│ time_taken = "0.0 s"
│ time_remaining = "NaN s"
│ average_speed = "∞ B/s"
│ downloaded = "404 bytes"
│ remaining = "∞ B"
└ total = "∞ B"
┌ Info: Downloading
│ source = "https://docs.google.com/spreadsheets/d/13-LtgMi8evaxGxUTwlZZ_lqmr8Epcqt1ZSPUszqWhW4/export?format=csv&gid=1251741166"
│ dest = "./download-test-julia-Sheet3.csv"
│ progress = NaN
│ time_taken = "0.0 s"
│ time_remaining = "NaN s"
│ average_speed = "7.324 KiB/s"
│ downloaded = "15 bytes"
│ remaining = "∞ B"
└ total = "∞ B"
"./download-test-julia-Sheet3.csv"
shell> cat download-test-julia-Sheet3.csv
Data on Sheet 3
Download in a format that contains all sheets
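If you call sheet_handler with a format that supports multiple sheets, such as xlsx, you can then parse and manipulate the output locally. I haven't tried this yet, but the call would look like this: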
DataDeps.fetch_http(sheet_handler(url; format=:xlsx), ".")
Then find your favorite Julia Excel library and you're off to the races.
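A minimal sketch of that idea using only the `Downloads` standard library. It assumes the spreadsheet is publicly shared and, as the answer suggests but has not verified, that an xlsx export without a gid contains every sheet. The helper names are hypothetical. Reading the file back is left to an Excel package; with XLSX.jl, for instance, `XLSX.readxlsx(path)` opens the workbook and `XLSX.sheetnames` lists its sheets.

```julia
using Downloads

# The spreadsheet ID is the path segment after /spreadsheets/d/ in the sharing URL.
function spreadsheet_id(url::AbstractString)
    m = match(r"/spreadsheets/d/([A-Za-z0-9_-]+)", url)
    m === nothing && throw(ArgumentError("not a Google Sheets URL: $url"))
    return String(m.captures[1])
end

# Whole-workbook export URL; no gid, assumed (not verified) to include all sheets.
xlsx_export_url(url::AbstractString) =
    "https://docs.google.com/spreadsheets/d/$(spreadsheet_id(url))/export?format=xlsx"

# Download the workbook to an explicit .xlsx path (needs network access).
download_workbook(url, dest="workbook.xlsx") = Downloads.download(xlsx_export_url(url), dest)
```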