Is there a way to download an Excel file (with multiple sheets) from Google Drive in Julia?

I tried google_download("URL", "local_path") from the GoogleDrive library, but it seems to fetch only the first sheet, in CSV format.

Any clues?

It looks like the guts of GoogleDrive.jl are just doing some URL manipulation.

https://github.com/tejasvaidhyadev/GoogleDrive.jl/blob/master/src/GoogleDrive.jl#L26

isg_sheet(url) = occursin("docs.google.com/spreadsheets", url)
isg_drive(url) = occursin("drive.google.com", url)

function sheet_handler(url; format=:csv)
    link, expo = splitdir(url)
    if startswith(expo, "edit") || expo == ""
        url = link * "/export?format=$format"
    elseif startswith(expo, "export")
        url = replace(url, r"format=([a-zA-Z]*)(.*)"=>SubstitutionString("format=$format\\2"))
    end
    url
end

function google_download(url, localdir)
    long_url = unshortlink(url)
    if isg_sheet(long_url)
        long_url = sheet_handler(long_url)
    end

    if isg_drive(long_url)
        drive_download(long_url, localdir)
    else
        DataDeps.fetch_http(long_url, localdir)
    end
end
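To make the URL manipulation concrete, here is a minimal sketch of the two rewrites sheet_handler performs, run on placeholder URLs (SPREADSHEET_ID is a stand-in, not a real document). It assumes a Unix-like system, where splitdir splits at the last "/". Note the escaped backslash in "\\2": that is what lets the backreference reach the regex engine, so whatever followed the old format value is preserved.

```julia
# Case 1: a sharing link ending in "edit?..." becomes an export link.
url = "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit?usp=sharing"
link, expo = splitdir(url)          # split at the last "/"
export_url = link * "/export?format=csv"

# Case 2: an existing export link only has its format swapped;
# "\\2" re-inserts whatever followed the old format (here "&gid=5").
existing = "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/export?format=csv&gid=5"
rewritten = replace(existing,
    r"format=([a-zA-Z]*)(.*)" => SubstitutionString("format=xlsx\\2"))

println(export_url)
println(rewritten)
```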

Google Sheets API

If you want to do much more than this, you need to actually use the Google Sheets API.

https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/get

GET https://sheets.googleapis.com/v4/spreadsheets/{spreadsheetId}

https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets#Spreadsheet

If you do a GET on the spreadsheet ID (found in the spreadsheet's URL), the response contains the IDs of all the sheets in the spreadsheet.

{
  "spreadsheetId": string,
  "properties": {
    object (SpreadsheetProperties)
  },
  "sheets": [
    {
      object (Sheet)
    }
  ],
  "namedRanges": [
    {
      object (NamedRange)
    }
  ],
  "spreadsheetUrl": string,
  "developerMetadata": [
    {
      object (DeveloperMetadata)
    }
  ]
}

Then you can extract those sheet IDs and do something with them, such as making an export request in CSV format for each one.
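As a rough sketch of that extraction step: the code below assumes the response has already been fetched and parsed into nested dictionaries (with whatever HTTP client and JSON parser you prefer, plus valid API credentials), and that each Sheet object carries properties.sheetId and properties.title as described in the API reference. The helper names spreadsheet_id, sheet_ids and export_url are made up for illustration, and the response literal is a hand-written stand-in, not real API output.

```julia
# Pull the spreadsheet ID out of a sharing URL (the path segment after /d/).
function spreadsheet_id(url::AbstractString)
    m = match(r"/spreadsheets/d/([A-Za-z0-9_-]+)", url)
    m === nothing && throw(ArgumentError("no spreadsheet ID found in URL"))
    return String(m.captures[1])
end

# Collect title => sheetId pairs from an already-parsed response.
function sheet_ids(response::AbstractDict)
    return [s["properties"]["title"] => s["properties"]["sheetId"]
            for s in response["sheets"]]
end

# Build the export URL for one sheet; the sheetId is passed as the gid.
export_url(id, gid; format=:csv) =
    "https://docs.google.com/spreadsheets/d/$id/export?format=$format&gid=$gid"

# Hand-written stand-in for a parsed response (hypothetical values).
response = Dict(
    "spreadsheetId" => "SPREADSHEET_ID",
    "sheets" => [
        Dict("properties" => Dict("sheetId" => 0, "title" => "Sheet1")),
        Dict("properties" => Dict("sheetId" => 972467363, "title" => "Sheet2")),
        Dict("properties" => Dict("sheetId" => 1251741166, "title" => "Sheet3")),
    ],
)

for (title, gid) in sheet_ids(response)
    println(title, " => ", export_url(response["spreadsheetId"], gid))
end
```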

Manually getting the gid for each sheet

If you don't want to use the API, just copy the gid from the browser URL as you click on each sheet tab; you can then turn those into download links.

For them to go through the export link properly, you need to pass the gid into the export URL like this:

(This is a slightly modified version of GoogleDrive.sheet_handler that includes the sheet gid.)

function sheet_handler(url; format=:csv, sheet_gid=0)
    link, expo = splitdir(url)
    if startswith(expo, "edit") || expo == ""
        url = link * "/export?format=$format&gid=$sheet_gid"
    elseif startswith(expo, "export")
        url = replace(url, r"format=([a-zA-Z]*)(.*)"=>SubstitutionString("format=$format&gid=$sheet_gid\\2"))
    end
    url
end

So for my sample test spreadsheet, I have three sheets with the following gids:

  • Sheet1, gid=0
  • Sheet2, gid=972467363
  • Sheet3, gid=1251741166

So to grab the third one, I did this:

DataDeps.fetch_http(sheet_handler(url; format=:csv, sheet_gid=1251741166), ".")

Here is an example run:

julia> using GoogleDrive

julia> using GoogleDrive.DataDeps

julia> url = read("link.txt", String)
"https://docs.google.com/spreadsheets/d/13-LtgMi8evaxGxUTwlZZ_lqmr8Epcqt1ZSPUszqWhW4/edit?usp=sharing"

julia> DataDeps.fetch_http(sheet_handler(url; format=:csv, sheet_gid=1251741166), ".")
┌ Info: Downloading
│   source = "https://docs.google.com/spreadsheets/d/13-LtgMi8evaxGxUTwlZZ_lqmr8Epcqt1ZSPUszqWhW4/export?format=csv&gid=1251741166"
│   dest = "./export?format=csv&gid=1251741166"
│   progress = NaN
│   time_taken = "0.0 s"
│   time_remaining = "NaN s"
│   average_speed = "∞ B/s"
│   downloaded = "404 bytes"
│   remaining = "∞ B"
└   total = "∞ B"
┌ Info: Downloading
│   source = "https://docs.google.com/spreadsheets/d/13-LtgMi8evaxGxUTwlZZ_lqmr8Epcqt1ZSPUszqWhW4/export?format=csv&gid=1251741166"
│   dest = "./download-test-julia-Sheet3.csv"
│   progress = NaN
│   time_taken = "0.0 s"
│   time_remaining = "NaN s"
│   average_speed = "7.324 KiB/s"
│   downloaded = "15 bytes"
│   remaining = "∞ B"
└   total = "∞ B"
"./download-test-julia-Sheet3.csv"

shell> cat download-test-julia-Sheet3.csv
Data on Sheet 3
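To fetch every sheet rather than just one, the same export URL can be built for each gid in a loop. The sketch below swaps DataDeps.fetch_http for Downloads.download from Julia's standard library (available since Julia 1.6), so the destination file name is chosen explicitly. The helpers sheet_export_url and download_sheets are hypothetical names, and the code assumes the spreadsheet is shared so that anyone with the link can view it.

```julia
using Downloads  # standard library since Julia 1.6

# Turn a sharing URL plus a gid into an export URL, the same way
# the modified sheet_handler does for ".../edit?..." links.
sheet_export_url(share_url, gid; format=:csv) =
    first(splitdir(share_url)) * "/export?format=$format&gid=$gid"

# Download each sheet into `dir` as <name>.<format>; returns the local paths.
function download_sheets(share_url, gids; dir=".", format=:csv)
    return [Downloads.download(sheet_export_url(share_url, gid; format=format),
                               joinpath(dir, "$name.$format"))
            for (name, gid) in gids]
end

# gids copied by hand from the browser URL of each tab.
gids = ["Sheet1" => 0, "Sheet2" => 972467363, "Sheet3" => 1251741166]

# Not run here because it needs network access:
# paths = download_sheets(url, gids)
```

Naming the files after the sheets avoids ending up with a destination like "export?format=csv&gid=...".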

Download in a format that includes all the sheets

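If you call sheet_handler with a format that supports multiple sheets, such as xlsx, you can parse and manipulate the output locally. I haven't tried it just now, but the call would look like this: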

DataDeps.fetch_http(sheet_handler(url; format=:xlsx), ".")

Then find your favorite Julia Excel library and you're off to the races.
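Before handing the download to an Excel library (XLSX.jl is one option), it may be worth checking that you really received a workbook. An .xlsx file is a ZIP container and so starts with the bytes "PK\x03\x04", whereas a spreadsheet that is not shared publicly may come back as an HTML page instead. The helper name looks_like_xlsx is made up for this sketch.

```julia
# Return true if the file starts with the ZIP signature "PK\x03\x04",
# which is how every .xlsx container begins.
function looks_like_xlsx(path::AbstractString)
    magic = open(path, "r") do io
        read(io, 4)   # reads at most 4 bytes
    end
    return magic == UInt8[0x50, 0x4b, 0x03, 0x04]
end

# Usage, assuming the download was saved as "workbook.xlsx":
# looks_like_xlsx("workbook.xlsx") || error("download is not an xlsx file")
```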