Open multiple JSON files with URLs and download the files contained in each using Python

Question

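We will receive up to 10k JSON files in a separate directory. They must be parsed and converted into individual .csv files, and then the file at the URL in each one must be downloaded to another directory. I plan to do this with Automator on a Mac and call a Python script to download the files. I already have the CSV-conversion part working in a shell script, but I don't know where to start with the Python for downloading the URLs.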

Here is what I have in Automator so far:

    - Shell = /bin/bash
    - Pass input = as arguments
    - Code = as follows


#!/bin/bash

/usr/bin/perl -CSDA -w <<'EOF' - "$@" > ~/Desktop/out_"$(date '+%F_%H%M%S')".csv
use strict;
use JSON::Syck;
$JSON::Syck::ImplicitUnicode = 1;

# json node paths to extract
my @paths = ('/upload_date', '/title', '/webpage_url');

for (@ARGV) {
    my $json;
    open(IN, "<", $_) or die "$!";
    {
        local $/; 
        $json = <IN>;
    }
    close IN;
    my $data = JSON::Syck::Load($json) or next;
    my @values = map { &json_node_at_path($data, $_) } @paths;
    {
        #   output CSV spec
        #   - field separator = SPACE
        #   - record separator = LF
        #   - every field is quoted
        local $, = qq( );
        local $\ = qq(\n);
        print map { s/"/""/og; q(").$_.q("); } @values;
    }
}

sub json_node_at_path ($$) {
    #   $ : (reference) json object
    #   $ : (string) node path
    # 
    #   E.g. Given node path = '/abc/0/def', it returns either
    #       $obj->{'abc'}->[0]->{'def'}   if $obj->{'abc'} is ARRAY; or
    #       $obj->{'abc'}->{'0'}->{'def'} if $obj->{'abc'} is HASH.
    my ($obj, $path) = @_;  
    my $r = $obj;
    for ( map { /(^.+$)/ } split /\//, $path ) {
        if ( /^[0-9]+$/ && ref($r) eq 'ARRAY' ) {
            $r = $r->[$_];
        }
        else {
            $r = $r->{$_};
        }
    }
    return $r;
}
EOF
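For comparison, here is a minimal Python sketch of the same extraction step. It uses only the standard-library json and csv modules and mirrors the Perl script's node paths and output spec: fields separated by a space, every field quoted, records ending in LF. The function names are my own, and the sketch assumes UTF-8 input. Like the Perl version's "or next", it skips files that fail to parse. An unreadable file raises an error, which plays the role of the Perl "die".

```python
import csv
import json
import sys

# Node paths to extract, same as @paths in the Perl script
PATHS = ["/upload_date", "/title", "/webpage_url"]


def json_node_at_path(obj, path):
    """Follow a '/'-separated path through nested dicts and lists.

    Numeric segments index into lists; anything else is a dict key.
    Returns None if a step is missing (the Perl version yields undef).
    """
    node = obj
    for part in filter(None, path.split("/")):  # drop empty segments, like /(^.+$)/
        if isinstance(node, list) and part.isdecimal():
            idx = int(part)
            node = node[idx] if idx < len(node) else None
        elif isinstance(node, dict):
            node = node.get(part)
        else:
            return None
    return node


def json_files_to_csv(json_paths, out_file):
    """Write one space-separated, fully quoted CSV row per JSON file."""
    writer = csv.writer(out_file, delimiter=" ", quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    for path in json_paths:
        with open(path, encoding="utf-8") as f:
            try:
                data = json.load(f)
            except json.JSONDecodeError:
                continue  # skip unparseable files, like `or next` in Perl
        values = [json_node_at_path(data, p) for p in PATHS]
        writer.writerow(["" if v is None else v for v in values])


if __name__ == "__main__":
    json_files_to_csv(sys.argv[1:], sys.stdout)
```

Automator passes its input as arguments, so a Run Shell Script step could call something like python3 json_to_csv.py "$@" > out.csv. The script name here is hypothetical.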

Answer 1

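I'm not familiar with Automator, so maybe someone else can help with that part. The Python part, downloading a file from a URL, is fairly simple. It would look something like this: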

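import requests

url = "https://example.com/some_file"  # placeholder: the URL to download
r = requests.get(url)  # assuming you don't need to do any authentication
r.raise_for_status()  # stop on HTTP errors instead of saving an error page
with open("my_file_name", "wb") as f:
    f.write(r.content)  # r.content is the raw response body as bytes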

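Requests is a great library for working with HTTP(S). A Response's content attribute is a byte string, so we can open a file for writing bytes ("wb") and write it directly. This also works for executable payloads, so make sure you know what you are downloading. If you haven't installed Requests yet, run pip install requests (on a Mac this is often pip3 install requests).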
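r.content reads the whole response into memory before anything is written, which can be a problem with large files. Below is a minimal sketch that streams to disk instead, using requests' stream=True and Response.iter_content(). The download helper, the 64 KiB chunk size and the 30-second timeout are illustrative choices and do not come from the question.

```python
import requests


def download(url, dest_path, chunk_size=64 * 1024, timeout=30):
    """Stream url to dest_path without holding the whole body in memory."""
    # stream=True defers fetching the body until we iterate over it
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()  # raise on 4xx/5xx before creating the file
        with open(dest_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return dest_path
```

For example, download("https://example.com/video.mp4", "video.mp4") writes the file chunk by chunk (the URL is a placeholder). raise_for_status() runs before the file is opened, so a 404 leaves no empty file behind.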

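If you are inclined to do the whole process in Python, I suggest looking at the json and csv modules. Both are part of the standard library and provide high-level interfaces for what you are doing.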
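Taking that suggestion further, here is a rough sketch of the whole job in Python. It writes one CSV per JSON file, as the question asks, and then runs the downloads in a small thread pool. It assumes each JSON file holds a single object with the upload_date, title and webpage_url keys that the Perl script extracts. The function names, the directory arguments and the naming scheme for downloaded files are all hypothetical.

```python
import csv
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlparse

import requests

# Assumption: each JSON file is one object with these top-level keys
FIELDS = ["upload_date", "title", "webpage_url"]


def json_to_csv(json_path, csv_dir):
    """Write <csv_dir>/<stem>.csv (header + one row) and return the parsed JSON."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    out = Path(csv_dir) / (Path(json_path).stem + ".csv")
    with open(out, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        writer.writerow([data.get(k, "") for k in FIELDS])
    return data


def download(url, dest):
    """Stream url to dest; raises requests.RequestException on failure."""
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in r.iter_content(chunk_size=64 * 1024):
                f.write(chunk)
    return dest


def process_dir(json_dir, csv_dir, dl_dir, workers=8):
    """Convert every *.json in json_dir to its own CSV, then fetch the URLs in parallel.

    Each download is named after its JSON file plus the URL's extension, so
    names stay unique and the URL only contributes a file extension.
    Returns (url, exception) pairs for the downloads that failed.
    """
    Path(csv_dir).mkdir(parents=True, exist_ok=True)
    Path(dl_dir).mkdir(parents=True, exist_ok=True)
    jobs = []
    for json_path in sorted(Path(json_dir).glob("*.json")):
        try:
            data = json_to_csv(json_path, csv_dir)
        except json.JSONDecodeError:
            continue  # skip malformed files rather than aborting the batch
        url = data.get("webpage_url")
        if url:
            suffix = Path(urlparse(url).path).suffix
            jobs.append((url, Path(dl_dir) / (json_path.stem + suffix)))

    failures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download, url, dest): url for url, dest in jobs}
        for future, url in futures.items():
            try:
                future.result()
            except requests.RequestException as exc:
                failures.append((url, exc))
    return failures
```

You could call it as process_dir("json_in", "csv_out", "downloads"), where the directory names are placeholders. The workers argument caps the number of concurrent requests, so a 10k-file batch doesn't open thousands of connections at once. Each download uses its own requests.get call rather than one shared Session, which sidesteps questions about sharing a Session across threads.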

Edit: Here is an example of using the json module on a file like this:

[
  {
  "url": <some url>,
  "name": <the name of the file>
  }
]

Your Python code might look something like this:

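import json

import requests

with open("my_json_file.json", "r") as json_f:
    for item in json.load(json_f):  # each item is a {"url": ..., "name": ...} dict
        r = requests.get(item["url"])
        r.raise_for_status()  # don't save an HTTP error page as the file
        with open(item["name"], "wb") as f:
            f.write(r.content)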
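The snippet above writes to whatever item["name"] says, so a name such as ../something would land outside the current directory. Also, a single exception from requests.get (for example a connection error) ends the whole loop. The hedged variant below keeps only the base name, saves into a chosen directory, and records failures instead of aborting. download_listed and dest_dir are illustrative names, and the JSON is assumed to have the url/name layout shown above.

```python
import json
import os

import requests


def download_listed(json_path, dest_dir):
    """Download every {"url": ..., "name": ...} entry in json_path into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    with open(json_path, encoding="utf-8") as f:
        items = json.load(f)
    saved, failed = [], []
    for item in items:
        # Keep only the final path component so "../x" can't escape dest_dir
        name = os.path.basename(item["name"])
        if not name:
            failed.append(item["url"])
            continue
        try:
            r = requests.get(item["url"], timeout=30)
            r.raise_for_status()
        except requests.RequestException:
            failed.append(item["url"])  # keep going instead of aborting the batch
            continue
        dest = os.path.join(dest_dir, name)
        with open(dest, "wb") as f:
            f.write(r.content)
        saved.append(dest)
    return saved, failed
```

The function returns the saved paths and the failed URLs, so a failed batch can be retried later.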
