Python acting strange when trying to call wget with encoding for Japanese

Question

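I'm writing a Python script that uses bash to run wget on a list of Japanese words contained in a file. I would just use curl, but that has encoding problems. With wget it does download the HTML, but it dumps it in the current directory under poetic titles such as: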

   試%E8%A1%8C%E9%8C%AF誤

I'd like it to put the HTML somewhere pretty-sounding, like "output/混合.txt". It does create these pretty-sounding files, but there's nothing in them.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os

with open("words") as f:
    for line in f:
        text = unicode(line, "utf-8")
        os.system("wget \'https://kotobank.jp/word/" + line.strip() + "'> output/" + line.strip() + ".txt")
        #print("wget \'https://kotobank.jp/word/" + line.strip() + "'>> output/out.txt")

And the file "words" looks like this:

追究
花器
陶磁器
枯渇
風合い
繊維
混合
アボード
受け継い
試行錯誤
硬質

Answer 1

Use the -O file option instead of redirecting the output:

os.system("wget \'https://kotobank.jp/word/" + line.strip() + "' -O " + line.strip() + ".txt")

For details, see the wget documentation.
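The redirected files end up empty because wget saves the page to a file named after the URL and writes nothing to stdout unless told otherwise. As a minimal sketch of the same -O approach, assuming Python 3 (the question's code uses the Python 2 `unicode` built-in), the command can also be passed to wget as an argument list, which avoids shell quoting altogether. The helper names `build_wget_command` and `download_all` are hypothetical, and percent-encoding the word with `urllib.parse.quote` is an assumption about what the server accepts, not something the question or answer confirms.

```python
import subprocess
from pathlib import Path
from urllib.parse import quote

BASE_URL = "https://kotobank.jp/word/"  # URL prefix taken from the question


def build_wget_command(word, out_dir="output"):
    """Return the wget argument list for one word (hypothetical helper)."""
    # Assumption: the site accepts the word as percent-encoded UTF-8.
    url = BASE_URL + quote(word)
    target = Path(out_dir) / (word + ".txt")
    # -O writes the downloaded page to the named file.
    return ["wget", url, "-O", str(target)]


def download_all(words_file="words", out_dir="output"):
    """Run wget once per non-empty line of words_file (hypothetical helper)."""
    Path(out_dir).mkdir(exist_ok=True)
    with open(words_file, encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if not word:
                continue
            # An argument list bypasses the shell, so the word needs no quoting.
            subprocess.run(build_wget_command(word, out_dir), check=False)
```

Calling `download_all()` from the directory that contains "words" would then write one file per word under "output/", provided wget is installed and on the PATH.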
