Why are my pictures corrupted after downloading and writing them in Python?

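Preface

This is my first post on Stack Overflow, so I apologize if I've messed something up somewhere. I searched the internet and Stack Overflow extensively for a solution to my problem, but I couldn't find anything.

Situation

I am building a digital photo frame with my Raspberry Pi that also automatically downloads pictures from my wife's Facebook page. Luckily, I found someone who was working on something similar: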

https://github.com/samuelclay/Raspberry-Pi-Photo-Frame

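A month ago, this gentleman added a download_facebook.py script. This is just what I needed! So a few days ago I started working on this script, first getting it to work in my Windows environment (before I throw it on the Pi). Unfortunately, there is no documentation specific to that script, and I lack experience with Python.

Based on the line from urllib import urlopen, I assume this script was written for Python 2.x, because in Python 3.x the equivalent is from urllib import request (where the function lives at urllib.request.urlopen).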
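For a script that has to run under either interpreter, one common pattern is to try the Python 3 import first and fall back to the Python 2 one. This is only a minimal sketch and is not taken from the original script:

```python
# Try the Python 3 location first, then fall back to Python 2.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

# Either way, `urlopen` is now a callable that opens a URL.
print(callable(urlopen))
```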

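So I installed the Python 2.7.9 interpreter, and I ran into fewer problems than when I tried the Python 3.4.3 interpreter.

Problem

I have gotten the script to download pictures from the Facebook account; however, the pictures are corrupted.

Pictures of the problem: http://imgur.com/a/3u7cG

Originally I was using Python 3.4.3 and had problems with my urlrequest(url) method (see the code at the bottom of the post) and how it handled the image data. I tried decoding with different formats such as utf-8 and utf-16, but according to the content headers it is utf-8 (I think).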
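Image data is raw bytes rather than encoded text, so it should not be decoded at all. A small sketch, using the first four bytes of a typical JPEG (JFIF) file as assumed sample data, shows why decoding as utf-8 fails:

```python
# The first four bytes of a typical JPEG (JFIF) file (sample data).
jpeg_header = b'\xff\xd8\xff\xe0'

try:
    jpeg_header.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    # 0xff is never a valid byte in UTF-8, so decoding fails.
    decoded_ok = False

print(decoded_ok)  # False
```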

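Conclusion

I'm not sure whether the problem lies in downloading the image or in writing the image to the file.

If anyone can help me with this, I would be eternally grateful! Also let me know what I can do to improve my posts in the future.

Thanks in advance.

Code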

from urllib import urlopen
from json import loads
from sys import argv
import dateutil.parser as dateparser
import logging

# Plug in your username and access_token (the token can be obtained and
# modified via the Explorer's Get Access Token button):
# https://graph.facebook.com/USER_NAME/photos?type=uploaded&fields=source&access_token=ACCESS_TOKEN_HERE
FACEBOOK_USER_ID = "**USER ID REMOVED"
FACEBOOK_ACCESS_TOKEN = "** TOKEN REMOVED - GET YOUR OWN **"

def get_logger(label='lvm_cli', level='INFO'):
    """
    Return a generic logger.
    """
    format = '%(asctime)s - %(levelname)s - %(message)s'
    logging.basicConfig(format=format)
    logger = logging.getLogger(label)
    logger.setLevel(getattr(logging, level))
    return logger

def urlrequest(url):
    """
    Make a url request
    """
    req = urlopen(url)
    data = req.read()
    return data

def get_json(url):
    """
    Make a url request and return as a JSON object
    """
    res = urlrequest(url)
    data = loads(res)
    return data

def get_next(data):
    """
    Get next element from facebook JSON response,
    or return None if no next present.
    """
    try:
        return data['paging']['next']
    except KeyError:
        return None

def get_images(data):
    """
    Get all images from facebook JSON response,
    or return an empty list if no data present.
    """
    try:
        return data['data']
    except KeyError:
        return []

def get_all_images(url):
    """
    Get all images using recursion.
    """
    data = get_json(url)
    images = get_images(data)
    next = get_next(data)

    if not next:
        return images
    else:
        return images + get_all_images(next)

def get_url(userid, access_token):
    """
    Generates a useable facebook graph API url
    """
    root = 'https://graph.facebook.com/'
    endpoint = '%s/photos?type=uploaded&fields=source,updated_time&access_token=%s' % \
                (userid, access_token)
    return '%s%s' % (root, endpoint)

def download_file(url, filename):
    """
    Write image to a file.
    """
    data = urlrequest(url)
    path = 'C:/photos/%s' % filename
    f = open(path, 'w')
    f.write(data)
    f.close()

def create_time_stamp(timestring):
    """
    Creates a pretty string from time
    """
    date = dateparser.parse(timestring)
    return date.strftime('%Y-%m-%d-%H-%M-%S')

def download(userid, access_token):
    """
    Download all images to current directory.
    """
    logger = get_logger()
    url = get_url(userid, access_token)
    logger.info('Requesting image direct link, please wait..')
    images = get_all_images(url)

    for image in images:
        logger.info('Downloading %s' % image['source'])
        filename = '%s.jpg' % create_time_stamp(image['created_time'])
        download_file(image['source'], filename)

if __name__ == '__main__':
    download(FACEBOOK_USER_ID, FACEBOOK_ACCESS_TOKEN)

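Alastair McCormack posted something helpful!

He suggested setting binary mode when opening the file for writing: f = open(path, 'wb')

The images are now downloading correctly. Does anyone know why this works?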

To answer the question from the comments about why @Alastair's solution works:

f = open(path, 'wb')

From https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files

On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn’t hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.
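The effect described in that passage can be imitated on any platform. The sketch below does not use a real Windows text-mode file; it assumes a short made-up byte string in place of image data and uses io.TextIOWrapper with newline='\r\n' to mimic the end-of-line translation:

```python
import io

# Stand-in for binary image data that happens to contain 0x0a bytes.
original = b'\xff\xd8\n\x10\n\xff\xd9'

# Simulate Windows text mode: newline='\r\n' makes the wrapper
# rewrite every '\n' as '\r\n'. latin-1 maps bytes 0-255 one-to-one,
# so the only change comes from the newline translation.
buffer = io.BytesIO()
wrapper = io.TextIOWrapper(buffer, encoding='latin-1', newline='\r\n')
wrapper.write(original.decode('latin-1'))
wrapper.flush()
written = buffer.getvalue()

print(len(original), len(written))  # 7 9
print(written == original)          # False
```

Every 0x0a byte gains an extra 0x0d in front of it, which shifts all of the data that follows and is enough to break a JPEG.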

(I'm on a Mac, which explains why I couldn't reproduce the problem.)
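Putting the fix together, a write helper along these lines should behave the same on Windows, Mac, and Linux. This is a sketch only: save_binary is a hypothetical name, and the sample bytes and temporary directory stand in for a real download and for the C:/photos path in the question:

```python
import os
import tempfile

def save_binary(data, path):
    """Write raw bytes to path without any newline translation."""
    # 'wb' = write + binary; the with-block closes the file even on error.
    with open(path, 'wb') as f:
        f.write(data)

# Stand-in for downloaded image bytes, including newline bytes (0x0a).
sample = b'\xff\xd8\xff\xe0\n\x00\n\xff\xd9'
target = os.path.join(tempfile.mkdtemp(), 'sample.jpg')
save_binary(sample, target)

# Read the file back in binary mode and compare byte for byte.
with open(target, 'rb') as f:
    round_trip = f.read()

print(round_trip == sample)  # True
```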