Celery, Django and S3 Default Storage causes file reading issues

I have a process in which the web server ingests a file (via an upload), saves it to S3 using Django's default_storage, and then creates a Celery task so the backend can process that file.

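from django.core.files.storage import default_storage
from django.http import HttpResponse

from .tasks import process_upload  # assumed module holding the task below


def upload_file(request):
    path = 'uploads/my_file.csv'
    with default_storage.open(path, 'w') as file:
        file.write(request.FILES['upload'].read().decode('utf-8-sig'))
    process_upload.delay(path)
    return HttpResponse()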

import csv

from celery import shared_task
from django.core.files.storage import default_storage


@shared_task
def process_upload(path):
    with default_storage.open(path, 'r') as file:
        dialect = csv.Sniffer().sniff(file.read(1024))
        file.seek(0)
        reader = csv.DictReader(file, dialect=dialect)
        for row in reader:
            ...  # etc...

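The problem is that, even though I explicitly use text mode for both writing and reading, the file comes back as bytes when I read it, and the csv library can't handle bytes. Is there a way to fix this without reading the whole file into memory and decoding it?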
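To make the failure concrete: csv.reader (and therefore csv.DictReader) expects an iterator that yields str, and raises csv.Error when it is fed bytes. A minimal reproduction, using io.BytesIO as an assumed stand-in for the file object that default_storage hands back; first_row is a hypothetical helper used only for this illustration:

```python
import csv
import io


def first_row(stream):
    """Return the first parsed row from a file-like object (illustrative helper)."""
    return next(csv.reader(stream))


# Text stream: parses normally.
print(first_row(io.StringIO("name,age\nalice,30\n")))

# Binary stream (stand-in for what the storage backend returns): the csv
# module refuses bytes and raises csv.Error.
try:
    first_row(io.BytesIO(b"name,age\nalice,30\n"))
except csv.Error as exc:
    print("csv.Error:", exc)
```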

It looks like you need to add b (binary mode) to the open call:

From the Python docs:

'b' appended to the mode opens the file in binary mode: now the data is read and written in the form of bytes objects. This mode should be used for all files that don’t contain text.

from celery import shared_task
from django.core.files.storage import default_storage


@shared_task
def process_upload(path):
    with default_storage.open(path, 'rb') as file:
        ...  # Rest of your code goes here.
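Opening in 'rb' makes the mode match what the storage returns, but the reads are still bytes, so csv needs a decoding layer on top. One way to keep this streaming, without decoding the whole file in memory, is to wrap the binary file in io.TextIOWrapper, which decodes incrementally. This is a minimal sketch, assuming the object returned by default_storage.open(path, 'rb') behaves like a readable, seekable binary stream (not verified here for the S3 backend); iter_csv_rows is a hypothetical helper name, and io.BytesIO stands in for the S3 file:

```python
import csv
import io


def iter_csv_rows(binary_file, encoding="utf-8-sig"):
    """Yield CSV rows as dicts from a file object opened in binary mode.

    Assumes binary_file is a readable, seekable binary stream, such as the
    object returned by default_storage.open(path, "rb"). TextIOWrapper decodes
    chunk by chunk, so the whole file is never decoded into one string.
    "utf-8-sig" also strips a leading byte-order mark if one is present.
    """
    text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    try:
        # Sniff the dialect from a small sample, then rewind to the start.
        dialect = csv.Sniffer().sniff(text_file.read(1024))
        text_file.seek(0)
        yield from csv.DictReader(text_file, dialect=dialect)
    finally:
        # Detach so that discarding the wrapper does not close the caller's file.
        text_file.detach()


# io.BytesIO stands in for the S3-backed file in this example.
buf = io.BytesIO("name,age\nalice,30\nbob,25\n".encode("utf-8-sig"))
for row in iter_csv_rows(buf):
    print(row)
```

Inside the task this would be used as: with default_storage.open(path, 'rb') as file: for row in iter_csv_rows(file): ... . Detaching the wrapper instead of closing it leaves closing the underlying file to the surrounding with block.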