How to read an xlsx blob into pandas from an Azure Function in Python

I am reading .xlsx data from a blob in an Azure Function. My code looks like this:

def main(techdatablob: func.InputStream, crmdatablob: func.InputStream, outputblob: func.Out[func.InputStream]):

    # Load in the tech and crm data
    crm_data = pd.read_excel(crmdatablob.read().decode('ISO-8859-1'))
    tech_data = pd.read_excel(techdatablob.read().decode('ISO-8859-1'))
   

The problem is that when I try to decode the files, I get the following error:

ValueError: Protocol not known: PK...

followed by a lot of strange characters after the "...". Any ideas on how to read these files correctly?

Please refer to my code below; it appears the decode('ISO-8859-1') call is not needed:

import io
import logging

import pandas as pd
import azure.functions as func


def main(techdatablob: func.InputStream, crmdatablob: func.InputStream, outputblob: func.Out[func.InputStream]):
    logging.info("Python blob trigger function processed blob \n"
                 f"Name: {techdatablob.name}\n"
                 f"Blob Size: {techdatablob.length} bytes")

    # Load in the tech and crm data.
    # read() returns raw bytes; wrap them in BytesIO instead of decoding,
    # because recent pandas versions deprecate passing bytes directly.
    crm_data = pd.read_excel(io.BytesIO(crmdatablob.read()))
    logging.info(f"{crm_data}")
    tech_data = pd.read_excel(io.BytesIO(techdatablob.read()))
    logging.info(f"{tech_data}")
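
For context on the error in the question: an .xlsx file is a ZIP archive, so its bytes begin with the signature `PK`. Decoding those bytes yields a string, which pandas appears to treat as a path or URL rather than as file content, and that would explain the `Protocol not known: PK...` message. The following is a minimal sketch rather than part of the original answer; the helper names `looks_like_xlsx` and `read_xlsx_bytes` are hypothetical, and it assumes an Excel engine such as openpyxl is installed alongside pandas.

```python
import io
import zipfile


def looks_like_xlsx(data: bytes) -> bool:
    """Return True if the bytes form a ZIP archive, as every .xlsx file does.

    This only checks the container format, not that the archive is a workbook.
    """
    return zipfile.is_zipfile(io.BytesIO(data))


def read_xlsx_bytes(data: bytes):
    """Parse raw .xlsx bytes into a DataFrame without decoding them to text."""
    # Imported here so the signature check above works even without pandas.
    import pandas as pd

    if not looks_like_xlsx(data):
        raise ValueError("payload is not a ZIP archive, so it cannot be .xlsx")
    # Wrap the bytes in a file-like object; never call .decode() on them.
    return pd.read_excel(io.BytesIO(data))
```

Inside the function, this could be called as `read_xlsx_bytes(crmdatablob.read())`.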

Note: your function.json should look like the following; otherwise an error is raised.

{
  "bindings": [
    {
      "name": "techdatablob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "path1/{name}",
      "connection": "example"
    },
    {
      "name": "crmdatablob",
      "dataType": "binary",
      "type": "blob",
      "direction": "in",
      "path": "path2/data.xlsx",
      "connection": "example"
    },
    {
      "name": "outputblob",
      "type": "blob",
      "direction": "out",
      "path": "path3/out.xlsx",
      "connection": "example"
    }
  ]
}

The difference from your function.json is that yours is missing the dataType property.
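
One way to check for that difference is to scan function.json for blob input bindings that lack the property. This is a rough sketch and not part of the original answer: `find_bindings_missing_datatype` is a hypothetical helper, and it assumes the file uses the usual top-level `bindings` array.

```python
import json
from typing import List


def find_bindings_missing_datatype(function_json_text: str) -> List[str]:
    """Return the names of blob input bindings that have no dataType property."""
    config = json.loads(function_json_text)
    missing = []
    for binding in config.get("bindings", []):
        # Only plain blob inputs are checked; triggers and outputs are skipped.
        is_blob_input = (
            binding.get("type") == "blob" and binding.get("direction") == "in"
        )
        if is_blob_input and "dataType" not in binding:
            missing.append(binding.get("name", "<unnamed>"))
    return missing
```

It could be run against the file contents, for example `find_bindings_missing_datatype(open("function.json").read())`, and an empty list would mean every blob input declares a dataType.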

I tested this and it ran without any apparent problems.