How to read an xlsx blob into pandas from an Azure Function in Python
I am reading .xlsx data from a blob in an Azure Function. My code looks like this:
def main(techdatablob: func.InputStream, crmdatablob: func.InputStream, outputblob: func.Out[func.InputStream]):
    # Load in the tech and crm data
    crm_data = pd.read_excel(crmdatablob.read().decode('ISO-8859-1'))
    tech_data = pd.read_excel(techdatablob.read().decode('ISO-8859-1'))
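The problem is that when I try to decode the files, I get the following error:
ValueError: Protocol not known: PK...
followed by many strange characters after the "...". Any ideas on how to read these files correctly?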
Please refer to my code below. It seems you do not need to add decode('ISO-8859-1'):
import io
import logging
import pandas as pd
import azure.functions as func

def main(techdatablob: func.InputStream, crmdatablob: func.InputStream, outputblob: func.Out[func.InputStream]):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {techdatablob.name}\n"
                 f"Blob Size: {techdatablob.length} bytes")
    # Load in the tech and crm data. Pass the raw bytes without decoding;
    # wrapping them in BytesIO gives pandas a file-like object to read from.
    crm_data = pd.read_excel(io.BytesIO(crmdatablob.read()))
    logging.info(f"{crm_data}")
    tech_data = pd.read_excel(io.BytesIO(techdatablob.read()))
    logging.info(f"{tech_data}")
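The "PK..." in the error message is consistent with the .xlsx format being a ZIP container: the file's first bytes are the ZIP signature, and once the bytes are decoded into a str, a reader that accepts paths or URLs will most likely try to interpret that text as a location rather than as file contents. The following is a minimal, standard-library-only sketch of that idea. It uses a tiny in-memory ZIP archive as a stand-in for a real workbook, and the member name placeholder.xml is purely hypothetical.

```python
import io
import zipfile

# Build a tiny ZIP archive in memory as a stand-in for an .xlsx file
# (an .xlsx workbook is a ZIP container holding XML parts).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("placeholder.xml", "<root/>")  # hypothetical member name
raw = buffer.getvalue()

# ZIP local file headers begin with the signature bytes b"PK\x03\x04".
signature = raw[:4]

# Decoding turns the binary payload into a str that starts with "PK".
# A str is no longer usable as binary file contents.
decoded = raw.decode("ISO-8859-1")

# Wrapping the untouched bytes in BytesIO yields a file-like object instead.
stream = io.BytesIO(raw)
is_zip = zipfile.is_zipfile(stream)

print(signature, type(decoded).__name__, decoded[:2], is_zip)
```

This is why the fix is to hand the undecoded bytes (or a file-like wrapper around them) to the reader, rather than choosing a different text encoding.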
Note: your function.json should look like the following; otherwise you will get an error.
{
  "name": "techdatablob",
  "type": "blobTrigger",
  "direction": "in",
  "path": "path1/{name}",
  "connection": "example"
},
{
  "name": "crmdatablob",
  "dataType": "binary",
  "type": "blob",
  "direction": "in",
  "path": "path2/data.xlsx",
  "connection": "example"
},
{
  "name": "outputblob",
  "type": "blob",
  "direction": "out",
  "path": "path3/out.xlsx",
  "connection": "example"
}
The difference between this and your function.json is that yours is missing the dataType property.
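As a rough way to check a function.json for this, the sketch below lists blob input bindings that do not declare "dataType": "binary". The helper name blob_inputs_missing_binary is hypothetical and not part of any Azure SDK, the wrapping "bindings" array is assumed from the usual function.json layout, and the "question-like" configuration is only a guess at the asker's file, which is not shown in the thread.

```python
import copy
import json
from typing import List


def blob_inputs_missing_binary(config: dict) -> List[str]:
    """Hypothetical helper: names of blob input bindings lacking dataType 'binary'."""
    missing = []
    for binding in config.get("bindings", []):
        is_blob_input = (binding.get("type") == "blob"
                         and binding.get("direction") == "in")
        if is_blob_input and binding.get("dataType") != "binary":
            missing.append(binding.get("name", "<unnamed>"))
    return missing


# Bindings copied from the answer, wrapped in an assumed "bindings" array.
answer_config = json.loads("""
{
  "bindings": [
    {"name": "techdatablob", "type": "blobTrigger", "direction": "in",
     "path": "path1/{name}", "connection": "example"},
    {"name": "crmdatablob", "dataType": "binary", "type": "blob",
     "direction": "in", "path": "path2/data.xlsx", "connection": "example"},
    {"name": "outputblob", "type": "blob", "direction": "out",
     "path": "path3/out.xlsx", "connection": "example"}
  ]
}
""")

# Assumed shape of the asker's file: same bindings, dataType left out.
question_like = copy.deepcopy(answer_config)
for binding in question_like["bindings"]:
    binding.pop("dataType", None)

print(blob_inputs_missing_binary(answer_config))
print(blob_inputs_missing_binary(question_like))
```

The check only looks at bindings of type "blob" with direction "in", matching the single binding that carries dataType in the answer; whether the trigger binding also needs it is not something the thread establishes.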
This is what my test result looks like, and everything seems to work fine.