I am trying to run the script below from an Azure Data Factory pipeline. My Python code retrieves 2 CSV files from Blob storage, merges them into one file based on a key, and uploads the result to Data Lake storage. I tried using an Azure Function activity, which gave me an InternalServerError, and I also tried a Web activity, which runs without errors. The problem is that no file is created when I run the pipeline, even though the pipeline run succeeds (with the Web activity). The function does run locally: when I call the main function, the file is created in Data Lake storage. I have also tried an HTTP trigger and a durable function in VS Code, but none of them created the "merged.csv" file in Azure.
My Python script (__init__.py):
import pandas as pd
import logging
from azure.storage.blob import BlobServiceClient
from azure.storage.filedatalake import DataLakeServiceClient
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    STORAGEACCOUNTURL = 'https://storage.blob.core.windows.net/'
    STORAGEACCOUNTKEY = '****'
    LOCALFILENAME = ['file1.csv', 'file2.csv']
    CONTAINERNAME = 'inputblob'

    file1 = pd.DataFrame()
    file2 = pd.DataFrame()

    # download from blob
    blob_service_client_instance = BlobServiceClient(account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY)
    for i in LOCALFILENAME:
        with open(i, "wb") as my_blobs:
            blob_client_instance = blob_service_client_instance.get_blob_client(container=CONTAINERNAME, blob=i, snapshot=None)
            blob_data = blob_client_instance.download_blob()
            blob_data.readinto(my_blobs)
        if i == 'file1.csv':
            file1 = pd.read_csv(i)
        if i == 'file2.csv':
            file2 = pd.read_csv(i)

    # load
    summary = pd.merge(left=file1, right=file2, on='key', how='inner')
    summary.to_csv()

    global service_client
    service_client = DataLakeServiceClient(account_url="https://storage.dfs.core.windows.net/", credential='****')
    file_system_client = service_client.get_file_system_client(file_system="outputdatalake")
    directory_client = file_system_client.get_directory_client("functionapp")
    file_client = directory_client.create_file("merged.csv")
    file_contents = summary.to_csv()
    file_client.upload_data(file_contents, overwrite=True)

    return "This HTTP triggered function executed successfully."
My JSON file (function.json):
{
    "scriptFile": "__init__.py",
    "bindings": [
        {
            "authLevel": "function",
            "type": "httpTrigger",
            "direction": "in",
            "name": "req",
            "methods": [
                "get",
                "post"
            ]
        },
        {
            "type": "http",
            "direction": "out",
            "name": "$return"
        }
    ]
}
There are 2 reasons I can think of that may be causing your issue.

A - Check your requirements.txt. All of your Python libraries should be listed there. It should look something like this:
azure-functions
pandas==1.3.4
azure-storage-blob==12.9.0
azure-storage-file-datalake==12.5.0
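If you want to verify that the packages actually landed in the deployed environment, a small self-contained sanity check could help (this helper is hypothetical, not part of the original function; it only tests importability, not versions):

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for n in names:
        try:
            if find_spec(n) is None:
                missing.append(n)
        except ModuleNotFoundError:
            # the parent package itself is not installed
            missing.append(n)
    return missing

# top-level module names for the packages in requirements.txt
required = ["azure.functions", "pandas", "azure.storage.blob", "azure.storage.filedatalake"]
print(missing_packages(required))  # an empty list means everything is importable
```

Logging the result from inside the function (instead of printing) would show up in Application Insights and quickly confirm whether a missing dependency is behind the InternalServerError.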
B - Next, it looks like you are writing the downloaded files to the Function host's local file system. That is not permitted there, and it is completely unnecessary, which would explain why it works on your local machine but not in Azure. You can achieve what you want without doing that. See the code below; it should serve your purpose. There is a slight change in how we load the csv from the blob into a dataframe.
import pandas as pd
import logging
from azure.storage.blob import BlobServiceClient
from azure.storage.filedatalake import DataLakeServiceClient
import azure.functions as func
from io import StringIO

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    STORAGEACCOUNTURL = 'https://storage.blob.core.windows.net/'
    STORAGEACCOUNTKEY = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'
    LOCALFILENAME = ['file1.csv', 'file2.csv']
    CONTAINERNAME = 'inputblob'

    file1 = pd.DataFrame()
    file2 = pd.DataFrame()

    # download from blob straight into memory -- no local files needed
    blob_service_client_instance = BlobServiceClient(account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY)
    for i in LOCALFILENAME:
        blob_client_instance = blob_service_client_instance.get_blob_client(container=CONTAINERNAME, blob=i, snapshot=None)
        blob_data = blob_client_instance.download_blob()
        if i == 'file1.csv':
            file1 = pd.read_csv(StringIO(blob_data.content_as_text()))
        if i == 'file2.csv':
            file2 = pd.read_csv(StringIO(blob_data.content_as_text()))

    # merge the two dataframes on the shared key column
    summary = pd.merge(left=file1, right=file2, on='key', how='inner')

    # upload the merged csv to the data lake
    service_client = DataLakeServiceClient(account_url="https://storage.dfs.core.windows.net/", credential=STORAGEACCOUNTKEY)
    file_system_client = service_client.get_file_system_client(file_system="outputdatalake")
    directory_client = file_system_client.get_directory_client("my-directory")
    file_client = directory_client.create_file("merged.csv")
    file_contents = summary.to_csv()
    file_client.upload_data(file_contents, overwrite=True)

    return func.HttpResponse("This HTTP triggered function executed successfully.")
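One thing worth being aware of with `how='inner'`: the merged file will only contain rows whose key appears in both inputs, so an empty or near-empty merged.csv can simply mean the keys do not overlap. A minimal stdlib sketch of that behavior (hypothetical sample data, plain csv instead of pandas):

```python
import csv
from io import StringIO

file1 = "key,a\n1,x\n2,y\n"
file2 = "key,b\n2,p\n3,q\n"

rows1 = list(csv.DictReader(StringIO(file1)))
rows2 = {r["key"]: r for r in csv.DictReader(StringIO(file2))}

# inner join on 'key': keep only rows whose key is present in both inputs
merged = [{**r, **rows2[r["key"]]} for r in rows1 if r["key"] in rows2]
print(merged)  # [{'key': '2', 'a': 'y', 'b': 'p'}]
```

Only key 2 survives here; keys 1 and 3 are dropped because they appear in just one file. If you need to keep unmatched rows as well, pandas supports `how='left'`, `how='right'`, or `how='outer'`.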