Azure DataLakeServiceClient Python - How to append, How to set Offset and Flush Length?
I want to use DataLakeServiceClient (the azure.storage.filedatalake package) to create a CSV file and then append to it repeatedly. The initial create/write works as follows.
from azure.storage.filedatalake import DataLakeServiceClient

datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)
myfilesystem = "ContainerName"
myfolder = "FolderName"
myfile = "FileName.csv"
file_system_client = datalake_service_client.get_file_system_client(myfilesystem)
try:
    directory_client = file_system_client.create_directory(myfolder)
except Exception as e:
    directory_client = file_system_client.get_directory_client(myfolder)
file_client = directory_client.create_file(myfile)
data = """Test1"""
file_client.append_data(data, offset=0, length=len(data))
file_client.flush_data(len(data))
Suppose the next append is data = """Test2""". How should I set the offset and the flush_data length?
Thanks.
First, you are calling directory_client.create_file(myfile), which creates a new file every time it runs, so your code never actually appends anything.
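As for the offset and flush values themselves: each append starts at the current length of the file, and flush_data takes the total length the file will have once that append is committed. Both are counted in bytes. Here is a minimal pure-Python sketch of that arithmetic. The helper name plan_appends is hypothetical and it makes no Azure calls; it only computes the numbers you would pass.

```python
def plan_appends(chunks, start_size=0, encoding="utf-8"):
    """Return (offset, length, flush_position) for each chunk appended in order."""
    plan = []
    position = start_size  # current file length in bytes
    for chunk in chunks:
        length = len(chunk.encode(encoding))  # byte count, not character count
        plan.append((position, length, position + length))
        position += length
    return plan


for offset, length, flush_to in plan_appends(["Test1", "Test2"]):
    print(f"append_data(offset={offset}, length={length}); flush_data({flush_to})")
```

For "Test1" followed by "Test2" this prints offset=0 with flush_data(5), then offset=5 with flush_data(10).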
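Second, you need to check whether the file already exists. If it does, use the get_file_client method; if it does not, use the create_file method. The complete code is below (I tested with a .txt file on my side).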
from azure.core.exceptions import ResourceNotFoundError
from azure.storage.filedatalake import DataLakeServiceClient

connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)
myfilesystem = "test"
myfolder = "test"
myfile = "FileName.txt"
file_system_client = datalake_service_client.get_file_system_client(myfilesystem)
directory_client = file_system_client.create_directory(myfolder)
directory_client = file_system_client.get_directory_client(myfolder)

data = "Test2"
print("length of data is " + str(len(data)))
try:
    # The file exists: new data goes right after its current contents.
    file_client = directory_client.get_file_client(myfile)
    filesize_previous = file_client.get_file_properties().size
except ResourceNotFoundError:
    # The file does not exist yet: create it and start writing at offset 0.
    file_client = directory_client.create_file(myfile)
    filesize_previous = 0
print("length of current file is " + str(filesize_previous))
# offset is where the new data starts; flush_data takes the total length after the append.
file_client.append_data(data, offset=filesize_previous, length=len(data))
file_client.flush_data(filesize_previous + len(data))
This works on my side; please try it on yours. (The above is only an example; you can make it cleaner and more concise.)
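One possible way to make this more concise is to wrap the "read current size, append, flush" steps in a small helper. The sketch below is only an illustration: append_text is a hypothetical name, and InMemoryFileClient is a hypothetical stand-in that imitates the three file-client calls used in this thread (get_file_properties, append_data, flush_data) under a simplified model, so the offset logic can be tried without a storage account. It encodes the text first so that the length is a byte count, which matters for non-ASCII data.

```python
class InMemoryFileClient:
    """Hypothetical stand-in for a file client; not part of the Azure SDK."""

    class _Props:
        def __init__(self, size):
            self.size = size

    def __init__(self):
        self._committed = b""  # flushed (visible) content
        self._staged = []      # appended but not yet flushed

    def get_file_properties(self):
        return self._Props(len(self._committed))

    def append_data(self, data, offset, length=None):
        # Stage the bytes at the given position; nothing is visible until flush.
        self._staged.append((offset, bytes(data[:length])))

    def flush_data(self, offset):
        content = self._committed
        for start, chunk in sorted(self._staged):
            if start != len(content):
                raise ValueError("append offset does not match current length")
            content += chunk
        if offset != len(content):
            raise ValueError("flush position must equal the new total length")
        self._committed, self._staged = content, []


def append_text(file_client, text, encoding="utf-8"):
    """Append text at the end of the file and return the new size in bytes."""
    payload = text.encode(encoding)
    offset = file_client.get_file_properties().size  # current end of file
    file_client.append_data(payload, offset=offset, length=len(payload))
    new_size = offset + len(payload)
    file_client.flush_data(new_size)  # commit up to the new total length
    return new_size


client = InMemoryFileClient()
append_text(client, "Test1\n")
print(append_text(client, "Test2\n"))
```

With a real file client obtained from get_file_client or create_file, the same append_text call should apply, since it only relies on those three methods.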