ValueError when transmitting a file with an Arabic title over a Python TCP socket
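I'm building a socket program to transfer files between two computers. Files with English titles transfer successfully, but when I try to send a file with an Arabic title (for example وثيقة.docx), I get a long series of ValueErrors, starting with: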
invalid literal for int() with base 2: b'.docx000000000000000000010000001'
invalid literal for int() with base 2: b'10001PK\x03\x04\x14\x00\x08\x08\x08\x00\xe0'
My code is:
Server:
import socket
serversock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
host = 'localhost'
port = 9000
serversock.bind((host,port))
filename = ""
serversock.listen(10)
print("Waiting for a connection.....")
clientsocket, addr = serversock.accept()
print("Got a connection from %s" % str(addr))
while True:
    try:
        size = clientsocket.recv(16) # Note that you limit your filename length to 255 bytes.
        if not size:
            clientsocket, addr = serversock.accept()
            print("Got a connection from %s" % str(addr))
            continue
        size = int(size, 2)
        print('SIZE', size)
        filename = clientsocket.recv(size)
        print('filename', filename)
        filesize = clientsocket.recv(32)
        print('FILESIZE', filesize, 'TYPE', type(filesize))
        filesize = int(filesize, 2) ##########
        file_to_write = open(filename, 'wb')
        chunksize = 4096
        while filesize > 0:
            if filesize < chunksize:
                chunksize = filesize
            data = clientsocket.recv(chunksize)
            file_to_write.write(data)
            filesize -= len(data)
        file_to_write.close()
        print('File (%s) received successfully' % filename.decode('utf-8'))
    except ValueError as verr:
        print(verr)
        #continue
    except FileNotFoundError:
        print('FileNotFoundError')
        #continue
serversock.close()
Client:
import socket
import os
from file_walker import files_to_transmit
def transmit(host, port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect((host, port))
        directory = files_to_transmit()
        for file in directory:
            filename = file
            size = len(filename.split('/')[-1]) # split() to get bare file name to transmit to server
            size = bin(size)[2:].zfill(16) # encode filename size as 16 bit binary
            s.send(size.encode('utf-8'))
            s.send(filename.split('/')[-1].encode('utf-8')) # split() to get bare file name
            filename = file
            filesize = os.path.getsize(filename)
            filesize = bin(filesize)[2:].zfill(32) # encode filesize as 32 bit binary
            s.send(filesize.encode('utf-8'))
            file_to_send = open(filename, 'rb')
            l = file_to_send.read()
            s.sendall(l)
            file_to_send.close()
            print('File Sent')
        s.close()
    except ConnectionRefusedError:
        print('ConnectionRefusedError: Server may not be running')
    except ValueError as e:
        print(e)
transmit('localhost', 9000)
What is wrong here? Please help.
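I ran into a similar problem myself.
My suggestion is to encode and decode the file with a scheme such as base64.
Somewhere in your program you read the file to hold it in memory and then send it in a second step. Instead of only reading it, encode it as well: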
import base64
with open("yourfile.ext", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read())
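Obviously, on the other side you need to decode the data and write it to disk. You can also encode and decode the file's title to avoid the ValueError.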
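As a rough sketch of the receiving side of this suggestion, the standard library's base64.b64decode reverses b64encode. The helper names and the sample bytes below are placeholders, not taken from the question. Note that the encoded payload is longer than the original file, so any size prefix sent ahead of it would have to describe the encoded length:

```python
import base64

def encode_payload(raw: bytes) -> bytes:
    """Sender side: base64-encode the file contents before sending."""
    return base64.b64encode(raw)

def decode_payload(encoded: bytes) -> bytes:
    """Receiver side: turn the received base64 text back into the original bytes."""
    return base64.b64decode(encoded)

# Placeholder contents standing in for a file read with open(..., 'rb').
original = b"PK\x03\x04 example file contents"
encoded = encode_payload(original)
restored = decode_payload(encoded)

# The title can be handled the same way: encode to UTF-8 bytes, then base64.
encoded_title = base64.b64encode('وثيقة.docx'.encode('utf-8'))
restored_title = base64.b64decode(encoded_title).decode('utf-8')

print(restored == original)
print(len(original), len(encoded))
print(restored_title)
```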
You send the size of the title in unicode characters, and then try to send the title utf-8 encoded. Taking your example:
title = 'وثيقة.docx'
print(len(title), len(title.encode('utf8')))
gives
10 15
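The other side will use only 10 bytes for the file name and treat the remaining 5 bytes as the start of the file size. It then chokes because .docx is not the beginning of a binary number. That is what happens here.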
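The misalignment can be reproduced without a network. The sketch below builds the bytes the way the client in the question does and reads them back the way the server does, with io.BytesIO standing in for the socket. The file size 4145 is a hypothetical value chosen for illustration:

```python
import io

title = 'وثيقة.docx'
payload_size = 4145  # hypothetical file size, for illustration only

# What the client in the question puts on the wire: the length is counted in
# characters (10), but the name itself is sent as UTF-8 bytes (15).
stream = (
    bin(len(title))[2:].zfill(16).encode('utf-8')
    + title.encode('utf-8')
    + bin(payload_size)[2:].zfill(32).encode('utf-8')
)

# What the server in the question does with those bytes.
wire = io.BytesIO(stream)
name_len = int(wire.read(16), 2)   # 10
name = wire.read(name_len)         # only the first 10 of the 15 name bytes
size_field = wire.read(32)         # begins with the 5 leftover name bytes

print(size_field[:5])
try:
    int(size_field, 2)
    error = None
except ValueError as exc:
    error = str(exc)
    print(error)
```

Every later read is shifted by the same 5 bytes, which would explain why the errors keep coming after the first one.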
The fix is easy: build the byte string before computing its length:
...
for file in directory:
    filename = file.split('/')[-1].encode('utf_8') # split() to get bare file name
    size = len(filename)
    size = bin(size)[2:].zfill(16) # encode filename size as 16 bit binary
    s.send(size.encode('utf-8'))
    s.send(filename)
...
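A minimal round-trip check of the corrected framing, under the assumption that the header layout stays as in the question (16 binary digits, the name, 32 binary digits). The names pack_header and parse_header are hypothetical helpers introduced only for this sketch, and io.BytesIO stands in for the socket:

```python
import io

def pack_header(name: str, filesize: int) -> bytes:
    """Build the header: 16 binary digits giving the filename length in
    bytes, the UTF-8 filename, then 32 binary digits giving the file size."""
    name_bytes = name.encode('utf-8')               # encode first ...
    name_len = bin(len(name_bytes))[2:].zfill(16)   # ... then measure the bytes
    size_field = bin(filesize)[2:].zfill(32)
    return name_len.encode('utf-8') + name_bytes + size_field.encode('utf-8')

def parse_header(stream):
    """Read the header back in the same order the server in the question does."""
    name_len = int(stream.read(16), 2)
    name = stream.read(name_len).decode('utf-8')
    filesize = int(stream.read(32), 2)
    return name, filesize

header = pack_header('وثيقة.docx', 4145)  # 4145 is an arbitrary example size
print(parse_header(io.BytesIO(header)))
```

On a real socket, recv(n) may return fewer than n bytes, so the server-side reads would need to loop until each field is complete.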