Entropy and encrypted files
I have a question I need help with.
Usually, an encrypted file is larger than the unencrypted file. Does the entropy decrease in those cases? I know entropy is calculated like this in Python:
import math

with open(r"C:\Users\Parisa\Desktop\myfile.txt", 'rb') as f:
    byteArr = list(f.read())
fileSize = len(byteArr)

print('File size in bytes: {:,d}'.format(fileSize))

# calculate the frequency of each byte value in the file
print('Calculating Shannon entropy of file. Please wait...')
freqList = []
for b in range(256):
    ctr = 0
    for byte in byteArr:
        if byte == b:
            ctr += 1
    freqList.append(float(ctr) / fileSize)

# Shannon entropy
ent = 0.0
for freq in freqList:
    if freq > 0:
        ent += freq * math.log(freq, 2)
ent = -ent
print('Shannon entropy: {}'.format(ent))
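For reference, the same per-byte Shannon entropy can be computed more compactly with collections.Counter; a minimal sketch (the helper name shannon_entropy and the sample inputs are mine, not from the question):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Per-byte Shannon entropy in bits (ranges from 0 to 8)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A single repeated byte value gives 0 bits; all 256 values equally
# likely gives the maximum of 8 bits per byte.
print(shannon_entropy(b'aaaaaaaa'))        # 0.0
print(shannon_entropy(bytes(range(256))))  # 8.0
```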
This is after encryption (image: stack.imgur.com/jMhkc.png).
This is before encryption.
The entropy of an encrypted file is maximized. Entropy is a measure of a file's "randomness".
This is what a text file typically looks like, plotting entropy against file content:
This is what an encrypted file looks like (ignore the tiny low-entropy spike; it is caused by some header information or equivalent):
The normalized entropy of encrypted files is close to 1, i.e. close to the maximum of 8 bits per byte. If it were not, they would not be well encrypted. Patterns == low entropy, and low entropy == bad encryption.
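To see that contrast numerically, here is a minimal sketch comparing repetitive English-like text with random bytes from os.urandom, which stands in for well-encrypted data (the helper name and sample text are mine; the random figure varies slightly per run):

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Per-byte Shannon entropy in bits (maximum is 8)."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

text = b'the quick brown fox jumps over the lazy dog ' * 500
random_bytes = os.urandom(len(text))  # proxy for a well-encrypted file

print('text:   {:.3f} bits/byte'.format(shannon_entropy(text)))
print('random: {:.3f} bits/byte'.format(shannon_entropy(random_bytes)))
```

Typical English text lands around 4 to 5 bits per byte, while random data comes out close to 8, i.e. a normalized entropy close to 1.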
You have really touched on some hard points there. Let me quote someone from the past:
“My greatest concern was what to call it. I thought of calling it ‘information’, but the word was overly used, so I decided to call it ‘uncertainty’. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.”
- Claude Shannon
As discussed elsewhere, 'entropy' generally increases when you encrypt source/plain text. I won't go into the details, but as the quote above suggests, entropy has different definitions, particularly in cryptography. Ciphertext simply looks more complex. Use ent from here.
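As an illustration of that increase, here is a minimal sketch that "encrypts" highly repetitive plaintext with a toy SHA-256 counter-mode keystream and compares the entropies. The keystream construction is mine and for illustration only, not a secure cipher; in practice you would use a real cipher such as AES:

```python
import hashlib
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Per-byte Shannon entropy in bits (maximum is 8)."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def toy_stream_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """XOR the plaintext with a SHA-256 counter-mode keystream.
    Illustrative only -- do NOT use this as a real cipher."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(plaintext):
        keystream += hashlib.sha256(key + counter.to_bytes(8, 'big')).digest()
        counter += 1
    return bytes(p ^ k for p, k in zip(plaintext, keystream))

plaintext = b'hello world ' * 1000   # highly repetitive, low entropy
ciphertext = toy_stream_encrypt(plaintext, b'example key')

print('plaintext entropy:  {:.3f} bits/byte'.format(shannon_entropy(plaintext)))
print('ciphertext entropy: {:.3f} bits/byte'.format(shannon_entropy(ciphertext)))
```

Even though the ciphertext here is the same size as the plaintext, its entropy sits close to the 8 bits/byte maximum, while the repetitive plaintext is far below it.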